The loss triangle is a data structure familiar to both pricing and reserving Actuaries, commonly used to organize losses by date of occurrence (generally the vertical axis) and, in the case of paid loss triangles, by date of payment (horizontal axis). In practice, triangles present losses in one of two states: incremental loss triangles represent the losses for a given accident year at a particular point in time, while cumulative loss triangles represent total losses to date, up to and including the development period at which losses are evaluated.
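The relationship between the two states can be sketched in a few lines of base R. The values below are a toy vector, not actual loss data: cumulating an incremental series with `cumsum` yields the cumulative view, and differencing the cumulative series recovers the incremental view.

```r
# Toy incremental payments for a single accident year across five
# development periods (illustrative values only) =>
incremental = c(100, 60, 30, 10, 5)

# Cumulative losses are the running total of incremental losses =>
cumulative = cumsum(incremental)
print(cumulative)               # 100 160 190 200 205

# Differencing the cumulative series recovers the incremental series =>
print(diff(c(0, cumulative)))   # 100 60 30 10 5
```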

This post does not detail the what of loss reserving triangles, but instead focuses on the how: namely, how to convert transactional loss data into a triangle structure using base R, the foreach package, the data.table package and finally the ChainLadder package. For more information specific to the what of loss triangles, refer to Jacqueline Friedland's Estimating Unpaid Claims Using Basic Techniques, in particular chapters 5 and 6.

For the purposes of consistency and clarity of exposition, all examples reference a loss dataset made available by the Reinsurance Association of America, which can be downloaded here. This is the same dataset identified as `RAA` in the `ChainLadder` package, the only difference being the above referenced dataset contains incremental (as opposed to cumulative) losses. The available fields are:
* `ORIGIN` - The year in which the loss occurred.

* `DEV` - The development period. In the case of paid losses, the number of periods (months, years, etc.) after the occurrence that payment was made.

* `VALUE` - In the case of paid losses, the paid loss amount (in dollars) at each valid cell representing the intersection of `ORIGIN` and `DEV`.

## Method I: Base R

Although not the most straightforward approach, it is possible to convert transactional loss data into triangle format without the use of 3rd-party libraries. The next example demonstrates one possible method:

```R
# =============================================
# Loss Triangle Compilation in R: Method I    |
# =============================================

# Read in RAA.csv =>
DF = read.csv(
    file="E:/Repos/trikit/datasets/RAA.csv",
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

triData1 = matrix(
    nrow=length(unique(DF$ORIGIN)),
    ncol=length(unique(DF$DEV)),
    dimnames=list(
        ORIGIN=sort(unique(DF$ORIGIN)),
        DEV=sort(unique(DF$DEV))
        )
    )

# Scatter VALUE into the matrix using (row, column) index pairs =>
triData1[cbind(as.integer(factor(DF$ORIGIN)), DF$DEV)] = DF$VALUE

# Printing triData1 =>
> print(triData1)

DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
1981 5012 3257 2638  898 1734 2642 1828 599  54 172
1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
```

The base R method returns the triangle as a matrix of incremental losses. To obtain a triangle of cumulative losses, accumulate each row (origin year) across its development periods:

```
> triData1c = t(apply(triData1, 1, cumsum))

> print(triData1c)

DEV
ORIGIN     1     2     3     4     5     6     7     8     9    10
1981  5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
1982   106  4285  5396 10666 13782 15599 15496 16169 16704    NA
1983  3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
1984  5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
1985  1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
1986  1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
1987   557  4020 10946 12314    NA    NA    NA    NA    NA    NA
1988  1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
1989  3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
1990  2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
```

## Method II: Using foreach+data.table

The `foreach` package provides a looping construct for executing R code repeatedly, and is especially useful for developing platform-independent applications that distribute tasks across multiple cores.
`data.table` facilitates fast aggregation of large data, fast ordered joins and fast add/modify/delete of columns by group, using no copies at all.
In this example, we use the `foreach` constructor along with `data.table`'s `rbindlist` function to compile a triangle of incremental losses:

```R
# =============================================
# Loss Triangle Compilation in R: Method II   |
# =============================================
suppressPackageStartupMessages(library("data.table"))
suppressPackageStartupMessages(library("foreach"))

# Read in RAA.csv =>
DF = read.csv(
    file="E:/Repos/trikit/datasets/RAA.csv",
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

lossList = split(setDT(DF), by="ORIGIN")

triData2 = foreach(
i=1:length(lossList),
.final=function(l) rbindlist(l,fill=TRUE)) %do% {

iterList = setNames(
    as.list(lossList[[i]]$VALUE),
    nm=as.character(lossList[[i]]$DEV)
    )

append(list(ORIGIN=names(lossList)[i]), iterList)
}

# Printing triData2 =>
> print(triData2)

DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
1981 5012 3257 2638  898 1734 2642 1828 599  54 172
1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
```

In Method II, the transactional loss data is first split into a list of data.tables by common `ORIGIN`, which we name `lossList`. The `foreach` constructor then iterates over `lossList`, transforming each data.table into what amounts to a single row of the incremental triangle. Notice that within the `foreach` constructor we specify a function for the `.final` parameter: when defined, `.final` is a function of one argument that is called on the collected results to produce the final return value. `data.table`'s `rbindlist` takes as input a list of data.tables/data.frames and concatenates them vertically (row-wise), returning a single data.table.
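To see the `.final` mechanism in isolation, here is a minimal sketch on toy data (unrelated to the RAA file): each iteration returns a named list standing in for one triangle row, with later "origins" contributing fewer development periods, and `rbindlist(l, fill=TRUE)` pads the short rows with `NA`.

```r
suppressPackageStartupMessages(library("data.table"))
suppressPackageStartupMessages(library("foreach"))

# Each iteration builds a one-row list; row i has development periods
# 1..(4 - i), mimicking the staircase shape of a loss triangle =>
toyTri = foreach(i=1:3, .final=function(l) rbindlist(l, fill=TRUE)) %do% {
    setNames(as.list(rep(1, 4 - i)), nm=as.character(seq_len(4 - i)))
}

print(toyTri)
# A 3x3 data.table: row 1 is fully populated, rows 2 and 3 have
# trailing NAs where development periods are not yet observed.
```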
Since the resulting triangle of incremental losses is returned as a data.table, we can generate the cumulative loss triangle in the following manner:

```R
# Starting from triData2 incremental losses =>
devDF = triData2[,-c("ORIGIN")]

devDF[,names(devDF):=Reduce(`+`,devDF,accumulate=TRUE)]

triData2c = cbind(triData2[,.(ORIGIN)],devDF)

# Printing cumulative loss triangle =>
> print(triData2c)

DEV
ORIGIN     1     2     3     4     5     6     7     8     9    10
1981  5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
1982   106  4285  5396 10666 13782 15599 15496 16169 16704    NA
1983  3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
1984  5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
1985  1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
1986  1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
1987   557  4020 10946 12314    NA    NA    NA    NA    NA    NA
1988  1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
1989  3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
1990  2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
```

First the `ORIGIN` field is removed from `triData2`, so that `devDF` consists only of the development period columns. Next `Reduce` is applied over the list of columns with `accumulate=TRUE`, so that each column is replaced by the running sum of all columns up to and including it (a row-wise cumulative sum). Finally, `ORIGIN` is bound back alongside the cumulated losses with `cbind` and the result is named `triData2c`.
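The behavior of `Reduce` with `accumulate=TRUE` may be easier to see on a toy list of columns (not the RAA data): element k of the result is the elementwise sum of columns 1 through k, and `NA`s below the diagonal propagate just as they do in the triangle.

```r
# Three toy development-period columns for two origin years; the NA
# plays the role of an unobserved cell below the diagonal =>
cols = list(dev1=c(10, 20), dev2=c(1, 2), dev3=c(5, NA))

# Element k of the result is the sum of columns 1..k =>
cumulated = Reduce(`+`, cols, accumulate=TRUE)

print(cumulated)
# values: 10 20, then 11 22, then 16 NA
```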

We next demonstrate a method for transforming datasets that highlights the power and ease-of-use of the `data.table` package.

## Method III: data.table

`data.table`'s `dcast` function is a very powerful and flexible utility used primarily for reshaping datasets. Fieldnames are specified and used as row and column indices in terms of a formula of the form `LHS ~ RHS` (see the help page for more information).
For Method III, `dcast` is used to convert the RAA transactional loss data into an incremental loss triangle using a single function call:

```R
# =============================================
# Loss Triangle Compilation in R: Method III  |
# =============================================
suppressPackageStartupMessages(library("data.table"))

# Read in RAA.csv =>
DF = read.csv(
    file="E:/Repos/trikit/datasets/RAA.csv",
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

triData3 = dcast(setDT(DF), ORIGIN~DEV, value.var="VALUE", fill=NA)

# Printing triData3 =>
> print(triData3)

DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
1981 5012 3257 2638  898 1734 2642 1828 599  54 172
1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
```

The first argument to `dcast` is the dataset. `ORIGIN~DEV` is then specified, with `ORIGIN` corresponding to the LHS and `DEV` to the RHS. `value.var` specifies the field over which to aggregate, and `fill` specifies the value used to populate invalid cell references.
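The effect of `fill` can be seen on a small hypothetical dataset (not the RAA file): origin 2001 has no record at development period 2, so the corresponding cell is an invalid reference and receives the `fill` value.

```r
suppressPackageStartupMessages(library("data.table"))

# Toy transactional data; origin 2001 is missing dev 2 =>
toy = data.table(
    ORIGIN=c(2000, 2000, 2001),
    DEV=c(1, 2, 1),
    VALUE=c(100, 50, 80)
    )

wide = dcast(toy, ORIGIN~DEV, value.var="VALUE", fill=NA)
print(wide)
# Origin 2001's dev-2 cell comes back NA rather than a value.
```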
Since `dcast` returns a data.table, the approach demonstrated in Method II can be used to obtain a triangle of cumulative losses, so it will not be repeated here. However, the data.table approach enables us to perform a task which is non-trivial with the other methods: converting a loss triangle back into a dataset of transactional losses. This is accomplished with data.table's `melt` function. In the next example, we convert `triData3` back into the RAA dataset:

```R
suppressPackageStartupMessages(library("data.table"))

# Convert a loss triangle back into transactional loss data.
# Assume triData3 exists and represents incremental loss
# data compiled from RAA.csv =>

DF2 = data.table::melt(
triData3,
id.vars="ORIGIN",
variable.name="DEV",
value.name="VALUE",
value.factor=FALSE,
variable.factor=FALSE,
na.rm=TRUE
)

# Convert `DEV` back to numeric and order records =>
DF2[["DEV"]] = as.integer(DF2[["DEV"]])
DF2 = DF2[order(ORIGIN,DEV)]

# Test equivalence of DF and DF2 =>
> all.equal(DF, DF2)

[1] TRUE
```

The first argument to `melt` is the triangle of incremental losses. Next we specify `id.vars`, `variable.name` and `value.name`, with `value.name` representing the measure of interest. `variable.factor` and `value.factor` specify whether the variable and value columns should be returned as factors, and `na.rm` indicates whether records containing `NA`s should be removed from the dataset. `melt`'s help page contains many excellent examples in which it reduces what were at one time arduous data manipulation tasks to a single function call. I highly encourage anyone working with data in R on a regular basis to become intimately familiar with `melt` and `dcast`, and perhaps more importantly with data.table in general (to start, check out a post on data.table's Special Symbols here).

## Method IV: ChainLadder

The `ChainLadder` package provides various statistical methods typically used for the estimation of outstanding claims reserves in general insurance. It includes a suite of utilities that can be used to estimate outstanding claim liabilities, but those utilities will not be covered here. Of interest to this post is the `triangle` class made available by `ChainLadder`. The `as.triangle` declaration is as follows:
```R
as.triangle(Triangle, origin="origin", dev="dev", value="value", ...)
```

`Triangle` represents the transactional loss dataset. For `origin`, `dev` and `value`, specify the corresponding fieldnames contained in the dataset. Referring again to the RAA dataset:

```R
# =============================================
# Loss Triangle Compilation in R: Method IV   |
# =============================================

suppressPackageStartupMessages(library("ChainLadder"))

# Read in RAA.csv =>
DF = read.csv(
    file="E:/Repos/trikit/datasets/RAA.csv",
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

# Fieldnames in DF are "ORIGIN", "DEV", "VALUE" =>
triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")

# Printing triData4 =>
> print(triData4)

DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
1981 5012 3257 2638  898 1734 2642 1828 599  54 172
1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
```

The ChainLadder package comes with two convenience functions that allow for seamless conversion from incremental to cumulative triangles and back again: `incr2cum` and `cum2incr`, respectively. We'll use the former to obtain `triData4c`:

```R
# Assume incremental loss triangle triData4 exists =>
triData4c = incr2cum(triData4)

# Printing triData4c =>
> print(triData4c)

DEV
ORIGIN     1     2     3     4     5     6     7     8     9    10
1981  5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
1982   106  4285  5396 10666 13782 15599 15496 16169 16704    NA
1983  3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
1984  5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
1985  1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
1986  1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
1987   557  4020 10946 12314    NA    NA    NA    NA    NA    NA
1988  1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
1989  3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
1990  2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
```

For completeness, `cum2incr` can be used to go from cumulative to incremental:

```R
# Assume cumulative loss triangle triData4c exists =>
triData4 = cum2incr(triData4c)

# Printing triData4 (again) =>
> print(triData4)

DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
1981 5012 3257 2638  898 1734 2642 1828 599  54 172
1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
```

Something worth mentioning with respect to the behavior of `incr2cum` and `cum2incr`: upon converting loss data into a triangle type, there is no internal state that tracks whether the data originally represented cumulative or incremental losses. If you have incremental losses that are transformed into a `triangle` instance and then run `incr2cum` on that triangle, a triangle of cumulative losses is returned as expected. However, if you pass that cumulative loss triangle to `incr2cum` again, the function will cumulate the already-cumulated losses. For example:

```R
# Read in RAA.csv =>
DF = read.csv(
    file="E:/Repos/trikit/datasets/RAA.csv",
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

# Fieldnames in DF are "ORIGIN", "DEV", "VALUE" =>
triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")

triData4c = incr2cum(triData4)

> print(triData4c)

DEV
ORIGIN     1     2     3     4     5     6     7     8     9    10
1981  5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
1982   106  4285  5396 10666 13782 15599 15496 16169 16704    NA
1983  3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
1984  5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
1985  1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
1986  1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
1987   557  4020 10946 12314    NA    NA    NA    NA    NA    NA
1988  1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
1989  3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
1990  2063    NA    NA    NA    NA    NA    NA    NA    NA    NA

# So far so good...Pass triData4c to incr2cum again =>
triData4c2 = incr2cum(triData4c)

> print(triData4c2)

DEV
ORIGIN    1     2     3     4     5      6      7      8      9     10
1981 5012 13281 24188 35993 49532  65713  83722 102330 120992 139826
1982  106  4391  9787 20453 34235  49834  65330  81499  98203     NA
1983 3410 12402 26275 42416 61151  83365 106228 129694     NA     NA
1984 5655 17210 32976 54242 77667 103750 130817     NA     NA     NA
1985 1092 10657 26493 48662 74617 100797     NA     NA     NA     NA
1986 1513  7958 19660 32595 48447     NA     NA     NA     NA     NA
1987  557  4577 15523 27837    NA     NA     NA     NA     NA     NA
1988 1351  8298 21410    NA    NA     NA     NA     NA     NA     NA
1989 3133  8528    NA    NA    NA     NA     NA     NA     NA     NA
1990 2063    NA    NA    NA    NA     NA     NA     NA     NA     NA

# Pass triData4c2 to incr2cum again =>
triData4c3 = incr2cum(triData4c2)

> print(triData4c3)

DEV
ORIGIN    1     2     3      4      5      6      7      8      9     10
1981 5012 18293 42481  78474 128006 193719 277441 379771 500763 640589
1982  106  4497 14284  34737  68972 118806 184136 265635 363838     NA
1983 3410 15812 42087  84503 145654 229019 335247 464941     NA     NA
1984 5655 22865 55841 110083 187750 291500 422317     NA     NA     NA
1985 1092 11749 38242  86904 161521 262318     NA     NA     NA     NA
1986 1513  9471 29131  61726 110173     NA     NA     NA     NA     NA
1987  557  5134 20657  48494     NA     NA     NA     NA     NA     NA
1988 1351  9649 31059     NA     NA     NA     NA     NA     NA     NA
1989 3133 11661    NA     NA     NA     NA     NA     NA     NA     NA
1990 2063    NA    NA     NA     NA     NA     NA     NA     NA     NA
```

This isn't intended as a reason against using the ChainLadder package. I only highlight this behavior because a reasonable expectation might be that calling `incr2cum` on an already-cumulated triangle would be equivalent to a no-op, returning the cumulative triangle unmodified. As demonstrated above, this is not the case.
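One way to guard against accidental double cumulation is to wrap the conversion in a helper that records state. `cumulate_once` below is a hypothetical helper of our own (not part of ChainLadder), and it substitutes the base-R row-wise `cumsum` from Method I for `incr2cum` so the sketch is self-contained; the same idea applies when wrapping the ChainLadder function.

```r
# Hypothetical guard: tag the matrix with an attribute the first time it
# is cumulated, and return it unchanged on any subsequent call =>
cumulate_once = function(tri) {
    if (isTRUE(attr(tri, "cumulated"))) return(tri)  # already cumulative => no-op
    out = t(apply(tri, 1, cumsum))
    attr(out, "cumulated") = TRUE
    out
}

# Tiny 2x2 incremental "triangle" with an unobserved cell =>
inc = matrix(c(100, 60, 80, NA), nrow=2, byrow=TRUE)

c1 = cumulate_once(inc)   # cumulates across development periods
c2 = cumulate_once(c1)    # no-op: c2 is identical to c1
```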

## Conclusion

The compilation of loss reserving triangles is a necessary part of many Actuarial analyses, but it is merely a pre-processing step. In future posts, we'll highlight various methods employed by Actuaries to estimate reserves and reserve variability, in addition to stochastic reserving methods. Until then, happy coding!