The loss triangle is a data structure familiar to both pricing and reserving actuaries, commonly used to organize losses by date of occurrence (generally the vertical axis) and, in the case of paid loss triangles, by date of payment (horizontal axis). In practice, triangles present losses in one of two states: incremental loss triangles show the losses attributable to a given accident year at each particular development period, whereas cumulative loss triangles show total losses to date, up to and including the development period at which losses are evaluated.
This post does not detail the what of loss reserving triangles, but instead focuses on the how: namely, how to convert transactional loss data into a triangle structure using base R, the foreach package, the data.table package and, finally, the ChainLadder package. For more information specific to the what of loss triangles, refer to Jacqueline Friedland's Estimating Unpaid Claims Using Basic Techniques, in particular chapters 5 and 6.
For consistency and clarity of exposition, all examples reference a loss dataset made available by the Reinsurance Association of America, which can be downloaded here. This is the same dataset identified as RAA in the ChainLadder package, the only difference being that the dataset referenced above contains incremental (as opposed to cumulative) losses. The available fields are:
* ORIGIN - The year in which the loss occurred.
* DEV - The development period: in the case of paid losses, the number of periods (months, years, etc.) after the occurrence that payment was made.
* VALUE - In the case of paid losses, the paid loss amount (in dollars) at each valid cell representing the intersection of ORIGIN and DEV.
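To make the layout concrete, the first few transactional records can be sketched as a plain data.frame (the values below are taken from the RAA incremental triangle printed later in this post; the variable name raaHead is illustrative):

```r
# A miniature slice of the transactional layout: one record per
# (ORIGIN, DEV) combination, with VALUE holding the incremental loss =>
raaHead = data.frame(
    ORIGIN=c(1981, 1981, 1981, 1982, 1982),
    DEV=c(1, 2, 3, 1, 2),
    VALUE=c(5012, 3257, 2638, 106, 4179)
    )
print(raaHead)
```

Each ORIGIN-DEV pair appears at most once; every method below simply pivots this long format into a wide, triangle-shaped one.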
Method I: Base R
Although not the most straightforward approach, it is possible to convert transactional loss data into triangle format without the use of third-party libraries. The next example demonstrates one possible method:
# =============================================
# Loss Triangle Compilation in R: Method I |
# =============================================
# Read in RAA.csv =>
DF = read.table(
    file="E:/Repos/trikit/datasets/RAA.csv",
    header=TRUE,
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

triData1 = matrix(
    nrow=length(unique(DF$ORIGIN)),
    ncol=length(unique(DF$DEV)),
    dimnames=list(
        ORIGIN=sort(unique(DF$ORIGIN)),
        DEV=sort(unique(DF$DEV))
        )
    )

# Insert each VALUE into the cell addressed by its (ORIGIN, DEV) pair =>
triData1[cbind(factor(DF$ORIGIN), DF$DEV)] = DF$VALUE
# Printing triData1 =>
> print(triData1)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
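The assignment into triData1 relies on base R's matrix indexing, in which a two-column matrix of (row, column) pairs addresses individual cells directly. A minimal illustration:

```r
# Assign into specific (row, column) cells of a matrix using a
# two-column index matrix =>
m = matrix(NA, nrow=2, ncol=3)
idx = cbind(c(1, 2), c(3, 1))   # addresses cells (1,3) and (2,1)
m[idx] = c(99, 7)
print(m)   # m[1,3] is 99, m[2,1] is 7; all other cells remain NA
```

Note that factor(DF$ORIGIN) works as a row index only because the factor's integer codes follow the sorted order of the origin years, which matches the row order of the matrix.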
The base R method returns the triangle as a matrix of incremental losses. To obtain a triangle of cumulative losses, take the cumulative sum across each row (i.e., across development periods):
> triData1c = t(apply(triData1, 1, cumsum))
> print(triData1c)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
1983 3410 8992 13873 16141 18735 22214 22863 23466 NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA NA
1986 1513 6445 11702 12935 15852 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
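Converting incremental to cumulative losses amounts to a cumulative sum along each row; since apply returns its per-row results column-wise, the result must be transposed back. This can be checked on a toy matrix:

```r
# Row-wise cumulative sums: apply(..., 1, cumsum) returns one column
# per input row, so a final transpose restores the original orientation =>
inc = matrix(c(10, 20, 30,
                5,  5, NA), nrow=2, byrow=TRUE)
cum = t(apply(inc, 1, cumsum))
print(cum)   # row 1: 10 30 60; row 2: 5 10 NA
```

NA cells in the lower-right of the triangle stay NA, because cumsum propagates NA once encountered.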
Method II: Using foreach+data.table
The foreach package provides a looping construct for executing R code repeatedly, and is especially useful for developing platform-independent applications that distribute tasks across multiple cores. data.table facilitates fast aggregation of large data, fast ordered joins and fast add/modify/delete of columns by group, using no copies at all. In this example, we use the foreach constructor along with data.table's rbindlist function to compile a triangle of incremental losses:
# =============================================
# Loss Triangle Compilation in R: Method II |
# =============================================
suppressPackageStartupMessages(library("data.table"))
suppressPackageStartupMessages(library("foreach"))
# Read in RAA.csv =>
DF = read.table(
    file="E:/Repos/trikit/datasets/RAA.csv",
    header=TRUE,
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

lossList = split(setDT(DF), by="ORIGIN")

triData2 = foreach(
    i=1:length(lossList),
    .final=function(l) rbindlist(l, fill=TRUE)) %do% {
    iterList = setNames(
        as.list(lossList[[i]]$VALUE),
        nm=as.character(lossList[[i]]$DEV)
        )
    append(list(ORIGIN=names(lossList)[i]), iterList)
}
# Printing triData2 =>
> print(triData2)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
In Method II, the transactional loss data is first split into a list of data.tables by common ORIGIN, which we name lossList. The foreach constructor then iterates over lossList, transforming each data.table into what amounts to a single row of the incremental triangle. Notice that within the foreach constructor we specify a function for the .final parameter: when defined, .final specifies a function of one argument that is called on the final result. data.table's rbindlist takes as input a list of data.tables/data.frames and stacks them vertically (row-binds them), returning a single data.table.
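The fill=TRUE behavior of rbindlist, which pads columns absent from a given row with NA, is what produces the empty lower-right region of the triangle. A small demonstration (the values are the last two RAA origin rows):

```r
suppressPackageStartupMessages(library("data.table"))

# Row-bind two lists with different sets of development columns;
# fill=TRUE pads the missing columns with NA =>
rows = list(
    list(ORIGIN="1989", `1`=3133, `2`=2262),
    list(ORIGIN="1990", `1`=2063)
    )
tri = rbindlist(rows, fill=TRUE)
print(tri)   # the 1990 row has NA in column "2"
```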
When the resulting triangle of incremental losses is returned as a data.table/data.frame, we can generate the cumulative loss
triangle in the following manner:
# Starting from triData2 incremental losses =>
devDF = triData2[,-c("ORIGIN")]
devDF[,names(devDF):=Reduce(`+`,devDF,accumulate=TRUE)]
triData2c = cbind(triData2[,.(ORIGIN)],devDF)
# Printing cumulative loss triangle =>
> print(triData2c)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
1983 3410 8992 13873 16141 18735 22214 22863 23466 NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA NA
1986 1513 6445 11702 12935 15852 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
First the ORIGIN field is removed from triData2, so that devDF consists only of the development period columns. Next, Reduce is applied across the columns with accumulate=TRUE, so that a running sum across development periods is calculated for every row. Finally, ORIGIN is column-bound back to the accumulated losses, and the result is named triData2c.
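The Reduce(`+`, ..., accumulate=TRUE) pattern returns every partial sum rather than only the final total, which is exactly a cumulative sum across the list of development columns. On plain vectors:

```r
# Reduce with accumulate=TRUE keeps each intermediate result,
# yielding running totals across the list of column vectors =>
cols = list(c(10, 5), c(20, 5), c(30, NA))
runningTotals = Reduce(`+`, cols, accumulate=TRUE)
print(runningTotals)   # list of: (10, 5), (30, 10), (60, NA)
```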
We next demonstrate a method for transforming datasets that highlights the power and ease of use of the data.table package.
Method III: data.table
data.table's dcast function is a powerful and flexible utility used primarily for reshaping datasets. Field names are specified as column and row indices in terms of a formula of the form LHS ~ RHS (see the help page for more information). For Method III, dcast is used to convert the RAA transactional loss data into an incremental loss triangle with a single function call:
# =============================================
# Loss Triangle Compilation in R: Method III |
# =============================================
suppressPackageStartupMessages(library("data.table"))
# Read in RAA.csv =>
DF = read.table(
    file="E:/Repos/trikit/datasets/RAA.csv",
    header=TRUE,
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )

# Convert DF to a data.table so data.table's dcast method is dispatched =>
triData3 = dcast(setDT(DF), ORIGIN~DEV, value.var="VALUE", fill=NA)
# Printing triData3 =>
> print(triData3)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
The first argument to dcast is the dataset. Next, the formula ORIGIN~DEV is specified, with ORIGIN corresponding to the LHS and DEV to the RHS. value.var identifies the field that populates the cells of the reshaped dataset, and fill specifies the value used for combinations of ORIGIN and DEV that are not present in the data.
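The formula mechanics can be seen on a tiny long-format table (the labels below are illustrative, not from RAA):

```r
suppressPackageStartupMessages(library("data.table"))

# Pivot a long-format table wide: LHS (ORIGIN) values become rows,
# RHS (DEV) values become columns, and VALUE fills the cells =>
longDT = data.table(
    ORIGIN=c(2001, 2001, 2002),
    DEV=c(1, 2, 1),
    VALUE=c(100, 50, 75)
    )
wideDT = dcast(longDT, ORIGIN~DEV, value.var="VALUE", fill=NA)
print(wideDT)   # the (2002, 2) cell is NA, since no such record exists
```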
Since dcast returns a data.table, the approach demonstrated in Method II can be used to obtain a triangle of cumulative losses, so it will not be repeated here. However, the data.table approach enables us to perform a task that is non-trivial with the other methods: converting a loss triangle back into a dataset of transactional losses. This is accomplished with data.table's melt function. In the next example, we convert triData3 back into the RAA dataset:
suppressPackageStartupMessages(library("data.table"))
# Convert a loss triangle back into transactional loss data.
# Assume triData3 exists and represents incremental loss
# data compiled from RAA.csv =>
DF2 = data.table::melt(
    triData3,
    id.vars="ORIGIN",
    variable.name="DEV",
    value.name="VALUE",
    value.factor=FALSE,
    variable.factor=FALSE,
    na.rm=TRUE
    )

# Convert `DEV` back to numeric and order records =>
DF2[["DEV"]] = as.integer(DF2[["DEV"]])
DF2 = DF2[order(ORIGIN,DEV)]

# Test equivalence of DF and DF2 (ignoring class and row-name attributes) =>
> all.equal(DF, as.data.frame(DF2), check.attributes=FALSE)
[1] TRUE
The first argument to melt is the triangle of incremental losses. Next we specify id.vars, variable.name and value.name, with value.name identifying the measure of interest. value.factor and variable.factor specify whether the values and/or variables should be returned as factors, and na.rm indicates whether records containing NAs should be removed from the dataset.
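melt is the inverse of dcast, and the round trip can be demonstrated on the same kind of miniature table (names and values are illustrative):

```r
suppressPackageStartupMessages(library("data.table"))

# Unpivot a wide table back to long format; na.rm=TRUE drops the
# cells that were padded with NA when the table was cast wide =>
wideDT = data.table(ORIGIN=c(2001, 2002), `1`=c(100, 75), `2`=c(50, NA))
longDT = melt(
    wideDT,
    id.vars="ORIGIN",
    variable.name="DEV",
    value.name="VALUE",
    variable.factor=FALSE,
    na.rm=TRUE
    )
print(longDT)   # three records; the NA cell for (2002, 2) is removed
```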
There are many excellent examples in melt's help page where it simplifies what were at one time arduous data manipulation tasks into a single function call. I highly encourage anyone working with data in R on a regular basis to become intimately familiar with melt and dcast, but perhaps more importantly with data.table in general (to start, check out a post on data.table's special symbols here).
Method IV: ChainLadder
The ChainLadder package provides various statistical methods typically used for the estimation of outstanding claims reserves in general insurance. It includes a suite of utilities that can be used to estimate outstanding claim liabilities, but those utilities will not be covered here. Of interest to this post is the triangle class made available by ChainLadder. The as.triangle declaration is as follows:

as.triangle(Triangle, origin="origin", dev="dev", value="value", ...)

`Triangle` represents the transactional loss dataset. For `origin`, `dev` and `value`, specify the corresponding field names contained in the dataset. Referring again to the RAA dataset:
# =============================================
# Loss Triangle Compilation in R: Method IV |
# =============================================
suppressPackageStartupMessages(library("ChainLadder"))
# Read in RAA.csv =>
DF = read.table(
    file="E:/Repos/trikit/datasets/RAA.csv",
    header=TRUE,
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )
# Fieldnames in DF are "ORIGIN", "DEV", "VALUE" =>
triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")
# Printing triData4 =>
> print(triData4)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
The ChainLadder package comes with two convenience functions that allow for seamless conversion from incremental to cumulative and from cumulative to incremental triangles: incr2cum and cum2incr, respectively. We'll use the former to obtain triData4c:
# Assume incremental loss triangle triData4 exists =>
triData4c = incr2cum(triData4)
# Printing triData4c =>
> print(triData4c)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
1983 3410 8992 13873 16141 18735 22214 22863 23466 NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA NA
1986 1513 6445 11702 12935 15852 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
For completeness, cum2incr can be used to go from cumulative back to incremental:
# Assume cumulative loss triangle triData4c exists =>
triData4 = cum2incr(triData4c)

# Printing triData4 (again) =>
> print(triData4)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
Something worth mentioning with respect to the behavior of incr2cum and cum2incr: upon converting loss data into a triangle, there is no internal state that tracks whether the data originally represented cumulative or incremental losses. If incremental losses are transformed into a triangle instance and incr2cum is run on that triangle, a triangle of cumulative losses is returned, as expected. However, if that cumulative loss triangle is passed to incr2cum again, the function will cumulate the already-cumulated losses. For example:
# Read in RAA.csv =>
DF = read.table(
    file="E:/Repos/trikit/datasets/RAA.csv",
    header=TRUE,
    sep=",",
    row.names=NULL,
    stringsAsFactors=FALSE
    )
# Fieldnames in DF are "ORIGIN", "DEV", "VALUE" =>
triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")
triData4c = incr2cum(triData4)
> print(triData4c)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
1983 3410 8992 13873 16141 18735 22214 22863 23466 NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA NA
1986 1513 6445 11702 12935 15852 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
# So far so good...Pass triData4c to incr2cum again =>
triData4c2 = incr2cum(triData4c)
> print(triData4c2)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 13281 24188 35993 49532 65713 83722 102330 120992 139826
1982 106 4391 9787 20453 34235 49834 65330 81499 98203 NA
1983 3410 12402 26275 42416 61151 83365 106228 129694 NA NA
1984 5655 17210 32976 54242 77667 103750 130817 NA NA NA
1985 1092 10657 26493 48662 74617 100797 NA NA NA NA
1986 1513 7958 19660 32595 48447 NA NA NA NA NA
1987 557 4577 15523 27837 NA NA NA NA NA NA
1988 1351 8298 21410 NA NA NA NA NA NA NA
1989 3133 8528 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
# Pass triData4c2 to incr2cum again =>
triData4c3 = incr2cum(triData4c2)
> print(triData4c3)
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 18293 42481 78474 128006 193719 277441 379771 500763 640589
1982 106 4497 14284 34737 68972 118806 184136 265635 363838 NA
1983 3410 15812 42087 84503 145654 229019 335247 464941 NA NA
1984 5655 22865 55841 110083 187750 291500 422317 NA NA NA
1985 1092 11749 38242 86904 161521 262318 NA NA NA NA
1986 1513 9471 29131 61726 110173 NA NA NA NA NA
1987 557 5134 20657 48494 NA NA NA NA NA NA
1988 1351 9649 31059 NA NA NA NA NA NA NA
1989 3133 11661 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
This isn't intended as a reason to avoid the ChainLadder package. I only highlight this behavior because a reasonable expectation might be that calling incr2cum on an already-cumulated triangle would be equivalent to a no-op, returning the cumulative triangle unmodified. As demonstrated above, this is not the case.
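One lightweight safeguard, sketched here in plain R and not part of ChainLadder, is to track a triangle's state explicitly with an attribute and refuse to cumulate twice (asCumulative is a hypothetical helper name):

```r
# Tag a triangle with its state and cumulate only when needed =>
asCumulative = function(tri) {
    # Already flagged as cumulative: return unchanged (a no-op) =>
    if (isTRUE(attr(tri, "cumulative"))) return(tri)
    cumTri = t(apply(tri, 1, cumsum))
    attr(cumTri, "cumulative") = TRUE
    cumTri
}

inc = matrix(c(10, 20, 5, NA), nrow=2, byrow=TRUE)
once = asCumulative(inc)
twice = asCumulative(once)
identical(once, twice)   # TRUE: the second call changes nothing
```

This costs one attribute per triangle and makes repeated conversion calls safe.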
Conclusion
The compilation of loss reserving triangles is a necessary part of many actuarial analyses, but it is merely a pre-processing step. In future posts, we'll highlight various methods employed by actuaries to estimate reserves and reserve variability, in addition to stochastic reserving methods. Until then, happy coding!