The runoff triangle is a data structure familiar to both pricing and reserving Actuaries commonly used to organize losses by date of occurance (generally the vertical axis), and in the case of paid loss triangles, the date of payment (horizontal axis). In practice, triangles present losses in one of two states: Incremental loss triangles represent losses for a given accident year an a particular point in time. Cumulative loss triangles represent the cumulative losses to date up to and including the development period from which losses are evaluated. A typical example is shown below:
This post does not detail the what of runoff triangles, but instead focuses on the how, namely how to convert transactional loss data into a triangle structure using base R, the foreach package, the data.table package and the ChainLadder package. For more information specific to the what of loss triangles, refer to Jaqueline Friedland’s Estimating Unpaid Claims Using Basic Techniques, in particular chapters 5 and 6.
For the purposes of consistency and clarity, all examples reference a loss dataset made available by the Reinsurance Association of America, which can be downloaded here. This is the same dataset identified as RAA in the ChainLadder package, the only difference being the above referenced dataset contains incremental (as opposed to cumulative) losses. The available fields are:
-
ORIGIN The year in which the loss occurred.
-
DEV The development period. In the case of paid losses, the number of periods (months, years, etc.) after the occurance that payment was made.
-
VALUE In the case of paid losses, the paid loss amount (in dollars) for losses at each valid cell representing the intersection of ORIGIN and DEV.
Triangle Method I: Base R
Although not the most straightforward approach, it is possible to convert transactional loss data into a triangle format without the use of 3rd-party packages. The next example demonstrates one possible method:
# Loss Triangle Compilation in R: Method I.
DF = read.table(
file="RAA.csv", header=TRUE, sep=",", row.names=NULL,
stringsAsFactors=FALSE
)
triData1 = matrix(
nrow=length(unique(DF$ORIGIN)),
ncol=length(unique(DF$DEV)),
dimnames=list(
ORIGIN=sort(unique(DF$ORIGIN)), DEV=sort(unique(DF$DEV))
)
)
triData1[cbind(factor(DF$ORIGIN), DF$DEV)] = DF$VALUE
Inspecting triData1:
> triData1
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
The base R method returns the triangle as a matrix of incremental losses. A triangle of cumulative losses can be obtained as follows:
> triData1c = t(apply(apply(triData1, 2, cumsum), 1, cumsum))
> triData1c
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 5118 12554 16303 22471 27321 31780 33505 34777 35366 NA
1983 8528 21546 30176 38612 46056 53994 56368 58243 NA NA
1984 14183 33101 45942 59878 69481 80077 83435 NA NA NA
1985 15275 42666 61778 82047 95436 106257 NA NA NA NA
1986 16788 49111 73480 94982 111288 NA NA NA NA NA
1987 17345 53131 84426 107296 NA NA NA NA NA NA
1988 18696 60078 97538 NA NA NA NA NA NA NA
1989 21829 65473 NA NA NA NA NA NA NA NA
1990 23892 NA NA NA NA NA NA NA NA NA
Triangle Method II: Using foreach + data.table
The foreach package provides a looping construct for executing R code
repeatedly. It is especially useful in developing platform-independent
applications that distribute tasks across multiple cores. data.table
facilitates fast aggregation of large datasets, fast ordered joins and fast
add/modify/delete of columns by group using no copies. In this example, we use
foreach along with data.table’s rbindlist
function to compile a triangle of
incremental losses:
# Loss Triangle Compilation in R: Method II.
library("data.table")
library("foreach")
DF = setDT(fread(file="RAA.csv", sep=",", stringsAsFactors=FALSE))
lossList = split(DF, by="ORIGIN")
triData2 = foreach(
i=1:length(lossList),
.final=function(ll) rbindlist(ll, fill=TRUE)
) %do% {
iterList = setNames(
as.list(lossList[[i]]$VALUE),
nm=as.character(lossList[[i]]$DEV)
)
append(list(ORIGIN=names(lossList)[i]), iterList)
}
Inspecting triData2:
> print triData2
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
In method II, the transactional loss data is first split into a list of
data.tables by common ORIGIN which was then bound to lossList
. The foreach
constructor then iterates over lossList
, transforming each data.table into
what amounts to a single row in the incremental triangle. Notice that within
the foreach constructor, the .final
parameter is bound to a function: When
specified, .final
represents a function of one argument that is called to
return the final result. data.table’s rbindlist takes as input a list of
data.tables/data.frames and concatenates them horizontally, returning a single data.table.
When the resulting triangle of incremental losses is returned as a data.table/data.frame, we can generate the cumulative loss triangle as follows:
> devDF = triData2[,-c("ORIGIN")]
> devDF[,names(devDF):=Reduce(`+`, devDF, accumulate=TRUE)]
> triData2c = cbind(triData2[, .(ORIGIN)], devDF)
> triData2c
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 5118 12554 16303 22471 27321 31780 33505 34777 35366 NA
1983 8528 21546 30176 38612 46056 53994 56368 58243 NA NA
1984 14183 33101 45942 59878 69481 80077 83435 NA NA NA
1985 15275 42666 61778 82047 95436 106257 NA NA NA NA
1986 16788 49111 73480 94982 111288 NA NA NA NA NA
1987 17345 53131 84426 107296 NA NA NA NA NA NA
1988 18696 60078 97538 NA NA NA NA NA NA NA
1989 21829 65473 NA NA NA NA NA NA NA NA
1990 23892 NA NA NA NA NA NA NA NA NA
First ORIGIN is removed from triData2
, so that devDF
consists only of the
development period columns. Next Reduce
is applied to all rows, specifying
accumulate=TRUE
so that a cumulative sum is calculated. Finally, ORIGIN is
vertically concatenated back with the cumulated losses by development period
and bound to triData2c
.
Triangle Method III: data.table
data.table’s dcast
function is a powerful and flexibile utility used
primarily for reshaping datasets. Fieldnames are specified and used as column
and row indicies in terms of a formula expression that follows the form
LHS ~ RHS
(see the help page for more information). For Method III, dcast
is used to convert the transactional loss data into an incremental loss
triangle using a single function call:
# Loss Triangle Compilation in R: Method III.
library("data.table")
DF = setDT(fread(file="RAA.csv", sep=",", stringsAsFactors=FALSE))
triData3 = dcast(DF, ORIGIN ~ DEV, value.var="VALUE", fill=NA)
Inspecting triData3:
> triData3
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
The first argument to dcast
is the dataset. ORIGIN ~ DEV
is then specified,
with ORIGIN corresponding to the LHS and DEV the RHS. value.var
specifies the
field over which to aggregate, and fill
specifies how populate invalid cell
references (defaults to 0).
Since dcast
returns a data.table, the approach demonstrated in Method II can
be used to obtain a triangle of cumulative losses, and so it will not be
repeated here. However, the data.table approach enables us to perform a task
which is non-trivial to perform using the other methods: Converting a loss
triangle back into a dataset of transactional losses. This is accomplished with
data.table’s melt
function. In the next example, we convert triData3
back
into the original RAA dataset:
# Convert a loss triangle back into transactional loss data.
# Assume triData3 exists and represents incremental loss
# data compiled from RAA.csv.
suppressPackageStartupMessages(library("data.table"))
DF2 = data.table::melt(
triData3, id.vars="ORIGIN", variable.name="DEV",
value.name="VALUE", value.factor=FALSE,
variable.factor=FALSE, na.rm=TRUE
)
# Convert DEV back to numeric and order records.
DF2[,DEV:=as.integer(DEV)]
setorderv(DF2, c("ORIGIN", "DEV"))
Test equivalence between DF and DF2:
> all.equal(DF, DF2)
[1] TRUE
The first argument to melt
is the triangle of incremental losses. Next, we
specify id.vars
, variable.name
and value.name
, with value.name
representing the measure of interest. value.factor
and variable.factor
specify whether the value(s) and/or variable(s) should be returned as factors,
and na.rm
indicates whether or not records containing NA
‘s should be
removed from the dataset.
Triangle Method IV: ChainLadder
The ChainLadder package
provides various statistical methods which are typically used for the
estimation of outstanding claims reserves in general insurance. It includes a
suite of utilities that can be used to estimate outstanding claim liabilities,
but those utilities will not be covered here. Of interest is the triangle
class made available by the ChainLadder package. The as.triangle
specification is provided below:
as.triangle(Triangle, origin="origin", dev="dev", value="value",…)
Triangle
represents the transactional loss dataset. For origin
, dev
and
value
, specify the corresponding fieldnames present in the loss data.
Referring again to the RAA dataset:
# Loss Triangle Compilation in R: Method IV.
library("ChainLadder")
DF = read.table(
file="RAA.csv", header=TRUE, sep=",", row.names=NULL,
stringsAsFactors=FALSE
)
# Fieldnames in DF are "ORIGIN", "DEV", "VALUE".
triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")
Inspecting triData4:
> triData4
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
The ChainLadder package comes with two convenience functions that allow for
conversion from incremental-to-cumulative or cumulative-to-incremental
triangles, identified as incr2cum
and cum2incr
respectively. In what
follows, the former is used to create the cumulative counterpart of triData4
:
> triData4c = incr2cum(triData4)
> triData4c
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 5118 12554 16303 22471 27321 31780 33505 34777 35366 NA
1983 8528 21546 30176 38612 46056 53994 56368 58243 NA NA
1984 14183 33101 45942 59878 69481 80077 83435 NA NA NA
1985 15275 42666 61778 82047 95436 106257 NA NA NA NA
1986 16788 49111 73480 94982 111288 NA NA NA NA NA
1987 17345 53131 84426 107296 NA NA NA NA NA NA
1988 18696 60078 97538 NA NA NA NA NA NA NA
1989 21829 65473 NA NA NA NA NA NA NA NA
1990 23892 NA NA NA NA NA NA NA NA NA
A point worth mentioning with respect to the behavior of incr2cum
and cum2incr
:
Upon converting tabular loss data into a triangle object, there is no internal
reference that tracks whether the data originally represented cumulative or
incremental losses. incr2cum
and cum2incr
are convienience functions, and
not triangle object class methods. If you have incremental losses that have
been transformed into a triangle instance and then run incr2cum
on that
triangle, a triangle of incremental losses is returned as expected. However, if
you pass that cumulative loss triangle incr2cum
again, the function will
cumulate the already cumulated losses. For example:
# Read in RAA.csv.
> DF = read.table(
file="RAA.csv", header=TRUE, sep=",", row.names=NULL,
stringsAsFactors=FALSE
)
> triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")
> triData4c = incr2cum(triData4)
>triData4c
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 5118 12554 16303 22471 27321 31780 33505 34777 35366 NA
1983 8528 21546 30176 38612 46056 53994 56368 58243 NA NA
1984 14183 33101 45942 59878 69481 80077 83435 NA NA NA
1985 15275 42666 61778 82047 95436 106257 NA NA NA NA
1986 16788 49111 73480 94982 111288 NA NA NA NA NA
1987 17345 53131 84426 107296 NA NA NA NA NA NA
1988 18696 60078 97538 NA NA NA NA NA NA NA
1989 21829 65473 NA NA NA NA NA NA NA NA
1990 23892 NA NA NA NA NA NA NA NA NA
> # So far so good...Pass triData4c to incr2cum again.
> triData4c2 = incr2cum(triData4c)
> triData4c2
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 13281 24188 35993 49532 65713 83722 102330 120992 139826
1982 106 4391 9787 20453 34235 49834 65330 81499 98203 NA
1983 3410 12402 26275 42416 61151 83365 106228 129694 NA NA
1984 5655 17210 32976 54242 77667 103750 130817 NA NA NA
1985 1092 10657 26493 48662 74617 100797 NA NA NA NA
1986 1513 7958 19660 32595 48447 NA NA NA NA NA
1987 557 4577 15523 27837 NA NA NA NA NA NA
1988 1351 8298 21410 NA NA NA NA NA NA NA
1989 3133 8528 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
> Pass triData4c2 to incr2cum again.
> triData4c3 = incr2cum(triData4c2)
> triData4c3
DEV
ORIGIN 1 2 3 4 5 6 7 8 9 10
1981 5012 18293 42481 78474 128006 193719 277441 379771 500763 640589
1982 106 4497 14284 34737 68972 118806 184136 265635 363838 NA
1983 3410 15812 42087 84503 145654 229019 335247 464941 NA NA
1984 5655 22865 55841 110083 187750 291500 422317 NA NA NA
1985 1092 11749 38242 86904 161521 262318 NA NA NA NA
1986 1513 9471 29131 61726 110173 NA NA NA NA NA
1987 557 5134 20657 48494 NA NA NA NA NA NA
1988 1351 9649 31059 NA NA NA NA NA NA NA
1989 3133 11661 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
This is not intended to be taken as an argument against using the ChainLadder
package. I only highlight this behavior since a reasonable expectation might be
that the result of calling incr2cum
on an already-cumulated triangle would be
the equivalent of a no-op, returning the cumulative triangle unmodified. But as
demonstrated above, the state of a triangle is not preserved across invocations
of incr2cum/cum2incr
.