
The runoff triangle is a data structure familiar to both pricing and reserving actuaries, commonly used to organize losses by date of occurrence (generally the vertical axis) and, in the case of paid loss triangles, by date of payment (horizontal axis). In practice, triangles present losses in one of two states: incremental loss triangles show, for a given accident year, the losses attributable to each individual development period, whereas cumulative loss triangles show total losses to date, up to and including the development period at which losses are evaluated. A typical example is shown below:

[Figure: sample incremental and cumulative runoff triangles]
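Conceptually, the two representations differ only by a running sum along the development axis. A minimal sketch for a single accident year (illustrative values only):

```r
# Incremental losses for a single accident year (illustrative values).
incremental = c(5012, 3257, 2638, 898, 1734)

# The cumulative representation is the running sum of the incremental one.
cumulative = cumsum(incremental)    # 5012 8269 10907 11805 13539

# diff recovers the incremental losses; the first cumulative value
# equals the first incremental value.
c(cumulative[1], diff(cumulative))  # 5012 3257 2638 898 1734
```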

This post does not detail the what of runoff triangles, but instead focuses on the how: namely, how to convert transactional loss data into a triangle structure using base R, the foreach package, the data.table package and the ChainLadder package. For more information specific to the what of loss triangles, refer to Jacqueline Friedland’s Estimating Unpaid Claims Using Basic Techniques, in particular chapters 5 and 6.

For the purposes of consistency and clarity, all examples reference a loss dataset made available by the Reinsurance Association of America, which can be downloaded here. This is the same dataset identified as RAA in the ChainLadder package, the only difference being that the above-referenced dataset contains incremental (as opposed to cumulative) losses. The available fields are:

  • ORIGIN The year in which the loss occurred.

  • DEV The development period. In the case of paid losses, the number of periods (months, years, etc.) after the occurrence that payment was made.

  • VALUE In the case of paid losses, the paid loss amount (in dollars) at each valid cell, a cell representing the intersection of ORIGIN and DEV.
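As a point of reference, the transactional layout can be mimicked with a few hand-built records (a sketch: one row per ORIGIN/DEV cell, with values matching the triangle compiled later in the post):

```r
# A minimal stand-in for the first few records of RAA.csv: each row
# holds the incremental loss for one (ORIGIN, DEV) cell.
lossRecords = data.frame(
    ORIGIN = c(1981, 1981, 1981, 1982, 1982),
    DEV    = c(1, 2, 3, 1, 2),
    VALUE  = c(5012, 3257, 2638, 106, 4179)
    )
```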

Triangle Method I: Base R

Although not the most straightforward approach, it is possible to convert transactional loss data into a triangle format without the use of third-party packages. The next example demonstrates one possible method:

# Loss Triangle Compilation in R: Method I.
DF = read.table(
    file="RAA.csv", header=TRUE, sep=",", row.names=NULL,
    stringsAsFactors=FALSE
    )

triData1 = matrix(
    nrow=length(unique(DF$ORIGIN)),
    ncol=length(unique(DF$DEV)),
    dimnames=list(
        ORIGIN=sort(unique(DF$ORIGIN)), DEV=sort(unique(DF$DEV))
        )
    )

# Matrix indexing: each (ORIGIN, DEV) pair identifies one cell.
triData1[cbind(as.integer(factor(DF$ORIGIN)), DF$DEV)] = DF$VALUE
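The final assignment relies on R's matrix indexing: when a matrix is subscripted with a two-column matrix, each row of the subscript is interpreted as a (row, column) coordinate pair. A small self-contained illustration:

```r
# Each row of idx addresses one (row, column) cell of m, so all
# three cells are filled in a single vectorized assignment.
m = matrix(NA, nrow=2, ncol=3)
idx = cbind(c(1, 2, 2), c(1, 2, 3))  # cells (1,1), (2,2), (2,3)
m[idx] = c(10, 20, 30)
```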

Inspecting triData1:

> triData1
      DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
  1981 5012 3257 2638  898 1734 2642 1828 599  54 172
  1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
  1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
  1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
  1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
  1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
  1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
  1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
  1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
  1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
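Since the base-R triangle is an ordinary matrix carrying ORIGIN and DEV as dimnames, individual cells can be retrieved by name; note the keys are character strings, not numbers. A two-by-two sketch using values from the triangle above:

```r
# dimnames allow lookup by (character) ORIGIN and DEV labels.
tri = matrix(
    c(5012, 3257, 106, 4179), nrow=2, byrow=TRUE,
    dimnames=list(ORIGIN=c("1981", "1982"), DEV=c("1", "2"))
    )
tri["1982", "2"]  # 4179
```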

The base R method returns the triangle as a matrix of incremental losses. A triangle of cumulative losses can be obtained as follows:

> triData1c = t(apply(triData1, 1, cumsum))
> triData1c
      DEV
ORIGIN    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA

Triangle Method II: Using foreach + data.table

The foreach package provides a looping construct for executing R code repeatedly. It is especially useful in developing platform-independent applications that distribute tasks across multiple cores. data.table facilitates fast aggregation of large datasets, fast ordered joins and fast add/modify/delete of columns by group using no copies. In this example, we use foreach along with data.table’s rbindlist function to compile a triangle of incremental losses:

# Loss Triangle Compilation in R: Method II.
library("data.table")
library("foreach")

DF = setDT(fread(file="RAA.csv", sep=",", stringsAsFactors=FALSE))
lossList = split(DF, by="ORIGIN")

triData2 = foreach(
    i=1:length(lossList), 
    .final=function(ll) rbindlist(ll, fill=TRUE)
) %do% {
    iterList = setNames(
        as.list(lossList[[i]]$VALUE),
        nm=as.character(lossList[[i]]$DEV)
        )
    append(list(ORIGIN=names(lossList)[i]), iterList)
}

Inspecting triData2:

> triData2
    ORIGIN    1    2    3    4    5    6    7   8   9  10
 1:   1981 5012 3257 2638  898 1734 2642 1828 599  54 172
 2:   1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
 3:   1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
 4:   1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
 5:   1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
 6:   1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
 7:   1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
 8:   1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
 9:   1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
10:   1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA

In Method II, the transactional loss data is first split into a list of data.tables by common ORIGIN, which is then bound to lossList. The foreach constructor iterates over lossList, transforming each data.table into what amounts to a single row of the incremental triangle. Notice that within the foreach constructor, the .final parameter is bound to a function: when specified, .final is a function of one argument that is called on the accumulated results to produce the final return value. data.table’s rbindlist takes as input a list of data.tables/data.frames and concatenates them vertically (row-binds them), returning a single data.table.
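The role of .final is easier to see in isolation. By default, foreach collects the per-iteration results into a list; when .final is supplied, that list is passed through it to produce the value actually returned:

```r
library("foreach")

# Without .final, the per-iteration results come back as a list.
res1 = foreach(i=1:3) %do% i  # list(1, 2, 3)

# With .final, the collected list is post-processed before returning.
res2 = foreach(i=1:3, .final=function(x) Reduce(`+`, x)) %do% i  # 6
```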

Since the resulting triangle of incremental losses is a data.table, we can generate the cumulative loss triangle as follows:

> devDF = triData2[,-c("ORIGIN")]
> devDF[,names(devDF):=Reduce(`+`, devDF, accumulate=TRUE)]
> triData2c = cbind(triData2[, .(ORIGIN)], devDF)
> triData2c
    ORIGIN    1     2     3     4     5     6     7     8     9    10
 1:   1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
 2:   1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
 3:   1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
 4:   1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
 5:   1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
 6:   1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
 7:   1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
 8:   1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
 9:   1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
10:   1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA

First, ORIGIN is removed from triData2 so that devDF consists only of the development-period columns. Next, Reduce is applied over the list of columns with accumulate=TRUE, so that each resulting column is the running sum of all development-period columns up to and including it. Finally, ORIGIN is column-bound back to the cumulated losses and the result is bound to triData2c.
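Reduce’s role is worth seeing on its own. With accumulate=TRUE, every partial sum is retained, and since each element of the input list is a full column vector, each partial sum is itself one column of the cumulative triangle:

```r
# Three development-period "columns", two origin years each.
cols = list(c(1, 10), c(2, 20), c(3, 30))

# accumulate=TRUE keeps each running total:
# col1, col1 + col2, col1 + col2 + col3.
Reduce(`+`, cols, accumulate=TRUE)
# [[1]] 1 10    [[2]] 3 30    [[3]] 6 60
```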

Triangle Method III: data.table

data.table’s dcast function is a powerful and flexible utility used primarily for reshaping datasets. Fieldnames are specified and used as column and row indices in terms of a formula expression that follows the form LHS ~ RHS (see the help page for more information). For Method III, dcast is used to convert the transactional loss data into an incremental loss triangle using a single function call:

# Loss Triangle Compilation in R: Method III.
library("data.table")

DF = setDT(fread(file="RAA.csv", sep=",", stringsAsFactors=FALSE))
triData3 = dcast(DF, ORIGIN ~ DEV, value.var="VALUE", fill=NA)

Inspecting triData3:

> triData3
    ORIGIN    1    2    3    4    5    6    7   8   9  10
 1:   1981 5012 3257 2638  898 1734 2642 1828 599  54 172
 2:   1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
 3:   1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
 4:   1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
 5:   1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
 6:   1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
 7:   1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
 8:   1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
 9:   1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
10:   1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA

The first argument to dcast is the dataset. ORIGIN ~ DEV is then specified, with ORIGIN corresponding to the LHS and DEV the RHS. value.var identifies the field whose values populate the table body, and fill specifies the value used to populate invalid cell references. Since dcast returns a data.table, the approach demonstrated in Method II can be used to obtain a triangle of cumulative losses, so it will not be repeated here. However, the data.table approach enables us to perform a task which is non-trivial with the other methods: converting a loss triangle back into a dataset of transactional losses. This is accomplished with data.table’s melt function. In the next example, we convert triData3 back into the original RAA dataset:

# Convert a loss triangle back into transactional loss data.
# Assume triData3 exists and represents incremental loss 
# data compiled from RAA.csv.
suppressPackageStartupMessages(library("data.table"))

DF2 = data.table::melt(
    triData3, id.vars="ORIGIN", variable.name="DEV",
    value.name="VALUE", value.factor=FALSE,
    variable.factor=FALSE, na.rm=TRUE
    )

# Convert DEV back to numeric and order records.
DF2[,DEV:=as.integer(DEV)]
setorderv(DF2, c("ORIGIN", "DEV"))

Test equivalence between DF and DF2:

> all.equal(DF, DF2)
[1] TRUE 

The first argument to melt is the triangle of incremental losses. Next, we specify id.vars, variable.name and value.name, with value.name identifying the measure of interest. value.factor and variable.factor specify whether the value and variable columns should be returned as factors, and na.rm indicates whether records containing NAs should be removed from the dataset.
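The dcast/melt round trip can be verified on a small toy table independent of the RAA data (a sketch; the column values below are arbitrary):

```r
library("data.table")

long = data.table(
    ORIGIN=c(2001, 2001, 2002), DEV=c(1, 2, 1), VALUE=c(100, 50, 80)
    )

# Wide (triangle) form, then melt back to long form.
wide = dcast(long, ORIGIN ~ DEV, value.var="VALUE", fill=NA)
back = melt(
    wide, id.vars="ORIGIN", variable.name="DEV", value.name="VALUE",
    variable.factor=FALSE, na.rm=TRUE
    )
back[, DEV:=as.integer(DEV)]
setorderv(back, c("ORIGIN", "DEV"))
```

After reordering, back contains the same three records as long.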

Triangle Method IV: ChainLadder

The ChainLadder package provides various statistical methods which are typically used for the estimation of outstanding claims reserves in general insurance. It includes a suite of utilities that can be used to estimate outstanding claim liabilities, but those utilities will not be covered here. Of interest is the triangle class made available by the ChainLadder package. The as.triangle specification is provided below:

as.triangle(Triangle, origin="origin", dev="dev", value="value", ...)

Triangle represents the transactional loss dataset. For origin, dev and value, specify the corresponding fieldnames present in the loss data. Referring again to the RAA dataset:

# Loss Triangle Compilation in R: Method IV.
library("ChainLadder")

DF = read.table(
    file="RAA.csv", header=TRUE, sep=",", row.names=NULL,
    stringsAsFactors=FALSE
    )

# Fieldnames in DF are "ORIGIN", "DEV", "VALUE".
triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")

Inspecting triData4:

> triData4
      DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
  1981 5012 3257 2638  898 1734 2642 1828 599  54 172
  1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
  1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
  1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
  1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
  1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
  1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
  1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
  1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
  1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA

The ChainLadder package comes with two convenience functions that allow for conversion from incremental-to-cumulative or cumulative-to-incremental triangles, identified as incr2cum and cum2incr respectively. In what follows, the former is used to create the cumulative counterpart of triData4:

> triData4c = incr2cum(triData4)
> triData4c
      DEV
ORIGIN    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
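Going the other direction, cum2incr undoes incr2cum; the round trip can be checked on a small triangle built from scratch:

```r
library("ChainLadder")

tri = as.triangle(
    data.frame(
        ORIGIN=c(1981, 1981, 1982), DEV=c(1, 2, 1),
        VALUE=c(5012, 3257, 106)
        ),
    origin="ORIGIN", dev="DEV", value="VALUE"
    )

# incr2cum followed by cum2incr recovers the incremental triangle.
all.equal(cum2incr(incr2cum(tri)), tri)  # expect TRUE
```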

A point worth mentioning with respect to the behavior of incr2cum and cum2incr: upon converting tabular loss data into a triangle object, there is no internal reference that tracks whether the data originally represented cumulative or incremental losses. incr2cum and cum2incr are convenience functions, not triangle class methods. If you have incremental losses that have been transformed into a triangle instance and then run incr2cum on that triangle, a triangle of cumulative losses is returned as expected. However, if you pass that cumulative loss triangle to incr2cum again, the function will cumulate the already-cumulated losses. For example:

> # Read in RAA.csv.
> DF = read.table(
+     file="RAA.csv", header=TRUE, sep=",", row.names=NULL,
+     stringsAsFactors=FALSE
+     )
> triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")
> triData4c = incr2cum(triData4)

> triData4c
      DEV
ORIGIN    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA

> # So far so good...Pass triData4c to incr2cum again.
> triData4c2 = incr2cum(triData4c)
> triData4c2
      DEV
ORIGIN    1     2     3     4     5      6      7      8      9     10
  1981 5012 13281 24188 35993 49532  65713  83722 102330 120992 139826
  1982  106  4391  9787 20453 34235  49834  65330  81499  98203     NA
  1983 3410 12402 26275 42416 61151  83365 106228 129694     NA     NA
  1984 5655 17210 32976 54242 77667 103750 130817     NA     NA     NA
  1985 1092 10657 26493 48662 74617 100797     NA     NA     NA     NA
  1986 1513  7958 19660 32595 48447     NA     NA     NA     NA     NA
  1987  557  4577 15523 27837    NA     NA     NA     NA     NA     NA
  1988 1351  8298 21410    NA    NA     NA     NA     NA     NA     NA
  1989 3133  8528    NA    NA    NA     NA     NA     NA     NA     NA
  1990 2063    NA    NA    NA    NA     NA     NA     NA     NA     NA

> # Pass triData4c2 to incr2cum again.
> triData4c3 = incr2cum(triData4c2)
> triData4c3
      DEV
ORIGIN    1     2     3      4      5      6      7      8      9     10
  1981 5012 18293 42481  78474 128006 193719 277441 379771 500763 640589
  1982  106  4497 14284  34737  68972 118806 184136 265635 363838     NA
  1983 3410 15812 42087  84503 145654 229019 335247 464941     NA     NA
  1984 5655 22865 55841 110083 187750 291500 422317     NA     NA     NA
  1985 1092 11749 38242  86904 161521 262318     NA     NA     NA     NA
  1986 1513  9471 29131  61726 110173     NA     NA     NA     NA     NA
  1987  557  5134 20657  48494     NA     NA     NA     NA     NA     NA
  1988 1351  9649 31059     NA     NA     NA     NA     NA     NA     NA
  1989 3133 11661    NA     NA     NA     NA     NA     NA     NA     NA
  1990 2063    NA    NA     NA     NA     NA     NA     NA     NA     NA

This is not intended to be taken as an argument against using the ChainLadder package. I only highlight this behavior since a reasonable expectation might be that the result of calling incr2cum on an already-cumulated triangle would be the equivalent of a no-op, returning the cumulative triangle unmodified. But as demonstrated above, the state of a triangle is not preserved across invocations of incr2cum/cum2incr.
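If no-op behavior is desired, one workaround is to track cumulation state externally. The helper below is hypothetical (the is_cumulative attribute is not part of ChainLadder); it cumulates a triangle at most once:

```r
library("ChainLadder")

# Hypothetical wrapper (not part of ChainLadder): marks a triangle as
# cumulative via an attribute and refuses to cumulate it twice.
safe_incr2cum = function(tri) {
    if (isTRUE(attr(tri, "is_cumulative"))) {
        return(tri)  # already cumulative: behave as a no-op
    }
    out = incr2cum(tri)
    attr(out, "is_cumulative") = TRUE
    out
}
```

Calling safe_incr2cum on its own output then returns the triangle unchanged.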