Lets say you have a data.table with rating factors as columns associated with different coverages. For a simple example, let’s assume a 2-coverage, 2-factor rating plan with a single BI factor `bi_occupancy_factor`

and a single property factor `prop_tiv_factor`

. Also assume factors have been attached to a sample cohort of policies based on the risk characteristics of the exposure. We construct the table below:

```
library("data.table")
= data.table(
DF policyno=paste0("0000", 1:5), locationno=rep(1, 5),
prop_tiv_factor=c(1.112, 1.255, 1.3125, 1.075, 1.3125),
bi_occupancy_factor=c(1.015, 1.0675, 1.0925, 1.0675, 1.0475),
stringsAsFactors=FALSE
)
```

The resulting table looks like:

```
policyno locationno prop_tiv_factor bi_occupancy_factor
1: 00001 1 1.1120 1.0150
2: 00002 1 1.2550 1.0675
3: 00003 1 1.3125 1.0925
4: 00004 1 1.0750 1.0675
5: 00005 1 1.3125 1.0475
```

Our goal is to create a new column which represents a combined property factor. With only a single property factor, this is trivial, since we can simply set the combined property factor to `prop_tiv_factor`

:

`:=prop_tiv_factor] DF[,prop_factor`

Resulting in:

```
policyno locationno prop_tiv_factor bi_occupancy_factor prop_factor
1: 00001 1 1.1120 1.0150 1.1120
2: 00002 1 1.2550 1.0675 1.2550
3: 00003 1 1.3125 1.0925 1.3125
4: 00004 1 1.0750 1.0675 1.0750
5: 00005 1 1.3125 1.0475 1.3125
```

Assume the decision has been made to incorporate a new property rating factor into the plan, `prop_age_bld_factor`

. Our revised initial data.table becomes:

```
DF = data.table(
policyno=paste0("0000", 1:5), locationno=rep(1, 5),
prop_tiv_factor=c(1.112, 1.255, 1.3125, 1.075, 1.3125),
bi_occupancy_factor=c(1.015, 1.0675, 1.0925, 1.0675, 1.0475),
prop_age_bld_factor=c(1.173, 1.21235, 1.0935, 1.2815, 1.1115),
stringsAsFactors=FALSE
)
```

Creating our combined property rating factor is still straightforward: This time, we simply set `prop_factor`

to the product of `prop_tiv_factor`

and `prop_age_bld_factor`

for every observation in DF:

`:=prop_tiv_factor * prop_age_bld_factor] DF[,prop_factor`

Which yields:

```
policyno locationno prop_tiv_factor bi_occupancy_factor prop_age_bld_factor prop_factor
1: 00001 1 1.1120 1.0150 1.17300 1.304376
2: 00002 1 1.2550 1.0675 1.21235 1.521499
3: 00003 1 1.3125 1.0925 1.09350 1.435219
4: 00004 1 1.0750 1.0675 1.28150 1.377613
5: 00005 1 1.3125 1.0475 1.11150 1.458844
```

Next the request is made to incorporate another property rating factor, this time `prop_deductible_factor`

. Our dataset now becomes:

```
= data.table(
DF policyno=paste0("0000", 1:5), locationno=rep(1, 5),
prop_tiv_factor=c(1.112, 1.255, 1.3125, 1.075, 1.3125),
bi_occupancy_factor=c(1.015, 1.0675, 1.0925, 1.0675, 1.0475),
prop_age_bld_factor=c(1.173, 1.21235, 1.0935, 1.2815, 1.1115),
prop_deductible_factor=c(1.025, 1.025, 1.755, 1.025, 1.1665),
stringsAsFactors=FALSE
)
```

We could continue as before, updating `prop_factor`

to include `prop_deductible_factor`

in the product. However, this approach is not scalable, and each time you go into the code to make changes, you increase the likelihood of introducing errors. We need a more general solution to the problem, specifically a method which allows us to perform an operation on some logical grouping of columns, where the exact number (of columns) may not be known until the moment of execution.

## Using `Reduce`

and `%like%`

The `Reduce`

function successively applies a function to the elements of an object from left to right or right to left, respectively. The simplest example is using `Reduce`

in place of `sum`

:

```
> Reduce(`+`, c(1, 2, 3, 4, 5))
15] [
```

We can also replicate `prod`

```
> Reduce(`*`, c(1, 2, 3, 4, 5))
1] 120 [
```

`Reduce`

is capable of working with more complex data structures as well.

The `%like%`

function comes from data.table, and works similar to SQL’s `LIKE`

operator. It is essentially shorthand for a regular expression pattern matching subroutine, which returns a TRUE/FALSE based on whether or not the target matches the pattern. To demonstrate, we identify which month names end in `y`

:

```
> month.name %like% "y$"
1] TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE [
```

To return the actual month names ending in `y`

, use the previous expression as a mask:

```
> month.name[month.name %like% "y$"]
1] "January" "February" "May" "July" [
```

Note that in regular expression parlance, `$`

means to match `y`

only when it occurs at the end of the string. Similarly, `^`

indicates the match must occur at the beginning of the string. In addition, `.+`

matches one or more characters, which can be letters, numbers, punctuation or whitespace. We can leverage `Reduce`

and `%like%`

within the context of data.table to take the product of a potentially variable number of factor columns. Using our latest version of DF with 3 `prop_*_factor`

columns, we have:

```
:=
DF[,prop_factorReduce(`*`, .SD), .SDcols=names(DF)[names(DF) %like% "^prop.+factor$"]
]
```

To break this down a bit, note that the value assigned to `.SDcols`

is doing nothing more than filtering to only those columns starting with `prop`

and ending with `factor`

. We can see this by running the expression in isolation:

```
> names(DF)[names(DF) %like% "^prop.+factor$"
1] "prop_tiv_factor" "prop_age_bld_factor" "prop_deductible_factor" [
```

`.SD`

is one of data.table’s *special symbols*, which serve as shortcuts for frequently used operations. In this particular case, `.SD`

is a stand-in for the columns we’re interested in taking the product of, and `.SDcols`

represents the names of the columns over which the operation is to be applied.

In data.table, `:=`

represents the “assignment-by-reference” operator. Notice that when a column is defined via `:=`

, the update is made by reference, so it isn’t necessary to re-assign the column name to the table as would typically be necessary when working with standard data.frame objects.

Does our solution generalize to any number of columns? Let’s add a few more factors and test it out:

```
= data.table(
DF policyno=paste0("0000", 1:5), locationno=rep(1, 5),
prop_tiv_factor=c(1.112, 1.255, 1.3125, 1.075, 1.3125),
bi_occupancy_factor=c(1.015, 1.0675, 1.0925, 1.0675, 1.0475),
prop_age_bld_factor=c(1.173, 1.21235, 1.0935, 1.2815, 1.1115),
prop_deductible_factor=c(1.025, 1.025, 1.755, 1.025, 1.1665),
prop_nbr_stories_factor=c(1.015, 1.015, 1.015, 1.015, 1.1175),
prop_age_roof_factor=c(1.033, 1.0373, 1.3573, 1.033, 1.0373),
bi_protect_class_factor=c(1., 1., 1., 1.25, 1.25),
prop_construction_factor=c(1.0235, 1.0744, 1.1985, 1.0235, 1.0744),
stringsAsFactors=FALSE
)
```

Which gives us

```
policyno locationno prop_tiv_factor bi_occupancy_factor prop_age_bld_factor prop_deductible_factor prop_nbr_stories_factor
1: 00001 1 1.1120 1.0150 1.17300 1.0250 1.0150
2: 00002 1 1.2550 1.0675 1.21235 1.0250 1.0150
3: 00003 1 1.3125 1.0925 1.09350 1.7550 1.0150
4: 00004 1 1.0750 1.0675 1.28150 1.0250 1.0150
5: 00005 1 1.3125 1.0475 1.11150 1.1665 1.1175
prop_age_roof_factor bi_protect_class_factor prop_construction_factor
1: 1.0330 1.00 1.0235
2: 1.0373 1.00 1.0744
3: 1.3573 1.00 1.1985
4: 1.0330 1.25 1.0235
5: 1.0373 1.25 1.0744
```

Our logic should return the product of prop_tiv_factor, prop_age_bld_factor, prop_deductible_factor, prop_nbr_stories_factor, prop_age_roof_factor and prop_construction_factor without modifying our earlier implementation. Let’s compare the results of our dynamic, general expression vs. explicitly specifying the column names to multiply:

```
# Compare results from explicit and implicit column multiplication.
:=
DF[,prop_factorReduce(`*`, .SD), .SDcols=names(DF)[names(DF) %like% "^prop.+factor$"]
]= DF[,prop_factor]
prop_factor1
# Compute product specifying prop-prefixed columns explicitly.
= DF$prop_tiv_factor DF$prop_age_bld_factor * DF$prop_deductible_factor *
prop_factor2 $prop_nbr_stories_factor * DF$prop_age_roof_factor * DF$prop_construction_factor DF
```

Comparing results gives

```
> prop_factor1
1] 1.434765 1.764136 4.158868 1.515323 2.119393
[> prop_factor2
1] 1.434765 1.764136 4.158868 1.515323 2.119393 [
```