I work as an Actuarial Consultant in Reserving Analytics at CNA Financial in Chicago, IL. Throughout the Actuarial and Predictive Analytics teams, convention dictates that any new applications intended to be shared with and used by other analysts should be implemented in R. This makes a lot of sense given the nature of actuarial analysis, and although I don't always want to, I have generally followed the convention myself.
I have spent the majority of my time outside of CNA developing in Python, and one of the things I miss most when working in R is Python's built-in string methods. Base R exposes a large number of built-in functions for string manipulation, but the two Python methods I miss the most are startswith and endswith. Here's how they work in Python:
>>> "apple".endswith("e")
True
>>> "rubbersoul".startswith("rubber")
True
>>> "billiondollar".endswith("babies")
False
Everything in Python is an object, and all Python string objects expose these two methods (and many others) by default.
I wanted to expose the same functionality in R while maintaining the simplicity of the Python approach. I found this could be readily accomplished with R’s user-defined binary operators.
Binary Operators
User-defined binary operators in R consist of a string of characters between two “%” characters [1]. Some frequently used built-in operators that share this form include %/% for integer division and %%, the modulus operator.
Declaring a binary operator is identical to declaring any other user-defined function, except for how the name is specified.
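As a minimal sketch of that naming rule, the snippet below first exercises the two built-in operators mentioned above and then declares a throwaway operator. The name %concat% is hypothetical, chosen only for illustration. The operator name is wrapped in backticks at definition time because it is not a syntactically valid identifier on its own.

```r
# Built-in operators that use the %...% form
7 %/% 2    # integer division: 3
7 %% 2     # modulus: 1

# Hypothetical operator for illustration only (not part of base R).
# The backticks are required because the name contains "%".
`%concat%` = function(lhs, rhs) {
    return(paste0(lhs, rhs))
}

"rubber" %concat% "soul"        # infix form: "rubbersoul"
`%concat%`("rubber", "soul")    # the same call in ordinary prefix form
```

The prefix form shows that the operator is just a regular two-argument function with an unusual name.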
Here’s how the %startswith% and %endswith% operators were implemented:
# Example of declaring user-defined binary operators in R
`%startswith%` = function(teststr, testchars) {
    # `teststr`   => The target string (or vector of strings).
    # `testchars` => The character(s) to test for at the start of `teststr`.
    return(grepl(paste0("^", testchars), teststr))
}

`%endswith%` = function(teststr, testchars) {
    # `teststr`   => The target string (or vector of strings).
    # `testchars` => The character(s) to test for at the end of `teststr`.
    return(grepl(paste0(testchars, "$"), teststr))
}
Once these definitions have been read into the current session, either operator can be applied to a single string or to a vector of strings to test for the specified leading or trailing character(s).
For example, given the following vector:
months = c("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")
we can test whether the elements of months start with "J" by using %startswith% as follows:
> months %startswith% "J"
[1] TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
Similarly, to check whether elements of months end with "ber", we’d run:
> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
To obtain the indices of the elements of months ending with "ber", we can use %endswith% in conjunction with which:
> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
> which(months %endswith% "ber")
[1] 9 10 11 12
> months[which(months %endswith% "ber")]
[1] "September" "October" "November" "December"
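When the goal is only to subset, the which call is optional, because R accepts a logical vector directly as an index. The sketch below restates the %endswith% definition and the months vector so that it runs on its own, and then compares the two forms:

```r
# Restated from above so the snippet is self-contained
`%endswith%` = function(teststr, testchars) {
    return(grepl(paste0(testchars, "$"), teststr))
}

months = c("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")

# Index with the logical vector directly
ber_months = months[months %endswith% "ber"]
ber_months    # "September" "October" "November" "December"

# Same selection as the which() form shown earlier
identical(ber_months, months[which(months %endswith% "ber")])    # TRUE
```

which remains useful when the integer positions themselves are needed.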
These were trivial examples, but user-defined binary operators can be used to great effect when applied to a large collection of strings.
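One caveat applies when these operators are used on arbitrary strings. Both implementations build a regular expression from testchars, so characters such as "." or "$" are treated as regex metacharacters rather than literal text. The sketch below shows the effect and one possible literal-matching alternative. It assumes R 3.3.0 or later, where base R provides startsWith and endsWith. The operator names %startswith_fixed% and %endswith_fixed% are hypothetical.

```r
# Regex-based version, restated from above so the snippet is self-contained
`%startswith%` = function(teststr, testchars) {
    return(grepl(paste0("^", testchars), teststr))
}

# "." matches any character, so this is TRUE even though "v1x0"
# does not literally start with "v1."
"v1x0" %startswith% "v1."

# Hypothetical literal-matching variants (assumes R >= 3.3.0)
`%startswith_fixed%` = function(teststr, testchars) {
    return(startsWith(teststr, testchars))
}

`%endswith_fixed%` = function(teststr, testchars) {
    return(endsWith(teststr, testchars))
}

"v1x0" %startswith_fixed% "v1."     # FALSE
"v1.0" %startswith_fixed% "v1."     # TRUE
"cost($)" %endswith_fixed% "($)"    # TRUE
```

The regex-based operators are still the better fit when pattern matching is actually wanted, for example testing for any leading digit with "[0-9]".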
Footnotes:
1. Adler, Joseph (2010). R in a Nutshell. O’Reilly Media, Inc.