
I work as an Actuarial Consultant at CNA Financial. Throughout the Actuarial and Predictive Analytics teams, convention dictates that new applications intended to be shared with and used by other analysts should be implemented in R. This makes a lot of sense given the nature of actuarial analysis, and despite not always wanting to, I have generally followed the convention myself. Having spent the majority of my time outside of CNA developing in Python, one of the things I miss most when working in R is Python’s built-in string methods. The two I miss the most are startswith and endswith. Here’s an example of how they work in Python:

>>> "apple".endswith("e")
True
>>> "rubbersoul".startswith("rubber")
True
>>> "billiondollar".endswith("babies")
False

Everything in Python is an object, and all Python string objects expose these two methods (and many others). I wanted to make the same functionality available in R while maintaining the simplicity of the Python approach. I found a way to accomplish this using R’s user-defined binary operators.

Binary Operators

User-defined binary operators in R consist of a string of characters between two % characters (Adler, Joseph (2010). R in a Nutshell. O’Reilly Media, Inc.). Some frequently used built-in binary operators include %/% for integer division and %%, the modulus operator. Declaring a binary operator is identical to declaring any other function, except that the operator name, including both % characters, is wrapped in backticks. Here’s an implementation of %startswith% and %endswith%:

# Example of declaring user-defined binary operators in R.

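`%startswith%` = function(teststr, testchars) {
    #   `teststr`: The target string, or a character vector of target strings.
    # `testchars`: The leading character(s) to test for in `teststr`.
    #              Interpreted by `grepl` as a regular expression.
    return(grepl(paste0("^", testchars), teststr))
}

`%endswith%` = function(teststr, testchars) {
    #   `teststr`: The target string, or a character vector of target strings.
    # `testchars`: The trailing character(s) to test for in `teststr`.
    #              Interpreted by `grepl` as a regular expression.
    return(grepl(paste0(testchars, "$"), teststr))
}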
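
One caveat with the definitions above: because testchars is pasted into a regular expression, characters such as . or $ act as regex metacharacters instead of literals, which differs from Python's literal matching. Below is a minimal sketch of a literal-matching variant, assuming R 3.3.0 or later, where base R provides startsWith and endsWith. The operator names %startswith_lit% and %endswith_lit% are hypothetical, chosen only to avoid overwriting the versions above.

```r
# Literal-matching variants built on base R's startsWith()/endsWith()
# (assumes R >= 3.3.0). The operator names are hypothetical.
`%startswith_lit%` = function(teststr, testchars) {
    return(startsWith(teststr, testchars))
}

`%endswith_lit%` = function(teststr, testchars) {
    return(endsWith(teststr, testchars))
}

# With the regex-based approach, "." matches any character:
regex_result = grepl(paste0("^", "."), "apple")    # TRUE: "." is a wildcard

# With the literal variants, "." is compared character for character:
literal_result = "apple" %startswith_lit% "."      # FALSE
decimal_result = "3.14" %endswith_lit% ".14"       # TRUE
```

Which behavior is preferable depends on the use case: the regex-based operators allow patterns such as "J|M", while the literal variants behave more like their Python counterparts.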

Once these definitions have been loaded into the current session, both individual strings and vectors of strings can be passed to either operator to test for the specified leading or trailing character(s). For example, if I had the following vector:

months = c("January", "February", "March", "April", "May", "June", "July", 
           "August", "September", "October", "November", "December")

And wanted to test whether the elements of months start with “J”, %startswith% could be used as follows:

> months %startswith% "J"
[1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Similarly, to check whether elements of months end with “ber”, we’d run:

> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

To obtain the indices of the elements of months ending with “ber”, we can use %endswith% in conjunction with which:

> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> which(months %endswith% "ber")
[1]  9 10 11 12
> months[which(months %endswith% "ber")]
[1] "September" "October"   "November"  "December" 
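
Because %endswith% returns a logical vector, the call to which is optional when subsetting: the logical vector can index months directly. The short sketch below repeats the definitions so that it runs on its own. The variable names ber_months, same_result and n_ber are illustrative, and the use of sum to count matches is an addition not shown above.

```r
# Regex-based %endswith% operator, repeated so this block is self-contained.
`%endswith%` = function(teststr, testchars) {
    return(grepl(paste0(testchars, "$"), teststr))
}

months = c("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")

# Logical indexing selects the same elements as indexing via which().
ber_months = months[months %endswith% "ber"]
same_result = identical(ber_months, months[which(months %endswith% "ber")])

# Summing a logical vector counts its TRUE values.
n_ber = sum(months %endswith% "ber")
```

Here same_result is TRUE and n_ber is 4, matching the four months returned in the session above.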
