I work as an Actuarial Consultant in Reserving Analytics at CNA Financial in Chicago, IL. Throughout the Actuarial and Predictive Analytics teams, convention dictates that any new application intended to be shared with and used by other analysts should be implemented in R. This makes a lot of sense given the nature of actuarial analysis, and although I don’t always want to, I have generally followed the convention myself.
Because I’ve done most of my development outside of CNA in Python, one of the things I miss most when working in R is Python’s set of built-in string methods. Even though base R provides a large number of built-in functions for working with and manipulating strings, the two methods I miss most are startswith and endswith. Here’s how they work in Python:

>>> "apple".endswith("e")
True
>>> "rubbersoul".startswith("rubber")
True
>>> "billiondollar".endswith("babies")
False


Everything in Python is an object, and every Python string exposes these two methods (among many others) by default. I wanted the same functionality in R while keeping the simplicity of the Python approach, and I found this could be readily accomplished with R’s user-defined binary operators.

Binary Operators

User-defined binary operators in R consist of a string of characters enclosed between two “%” characters[1]. Several frequently used built-in operators follow the same pattern, including %/% for integer division and %% for the modulus (remainder).
Declaring a binary operator is identical to declaring any other user-defined function, except for how the name is specified: because the name contains “%” characters, it must be wrapped in backticks. Here’s how the %startswith% and %endswith% operators were implemented:
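To make the naming pattern concrete, here is a small sketch using only built-in %...% operators (%in% is another common one). It also shows that a binary operator is just an ordinary function, so it can be called in prefix form by quoting its name in backticks:

```r
# Built-in operators that follow the %...% naming pattern
7 %/% 2          # integer division: 3
7 %% 2           # modulus: 1
5 %in% c(1, 5)   # membership test: TRUE

# Every binary operator is an ordinary function, so it can also be
# called with prefix syntax by quoting its name in backticks
`%%`(7, 2)       # 1
```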

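# Example of declaring user-defined binary operators in R

`%startswith%` = function(teststr, testchars) {
    # `teststr`   => The target string (or character vector).
    # `testchars` => The leading character(s) to test for in `teststr`.
    # Note: `testchars` is pasted into a regular expression anchored with "^",
    # so regex metacharacters such as "." or "(" are not matched literally.
    return(grepl(paste0("^", testchars), teststr))
}

`%endswith%` = function(teststr, testchars) {
    # `teststr`   => The target string (or character vector).
    # `testchars` => The trailing character(s) to test for in `teststr`.
    # Note: `testchars` is pasted into a regular expression anchored with "$",
    # so regex metacharacters such as "." or "(" are not matched literally.
    return(grepl(paste0(testchars, "$"), teststr))
}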
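One caveat is that grepl treats `testchars` as a regular expression, whereas Python’s startswith and endswith match literally. The sketch below shows the difference and offers literal variants built on base R’s startsWith() and endsWith() (available since R 3.3.0). The operator names %startswith_fixed% and %endswith_fixed% are hypothetical, chosen only for this illustration:

```r
# Regex-based operator, as defined above
`%endswith%` = function(teststr, testchars) {
    grepl(paste0(testchars, "$"), teststr)
}

# "." matches any character, so "report_csv" also "ends with" ".csv"
c("report.csv", "report_csv") %endswith% ".csv"         # TRUE TRUE

# Hypothetical literal-matching variants using base R (R >= 3.3.0)
`%startswith_fixed%` = function(teststr, testchars) {
    startsWith(teststr, testchars)
}
`%endswith_fixed%` = function(teststr, testchars) {
    endsWith(teststr, testchars)
}

c("report.csv", "report_csv") %endswith_fixed% ".csv"   # TRUE FALSE
```

The regex version remains handy when you actually want patterns (for example, `"^[JM]"`), so which behavior to prefer depends on the use case.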


Once the definitions are sourced into the current session, either operator accepts a single string or a vector of strings on its left-hand side and tests each element for the specified leading or trailing character(s).
For example, suppose we have the following vector:

months = c("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")


To test whether the elements of months start with "J", we can use %startswith% as follows:

> months %startswith% "J"
[1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE


Similarly, to check whether elements of months end with "ber", we’d run:

> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
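Single strings work the same way as vectors, so the three Python examples from the start of this post translate directly. This sketch simply restates the operators compactly so it runs on its own:

```r
`%startswith%` = function(teststr, testchars) grepl(paste0("^", testchars), teststr)
`%endswith%`   = function(teststr, testchars) grepl(paste0(testchars, "$"), teststr)

"apple" %endswith% "e"               # TRUE
"rubbersoul" %startswith% "rubber"   # TRUE
"billiondollar" %endswith% "babies"  # FALSE
```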


To obtain the indices of the elements of months that end with "ber", we can combine %endswith% with which:

> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> which(months %endswith% "ber")
[1]  9 10 11 12
> months[which(months %endswith% "ber")]
[1] "September" "October"   "November"  "December" 
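When the goal is the matching elements rather than their positions, the logical vector can index months directly, so which() is optional. Negating it with ! keeps the non-matching elements instead. A minimal, self-contained sketch:

```r
months = c("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")

`%endswith%` = function(teststr, testchars) grepl(paste0(testchars, "$"), teststr)

# A logical vector can index `months` directly
months[months %endswith% "ber"]
# "September" "October" "November" "December"

# Negate with ! to keep the months that do NOT end with "ber"
months[!(months %endswith% "ber")]
# "January" "February" "March" "April" "May" "June" "July" "August"
```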


These were trivial examples, but because the operators work element-wise on character vectors, they are just as convenient when applied to a large collection of strings, such as a column of a data frame.
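As an illustration of the data frame case, the sketch below filters rows by an identifier prefix. The claims table, its column names, and its values are entirely hypothetical:

```r
# Hypothetical claims data; column names and values are illustrative only
claims = data.frame(
    claim_id = c("AUTO-001", "AUTO-002", "PROP-001", "WC-001", "PROP-002"),
    paid     = c(1200, 560, 8700, 3100, 450),
    stringsAsFactors = FALSE
)

`%startswith%` = function(teststr, testchars) grepl(paste0("^", testchars), teststr)

# Keep only the rows whose claim_id starts with "PROP"
prop_claims = claims[claims$claim_id %startswith% "PROP", ]
print(prop_claims)

# Total paid across the property claims
sum(prop_claims$paid)   # 9150
```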

Footnotes:

  1. Adler, Joseph (2010). R in a Nutshell. O’Reilly Media, Inc.