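For a discrete dataset such as an observed sample, a percentile may not be uniquely defined. The empirical distribution function is a step function, so it can jump past a given probability level, or sit exactly at that level over a whole interval of values. To perform percentile matching, we therefore use the smoothed empirical percentile [1], which interpolates linearly between adjacent order statistics.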

Let:

• $$p$$ = the percentile of interest
• $$n$$ = the size of the sample
• $$x_{(k)}$$ = the $$k^{th}$$ order statistic of the sample
• $$\hat{\pi}_{p}$$ = the smoothed 100$$p^{th}$$ percentile

Then the smoothed empirical percentile is given by:

$$\hat{\pi}_{p} = \big((n+1)p -a \big)x_{(a+1)} + \big(a+1-(n+1)p \big)x_{(a)},$$

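where $$a = \lfloor(n+1)p\rfloor$$ is the integer part of $$(n+1)p$$. The estimate is defined only for $$\frac{1}{n+1} \le p \le \frac{n}{n+1}$$. Outside this range there is no order statistic on one side to interpolate with.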
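Writing $$g = (n+1)p - a$$ for the fractional part of $$(n+1)p$$ (so $$0 \le g < 1$$) shows that the estimate is a weighted average of the two order statistics on either side of position $$(n+1)p$$. Since $$a+1-(n+1)p = 1-g$$, the formula above becomes:

$$\hat{\pi}_{p} = (1-g)\,x_{(a)} + g\,x_{(a+1)}.$$

When $$(n+1)p$$ is a whole number, $$g = 0$$ and the estimate is simply $$x_{(a)}$$.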

To illustrate, consider the following dataset:

$$25, 55, 60, 75, 85, 110, 135, 160, 165, 185$$

To calculate the $$65^{th}$$ percentile:

• Sort the data in ascending order, if it is not already sorted.
• Calculate $$(n+1)p = 11 \times 0.65 = 7.15$$.
• Take the integer part: $$a = \lfloor(n+1)p\rfloor = \lfloor 7.15\rfloor = 7$$.
• With $$a = 7$$, the order statistics are $$x_{(a+1)} = x_{(8)} = 160$$ and $$x_{(a)} = x_{(7)} = 135$$.
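The steps above translate directly into a few lines of Python. The following is a minimal sketch that uses the example data. The variable names `h`, `x_a`, and `x_a1` are illustrative choices, not taken from the original. Note the off-by-one: order statistics are 1-based in the text, but Python lists are 0-based.

```python
import math

# Example data from the text (sorted here in case it was not already)
data = sorted([25, 55, 60, 75, 85, 110, 135, 160, 165, 185])
p = 0.65
n = len(data)

h = (n + 1) * p      # (n+1)p, approximately 7.15 in floating point
a = math.floor(h)    # a = floor((n+1)p) = 7

# x_(k) in the text corresponds to data[k - 1] in Python
x_a = data[a - 1]    # x_(7) = 135
x_a1 = data[a]       # x_(8) = 160

print(h, a, x_a, x_a1)
```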

Then substitute these values into the expression for the smoothed empirical percentile:

$$\begin{aligned} \hat{\pi}_{p} & = \big((n+1)p -a \big)x_{(a+1)} + \big(a+1-(n+1)p \big)x_{(a)} \\ & = \big(7.15 - 7 \big) \times 160 + \big(8-7.15 \big) \times 135 \\ & = 24 + 114.75 \\ & = 138.75 \end{aligned}$$
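One way to cross-check this arithmetic is with Python's standard-library `statistics.quantiles` (Python 3.8+). With `method="exclusive"`, it places the $$i^{th}$$ order statistic at cumulative probability $$i/(n+1)$$, which is the same convention used by the smoothed empirical percentile. Requesting `n=20` cut points puts them at 5%, 10%, ..., 95%, so the 65th percentile is at index 12. This is a sketch for checking values within $$\frac{1}{n+1} \le p \le \frac{n}{n+1}$$ only. Outside that range, the library's behavior should not be assumed to match the definition above.

```python
import statistics

data = [25, 55, 60, 75, 85, 110, 135, 160, 165, 185]

# "exclusive" uses i/(n+1) plotting positions, matching (n+1)p above.
# n=20 gives 19 cut points at 0.05, 0.10, ..., 0.95; index 12 is 0.65.
cuts = statistics.quantiles(data, n=20, method="exclusive")
pi_65 = cuts[12]
print(pi_65)  # 138.75
```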

### Implementation

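The following Python function implements the smoothed empirical percentile:

```python
import math


def smoother(data, p):
    """Return the smoothed empirical 100p-th percentile of data.

    Valid for 1/(n+1) <= p <= n/(n+1), where n = len(data).
    """
    d = sorted(data)                 # order statistics; x_(k) is d[k - 1]
    n = len(d)
    h = (n + 1) * p                  # (n+1)p
    if not 1 <= h <= n:
        raise ValueError("p must lie in [1/(n+1), n/(n+1)]")
    a = math.floor(h)                # a = floor((n+1)p)

    if float(h).is_integer():
        # (n+1)p is a whole number, so the percentile is x_(a) itself
        # (this also avoids indexing past the end when (n+1)p = n)
        sep = d[a - 1]
    else:
        # ((n+1)p - a) * x_(a+1) + (a + 1 - (n+1)p) * x_(a)
        sep = (h - a) * d[a] + (a + 1 - h) * d[a - 1]

    return sep


# test smoother
testvals = [25, 55, 60, 75, 85, 110, 135, 160, 165, 185]

smoother(testvals, .65)
# returns 138.75
```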
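The function above computes $$(n+1)p$$ in binary floating point. As a result, the whole-number test can be thrown off by rounding, for example when $$p$$ is a fraction such as $$7/11$$ that has no exact binary representation. One alternative is to use exact rational arithmetic with the standard-library `fractions` module. The variant below, `smoother_exact`, is a hypothetical sketch and not part of the original. It assumes that converting `p` via `Fraction(str(p))` is acceptable, which treats a value written as `0.65` as exactly $$13/20$$.

```python
import math
from fractions import Fraction


def smoother_exact(data, p):
    """Smoothed empirical 100p-th percentile using exact rationals.

    Hypothetical variant: Fraction(str(p)) turns 0.65 into exactly 13/20,
    so (n+1)p and the whole-number test are free of rounding error.
    """
    d = sorted(data)                    # x_(k) is d[k - 1]
    n = len(d)
    h = (n + 1) * Fraction(str(p))      # (n+1)p as an exact rational
    if not 1 <= h <= n:
        raise ValueError("p must lie in [1/(n+1), n/(n+1)]")
    a = math.floor(h)                   # a = floor((n+1)p), an int
    if h == a:
        return Fraction(d[a - 1])       # (n+1)p is whole: x_(a)
    # ((n+1)p - a) * x_(a+1) + (a + 1 - (n+1)p) * x_(a)
    return (h - a) * d[a] + (a + 1 - h) * d[a - 1]


testvals = [25, 55, 60, 75, 85, 110, 135, 160, 165, 185]
result = smoother_exact(testvals, 0.65)
print(result, float(result))  # 555/4 138.75
```

The trade-off is speed. `Fraction` arithmetic is much slower than `float` arithmetic, so the floating-point version is usually the practical choice unless exact boundary behavior matters.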


### Footnotes:

1. Klugman, S.A., Panjer, H.H. and Willmot, G.E. Loss Models: From Data to Decisions, Third Edition (2008)