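When working with discrete-valued datasets, percentiles may not be well-defined: the target probability level usually falls between two observed values rather than on one of them. To perform percentile matching, we can instead use the smoothed empirical percentile, which interpolates linearly between adjacent order statistics. Let: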

• $$p$$: Target quantile

• $$n$$: Size of sample

• $$x_{(k)}$$: $$k^{th}$$ order statistic of the sample (the $$k^{th}$$ smallest value)

• $$\hat{\pi}_{p}$$: Smoothed 100$$p^{th}$$ percentile

• $$a$$: Integer part of $$(n+1)p$$, that is, $$a = \lfloor(n+1)p\rfloor$$

The smoothed empirical percentile is then given by:

$$\hat{\pi}_{p} = \big((n+1)p -a \big)x_{(a+1)} + \big(a+1-(n+1)p \big)x_{(a)}.$$
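The two weights in this expression sum to one, so the formula is a linear interpolation between adjacent order statistics. As a sketch, writing $$h = (n+1)p$$ (a shorthand introduced here for convenience, not part of the notation above), it can be rearranged as:

$$\begin{split} \hat{\pi}_{p} & = (h - a)x_{(a+1)} + (a + 1 - h)x_{(a)} \\ & = x_{(a)} + (h - a)\big(x_{(a+1)} - x_{(a)}\big), \qquad a \le h < a+1. \end{split}$$

When $$h$$ is a whole number, $$h - a = 0$$ and the expression reduces to $$x_{(a)}$$. Both order statistics exist only when $$1 \le h \le n$$, so the formula does not cover values of $$p$$ below $$1/(n+1)$$ or above $$n/(n+1)$$.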

To illustrate, consider the following dataset:

$$25, 55, 60, 75, 85, 110, 135, 160, 165, 185$$

To calculate the 65th percentile, first sort values in ascending order. Then:

• Compute $$(n+1)p = 11 \times 0.65 = 7.15$$, since $$n = 10$$ and $$p = 0.65$$

• $$a = \lfloor(n+1)p\rfloor = \lfloor7.15\rfloor = 7$$

• $$x_{(a+1)} = x_{(8)} = 160$$ and $$x_{(a)} = x_{(7)} = 135$$

Then substitute these values into the expression for the smoothed empirical percentile:

$$\begin{split} \hat{\pi}_{p} & = \big((n+1)p -a \big)x_{(a+1)} + \big(a+1-(n+1)p \big)x_{(a)} \\ & = \big(7.15 - 7 \big) \times 160 + \big(8-7.15 \big) \times 135 \\ & = 138.75 \end{split}$$
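The arithmetic in this example can be checked exactly with rational numbers, which avoids any floating-point rounding in the weights. The following is a minimal sketch, not part of the original derivation; the variable names (`h`, `w_upper`, `w_lower`, `pi_hat`) are chosen here for illustration only.

```python
from fractions import Fraction
import math

data = sorted([25, 55, 60, 75, 85, 110, 135, 160, 165, 185])
n = len(data)
p = Fraction(65, 100)          # 0.65 as an exact rational

h = (n + 1) * p                # (n+1)p = 143/20 = 7.15
a = math.floor(h)              # a = 7

w_upper = h - a                # weight on x_(a+1): 3/20
w_lower = a + 1 - h            # weight on x_(a):  17/20

# Order statistics are 1-based; Python lists are 0-based,
# so x_(a) is data[a - 1] and x_(a+1) is data[a].
x_a, x_a1 = data[a - 1], data[a]

pi_hat = w_upper * x_a1 + w_lower * x_a
print(h, a, w_upper, w_lower)  # 143/20 7 3/20 17/20
print(float(pi_hat))           # 138.75
```

The weights 3/20 and 17/20 sum to one, which confirms that the result lies between $$x_{(7)} = 135$$ and $$x_{(8)} = 160$$.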

Implementation

What follows is an implementation of the smoothed empirical percentile in Python:

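import math

def smoother(data, p):
    """
    Determine the smoothed empirical percentile p
    for dataset data.
    """
    d = sorted(data)
    n = len(d)
    h = (n + 1) * p            # fractional position among the order statistics
    a = int(math.floor(h))     # 1-based index of the lower order statistic

    # x_(a) and x_(a+1) both exist only when 1 <= (n+1)p <= n.
    if not 1 <= h <= n:
        raise ValueError("p must satisfy 1 <= (n+1)*p <= n")

    if a == h:
        # (n+1)p is a whole number, so the percentile is exactly x_(a).
        sep = d[a - 1]
    else:
        # Interpolate between x_(a) and x_(a+1); in 0-based
        # indexing these are d[a-1] and d[a].
        sep = (h - a) * d[a] + (a + 1 - h) * d[a - 1]

    return sep

testvals = [25, 55, 60, 75, 85, 110, 135, 160, 165, 185]
smoother(testvals, .65)
# 138.75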
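As an independent cross-check, the standard library's `statistics.quantiles` (Python 3.8 and later) with `method="exclusive"` is documented as placing the $$i^{th}$$ of $$m$$ sorted points at $$i/(m+1)$$, which appears to be the same $$(n+1)p$$ convention used here. The sketch below assumes that correspondence and only checks it against the worked example; it is not a general proof of equivalence.

```python
import statistics

testvals = [25, 55, 60, 75, 85, 110, 135, 160, 165, 185]

# n=20 splits the distribution into 20 equal-probability groups, so the
# 19 cut points are the 5th, 10th, ..., 95th percentiles.
cuts = statistics.quantiles(testvals, n=20, method="exclusive")

# The 65th percentile is the 13th cut point (index 12).
print(cuts[12])  # 138.75
```

Note that `statistics.quantiles` only returns evenly spaced cut points, so it suits percentiles that are multiples of 100/n rather than an arbitrary $$p$$.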