How does the percentile function calculate very high or very low percentile values (eg: 99.865th percentile or 0.135th percentile used for exponential distribution capability calcs) if the data set is small and doesn't have enough data points?


Sachin Joshi 3

I am using the standard heuristic for calculating process capability for an exponential distribution, as described here:

https://www.spcforexcel.com/knowledge/process-capability/process-capability-and-non-normal-data

I used the percentile function, but I am concerned that the result may be unreliable if my data set is too small or has extreme outliers. I would like to understand how Spotfire calculates such extreme percentiles, and whether it extrapolates based on any assumptions about the distribution.

Spotfire does a linear interpolation between the sorted data values: if you type 'percentile' in the Help search, this is basically what it says.

(It produces the same results as the Python function numpy.percentile using method='linear'.)

There is no assumption on the shape of the distribution.
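To make the consequence for a small data set concrete, here is a minimal sketch of linear interpolation between sorted values, written to match numpy.percentile with its default linear method. The helper name linear_percentile and the sample values are made up for illustration; this is not Spotfire's actual code.

```python
import numpy as np

def linear_percentile(values, q):
    """q-th percentile by linear interpolation between sorted values."""
    x = np.sort(np.asarray(values, dtype=float))
    pos = (len(x) - 1) * q / 100.0      # fractional 0-based position in the sorted data
    lo = int(np.floor(pos))             # index of the value just below the position
    hi = min(lo + 1, len(x) - 1)        # index of the value just above (clamped at the end)
    frac = pos - lo                     # how far to move from x[lo] towards x[hi]
    return x[lo] + frac * (x[hi] - x[lo])

# Hypothetical small sample of 10 points (not real process data)
sample = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7, 3.5, 4.8, 6.1, 9.3]

p_hi = linear_percentile(sample, 99.865)
p_lo = linear_percentile(sample, 0.135)

# Compare with numpy's default (linear) method
print(p_hi, np.percentile(sample, 99.865))
print(p_lo, np.percentile(sample, 0.135))
```

With only 10 points, the 99.865th percentile falls between the two largest values and the 0.135th between the two smallest, so the result can never go beyond the observed minimum or maximum. There is no extrapolation into the tails.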

If you don't have much data, fitting the data to a distribution might not give you a precise quantile anyway, if the fit is not very good.
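For comparison, this is roughly what a distribution-based quantile would look like. It is only a sketch under stated assumptions: the data are taken to follow an exponential distribution starting at zero, the scale is estimated by the sample mean, and the helper name exponential_quantile and the sample values are made up for illustration.

```python
import numpy as np

def exponential_quantile(values, p):
    """Quantile at probability p of an exponential distribution fitted to values.

    Assumes a location of zero; the scale (1/rate) is estimated by the sample mean.
    """
    scale = float(np.mean(values))
    return -scale * np.log1p(-p)        # inverse CDF: -scale * ln(1 - p)

# Hypothetical small sample of 10 points (not real process data)
sample = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7, 3.5, 4.8, 6.1, 9.3]

fit_hi = exponential_quantile(sample, 0.99865)
fit_lo = exponential_quantile(sample, 0.00135)

# Empirical percentiles (linear interpolation) for comparison
emp_hi = np.percentile(sample, 99.865)
emp_lo = np.percentile(sample, 0.135)

print(fit_hi, emp_hi)
print(fit_lo, emp_lo)
```

Unlike the interpolated percentile, the fitted quantile can land well outside the observed range, so it depends entirely on how well the exponential assumption holds for your data.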

If you want to have more choice in the method for calculating percentiles, you could try using numpy.percentile in a Python data function.

For instance: this is a Python data function that would calculate the qth percentile with all the listed methods.

Inputs:

  • q (a real number between 0 and 100)
  • df (a data table with a column [Value] containing the data)

Output:

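  • pc_result: a data table with the percentile calculations

    import pandas as pd
    import numpy as np

    # df and q are the data function inputs.
    # Replace 'Value' with the name of your data column.
    data = df['Value'].to_numpy()

    # Methods accepted by numpy.percentile
    # (the 'method' argument requires NumPy 1.22 or later)
    methods = ['linear', 'lower', 'higher', 'midpoint', 'nearest',
               'inverted_cdf', 'averaged_inverted_cdf', 'closest_observation',
               'interpolated_inverted_cdf', 'hazen', 'weibull',
               'median_unbiased', 'normal_unbiased']

    # Calculate the qth percentile once per method
    qq = [float(np.percentile(data, q, method=m)) for m in methods]

    # Output table: one row per method
    pc_result = pd.DataFrame({'method': methods, 'q': [q] * len(methods), 'percentile': qq})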

If there are extreme outliers, they may need to be removed. It really depends on whether they are bona fide results or anomalies that should not be there, but that is a different analysis.
