How does the percentile function calculate very high or very low percentiles (e.g. the 99.865th or 0.135th percentile used for exponential-distribution capability calculations) when the data set is small and doesn't have enough data points?

Sachin Joshi 3 · April 21, 2023

I am using a standard heuristic to calculate process capability for an exponential distribution, described here:

https://www.spcforexcel.com/knowledge/process-capability/process-capability-and-non-normal-data

I used the percentile function, but I am concerned that my data set may be too small or may contain extreme outliers. I would like to understand how Spotfire calculates such extreme percentiles, and whether it extrapolates based on any assumptions about the distribution.

Gaia Paolini · April 25, 2023

Spotfire uses linear interpolation: if you search for 'percentile' in the Help, this is essentially what it says.

(It produces the same results as the Python function numpy.percentile using method='linear'.)

There is no assumption on the shape of the distribution.
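To make the interpolation concrete, here is a minimal sketch of the linear rule as numpy implements it for method='linear'. The function name linear_percentile and the sample values are made up for illustration, and this is not Spotfire's own code. It shows that with only 10 points, the 99.865th percentile lands just below the largest observation and the 0.135th just above the smallest: nothing is extrapolated beyond the data.

```python
import numpy as np

def linear_percentile(values, q):
    """Percentile by linear interpolation between neighbouring sorted values."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    pos = (q / 100.0) * (n - 1)   # fractional 0-based position in the sorted data
    lo = int(np.floor(pos))       # index of the sorted value just below the position
    hi = min(lo + 1, n - 1)       # index of the sorted value just above (clamped)
    frac = pos - lo               # how far between the two neighbours
    return x[lo] + frac * (x[hi] - x[lo])

# Hypothetical small sample of 10 measurements
sample = np.array([0.2, 0.5, 0.9, 1.4, 2.1, 3.0, 4.2, 5.5, 7.9, 12.0])

p_high = linear_percentile(sample, 99.865)
p_low = linear_percentile(sample, 0.135)

print(p_high, np.percentile(sample, 99.865))  # both lie between 7.9 and 12.0
print(p_low, np.percentile(sample, 0.135))    # both lie between 0.2 and 0.5
```

So for a small data set the extreme percentiles are driven almost entirely by the one or two most extreme observations.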

If you don't have much data, fitting the data to a distribution might not give you a precise quantile anyway, if the fit is not very good.
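For comparison, this is a rough sketch of what a distribution-based quantile would look like, assuming the data really are exponential with zero location. The scale is estimated by the sample mean, and the quantile follows from the exponential quantile function -scale * ln(1 - p). The function name and sample are hypothetical, and the result is only as good as the fit.

```python
import numpy as np

def exponential_quantile(values, p):
    """Quantile at probability p (0 < p < 1) of an exponential fitted to the data."""
    values = np.asarray(values, dtype=float)
    scale = values.mean()             # estimate of the exponential mean (location assumed 0)
    return -scale * np.log1p(-p)      # inverse CDF of the exponential: -scale * ln(1 - p)

# Hypothetical small sample of 10 measurements
sample = np.array([0.2, 0.5, 0.9, 1.4, 2.1, 3.0, 4.2, 5.5, 7.9, 12.0])

fitted_high = exponential_quantile(sample, 0.99865)
empirical_high = np.percentile(sample, 99.865)

# The fitted quantile extrapolates past the largest observation;
# the empirical percentile cannot.
print(fitted_high, empirical_high, sample.max())
```

Unlike the interpolated percentile, the fitted value can extend well beyond the observed range, which is why the quality of the fit matters so much with few data points.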

If you want to have more choice in the method for calculating percentiles, you could try using numpy.percentile in a Python data function.

For instance: this is a Python data function that would calculate the qth percentile with all the listed methods.

Inputs:

q (a real number between 0 and 100)
df (a data table with column [Value] containing the data)

Output:

pc_result: a data table with the percentile calculations

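import pandas as pd
import numpy as np

# Column holding the data; replace 'Value' with the name of your data column
data = df['Value']

# Interpolation methods accepted by numpy.percentile (the 'method' argument needs numpy 1.22 or later)
methods = ['linear', 'lower', 'higher', 'midpoint', 'nearest',
           'inverted_cdf', 'averaged_inverted_cdf', 'closest_observation',
           'interpolated_inverted_cdf', 'hazen', 'weibull',
           'median_unbiased', 'normal_unbiased']
nm = len(methods)

# Calculate the qth percentile once per method
qq = [0.0] * nm
for i in range(nm):
    qq[i] = np.percentile(data, [q], method=methods[i])[0]

# Output table: one row per method
pc_result = pd.DataFrame({'method': methods, 'q': [q] * nm, 'percentile': qq})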

If there are extreme outliers, they may need to be removed. It really depends on whether they are bona fide results or anomalies that should not be there, but that is a separate analysis.

Tomas Jurczyk · April 25, 2023

If you are interested in process capability for normal or non-normal (e.g. exponential) data in Spotfire, there is a simple alternative way to enable this functionality through the Statistica data function. Would that be of interest to you?

Sachin Joshi 3 · April 25, 2023

Thanks Gaia, that makes sense.
