ECDF R Script Manipulation & TIBCO CDF non values don't overwrite old data if you use a smaller data set

Steve Ennis · January 28, 2020

I am trying to modify a TIBCO-provided TERR function: https://community.spotfire.com/modules/cdf-data-function-tibco-spotfirer

The script works fantastically for one dataset. I am trying to produce a cumulative distribution curve for each bin on the same graph. The problem is that 'value' and 'prob' need to be calculated for each bin. I have 6 bins, and I am looking at 6-, 12-, and 24-month production for both oil and gas. I have copied the same CDF data function several times and had each copy output a data frame of the data I need. Then I combined the data from the multiple tables. This works but is an absolute mess. The included dxp sort of shows what I am trying to accomplish, with bins carried through on the TIBCO-provided CDF. If this CDF could be modified to calculate distributions for all bins, it would be great. Perhaps the R function Ecdf could be used.

I have discovered a problem with the original CDF. When I choose a different data set that doesn't populate all my bins (i.e. a subset of the main data), the bins that have no data do not 'overwrite' or negate the existing data function output. The attached Word document shows bar graphs of well count binned by proppant for 2 different companies. The top table is a large data set. The bottom is a smaller producer and only has data for Bins 1 & 2. While the visual is not the best, Bins 3-6 are actually the same for both graphs. The data was not flushed/cleared/negated when the data function was run against the 2nd query.

The last visual in the Word document shows what I am striving to accomplish with a slick R script that provides one table output per production grouping (e.g. 6 month).
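As a rough illustration of the Ecdf idea mentioned above, the sketch below uses base R's `ecdf()` from the stats package (an assumption: it is not Hmisc's `Ecdf`, and it has not been checked inside TERR) together with `ave()` to get one probability column per bin. The `toy` data frame and its column names are hypothetical stand-ins for the real well data.

```r
# Hypothetical toy data: two bins with a handful of production values.
toy <- data.frame(
  value = c(10, 20, 30, 5, 7),
  bin   = c("Bin1", "Bin1", "Bin1", "Bin2", "Bin2")
)

# ecdf(v) builds a step function for one bin; calling it on the same
# values returns the fraction of that bin's values <= each value.
toy$prob <- ave(toy$value, toy$bin, FUN = function(v) ecdf(v)(v))

print(toy)
```

With this convention the largest value in each bin gets probability 1 and the smallest gets 1/n rather than 0, so the curve will differ slightly from one that runs from 0 to 1.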

Gaia Paolini · February 28, 2020

Hi

The following script should work to produce one cdf curve per Bin, with a single measure, which is the column you choose e.g. 6MonthCumGas.

You will need to install the data.table library. The column containing the bin has been named 'group'.

Bin6 has only one value so it produces a NaN which is ignored. I am attaching a screen shot (value is not in log scale)

Gaia

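#####################################################
suppressWarnings(suppressPackageStartupMessages(library(data.table)))

# analysisColumn is the chosen measure, analysisColumn2 is the bin
cdfTable=data.table(value=analysisColumn,group=analysisColumn2)

# sort by bin, then by value within each bin
setorder(cdfTable,group,value)

# cumulative probability running from 0 to 1 over n sorted values
# (a group with a single value gives 0/0 = NaN)
cdf=function(x) {
  n=length(x)
  return (((1:n)-1)/(n-1))
}

# compute prob separately for each bin
cdfTable[,prob:= cdf(value),by=group]
### end ###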
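To see what the script above produces without needing Spotfire or data.table, here is a minimal base-R sketch of the same per-group calculation. It uses `ave()` in place of the data.table `by=` step, and the `toy` data frame is a hypothetical stand-in for the input columns; Bin3 is given a single row to mimic the Bin6 case.

```r
# Same formula as in the script above.
cdf <- function(x) {
  n <- length(x)
  ((1:n) - 1) / (n - 1)
}

# Hypothetical toy data: Bin3 has only one value.
toy <- data.frame(
  value = c(30, 10, 20, 7, 5, 42),
  group = c("Bin1", "Bin1", "Bin1", "Bin2", "Bin2", "Bin3")
)

# Sort by group, then value, so positions 1..n match ascending values.
toy <- toy[order(toy$group, toy$value), ]

# Apply cdf() within each group (base-R stand-in for by=group).
toy$prob <- ave(toy$value, toy$group, FUN = cdf)

print(toy)
```

Each group with at least two values runs from 0 to 1, while the single-value group evaluates 0/0 and ends up as NaN.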

Steve Ennis · February 28, 2020

Thank you so much. I am going to give this script a try. I discovered that the problem of existing data not being overwritten was a caching problem on my part. But the script to output the CDF by group will immensely clean up my DXP file.

Richard Lake 4 · February 26

I would like to group by two columns, "fieldname" and "holedirectioncode", so basically I want to run a CDF of b-values on vertical field wells, horizontal field wells, etc. I tried using the script above in a new function, but got an error. The R package data.table was installed. If anyone could provide me with a dxp to follow, that would be great. Thanks!

Gaia Paolini · February 27

The data function above is a TERR script. It looks like you defined it as a Python script. If you rebuild it as TERR it should work.

Richard Lake 4 · February 27

You're right. Thanks! I got it to work.

To build off of this, if I have two groups that I want the function to run on, e.g. run the function on wells with a unique field name (group1) and well orientation (group2), would I just do this?

cdfTable=data.table(value=analysisColumn,group=analysisColumn2, group2=analysisColumn3)

setorder(cdfTable,group,group1,value)

cdf=function(x) {
  n=length(x)
  return (((1:n)-1)/(n-1))
}

cdfTable[,prob:= cdf(value),by=group,group2]
### end ###

Gaia Paolini · February 28

Apart from the small typo (group2 is called group1 in setorder), the by clause for multiple groups needs to become a character vector, with the column names in quotes:

cdfTable[,prob:= cdf(value),by=c('group','group2')]
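Putting both corrections together, a minimal sketch of the two-column version could look like the following. It assumes data.table is installed, as in the thread, and replaces the Spotfire inputs (analysisColumn, analysisColumn2, analysisColumn3) with hypothetical toy vectors so it can be run outside a data function.

```r
suppressWarnings(suppressPackageStartupMessages(library(data.table)))

# Hypothetical toy data standing in for the Spotfire input columns.
cdfTable <- data.table(
  value  = c(5, 1, 3, 2, 8, 4, 7),
  group  = c("A", "A", "A", "A", "B", "B", "B"),  # e.g. field name
  group2 = c("H", "H", "V", "V", "H", "H", "H")   # e.g. well orientation
)

# Sort by both grouping columns, then by value within each combination.
setorder(cdfTable, group, group2, value)

cdf <- function(x) {
  n <- length(x)
  ((1:n) - 1) / (n - 1)
}

# One CDF per (group, group2) combination.
cdfTable[, prob := cdf(value), by = c("group", "group2")]

print(cdfTable)
```

As with the single-group script, any combination that contains only one row would get NaN for prob.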
