
ECDF R Script Manipulation & TIBCO CDF non values don't overwrite old data if you use a smaller data set


Steve Ennis


I am trying to modify a TIBCO-provided TERR function: https://community.spotfire.com/modules/cdf-data-function-tibco-spotfirer

The script works fantastically for one dataset. I am trying to produce a cumulative distribution curve for each bin on the same graph. The problem is that 'value' and 'prob' need to be calculated for each bin. I have 6 bins, and I am looking at 6, 12, and 24 month production for both oil and gas.

I have copied the same CDF data function several times and had each copy output a data frame of the data I need. Then I have combined the data from the multiple tables. This works but is an absolute mess. The included dxp roughly shows what I am trying to accomplish, with bins carried through on the TIBCO-provided CDF. If this CDF could be modified to calculate distributions for all bins, it would be great. Perhaps the R function Ecdf could be used.

I have also discovered a problem with the original CDF. When I choose a different data set that doesn't populate all my bins (i.e. a subset of the main data), the bins that have no data do not 'overwrite' or negate the existing data function output. The attached Word document shows bar graphs of well count binned by proppant for 2 different companies. The top table is a large data set. The bottom is a smaller producer and only has data for Bins 1 & 2. While the visual is not the best, Bins 3-6 are actually the same for both graphs. The data was not flushed/cleared/negated when the data function was run against the 2nd query.

The last visual in the Word document shows what I am striving to accomplish with a slick R script that provides one table output per production grouping (e.g. 6 month).


  • 5 weeks later...

Hi

The following script should produce one CDF curve per bin for a single measure, which is the column you choose, e.g. 6MonthCumGas.

You will need to install the data.table package. The column containing the bin has been named 'group'.

Bin6 has only one value, so the formula evaluates to 0/0 and produces a NaN, which is ignored. I am attaching a screenshot (value is not in log scale).

Gaia

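#####################################################

# Load data.table quietly
suppressWarnings(suppressPackageStartupMessages(library(data.table)))

# analysisColumn is the measure (e.g. 6MonthCumGas), analysisColumn2 is the bin
cdfTable = data.table(value = analysisColumn, group = analysisColumn2)

# Sort by bin, then by value within each bin
setorder(cdfTable, group, value)

# Cumulative probability by position: 0 for the smallest value, 1 for the largest.
# A bin with a single value gives 0/0 = NaN.
cdf = function(x) {
  n = length(x)
  return (((1:n) - 1) / (n - 1))
}

# Compute prob separately for each bin
cdfTable[, prob := cdf(value), by = group]

### end ###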
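The NaN mentioned for single-value bins comes from the division by n - 1. As a minimal sketch (the name cdf_safe is hypothetical and not part of the TIBCO data function), the helper could return NA explicitly for bins with fewer than two values. For comparison, base R's ecdf() uses i/n rather than (i - 1)/(n - 1), so it never divides by zero but its curve starts at 1/n instead of 0.

```r
# Hypothetical variant of cdf(): same (i - 1) / (n - 1) formula,
# but bins with fewer than two values get NA instead of NaN.
cdf_safe <- function(x) {
  n <- length(x)
  if (n < 2) return(rep(NA_real_, n))
  (seq_len(n) - 1) / (n - 1)
}

x <- c(10, 20, 30)           # made-up, already sorted values
p_safe <- cdf_safe(x)        # 0.0 0.5 1.0
p_single <- cdf_safe(42)     # NA
p_ecdf <- ecdf(x)(x)         # 1/3 2/3 1 (base R definition, i / n)
```

Which definition to use is a design choice: the positional formula spans exactly 0 to 1 in every bin, while ecdf() matches the textbook empirical CDF.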


  • 3 years later...

I would like to group by two columns, "fieldname" and "holedirectioncode", so basically I want to run a CDF of b-values on vertical field wells, horizontal field wells, etc. I tried using the script above in a new function, but got an error. The R package data.table was installed. If anyone could provide me with a dxp to follow, that would be great. Thanks!

2024-02-26_15-57-49.jpg


You're right.  Thanks! I got it to work. 

To build off of this: if I have two groups that I want the function to run on, e.g. wells with a unique field name (group) and well orientation (group2), would I just do the following?

cdfTable = data.table(value = analysisColumn, group = analysisColumn2, group2 = analysisColumn3)

# Sort by both grouping columns, then by value
setorder(cdfTable, group, group2, value)

cdf = function(x) {
  n = length(x)
  return (((1:n) - 1) / (n - 1))
}

# Multiple grouping columns must be wrapped in .()
cdfTable[, prob := cdf(value), by = .(group, group2)]

### end ###
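For anyone who wants to check the two-column grouping outside Spotfire, here is a minimal base-R sketch that avoids data.table altogether. The toy data frame and its values are made up for illustration, and the guard for small groups is an assumption added here, not part of the original function. It relies on ave(), which accepts several grouping vectors.

```r
# Toy data: column names mirror the post, values are invented.
wells <- data.frame(
  value  = c(5, 1, 3, 4, 2, 9, 7, 8),
  group  = c("A", "A", "A", "A", "A", "B", "B", "B"),  # e.g. field name
  group2 = c("H", "H", "H", "V", "V", "H", "H", "H"),  # e.g. well orientation
  stringsAsFactors = FALSE
)

# Same positional formula as above, with an assumed guard so that
# groups with fewer than two values get NA instead of NaN.
cdf <- function(x) {
  n <- length(x)
  if (n < 2) return(rep(NA_real_, n))
  (seq_len(n) - 1) / (n - 1)
}

# Sort by both grouping columns, then by value, so position equals rank.
wells <- wells[order(wells$group, wells$group2, wells$value), ]

# Apply cdf() within each (group, group2) combination.
wells$prob <- ave(wells$value, wells$group, wells$group2, FUN = cdf)
print(wells)
```

Each combination (A/H, A/V, B/H) gets its own curve running from 0 to 1. Sorting first matters because the formula assigns probabilities by row position within the group.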
