Steve Ennis Posted January 28, 2020

I am trying to modify a TIBCO-provided TERR function: https://community.spotfire.com/modules/cdf-data-function-tibco-spotfirer

The script works fantastically for one dataset. I am trying to produce a cumulative distribution curve for each bin on the same graph. The problem is that 'value' and 'prob' need to be calculated for each bin. I have 6 bins, and I am looking at 6-, 12-, and 24-month production for both oil and gas. I have copied the same CDF data function several times and had each copy output data frames of the data I need. Then I have combined the data from the multiple tables. This works but is an absolute mess. The included dxp roughly shows what I am trying to accomplish, with bins carried through on the TIBCO-provided CDF. If this CDF could be modified to calculate distributions for all bins, it would be great. Perhaps the R function Ecdf could be used.

I have discovered a problem with the original CDF. When I choose a different data set which doesn't populate all my bins (i.e. choosing a subset of the main data), the bins which have no data do not 'overwrite' or negate the existing data function output. The attached Word document shows bar graphs of well count binned by proppant for 2 different companies. The top graph is from a large data set. The bottom is from a smaller producer and only has data for Bins 1 & 2. While the visual is not the best, Bins 3-6 are actually the same in both graphs. The data was not 'flushed'/cleared/negated when the data function was run against the 2nd query.

The last visual in the Word document shows what I am striving to accomplish with a slick R script that provides one table output per production grouping (i.e. 6 month).
Gaia Paolini Posted February 28, 2020

Hi,

The following script should work to produce one CDF curve per bin, with a single measure, which is the column you choose, e.g. 6MonthCumGas. You will need to install the data.table library. The column containing the bin has been named 'group'. Bin6 has only one value, so it produces a NaN, which is ignored. I am attaching a screenshot (value is not in log scale).

Gaia

    #####################################################
    suppressWarnings(suppressPackageStartupMessages(library(data.table)))
    # One row per observation: the measure and the bin it belongs to
    cdfTable=data.table(value=analysisColumn,group=analysisColumn2)
    # Sort by bin, then by value within each bin
    setorder(cdfTable,group,value)
    # Empirical probability (i-1)/(n-1) for the i-th sorted value; NaN when n is 1
    cdf=function(x) {
      n=length(x)
      return (((1:n)-1)/(n-1))
    }
    cdfTable[,prob:= cdf(value),by=group]
    ### end ###
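For anyone who wants to check the per-bin calculation outside Spotfire, here is a minimal sketch in plain R. The input vectors are made-up stand-ins for the data function inputs `analysisColumn` and `analysisColumn2`, and it assumes data.table is installed; it is only meant to illustrate what the script above computes, not to replace it.

```r
library(data.table)

# Hypothetical toy inputs standing in for the Spotfire input columns
analysisColumn  <- c(30, 10, 20, 5, 15, 7)
analysisColumn2 <- c("Bin1", "Bin1", "Bin1", "Bin2", "Bin2", "Bin3")

cdfTable <- data.table(value = analysisColumn, group = analysisColumn2)

# Sort by bin, then by value, so row order within a bin is ascending
setorder(cdfTable, group, value)

# (i - 1) / (n - 1) for the i-th sorted value in a bin of size n
cdf <- function(x) {
  n <- length(x)
  (seq_len(n) - 1) / (n - 1)
}

# Compute the probability separately within each bin
cdfTable[, prob := cdf(value), by = group]
print(cdfTable)
```

With this toy data, the three Bin1 values get probabilities 0, 0.5 and 1, and the single Bin3 value gets NaN (0/0), which matches the Bin6 behaviour described above.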
Steve Ennis Posted February 28, 2020 (Author)

Thank you so much. I am going to give this script a try. I discovered that the problem with existing data not being overwritten was a caching problem on my part. But the script to output the CDF by group will immensely clean up my DXP file.
Richard Lake 4 Posted February 26

I would like to group by two columns, "fieldname" and "holedirectioncode", so basically I want to run the CDF of b-values on vertical field wells, horizontal field wells, etc. I tried using the script above in a new function, but got an error. The R package data.table was installed. If anyone could provide me with a dxp to follow, that would be great. Thanks!
Gaia Paolini Posted February 27

The data function above is a TERR script. It looks like you defined it as a Python script. If you rebuild it as TERR, it should work.
Richard Lake 4 Posted February 27

You're right. Thanks! I got it to work. To build off of this, if I have two groups that I want the function to run on, e.g. run the function on wells with a unique field name (group1) and well orientation (group2), would I just do this?

    cdfTable=data.table(value=analysisColumn,group=analysisColumn2, group2=analysisColumn3)
    setorder(cdfTable,group,group1,value)
    cdf=function(x) {
      n=length(x)
      return (((1:n)-1)/(n-1))
    }
    cdfTable[,prob:= cdf(value),by=group,group2]
    ### end ###
Gaia Paolini Posted February 28

Apart from the small typo (group2 is called group1 in setorder), the by clause for multiple groups needs to become a vector and use the column names within quotes:

    cdfTable[,prob:= cdf(value),by=c('group','group2')]
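Putting the two corrections together, a two-column grouping might look like the sketch below. The field names, orientation codes and values are invented for illustration, and the table name `cdfTable2` is hypothetical; inside a data function the columns would come from `analysisColumn`, `analysisColumn2` and `analysisColumn3` instead. It assumes data.table is installed.

```r
library(data.table)

# Hypothetical toy data: a measure, a field name and a well orientation code
cdfTable2 <- data.table(
  value  = c(4, 2, 8, 6, 3, 9, 1),
  group  = c("FieldA", "FieldA", "FieldA", "FieldA", "FieldB", "FieldB", "FieldB"),
  group2 = c("H", "H", "V", "V", "H", "H", "H")
)

# Sort by both grouping columns, then by value (group2, not group1)
setorder(cdfTable2, group, group2, value)

# (i - 1) / (n - 1) for the i-th sorted value in a group of size n
cdf <- function(x) {
  n <- length(x)
  (seq_len(n) - 1) / (n - 1)
}

# Multiple grouping columns are passed to by as a character vector
cdfTable2[, prob := cdf(value), by = c("group", "group2")]
print(cdfTable2)
```

Each combination of field and orientation gets its own curve here; a combination with only one well would still return NaN, as in the single-group script.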