Thiago Leo
Posted December 11, 2018

Hello everyone,

I'm new to the Statistica platform, and I need to show my company whether the software can meet the requirements below. Could you help me?

- Read the file homesite.train.csv (note: this file was provided in one of Kaggle's competitions).
- Convert the columns with categorical data (strings) to numeric.
- Save the converted data to Hadoop HDFS.
- From Spark, read the HDFS data and load it into an RDD. Cache it and demonstrate this on the Spark console.
- Split the RDD randomly into training and validation sets (70% / 30%).
- With the training data, train a model using Spark's RandomForestClassifier.
- Validate the trained model with the validation data.
- Calculate the AUC-ROC score in Spark and display it in the solution interface.
Neema Pitchiah
Posted February 21, 2019

Hello,

It seems that most of what you described can be done in Statistica. I added a screenshot with a number of relevant nodes. I think Statistica can export to HDFS in general, but I am not sure about loading the data into an RDD.
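
For anyone who wants to sanity-check the numbers that come out of such a workflow, three of the steps in the question (encoding string categories, the random 70/30 split, and the AUC-ROC score) can be sketched in plain Python. This is only an illustration under stated assumptions: it uses neither Statistica nor Spark, and the function names `encode_categories`, `split_train_validation`, and `auc_roc` are made up for this example. In Spark the corresponding pieces would be `StringIndexer`, `randomSplit`, and `BinaryClassificationEvaluator`, which behave a little differently (for example, `StringIndexer` orders labels by frequency by default, and `randomSplit` only gives approximately 70/30).

```python
import random


def encode_categories(values):
    """Map each distinct string to an integer index, in order of first appearance."""
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


def split_train_validation(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it at train_fraction (exact, not sampled)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is O(n^2), so it is only meant for small checks.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data, not taken from homesite.train.csv.
    encoded, mapping = encode_categories(["CA", "NJ", "CA", "TX"])
    print(encoded, mapping)
    train, validation = split_train_validation(range(10))
    print(len(train), len(validation))
    print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise definition of AUC is used here because it is easy to verify by hand on a few rows; it is not how Spark computes the metric internally.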