Andy Pang, posted February 8, 2019

Hi,

Suppose we have some data in an on-premise Hadoop cluster, while other data is stored in other data sources in the public cloud. Does SDS support running an operation (e.g. k-means) across both of these data sources and producing a combined result? What would be the best way to achieve this?

Cheers,
Andy
Steven Hillion, posted February 8, 2019

Yes, SDS can handle processing from multiple sources. To be precise, it supports hybrid data sources within one workflow. So you can take a dataset in Oracle, Teradata, or any other source, run whatever transformations you like on it (filtering, aggregation, windowing, etc.), then move it into Hadoop, combine it with a Hadoop dataset, and build an ML model.

The data sources can be anywhere (on-prem, cloud) and anything (Hive, Hadoop, RDBMS). And of course, every single operation is pushed down into the underlying database (Oracle, Hadoop, etc.). To minimise data movement, you can perform all the pre-aggregation and transformations directly in the source database.
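To make the pattern in the reply above concrete, here is a minimal sketch in plain Python. It is not SDS code and does not use any SDS API. Everything in it is an assumption made for illustration: an in-memory SQLite database stands in for the on-premise source, a dictionary stands in for the cloud source, and the `orders` table, its columns, and the function names are hypothetical. It shows the three steps described: aggregate inside the source database so that only summary rows move, combine the two results on a shared key, then cluster the combined rows with k-means.

```python
import sqlite3


def preaggregate_in_source(conn):
    # The GROUP BY runs inside the database engine ("pushdown"),
    # so only one summary row per customer leaves the source.
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return {customer_id: total for customer_id, total in rows}


def combine(spend_by_customer, visits_by_customer):
    # Inner join on the shared key: keep customers present in both sources.
    return {
        cid: (spend, visits_by_customer[cid])
        for cid, spend in spend_by_customer.items()
        if cid in visits_by_customer
    }


def kmeans(points, k, iterations=10):
    # Plain Lloyd's algorithm, seeded with the first k points so the
    # result is deterministic. Returns (centroids, clusters).
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


def run_demo():
    # Hypothetical "on-premise" source: an in-memory SQLite table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 20.0), (2, 5.0), (3, 200.0), (3, 100.0), (4, 250.0)],
    )
    spend = preaggregate_in_source(conn)
    conn.close()

    # Hypothetical "cloud" source: site visits per customer, already summarised.
    visits = {1: 2, 2: 1, 3: 40, 4: 35, 5: 9}

    combined = combine(spend, visits)
    centroids, clusters = kmeans(list(combined.values()), k=2)
    return combined, centroids, clusters


if __name__ == "__main__":
    combined, centroids, clusters = run_demo()
    print(combined)
    print(centroids)
```

The point of the design is that the raw `orders` rows never leave the source: six rows are reduced to four summary rows before anything is moved or joined. In a real workflow the two features would normally be scaled before clustering, since k-means is sensitive to differences in magnitude between columns.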