Institutional Holdings Analysis


    This article demonstrates how to use Spotfire Data Science - Team Studio to predict the behavior of large institutional investors with respect to particular holdings: whether they will buy into or sell out of those holdings on a quarter-by-quarter basis.

    Data Requirements

    We use two Quandl APIs as our source data: Institutional Holdings Collection (IHC) and Institutional Holders Metrics (IHM). For each stock ticker, IHC records which institutional investors own its shares, the size and value of those holdings, and recent trends in shares bought or sold over the last few quarters. IHM contains a selection of calculated metrics for institutional and insider shareholders, covering shares held, the value of holdings, rotation between sellers and buyers, concentration of shares held, and shares held by investing style. Both datasets are derived primarily from SEC Form 4 and Forms 13D, 13G, and 13F.

    From Raw Quandl Data to Predictive Models

    Figure: A complete API pull, ETL, modeling, and evaluation pipeline in one workflow.

    This Playbook contains a single workflow that covers the whole lifecycle, from data pulled from Quandl to trained predictive models. From the raw IHC data, which contains only the position held by each institution in each quarter, we compute the quarterly change in ownership and the size of that change normalized by the institution's total assets under management (AUM). We then create the categorical variable we want to predict: the direction of the quarter-over-quarter change in ownership. Finally, we apply three classification operators (Logistic Regression, Decision Tree, and Naive Bayes); Logistic Regression performs best, with 66% classification accuracy.
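    The feature-engineering step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Team Studio workflow itself: the record fields (`institution`, `ticker`, `quarter`, `shares`, `value`), the `aum` lookup keyed by institution and quarter, and the helper names `build_features` and `label_direction` are all hypothetical. We also assume that the normalized size is the change in position value divided by AUM, that the direction comes from the change in share count, and that unchanged positions get a third `hold` class; the original does not specify these details.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Holding:
    # Hypothetical IHC-style row: one institution's position in one ticker for one quarter.
    institution: str
    ticker: str
    quarter: int   # quarters encoded as consecutive integers (assumption)
    shares: float
    value: float   # market value of the position


def label_direction(share_change: float) -> str:
    """Categorical target: direction of the quarter-over-quarter change."""
    if share_change > 0:
        return "buy"
    if share_change < 0:
        return "sell"
    return "hold"  # assumed third class for unchanged positions


def build_features(holdings: List[Holding],
                   aum: Dict[Tuple[str, int], float]) -> List[dict]:
    # Group positions by (institution, ticker) so consecutive quarters can be compared.
    by_pair: Dict[Tuple[str, str], List[Holding]] = {}
    for h in holdings:
        by_pair.setdefault((h.institution, h.ticker), []).append(h)

    rows = []
    for (inst, ticker), series in by_pair.items():
        series.sort(key=lambda h: h.quarter)
        for prev, cur in zip(series, series[1:]):
            if cur.quarter != prev.quarter + 1:
                continue  # skip gaps: only adjacent quarters are compared here
            share_change = cur.shares - prev.shares
            value_change = cur.value - prev.value
            rows.append({
                "institution": inst,
                "ticker": ticker,
                "quarter": cur.quarter,
                "share_change": share_change,
                # Assumed normalization: change in position value / institution AUM.
                "normalized_change": value_change / aum[(inst, cur.quarter)],
                "direction": label_direction(share_change),
            })
    return rows
```

    Only adjacent quarters where both records exist are compared, so brand-new positions are skipped in this sketch. To predict the direction of a coming quarter, a model's inputs would need to come from earlier quarters; otherwise `share_change` leaks the label.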

    Key Technique - API Integration

    The Quandl operators pull live data from Quandl's APIs and persist it to HDFS for processing and modeling. When this workflow is scheduled as a job, the Quandl operator pulls the most recent data from the API on every run. Once the data are in Hadoop, they can be joined and manipulated like any other dataset. A similar approach works for other services that expose public APIs, such as Salesforce, Stripe, and Twilio.
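    The pull-and-persist pattern can be illustrated with the Python standard library. This is a hypothetical sketch, not how the Quandl operator is implemented: it assumes a Quandl-style v3 tables endpoint (`.../api/v3/datatables/{code}.json`, with filters and `api_key` passed as query parameters), the table codes are placeholders, and it writes to a local file in place of HDFS. Quandl has since been rebranded as Nasdaq Data Link, so the base URL may need updating.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: Quandl-style v3 tables endpoint; placeholder, may have moved.
BASE_URL = "https://www.quandl.com/api/v3/datatables"


def build_url(table_code: str, api_key: str, **filters: str) -> str:
    """Build a request URL; filters become query parameters, followed by the API key."""
    query = dict(filters)
    query["api_key"] = api_key
    return f"{BASE_URL}/{table_code}.json?{urllib.parse.urlencode(query)}"


def pull_to_file(url: str, dest: Path) -> Path:
    """Fetch the JSON payload and persist it unchanged (local file stands in for HDFS)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    dest.write_text(json.dumps(payload))
    return dest
```

    Running `pull_to_file` on a schedule mirrors the scheduled-job behavior described above: each run overwrites the stored snapshot with whatever the API currently returns, and downstream steps read that snapshot like any other dataset.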

    Check It Out!

    For access to this Playbook, including its workflows, sample data, a PowerPoint summary, and expert support from Spotfire® Data Science data scientists, contact your Spotfire® Data Science sales representative.

