Spotfire® Streaming Aggregate Expressions


    Simple Function Expressions

    Let's begin by discussing the difference between Simple Functions and Aggregate Functions in Spotfire® Streaming. Whenever a Spotfire® Streaming expression is evaluated, the inputs to a simple function may be any of the following: constant expressions, fields of the input-port tuple that caused the expression to be evaluated, the current values of Dynamic Variables, the current values of local variables created with the Declare action, and Operator, Module, or Container Parameters. No matter how complex an expression is, it is a simple function expression as long as it needs only those values to be evaluated. So substring(input1.symbol, 0, 4), compound_interest(principal, matureValue, numMonths), and price + ${SKEW} are all simple function expressions.

    All infix functions (+, -, *, /, %, &&, ||, in) are simple functions.

    Many built-in functions, such as max and min, are available as both simple and aggregate functions. The expression max(a, b) is the simple function that returns the larger of a and b. The expression max(a) is the aggregate function that returns the largest value of a over the window in which it is evaluated.
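    As a rough illustration, the Python sketch below contrasts the two forms of max. The names simple_max and AggregateMax are hypothetical, and so are the increment/calculate methods on the class. They stand in for how Spotfire® Streaming evaluates these functions and are not a product API.

```python
def simple_max(a, b):
    """Simple function: the result depends only on the arguments passed right now."""
    return a if a >= b else b


class AggregateMax:
    """Aggregate function: keeps state across every tuple in the window.

    Hypothetical illustration only; not a Spotfire Streaming API.
    """

    def __init__(self):
        self.largest = None  # no value seen yet

    def increment(self, value):
        # Called once per tuple with the argument evaluated against that tuple.
        if self.largest is None or value > self.largest:
            self.largest = value

    def calculate(self):
        # Called when an output tuple is produced; no current tuple is needed.
        return self.largest


# max(a, b): compares two values available in the same evaluation.
print(simple_max(3, 7))  # prints 7

# max(price): compares one value taken from each tuple in the window.
window_max = AggregateMax()
for price in [60.0, 170.0, 100.0]:
    window_max.increment(price)
print(window_max.calculate())  # prints 170.0
```

The simple form needs both operands at once. The aggregate form sees one operand per tuple and carries the running answer between calls.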

    Aggregate Functions

    Aggregate functions operate as if they were evaluated over all of the tuples in an aggregation context. Examples of aggregation contexts include the windows defined by an aggregate operator's dimensions and group-by expressions, and the result sets of a query operator read (there may be one result set per group). Suppose the aggregate expression is avg(price) and the aggregation context contains this set of tuples with schema {symbol : string, price : double, quantity : int}: {{IBM, 60.0, 100}, {GOOG, 100.0, 200}, {MSFT, 170.0, 300}}. The result would be the average of 60.0, 100.0, and 170.0, which is 110.0.

    The phrase "as if they were evaluated over all of the tuples in an aggregation context" was used because, while that is the effect, it isn't precisely how Spotfire® Streaming evaluates aggregate functions. To minimize memory use and increase performance, aggregate functions are calculated incrementally. In the previous example, the tuples in the window are not stored in the aggregation context. Each tuple is processed individually, and the avg aggregate function instance stores only the sum and count of the tuples it has processed so far.

    When a tuple arrives, the avg aggregate function's increment method is called with the price field of the current tuple. The function adds one to its sample count and adds the price to its running sum. When a tuple is emitted, either because the window closes or because of an intermediate emission, the avg function's calculate method is called. It divides the sum by the count and returns the average.
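    The following Python sketch mirrors that description for the avg(price) example. The Avg class and its increment and calculate methods are a hypothetical model of the behavior described above, not Spotfire® Streaming's actual implementation. Returning None for an empty window is also an assumption.

```python
class Avg:
    """Hypothetical avg aggregate: keeps only a running sum and count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def increment(self, value):
        # Called for each tuple entering the window with that tuple's price.
        self.total += value
        self.count += 1

    def calculate(self):
        # Called when an output tuple is emitted (window close or intermediate emit).
        if self.count == 0:
            return None  # assumption: an empty window has no average
        return self.total / self.count


# The three tuples from the example, schema {symbol, price, quantity}.
window = [
    {"symbol": "IBM", "price": 60.0, "quantity": 100},
    {"symbol": "GOOG", "price": 100.0, "quantity": 200},
    {"symbol": "MSFT", "price": 170.0, "quantity": 300},
]

avg_price = Avg()
for tup in window:
    avg_price.increment(tup["price"])  # the tuple itself is not retained

print(avg_price.calculate())  # prints 110.0
```

However many tuples pass through the window, the state stays at two numbers.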

    While most aggregate functions can be calculated with only a small number of state values, there are aggregate functions that need every value for their calculation. For example, the median function needs to save every value. This doesn't mean that every tuple in the window is stored, just every value passed into the median aggregate function.
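    A hypothetical Python model of such a function is shown below. The Median class is illustrative only. Averaging the two middle values for an even count is an assumed convention, not documented Spotfire® Streaming behavior. The point is that the state grows with the number of values, but it holds only the argument values, not whole tuples.

```python
class Median:
    """Hypothetical median aggregate: must keep every value it is given."""

    def __init__(self):
        self.values = []  # only the argument values, not whole tuples

    def increment(self, value):
        self.values.append(value)

    def calculate(self):
        if not self.values:
            return None  # assumption: an empty window has no median
        ordered = sorted(self.values)
        mid = len(ordered) // 2
        if len(ordered) % 2 == 1:
            return ordered[mid]
        # Assumption: for an even count, average the two middle values.
        return (ordered[mid - 1] + ordered[mid]) / 2


window = [
    {"symbol": "IBM", "price": 60.0, "quantity": 100},
    {"symbol": "GOOG", "price": 100.0, "quantity": 200},
    {"symbol": "MSFT", "price": 170.0, "quantity": 300},
]

median_price = Median()
for tup in window:
    median_price.increment(tup["price"])  # store the price only

print(median_price.calculate())  # prints 100.0
```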

    Complex Aggregate Expressions

    An expression may contain multiple aggregate functions, or a mix of aggregate functions, simple functions, and constants. It is evaluated the same way a single aggregate function is. For each tuple in the window, the arguments of every aggregate function in the expression are evaluated against that tuple and passed to that function's increment method. When an output tuple is emitted, each aggregate function's calculate method returns its result, and the surrounding simple functions and constants are then applied to those results. So aggregate functions can be mixed with simple functions and constants as long as the arguments to any simple functions are only aggregate functions or constants.
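    To make this concrete, here is a hypothetical Python model of how an expression such as avg(price*quantity)/sum(quantity) could be evaluated. The classes Avg, Sum, and WeightedExpression are invented for this sketch, and it assumes a non-empty window.

```python
class Avg:
    """Hypothetical avg aggregate: running sum and count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def increment(self, value):
        self.total += value
        self.count += 1

    def calculate(self):
        return self.total / self.count  # assumes at least one tuple


class Sum:
    """Hypothetical sum aggregate: running total."""

    def __init__(self):
        self.total = 0.0

    def increment(self, value):
        self.total += value

    def calculate(self):
        return self.total


class WeightedExpression:
    """Hypothetical compiled form of avg(price*quantity)/sum(quantity)."""

    def __init__(self):
        self.avg_price_qty = Avg()  # avg(price*quantity)
        self.sum_qty = Sum()        # sum(quantity)

    def increment(self, tup):
        # Each aggregate's argument is evaluated against the current tuple
        # and passed to that aggregate's increment method.
        self.avg_price_qty.increment(tup["price"] * tup["quantity"])
        self.sum_qty.increment(tup["quantity"])

    def calculate(self):
        # The simple function "/" combines only aggregate results, so no
        # current tuple is needed at this point.
        return self.avg_price_qty.calculate() / self.sum_qty.calculate()


window = [
    {"symbol": "IBM", "price": 60.0, "quantity": 100},
    {"symbol": "GOOG", "price": 100.0, "quantity": 200},
    {"symbol": "MSFT", "price": 170.0, "quantity": 300},
]

expr = WeightedExpression()
for tup in window:
    expr.increment(tup)
print(expr.calculate())
```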

    It might be easier to consider what makes a complex aggregate expression illegal. When an aggregate function's increment method is called, all of the values for the current tuple are known, but when the calculate method is called, there is no current tuple. So any expression that references a tuple field or dynamic variable outside an aggregate function is an illegal aggregate expression.
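    The rule can be sketched as a small Python checker over a toy expression tree. The node constructors (const, field, dynvar, fn, agg) are hypothetical and not part of Spotfire® Streaming. The check encodes the Aggregate Operator rule stated above: a tuple field or dynamic variable is legal only somewhere inside an aggregate function's arguments. Constants and parameters are always legal.

```python
# Toy expression tree nodes (hypothetical, for illustration only).
def const(value):
    return ("const", value)

def field(name):
    return ("field", name)

def dynvar(name):
    return ("dynvar", name)

def fn(name, *args):
    """A simple function or infix operator such as "+" or "log10"."""
    return ("fn", name, args)

def agg(name, *args):
    """An aggregate function call such as avg(...) or lastval(...)."""
    return ("agg", name, args)


def is_legal_aggregate_expr(node, inside_aggregate=False):
    """True if every field or dynamic-variable reference sits inside an aggregate."""
    kind = node[0]
    if kind == "const":
        return True  # constants and parameters are always allowed
    if kind in ("field", "dynvar"):
        # These values exist only while increment runs, i.e. inside an aggregate.
        return inside_aggregate
    if kind == "agg":
        return all(is_legal_aggregate_expr(arg, True) for arg in node[2])
    if kind == "fn":
        return all(is_legal_aggregate_expr(arg, inside_aggregate) for arg in node[2])
    raise ValueError(f"unknown node kind: {kind!r}")


# avg(price + DYN_VAR) is legal; avg(price) + DYN_VAR is not.
ok = agg("avg", fn("+", field("price"), dynvar("DYN_VAR")))
bad = fn("+", agg("avg", field("price")), dynvar("DYN_VAR"))
print(is_legal_aggregate_expr(ok), is_legal_aggregate_expr(bad))  # prints True False
```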

    Aggregate Operator

    The open expression of an aggregate operator's predicate window must be a simple expression, since the decision to open a new window must be made outside any window. The emit, close, emit all, and close all expressions are aggregate expressions, since they need to determine things like the number of tuples in the window. For these aggregate expressions, the increment and calculate methods are invoked for every tuple entering the aggregate operator.

    Although it has no bearing on the aggregate expressions themselves, note one special case. Suppose a predicate dimension has both an emit and a close expression, and the close expression is true while the emit expression is false. In that case the window is closed, but no tuple is emitted. All other dimension types emit a tuple when the close condition occurs, regardless of the emit condition.
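    The Python sketch below models a single predicate dimension. It assumes that the tuple satisfying the open expression becomes the first member of the new window, and that the window state holds only a count and a price sum. The names WindowState and run_predicate_dimension are hypothetical. The open expression sees only the current tuple, while emit and close see aggregate state. The example includes a window that closes without emitting.

```python
class WindowState:
    """Hypothetical aggregate state for one open window: count() and sum(price)."""

    def __init__(self):
        self.count = 0
        self.price_sum = 0.0

    def increment(self, tup):
        self.count += 1
        self.price_sum += tup["price"]


def run_predicate_dimension(tuples, open_expr, emit_expr, close_expr):
    """Return (count, price_sum) for each emitted output tuple."""
    outputs = []
    state = None  # None means no window is open
    for tup in tuples:
        if state is None:
            # The open expression is simple: it sees only the current tuple.
            if not open_expr(tup):
                continue
            state = WindowState()  # assumption: opening tuple joins the window
        state.increment(tup)
        # Emit and close are aggregate expressions over the window state.
        if emit_expr(state):
            outputs.append((state.count, state.price_sum))
        if close_expr(state):
            # If emit was false, the window closes without producing output.
            state = None
    return outputs


prices = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
result = run_predicate_dimension(
    [{"price": p} for p in prices],
    open_expr=lambda tup: True,
    emit_expr=lambda s: s.count == 3 and s.price_sum > 100.0,
    close_expr=lambda s: s.count == 3,
)
print(result)  # prints [(3, 150.0)]
```

The first window (sum 60.0) satisfies close but not emit, so it disappears silently. Only the second window produces output.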

    Aggregate expressions in the Aggregate Functions tab refer to the current input tuple as either input1 or input. (There are currently no operators with aggregate expressions that have more than one input port.) The current input tuple is not itself an aggregate expression, because its value changes with each new tuple. To refer to fields of the input tuple, an aggregate expression must be used, such as firstval(input1.price) or max(input.price).

    Emit and close expressions treat the current input tuple differently from aggregate expressions in either the Aggregate Operator's Aggregate Functions tab or the Query Operator, although the current input tuple is still referred to with the prefix input1.

    Aggregate Operator Valid Expressions

    firstval(price + 1)
    
    firstval(price) + 1
    
    if(sum(quantity) < 1000) then avg(price) else ${MAX_PRICE}
    
    avg(price*quantity)/sum(quantity)
    
    ln(avg(price))
    
    avg(ln(price))
    
    avg(price + DYN_VAR)
    
    avg(price) + lastval(DYN_VAR)

    Aggregate Operator Invalid Expressions

    avg(price) + DYN_VAR

    Dynamic variables are not constant in the Aggregate Operator.

    price

    A bare tuple field is not constant in the Aggregate Operator.

    firstval(price) + log10(input1.quantity)

    log10 is a simple function, and its argument input1.quantity is a tuple field outside any aggregate function.

    if(price > 1000.0) then ${MAX_PRICE} else max(price)

    The condition price > 1000.0 references a field outside any aggregate function.
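    A small worked example in Python shows why a bare dynamic variable is rejected. It hypothetically assumes that DYN_VAR takes the values 1, 2, and 3 as three tuples with prices 10.0, 20.0, and 30.0 arrive. The expressions avg(price + DYN_VAR) and avg(price) + lastval(DYN_VAR) each sample DYN_VAR while a tuple is current, so both have a well-defined value. By contrast, avg(price) + DYN_VAR would have to read DYN_VAR when calculate runs. At that point there is no current tuple, and the variable may have changed since the last tuple arrived.

```python
# (price, value of DYN_VAR when that tuple arrived): hypothetical data
arrivals = [(10.0, 1), (20.0, 2), (30.0, 3)]

count = 0
price_total = 0.0
price_plus_dyn_total = 0.0
last_dyn = None

for price, dyn in arrivals:
    # Increment phase: a current tuple exists, so DYN_VAR can be sampled.
    count += 1
    price_total += price
    price_plus_dyn_total += price + dyn  # argument of avg(price + DYN_VAR)
    last_dyn = dyn                       # argument of lastval(DYN_VAR)

# Calculate phase: no current tuple, only the accumulated state is used.
avg_of_sum = price_plus_dyn_total / count       # avg(price + DYN_VAR)
avg_plus_last = price_total / count + last_dyn  # avg(price) + lastval(DYN_VAR)
print(avg_of_sum, avg_plus_last)  # prints 22.0 23.0
```

The two valid expressions give different answers because they sample DYN_VAR differently. Both are still well defined from state captured during increment.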

    Another difference is that, inside emit and close expressions, input1 and last are aggregate expressions that refer to the most recent tuple. The expression sum(input1.price) > input1.maxPrice is therefore a legal emit or close predicate expression. The prefixes input1 and last can also be used as if they were not aggregate expressions, so the previous expression is exactly the same as sum(input1.price) > lastval(input1.maxPrice).

    If this is confusing, ignore last and the fact that the current tuple is an aggregate expression in emit and close expressions. Instead, use input1 exactly as it is used in other Aggregate Operator aggregate expressions, and write lastval(input1.field) rather than simply input1.field.

    Predicate Emit and Close Valid Expressions

    count() == 3 /* creates a window size of 3 tuples */
    
    sum(input1.price) > input1.price
    
    sum(input1.price) > firstval(input1.price)
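    The Python sketch below evaluates the three predicates above after each of three tuples with prices 10.0, 20.0, and 30.0. The EmitState class is hypothetical. It models the rule described above: inside emit and close expressions, a bare input1.price behaves like lastval(input1.price).

```python
class EmitState:
    """Hypothetical window state needed by the three example predicates."""

    def __init__(self):
        self.count = 0
        self.price_sum = 0.0
        self.first_price = None
        self.last_price = None

    def increment(self, tup):
        self.count += 1
        self.price_sum += tup["price"]
        if self.first_price is None:
            self.first_price = tup["price"]
        self.last_price = tup["price"]  # what a bare input1.price resolves to


predicates = {
    "count() == 3": lambda s: s.count == 3,
    "sum(input1.price) > input1.price": lambda s: s.price_sum > s.last_price,
    "sum(input1.price) > firstval(input1.price)": lambda s: s.price_sum > s.first_price,
}

state = EmitState()
history = []  # one dict of predicate results per arriving tuple
for price in [10.0, 20.0, 30.0]:
    state.increment({"price": price})
    history.append({text: check(state) for text, check in predicates.items()})

for row in history:
    print(row)
```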

    Query Operator Result Sets

    The major difference in aggregate expressions between the Query Operator and the Aggregate Operator is what is considered to be a constant. The tuple that causes a query operator to perform a read cannot change while aggregate functions are applied to the results of the query. That tuple's fields are therefore treated as constants in a query operator's aggregate expressions. Since no tuples are emitted until the aggregate expressions have been evaluated, dynamic variables have no opportunity to change either. So dynamic variables are also treated as constants in query operator aggregate expressions.
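    Here is a Python sketch of that reasoning under stated assumptions. The function read_and_aggregate is hypothetical, and rows stands in for the query result set. Both input1 and dyn_var are captured once before the loop, and that is what allows the two valid query expressions computed below to use them outside any aggregate function.

```python
import math


def read_and_aggregate(input1, dyn_var, rows):
    """Hypothetical model of aggregate expressions over one query read.

    input1 and dyn_var are fixed for the whole read, so they act as constants.
    """
    count = 0
    price_sum = 0.0
    first_price = None
    for row in rows:  # each result row plays the role of the "current" tuple
        count += 1
        price_sum += row["price"]
        if first_price is None:
            first_price = row["price"]
    if count == 0:
        return None  # assumption: an empty result set yields no aggregate values
    return {
        "avg(price) + DYN_VAR": price_sum / count + dyn_var,
        "firstval(price) + log10(input1.quantity)": first_price + math.log10(input1["quantity"]),
    }


result = read_and_aggregate(
    input1={"quantity": 100},
    dyn_var=5.0,
    rows=[{"price": 60.0}, {"price": 100.0}, {"price": 170.0}],
)
print(result)
```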

    Query Operator Group By

    The Group Options tab in the query operator specifies groups of query results over which aggregate expressions are evaluated. For example, if a query reads all rows from an order book with depth, the group-by expressions could group by instrument name to compute statistics for each instrument. Group-by aggregate expressions are therefore like other query-result aggregate expressions, with two differences: the group-by expression is a constant within the aggregate expression, and one tuple is emitted per group rather than one tuple for the entire query.
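    As an illustration, the Python sketch below groups a hypothetical order-book result set by an assumed instrument field and evaluates avg(price) per group. It produces one output tuple per group. The function name group_by_avg and the field names are assumptions made for this example.

```python
def group_by_avg(rows, key_field):
    """Return one output row per distinct key_field value, with avg(price)."""
    state = {}  # group value -> [price_sum, count]; dicts keep first-seen order
    for row in rows:
        acc = state.setdefault(row[key_field], [0.0, 0])
        acc[0] += row["price"]
        acc[1] += 1
    # Within a group the group value is constant, so it can be emitted
    # alongside the aggregate result.
    return [{key_field: key, "avg(price)": total / n} for key, (total, n) in state.items()]


order_book = [
    {"instrument": "IBM", "price": 60.0, "depth": 1},
    {"instrument": "GOOG", "price": 100.0, "depth": 1},
    {"instrument": "IBM", "price": 62.0, "depth": 2},
]

for out in group_by_avg(order_book, "instrument"):
    print(out)
```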

    Query Operator Valid Aggregate Expressions

    avg(price) + DYN_VAR
    
    input1.price
    
    firstval(price) + log10(input1.quantity)
    
    if(price > 1000.0) then ${MAX_PRICE} else max(price)

    Query Operator (Read) Invalid Aggregate Expressions

    price

    Ambiguous: it could mean input1.price or current.price (it is also an invalid simple expression).

    current.price

    Not a constant; it must appear inside an aggregate function.

