  • StreamBase EventFlow Authoring Practices


     

    Notes on Software Practices

    Note from Steve Barber (sbarber):

    This Wiki page began as a copy of the TIBCO StreamBase Documentation > Authoring Guide > StreamBase Authoring Fundamentals > StreamBase Best Practices page that existed in the StreamBase documentation through version 7.7, but which the TIBCO Streaming product team removed from the 10.0 release line and forward. It's a bit unclear to me why the page is no longer in the documentation set, but it is true that some of the practices had become outdated, both because of the evolution of the product itself, and because subsequent field and customer experience hadn't yet been incorporated.

    Anyone who has worked with me on a project for very long knows that I find the term Best Practices to be deeply problematic, and that I tend to avoid it. I'm happy to outline some practices that have been or will very likely be beneficial to most StreamBase projects. However, each project has a specific context, and the practices may or may not apply. In any case, I've updated the practices based on my own experience.

    Please be aware that the StreamBase EventFlow Authoring Practices described on this page do not exhaust the set of good software development practices applicable to architecting, designing, developing, testing, scaling, and maintaining a StreamBase application. There is a wide variety of practices to bring to bear in any software development project, and this page covers only the narrow subject of code authoring. Even more important, this page attempts to limit its content to those practices that are specific to StreamBase EventFlow or are an application of a general principle that has a unique or idiomatic expression within EventFlow.

    All that is to say: good software development practice is necessarily much broader than what is on this page. All the good practices out there that apply to any kind of software development apply to EventFlow, and this page does not attempt to detail what should be considered the knowledge common to properly trained and experienced software engineers. There's also an implicit assumption that more general software development knowledge has been obtained elsewhere. For example, this page doesn't repeat such wisdom as Comment Your Code or even Create Functional Unit Tests As You (or Before You) Implement The Code To Be Tested. These software engineering commonplaces are where we start this page from, and are a prerequisite background for creating great EventFlow Applications.

    I recognize that not every new user of EventFlow has this background, and some may not know what is missing from this page. In particular, people who came to StreamBase solely to apply predictive models to streams might have Ph.D.s in mathematics or physics but no formal background in engineering robust and scalable distributed systems. That's OK; we software engineers tend to not know much about predictive analytics, so we're even!

    A lot of the more general software engineering principles to start from are laid out in books like:

    • Fundamentals of Software Architecture by Mark Richards and Neal Ford (O'Reilly)
    • Software Engineering at Google by Titus Winters, Tom Manshreck, and Hyrum Wright (O'Reilly)
    • Software Engineering Best Practices: Lessons from Successful Projects in the Top Companies by Capers Jones (McGraw-Hill)

    and many others. If you are new to formal software engineering, these are a reasonable place to look to understand the breadth of the discipline.

    Even more specifically, EventFlow applications tend to be part of solutions that involve distributed messaging and other forms of near-real-time stream processing. They may involve concurrency and parallelism, which is itself a sub-discipline. Even though TIBCO Streaming takes care of a lot of the day-to-day details of working with such concepts, the fundamentals of these disciplines are background knowledge needed to properly create robust and scalable Streaming applications, and not all of that knowledge is covered on this page, by any means.

    Therefore, if you've come to this page looking for a comprehensive set of practices for using EventFlow, there are things that aren't here. Each software development organization hopefully already has a set of practices and processes that it follows for generalized software development. The practices on this page are in addition to those, or a specialization of some of them; they are not replacements, and in particular they are not intended to stand alone.

    Please consider this page as a living Wiki page and feel free to edit according to your own experience, as well.

    EventFlow Authoring Practice Checklist

    The following table provides a list of practices to consider adopting when authoring EventFlow applications.

    Practice
      Summary Description
    Plan for modularization
      Think about the functional areas of the application and how they are related before writing any EventFlow.

    Modularizing helps to ensure that troubleshooting and performance tuning can be done efficiently and easily.

    Keep adapters and connection logic in modules that are separate from the modules that define business logic. This makes it easier to test the application for correctness.

    Define modules based on clear functional boundaries. Use separate modules for functional areas of your application that have any of the following characteristics:

    • Are CPU intensive

    • Need to be duplicated for all permutations of incoming data

    • Perform a vital function that needs to be monitored throughout performance testing
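
    The separation of connection logic from business logic described above is not specific to EventFlow. As a rough analogy only, here is a minimal Python sketch in which the business rule has no I/O of its own, so a test can feed it in-memory data. All names here (Trade, flag_large_trades, run_pipeline) are hypothetical and are not part of any StreamBase API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Trade:
    symbol: str
    price: float
    volume: int


def flag_large_trades(trades: Iterable[Trade], min_volume: int) -> List[Trade]:
    """Business logic only: no I/O, so it can be tested with in-memory data."""
    return [t for t in trades if t.volume >= min_volume]


def run_pipeline(read_trades: Callable[[], Iterable[Trade]],
                 publish: Callable[[Trade], None],
                 min_volume: int) -> None:
    """Connection layer: wires a source and a sink around the business logic."""
    for trade in flag_large_trades(read_trades(), min_volume):
        publish(trade)


# A test substitutes in-memory stand-ins for the real source and sink.
sample = [Trade("IBM", 101.5, 50), Trade("TIBX", 24.0, 5000)]
published: List[Trade] = []
run_pipeline(lambda: sample, published.append, min_volume=1000)
```

    In EventFlow terms, the source and sink would live in an adapter module and the rule in a business logic module, so the rule can be exercised without live connections.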

    Establish naming conventions
      Follow a logical naming convention for application components such as modules, dynamic variables, and schemas. For example, in a trading system you might define various algorithms in modules with names such as BollingerBand_Algo.sbapp, VWAP_Algo.sbapp, and so on.
    Use named schemas
      Specify named schemas and import them from a common interface or module.

    Specify output schemas as well as input schemas.

    Pay attention to schema sizes. Schema sizes can affect application performance.

    Identify where certain data needs to be used in your application. There are many ways to define your schemas, and how you do so can make a difference in how well your application performs and how easily it can be debugged.

    Use table schemas
      Use table schemas when specifying Query Tables. Using table schemas, you can quickly assign the same structure to Query Tables in several modules across a large application.
    Use module imports
      Reuse definitions of constants, named schemas, table schemas, and other importable items. Importing items from one module to another simplifies application design and improves productivity when creating application modules.
    Use interfaces
      Define interfaces for frequently used design patterns. Using StreamBase interfaces, you can create varying implementations of the same set of streams, tables, and schemas.
    Use interface files for shared definitions
      Define importable items in interface files. Define constants and schemas in interface (.sbint) files whose purpose is to maintain those definitions for other modules to use. You could instead use an imported module (.sbapp) for the same purpose; in this case, the definitions module should have a blank canvas with no tuple processing at all. Using an interface file is likely better practice than using a module for this purpose. How many of these definitional interface files to use is a design question for the application architect; the appropriate granularity might be one per fragment project, to start, or they may be split into separate functional areas, for example. There might be definitional interface files that define entities used by multiple projects, but the reuse patterns and development organization structure might also affect decisions about appropriate file scope.
    Use incremental development
      Incrementally introduce logic into the application.

    Do not put a huge system in place in one fell swoop. Assess the impact of additional business logic at each stage of development with unit tests.

    Analysis of performance metrics should provide you with the ability to identify hot spots in your application while you are still in the development phase, making it easier to make improvements. Once hot spots are identified, you can distribute the processing of hot spot operators into separate threads.

    Use module parameters
      Use module parameters to enhance reuse and flexibility.

    Using module parameters and constants helps to make your application more flexible as a reusable module. For example, you can use module parameters to vary the threshold for an alert based on trade volume.

    Design modules so that their parameters can be specified by a human or by another subsystem, depending on the requirements at your particular site.

    For complex applications with several layers of module references, use chaining to pass the same value to inner modules.



    Module parameters can also help externalize values that are environment-specific or sensitive. When deployed, sensitive values can be provided by some secured means (for example, encrypted strings in configurations or substitution variable values).
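
    As a loose analogy for parameter chaining and externalized values, the following Python sketch passes a threshold from an outer function to an inner one instead of hard-coding it. The function names and the ALERT_THRESHOLD environment variable are hypothetical; in a real deployment the value would come from whatever configuration mechanism the site uses.

```python
import os


def volume_alert(trade_volume: int, alert_threshold: int) -> bool:
    """Inner 'module': the threshold is a parameter, not a hard-coded constant."""
    return trade_volume > alert_threshold


def process_trade(trade_volume: int, alert_threshold: int) -> str:
    """Outer 'module': passes its own parameter value through to the inner one."""
    return "ALERT" if volume_alert(trade_volume, alert_threshold) else "OK"


# The value is supplied from outside the logic. An environment variable stands
# in here (hypothetically) for the deployment's configuration mechanism.
threshold = int(os.environ.get("ALERT_THRESHOLD", "10000"))
status = process_trade(15000, threshold)
```

    The same logic can then be reused with a different threshold per environment, or tested with explicit values.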

    Use wall-clock time carefully and deliberately, considering functional regression test strategy
      Use wall-clock time in business logic very carefully.

    Using wall-clock time can have an unpredictable impact on latency and application correctness from machine to machine.

    The StreamBase Engine is fast because it processes incoming tuples as they arrive, not based on an internal or wall-clock timer. If you need to ensure that your processing always occurs by a certain time, use the Metronome or Heartbeat operators to send tuples through.



    Using wall-clock time can make it challenging to write repeatable functional regression tests; use a mock time service to make wall-clock based functionality deterministically testable.
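
    The mock time service idea can be sketched in plain Python as follows. This is an illustration of the general technique, not StreamBase code: the business rule receives its clock as an argument, so a test can substitute a fixed time. The names Clock, FixedClock, and is_market_open, and the opening hours, are assumptions made up for this example.

```python
from datetime import datetime, time, timezone
from typing import Callable

Clock = Callable[[], datetime]


def system_clock() -> datetime:
    """Production clock: the real wall-clock time."""
    return datetime.now(timezone.utc)


class FixedClock:
    """Test double: returns whatever time the test sets."""

    def __init__(self, now: datetime) -> None:
        self.now = now

    def __call__(self) -> datetime:
        return self.now


def is_market_open(clock: Clock,
                   open_at: time = time(9, 30),
                   close_at: time = time(16, 0)) -> bool:
    """Business rule that sees the current time only through `clock`."""
    return open_at <= clock().time() < close_at


# A regression test controls time explicitly, so the result is repeatable.
clock = FixedClock(datetime(2024, 1, 2, 10, 0, tzinfo=timezone.utc))
open_at_ten = is_market_open(clock)
clock.now = datetime(2024, 1, 2, 17, 0, tzinfo=timezone.utc)
open_at_five_pm = is_market_open(clock)
```

    Production code would pass system_clock instead of a FixedClock; the rule itself does not change.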

    Complete business logic, then optimize for performance
      Complete the core functionality of each section of an application before tuning for performance.

    Do not optimize applications before the critical business logic has been completed. Premature optimization will make the development of the core functionality of an application unnecessarily complicated.



    (This guideline is about the detailed implementation of business logic, and addresses the tendency of software developers to spend time "optimizing" code as they write it, using intuition or general rules of thumb that frequently don't hold up when execution performance is actually measured, or -- far too often -- spending time optimizing code that is already more than fast enough for the task at hand. The idea is that only bottlenecks that are actually experienced are worth optimizing. Bottlenecks that are merely imagined rarely are. The question to ask is almost never "could this code possibly be faster?" but rather "is this code fast enough for this application?" For example, shaving 100 microseconds off event processing logic that is in practice executed once a minute is not going to be worth the cost of the developers' time.



    The overall architecture of the application, however, must be designed from the outset with any required performance targets in mind, as it can be quite time-consuming to have to re-architect overall information flow and distributed interaction patterns after the application has been written once. For example, if the architecture of a solution is to query a relational database across a network multiple times during the processing of an event, it will be impossible as a matter of physics to perform such processing in a context where the required event processing latency is less than a millisecond. No amount of EventFlow code re-arrangement is going to alter the laws of physics.)
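
    The database example can be made concrete with a little arithmetic. The figures in this Python sketch are assumed purely for illustration; measure your own environment before drawing conclusions.

```python
# All figures below are assumptions for illustration, not measurements.
round_trip_ms = 0.5       # assumed network round trip to the database, per query
queries_per_event = 3     # assumed number of queries while processing one event
latency_budget_ms = 1.0   # required processing latency per event

# Network time alone sets a floor on per-event latency, before any query
# execution or EventFlow processing time is counted.
network_floor_ms = round_trip_ms * queries_per_event
meets_budget = network_floor_ms < latency_budget_ms

print(f"network floor: {network_floor_ms} ms, "
      f"budget: {latency_budget_ms} ms, within budget: {meets_budget}")
```

    With these assumed numbers the network floor already exceeds the budget, so only an architectural change (fewer round trips, or data held closer to the processing) could meet the target.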

    Strongly consider setting timeouts on all Gather operators
      Gather operator input buffers grow until memory is exhausted when tuples with unmatched keys are never gathered.

    Keep in mind when designing Gather operations that unmatched input tuples remain buffered at run time. Use one of the timeout options to limit buffer sizes.



    Be aware that if no timeout options are specified, arriving tuples continue to be buffered until a match occurs, which can increase latency and potentially exhaust memory.

    What this means in practice is that unless the input data is absolutely perfect and complete, unmatched tuples will build up in tuple buffers, eventually exhausting the engine process's memory.



    Since it is extraordinarily rare, in practice, for a large data set to be perfect and complete, setting a timeout on each and every Gather operator is highly, highly recommended.
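
    To see why a timeout matters, here is a toy model in Python of a keyed buffer that waits for one tuple per input before emitting. It is a sketch of the general buffering problem only; it does not reproduce the Gather operator's actual semantics or options, and every name in it is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class GatherBuffer:
    """Toy keyed gather: emits once every input has delivered a tuple for a key."""

    def __init__(self, num_inputs: int, timeout_s: Optional[float]) -> None:
        self.num_inputs = num_inputs
        self.timeout_s = timeout_s
        # key -> (arrival time of the first tuple, {input index: tuple})
        self.pending: Dict[str, Tuple[float, Dict[int, dict]]] = {}

    def add(self, input_index: int, key: str, tup: dict,
            now: float) -> Optional[List[dict]]:
        """Buffer a tuple; return the gathered tuples once the key is complete."""
        _first_seen, parts = self.pending.setdefault(key, (now, {}))
        parts[input_index] = tup
        if len(parts) == self.num_inputs:
            del self.pending[key]
            return [parts[i] for i in range(self.num_inputs)]
        return None

    def expire(self, now: float) -> List[str]:
        """Drop keys that have waited longer than the timeout; return them."""
        if self.timeout_s is None:
            return []  # no timeout: unmatched keys stay buffered indefinitely
        stale = [k for k, (t0, _) in self.pending.items()
                 if now - t0 > self.timeout_s]
        for k in stale:
            del self.pending[k]
        return stale


buf = GatherBuffer(num_inputs=2, timeout_s=5.0)
buf.add(0, "order-1", {"qty": 10}, now=0.0)   # waits for input 1
buf.add(0, "order-2", {"qty": 7}, now=1.0)    # its partner never arrives
matched = buf.add(1, "order-1", {"px": 99.5}, now=2.0)
expired = buf.expire(now=10.0)
```

    With timeout_s=None, the entry for "order-2" would never leave the buffer; multiply that by every incomplete key in a large data set and the buffer grows without bound.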

    Avoid StreamBase identifier naming conflicts
      Do not name a stream field the same as any dynamic variable or constant defined in the same module. StreamBase resolves unqualified names first against the names of dynamic variables and constants, and then against the names of fields in currently available streams.
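
    The resolution order described above can be modeled in a few lines of Python to show how a conflicting field gets shadowed. This is a model of the stated rule only, not the StreamBase implementation, and the names used are made up for the example.

```python
from typing import Any, Dict


def resolve(name: str,
            variables_and_constants: Dict[str, Any],
            fields: Dict[str, Any]) -> Any:
    """Resolve an unqualified name in the order the text describes."""
    if name in variables_and_constants:   # checked first
        return variables_and_constants[name]
    if name in fields:                    # checked second
        return fields[name]
    raise KeyError(name)


constants = {"threshold": 100}               # module-level constant
tuple_fields = {"threshold": 5, "qty": 20}   # field with a conflicting name

shadowed = resolve("threshold", constants, tuple_fields)  # constant wins over field
plain = resolve("qty", constants, tuple_fields)           # no conflict
```

    An expression author who meant the field's value of 5 would silently get the constant's value instead, which is why distinct names are the safer choice.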
    Consider understandability and readability in EventFlow implementation
      Give each EventFlow component a name that is meaningful in the context of the application. The default names for EventFlow components tend to be closely related to the type of the component. For example, the first Input Stream in a module is automatically named InputStream. This name does not help anyone (including yourself even a few weeks in the future) understand much about what that component is being used for in the context of the application. For example, when writing an application that processes orders, a more meaningful name for an input stream in that context might be OrdersIn. (There are a few exceptions to this practice -- for example, in practice it can be quite difficult to come up with more meaningful names for Split operators. But this is an exception to the guideline, not an excuse not to follow it generally.)
      Consider splitting up modules so that they have no more than a dozen or two operators. EventFlow modules with dozens or even hundreds(!) of operators are very difficult to understand and maintain. As a general guideline, once a module has more than about two dozen operators or one finds oneself having to zoom in from a complete view of the module just to read the names of the operators, it's time to strongly consider splitting the module up into a top-level module and some sub-modules. Complex modules are hard to understand (and difficult to test thoroughly).

     

    Practices to consider for naming for EventFlow components in .sbapp files

    Note: There are quite a number of things in TIBCO Streaming that have names, and the conventions for those different kinds of names may well be different. StreamBase has a set of identifier naming rules that must be followed for all identifiers to be valid (and typecheck), and identifiers are used for several different types of things in StreamBase and EventFlow. For example, EventFlow component names and schema field names both follow the identifier rules, but are (more or less) separate name spaces, and conventions for one might not be identical to conventions for the other. This section assumes that the identifier rules are being followed, and then layers these practice suggestions on top of those specifically for component names. For purposes of this list of practices, an EventFlow component is a thing that appears on the EventFlow Editor canvas: streams, operators and adapters, and data constructs.

    • Good practice: Name your components with names that are meaningful in the context of the application domain or function, so that you and your team can more easily understand the code later, and so that other stakeholders can look at EventFlow modules and fairly easily understand what that module is doing.
    • Bad practice: Leaving the default component names that Studio generates like Filter, Filter1, Filter2, CopyOfFilter, CopyOfCopyOfFilter, etc. This way lies disaster and insanity. Taking 10 seconds to think of and enter a descriptive name will save the team hours and hours over the lifetime of the application.
    • Practice to consider: It might seem redundant to have the type of component as part of its name. For example, if I have a filter that filters out null tuples, should I call it NullTuple and rely on the visual Filter icon for people to understand it? Or should I name it NullTupleFilter? I would go with the latter, because some of our monitoring tools only give the names of components without any information about their type. It is often useful to know the type of the component on a monitoring display without having to go back into Studio and look it up, especially for people such as operators and administrators who don't have easy access to or understanding of Studio.
    • Suggestion: Use CamelCase if you want a suggestion for component name style. Otherwise, use what you like, but be consistent with your conventions, and document them for your team. If automatically mapping external names, especially stream names, it might be convenient to conform to the external convention to facilitate automation and traceability.
    • Good practice: Stick to alphanumerics and underscores in component names. The EventFlow identifier rules tend to discourage anything else, although other characters are possible through a special kind of identifier that uses Escaped Identifier Syntax. Just because you can use other kinds of characters in your component names doesn't mean you should; a lot of the overall Streaming tool set doesn't work flawlessly with these escaped identifiers, and your life may become difficult if you choose to ignore this advice. Be especially careful about putting spaces in component names; people love to try this, but that practice usually doesn't end well. Characters other than alphanumerics and underscores are best avoided, tempting though they may be.
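
    A team that wants to enforce the naming suggestions above could script a simple check over a list of component names. The Python sketch below is one hypothetical way to do that. Both patterns are assumptions: the "safe" pattern is stricter than the product's actual identifier rules (which are in the product documentation), and the list of default type names is illustrative rather than complete.

```python
import re
from typing import List

# Assumption: a "safe" name is letters, digits, and underscores, starting with
# a letter or underscore. The authoritative rules are in the product docs.
SAFE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Assumption: default names look like a component type, an optional numeric
# suffix, and any number of "CopyOf" prefixes. The type list is illustrative.
DEFAULT_NAME = re.compile(r"^(CopyOf)*(InputStream|Filter|Split|Gather)\d*$")


def check_component_name(name: str) -> List[str]:
    """Return a list of naming problems; an empty list means none were found."""
    problems = []
    if not SAFE_NAME.match(name):
        problems.append("uses characters other than letters, digits, and underscores")
    if DEFAULT_NAME.match(name):
        problems.append("looks like an auto-generated default name")
    return problems


for candidate in ["NullTupleFilter", "Filter2", "CopyOfCopyOfFilter", "Orders In"]:
    print(candidate, "->", check_component_name(candidate) or "ok")
```

    How the component names are obtained (for example, exported from a monitoring tool or listed by hand) is left open here.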

    There is some other EventFlow identifier naming guidance in the product documentation, for example: Identifier Naming Rules and Escaped Identifier Syntax.

