  • Spotfire Tips & Tricks: Raise the bar with effective Bar charts


    Overview

    A famous adage in data analytics circles is "Even the most sophisticated data analysis software cannot achieve what a skilled analyst can." But combining a powerful tool with the knowledge of a skilled analyst can lead to business breakthroughs. One such rapidly growing area of skill is data visualization.

    Visual thinking, measured as the rate at which the retina communicates with the brain, runs at about 10 million bits per second [1]. Verbal thinking runs at roughly 1,000 bits per second, or 1/10,000th of that rate. This makes it essential in the current data age to learn the principles of effective data visualization. This is the first in a series of articles that will explore techniques to get the most value out of data graphs, with tips for both the end user and the creator of the graphs.

    Bar charts and types

    We start with bar charts, one of the most primitive yet powerful visualization types. These graphs show discrete numeric comparisons across categories; note that the categories can be quantitative as well, e.g. binned columns. The bar chart is used so often that it is sometimes called a boring visualization. But is it truly boring? We'll revisit this question later in the article. The history of the humble bar chart goes all the way back to the late 18th century; William Playfair is credited with its invention. [2]

    Research shows that the power of bar graphs comes from their ease of interpretation, particularly when it comes to absolute measurements (y-values) across an x variable. [4] Recent advancements in data visualization have created variations of the basic bar chart and extended its applicability from absolute measurements to part-to-whole relationships and beyond. A few useful examples of bar chart types are discussed below:

    1. Histograms

    Histograms are discretized representations of continuous distributions and thus the simplest type of bar graph. In the Spotfire data panel, selecting a numeric variable shows a quick preview of the histogram of its distribution with the default number of bins = 20. The bin granularity determines the distribution pattern you see, so vary the bin width to uncover finer patterns and peaks in the data.
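
    To make the effect of bin granularity concrete, here is a minimal pure-Python sketch of equal-width binning. It is not how Spotfire computes its preview; the function name `histogram_counts` and the bimodal sample are made up for illustration.

```python
import random

def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin instead of overflowing.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts

random.seed(0)
# Hypothetical bimodal sample: two groups centred at 10 and 14.
sample = ([random.gauss(10, 1) for _ in range(500)]
          + [random.gauss(14, 1) for _ in range(500)])

for bins in (5, 20, 50):
    counts = histogram_counts(sample, bins)
    print(bins, "bins -> tallest bin holds", max(counts), "values")
```

    With very few bins the two groups tend to blur together, while a finer binning is more likely to show two separate peaks, which is why it pays to try several bin widths.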

    2. Vertical bar chart

    These are recommended for numeric (or datetime) x-variable to numeric y-variable relationships. As a rule of thumb, a bar chart can effectively replace the majority of pie charts that have more than three sectors. The Grammar of Graphics by L. Wilkinson points out that a pie chart can actually be thought of as a bar chart in polar coordinates. In the example below we explore the comparative relationships of shelters available to individuals from the Homeless data set (Reference), and it's clear that bar charts are more interpretable. Note how vertical bar charts also draw your focus to "absolute" y-values.

    In Spotfire you can create bar charts by clicking on the bars icon in the quick access menu.

    3. Horizontal bar chart

    Moving from zero toward positive infinity on the x-axis creates a perception of an increasing quantity. Thus it is recommended that, when possible, the x-axis of a bar chart be numeric. In the above example the categories are on the x-axis; to improve this chart in accordance with information visualization theory, convert it to a horizontal bar chart.

    This configuration allows the labels to be laid out from left to right, the natural reading order of English, thus providing better readability than vertical bar charts or a legend with a large number of categories.

    In Spotfire you can convert a vertical bar chart to a horizontal one by clicking on the axis and selecting the horizontal orientation from the menu.
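
    The readability argument can be seen even in a plain-text rendering. The sketch below is illustrative only: `render_horizontal_bars` is a made-up helper, and the shelter counts are hypothetical values, not figures from the Homeless data set.

```python
def render_horizontal_bars(data, width=40):
    """Return text lines with one left-aligned label and one bar per category."""
    longest_label = max(len(label) for label in data)
    largest = max(data.values()) or 1  # guard against an all-zero chart
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{longest_label}} | {bar} {value}")
    return lines

# Hypothetical counts per shelter type.
shelters = {
    "Emergency shelter": 120,
    "Transitional housing": 80,
    "Rapid re-housing": 15,
}
for line in render_horizontal_bars(shelters):
    print(line)
```

    Long category names sit on one line next to their bars, so nothing has to be rotated, truncated, or looked up in a legend.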

    4. Stacked bar charts

    So far we have discussed only the x and y dimensions in a graph. Using attributes such as size, color, trellis, etc., it is possible to display more dimensions or variables within the same graph. An example of this is tall-format data, in which a categorical variable is made up of other categories, each of which may have a measure associated with it. The simplest way to visualize these is through a stacked bar chart.

    Stacked bar charts are recommended when the user wants to see the impact of each subcategory (in this case year 2012 or 2013) compared to the total on the value axis. In this configuration, gridlines for value axis are a useful embellishment as they allow the user a quick visual reference point for each bar.
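
    As a rough sketch of what a stacked bar encodes, the snippet below turns tall rows of (category, subcategory, value) into segments whose start is the running total of the segments before them. The function name and the numbers are assumptions for illustration, not Spotfire internals or real data.

```python
from collections import defaultdict

def stack_segments(rows):
    """Group tall (category, subcategory, value) rows into stacked segments.

    Returns {category: [(subcategory, start, end), ...]}, where each segment
    starts at the running total of the segments stacked before it.
    """
    stacks = defaultdict(list)
    totals = defaultdict(float)
    for category, subcategory, value in rows:
        start = totals[category]
        totals[category] = start + value
        stacks[category].append((subcategory, start, totals[category]))
    return dict(stacks)

# Hypothetical tall data: one row per shelter type and year.
rows = [
    ("Emergency shelter", 2012, 100),
    ("Emergency shelter", 2013, 120),
    ("Rapid re-housing", 2012, 10),
    ("Rapid re-housing", 2013, 15),
]
for category, segments in stack_segments(rows).items():
    print(category, segments)
```

    The end of the last segment is the bar total, which is what the reader compares against the value axis and its gridlines.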

    In Spotfire, to add segment labels like those above, navigate to visualization Properties -> Labels and choose between values and percentages. You can also define label orientation and other settings.

    5. 100% Stacked bar charts

    Use 100% stacked bar charts to show part-to-whole relationships more efficiently than, say, a pie chart. Applying this to the same data as above, you can see how, for smaller unbalanced categories (rapid re-housing in this case), the 100% stacked bar chart shows proportions better.

    In many cases 100% stacked bar charts help detect deviations in proportion. See, for example, the sales trend data for coffee categories: the 100% stacked bar chart easily reveals the recent increase in sales of "Novelty" coffee. Pumpkin spice latte, anyone?

    This trend can be made more obvious by sorting the segments by "value" (in Spotfire, bar chart visualization Properties -> Appearance -> Sorting).
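
    The normalization behind a 100% stacked bar is simple enough to sketch by hand. The helper name `to_percent_shares` and the yearly values below are hypothetical; they only illustrate why a small category becomes easier to compare once each bar is rescaled to 100.

```python
def to_percent_shares(stack):
    """Convert {subcategory: value} into {subcategory: percent of the bar total}."""
    total = sum(stack.values())
    if total == 0:
        return {name: 0.0 for name in stack}
    return {name: 100.0 * value / total for name, value in stack.items()}

# Hypothetical values per year for a large and a small category.
emergency = {"2012": 100, "2013": 120}
rapid_rehousing = {"2012": 10, "2013": 15}

print(to_percent_shares(emergency))
print(to_percent_shares(rapid_rehousing))
```

    In absolute terms the second category would be a sliver next to the first, but after normalization both bars have the same length and the shift in yearly share is directly comparable.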

    6. Side-by-side bar charts

    As the number of subcategories increases (year in this example), the stacked bar chart gets less useful. The segment labels start to distract like chart junk. In this scenario it is better to use a side-by-side bar chart. This allows the same attribute (Shelter type) to be compared simultaneously over multiple subcategories (Year).

    Additional enriching configurations

    1. Sort

    Sorting has been used implicitly in several examples in this post, but the following shows its value in a bar graph definitively. Here demand by type of hot beverage is shown on the left. On the right the same graph is recreated after sorting by value and including a zoom slider. It immediately becomes clear that about 40% more people in Australia prefer Normal coffee to Chocolate flavored.

    In Spotfire you can sort a bar chart by right-click > sort bars by value. Zoom sliders can be inserted using the Controls menu icon on the visualization title bar.
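
    The same idea in a few lines of Python, as a sketch: sort the categories by value, then compare neighbours. The beverage counts are invented so that the gap works out to 40%; they are not the values behind the chart.

```python
def sort_bars(data, descending=True):
    """Return (category, value) pairs ordered by value."""
    return sorted(data.items(), key=lambda item: item[1], reverse=descending)

def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b

# Hypothetical demand counts per beverage type.
demand = {"Tea": 30, "Normal coffee": 70, "Chocolate": 50}

for name, value in sort_bars(demand):
    print(f"{name}: {value}")
print(percent_more(demand["Normal coffee"], demand["Chocolate"]), "% more")
```

    Once the bars are ordered, adjacent bars are the ones worth comparing, and a relative difference like this one can be read off almost at a glance.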

    2. Trellis

    Trellis is a way to extend the side-by-side bar chart principle to show the same measure for different entities. Here sales results for mobile vendors are shown for the 4th quarter of 2015 and 2016. The measure is the sale amount; the entity is the quarter being compared. (Anonymized data from Reference)

    3. Color

    Let's say that Product2 is our product. To emphasize this we can use color: choose a monochrome color scheme with a single complementary color to highlight our product. This is more effective than any text annotation.

    Note how, using a monochrome fade, I am still able to bring out the relative ordering of the other products from least to most. There is plenty more to discuss about color theory, but that discussion is better suited to a future article dedicated to Color Theory and Perception applied to Data Visualization.
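
    One way to build such a palette programmatically is sketched below. Everything here is an assumption for illustration: the function name `fade_palette`, the gray range (220 down to 70), the orange accent, and the sales figures. In Spotfire itself you would set this up through the color axis rather than in code.

```python
def fade_palette(values, highlight, highlight_hex="#d95f02"):
    """Map each category to a hex color: a gray fade by rank plus one accent."""
    others = sorted((name for name in values if name != highlight), key=values.get)
    colors = {highlight: highlight_hex} if highlight in values else {}
    for rank, name in enumerate(others):
        # Rank 0 (smallest value) gets the lightest gray, the last rank the darkest.
        t = rank / (len(others) - 1) if len(others) > 1 else 1.0
        level = round(220 - t * (220 - 70))
        colors[name] = f"#{level:02x}{level:02x}{level:02x}"
    return colors

# Hypothetical sales per product; Product2 is the one to emphasize.
sales = {"Product1": 30, "Product2": 50, "Product3": 10}
print(fade_palette(sales, "Product2"))
```

    The accent color carries the emphasis, while the darkness of the grays still encodes the ordering of everything else.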

    4. Modified grid lines in B'arc graphs

    B'arc is short for bar-arc charts. Most graphs that require circular spatial analysis (pie charts, donut charts, etc.) are hard to comprehend. In such graphs information may be encoded in completely different attributes such as arcs, areas, or angles [5], which adds to the debate about how accurately they can be read. The reason is that graphs with dimensions represented in circular form often lack reference points for the end user to focus on. Providing these reference points, as in the two KPI examples below, can greatly reduce the burden on the user to interpret the graphs.

    Fig.: KPI on overall data quality shown as a gauge chart, and KPI on records with business name shown as a donut chart with circular gridlines.

    These graphs have been created in Spotfire using JSViz. Attributes that make the above graphs easy to read:

    • Title, scale (0-100), and measure are clearly indicated using text
    • Contrast color is used to give a sense of proportionality
    • Modified grid lines as reference points in graph to the right
    • Single KPI plotted in both to avoid Jastrow's illusion or similar confusion

    Other variants of the bar chart

    1. Bullet chart

    The bullet chart was created by Stephen Few and is extremely effective for comparing measures against benchmarks. Treat it as being similar to a thermometer that measures temperature using a fixed scale in degrees and compares against benchmark values like 101 degrees F. Here the current measure, sales of Product1 (blue bar), is compared against the benchmark of Product2 sales (red bar) for the 4th quarter of 2015 and 2016. The absolute difference is reported as a calculated value.

    Additional benchmark values can be set against Product1 scale by creating a quantitative gradient instead of the gray background. But in the interest of readability, we discourage excessive embellishment for this particular scenario.

    2. Deviation bar charts

    Deviation bar graphs are centered around 0 and make it easier to differentiate between gains and losses. In the example below, the left graph shows the number of deliveries that each distribution center needs to make, and the blue line is the recommended capacity for the number of deliveries. From this the user has to estimate the delta for each distribution center. A better visualization is the one on the right, which shows the same data but with the recommended capacity as the point of reference. This takes the burden of finding above-capacity distribution centers away from the user.

    Note that color here encodes another variable that is not relevant to the discussion.
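
    The re-referencing step is just a subtraction, sketched here with made-up numbers. The helper name `deviations_from_reference`, the center names, and the capacity of 100 are all hypothetical.

```python
def deviations_from_reference(values, reference):
    """Return each value's signed distance from a common reference level."""
    return {name: value - reference for name, value in values.items()}

# Hypothetical deliveries per distribution center and recommended capacity.
deliveries = {"DC North": 120, "DC South": 90, "DC East": 100}
recommended_capacity = 100

deviations = deviations_from_reference(deliveries, recommended_capacity)
over_capacity = [name for name, delta in deviations.items() if delta > 0]

print(deviations)
print("Above capacity:", over_capacity)
```

    Plotting the deltas instead of the raw counts puts the reference at zero, so the sign of each bar answers the above-or-below question directly.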

    3. Waterfall charts

    Waterfall charts are similar to deviation charts, but rather than sharing a common reference at the center of the visualization, each bar starts where the previous bar's y-value ended. For example, the graph below shows a waterfall chart of the percentage difference in weekly sales across retail stores. The store ID goes from 1 to 44+, and each store's sales diff % starts where the previous store's sales diff % ends.

    Broadly, waterfall charts show how different factors contribute to a final cumulative value. Here the cross-sectional sales difference % leads to a cumulative weekly sales difference of 16%. This chart type is also excellent for comparing y-values for different categories.
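
    The running-total logic behind a waterfall chart can be sketched as follows. The function name `waterfall_segments` and the three store values are invented; they were chosen only so that the cumulative total comes out at 16, echoing the example in the text.

```python
def waterfall_segments(changes):
    """Return ([(label, start, end), ...], total); each start is the previous end."""
    segments = []
    running = 0.0
    for label, change in changes:
        segments.append((label, running, running + change))
        running += change
    return segments, running

# Hypothetical sales diff % per store.
changes = [("Store 1", 5), ("Store 2", -2), ("Store 3", 13)]
segments, total = waterfall_segments(changes)

for label, start, end in segments:
    print(f"{label}: from {start} to {end}")
print("Cumulative:", total)
```

    Each bar floats between its start and end, and the end of the last bar is the cumulative figure the chart is building toward.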

    4. Lollipop charts

    Lollipop charts are essentially bar charts with narrow bar width and markers at the y-variable measure. These are often used in place of Gantt Charts and to avoid clutter when using bar charts with a large number of bars as an alternative to zoom slider. Lollipop charts became popular due to the visual aesthetic of having a dot plot with connecting lines to the axis, but the asymmetry often causes more distraction than advantage.

    Common configuration mistakes to avoid

    Bar charts created with the above guidelines will be compelling in conveying observations. But often the chart can be compromised by unproductive configurations. The following are a few common configuration mistakes to avoid:

    1. Dual Y-axis

    See any problem with the chart below?

    According to the scale on the left, the point of intersection between the two lines is between 625k and 633k, but according to the scale on the right the intersection is between 639k and 654k. It's easy to imagine the havoc this would cause if left as is during financial planning. Dual Y-axes are never okay, but when showing trends concurrently (say, absolute values alongside percentages), they are less of a cardinal sin if encoded correctly with color and labels.
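
    A small sketch shows why one point on the plot can be read as two different values. The axis ranges below (600k to 660k on the left, 600k to 700k on the right) are assumptions picked for illustration; the actual ranges of the chart are not given in the text.

```python
def value_at_height(fraction, axis_min, axis_max):
    """Read the value at a relative height (0 = bottom, 1 = top) on a linear axis."""
    return axis_min + fraction * (axis_max - axis_min)

# Hypothetical axis ranges for the two y-axes of the same plot area.
left_axis = (600_000, 660_000)
right_axis = (600_000, 700_000)

crossing_height = 0.5  # the two lines cross halfway up the plot area
print("Left scale reads: ", value_at_height(crossing_height, *left_axis))
print("Right scale reads:", value_at_height(crossing_height, *right_axis))
```

    The crossing is a property of the two scales chosen, not of the data, so a different pair of axis ranges would move or remove it entirely.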

    2. Non-zero reference axis

    Refer to the graph above: see how exaggerated the difference is between 611k and 671k? Here's the same graph with the origin included. Notice the reduction in the apparent difference between the two lines. This is of course subject to the use case, but more often than not starting a graph at a non-zero point leads to perceptually inaccurate patterns.

    In fact, chopping off the reference zero point is one of the top ways to lie with statistics.
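
    The distortion is easy to quantify. The sketch below compares how tall two values are drawn relative to each other for a given axis start; the 611k and 671k values come from the example above, while the 600k baseline is an assumption made for illustration.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of a and b when the axis starts at baseline."""
    return (a - baseline) / (b - baseline)

high, low = 671_000, 611_000

print("Axis from 0:   ", round(drawn_ratio(high, low), 2))
print("Axis from 600k:", round(drawn_ratio(high, low, baseline=600_000), 2))
```

    With a zero baseline the larger value is drawn roughly 10% taller, which matches the data. With the assumed 600k baseline it is drawn more than six times taller, although the underlying numbers have not changed.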

    3. Reduce chart junk

    The role of data visualization for analytics is to make it as easy as possible for the user to assimilate actionable information. Chart junk refers to all visual elements in a chart that are not essential to comprehending it; these can be a major distraction and a hindrance to understanding the data. So whenever applicable, clean out the graph area by eliminating unnecessary legend details, axis selectors, and excessive labels and annotations. Be sure, though, to provide descriptive but brief titles and other configuration information.

    When not to use bar charts

    In this article we have discussed several exploratory data analysis and reporting scenarios where a bar chart would be the best visualization type. It is also important to understand a few examples of where bar charts are incorrectly used and other visualizations would be better suited:

    1. Unequal intervals in time series

    When working with time series where some y-values are missing, it is often easier to infer the pattern or trend from a line chart than from a bar chart, as in the example below. Note that the labels have been hidden to draw focus to the pattern.

    2. Trends in time

    Similar to the example above, it is easier to perceive trends when they are in line form with labeled values than in bar charts. However, this may not be true for highly seasonal data with a lot of data points, in which case bar graphs present a stronger visual (at the cost of data-to-ink ratio). When lines are plotted, people focus on the pattern formed, but for bars the focus rests on bar length.

    3. Showing change from one time benchmark to another

    A slopegraph usually comprises two columns of measures corresponding to different time benchmarks for the same categorical variable. The slope of the line connecting these measures gives a sense of the increase or decrease in the value. Bar charts are often used to show side-by-side comparisons when a slopegraph could be equally if not more useful.
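
    The data behind a slopegraph is just two values per category plus the change between them. The sketch below uses a made-up helper, `slopegraph_rows`, and hypothetical product sales for two quarters.

```python
def slopegraph_rows(before, after):
    """Pair up two time benchmarks per category and describe the change."""
    rows = []
    for name in before:
        if name in after:  # only categories present at both benchmarks get a line
            change = after[name] - before[name]
            direction = "up" if change > 0 else "down" if change < 0 else "flat"
            rows.append((name, before[name], after[name], change, direction))
    return rows

# Hypothetical sales for the same products at two time benchmarks.
q4_2015 = {"Product1": 40, "Product2": 55, "Product3": 20}
q4_2016 = {"Product1": 48, "Product2": 50, "Product3": 20}

for row in slopegraph_rows(q4_2015, q4_2016):
    print(row)
```

    Each row becomes one line between the two columns, and its slope shows the direction and size of the change without any bar-to-bar comparison.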

    Conclusion

    Near the beginning of this article I laid out the question of whether bar charts are boring; to answer this we must define the difference between an infographic and a visualization created as part of an analysis. In an infographic, you are trying to draw the attention of the reader by various bells and whistles, but in an analytical setting the reader is already attentive. So no, the simple and plain-Jane bar chart configured correctly is not boring... Not unless you want to risk losing an important insight just because the chart was overlaid with decorative accents leading to loss of visual attention of the reader.

    To conclude, bar charts are useful because of their ease of understanding from a human cognition perspective without any additional training and the versatility of data and relationships they can encode.

    How do I learn more?

    This summarizes techniques for creating effective bar charts. Watch the page and vote up to get notified about detailed updates. You could also request a featured session on any specific method from above on Dr. Spotfire by:

    Citations

    [1] Penn Researchers calculate how much the eye tells the brain

    [2] M. Friendly (2006), A Brief History of Data Visualization, Springer-Verlag, in Handbook of Computational Statistics: Data Visualization

    [3] W. Playfair (1801), Statistical Breviary; Shewing, on a Principle Entirely New, the Resources of Every State and Kingdom in Europe, London: Wallis. Re-published in H. Wainer & I. Spence (eds.), The Commercial and Political Atlas and Statistical Breviary, 2005, Cambridge University Press

    [4] W. Cleveland & R. McGill (1984), Graphical Perception: Theory, Experimentation and Application to the Development of Graphical Methods, Journal of the American Statistical Association Vol. 79

    [5] D. Skau & R. Kosara, Arcs, Angles, or Areas: Individual Data Encodings in Pie and Donut Charts, Eurographics Conference on Visualization (EuroVis) 2016

