Data Visualisation

Data Visualisation

A Handbook for Data Driven Design

Andy Kirk

SAGE Publications Ltd

1 Oliver’s Yard

55 City Road

London EC1Y 1SP

SAGE Publications Inc.

2455 Teller Road

Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd

B 1/I 1 Mohan Cooperative Industrial Area

Mathura Road

New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd

3 Church Street

#10-04 Samsung Hub

Singapore 049483

© Andy Kirk 2016

First published 2016

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

Library of Congress Control Number: 2015957322

British Library Cataloguing in Publication data

A catalogue record for this book is available from the British Library

ISBN 978-1-4739-1213-7

ISBN 978-1-4739-1214-4 (pbk)

Editor: Mila Steele

Editorial assistant: Alysha Owen

Production editor: Ian Antcliff

Marketing manager: Sally Ransom

Cover design: Shaun Mercier

Typeset by: C&M Digitals (P) Ltd, Chennai, India

Printed and bound in Great Britain by Bell and Bain Ltd, Glasgow

Contents

List of Figures with Source Notes
Acknowledgements
About the Author
INTRODUCTION

PART A FOUNDATIONS
1 Defining Data Visualisation
2 Visualisation Workflow

PART B THE HIDDEN THINKING
3 Formulating Your Brief
4 Working With Data
5 Establishing Your Editorial Thinking

PART C DEVELOPING YOUR DESIGN SOLUTION
6 Data Representation
7 Interactivity
8 Annotation
9 Colour
10 Composition

PART D DEVELOPING YOUR CAPABILITIES
11 Visualisation Literacy

References
Index

List of Figures with Source Notes

1.1 A Definition for Data Visualisation
1.2 Per Capita Cheese Consumption in the U.S., by Sarah Slobin (Fortune magazine)
1.3 The Three Stages of Understanding
1.4–6 Demonstrating the Process of Understanding
1.7 The Three Principles of Good Visualisation Design
1.8 Housing and Home Ownership in the UK, by ONS Digital Content Team
1.9 Falling Number of Young Homeowners, by the Daily Mail
1.10 Gun Deaths in Florida (Reuters Graphics)
1.11 Iraq’s Bloody Toll, by Simon Scarr (South China Morning Post)
1.12 Gun Deaths in Florida Redesign, by Peter A. Fedewa (@pfedewa)
1.13 If Vienna would be an Apartment, by NZZ (Neue Zürcher Zeitung) [Translated]
1.14 Asia Loses Its Sweet Tooth for Chocolate, by Graphics Department (Wall Street Journal)
2.1 The Four Stages of the Visualisation Workflow
3.1 The ‘Purpose Map’
3.2 Mizzou’s Racial Gap Is Typical On College Campuses, by FiveThirtyEight
3.3 Image taken from ‘Wealth Inequality in America’, by YouTube user ‘Politizane’ (www.youtube.com/watch?v=QPKKQnijnsM)
3.4 Dimensional Changes in Wood, by Luis Carli (luiscarli.com)
3.5 How Y’all, Youse and You Guys Talk, by Josh Katz (The New York Times)
3.6 Spotlight on Profitability, by Krisztina Szücs
3.7 Countries with the Most Land Neighbours
3.8 Buying Power: The Families Funding the 2016 Presidential Election, by Wilson Andrews, Amanda Cox, Alicia DeSantis, Evan Grothjan, Yuliya Parshina-Kottas, Graham Roberts, Derek Watkins and Karen Yourish (The New York Times)
3.9 Image taken from ‘Texas Department of Criminal Justice’ Website (www.tdcj.state.tx.us/death_row/dr_executed_offenders.html)
3.10 OECD Better Life Index, by Moritz Stefaner, Dominikus Baur, Raureif GmbH
3.11 Losing Ground, by Bob Marshall, The Lens, Brian Jacobs and Al Shaw (ProPublica)
3.12 Grape Expectations, by S. Scarr, C. Chan, and F. Foo (Reuters Graphics)
3.13 Keywords and Colour Swatch Ideas from Project about Psychotherapy Treatment in the Arctic
3.14 An Example of a Concept Sketch, by Giorgia Lupi of Accurat
4.1 Example of a Normalised Dataset
4.2 Example of a Cross-tabulated Dataset
4.3 Graphic Language: The Curse of the CEO, by David Ingold and Keith Collins (Bloomberg Visual Data), Jeff Green (Bloomberg News)
4.4 US Presidents by Ethnicity (1789 to 2015)
4.5 OECD Better Life Index, by Moritz Stefaner, Dominikus Baur, Raureif GmbH
4.6 Spotlight on Profitability, by Krisztina Szücs
4.7 Example of ‘Transforming to Convert’ Data
4.8 Making Sense of the Known Knowns
4.9 What Good Marathons and Bad Investments Have in Common, by Justin Wolfers (The New York Times)
5.1 The Fall and Rise of U.S. Inequality, in Two Graphs. Source: World Top Incomes Database; Design credit: Quoctrung Bui (NPR)
5.2–4 Why Peyton Manning’s Record Will Be Hard to Beat, by Gregor Aisch and Kevin Quealy (The New York Times)
C.1 Mockup Designs for ‘Poppy Field’, by Valentina D’Efilippo (design); Nicolas Pigelet (code); Data source: The Polynational War Memorial, 2014 (poppyfield.org)
6.1 Mapping Records and Variables on to Marks and Attributes
6.2 List of Mark Encodings
6.3 List of Attribute Encodings
6.4 Bloomberg Billionaires, by Bloomberg Visual Data (Design and development), Lina Chen and Anita Rundles (Illustration)
6.5 Lionel Messi: Games and Goals for FC Barcelona
6.6 Image from the Home page of visualisingdata.com
6.7 How the Insane Amount of Rain in Texas Could Turn Rhode Island Into a Lake, by Christopher Ingraham (The Washington Post)
6.8 The 10 Actors with the Most Oscar Nominations but No Wins
6.9 The 10 Actors who have Received the Most Oscar Nominations
6.10 How Nations Fare in PhDs by Sex Interactive, by Periscopic; Research by Amanda Hobbs; Published in Scientific American
6.11 Gender Pay Gap US, by David McCandless, Miriam Quick (Research) and Philippa Thomas (Design)
6.12 Who Wins the Stanley Cup of Playoff Beards? by Graphics Department (Wall Street Journal)
6.13 For These 55 Marijuana Companies, Every Day is 4/20, by Alex Tribou and Adam Pearce (Bloomberg Visual Data)
6.14 UK Public Sector Capital Expenditure, 2014/15
6.15 Global Competitiveness Report 2014–2015, by Bocoup and the World Economic Forum
6.16 Excerpt from a Rugby Union Player Dashboard
6.17 Range of Temperatures (°F) Recorded in the Top 10 Most Populated Cities During 2015
6.18 This Chart Shows How Much More Ivy League Grads Make Than You, by Christopher Ingraham (The Washington Post)
6.19 Comparing Critics Scores (Rotten Tomatoes) for Major Movie Franchises
6.20 A Career in Numbers: Movies Starring Michael Caine
6.21 Comparing the Frequency of Words Used in Chapter 1 of this Book
6.22 Summary of Eligible Votes in the UK General Election 2015
6.23 The Changing Fortunes of Internet Explorer and Google Chrome
6.24 Literacy Proficiency: Adult Levels by Country
6.25 ‘Political Polarization in the American Public’, Pew Research Center, Washington, DC (February, 2015) (http://www.people-press.org/2014/06/12/political-polarization-in-the-american-public/)
6.26 Finviz (www.finviz.com)
6.27 This Venn Diagram Shows Where You Can Both Smoke Weed and Get a Same-Sex Marriage, by Phillip Bump (The Washington Post)
6.28 The 200+ Beer Brands of SAB InBev, by Maarten Lambrechts for Mediafin: www.tijd.be/sabinbev (Dutch), www.lecho.be/service/sabinbev (French)
6.29 Which Fossil Fuel Companies are Most Responsible for Climate Change? by Duncan Clark and Robin Houston (Kiln), published in the Guardian, drawing on work by Mike Bostock and Jason Davies
6.30 How Long Will We Live – And How Well? by Bonnie Berkowitz, Emily Chow and Todd Lindeman (The Washington Post)
6.31 Crime Rates by State, by Nathan Yau
6.32 Nutrient Contents – Parallel Coordinates, by Kai Chang (@syntagmatic)
6.33 How the ‘Avengers’ Line-up Has Changed Over the Years, by Jon Keegan (Wall Street Journal)
6.34 Interactive Fixture Molecules, by @experimental361 and @bootifulgame
6.35 The Rise of Partisanship and Super-cooperators in the U.S. House of Representatives. Visualisation by Mauro Martino, authored by Clio Andris, David Lee, Marcus J. Hamilton, Mauro Martino, Christian E. Gunning, and John Armistead Selde
6.36 The Global Flow of People, by Nikola Sander, Guy J. Abel and Ramon Bauer
6.37 UK Election Results by Political Party, 2010 vs 2015
6.38 The Fall and Rise of U.S. Inequality, in Two Graphs. Source: World Top Incomes Database; Design credit: Quoctrung Bui (NPR)
6.39 Census Bump: Rank of the Most Populous Cities at Each Census, 1790–1890, by Jim Vallandingham
6.40 Coal, Gas, Nuclear, Hydro? How Your State Generates Power. Source: U.S. Energy Information Administration, Credit: Christopher Groskopf, Alyson Hurt and Avie Schneider (NPR)
6.41 Holdouts Find Cheapest Super Bowl Tickets Late in the Game, by Alex Tribou, David Ingold and Jeremy Diamond (Bloomberg Visual Data)
6.42 Crude Oil Prices (West Texas Intermediate), 1985–2015
6.43 Percentage Change in Price for Select Food Items, Since 1990, by Nathan Yau
6.44 The Ebb and Flow of Movies: Box Office Receipts 1986–2008, by Mathew Bloch, Lee Byron, Shan Carter and Amanda Cox (The New York Times)
6.45 Tracing the History of N.C.A.A. Conferences, by Mike Bostock, Shan Carter and Kevin Quealy (The New York Times)
6.46 A Presidential Gantt Chart, by Ben Jones
6.47 How the ‘Avengers’ Line-up Has Changed Over the Years, by Jon Keegan (Wall Street Journal)
6.48 Native and New Berliners – How the S-Bahn Ring Divides the City, by Julius Tröger, André Pätzold, David Wendler (Berliner Morgenpost) and Moritz Klack (webkid.io)
6.49 How Y’all, Youse and You Guys Talk, by Josh Katz (The New York Times)
6.50 Here’s Exactly Where the Candidates Cash Came From, by Zach Mider, Christopher Cannon, and Adam Pearce (Bloomberg Visual Data)
6.51 Trillions of Trees, by Jan Willem Tulp
6.52 The Racial Dot Map. Image Copyright, 2013, Weldon Cooper Center for Public Service, Rector and Visitors of the University of Virginia (Dustin A. Cable, creator)
6.53 Arteries of the City, by Simon Scarr (South China Morning Post)
6.54 The Carbon Map, by Duncan Clark and Robin Houston (Kiln)
6.55 Election Dashboard, by Jay Boice, Aaron Bycoffe and Andrei Scheinkman (Huffington Post). Statistical model created by Simon Jackman
6.56 London is Rubbish at Recycling and Many Boroughs are Getting Worse, by URBS London using London Squared Map © 2015 www.aftertheflood.co
6.57 Automating the Design of Graphical Presentations of Relational Information. Adapted from McKinlay, J. D. (1986). ACM Transactions on Graphics, 5(2), 110–141.
6.58 Comparison of Judging Line Size vs Area Size
6.59 Comparison of Judging Related Items Using Variation in Colour (Hue) vs Variation in Shape
6.60 Illustrating the Correct and Incorrect Circle Size Encoding
6.61 Illustrating the Distortions Created by 3D Decoration
6.62 Example of a Bullet Chart using Banding Overlays
6.63 Excerpt from What’s Really Warming the World? by Eric Roston and Blacki Migliozzi (Bloomberg Visual Data)
6.64 Example of Using Markers Overlays
6.65 Why Is Her Paycheck Smaller? by Hannah Fairfield and Graham Roberts (The New York Times)
6.66 Inside the Powerful Lobby Fighting for Your Right to Eat Pizza, by Andrew Martin and Bloomberg Visual Data
6.67 Excerpt from ‘Razor Sales Move Online, Away From Gillette’, by Graphics Department (Wall Street Journal)
7.1 US Gun Deaths, by Periscopic
7.2 Finviz (www.finviz.com)
7.3 The Racial Dot Map: Image Copyright, 2013, Weldon Cooper Center for Public Service, Rector and Visitors of the University of Virginia (Dustin A. Cable, creator)
7.4 Obesity Around the World, by Jeff Clark
7.5 Excerpt from ‘Social Progress Index 2015’, by Social Progress Imperative, 2015
7.6 NFL Players: Height & Weight Over Time, by Noah Veltman (noahveltman.com)
7.7 Excerpt from ‘How Americans Die’, by Matthew C. Klein and Bloomberg Visual Data
7.8 Model Projections of Maximum Air Temperatures Near the Ocean and Land Surface on the June Solstice in 2014 and 2099: NASA Earth Observatory maps, by Joshua Stevens
7.9 Excerpt from ‘A Swing of Beauty’, by Sohail Al-Jamea, Wilson Andrews, Bonnie Berkowitz and Todd Lindeman (The Washington Post)
7.10 How Well Do You Know Your Area? by ONS Digital Content team
7.11 Excerpt from ‘Who Old Are You?’, by David McCandless and Tom Evans
7.12 512 Paths to the White House, by Mike Bostock and Shan Carter (The New York Times)
7.13 OECD Better Life Index, by Moritz Stefaner, Dominikus Baur, Raureif GmbH
7.14 Nobel Laureates, by Matthew Weber (Reuters Graphics)
7.15 Geography of a Recession, by Graphics Department (The New York Times)
7.16 How Big Will the UK Population be in 25 Years Time? by ONS Digital Content team
7.17 Excerpt from ‘Workers’ Compensation Reforms by State’, by Yue Qiu and Michael Grabell (ProPublica)
7.18 Excerpt from ‘ECB Bank Test Results’, by Monica Ulmanu, Laura Noonan and Vincent Flasseur (Reuters Graphics)
7.19 History Through the President’s Words, by Kennedy Elliott, Ted Mellnik and Richard Johnson (The Washington Post)
7.20 Excerpt from ‘How Americans Die’, by Matthew C. Klein and Bloomberg Visual Data
7.21 Twitter NYC: A Multilingual Social City, by James Cheshire, Ed Manley, John Barratt, and Oliver O’Brien
7.22 Killing the Colorado: Explore the Robot River, by Abrahm Lustgarten, Al Shaw, Jeff Larson, Amanda Zamora and Lauren Kirchner (ProPublica) and John Grimwade
7.23 Losing Ground, by Bob Marshall, The Lens, Brian Jacobs and Al Shaw (ProPublica)
7.24 Excerpt from ‘History Through the President’s Words’, by Kennedy Elliott, Ted Mellnik and Richard Johnson (The Washington Post)
7.25 Plow, by Derek Watkins
7.26 The Horse in Motion, by Eadweard Muybridge. Source: United States Library of Congress’s Prints and Photographs division, digital ID cph.3a45870.
8.1 Titles Taken from Projects Published and Credited Elsewhere in This Book
8.2 Excerpt from ‘The Color of Debt: The Black Neighborhoods Where Collection Suits Hit Hardest’, by Al Shaw, Annie Waldman and Paul Kiel (ProPublica)
8.3 Excerpt from ‘Kindred Britain’ version 1.0 © 2013 Nicholas Jenkins – designed by Scott Murray, powered by SUL-CIDR
8.4 Excerpt from ‘The Color of Debt: The Black Neighborhoods Where Collection Suits Hit Hardest’, by Al Shaw, Annie Waldman and Paul Kiel (ProPublica)
8.5 Excerpt from ‘Bloomberg Billionaires’, by Bloomberg Visual Data (Design and development), Lina Chen and Anita Rundles (Illustration)
8.6 Excerpt from ‘Gender Pay Gap US?’, by David McCandless, Miriam Quick (Research) and Philippa Thomas (Design)
8.7 Excerpt from ‘Holdouts Find Cheapest Super Bowl Tickets Late in the Game’, by Alex Tribou, David Ingold and Jeremy Diamond (Bloomberg Visual Data)
8.8 Excerpt from ‘The Life Cycle of Ideas’, by Accurat
8.9 Mizzou’s Racial Gap Is Typical On College Campuses, by FiveThirtyEight
8.10 Excerpt from ‘The Infographic History of the World’, HarperCollins (2013); by Valentina D’Efilippo (co-author and designer); James Ball (co-author and writer); Data source: The Polynational War Memorial, 2012
8.11 Twitter NYC: A Multilingual Social City, by James Cheshire, Ed Manley, John Barratt, and Oliver O’Brien
8.12 Excerpt from ‘US Gun Deaths’, by Periscopic
8.13 Image taken from Wealth Inequality in America, by YouTube user ‘Politizane’ (www.youtube.com/watch?v=QPKKQnijnsM)
9.1 HSL Colour Cylinder: Image from Wikimedia Commons published under the Creative Commons Attribution-Share Alike 3.0 Unported license
9.2 Colour Hue Spectrum
9.3 Colour Saturation Spectrum
9.4 Colour Lightness Spectrum
9.5 Excerpt from ‘Executive Pay by the Numbers’, by Karl Russell (The New York Times)
9.6 How Nations Fare in PhDs by Sex Interactive, by Periscopic; Research by Amanda Hobbs; Published in Scientific American
9.7 How Long Will We Live – And How Well? by Bonnie Berkowitz, Emily Chow and Todd Lindeman (The Washington Post)
9.8 Charting the Beatles: Song Structure, by Michael Deal
9.9 Photograph of MyCuppa mug, by Suck UK (www.suck.uk.com/products/mycuppamugs/)
9.10 Example of a Stacked Bar Chart Based on Ordinal Data
9.11 Rim Fire – The Extent of Fire in the Sierra Nevada Range and Yosemite National Park, 2013: NASA Earth Observatory images, by Robert Simmon
9.12 What are the Current Electricity Prices in Switzerland [Translated], by Interactive things for NZZ (the Neue Zürcher Zeitung)
9.13 Excerpt from ‘Obama’s Health Law: Who Was Helped Most’, by Kevin Quealy and Margot Sanger-Katz (The New York Times)
9.14 Daily Indego Bike Share Station Usage, by Randy Olson (@randal_olson) (http://www.randalolson.com/2015/09/05/visualizing-indego-bike-share-usage-patterns-in-philadelphia-part-2/)
9.15 Battling Infectious Diseases in the 20th Century: The Impact of Vaccines, by Graphics Department (Wall Street Journal)
9.16 Highest Max Temperatures in Australia (1st to 14th January 2013), Produced by the Australian Government Bureau of Meteorology
9.17 State of the Polar Bear, by Periscopic
9.18 Excerpt from Geography of a Recession by Graphics Department (The New York Times)
9.19 Fewer Women Run Big Companies Than Men Named John, by Justin Wolfers (The New York Times)
9.20 NYPD, Council Spar Over More Officers by Graphics Department (Wall Street Journal)
9.21 Excerpt from a Football Player Dashboard
9.22 Elections Performance Index, The Pew Charitable Trusts © 2014
9.23 Art in the Age of Mechanical Reproduction: Walter Benjamin by Stefanie Posavec
9.24 Casualties, by Stamen, published by CNN
9.25 First Fatal Accident in Spain on a High-speed Line [Translated], by Rodrigo Silva, Antonio Alonso, Mariano Zafra, Yolanda Clemente and Thomas Ondarra (El Pais)
9.26 Lunge Feeding, by Jonathan Corum (The New York Times); whale illustration by Nicholas D. Pyenson
9.27 Examples of Common Background Colour Tones
9.28 Excerpt from NYC Street Trees by Species, by Jill Hubley
9.29 Demonstrating the Impact of Red-green Colour Blindness (deuteranopia)
9.30 Colour-blind Friendly Alternatives to Green and Red
9.31 Excerpt from ‘Psychotherapy in The Arctic’, by Andy Kirk
9.32 Wind Map, by Fernanda Viégas and Martin Wattenberg
10.1 City of Anarchy, by Simon Scarr (South China Morning Post)
10.2 Wireframe Sketch, by Giorgia Lupi for ‘Nobels no degree’ by Accurat
10.3 Example of the Small Multiples Technique
10.4 The Glass Ceiling Persists Redesign, by Francis Gagnon (ChezVoila.com) based on original by S. Culp (Reuters Graphics)
10.5 Fast-food Purchasers Report More Demands on Their Time, by Economic Research Service (USDA)
10.6 Stalemate, by Graphics Department (Wall Street Journal)
10.7 Nobels No Degrees, by Accurat
10.8 Kasich Could Be The GOP’s Moderate Backstop, by FiveThirtyEight
10.9 On Broadway, by Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur, and Lev Manovich
10.10 ER Wait Watcher: Which Emergency Room Will See You the Fastest? by Lena Groeger, Mike Tigas and Sisi Wei (ProPublica)
10.11 Rain Patterns, by Jane Pong (South China Morning Post)
10.12 Excerpt from ‘Psychotherapy in The Arctic’, by Andy Kirk
10.13 Gender Pay Gap US, by David McCandless, Miriam Quick (Research) and Philippa Thomas (Design)
10.14 The Worst Board Games Ever Invented, by FiveThirtyEight
10.15 From Millions, Billions, Trillions: Letters from Zimbabwe, 2005−2009, a book written and published by Catherine Buckle (2014), table design by Graham van de Ruit (pg. 193)
10.16 List of Chart Structures
10.17 Illustrating the Effect of Truncated Bar Axis Scales
10.18 Excerpt from ‘Doping under the Microscope’, by S. Scarr and W. Foo (Reuters Graphics)
10.19 Record-high 60% of Americans Support Same-sex Marriage, by Gallup
10.20 Images from Wikimedia Commons, published under the Creative Commons Attribution-Share Alike 3.0 Unported license
11.1–7 ‘The Pursuit of Faster’, by Andy Kirk and Andrew Witherley

Acknowledgements

This book has been made possible thanks to the unwavering support of my incredible wife, Ellie, and the endless encouragement from my Mum and Dad, the rest of my brilliant family and my super group of friends.

From a professional standpoint I also need to acknowledge the fundamental role played by the hundreds of visualisation practitioners (no matter under what title you ply your trade) who have created such a wealth of brilliant work, from which I have developed so many of my convictions and which forms the basis of so much of the content in this book. The people and organisations who have provided me with permission to use their work are heroes and I hope this book does their rich talent justice.

About the Author

Andy Kirk is a freelance data visualisation specialist based in Yorkshire, UK. He is a visualisation design consultant, training provider, teacher, researcher, author, speaker and editor of the award-winning website visualisingdata.com.

After graduating from Lancaster University in 1999 with a BSc (Hons) in Operational Research, Andy held a variety of business analysis and information management positions at organisations including West Yorkshire Police and the University of Leeds.

He discovered data visualisation in early 2007, just as he was shaping his proposal for a Master’s (MA) Research Programme designed for members of staff at the University of Leeds. On completing this programme with distinction, Andy’s passion for the subject was unleashed. Following his graduation in December 2009, and to continue the process of discovering and learning the subject, he launched visualisingdata.com, a blogging platform that would chart the ongoing development of the data visualisation field. Over time, as the field has continued to grow, the site has reflected this growth, becoming one of the most popular in the field. It features a wide range of fresh content profiling the latest projects and contemporary techniques, discourse about practical and theoretical matters, commentary about key issues, and collections of valuable references and resources.

In 2011 Andy became a freelance professional focusing on data visualisation consultancy and training workshops. Some of his clients include CERN, Arsenal FC, PepsiCo, Intel, Hershey, the WHO and McKinsey. At the time of writing he has delivered over 160 public and private training events across the UK, Europe, North America, Asia, South Africa and Australia, reaching well over 3000 delegates.

In addition to training workshops Andy also has two academic teaching positions. He joined the highly respected Maryland Institute College of Art (MICA) as a visiting lecturer in 2013 and has been teaching a module on the Information Visualisation Master’s Programme since its inception. In January 2016, he began teaching a data visualisation module as part of the MSc in Business Analytics at the Imperial College Business School in London.

Between 2014 and 2015 Andy was an external consultant on a research project called ‘Seeing Data’, funded by the Arts & Humanities Research Council and hosted by the University of Sheffield. This study explored the issues of data visualisation literacy among the general public and, among other things, helped to shape an understanding of the human factors that affect visualisation literacy and the effectiveness of design.

Introduction

I.1 The Quest Begins

In his book The Seven Basic Plots, author Christopher Booker investigated the history of telling stories. He examined the structures used in biblical teachings and historical myths through to contemporary storytelling devices used in movies and TV. From this study he found seven common themes that, he argues, are identifiable in any form of story.

One of these themes was ‘The Quest’. Booker describes this as revolving around a main protagonist who embarks on a journey to acquire a treasured object or reach an important destination, but faces many obstacles and temptations along the way. It is a theme that I feel shares many characteristics with the structure of this book and the nature of data visualisation.

You are the central protagonist in this story in the role of the data visualiser. The journey you are embarking on involves a route along a design workflow where you will be faced with a wide range of different conceptual, practical and technical challenges. The start of this journey will be triggered by curiosity, which you will need to define in order to accomplish your goals. From this origin you will move forward to initiating and planning your work, defining the dimensions of your challenge. Next, you will begin the heavy lifting of working with data, determining what qualities it contains and how you might share these with others. Only then will you be ready to take on the design stage. Here you will be faced with the prospect of handling a spectrum of different design options that will require creative and rational thinking to resolve most effectively.

The multidisciplinary nature of this field offers a unique opportunity and challenge. Data visualisation is not an especially difficult capability to acquire; it is largely a game of decisions. Making better decisions will be your goal, but sometimes clear decisions will feel elusive. There will be occasions when the best choice is not at all visible and others when there will be many seemingly equally viable choices. Which one to go with? This book aims to be your guide, helping you navigate efficiently through these difficult stages of your journey.

You will need to learn to be flexible and adaptable, capable of shifting your approach to suit the circumstances. This is important because there are plenty of potential villains lying in wait looking to derail progress. These are the forces that manifest through the imposition of restrictive creative constraints and the pressure created by the relentless ticking clock of timescales. Stakeholders and audiences will present complex human factors through the diversity of their needs and personal traits. These will need to be astutely accommodated. Data, the critical raw material of this process, will dominate your attention. It will frustrate and even disappoint at times, as promises of its treasures fail to materialise irrespective of the hard work, love and attention lavished upon it.

Your own characteristics will also contribute to a certain amount of the villainy. At times, you will find yourself wrestling with internal creative and analytical voices pulling against each other in opposite directions. Your excitably formed initial ideas will be embraced but will need taming. Your inherent tastes, experiences and comforts will divert you away from the ideal path, so you will need to maintain clarity and focus.

The central conflict you will have to deal with is the notion that there is no perfect in data visualisation. It is a field with very few ‘always’ and ‘nevers’. Singular solutions rarely exist. The comfort offered by the rules that instruct what is right and wrong, good and evil, has its limits. You can find small but legitimate breaking points with many of them. While you can rightly aspire to reach as close to perfect as possible, the attitude of aiming for good enough will often indeed be good enough and fundamentally necessary.

In accomplishing the quest you will be rewarded with competency in data visualisation, developing confidence in being able to judge the most effective analytical and design solutions in the most efficient way. It will take time and it will need more than just reading this book. It will also require your ongoing effort to learn, apply, reflect and develop. Each new data visualisation opportunity poses a new, unique challenge. However, if you keep persevering with this journey the possibility of a happy ending will increase all the time.

I.2 Who is this Book Aimed at?

The primary challenge one faces when writing a book about data visualisation is to determine what to leave in and what to leave out. Data visualisation is big. It is too big a subject even to attempt to cover it all, in detail, in one book. There is no single book to rule them all because there is no one book that can cover it all. Each and every one of the topics covered by the chapters in this book could (and, in several cases, do) exist as whole books in their own right.

The secondary challenge when writing a book about data visualisation is to decide how to weave all the content together. Data visualisation is not rocket science; it is not an especially complicated discipline. Lots of it, as you will see, is rooted in common sense. It is, however, certainly a complex subject, a semantic distinction that will be revisited later. There are lots of things to think about and decide on, as well as many things to do and make. Creative and analytical sensibilities blend with artistic and scientific judgments. In one moment you might be checking the statistical rigour of your calculations, in the next deciding which tone of orange most elegantly contrasts with an 80% black. The complexity of data visualisation manifests itself through how these different ingredients, and many more, interact, influence and intersect to form the whole.

The decisions I have made in formulating this book’s content have been shaped by my own process of learning about, writing about and practising data visualisation for, at the time of writing, nearly a decade. Significantly – from the perspective of my own development – I have been fortunate to have had extensive experience designing and delivering training workshops and postgraduate teaching. I believe you only truly learn about your own knowledge of a subject when you have to explain it and teach it to others.

I have arrived at what I believe to be an effective and proven pedagogy that successfully translates the complexities of this subject into an accessible, practical and valuable form. I feel well qualified to bridge the gap between the large population of everyday practitioners, who might identify themselves as beginners, and the superstar technical, creative and academic minds that are constantly pushing forward our understanding of the potential of data visualisation. I am not going to claim to belong to that latter cohort, but I have certainly been the former – a beginner – and most of my working hours are spent helping other beginners start their journey. I know the things that I would have valued when I was starting out and I know how I would have wished them to be articulated and presented for me to develop my skills most efficiently.

There is a large and growing library of fantastic books offering many different theoretical and practical viewpoints on the subject of data visualisation. My aim is to bring value to this existing collection of work by taking on a particular perspective that is perhaps under-represented in other texts – exploring the notion and practice of a visualisation design process. As I have alluded to in the opening, the central premise of this book is that the path to mastering data visualisation is achieved by making better decisions: effective choices, efficiently made. The book’s central goal is to help develop your capability and confidence in facing these decisions.

Just as a single book cannot cover the whole of this subject, it stands to reason that a single book cannot aim to address directly the needs of all people doing data visualisation. In this section I am going to run through some of the characteristics of the readers at whom this book is primarily targeted. I will also put into context the content the book will and will not cover, and why. This will help manage your expectations as the reader and establish its value proposition compared with other titles.

Domain and Duties

The core audiences for whom this book has been primarily written are undergraduate and postgraduate-level students and early career researchers from social science subjects. This reflects a growing number of people in higher education who are interested in and need to learn about data visualisation.

Although aimed at the social sciences, the content will also be relevant across the spectrum of academic disciplines, from the arts and humanities right through to the formal and natural sciences: any academic duty where there is an emphasis on the use of quantitative and qualitative methods in studies will require an appreciation of good data visualisation practices. Where statistical capabilities are relevant, so too is data visualisation.

Beyond academia, data visualisation is a discipline that has reached mainstream consciousness, with an increasing number of professionals and organisations, across all industry types and sizes, recognising the importance of doing it well for both internal and external benefit. You might be a market researcher, a librarian or a data analyst looking to enhance your data capabilities. Perhaps you are a skilled graphic designer or web developer looking to take your portfolio of work in a more data-driven direction. Maybe you are in a managerial position and not directly involved in the creation of visualisation work, but you need to coordinate or commission others who will be. You require awareness of the most efficient approaches, the range of options and the different key decision points. You might be seeking generally to improve the sophistication of the language you use around commissioning visualisation work and to have a better way of expressing and evaluating work created for you.

Basically, anyone who is involved, in whatever capacity, with the analysis and visual communication of data as part of their professional duties will need to grasp the demands of data visualisation, and this book will go some way to supporting these needs.

Subject Neutrality

One of the important aspects of the book will be to emphasise that data visualisation is a portable practice. You will see a broad array of examples of work from different industries, covering very different topics. What will become apparent is that visualisation techniques are largely subject-matter neutral: a line chart that displays the ebb and flow of favourable opinion towards a politician involves the same techniques as a line chart showing how a stock has changed in value over time or how peak temperatures have changed across a season in a given location. A line chart is a line chart, regardless of the subject matter. The context of the viewers (such as their needs and their knowledge) and the specific meaning that can be drawn will inevitably be unique to each setting, but the role of visualisation itself is adaptable and portable across all subject areas.

Data visualisation is an entirely global concern, not focused on any defined geographic region. Although the English language dominates the written discourse (books, websites) about this subject, the interest in it and the visible output from across the globe are increasing rapidly. There are cultural matters that influence certain decisions throughout the design process, especially around the choices made for colour usage, but otherwise it is a discipline common to all.


Level and Prerequisites

The coverage of this book is intended to serve the needs of beginners and those with intermediate capability. For most people, this is likely to be as far as they might ever need to go. It will offer an accessible route for novices to start their learning journey and, for those already familiar with the basics, there will be content that will hopefully contribute to fine-tuning their approaches.

For context, I believe the only distinction between beginner and intermediate is one of breadth and depth of critical thinking rather than any degree of difficulty. The more advanced techniques in visualisation tend to be associated with the use of specific technologies for handling larger, complex datasets and/or producing more bespoke and feature-rich outputs.

This book is therefore not aimed at experienced or established visualisation practitioners. There may be some new perspectives to enrich their thinking, some content that will confirm and other content that might constructively challenge their convictions. Otherwise, the coverage in this book should really echo the practices they are likely to be already observing.

As I have already touched on, data visualisation is a genuinely multidisciplinary field. The people who are active in this field or profession come from all backgrounds – everyone has a different entry point and nobody arrives with all constituent capabilities. It is therefore quite difficult to define exactly the right type and level of pre-existing knowledge, skills or experience for those learning about data visualisation. As each year passes, the savvy-ness of the type of audience this book targets will increase, especially as the subject penetrates further into the mainstream. What were seen as bewilderingly new techniques several years ago are now commonplace to many more people.

That said, I think the following would be a fair outline of the type and shape of some of the most important prerequisite attributes for getting the most out of this book:

Strong numeracy is necessary, as well as a familiarity with basic statistics.
While it is reasonable to assume limited prior knowledge of data visualisation, there should be a strong desire to learn it. The demands of learning a craft like data visualisation take time and effort; the capabilities will need nurturing through ongoing learning and practice. They are not going to be achieved overnight or acquired from reading this book alone. Any book that claims to be able magically to inject mastery through just reading it cover to cover is over-promising and likely to under-deliver.
The best data visualisers possess inherent curiosity. You should be the type of person who is naturally disposed to question the world around them or can imagine what questions others have. Your instinct for discovering and sharing answers will be at the heart of this activity.
There are no expectations of your having any prior familiarity with design principles, but a desire to embrace some of the creative aspects presented in this book will heighten the impact of your work. Unlock your artistry!
If you are somebody with a strong creative flair, you are very fortunate. This book will guide you through when, and crucially when not, to tap into this sensibility. You should be willing to increase the rigour of your analytical decision making and be prepared to have your creative thinking informed more fundamentally by data rather than just instinct.
A range of technical skills covering different software applications, tools and programming languages is not expected for this book, as I will explain next, but you will ideally have some knowledge of basic Excel and some experience of working with data.

I.3 Getting the Balance

Handbook vs Tutorial Book

The description of this book as a ‘handbook’ positions it as being of practical help and presented in accessible form. It offers direction with comprehensive reference – more of a city guidebook for a tourist than an instruction manual to fix a washing machine. It will help you to know what things to think about, when to think about them, what options exist and how best to resolve all the choices involved in any data-driven design.

Technology is the key enabler for working with data and creating visualisation design outputs. Indeed, apart from a small proportion of artisan visualisation work that is drawn by hand, reliance on technology to create visualisation work is an unavoidable necessity. For many there is an understandable appetite for step-by-step tutorials that help them immediately to implement data visualisation techniques via existing and new tools.

However, writing about data visualisation through the lens of selected tools is a bit of a minefield, given the diversity of technical options out there and the mixed range of skills, access and needs. I greatly admire those people who have authored tutorial-based texts, because such texts require astute judgement about the right level, structure and scope.

The technology space around visualisation is characterised by flux. There are ongoing changes with the enhancement of established tools, as well as a relatively high frequency of new entrants offset by the decline of others. Some tools are proprietary, others are open source; some are easier to learn, others require a great deal of understanding before you can even consider embarking on your first chart. There are many recent cases of applications or services that have enjoyed fleeting exposure before reaching a plateau: development and support decline, the community of users disperses and their value effectively expires. Deprecation of syntax and functions in programming languages requires the perennial updating of skills.

All of this perhaps paints a rather more chaotic picture than is necessarily the case, but it explains why this book does not offer teaching in the use of any tools. While tutorials may be invaluable to some, they may also be only mildly interesting to others and possibly of no value to most. Tools come and go but the craft remains. I believe that creating a practical, rather than necessarily a technical, text that focuses on the underlying craft of data visualisation with a tool-agnostic approach offers an effective way to begin learning about the subject in appropriate depth. The content should appeal to readers irrespective of the extent of their technical knowledge (from novices to advanced technicians) and their specific tool experience (e.g. knowledge of Excel, Tableau, Adobe Illustrator).

There is a role for all book types. Different people want different sources of insight at different stages in their development. If you are seeking a text that provides in-depth tutorials on a range of tools or pages of programmatic instruction, this one will not be the best choice. However, if you consult only tutorial-related books, the chances are you will fall short on the fundamental critical thinking that will be needed in the longer term to get the most out of the tools with which you develop strong skills.

To add further value, the book’s digital companion resources will offer a curated, up-to-date collection of visualisation technology resources that will guide you through the most common and valuable tools, helping you to gain a sense of what their roles are and where they fit into the design workflow. Additionally, there will be recommended exercises and many further related digital materials available for exploring.

Useful vs Beautiful

Another important distinction to make is that this book is not intended to be seen as a beauty pageant. I love flicking through those glossy ‘coffee table’ books as much as the next person; such books offer great inspiration and demonstrate some of the finest work in the field. This book serves a very different purpose. I believe that, as a beginner or relative beginner on this learning journey, the inspiration you need comes more from understanding the thinking that makes some of these amazing works succeed and others not.

My desire is to make this the most useful text available, a reference that will spend more time on your desk than on your bookshelf. To be useful is to be used. I want the pages to be dog-eared. I want to see scribbles and annotated notes made across its pages and key passages underlined. I want to see sticky labels peering out above identified pages of note. I want to see creases where pages have been folded back or a double-page spread that has been weighed down to keep it open. In time I even want its cover reinforced with wallpaper or wrapping paper to ensure its contents remain bound together. There is every intention of making this an elegantly presented and packaged book, but it should not be something that invites you to ‘look, but don’t touch’.

Pragmatic vs Theoretical

The content of this book has been formed through many years of absorbing knowledge from all manner of books, generations of academic papers, thousands of web articles, hundreds of conference talks, endless online and personal discussions, and lots of personal practice. What I present here is a pragmatic translation and distillation of what I have learned down the years.

It is not a deeply academic or theoretical book. Where theoretical context and reference are relevant they will be signposted, as I do want to ground this book in as much evidence-based content as possible; it is about judging what is going to add most value. Experienced practitioners will likely have an appetite for delving deeper into theoretical discourse and the underlying sciences that intersect in this field, but that is beyond the scope of this particular text.

Take the science of visual perception, for example. There is no value in attempting to emulate what has already been covered by other books in greater depth and quality than I could achieve. Once you start peeling back the many different layers of topics like visual and cognitive science, the boundaries of your interest and their relevance to data visualisation never seem to arrive. You get swallowed up by the depth of these subjects. You realise that you have found yourself learning about the very concept of light and sight, and at that point your brain begins to ache (well, mine does at least), especially when all you set out to discover was whether a bar chart would be better than a pie chart.

An important reason for giving greater weight to pragmatism is people: people are the makers, the stakeholders, the audiences and the critics in data visualisation. Although there are many valuable research-driven concepts concerning data visualisation, their practical application can occasionally be at odds with the somewhat sanitised and artificial context of the research methods employed. Translating them into real-world circumstances can sometimes be easier said than done, as the influence of human factors can easily distort the significance of otherwise robust ideas.

I want to remove from you, as a reader, the burden of having to translate relevant theoretical discourse into applicable practice. Critical thinking will therefore be the watchword, equipping you with the independence of thought to decide rationally for yourself which solutions best fit your context, your data, your message and your audience. To do this you will need an appreciation of all the options available to you (the different things you could do) and a reliable approach for critically determining what choices you should make (the things you will do and why).


Contemporary vs Historical

This book is not going to look too far back into the past. We all respect the ancestors of this field, the great names who, despite primitive means, pioneered new concepts in the visual display of statistics to shape the foundations of the field being practised today. The field’s lineage is decorated by the influence of William Playfair’s first ever bar chart, Charles Joseph Minard’s famous graphic about Napoleon’s Russian campaign, Florence Nightingale’s Coxcomb plot and John Snow’s cholera map. These are some of the totemic names and classic examples that will always be held up as the ‘firsts’. Of course, to many beginners in the field, this historical context is of huge interest. However, again, this kind of content has already been superbly covered by other texts on more than enough occasions. Time to move on.

I am not going to spend time attempting to enlighten you about how we live in the age of ‘Big Data’ and how occupations related to data are or will be the ‘sexiest jobs’ of our time. The former is no longer news; the latter claim emerged from a single source. I do not want to bloat this book with unnecessary reprising of topics that have been covered at length elsewhere. There is more valuable and useful content I want you to focus your time on.

The subject matter, the ideas and the practices presented here will hopefully not date a great deal. Of course, many of the graphic examples included in the book will be surpassed by newer work demonstrating similar concepts as the field continues to develop. However, their worth as exhibits of a particular perspective covered in the text should prove timeless. As more research is conducted in the subject, without question there will be new techniques, new concepts and new empirically evidenced principles that emerge. Maybe even new rules. There will be new thought-leaders, new sources of reference, new visualisers to draw insight from. New tools will be created, existing tools will expire. Some things that are done, and can only be done, by hand as of today may become seamlessly automated in the near future. That is simply the nature of a fast-growing field. This book can only be a line in the sand.

Analysis vs Communication


A further important distinction to make concerns the subtle but significant difference between visualisations used for analysis and visualisations used for communication.

Before a visualiser can confidently decide what to communicate to others, he or she needs to have developed an intimate understanding of the qualities and potential of the data. This is largely achieved through exploratory data analysis. Here, the visualiser and the viewer are the same person. Through visual exploration, different interrogations can be pursued ‘on the fly’ to unearth confirmatory or enlightening discoveries about what insights exist.

Visualisation techniques used for analysis will be a key component of the journey towards creating visualisation for communication, but the practices involved differ. Unlike visualisation for communication, the techniques used for visual analysis do not have to be visually polished or necessarily appealing. They serve only to help you truly learn about your data. When a data visualisation is being created to communicate to others, many careful considerations come into play about the requirements and interests of the intended or expected audience. This has a significant influence on many of the design decisions you make, decisions that do not arise with visual analysis alone.

Exploratory data analysis is a huge and specialist subject in and of itself. In its most advanced form, working efficiently and effectively with large, complex data, topics like ‘machine learning’, using self-learning algorithms to help automate and assist in the discovery of patterns in data, become increasingly relevant. Within the scope of this book, the content is weighted more towards methods and concerns about communicating data visually to others. If your role is in pure data science or statistical analysis, you will likely require a deeper treatment of the exploratory data analysis topic than this book can reasonably offer. However, Chapter 4 will cover the essential elements in sufficient depth for the practical needs of most people working with data.

Print vs Digital

The opportunity to supplement the print version of this book with an e-book and further digital companion resources helps to cushion the agonising decisions about what to leave out. This text is therefore enhanced by access to further digital resources, some of which are newly created, while others are curated references from the endless well of visualisation content on the Web. Included online (book.visualisingdata.com) will be:

a completed case-study project that demonstrates the workflow activities covered in this book, including full write-ups and all related digital materials;
an extensive and up-to-date catalogue of over 300 data visualisation tools;
a curated collection of tutorials and resources to help develop your confidence with some of the most common and valuable tools;
practical exercises designed to embed the learning from each chapter;
further reading resources to continue learning about the subjects covered in each chapter.

I.4 Objectives

Before moving on to an outline of the book’s contents, I want to share four key objectives that I hope to accomplish for you by the final chapter. These are themes that will run through the entire text: challenge, enlighten, equip and inspire.

To challenge you, I will be encouraging you to recognise that your current thinking about visualisation may need to be reconsidered, both as a creator and as a consumer. We all arrive in visualisation from different subject and domain origins, and with that comes certain baggage and prior sensibilities that can distort our perspectives. I will not be looking to eliminate these, rather to help you harness and align them with other traits and viewpoints.

I will ask you to relentlessly consider the diverse decisions involved in this process. I will challenge your convictions about what you perceive to be good or bad, effective or ineffective visualisation choices: arbitrary choices will be eliminated from your thinking. Even if you are not a beginner, I believe the content you read in this book will make you question some of your own perspectives and assumptions. I will encourage you to reflect on your previous work, asking you to consider how and why you have designed visualisations in the way that you have: where do you need to improve? What can you do better?


It is not just about creating visualisations; I will also challenge your approach to reading visualisations. This is not something you might usually think much about, but there is an important role for more tactical approaches to consuming visualisations with greater efficiency and effectiveness.

To enlighten you will be to increase your awareness of the possibilities in data visualisation. As you begin your discovery of data visualisation you might not see the whole picture: you do not entirely know what options exist, how they are connected and how to make good choices. Until you know, you don’t know – that is what the objective of enlightening is all about.

As you will discover, there is a lot on your plate, much to work through. It is not just about the visible end-product design decisions. Hidden beneath the surface are many contextual circumstances to weigh up, decisions about how best to prepare your data, choices around the multitude of viable ways of slicing those data up into different angles of analysis. That is all before you even reach the design stage, where you will begin to consider the repertoire of techniques for visually portraying your data – the charts, the interactive features, the colours and much more besides.

This book will broaden your visual vocabulary to give you more ways of expressing your data visually. It will enhance the sophistication of your decision making and of your visual language for any of the challenges you may face.

To equip is to ensure you have robust tactics for managing your way through the myriad options that exist in data visualisation. The variety it offers makes for a wonderful prospect but, equally, introduces the burden of choice. This book aims to make the challenge of undertaking data visualisation far less overwhelming, breaking down the overall prospect into smaller, more manageable task chunks.

The structure of this book will offer a reliable and flexible framework for thinking, rather than rules for learning. It will lead to better decisions. With an emphasis on critical thinking you will move away from an over-reliance on gut feeling and taste. To echo what I mentioned earlier, its role as a handbook will help you know what things to think about, when to think about them and how best to resolve all the thinking involved in any data-driven design challenge you meet.


To inspire is to give you more than just a book to read. It is the opening of a door into a subject to inspire you to step further inside. It is about making you want to keep learning about it and to expose yourself to as much positive influence as possible. It should elevate your ambition and broaden your capability.

It is a book underpinned by theory but dominated by practical and accessible advice, including input from some of the best visualisers in the field today. The range of print and digital resources will offer lots of supplementary material, including tutorials, further reading materials and suggested exercises. Collectively this will hopefully make it one of the most comprehensive, valuable and inspiring titles out there.

I.5 Chapter Contents

The book is organised into four main parts (A, B, C and D) comprising eleven chapters and preceded by the ‘Introduction’ sections you are reading now.

Each chapter opens with an introductory outline that previews the content to be covered and provides a bridge between consecutive chapters. In the closing sections of each chapter the most salient learning points will be summarised and some important, practical tips and tactics shared. As mentioned, online there will be collections of practical exercises and recommended further reading resources to reinforce the learning from each chapter.

Throughout the book you will see sidebar captions that will offer relevant references, aphorisms, good habits and practical tips from some of the most influential people in the field today.

Introduction

This introduction explains how I have attempted to make sense of the complexity of the subject, outlining the nature of the audience I am trying to reach, the key objectives, what topics the book will be covering and not covering, and how the content has been organised.


Part A: Foundations

Part A establishes the foundation knowledge and sets up a key reference of understanding that aids your thinking across the rest of the book. Chapter 1 will be the logical starting point for many of you who are new to the field, helping you understand more about the definitions and attributes of data visualisation. Even if you are not a complete beginner, the content of the chapter forms the terms of reference on which much of the remaining content is based. Chapter 2 prepares you for the journey through the rest of the book by introducing the key design workflow that you will be following.

Chapter 1: Defining Data Visualisation

Defining data visualisation: outlining the components of thinking that make up the proposed definition for data visualisation.
The importance of conviction: presenting three guiding principles of good visualisation design: trustworthy, accessible and elegant.
Distinctions and glossary: explaining the distinctions and overlaps with other related disciplines and providing a glossary of terms used in this book to establish consistency of language.

Chapter 2: Visualisation Workflow

The importance of process: describing the data visualisation design workflow, what it involves and why a process approach is required.
The process in practice: providing some useful tips, tactics and habits that transcend any particular stage of the process but will best prepare you for success with this activity.

Part B: The Hidden Thinking

Part B discusses the first three preparatory stages of the data visualisation design workflow. ‘The hidden thinking’ title refers to how these vital activities, which have a huge influence over the eventual design solution, are somewhat out of sight in the final output; they are hidden beneath the surface but completely shape what is visible. These stages represent the often neglected contextual definitions, data wrangling and editorial challenges that are so critical to the success or otherwise of any visualisation work – they require a great deal of care and attention before you turn to the design stage.

Chapter 3: Formulating Your Brief

What is a brief?: describing the value of compiling a brief to help initiate, define and plan the requirements of your work.
Establishing your project’s context: defining the origin curiosity or motivation, identifying all the key factors and circumstances that surround your work, and defining the core purpose of your visualisation.
Establishing your project’s vision: early considerations about the type of visualisation solution needed to achieve your aims and harnessing initial ideas about what this solution might look like.

Chapter 4: Working With Data

Data literacy: establishing a basic understanding of this critical literacy, providing some foundation understanding about datasets and data types and some observations about statistical literacy.
Data acquisition: outlining the different origins of and methods for accessing your data.
Data examination: approaches for acquainting yourself with the physical characteristics and meaning of your data.
Data transformation: optimising the condition, content and form of your data fully to prepare it for its analytical purpose.
Data exploration: developing deeper intimacy with the potential qualities and insights contained, and potentially hidden, within your data.

Chapter 5: Establishing Your Editorial Thinking

What is editorial thinking?: defining the role of editorial thinking in data visualisation.
The influence of editorial thinking: explaining how the different dimensions of editorial thinking influence design choices.

Part C: Developing Your Design Solution


Part C is the main part of the book and covers progression through the data visualisation design and production stage. This is where your concerns switch from hidden thinking to visible thinking. The individual chapters in this part of the book cover each of the five layers of the data visualisation anatomy. They are treated as separate affairs to aid the clarity and organisation of your thinking, but they are entirely interrelated matters and the chapter sequence supports this. Within each chapter there is a consistent structure, beginning with an introduction to each design layer and an overview of the many different possible design options, followed by detailed guidance on the factors that influence your choices.

The production cycle: describing the cycle of development activities that take place during this stage, giving a context for how to work through the subsequent chapters in this part.

Chapter 6: Data Representation

Introducing visual encoding: an overview of the essentials of data representation looking at the differences and relationships between visual encoding and chart types.
Chart types: a detailed repertoire of 49 different chart types, profiled in depth and organised by a taxonomy of chart families: categorical, hierarchical, relational, temporal, and spatial.
Influencing factors and considerations: presenting the factors that will influence the suitability of your data representation choices.

Chapter 7: Interactivity

The features of interactivity:

Data adjustments: a profile of the options for interactively interrogating and manipulating data.
View adjustments: a profile of the options for interactively configuring the presentation of data.

Influencing factors and considerations: presenting the factors that willinfluence the suitability of your interactivity choices.

Chapter 8: Annotation


The features of annotation:

Project annotation: a profile of the options for helping to provide viewers with general explanations about your project.
Chart annotation: a profile of the annotation options for helping to optimise viewers’ understanding of your charts.

Influencing factors and considerations: presenting the factors that willinfluence the suitability of your annotation choices.

Chapter 9: Colour

The features of colour:

Data legibility: a profile of the options for using colour to represent data.
Editorial salience: a profile of the options for using colour to direct the eye towards the most relevant features of your data.
Functional harmony: a profile of the options for using colour most effectively across the entire visualisation design.

Influencing factors and considerations: presenting the factors that willinfluence the suitability of your colour choices.

Chapter 10: Composition

The features of composition:

Project composition: a profile of the options for the overall layout and hierarchy of your visualisation design.
Chart composition: a profile of the options for the layout and hierarchy of the components of your charts.

Influencing factors and considerations: presenting the factors that willinfluence the suitability of your composition choices.

Part D: Developing Your Capabilities

Part D wraps up the book’s content by reflecting on the range of capabilities required to develop confidence and competence with data visualisation. Following completion of the design process, the multidisciplinary nature of this subject will now be clearly established. This final part assesses the two sides of visualisation literacy – your role as a creator and your role as a viewer – and what you need to enhance your skills with both.

Chapter 11: Visualisation Literacy

Viewing: Learning to see: learning about the most effective strategy for understanding visualisations in your role as a viewer rather than a creator.
Creating: The capabilities of the visualiser: profiling the skill sets, mindsets and general attributes needed to master data visualisation design as a creator.


Part A Foundations


1 Defining Data Visualisation

This opening chapter will introduce you to the subject of data visualisation, defining what data visualisation is and is not. It will outline the different ingredients that make it such an interesting recipe and establish a foundation of understanding that will form a key reference for all of the decision making you are faced with.

Three core principles of good visualisation design will be presented that offer guiding ideals to help mould your convictions about distinguishing between what is effective and what is ineffective in data visualisation.

You will also see how data visualisation sits alongside or overlaps with other related disciplines, and some definitions about the use of language in this book will be established to ensure consistency in meaning across all chapters.

1.1 The Components of Understanding

To set the scene for what is about to follow, I think it is important to start this book with a proposed definition for data visualisation (Figure 1.1). This definition offers a critical term of reference because its components and their meaning will touch on every element of content that follows in this book. Furthermore, because data visualisation is a subject with many different proposed definitions, I believe it is worth clarifying my own view before going further:

Figure 1.1 A Definition for Data Visualisation


At first glance this might appear to be a surprisingly short definition: isn’t there more to data visualisation than that, you might ask? Can nine words sufficiently articulate what has already been introduced as an eminently complex and diverse discipline?

I have arrived at this after many years of iterations attempting to improve the elegance of my definition. In the past I have tried to force too many words and too many clauses into one statement, making it cumbersome and rather undermining its value. Over time, as I have developed greater clarity in my own convictions, I have in turn managed to establish greater clarity about what I feel is the real essence of this subject. The definition above is, I believe, a succinct and practically useful description of what the pursuit of visualisation is truly about. It is a definition that largely informs the contents of this book. Each chapter will aim to enlighten you about different aspects of the roles of, and relationships between, the components it expresses. Let me introduce and briefly examine each of these one by one, explaining where and how they will be discussed in the book.

Firstly, data, our critical raw material. It might appear a formality to mention data in the definition for, after all, we are talking about data visualisation as opposed to, let’s say, cheese visualisation (though visualisation of data using cheese has happened, see Figure 1.2), but the core role that data has in the design process needs to be made clear. Without data there is no visualisation; indeed there is no need for one. Data plays the fundamental role in this work, so you will need to give it your undivided attention and respect. You will discover in Chapter 4 the importance of developing an intimacy with your data to acquaint yourself with its physical properties, its meaning and its potential qualities.


Figure 1.2 Per Capita Cheese Consumption in the US

Data is names, amounts, groups, statistical values, dates, comments, locations. Data is textual and numeric in format, typically held in datasets in table form, with rows of records and columns of different variables.

This tabular form of data is what we will be considering as the raw form of data. Through tables, we can look at the values contained to precisely read them as individual data points. We can look up values quite efficiently, scanning across many variables for the different records held. However, we cannot easily establish the comparative size and relationship between multiple data points. Our eyes and mind are not equipped to easily translate the textual and numeric values into quantitative and qualitative meaning. We can look at the data but we cannot really see it without the context of relationships that help us compare and contrast each value effectively with others. To derive understanding from data we need to see it represented in a different, visual form. This is the act of data representation.

This word, ‘representation’, is deliberately positioned near the front of the definition because it is the quintessential activity of data visualisation design. Representation concerns the choices made about the form in which your data will be visually portrayed: in lay terms, what chart or charts you will use to exploit the brain’s visual perception capabilities most effectively.

When data visualisers create a visualisation they are representing the data they wish to show visually through combinations of marks and attributes. Marks are points, lines and areas. Attributes are the appearance properties of these marks, such as the size, colour and position. The recipe of these marks and their attributes, along with other components of apparatus, such as axes and gridlines, forms the anatomy of a chart.
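To make the idea of marks and attributes more concrete, here is a minimal sketch in Python that turns the rows of a small hypothetical table into bar marks. One variable is encoded as position, one as size (bar height) and one as colour. The names `BarMark`, `encode_bars` and `PALETTE`, the sample rows and the linear height scaling are all illustrative assumptions. This is not the author’s method, nor the API of any charting library.

```python
from dataclasses import dataclass


@dataclass
class BarMark:
    """An area mark (a bar) together with the attributes that encode data."""
    label: str     # category the bar belongs to (read via the axis apparatus)
    x: float       # position attribute: where the bar sits along the x-axis
    height: float  # size attribute: bar length, proportional to the value
    colour: str    # colour attribute: distinguishes one group from another


# Hypothetical palette mapping each group to a colour (illustrative only).
PALETTE = {"group 1": "blue", "group 2": "purple"}


def encode_bars(rows, max_height=100.0, spacing=1.0):
    """Translate tabular records into bar marks with visual attributes.

    rows: list of (label, group, value) tuples with non-negative values,
    one tuple per record (row) of a hypothetical table.
    Heights are scaled linearly so the largest value reaches max_height.
    """
    largest = max(value for _, _, value in rows)
    marks = []
    for i, (label, group, value) in enumerate(rows):
        marks.append(BarMark(
            label=label,
            x=i * spacing,
            height=value * max_height / largest if largest else 0.0,
            colour=PALETTE[group],
        ))
    return marks


if __name__ == "__main__":
    # Hypothetical records: two groups measured for two categories.
    rows = [("A", "group 1", 40), ("A", "group 2", 20),
            ("B", "group 1", 50), ("B", "group 2", 25)]
    for mark in encode_bars(rows):
        print(mark)
```

The sketch keeps the raw rows separate from the encoding step. This mirrors the distinction between data in its tabular form and its visual representation. Axes and gridlines would be drawn separately, as apparatus around these marks.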

In Chapter 6 you will gain a deeper and more sophisticated appreciation of the range of different charts that are in common usage today, broadening your visual vocabulary. These charts will vary in complexity and composition, with each capable of accommodating different types of data and portraying different angles of analysis. You will learn about the key ingredients that shape your data representation decisions and about the factors that distinguish effective from ineffective choices.

Beyond representation choices, the presentation of data concerns all the other visible design decisions that make up the overall visualisation anatomy. This includes choices about the possible applications of interactivity, features of annotation, colour usage and the composition of your work. During the early stages of learning this subject it is sensible to partition your thinking about these matters, treating them as isolated design layers. This will aid your initial critical thinking. Chapters 7–10 will explore each of these layers in depth, profiling the options available and the factors that influence your decisions.

However, as you gain in experience, the interrelated nature of visualisation will become much more apparent and you will see how the overall design anatomy is entirely connected. For instance, the selection of a chart type intrinsically leads to decisions about the space and place it will occupy; an interactive control may be included to reveal an annotated caption; for any design property to be even visible to the eye it must possess a colour that is different from that of its background.

The goal expressed in this definition is that data visualisation is about facilitating understanding. This is very important and some extra time is required to emphasise why it is such an influential component in our thinking. You might think you know what understanding means, but when you peel back the surface you realise there are many subtleties that need to be acknowledged about this term and their impact on your data visualisation choices. Understanding ‘understanding’ (still with me?) in the context of data visualisation is of elementary significance.

When consuming a visualisation, the viewer will go through a process of understanding involving three stages: perceiving, interpreting and comprehending (Figure 1.3). Each stage is dependent on the previous one, and in your role as a data visualiser you will have influence, but not full control, over each of them. You are largely at the mercy of the viewer – what they know and do not know, what they are interested in knowing and what might be meaningful to them – and this introduces many variables outside of your control: where your control diminishes, the influence of and reliance on the viewer increases. Achieving an outcome of understanding is therefore a collective responsibility between visualiser and viewer.

These three terms are not just synonyms for the same thing; rather, they carry important distinctions that need appreciating. As you will see throughout this book, the subtleties and semantics of language in data visualisation will be a recurring concern.

Figure 1.3 The Three Stages of Understanding

Let’s look at the characteristics of the different stages that form the process of understanding to help explain their respective differences and mutual dependencies.

Firstly, perceiving. This concerns the act of simply being able to read a chart. What is the chart showing you? How easily can you get a sense of the values of the data being portrayed?

Where are the largest, middle-sized and smallest values?
What proportion of the total does that value hold?
How do these values compare in ranking terms?
To which other values does this have a connected relationship?

The notion of understanding here concerns our attempts as viewers to efficiently decode the representations of the data (the shapes, the sizes and the colours) as displayed through a chart, and then convert them into perceived values: estimates of quantities and their relationships to other values.

Interpreting is the next stage of understanding, following on from perceiving. Having read the chart, the viewer now seeks to convert these perceived values into some form of meaning:

Is it good to be big or better to be small?
What does it mean to go up or go down?
Is that relationship meaningful or insignificant?
Is the decline of that category especially surprising?

The viewer’s ability to form such interpretations is influenced by their pre-existing knowledge about the portrayed subject and their capacity to utilise that knowledge to frame the implications of what has been read. Where a viewer does not possess that knowledge, the visualiser may have to address this deficit. The visualiser will need to make suitable design choices that help to make clear what meaning can or should be drawn from the display of data. Captions, headlines, colours and other annotation devices, in particular, can all be used to achieve this.

Comprehending involves reasoning about the consequences of the perceiving and interpreting stages to arrive at a personal reflection of what all this means to them, the viewer. How does this information make a difference to what was known about the subject previously?

Why is this relevant? What wants or needs does it serve?
Has it confirmed what I knew or possibly suspected beforehand, or enlightened me with new knowledge?
Has this experience impacted me in an emotional way or left me feeling somewhat indifferent as a consequence?
Does the understanding I have acquired lead me to take action – such as make a decision or fundamentally change my behaviour – or do I simply have an extra grain of knowledge whose consequence may not materialise until much later?

Over the page is a simple demonstration to further illustrate this process of understanding. In this example I play the role of a viewer working with a sample isolated chart (Figure 1.4). As you will learn throughout the design chapters, a chart would not normally just exist floating in isolation like this one does, but it will serve a purpose for this demonstration.

Figure 1.4 shows a clustered bar chart that presents a breakdown of the career statistics for the footballer Lionel Messi during his career with FC Barcelona.

The process commences with perceiving the chart. I begin by establishing what chart type is being used. I am familiar with this clustered bar chart approach and so I quickly feel at ease with the prospect of reading its display: there is no learning for me to have to go through on this occasion, which is not always the case, as we will see.

I can quickly assimilate what the axes are showing by examining the labels along the x- and y-axes and by taking the assistance provided by the colour legend at the top. I move on to scanning, detecting and observing the general physical properties of the data being represented. The eyes and brain are working in harmony, conducting this activity quite instinctively without awareness or delay, noting the most prominent features of variation in the attributes of size, shape, colour and position.

Figure 1.4 Demonstrating the Process of Understanding

I look across the entire chart, identifying the big, small and medium values (these are known as stepped magnitude judgements), and form an overall sense of the general value rankings (global comparison judgements). I am instinctively drawn to the dominant bars towards the middle/right of the chart, especially as I know this side of the chart concerns the most recent career performances. I can determine that the purple bar – showing goals – has been rising pretty much year-on-year towards a peak in 2011/12, followed by a dip and then a recovery in his most recent season.

My visual system is now working hard to decode these properties into estimations of quantities (amounts of things) and relationships (how different things compare with each other). I focus on judging the absolute magnitudes of individual bars (one bar at a time). The assistance offered by the chart apparatus, such as the vertical axis (or y-axis) values and the inclusion of gridlines, is helping me estimate the quantities more quickly and with greater assurance of accuracy, such as discovering that the highest number of goals scored was around 73.

I then look to conduct some relative higher/lower comparisons. In comparing the games and goals pairings I can see that three out of the last four years have seen the purple bar higher than the blue bar, in contrast to all the rest. Finally I look to establish proportional relationships between neighbouring bars, i.e. by how much larger one is compared with the next. In 2006/07 I can see the blue bar is more than twice as tall as the purple one, whereas in 2011/12 the purple bar is about 15% taller.

By reading this chart I now have a good appreciation of the quantities displayed and some sense of the relationship between the two measures, games and goals.

The second part of the understanding process is interpreting. In reality, it is not so consciously consecutive or delayed in relation to the perceiving stage, but you cannot get here without having already done the perceiving. Interpreting, as you will recall, is about converting perceived ‘reading’ into meaning. Interpreting is essentially about orientating your assessment of what you’ve read against what you know about the subject.

As I mentioned earlier, often a data visualiser will choose to – or have the opportunity to – share such insights via captions, chart overlays or summary headlines. As you will learn in Chapter 3, the visualisations that present this type of interpretation assistance are commonly described as offering an ‘explanatory’ experience. This particular demonstration is an example of an ‘exhibitory’ experience, characterised by the absence of any explanatory features. It relies on the viewer to handle the demands of interpretation without any assistance.

As you will read about later, many factors influence how well different viewers will be able to interpret a visualisation. Some of the most critical include the level of interest shown towards the subject matter, its relevance and the general inclination, in that moment, of a viewer to want to read about that subject through a visualisation. Interpretation is also influenced by the knowledge held about a subject or the capacity to derive meaning from a subject even if a knowledge gap exists.

Returning to the sample chart, in order to translate the quantities and relationships I extracted from the perceiving stage into meaning, I am effectively converting the reading of value sizes into notions of good or bad, and comparative relationships into ‘worse than’ or ‘better than’. To interpret the meaning of this data about Lionel Messi I can tap into my passion for and knowledge of football. I know that for a player to score over 25 goals in a season is very good. To score over 35 is exceptional. To score over 70 goals is frankly preposterous, especially at the highest level of the game (you might find plenty of players achieving these statistics playing for the Dog and Duck pub team, but these numbers have been achieved for Barcelona in La Liga, the Champions League and domestic cup competitions). I know from watching the sport, and poring over statistics like this for 30 years, that it is very rare for a player to score remotely close to a ratio of one goal per game played. Those purple bars that exceed the height of the blue bars are therefore remarkable. Beyond the information presented in the chart I bring knowledge about the periods when different managers were in charge of Barcelona, how they played the game, and how some organised their teams entirely around Messi’s talents. I know which other players were teammates across different seasons and who might have assisted or hindered his achievements. I also know his age and can mentally compare his achievements with the traditional football career arcs that will normally show a steady rise, peak, plateau, and then decline.

Therefore, in this example, I am not just interested in the subject but can bring a lot of knowledge to aid me in interpreting this analysis. That helps me understand a lot more about what this data means. Other people might be passingly interested in football and know how to read what is being presented, but they might not possess the domain knowledge to go deeper into the interpretation. They also just might not care. Now imagine this was analysis of, let’s say, an NHL ice hockey player (Figure 1.5) – that would present an entirely different challenge for me.

In this chart the numbers are irrelevant; it simply reuses the same chart as before with different labels. Assuming this was real analysis, as a sports fan in general I would have the capacity to understand the notion of a sportsperson’s career statistics in terms of games played and goals scored: I can read the chart (perceiving) that shows me this data and catch the gist of the angle of analysis it is portraying. However, I do not have sufficient domain knowledge of ice hockey to determine the real meaning and significance of the big–small, higher–lower value relationships. I cannot confidently convert ‘small’ into ‘unusual’ or ‘greater than’ into ‘remarkable’. My capacity to interpret is therefore limited, and besides, I have no connection to the subject matter, so I am not interested enough to spend much time on any in-depth attempt at interpretation.

Figure 1.5 Demonstrating the Process of Understanding

Imagine this is now no longer analysis about sport but about the sightings in the wild of Winglets and Spungles (completely made up words). Once again I can still read the chart shown in Figure 1.6, but now I have absolutely no connection to the subject whatsoever. No knowledge and no interest. I have no idea what these things are, no understanding of the sense of scale that should be expected for these sightings, and no idea of what is good or bad. And I genuinely don’t care either. In contrast, for those who do have a knowledge of and interest in the subject, the meaning of this data will be much more relevant. They will be able to read the chart and make some sense of the meaning of the quantities and relationships displayed.

To help with perceiving, viewers need the context of scale. To help with interpreting, viewers need the context of subject, whether that is provided by the visualiser or the viewer themself. The challenge for you and me as data visualisers is to determine what our audience will know already and what they will need to know, so that we can assist them in interpreting the meaning. The use of explanatory captions, perhaps positioned in that big white space top left, could assist those lacking the knowledge of the subject, possibly offering a short narrative to make the interpretations – the meaning – clearer and immediately accessible.

We are not quite finished; there is one stage left. The third part of the understanding process is comprehending. This is where I attempt to form some concluding reasoning that translates into what this analysis means for me. What can I infer from the display of data I have read? How do I relate and respond to the insights I have drawn out through interpretation? Does what I’ve learnt make a difference to me? Do I know something more than I did before? Do I need to act or decide on anything? How does it make me feel emotionally?

Figure 1.6 Demonstrating the Process of Understanding


Through consuming the Messi chart, I have been able to form an even greater appreciation of his amazing career. It has surprised me just how prolific he has been, especially having seen his ratio of goals to games, and I am particularly intrigued to see whether the dip in 2013/14 was a temporary blip or whether the bounce back in 2014/15 was the blip. And as he reaches his late 20s, will injuries start to creep in as they seem to do for many other similarly prodigious young talents, especially as he has been playing relentlessly at the highest level since his late teens?

My comprehension is not a dramatic discovery. Based on what I have learnt, there is no sudden inclination to act, nor any need to. I just feel a heightened impression, formed through the data, of just how good and prolific Lionel Messi has been. Barcelona fanatics who watch him play every week will likely have already formed this understanding. This kind of experience would only have reaffirmed what they already probably knew.

And that is important to recognise when it comes to managing expectations about what we hope to achieve amongst our viewers in terms of their final comprehending. One person’s ‘I knew that already’ is another person’s ‘wow’. For every ‘wow, I need to make some changes’ type of reflection there might be another ‘doesn’t affect me’. A compelling visualisation about climate change presented to Sylvie might significantly affect her thinking about the lifestyle changes she could make to reduce her carbon footprint. For Robert, who is already familiar with the significance of this situation, it might have substantially less immediate impact – not indifference to the meaning of the data, just nothing new, a shrug of the shoulders. For James, the hardened sceptic, even the most indisputable evidence may have no effect; he might just not be receptive to altering his views regardless.

What these scenarios try to explain is that, from your perspective as the visualiser, this final stage of understanding is something you will have relatively little control over because viewers are people and people are complex. People are different and as such they introduce inconsistencies. You can lead a horse to water but you cannot make it drink: you cannot force a viewer to be interested in your work, to understand the meaning of a subject or to react exactly how you would wish.

Visualising data is just an agent of communication and not a guarantor of what a viewer does with the opportunity for understanding that is presented. There are different flavours of comprehension, different consequences of understanding formed through this final stage. Many visualisations will be created with the ambition to simply inform, as the Messi graphic did for me, perhaps to add just an extra grain to the pile of knowledge a viewer has about a subject. Not every visualisation results in a Hollywood moment of grand discoveries, surprising insights or life-saving decisions. But that is OK, so long as the outcome fits with the intended purpose, something we will discuss in more depth in Chapter 3.

Furthermore, there is the complexity of human behaviour in how people make decisions in life. You might create the most compelling visualisation, demonstrating proven effective design choices, carefully constructed with a very specific audience type and need in mind. This might clearly show that a certain decision really needs to be taken by those in the audience. However, you cannot guarantee that the decision maker in question, while possibly recognising that there is a need to act, will be in a position to act, and indeed will know how to act.

It is at this point that one must recognise the ambitions and – more importantly – realise the limits of what data visualisation can achieve. Going back again, finally, to the components of the definition, all the reasons outlined above show why ‘to facilitate’ is the most a visualiser can reasonably aspire to achieve.


It might feel like a rather tepid and unambitious aim, something of a cop-out that avoids scrutiny over the outcomes of our work: why not aim to ‘deliver’, ‘accomplish’, or do something more earnest than just ‘facilitate’? I deliberately use ‘facilitate’ because, as we have seen, we can only control so much. Design cannot change the world; it can only make it run a little smoother. Visualisers can control the output but not the outcome: at best we can expect to have only some influence on it.

1.2 The Importance of Conviction

The key structure running through this book is a data visualisation design process. By following this process you will be able to decrease the size of the challenge involved in making good decisions about your design solution. The sequencing of the stages presented will help reduce the myriad options you have to consider, which makes arriving at the best possible solution much more likely.

Often, the design choices you need to make will be clear cut. As you will learn, the preparatory nature of the first three stages goes a long way to securing that clarity later in the design stage. On other occasions, plain old common sense is a more than sufficient guide. However, for more nuanced situations, where there are several potentially viable options presenting themselves, you need to rely on the guiding value of good design principles.

‘I say begin by learning about data visualisation’s “black and whites”, the rules, then start looking for the greys. It really then becomes quite a personal journey of developing your conviction.’
Jorge Camoes, Data Visualization Consultant

For many people setting out on their journey in data visualisation, the early beliefs they form about data visualisation design tend to be shaped by the first authors they come across. Names like Edward Tufte, unquestionably one of the most important figures in this field and one whose ideas are still pervasive, represent a common entry point into the field, as do people like Stephen Few, David McCandless, Alberto Cairo, and Tamara Munzner, to name but a few. These are authors of prominent works that typically represent the first books purchased and read by many beginners.

Where you go from there – from whom you draw your most valuable enduring guidance – will be shaped by many different factors: taste, the industry you are working in, the topics on which you work, the types of audiences you produce for. I still value much of what Tufte extols, for example, but find I can now more confidently filter out some of his ideals that veer towards impractical ideology or that do not necessarily hold up against contemporary technology and the maturing expectations of people.

‘My key guiding principle? Know the rules, before you break them.’
Gregor Aisch, Graphics Editor, The New York Times

The key guidance that now most helpfully shapes and supports my convictions comes from outside the boundaries of visualisation design, in the work of Dieter Rams. Rams, a German industrial and product designer, is most famously associated with the Braun company.

In the late 1970s or early 1980s, Rams was becoming concerned about the state and direction of design thinking and, given his prominent role in the industry, felt a responsibility to challenge himself, his own work and his own thinking against a simple question: ‘Is my design good design?’ By dissecting his response to this question he conceived 10 principles that expressed the most important characteristics of what he considered to be good design. They read as follows:

1. Good design is innovative.
2. Good design makes a product useful.
3. Good design is aesthetic.
4. Good design makes a product understandable.
5. Good design is unobtrusive.
6. Good design is honest.
7. Good design is long lasting.
8. Good design is thorough down to the last detail.
9. Good design is environmentally friendly.
10. Good design is as little design as possible.

Inspired by the essence of these principles, and considering their applicability to data visualisation design, I have translated them into three high-level principles that similarly help me to answer my own question: ‘Is my visualisation design good visualisation design?’ These principles offer me a guiding voice when I need to resolve some of the more seemingly intangible decisions I am faced with (Figure 1.7).

Figure 1.7 The Three Principles of Good Visualisation Design

In the book Will it Make the Boat Go Faster?, co-author Ben Hunt-Davis provides details of the strategies employed by him and his team that led to their achieving gold medal success in the Men’s Rowing Eight event at the Sydney Olympics in 2000. As the title suggests, each decision taken had to pass the ‘will it make the boat go faster?’ test. Going back to the goal of data visualisation as defined earlier, these design principles help me judge whether any decision I make will better aid the facilitation of understanding: the equivalent of ‘making the boat go faster’.

I will describe in detail the thinking behind each of these principles and explain how Rams’ principles map onto them. Before that, let me briefly explain why three of Rams’ original ten principles do not entirely fit, in my view, as universal principles for data visualisation.

‘I’m always the fool looking at the sky who falls off the cliff. In other words, I tend to seize on ideas because I’m excited about them without thinking through the consequences of the amount of work they will entail. I find tight deadlines energizing. Answering the question of “what is the graphic trying to do?” is always helpful. At minimum the work I create needs to speak to this. Innovation doesn’t have to be a wholesale out-of-the-box approach. Iterating on a previous idea, moving it forward, is innovation.’
Sarah Slobin, Visual Journalist

Good design is innovative: Data visualisation does not always need to be innovative. On the majority of occasions the solutions being created call upon the tried and tested approaches that have been used for generations. Visualisers are not conceiving new forms of representation or implementing new design techniques in every project. Of course, there are times when innovation is required to overcome a particular challenge; innovation generally materialises when faced with problems that current solutions fail to overcome. Your own desire for innovation may be aligned to personal goals about the development of your skills, or may arise from reflecting on previous projects and recognising a desire to rethink a solution. It is not that data visualisation is never about innovation, just that it is not always and only about innovation.

Good design is long lasting: The translation of this principle to the context of data visualisation can be taken in different ways. ‘Long lasting’ could be related to the desire to preserve the ongoing functionality of a digital project, for example. It is quite demoralising how many historic links you visit online only to find a project has now expired through a lack of sustained support or is no longer functionally supported on modern browsers.

Another way to interpret ‘long lasting’ is in the durability of the technique. Bar charts, for example, are the old reliables of the field – always useful, always being used, always there when you need them (author wipes away a respectful tear). ‘Long lasting’ can also relate to avoiding the temptation of fashion or current gimmickry and having a timeless approach to design. Consider the recent design trend moving away from skeuomorphism and the emergence of so-called flat design. By the time this book is published there will likely be a new movement. ‘Long lasting’ could also apply to the subject matter. Expiry in the relevance of certain angles of analysis or out-of-date data is inevitable in most of our work, particularly with subjects that concern current matters. Analysis about the loss of life during the Second World War is timeless because nothing is now going to change the nature or extent of the underlying data (unless new discoveries emerge). Analysis of the highest grossing movies today will change as soon as new big movies are released and time elapses. So, once again, this idea of long lasting is very context specific, rather than being a universal goal for data visualisation.

Good design is environmentally friendly: This is, of course, a noble aim but the relevance of this principle has to be positioned again at the contextual level, based on the specific circumstances of a given project. If your work is to be printed, the ink and paper usage immediately removes the notion that it is an environmentally friendly activity. Developing a powerful interactive that is being hammered constantly and concurrently by hundreds of thousands of users puts an extra burden on the hosting server, creating more demands on energy supply. The specific judgements about issues relating to the impact of a project on the environment realistically reside with the protagonists and stakeholders involved.

A point of clarification: while I describe them as design principles, they actually provide guidance long before you reach the design thinking at the final stage of this workflow. Design choices encapsulate the critical thinking undertaken throughout. Think of it like an iceberg: the design is the visible consequence of lots of hidden preparatory thinking formed through earlier stages.

Finally, a comment is in order about something often raised in discussions about the principles for this subject: the idea that visualisations need to be memorable. This is, in my view, not relevant as a universal principle. If something is memorable, wonderful; that will be a terrific by-product of your design thinking. In itself, however, the goal of achieving memorability has to be isolated, again, to a contextual level based on the specific goals of a given task and the capacity of the viewer. A politician or a broadcaster might need to recall information more readily in their work than a group of executives in a strategy meeting with permanent access to endless information at the touch of a button via their iPads.

Principle 1: Good Data Visualisation is Trustworthy

The notion of trust is uppermost in your thoughts in this first of the three principles of good visualisation design. This maps directly onto one of Dieter Rams’ general principles of good design, namely that good design is honest.

Trust vs Truth

This principle is presented first because it is about the fundamental integrity, accuracy and legitimacy of any data visualisation you produce. This should always be your primary concern above all else. There should be no compromise here. Without securing trust, the entire purpose of doing the work is undermined.

There is an important distinction to make between trust and truth. Truth is an obligation. You should never create work you know to be misleading in content, nor should you claim something presents the truth if it evidently cannot be supported by what you are presenting. For most people, the difference between a truth and an untruth should be beyond dispute. For those unable or unwilling to be truthful, or who are ignorant of how to differentiate, it is probably worth putting this book away now: my telling you how this is a bad thing is not likely to change your perspective.

If the imperative for being truthful is clear, the potential for multiple different but legitimate versions of ‘truth’ within the same data-driven context muddies things. In data visualisation there is rarely a singular view of the truth. The glass that is half full is also half empty. Both views are truthful, but which to choose? Furthermore, many decisions in your work present several valid options. In these cases you are faced with choices without necessarily having any theory to point you to the right option. You decide what is right. This creates inevitable biases, no matter how seemingly tiny, that ripple through your work. Your eventual solution is potentially composed of many well-informed, well-intended and legitimate choices, no doubt, but they will reflect a subjective perspective all the same. All projects represent the outcome of an entirely unique pathway of thought.

You can mitigate the impact of these subjective choices, for example, by minimising the number of assumptions applied to the data you are working with or by judiciously consulting your audience to best ensure their requirements are met. However, pure objectivity is not possible in visualisation.

‘Every number we publish is wrong but it is the best number there is.’ Andrew Dilnot, Chair of the UK Statistics Authority

Rather than view the unavoidability of these biases as an obstruction, the focus should instead be on ensuring your chosen path is trustworthy. In the absence of an objective truth, you need to be able to demonstrate that your truth is trustable.

Trust has to be earned, but it is hard to secure and very easy to lose. As the translation of a Dutch proverb states, ‘trust arrives on foot and leaves on horseback’. Trust is something you can build by eliminating any sense that your version of the truth can be legitimately disputed. Yet visualisers only have so much control and influence in securing trust. A visualisation can be truthful but not viewed as trustworthy. You may have made your decisions with the best of intentions, but the work may still fail to secure trust among your viewers for a variety of reasons. Conversely, a visualisation can be trustworthy in the mind of the viewer but not truthful, appearing to merit trust yet utterly flawed in its underlying truth. Neither of these is satisfactory: the latter scenario is a choice we control, the former is a consequence we must strive to overcome.

‘Good design is honest. It does not make a product appear more innovative, powerful or valuable than it really is. It does not attempt to manipulate the consumer with promises that cannot be kept.’ Dieter Rams, celebrated Industrial Designer

Let’s consider a couple of examples to illustrate this notion of trustworthiness. Firstly, think about the trust you might attach respectively to the graphics presented in Figure 1.8 and Figure 1.9. For clarity, both are extracted from articles discussing home ownership, so each would be accompanied by additional written analysis at its published location. Both charts portray the same data and the same analysis; they even arrive at the same summary finding. How do the design choices make you feel about the integrity of each work?

Figure 1.8 Housing and Home Ownership in the UK (ONS)

Both portrayals are truthful, but in my view the first visualisation, produced by the UK Office for National Statistics (ONS), commands greater credibility and therefore far more trust than the second visualisation, produced by the Daily Mail. The difference begins with the colour choices. They are relatively low key in the ONS graphic: colourful but subdued, yet conveying a certain assurance. In contrast, the Daily Mail’s colour palette feels needy, like it is craving my attention with sweetly coloured sticks. I don’t care for the house key imagery in the background, but it is relatively harmless. Additionally, the typeface, font size and text colour feel more gimmicky in the second graphic. Once again, it feels like it wants to shout at me, in contrast to the more polite nature of the ONS text. While the Daily Mail piece cites the ONS as the source of the data, it fails to include further details about that source. The ONS graphic includes these details alongside other important explanatory features such as the subtitle, clarity about the yearly periods and the option to access and download the associated data. The ONS graphic effectively ‘shows all its workings’ and overall earns, from me at least, significantly more trust.

Figure 1.9 Falling Number of Young Homeowners (Daily Mail)

Another example of the fragility of trust concerns the next graphic, which plots the number of murders committed using firearms in Florida over a period of time. The period shown spans the enactment of the ‘Stand your ground’ law in Florida. The area chart in Figure 1.10 shows the number of murders over time. As you can see, the chart uses an inverted vertical y-axis, with the red area extending further down as the number of deaths increases, and peak values at about 1990 and 2007. However, some commentators felt the inversion of the y-axis was deceptive and declared the graphic untrustworthy because they perceived the values as being represented by an apparently rising ‘white mountain’. Seeing its highest points as the peaks, they mistakenly read peak values around 1999 and 2005. This confusion is caused by an effect known as figure-ground perception, whereby a background form (the white area) can be inadvertently recognised as the foreground form, and vice versa (with the red area seen as the background).

Figure 1.10 Gun Deaths in Florida

Figure 1.11 Iraq’s Bloody Toll

The key point here is that there was no intention to mislead. Although inverting the y-axis may not be entirely conventional, it was technically legitimate. Creatively speaking, the effect of dribbling blood was an understandably tempting metaphor to pursue. Indeed, the graphic attempts to emulate a notable infographic from several years ago showing the death toll during the Iraq conflict (Figure 1.11). In the case of the Florida graphic, on reflection, maybe the data was just too ‘smooth’ to convey the same dribbling effect achieved in the Iraq piece. However, being inspired and influenced by successful techniques demonstrated by others is to be encouraged. It is one way of developing our skills.

Figure 1.12 Reworking of ‘Gun Deaths in Florida’

Unfortunately, given the emotive nature of the subject matter – gun deaths – this analysis would always attract a passionate reaction regardless of its form. In this case the lack of trust expressed by some was an unintended consequence of one innocent design decision. By reverting the y-axis to an upward direction, as shown in the reworked version in Figure 1.12, you can see how a single subjective design choice can have a huge influence on people’s perception.

The creator of the Florida chart will have made hundreds of perfectly sound visualisations and will make hundreds more, and none of them will ever carry the intent of being anything other than truthful. However, you can see how vulnerable perceived trust is when disputes about motives can surface so quickly as a result of a design choice. This is especially the case within the pressured environment of a newsroom, where you have only a single opportunity to publish a work to a huge and widespread audience. Contrast this setting with a graphic published within an organisation, which can be withdrawn and reissued far more easily.

Trust Applies Throughout the Process

Trustworthiness is a pursuit that should guide all your decisions, not just the design ones. As you will see in the next chapter, the visualisation design workflow involves a process with many decision junctions – many paths down which you could pursue different legitimate options. Obviously, design is the most visible result of your decision making, but you need to create and demonstrate complete integrity in the choices made across the entire workflow process. Here is an overview of some of the key matters where trust must be at the forefront of your concern.

‘My main goal is to represent information accurately and in proper context. This spans from data reporting and number crunching to designing human-centered, intuitive and clear visualizations. This is my sole approach, although it is always evolving.’ Kennedy Elliott, Graphics Editor, The Washington Post

Formulating your brief: As mentioned in the discussion about the ‘Gun Deaths in Florida’ graphic, working with potentially emotive subject matter heightens the importance of demonstrating trust. Rightly or wrongly, your topic will be more exposed to the baggage of prejudicial opinion, and trust will be precarious. As you will learn in Chapter 3, part of the thinking involved in ‘formulating your brief’ concerns defining your audience, considering your subject and establishing your early thoughts about the purpose of your work and what you are hoping to achieve. Certain contexts will lend themselves to exploiting the emotive qualities of your subject and/or data, but many others will not. Misjudge these contextual factors, especially the nature of your audience’s needs, and you will jeopardise the trustworthiness of your solution. As I have shown, matters of trust are often outside of your immediate influence: cynicism, prejudice or suspicion held by viewers through their beliefs or opinions is hard to combat or accommodate. In general, people feel comfortable with visualisations that communicate data in a way that fits with their world view. That said, many are at times open to having their beliefs challenged by data and evidence presented through a visualisation. The platform and location in which your work is published (e.g. website or source location) will also influence trust. Visualisations encountered in already-distrusted media will create obstacles that are hard to overcome.

Working with data: As soon as you begin working with data, you have a great responsibility to be faithful to this raw material. To be transparent to your audience, consider sharing as much relevant information as possible about how you have handled the data being presented to them:

How was it collected: from where and using what criteria?
What calculations or modifications have you applied to it? Explain your approach.
Have you made any significant assumptions or observed any special counting rules that may not be common?
Have you removed or excluded any data?
How representative is it? What biases may exist that could distort interpretations?

Editorial thinking: Even with the purest of intent, your role as the curator of your data and the creator of its portrayal introduces subjectivity. When you choose to do one thing, you are often choosing not to do something else. The choice to focus on analysis that shows how values have changed over time is also a decision not to show the same data from other viewpoints, such as how it looks on a map. A decision to impose criteria on your analysis, like setting date parameters or minimum value thresholds in order to reduce clutter, might be sensible and indeed legitimate, but it is still a subjective choice.

‘Data and data sets are not objective; they are creations of human design. Hidden biases in both the collection and analysis stages present considerable risks [in terms of inference].’ Kate Crawford, Principal Researcher at Microsoft Research NYC

Data representation: A fundamental tenet of data visualisation is never to deceive the receiver. Avoiding possible misunderstandings, inaccuracies, confusions and distortions is of primary concern. Many features of visualisation design can lead to varying degrees of deception, whether intended or not. Here are a few for now; each will be picked up in more detail later:

The size of geometric areas can sometimes be miscalculated, causing the quantitative values they represent to be perceived disproportionately.
When data is represented in 3D, on the majority of occasions this amounts to nothing more than distracting – and distorting – decoration. 3D should only be used when there are legitimately three dimensions of data variables being displayed and the viewer is able to change his or her point of view to see different 2D perspectives.
The bar chart value axis should never be ‘truncated’ – the origin value should always be zero – otherwise the bar size judgements will be distorted.
The aspect ratio (height vs width) of a line chart’s display is influential because it affects the perceived steepness of the connecting lines, which are key to reading trends over time. Too narrow and the steepness will be exaggerated; too wide and the steepness is dampened.
When portraying spatial analysis through a thematic map representation, there are many different mapping projections to choose from as the underlying apparatus for presenting and orienting the geographical position of the data. There are many different approaches to flattening the spherical globe into a two-dimensional map form. The mathematical treatment applied can significantly alter the perceived size or shape of regions, potentially distorting their perception.
Sometimes charts are used in a way that is effectively corrupt, like using pie charts for percentages that add up to more, or less, than 100%.

Data presentation: The main rule here is: if it looks significant, it should be; otherwise you are either misleading or creating unnecessary obstacles for your viewer. The undermining of trust can also be caused by what you fail to provide or decline to explain:

Restricted or non-functioning features of interactivity.
Absent annotations, such as introductions/guides, axis titles and labels, footnotes and data sources, that fail to inform the reader of what is going on.
Inconsistent or inappropriate colour usage, without explanation.
Confusing or inaccessible layouts.

Thoroughness in delivering trust extends to the faith you create through reliability and consistency in the functional experience, especially for interactive projects. Does the solution work and, specifically, does it work in the way it promises to?
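Several of the data representation pitfalls listed above come down to simple arithmetic. The following is an illustrative Python sketch, not a method from this book. The function names are hypothetical, the values are invented, and the 0.5 percentage-point rounding tolerance for pie charts is an assumption. It compares what the viewer sees with what the data says in four cases:

- bars drawn on a truncated axis;
- circles sized by radius instead of area;
- the same trend line drawn at two aspect ratios;
- pie slices that do not sum to 100%.

```python
import math


def bar_length_ratio(a, b, baseline=0.0):
    """Ratio of the drawn lengths of two bars whose value axis starts at `baseline`.

    With baseline=0 this equals the true ratio b / a; a truncated axis
    (baseline > 0) exaggerates the apparent difference.
    """
    return (b - baseline) / (a - baseline)


def circle_area_ratio(a, b, value_sets_radius=False):
    """Ratio of the drawn areas of two circles encoding values a and b.

    Correct encoding: area proportional to value, so radius = sqrt(value).
    If the value is (wrongly) mapped straight to the radius, the area ratio
    becomes (b / a) ** 2, so larger values look disproportionately big.
    """
    if value_sets_radius:
        ra, rb = a, b
    else:
        ra, rb = math.sqrt(a), math.sqrt(b)
    return (math.pi * rb ** 2) / (math.pi * ra ** 2)


def visual_slope_degrees(rise_fraction, run_fraction, height_px, width_px):
    """On-screen angle of a line segment covering `rise_fraction` of the y-axis
    range over `run_fraction` of the x-axis range, in a plot area of
    height_px by width_px. Same data, different aspect ratio, different angle."""
    return math.degrees(math.atan2(rise_fraction * height_px, run_fraction * width_px))


def pie_parts_are_valid(percentages, tolerance=0.5):
    """A pie chart only makes sense if its parts sum to (roughly) 100%.
    The tolerance is an assumed allowance for rounding."""
    return abs(sum(percentages) - 100) <= tolerance


if __name__ == "__main__":
    # Hypothetical values: 60 is 20% larger than 50.
    print(bar_length_ratio(50, 60))               # zero baseline: the true ratio, 1.2
    print(bar_length_ratio(50, 60, baseline=40))  # truncated axis: 2.0, an apparent doubling
    print(circle_area_ratio(50, 60))                          # roughly 1.2, matching the data
    print(circle_area_ratio(50, 60, value_sets_radius=True))  # roughly 1.44, overstated
    # The same trend drawn in a square plot and in a wide, shallow plot.
    print(visual_slope_degrees(1, 1, 400, 400))  # about 45 degrees
    print(visual_slope_degrees(1, 1, 200, 800))  # about 14 degrees
    print(pie_parts_are_valid([45, 35, 30]))     # False: the parts sum to 110%
```

Running it shows the patterns the list warns about:

- The truncated axis turns a 20% difference into an apparent doubling.
- Sizing circles by radius inflates the same difference to roughly 44% in area.
- An identical trend looks about 45 degrees steep in a square plot but only about 14 degrees in a wide one.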

Principle 2: Good Data Visualisation is Accessible

This second of the three principles of good visualisation design helps to inform judgements about how best to support your viewers through the process of understanding. It is informed by three of Dieter Rams’ general principles of good design:

2 Good design makes a product useful.
4 Good design makes a product understandable.
5 Good design is unobtrusive.

Reward vs Effort

The opening section of this chapter broke down the stages a viewer goes through when forming their understanding about, and from, a visualisation. This process involved a sequence of perceiving, interpreting and then comprehending. It was emphasised that a visualiser’s control over the viewer’s pursuit of understanding diminishes after each stage. The objective, as stated by the presented definition, of ‘facilitating’ understanding reflects the reality of what can be controlled. You can’t force viewers to understand, but you can smooth the way.

Facilitating understanding for an audience is about delivering accessibility. That is the essence of this principle: to remove design-related obstacles faced by your viewers when undertaking this process of understanding. Stated another way, a viewer should experience minimum friction between the act of understanding (effort) and the achieving of understanding (reward).

This ‘minimising’ of friction has to be framed by context, though. This is key. Many contextual influences will determine whether something judged inaccessible in one situation could be seen as entirely accessible in another. When people are involved, diverse needs exist. As I have already discussed, varying degrees of knowledge emerge and irrational characteristics come to the surface. You can only do so much: do not expect to get everything right in the eyes of every viewer.

‘We should pay as much attention to understanding the project’s goal in relation to its audience. This involves understanding principles of perception and cognition in addition to other relevant factors, such as culture and education levels, for example. More importantly, it means carefully matching the tasks in the representation to our audience’s needs, expectations, expertise, etc. Visualizations are human-centred projects, in that they are not universal and will not be effective for all humans uniformly. As producers of visualizations, whether devised for data exploration or communication of information, we need to take into careful consideration those on the other side of the equation, and who will face the challenges of decoding our representations.’ Isabel Meirelles, Professor, OCAD University (Toronto)

That is not to say that attempts to accommodate the needs of your audience should just be abandoned; quite the opposite. This is hard, but it is essential. Visualisation is about human-centred design, demonstrating empathy for your audiences and putting them at the heart of your decision making.

Several dimensions of definition will help you better understand your audiences: what they know, what they do not know, the circumstances surrounding their consumption of your work and their personal characteristics. Some of these you can accommodate; others you may not be able to, depending on the diversity and practicality of the requirements. Again, in the absence of perfection, optimisation is the name of the game, even if this means that sometimes the least worst is best.

The Factors Your Audiences Influence

Many of the factors presented here will come up again when you think about your project context, as covered in Chapter 3. For now, it is helpful to introduce the factors that specifically relate to this discussion about delivering accessible design.

Subject-matter appeal: This was already made clear in the earlier illustration, but it is worth noting again here: the appeal of the subject matter is a fundamental junction right at the beginning of the consumption experience. If your audiences are not interested in the subject – i.e. they are indifferent towards the topic or see no need or relevance to engage with it there and then – they are unlikely to stick around. They will probably not want to put in the effort to work through the process of understanding for something that might ultimately be irrelevant. Those to whom the subject matter is immediately appealing are significantly more likely to engage with the data visualisation right the way through.

‘Data visualization is like family photos. If you don’t know the people in the picture, the beauty of the composition won’t keep your attention.’ Zach Gemignani, CEO/Founder of Juice Analytics

Many of the ideas for this principle emerged from the Seeing Data visualisation literacy research project (seeingdata.org), on which I collaborated.

Dynamic of need: Do they need to engage with this work or is it entirely voluntary? Do they have a direct investment in having access to this information, perhaps because they need it to carry out their job?

Subject-matter knowledge: What might your audiences know and not know about this subject? What is their capacity to learn, or their potential motivation to develop their knowledge of this subject? A critical component of this issue, blending existing knowledge with the capacity to acquire knowledge, concerns the distinctions between complicated, complex, simple and simplified. This might seem to be more about the semantics of language, but it is of significant influence in data visualisation – indeed in any form of communication:

Complicated is generally a technical distinction. A subject might be difficult to understand because it requires pre-existing, probably high-level, knowledge and might be intricate in its detail. The mathematics that underpinned the Moon landings is complicated. Complicated subjects are, of course, surmountable. The knowledge and skill are acquirable, but usually only through time and effort, hard work and learning (or extraordinary talent), and usually with external assistance.
Complex is associated with problems that have no perfect conclusion or maybe even no end state. Parenting is complex; there is no rulebook for how to do it well, no definitive right or wrong, no perfect way of accomplishing it. The elements of parenting might not necessarily be complicated – cutting Emmie’s sandwiches into star shapes – but there are lots of different interrelated pressures always influencing and occasionally colliding.
Simple, for the purpose of this book, concerns a matter that is inherently easy to understand. It may be so small in dimension and scope that it is not difficult to grasp, irrespective of prior knowledge and experience.
Simplified involves transforming a problem context from either a complex or complicated initial state to a reduced form, possibly by eliminating certain details or nuances.

Understanding the differences between these terms is vital. When considering your subject matter and the nature of your analysis, you will need to assess whether your audience will be immediately able to understand what you are presenting or will have the capacity to learn how to understand it. If the subject is inherently complex or complicated, will it need to be simplified? If you are creating a graphic about taxation, will you need to strip it down to the basics, or will this process of simplification risk oversimplifying the subject? The final content may be obscured by the absence of important subtleties. Indeed, your audience may have had the capacity to work out and work with a complicated topic, but you denied them that opportunity. You might reasonably dilute or reduce a complex subject for kids, but generally my advice is: don’t underestimate the capacity of your audience. Accordingly, clarity trumps simplicity as the most salient concern in data visualisation design.

‘Strive for clarity, not simplicity. It’s easy to “dumb something down,” but extremely difficult to provide clarity while maintaining complexity. I hate the word “simplify.” In many ways, as a researcher, it is the bane of my existence. I much prefer “explain,” “clarify,” or “synthesize.” If you take the complexity out of a topic, you degrade its existence and malign its importance. Words are not your enemy. Complex thoughts are not your enemy. Confusion is. Don’t confuse your audience. Don’t talk down to them, don’t mislead them, and certainly don’t lie to them.’ Amanda Hobbs, Researcher and Visual Content Editor

What do they need to know? The million-dollar question. Perhaps the most common frustration expressed by viewers is that the visualisation ‘didn’t show them what they were most interested in’. They wanted to see how something changed over time, not how it looked on a map. If you were them, what would you want to know? This is a hard thing to second-guess with any accuracy. We will discuss it further in Chapter 5.

Unfamiliar representation: In the final chapter of this book I will cover the issue of visualisation literacy. This covers the capabilities that make the most rounded creator of visualisation work and the techniques that make the most effective consumer. Many people will perhaps be unaware of a deficit in their visualisation literacy with regard to consuming certain chart types. The bar, line and pie chart are very common and broadly familiar to all. As you will see in Chapter 6, there are many more ways of portraying data visually. This deficit in knowing how to read a new or unfamiliar chart type is not a failing on the part of the viewer; it is simply a result of their lack of prior exposure to these different methods. For visualisers, a key challenge arises when an uncommon chart is an entirely reasonable and appropriate choice, perhaps even the ‘simplest’ chart that could have been used, yet is likely to be unfamiliar to the intended viewers. Even if you support it with plenty of ‘how to read’ guidance, if a viewer is overwhelmed or simply unwilling to make the effort to learn how to read a different chart type, you have little power to overcome this.

Time: At the point of consuming a visualisation, is the viewer in a pressured situation with a lot at stake? Are viewers likely to be impatient and intolerant of the need to spend time learning how to read a display? Do they need quick insights, or is there some capacity for them to take on exploring or reading in more depth? If it is the former, the immediacy of the presented information will be a paramount requirement. If they have more time to work through the process of perceiving, interpreting and comprehending, this could be a more conducive situation for presenting complicated or complex subject matter – maybe even using different, unfamiliar chart types.

Format: In what format will your viewers need to consume your work? Are they going to need work created for a print output or a digital one? Does this need to be compatible with a small display, as on a smartphone or a tablet? If what you create is consumed away from its intended native format, such as viewing a large infographic with small text on a mobile phone, that will likely result in a frustrating experience for the viewer. However, how and where your work is consumed may be beyond your control. You can’t mitigate for every eventuality.

Personal tastes: Individual preferences towards certain colours, visual elements and interaction features will often influence (enabling or inhibiting) a viewer’s engagement. The semiotic conventions that visualisers draw upon play a part in determining whether viewers are willing to spend time and expend effort looking at a visualisation. Be aware, though, that accommodating the preferences of one person may not cascade, with similar appeal, to all, and might indeed create a rather negative reaction.

Attitude and emotion: Sometimes we are tired, in a bad mood, feeling lazy, or having a day when we are just irrational. On such days, the prospect of working through even the most intriguing and well-designed project can feel like too much. I spend my days looking at visualisations and can sympathise with the narrowing of mental bandwidth when I am tired or have had a bad day. Confidence is an extension of this. Sometimes our audiences may just not feel sufficiently equipped to embark on a visualisation if it is about an unknown subject or might push them outside their comfort zone in terms of the demands placed on their interpretation and comprehension.

The Factors You Can Influence

Flipping the coin, let’s look at the main ways we, as visualisers, can influence (positively or negatively) the accessibility of the designs created. In effect, this entire book is focused on minimising the likelihood that your solution demonstrates any of these negative attributes. Repeating the mantra from earlier, you must avoid doing anything that will cause the boat to go slower.

‘The key difference I think in producing data visualisation/infographics in the service of journalism versus other contexts (like art) is that there is always an underlying, ultimate goal: to be useful. Not just beautiful or efficient – although something can (and should!) be all of those things. But journalism presents a certain set of constraints. A journalist has to always ask the question: How can I make this more useful? How can what I am creating help someone, teach someone, show someone something new?’ Lena Groeger, Science Journalist, Designer and Developer at ProPublica

As listed at the start of this section, the design principles selected from Dieter Rams’ list collectively aim to ensure our work is useful, unobtrusive and understandable. In this case, it is more instructive to think about what not to do, focusing on the likely causes of failure against each of these aims.

Your Solution is Useless

You have failed to focus on relevant content.
It is not deep enough. You might have provided a summary-level/aggregated view of the data when the audience wanted further angles of analysis and greater depth in the details provided.
A complex subject was oversimplified.
It is not fit for the setting. You created work that required too much time to make sense of, when immediate understanding and rapid insights were needed.

Your Solution is Obtrusive

It is visually inaccessible. There is no appreciation of potential impairments like colour blindness, and the display includes clumsily ineffective interactive features.
Its format is misjudged. You were supposed to create work fit for a small-sized screen, but the solution created was too finely detailed and could not be easily read.
It has too many functions. You failed to focus and instead provided too many interactive options when the audience had no desire to put in a lot of effort interrogating and manipulating the display.

Your Solution is not Understandable

Complex subject or complex analysis. Not explained clearly enough – assumed domain expertise, such as too many acronyms, abbreviations and technical language.
Used a complex chart type. Not enough explanation of how to read the graphic or failure to consider if the audience would be capable of understanding this particular choice of chart type.
Absent annotations. Insufficient details like scales, units, descriptions, etc.

Principle 3: Good Data Visualisation is Elegant

Elegance in design is the final principle of good visualisation design. This relates closely to the essence of three more of Dieter Rams’ general principles of good design:

3 Good design is aesthetic.
8 Good design is thorough down to the last detail.
10 Good design is as little design as possible.

What is Elegant Design?

Elegant design is about seeking to achieve a visual quality that will attract your audience and sustain that sentiment throughout the experience, far beyond just the initial moments of engagement. This is presented as the third principle for good reason. Any choices you make towards achieving ‘elegance’ must not undermine the accomplishment of trustworthiness and accessibility in your design. Indeed, in pursuing the achievement of the other principles, elegance may have already arrived as a by-product of trustworthy and accessible design thinking. Conversely, the visual ‘look and feel’ of your work will be the first thing viewers encounter before experiencing the consequences of your other principle-led thinking. It therefore follows that optimising the perceived appeal of your work will have a great impact on your viewers.

The pursuit of elegance is elusive, as is its definition: what gives something an elegant quality? As we know, beauty is in the eye of the beholder, but how do we really recognise elegance when we are confronted by it?

When thinking about what the pursuit of elegance means, the kind of words that surface in my mind are adjectives like stylish, dignified, effortless and graceful. For me, they capture the timelessness of elegance, certainly more so than fancy, cool or trendy, which seem more momentary. Elegance is perhaps appreciated more when it is absent from or not entirely accomplished in a design. If something feels cumbersome, inconsistent and lacking a sense of harmony across its composition and use of colour, it is missing that key ingredient of elegance.

‘When working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong.’ Richard Buckminster Fuller, celebrated inventor and visionary

‘Complete is when something looks seamless, as if it took little effort to produce.’ Sarah Slobin, Visual Journalist

When it feels like style over substance has been at the heart of decision-making, no apparent beauty can outweigh the negatives of an obstructed or absent functional experience. While I’m loath to dwell on forcing a separation in concern between form and function, as a beginner working through the design stages and considering all your options, functional judgements will generally need to be of primary concern. However, it is imperative that you also find room for appropriate aesthetic expression. In due course your experience will lead you to fuse the two perspectives together more instinctively.

In his book The Shape of Design, designer Frank Chimero references a Shaker proverb: ‘Do not make something unless it is both necessary and useful; but if it is both, do not hesitate to make it beautiful.’ In serving the principles of trustworthy and accessible design, you will hopefully have covered both the necessary and useful. As Chimero suggests, if we have served the mind, our heart is telling us that now is the time to think about beauty.


How Do You Achieve Elegance in Design?

There are several components of design thinking that I believe directly contribute to achieving an essence of elegance.

‘“Everything must have a reason”… A principle that I learned as a graphic designer that still applies to data visualisation. In essence, everything needs to be rationalised and have a logic to why it’s in the design/visualisation, or it’s out.’ Stefanie Posavec, Information Designer

Eliminate the arbitrary: As with any creative endeavour or communication activity, editing is perhaps the most influential skill, and indeed attitude. Every single design decision you make – every dot, every pixel – should be justifiable. Nothing that remains in your work should be considered arbitrary. Even if there isn’t necessarily a scientific or theoretical basis for your choices, you should still be able to offer reasons for everything that is included and also excluded. The reasons you can offer for rejecting or removing design options are just as important as evidence of your developing eye for visualisation design.

Often you will find yourself working alone on a data visualisation project and will therefore need to demonstrate the discipline and competence to challenge yourself. Avoid going through the motions and don’t get complacent. Why present data on a map if there is nothing spatially relevant about the regional patterns? Why include slick interactive features if they really add no value to the experience? It is easy to celebrate the brilliance of your amazing ideas and become consumed by work that you have invested deeply in – both your time and emotional energy. Just don’t be stubborn or precious. If something is not working, learn to recognise when to stop pursuing it and then kill it.

Thoroughness: A dedicated visualiser should be prepared to agonise over the smallest details and want to resolve even the smallest pixel-width inaccuracies. The desire to treat your work with this level of attention demonstrates respect for your audience: you want them to be able to work with quality, so pride yourself on precision. Do not neglect checking, do not cut corners, do not avoid the non-sexy duties, and never stop wanting to do better.


Style: This is another hard thing to pin down, especially as the word itself can have different meanings for people, and particularly when it has been somewhat ‘damaged’ by the age-old complaints around something demonstrating style over substance. Developing a style – or signature, as Thomas Clever suggests – is in many ways a manifestation of elegant design. The decisions around colour selection, typography and composition are all matters that influence your style. The development of a style preserves the consistency of your strongest design values, leaving room to respond flexibly to the nuances of each different task you face. It is something that develops in time through the choices you make and the good habits you acquire.

‘You don’t get there [beauty] with cosmetics, you get there by taking care of the details, by polishing and refining what you have. This is ultimately a matter of trained taste, or what German speakers call fingerspitzengefühl (“finger-tip-feeling”)’. Oliver Reichenstein, founder of Information Architects (iA)

Many news and media organisations seek to devise their own style guides to help visualisers, graphics editors and developers navigate through the choppy waters of design thinking. This is a conscious attempt to foster consistency in approach as well as create efficiency. In these industries, the perpetual pressure of tight timescales from the relentless demands of the news cycle means that creating efficiency is of enormous value. By taking away the burden of always having to think from scratch about their choices, the visualisers in such organisations are left with more room to concern themselves with the fundamental challenge of what to show, rather than getting consumed by how to show it. The best styles will stand out as instantly recognisable: there is a reason why you can instantly pick out the work of the New York Times, National Geographic, Bloomberg, the Guardian, the Washington Post, the Financial Times, Reuters and the South China Morning Post.

Decoration should be additive, not negative: The decorative arts are historically considered to be an intersection of the useful and the beautiful, yet the term decoration, when applied to data, can often carry a negative connotation of dressing it up using superfluous devices to attract people, but without any real substance. Visual embellishments are, in moderation and when discerningly deployed, effective devices for securing visual appeal and preserving communicated value. This is especially the case when they carry a certain congruence with the subject matter or key message, such as with the use of the different ground textures in the treemap displayed in Figure 1.13. In this graphic, Vienna is reduced to an illustrative 100 m² apartment and the floor plan presents the proportional composition of the different types of space and land in the city. This is acceptable gratuitousness because the design choices are additive, not negatively obstructive or distracting.

‘I suppose one could say our work has a certain “signature”. “Style” – to me – has a negative connotation of “slapped on” to prettify something without much meaning. We don’t make it our goal to have a recognisable (visual) signature, instead to create work that truly matters and is unique. Pretty much all our projects are bespoke and have a different end result. That is one of the reasons why we are more concerned with working according to values and principles that transcend individual projects and I believe that is what makes our work recognisable.’ Thomas Clever, Co-founder CLEVER°FRANKE, a data driven experiences studio

Figure 1.13 If Vienna Would be an Apartment

Any design choices you make with the aim of enhancing appeal through novelty or fun need to support, not distract from, the core aim of facilitating understanding. Be led by your data and your audience, not your ideas. There should, though, always be room to explore ways of seeking that elusive blend of being fun, engaging and informative. The bar chart in Figure 1.14 reflects this: using Kit Kat-style fingers of chocolate for each bar and a foil wrapper background, it offers an elegant and appealing presentation that is congruent with its subject.

Figure 1.14 Asia Loses Its Sweet Tooth for Chocolate


Allow your personality to express itself in the times and places where such flair is supportive of the aims of facilitating understanding. After all, a singularity of style is a dull existence. As Groove Armada once sang: ‘If everybody looked the same, we’d get tired of looking at each other.’

Not about minimalism: As expressed by Rams’ principle ‘Good design is as little design as possible’, elegant design achieves a certain invisibility: as a viewer you should not see design, you should see content. This is not to be confused with the pursuit of minimalism, which is a brutal approach that strips away the arbitrary but then cuts deeper. In the context of visualisation, minimalism can be an unnecessarily savage and austere act that may be incongruous with some of the design options you may need to include in your work.

‘I’ve come to believe that pure beautiful visual works are somehow relevant in everyday life, because they can become a trigger to get people curious to explore the contents these visuals convey. I like the idea of making people say “oh that’s beautiful! I want to know what this is about!” I think that probably (or, at least, lots of people pointed that out to us) being Italians plays its role on this idea of “making things not only functional but beautiful”.’ Giorgia Lupi, Co-founder and Design Director at Accurat

In ‘De architectura’, a treatise on architecture written around 15 BC by Marcus Vitruvius Pollio, a Roman architect, the author declares that the essence of quality in architecture is framed by the social relevance of the work, not the eventual form or workmanship towards that form. What he is stating here is that good architecture can only be measured according to the value it brings to the people who use it. In a 1624 translation of the work, Sir Henry Wotton offers a paraphrased version of one of Vitruvius’s most enduring notions, that a ‘well building hath three conditions: firmness, commodity, and delight’, which might be interpreted today as ‘sturdy, useful, and beautiful’. One can easily translate these further to fit with these principles of good visualisation design. Trustworthy is sturdy – it is robust, reliable, and has integrity. Useful is accessible – it can be used without undue obstruction. Beautiful is elegant – it appeals and retains attraction.

1.3 Distinctions and Glossary

As in any text, consistency in the meaning of terms or language used around data visualisation is important to preserve clarity for readers. I began this chapter with a detailed breakdown of a proposed definition for the subject. There are likely to be many other terms that you are either familiar with or have heard being used. Indeed, there are significant overlaps and commonalities of thought between data visualisation and pursuits such as infographic design.

As tools and creative techniques have advanced over the past decade, the traditional boundaries between such fields have begun to blur. Consequently, the practical value of preserving dogmatic distinctions has diminished. Ultimately, the visualiser tasked with creating a visual portrayal of data is probably less concerned about whether their creation will be filed under ‘data visualisation’ or ‘infographic’ as long as it achieves the aim of helping the audience achieve understanding.

Better people than me attach different labels to different works interchangeably, perhaps reflecting the fact that these dynamic groups of activities are all pursuing similar aims and using the same raw material – data – to achieve them. Across this book you will see plenty of references to and examples of works that might not be considered data visualisation design work in the purest sense. You will certainly see plenty of examples of infographics.

The traditional subject distinctions still deserve to be recognised and respected. People are rightfully proud of identifying with a discipline they have expertise or mastery in. And so, before you step into the design workflow chapters, it is worthwhile to spend a little time establishing clarifications and definitions for some of the related fields and activities so all readers are on the same page of understanding. Additionally, there is a glossary of the terms used that will help you more immediately understand the content of later chapters. It makes sense to position those clarifications in this chapter as well.

Distinctions

Data vis: Just to start with one clarification. While the abbreviated term of data visualisation might be commonly seen as ‘data vis’ (or ‘data viz’; don’t get me started on the ‘z’ issue), and this is probably how all the cool kids on the street and those running out of characters on Twitter refer to it, I am sticking with the full Sunday name of ‘data visualisation’ or at the very least the shortened term ‘visualisation’.

Information visualisation: There are many who describe data visualisation as information visualisation and vice versa, myself included, without a great deal of thought for the possible differences. The general distinction, if there is any, tends to come down to one’s emphasis on the input material (data) or the nature of the output form (information). It is also common for information visualisation to be used as the term for work that is primarily concerned with visualising abstract data structures such as trees or graphs (networks) as well as other qualitative data (therefore focusing more on relationships than on quantities).

Infographics: The classic distinction between infographics and data visualisation concerns the format and the content. Infographics were traditionally created for print consumption, in newspapers or magazines, for example. The best infographics explain things graphically – systems, events, stories – and could reasonably be termed explanation graphics. They contain charts (visualisation elements) but may also include illustrations, photo-imagery, diagrams and text. These days, infographics continue to be produced in static form, irrespective of how and where they are published.

Over the past few years there has been an explosion in different forms of infographics. From a purist perspective, this new wave of work is generally viewed as being an inferior form of infographic design and may be better suited to terms like info-posters or tower graphics (these commonly exist with a fixed-width dimension in order to be embedded into websites and social media platforms). Often these works will be driven by marketing intent through a desire to get hits/viewers, generally at the expense of delivering any real understanding.
It is important not to dismiss entirely the evident – if superficial – value of this type of work, as demonstrated by the occasionally incredible numbers of hits received. If your motive is ‘bums on seats’ then this approach will serve you well. However, I would question the legitimacy of attaching the term infographic to these designs and I sense the popular interest in these forms is beginning to wane.

Visual analytics: Some people use this term to relate to analytical-style visualisation work, such as dashboards, that serve the role of operational decision support systems or provide instruments of business intelligence. Additionally, the term visual analytics is often used to describe the analytical reasoning and exploration of data facilitated by interactive tools. This aligns with the pursuit of exploratory data analysis that I will be touching on in Chapter 5.

Data art: Aside from the disputes over the merits of certain infographic work, data art is arguably the other discipline related to visualisation that stirs up the most debate. Those creating data art are often pursuing a different motive to pure data visualisation, but its sheer existence still manages to wind up many who perhaps reside in the more ‘purist’ visualisation camps. For data artists the raw material is still data but their goal is not driven by facilitating the kind of understanding that a data visualisation would offer. Data art is more about pursuing a form of self-expression or aesthetic exhibition using data as the paint and algorithms as the brush. As a viewer, whether you find meaning in displays of data art is entirely down to your personal experience and receptiveness to the open interpretation it invites.

Information design: Information design is a design practice concerned with the presentation of information. It is often associated with the activities of data visualisation, as it shares the underlying motive of facilitating understanding. However, in my view, information design has a much broader application concerned with the design of many different forms of visual communication, such as way-finding devices like hospital building maps, or utility bills.

Data science: As a field, data science is hard to define, so it is easier to consider it through the ingredients of the role of data scientists. They possess a broad repertoire of capabilities covering the gathering, handling and analysing of data. Typically this data is of a large size and complexity and originates from multiple sources.
Data scientists will have strong mathematical, statistical and computer science skills, not to mention astute business experience and many notable ‘softer’ skills like problem solving, communication and presentation. If you find somebody with all these skills, tie them to a desk (legally) and never ever let them leave your organisation.

Data journalism: Also known as data-driven journalism (DDJ), this concerns the increasingly recognised importance of having numerical, data and computer skills in the journalism field. In a sense it is an adaptation of data visualisation but with unquestionably deeper roots in the responsibilities of the reporter/journalist.

Scientific visualisation: This is another term used by many people for different applications. Some give exploratory data analysis the label scientific visualisation (drawing out the scientific methods for analysing and reasoning about data). Others relate it to the use of visualisation for conceiving highly complex and multivariate datasets specifically concerning matters with a scientific bent (such as modelling the functions of the brain or molecular structures).

Glossary

The precision and consistency of language in this field can get caught up in a little too much semantic debate at times, but it is important to establish early on some clarity about its usage and intent, in this book at least.

Roles and Terminology

Project: For the purpose of this book, you should consider any data visualisation creation activity to be consistent with the idea of a project. Even if what you are working on is only seen as the smallest of visualisation tasks that hardly even registers on the bullet points of a to-do list, you should consider it a project that requires the same rigorous workflow process approach.

Visualiser: This is the role I am assigning to you – the person making the visualisation. It could be more realistic to use a term like researcher, analyst, creator, practitioner, developer, storyteller or, to be a little pretentious, visualist. Designer would be particularly appropriate but I want to broaden the scope of the role beyond just the design thinking to cover all aspects of this discipline.

Viewer: This is the role assigned to the person who is viewing and/or using your visualisation product. It offers a broader and better fit than alternatives such as consumer, reader, recipient or customer.

Audience: This concerns the collective group of people for whom you intend your work. Within the audience there will be cohorts of different viewer types that you might characterise through distinct personas to help your thinking about serving the needs of target viewers.

Consuming: This will be the general act of the viewer, to consume. I will use more active descriptions like ‘reading’ and ‘using’ when consuming becomes too passive and vague, and when distinctions are needed between reading text and using interactive features.

Creating: This will be the act of the visualiser, to create. This term will be mainly used in contrast with consuming to separate the focus between the act of the visualiser and the act of the viewer.

Data Terminology

Data is: I’m sorry ‘data are’ fans, but that’s just not how normal people speak. In this book, it’s going to be ‘data is’ all the way. Unless my editor disagrees, in which case you won’t even see this passage.

Raw data: Also known as primary data, this is data that has not been subjected to statistical treatment or any other transformation to prepare it for usage. Some people have a problem with the implied ‘rawness’ this term claims, given that data will have already lost its purity having been recorded by some measurement instrument, stored, retrieved and maybe cleaned already. I understand this view, but am going to use the term regardless because I think most people will understand its intent.

Dataset: A dataset is a collection of data values upon which a visualisation is based. It is useful to think of a dataset as taking the form of a table with rows and columns, usually existing in a spreadsheet or database.

Tabulation: A table of data is based on rows and columns. The rows are the records – instances of things – and the columns are the variables – details about the things. Datasets are visualised in order to ‘see’ the size, patterns and relationships that are otherwise hard to observe. For the purpose of this book, I distinguish between types of datasets that are ‘normalised’ and others that are ‘cross-tabulated’. This distinction will be explained in context during Chapter 5.

Variables: Variables are related items of data held in a dataset that describe a characteristic of those records. It might be the names, dates of birth, genders and salaries of a department of employees. Think of variables as the different columns of values in a table, with the variable name being the descriptive label on the header row. There are different types of variables including, at a general level, quantitative (e.g. salary) and categorical (e.g. gender). A chart plots the relationship between different variables. For example, a bar chart might show the number of staff (with the size of bar showing the quantity) across different departments (one bar for each department or category).

Series: A series of values is essentially a row (or column, depending on table layout) of related values in a table. An example of a series of values would be all the highest temperatures in a city for each month of the year. Plotting this on a chart, like a line chart, would produce a line for that city’s values across the year. Another line could be added to compare temperatures for another city, thus presenting a further series of values.

Data source: This is the term used to describe the origin of data or information used to construct the analysis presented. This is an important feature of annotation that can help gain trust from viewers by showing them all they need to know about the source of the data.

Big Data: Big Data is characterised by the 3Vs – high volume (millions of rows of data), high variety (hundreds of different variables/columns) and high velocity (new data that is created rapidly and frequently, every millisecond). A database of bank transactions or an extract from a social media platform would be typical of Big Data. It is necessary to take out some of the hot air spouted about Big Data in its relationship with data visualisation. The ‘Bigness’ (one always feels obliged to include a capitalised B) of data does not fundamentally change the tasks one faces when creating a data visualisation; it just makes it a more significant prospect to work through. It broadens the range of possibilities, it requires stronger and more advanced technology resources, and it amplifies the pressures on time and resources. With more options the discipline of choice becomes of even greater significance.

Visualisation

Chart type: Charts are individual, visual representations of data. There are many ways of representing your data, using different combinations of marks, attributes, layouts and apparatus: these combinations form archetypes of charts, commonly reduced to simply chart types. There are some charts you might already be familiar with, such as the bar chart, pie chart or line chart, while others may be new to you, like the Sankey diagram, treemap or choropleth map.

Graphs, charts, plots, diagrams and maps: Traditionally the term graph has been used to describe visualisations that display network relationships and chart would be commonly used to label common devices like the bar or pie chart. Plots and diagrams are more specifically attached to special types of displays but with no pattern of consistency in their usage. All these terms are so interchangeable that useful distinction no longer exists and any energy expended in championing meaningful difference is wasted. For the purpose of this book, I will generally stick to the term chart to act as the single label to cover all visualisation forms. In some cases, this umbrella label will incorporate maps for the sake of convenience even though they clearly have a unique visual structure that is quite different from most charts. By the way, the noise you just heard is every cartographer reading this book angrily closing it shut in outrage at the sheer audacity of my lumping maps and charts together.

Graphic: The term graphic will be more apt when referring to visuals focused more on information-led explanation diagrams (infographics), whereas chart will be more concerned with data-driven visuals.

Storytelling: The term storytelling is often attached to various activities around data visualisation and is a contemporary buzzword, often applied rather loosely. It is a thing but not nearly as much a thing as some would have you believe. I will be dampening some of the noise that accompanies this term in the next chapter.

Format: This concerns the difference in output form between printed work, digital work and physical visualisation work.

Function: This concerns the difference in functionality of a visualisation, whether it is static or interactive. Interactive visualisations allow you to manipulate and interrogate a computer-based display of data. The vast majority of interactive visualisations are found on websites but increasingly might also exist within apps on tablets and smartphones.
In contrast, a static visualisation offers a single-view, non-interactive display of data, often presented in print but also digitally.

Axes: Many common chart types (such as the bar chart and line chart) have axis lines that provide reference for measuring quantitative values or assigning positions to categorical values. The horizontal axis is known as the x-axis and the vertical axis is known as the y-axis.

Scale: Scales are marks on axes that describe the range of values included in a chart. Scales are presented as intervals (10, 20, 30, etc.) representing units of measurement, such as prices, distances, years or percentages, or in keys that explain the associations between, for example, different sizes of areas or classifications of different colour attributes.

Legend: All charts employ different visual attributes, such as colours, shapes or sizes, to represent values of data. Sometimes, a legend is required to house the ‘key’ that explains what the different scales or classifications mean.

Outliers: Outliers are points of data that are outside the normal range of values. They are the unusually large or small or simply different values that stand out and generally draw attention from a viewer – either through amazement at their potential meaning or suspicion about their accuracy.

Correlation: This is a measure of the presence and extent of a mutual relationship between two or more variables of data. You would expect to see a correlation between height and weight or age and salary. Devices like scatter plots, in particular, help visually to portray possible correlations between two quantitative variables.

Summary: Defining Data Visualisation

In this chapter you have learned a definition of data visualisation: ‘The representation and presentation of data to facilitate understanding.’ The process of understanding a data visualisation involves three stages, namely:

Perceiving: what can I see?
Interpreting: what does it mean?
Comprehending: what does it mean to me?

You were also introduced to the three principles of good visualisation design:

Good data visualisation is trustworthy.
Good data visualisation is accessible.
Good data visualisation is elegant.

Finally, you were presented with an array of descriptions and explanations about some of the key terms and language used in this field and throughout the book.



2 Visualisation Workflow

Clear, effective and efficient thinking is the critical difference between a visualisation that succeeds and one that fails. You cannot expect just to land accidentally on a great solution. You have got to work for it.

In this chapter I will outline the data visualisation workflow that forms the basis of this book’s structure and content. This workflow offers a creative and analytical process that will guide you from an initial trigger that instigates the need for a visualisation through to developing your final solution.

You will learn about the importance of process thinking, breaking down the components of a visualisation design challenge into sequenced, manageable chunks. This chapter will also recommend some practical tips and good habits to ensure the workflow is most effectively adopted.

2.1 The Importance of Process

As I have already established, the emphasis of this book is on better decision making. There are so many different things to think about when creating a data visualisation, regardless of whether the output will be the simplest of charts or the most ambitious of multi-faceted digital implementations.

The decisions you will face will inevitably vary in the weight of their significance. There will be some big choices, such as defining your editorial angles and selecting the best-fit chart type, and many seemingly small ones, such as picking the precise shade of grey for a chart’s axis labels. The process of creating a visualisation generally follows the Pareto principle, whereby about 20% of the decisions made have implications for about 80% of the final visible design. However, even though some decisions will appear more significant in the final output, as visualisers we need to attend to every single decision equally, caring about detail, precision and accuracy.

To repeat, one of the main mental barriers for those new to the field to overcome is acknowledging that the pursuit of perfection in data visualisation is always unfulfilled. There are better and there are worse solutions, but there is no perfect one. Perfect exists in an artificial vacuum: it is free of pressures and has no constraints. That is not real life. There will always be forces pushing and pulling you in different directions. There may be frustrating shortcomings in the data you have to work with or limitations in your technical capabilities. As discussed, people (your audience members) introduce huge inconsistencies. They, and we, are complex, irrational and, above all, different. Accepting the absence of perfection helps us to unburden ourselves somewhat from the constant nagging sense that we missed out on discovering the perfect solution. This can prove quite liberating.

That is not to say our ambitions need to be lowered. Quite the opposite. We should still strive for the best, the optimum solution given the circumstances we face. Achieving this requires more effective and efficient decision making: we need to make better calls, more quickly. The most reliable approach to achieving this is to follow a design process.

The process undertaken in this book is structured around the following stages (Figure 2.1).

Figure 2.1 The Four Stages of the Visualisation Workflow

Here are a few observations about this process ahead of its commencement.

Pragmatic: This process aims to provide a framework for thinking rather than instructions for learning. As described in the Introduction, there are very few universal rules to rely upon. While the comfort provided by rules is what many might seek at the beginning of their learning journey, flexible pragmatism beats dogmatism in any situation. Useful rules do exist in visualisation but often relate to quite micro-level matters. I will discuss these in due course.


Reducing the randomness of your approach: The value of the process is that it guides your entry and closing points: where and how to begin your work, as well as how and when it will be finished. When you are new to data visualisation, the sheer extent of things to think about can be quite overwhelming. This workflow approach breaks down activities into a connected system of thinking that will help to organise your activities and preserve their cohesiveness. The process incrementally leads you towards developing a solution, with each stage building on the last and informing the next. The core purpose of the approach is to give you a greater sense of the options that exist at each stage and to provide you with better information with which to make your choices.

Protect experimentation: The systematic approach I am advocating in this book should not be seen as squeezing out the scope for creativity or eradicating any space for experimentation. It is natural to want to reduce wasted effort, but at the same time it is absolutely vital to seek opportunities, in the right places, for imagination to blossom. In reality, many of the projects you will work on will not necessarily rely on much creative input. Some projects will have pressures on time and a need to compromise on experimentation in favour of efficiency. Some subjects or datasets you work with are just not congruent with overt creative thinking. It is about striking a balance, affording time to those activities that will bring the right blend of value to suit each context.

‘I truly feel that experimentation (even for the sake of experimentation) is important, and I would strongly encourage it. There are infinite possibilities in diagramming and visual communication, so we have much to explore yet. I think a good rule of thumb is to never allow your design or implementation to obscure the reader understanding the central point of your piece. However, I’d even be willing to forsake this, at times, to allow for innovation and experimentation. It ends up moving us all forward, in some way or another.’ Kennedy Elliott, Graphics Editor, The Washington Post

Facilitate adaptability and iteration: This workflow is characterised as a design process rather than a procedure. A good process should facilitate adaptability and remove the inflexibility of a defined operating procedure. Although the activities are introduced and presented in this book in a linear fashion, inevitably there is much iteration that takes place. There will be times when you will have to revisit decisions, maybe even redo activities in a completely different way given what you have discovered further down the line. If you make mistakes or bad calls (and everyone does), it is important to fail gracefully but recover quickly. You will need to be able to respond to changes in circumstances and accommodate their impact fast. A good process cushions the impact of situations like this.

The first occasion, not the last: It is important to note that the tasks you face at each stage in the process will represent the first occasion you pay attention to these matters, but not the final occasion. There is something of a trickle-down effect here. Many of the concerns you face at the start of a challenge will likely continue to echo in your thoughts right through to the end. Some things are just not possible to close off that easily. Take the ongoing demands of profiling who your audiences are and what they might need: that thinking starts early and should never drop off your radar. The nature of the process gives you the best chance of keeping spinning all the plates that need to be spun, knowing which ones can be left to drop and when.

Always the same process: The range of visualisation challenges you will face in your career will vary. Even if you are producing the same work every month, no two projects will provide the same experience: just having an extra month of data means the data has a new shape or size. It is different. Some projects you work on will involve fairly simple data; others will involve hugely complex data. In some cases you will have perhaps two hours or two days to deliver a solution; in others you might have two months. The key thing is that the process you follow will always require the same activities in the same sequence, regardless of the size, speed and complexity of your challenge. The main difference is that any extremes in the circumstances you face will amplify the stresses at each stage of the process and place greater demands on thorough, effective and timely decision making.

Partitioning your mindset: Within each of the sequenced stages listed in Figure 2.1 there will be different demands on your mindset: sometimes you are thinking, sometimes you are doing, sometimes you are making. Especially when you are working alone, it is important to recognise the activities that will require different mindsets:

Thinking: The duties here will be conceptual in nature, requiring imagination and judgement, such as formulating your curiosity, defining your audience’s needs, reasoning your editorial perspectives, and making decisions about viable design choices.
Doing: These are practical tasks that will still engage the brain, obviously, but manifest themselves through more hands-on activities like sketching ideas and concepts, learning about a subject through research, and gathering and handling your data.
Making: These involve the constructive and technical activities that specifically relate to the production cycle, as you face the challenge of translating promising, well-considered design concepts into effective, working solutions.

‘You need a design eye to design, and a non-designer eye to feel what you designed. As Paul Klee said, “See with one eye, feel with the other”.’ Oliver Reichenstein, founder of Information Architects (iA)

2.2 The Process in Practice

Throughout this book I will call out key points of advice in the form of useful tips, tactics or good habits you should consider employing. Many of these have been informed by interviews with some of the brilliant people working in this field today. As you are about to commence the design workflow, here are some pieces of advice that transcend any individual stage of the process.

Managing progress and resources: Good planning, time and resource management keep a process cohesive and progressing; they are the lubricant. You will rarely have the luxury of working on a project that has no defined end date, so adhering to imposed or self-imposed timescales is especially important. It is very easy to get swallowed up by the demands of certain activities, particularly those involved in the ‘working with data’ stage. Similarly, the production cycle (which takes place during and beyond Part C), as you iterate between idea, prototype and construction, can at times appear never to have an end in sight. As one task is finished, another two always seem to appear. As you get closer to a deadline you will either sink or swim: for some the pressure of time is crippling; others thrive on the adrenaline it brings and their focus is sharpened as a result. Factoring in time for some of the broader responsibilities (clerical tasks, arranging demo meetings and Skype calls, file management and running backups) will prove hugely beneficial by the end.

Room to think: On the theme of task duration and progress, it is important to build in the capacity to think. The notion of brain ‘states’ is relevant here, in particular the ‘alpha’ state, which kicks in most commonly when we are particularly relaxed. Being in this state helps to heighten your imagination, learning and thought process. Apologies for the mental image, but I do some of my most astute thinking in the shower or bath, and just before going to sleep at night. These are the occasions when I am most likely to drift into a relaxed alpha state, and they help me to contemplate most clearly the thoughts and ideas I might have. I find train or air travel achieves the same, as does lying on a beach. Unfortunately, in the latter scenario I just don’t care enough about work in that moment to note down my frequent genius ideas (what do you mean ‘which ones’?). If I have a task that will take two days of my time but the deadline is further away, I typically spread that time across four days of activity in smaller clusters of three to four hours. This creates sufficient opportunities for my brain to tick over during the intervening gaps and hopefully allows good ideas to ferment.

Heuristics to support decisions: As I have discussed, there will be occasions when the best choice does not present itself, when time is pressing and when you will need to make a call. You might occasionally have to rely on heuristic techniques that help to speed up your decision making at certain stages. Although this might seem an unsatisfactory tactic, given the previously stated need to eliminate arbitrary choices, heuristics can remain consistent with this desire when they rely on educated, intuitive or common-sense judgements. As you develop your experience, such heuristic judgements will become increasingly reliable to fall back on when the need arises.

Pen and paper: The humble pairing of pen and paper will prove to be a real ally throughout your process. I will not over-sentimentally claim this is the most important tool combination because, unless you are producing artisan hand-drawn work, you will have other technical tools that would probably rise up the importance list. However, the point here is that capturing ideas and creating sketches are a critical part of your process. Do not rely on your memory; if you have a great idea, sketch it down. This activity is never about artistic beauty. It does not need you to be an artist; it just needs you to get things out of your head and onto paper, particularly if you are collaborating with others. If you are fortunate enough to be so competent with a given tool that you find it more natural than pen and paper for ‘sketching’ ideas quickly, then this is of course absolutely fine, as long as it is indeed the quickest medium.

Note-taking: Whether this is via pen and paper, in Word or in a Google Doc, note-taking is a vital habit to get into. This is about preserving records of important details such as:

information about the sources of data you are using;
calculations or manipulations you have applied to your data;
assumptions you have made;
data descriptions, particularly if explanations have been offered to you verbally by somebody who knows the data well;
questions you have yet to get answers to;
the answers you did get to your questions;
terminology, abbreviations and codes: things whose meanings and associations in your data you need to remember;
task lists and wish lists of features or ideas you would like to consider pursuing;
issues or problems you can foresee;
websites or magazines you saw that gave you a bit of inspiration;
ideas you have had or rejected.

Note-taking is easier said than done, and I am among the least natural note-takers to roam this Earth, but I have forced it into becoming a habit, and a valuable one at that.

‘Because I speak the language of data, I can talk pretty efficiently with the experts who made it. It doesn’t take them long, even if the subject is new to me, for them to tell me any important caveats or trends. I also think that’s because I approach that conversation as a journalist, where I’m mostly there to listen. I find if you listen, people talk. (It sounds so obvious but it is so important.) I find if you ask an insightful question, something that makes them say “oh, that’s a good point,” the whole conversation opens up. Now you’re both on the same side, trying to get this great data to the public in an understandable way.’ Katie Peek, Data Visualization Designer and Science Journalist

Communication: Communication is a two-way activity. Firstly, it is about listening to stakeholders (clients or colleagues) and your audience: what do they want, what do they expect, what ideas do they have? In particular, what knowledge do they have about your subject? Secondly, communication is about speaking to others: presenting ideas, updating on progress, seeking feedback, sharing your thoughts about possible solutions, and promoting and selling your work (regardless of the setting, you will need to do this). If you do not know the intimate details of your subject matter you will need to locate others who do: find smart people who know the subject better than you or find smart people who do not know the subject but are just smart. You cannot avoid the demands of communicating, so do not hide behind your laptop; get out there and speak and listen to people who can help you.

Research: Connected to the need for good communication is the importance of research. This is an activity that will exist as a constant, running along the spine of your process thinking. You cannot know everything about your subject, about the meaning of your data, about the relevant and irrelevant qualities it possesses. As you will see later, data itself can only tell us so much; often it just tells us where interesting things might exist, not what actually explains why they are interesting.

‘Research is key. Data, without interpretation, is just a jumble of words and numbers – out of context and devoid of meaning. If done well, research not only provides a solid foundation upon which to build your graphic/visualisation, but also acts as a source of inspiration and a guidebook for creativity. A good researcher must be a team player with the ability to think critically, analytically, and creatively. They should be a proactive problem solver, identifying potential pitfalls and providing various roadmaps for overcoming them. In short, their inclusion should amplify, not restrain, the talents of others.’ Amanda Hobbs, Researcher and Visual Content Editor

Attention to detail: Like note-taking, this might not come naturally to some, but it is so important. You cannot afford to carry any errors in your work. Start every project with that commitment. This is such an important ingredient in securing trust in your work. The process you are about to learn is greatly influenced by the concept of the ‘aggregation of marginal gains’. You need to sweat the small stuff. Even if many of your decisions seem small and inconsequential, they deserve your full attention and merit being done right, always. You should take pride in the fine detail of your design thinking, so embrace the need for checking and testing. If you are so immersed in your work that you become blind to it, get others to help: call on those same smart people you identified above. As someone who once published a graphic stating Iran’s population was around 80 billion and not 80 million, I know how one tiny mistake can cause the integrity of an entire project to crumble to the ground. You do not get a second chance at a first impression, somebody once said. I forget who; I wasn’t paying attention …

Make it work for you: The only way you will truly find out whether a process works for you is to practise it, relentlessly. As I have stated, every project will be different, even if only in small ways. However, if you just cannot get the approach presented in this book to fit your personality or purpose, modify it. We are all different. Do not feel like I am imposing this single approach. Take it as a proposed framework based on what has worked for me in the past. Bend it, stretch it, and make it work. As you become more experienced (and confident through having experienced many different types of challenges), the many duties involved in data visualisation design will become second nature, by which time you will probably no longer even be aware of following a process.

Be honest with yourself: Seeking feedback, editing and choosing not to do certain things are disciplines of the effective visualiser. Honesty with yourself is vital, especially as you are often working on a solo project but need so many different skill sets and mindsets. As I mentioned in the last section, preciousness or stubbornness that starts to impede quality becomes destructive. Being blind to things that are not working, or not taking on board constructive feedback just because you have invested so much time in something, will prove to be the larger burden. Do not be afraid to kill things when they are not working.

Learn: Reflective learning is about looking back over your work, examining the output and evaluating your approach. What did you do well? What would you do differently? How well did you manage your time? Did you make the best decisions you could given the constraints that existed? Beyond private reflections, some of the best material about data visualisation on the Web comes from people sharing narratives about their design processes. Read how other people undertake their challenges. Maybe share your own? You will find you truly learn about something when you find the space to write about it and explain it to others. Write up your projects and present your work to others; doing so will force you to ask ‘why did I do what I did?’.

‘No work is ever wasted. If it’s not working, let go and move on – it’ll come back around to be useful later.’ Emma Coats, freelance Film Director, formerly of Pixar

Summary: Data Visualisation Workflow

In this chapter you were introduced to the design workflow, which involves four key stages:

1. Formulating your brief: planning, defining and initiating your project.
2. Working with data: going through the mechanics of gathering, handling and preparing your data.
3. Establishing your editorial thinking: defining what you will show your audience.
4. Developing your design solution: considering all the design options and beginning the production cycle.

Undertaking the activities in this workflow requires you to partition your mindset:

Thinking: conceptual tasks, decision making.
Doing: practical undertakings like sketching, visually examining data.
Making: technical duties like analysing data, constructing the solution.

Finally, you were presented with some general tips and tactics ahead of putting the process into practice:

This will be the first time you think about each of the stages and activities, not the last: visualisation design is as much about plate-spinning management as anything else.
The importance of good project management to manage progress and resources cannot be over-emphasised.
Create room to think: clear thinking helps with efficiency of effort.
Pen and paper will prove to be one of your key tools.
Note-taking is a habit worth developing.
Communication is a two-way relationship: it is speaking and listening.
Attention to detail is an obligation: the integrity of your work is paramount.
Make the workflow work for you: practise and adapt the approach to suit you.
Be honest with yourself: do not be precious, and have the discipline not to do things, to kill ideas and to avoid scope creep.


Part B The Hidden Thinking


3 Formulating Your Brief

In Chapter 2 you learnt about the importance of process, taking on data visualisation challenges using a design workflow to help you make good decisions. This third chapter initiates the process by formulating your brief.

The essence of this stage is to identify the context in which your work will be undertaken and then define its aims: it is the who, what, why, where, when and how. It can be as formal (and shared with others) or as informal an activity as you need to make it.

The first contextual task will be to consider why you are producing this data visualisation: what is its raison d’être? To answer this, you will need to define what triggered it (the origin curiosity) and what it is aiming to accomplish (the destination purpose). Recognising that no visualisation projects are ever entirely free of constraints or limitations, you will also need to identify the circumstances surrounding the project that will shape the scope and nature of the work you’re about to undertake.

Following these contextual definitions you will briefly switch your attention to consider a vision for your work. With the origin and intended destination mapped out, you will be able to form an initial idea about the best-fit type of design solution. You will be introduced to the purpose map, which provides a landscape of all the different types of visualisation you could pursue, helping you to establish an early sense of what you should pursue. To wrap up the chapter, you will allocate some time to harnessing the instinctive thoughts you might have had about the ideas, images, keywords and inspirations that you feel could play a role in your work.

Collectively, this work will provide you with a solid foundation from which to inform all the subsequent stages of your workflow.

3.1 What is a Brief?

In its simplest form a brief represents a set of expectations and captures all the relevant information about a task or project. It is commonly associated with the parlance of project management or graphic design, but in data visualisation the need to establish clarity about the definitions and requirements of a project is just as relevant. This is about establishing the context of, and vision for, your work.

When you are working with clients or colleagues it will be in the interests of all parties to have a mutual understanding of the project’s requirements and some agreement over the key deliverables. In such situations you may have already been issued with some form of initial brief from these stakeholders. This could be as informal as an emailed or verbal request or as formal as a template-based briefing document. Irrespective of what has been issued, you will get more value from compiling your own briefing document to ensure you have sufficient information to plan your upcoming work.

If you are not working for or with others (essentially pursuing work that you have initiated yourself), you clearly will not have been issued with any brief, but, once again, it will be to your advantage to compile a brief for yourself. This does not have to be an overly burdensome or bureaucratic task. I use a simple checklist that is not only practically lightweight but also comprehensively helpful, comprising a series of question prompts that I either answer myself or raise with the stakeholders with whom I am working.

For some beginners, this stage can feel somewhat frustrating. On the surface it sounds like a clerical prospect when what you really want is just to get on with the good stuff, like playing with the data and focusing on creativity. However, understanding contextual matters before anything else is too valuable a practice to neglect. All the decisions that follow in this workflow will be shaped around the definitions you establish now. There may be changes, but you will reap the benefits of gaining as much early clarity as possible.

3.2 Establishing Your Project’s Context

Defining Your Origin Curiosity

A worthwhile data visualisation project should commence from the starting point of a curiosity. According to the dictionary definition, curiosity is about possessing ‘a strong desire to know or learn something’. This aligns perfectly with the goal of data visualisation, defined in Chapter 1 as being to facilitate understanding. By establishing a clear sense of where your project originated in curiosity terms, the primary force that shapes your decision making will be the desire to respond effectively to this expressed intrigue.

‘Be curious. Everyone claims she or he is curious, nobody wants to say “no, I am completely ‘uncurious’, I don’t want to know about the world”. What I mean is that, if you want to work in data visualisation, you need to be relentlessly and systematically curious. You should try to get interested in anything and everything that comes your way. Also, you need to understand that curiosity is not just about your interests being triggered. Curiosity also involves pursuing those interests like a hound. Being truly curious involves a lot of hard work, devoting time and effort to learn as much as possible about various topics, and to make connections between them. Curiosity is not something that just comes naturally. It can be taught, and it can be learned. So my recommendation is: develop your curiosity, educate yourself – don’t just wait for the world to come to you with good ideas. Pursue them.’ Professor Alberto Cairo, Knight Chair in Visual Journalism, University of Miami, and Visualisation Specialist

A visualisation process that lacks an initially articulated curiosity can lead to a very aimless solution. After all, what is it you are solving? What deficit in people’s understanding are you trying to address? Having the benefit of even just a broad motive can help you tremendously in navigating the myriad options you face.

The nature of the curiosity that surrounds your work will vary depending on where it originated and who it is serving. Consider these five scenarios, whose characteristics differ sufficiently to offer different contextual challenges:

Personal intrigue: ‘I wonder what …’
Stakeholder intrigue: ‘He/she needs to know …’
Audience intrigue: ‘They will need to know …’
Anticipated intrigue: ‘They might be interested in knowing …’
Potential intrigue: ‘There might be something interesting …’


Let’s work through an illustration of each of these scenarios to explain their differences and influences.

Firstly, there are situations where a project is instigated in response to a curiosity born out of personal intrigue. An example of this type of situation can be found in the case-study project that I have published as a digital companion to this book (book.visualisingdata.com) to help demonstrate the workflow process in practice. The project is titled ‘Filmographics’ and concerns the ebb and flow of the careers of different movie stars. You can find out more about it by visiting the book’s digital resources.

I pursued this particular project because, firstly, I have a passion for movies and, secondly, I had a particular curiosity about the emergence, re-emergence and apparent disappearance of certain actors. Expressed as a question, the core curiosity that triggered this project was: ‘What is the pattern of success or failure in the movie careers of a selection of notable actors?’

This initial question provided me with immediate clarity: the goal of the visualisation would be to deliver an ‘answer’ to this question, to help me better understand how the career patterns look for the different actors selected. In this case I am the originator of the curiosity and I am pursuing this project for my own interest.

Ultimately, whether this initially defined curiosity remains the same throughout the process does not really matter. Quite often one’s initial expression of curiosity shifts considerably once data has been gathered and analysed. When more research is carried out on the subject matter you become more roundly acquainted with the relevance (or otherwise) of the trigger enquiry. You might alter your pursuit when you realise there is something different, and potentially more interesting, to explore. You do not want to be anchored to an enquiry that no longer reflects the most relevant perspective, but it does at least offer a clear starting point, an initial motive, from which the process begins.

Sometimes, the nature of the motivation for a personal intrigue-based curiosity is recognition of one’s ignorance about an aspect of a subject that should be known (a deficit in ‘available’ understanding) more than a defined interest in a subject that may not be known (possibly creating new understanding).


Let’s consider another scenario, still concerning movie-related subject matter, but to explain a different type of curiosity. Suppose you work for a movie studio and have been tasked by a casting director to compile a one-off report that will profile which actors are potentially the best option to cast in a major sci-fi movie that has just been given the green light to begin production. You have certain criteria to follow: they have to be female, aged 30-45, and must fit the description of a ‘rising’ star. They must not have been in other sci-fi movies, nor can they have any of the ‘baggage’ that comes with being associated with huge flops. Their fees should be under $2 million. You go away, undertake the analysis, and compile a report showing the career paths of some of the most likely stars who fit the bill.

This scenario has not come about through your own personal curiosity; instead, you are responding to the specific curiosity of the casting director. In undertaking this work you effectively inherit (take on) the curiosity of others. They have briefed you to find the data, analyse it, and then present the findings to them. This would be an example of curiosity born out of stakeholder intrigue: work commissioned by a stakeholder who is also the target audience (or is the prominent party among the intended audience). There is no anticipation of interest here; rather, it is known.

For the third scenario, you might work for a business involved in the analysis and commentary of the state of the movie industry. Let’s imagine your company specialises in producing a dashboard that is shared with a broad group of users comprising Hollywood executives, studio senior management and casting agents, among others. The dashboard profiles all aspects of the industry, covering current trends and the career fortunes of a wide range of different actors, helping users to identify who is hot, who is not, who is emerging, who is declining, who will cost what, who scores well with different audiences, etc.

The various indicators of information you are compiling and presenting on the dashboard are based on the recognised needs of the professional curiosities these people (client users) will have about this subject matter (movie career statuses). Given the diverse permutations of the different measures included, not all the information provided will be of interest all the time to all who consume it, but it is provided and available as and when they do need it. This would be an example of curiosity born out of audience intrigue – shaped out of a combination of knowing what will be needed and reasonably anticipating what could be needed.

What you are working towards in situations like this is ensuring that all the relevant aspects of possible curiosity can be brought together in a single place to serve as many needs as possible. There are similarities here with the multitude of dials, displays and indicators in the cockpit of an aircraft. The pilot does not need all that information as an immediate priority all the time, but may need access to some of it in a reactive sense should the situation arise. Additionally, this scenario may be typical of a varied and larger-scale audience, in contrast to the more bespoke nature of a stakeholder intrigue scenario. You will rarely, if ever, be able to serve 100% of the audience’s potential needs, but you can certainly aspire to do your best.

Consider another similar scenario, but with a different setting used to illustrate a more subtle distinction. Suppose I am a graphics editor working for a newspaper. One of the topics of current attention might concern the relatively late-career breakthrough of a certain actor, who has almost overnight moved from roles in relatively modest TV shows to starring in cinematic blockbusters. It is decided by the assignments editor that I will work on a graphic that examines the fortunes of this actor’s career alongside a selection of other actors, to provide contrast or to draw comparisons.

On this occasion the trigger does not necessarily emerge from personal intrigue. I have essentially been issued with the requirements: even if I agree with the idea or had a similar thought myself, the organisational relationship decrees that others will instruct me about which tasks to work on. A stakeholder – the assignments editor (or others in the editorial hierarchy) – has determined that this is a topic of interest and worth exploring. However, in contrast to the stakeholder intrigue scenario, here the stakeholder is not the intended audience. It is not necessarily even a curiosity they have themselves. The motive for this work is likely driven by the machinations of current affairs: what is newsworthy and likely to be of some interest to readers? Therefore, the belief is that this analysis (looking at this actor’s career path compared with others) is aligned to the current entertainment news agenda.

‘The best piece of advice, which is “always be curious,” came from Steve Duenes, who has led the graphics team at the New York Times for more than a decade. Being curious covers the essence of journalism: question everything, never make assumptions, dig. You can’t make great visualizations without great information, so make sure your reporting leads you to visual stories that are interesting, surprising, significant.’ Hannah Fairfield, Sr. Graphics Editor, The New York Times

This would be an example of curiosity born out of anticipated intrigue. The audience has not explicitly asked for this and does not necessarily need it. However, it is perceived to be relevant in the context of the news cycle, and informed judgement has been used to anticipate that there will be sufficient interest in this topic among the target audience. Sometimes you will work on projects where you almost have to imagine or assume what appetite exists among an audience rather than just respond to an expressed need.

Most of the projects I work on are driven by stakeholders asking me to create a visualisation to communicate understanding to others (not necessarily them), as per the ‘audience’ and ‘anticipated’ intrigue scenarios. The secondary role of the ‘filmographics’ project, which I defined as emerging from and serving a personal intrigue, will also be to pique the interest of other movie fans. Again, this is based on anticipated intrigue more than known audience intrigue.

The final scenario of curiosity goes back to our role as an individual. Let’s say I am interested in data visualisation and also interested in movies, and I discover a clean dataset full of rich content about movies and actors. This sounds like a compelling opportunity to do something with it because I am convinced there will be some nuggets of insight locked inside. I might not yet have determined a specific curiosity as my entry point, but I will be able to establish this later once I have had a closer look at what potential the data offers.

This would be a situation where curiosity is born out of potential intrigue – potential because I just do not know explicitly what it will be yet. Sometimes, in your subject of study or in the workplace, perhaps if you work with collections of survey results or findings from an experiment, you might find yourself with the opportunity to explore a dataset without any real prior sense of exactly what it is you are looking to get out of it. You are initially unclear about the precise angle of your enquiry but you will explore the data to acquaint yourself fully with its qualities and generally research the subject. From there you should have a better idea of a more specific curiosity you might pursue. In effect, this scenario would then switch into more personal intrigue (if it remains just for yourself) or anticipated intrigue (if you might share it with others).

This final scenario is the only one in which data becomes available and accessible before you have articulated a specific curiosity. In all the other scenarios outlined, the data you need will typically be sought as a response to the curiosity. Even in this last scenario about potential intrigue, data itself does not just fall from the sky and into your lap(top). The very fact that you have a dataset to work with is because somebody else, at an earlier moment in time, was interested in measuring an activity, recording it, and making the collected data available. That in itself could only have arisen from their own curiosity.

The potential intrigue type of curiosity might also extend to situations where you simply have a desire to practise your visualisation skills, experimenting and trying out new techniques with some sample data. In this scenario the incentive is more to learn from a new experience of working through a visualisation process, and it may not have the same drivers as a project with definable audiences.

Why do these different scenarios of curiosity have such an important role to play? Firstly, they provide clarity about the angles of analysis that you might be pursuing. As you will see later, even in the smallest and seemingly simplest dataset, there are many possibilities for conducting different types of analysis. The burden of choosing is somewhat eased by knowing in advance what might be the most interesting and relevant analysis to focus on. Secondly, the different scenarios described all present slightly different characteristics in the dynamics of the people involved. Who are the stakeholders and what is their interest? Who are the intended recipients – the audience – and what is their interest? As you have already seen – and will keep seeing – the involvement of people creates powerful forces, good and bad, that shape your visualisation thinking. You therefore need to know from the outset how those forces might materialise.

Identifying Your Project’s Circumstances

Defining your project’s circumstances involves identifying all the requirements and restrictions that are inherited by you, imposed on you or determined by you. These are the different pressure points that establish what you can or cannot pursue and what you should or should not pursue. Much of this contextual thinking is therefore associated with the aim of ambition management.

There are so many hidden variables and influences in a visualisation project that the end viewer never gets to see and often does not appreciate. It is natural for them to assess a project through the lens of an idealised context free of restriction, but there are always limitations, external influences and project-specific factors that affect the shape of the final work.

When starting a project you will find that not all the circumstances that could have an influence on your work will prove to be as identifiable, definable or fixed as you might like. Some things change. Some things can only be recognised once you’ve become a little more acquainted with the nature of your task. As I stated in the previous chapter, doing this activity now is only the first occasion you will be paying attention to these matters, not the last. Of course, the more you can define, the clearer the basis for your subsequent decisions. There are other stages where you can work with uncertainty but, ideally, not here. At this stage your work needs as much focus as possible.

‘In order to design a tool, we must make our best efforts to understand the larger social and physical context within which it is intended to function.’ Bill Buxton, Computer Scientist, Designer and Author, Sketching User Experiences

‘Context is key. You’ll hear that the most important quality of a visualisation is graphical honesty, or storytelling value, or facilitation of “insights”. The truth is, all of these things (and others) are the most important quality, but in different times and places. There is no singular function of visualisation; what’s important shifts with the constraints of your audience, goals, tools, expertise, and data and time available.’ Scott Murray, Designer

Some factors may not be relevant, or may have no predefined restrictions or set requirements. For example, you might not have any format restrictions (print vs digital, large size vs small size) to contend with, in which case it is entirely up to you how the work evolves. Identifying that no format restrictions exist is as valuable as knowing when they do. It gives you control. You might decide there is merit in imposing a restriction yourself: you might appreciate the focus that comes from deciding that your target output will be a printed, poster-sized display.

People

Stakeholders: In project situations where you have been requested/commissioned to do a visualisation by somebody else, it is helpful to establish an understanding of all the different players and their involvement. Defining stakeholders will help you anticipate what sort of experience you are going to go through, how enjoyable and smooth it might be, or how much friction and what obstacles might be involved.

For starters, who is the ultimate customer? This might not be the person who has directly commissioned you, nor might it be somebody belonging to the intended audience, but rather someone who has influence over the final work. They may not be decision makers, rather decision approvers. They are the people from whom you await the thumbs up. Stakeholders will have an influence on when work is of sufficient quality, in their eyes, to declare it as being on the right path or, ultimately, to signal the completion of the project. In my world, when I am working as a contracted design consultant, they determine when I get paid. As you have seen, stakeholders might also be the people from whom the origin curiosity emerged, so they will be especially invested in what you are able to produce.

Other stakeholders might have a smaller involvement or influence. Their role may be a positive one – offering advice and assistance with a specific domain challenge – or, in a minority of cases, a negative one – hindering progress by influencing design decisions beyond their remit and capability. In this case they become interferers. We don’t like interferers because they make life unnecessarily harder (especially, strangely enough, if they are nice people). A primary contact person, who will act as the liaison between parties, will be another important role to identify.

If there are no stakeholders and the project is a solo pursuit, there will be much more flexibility for you to dictate matters. You might even be more motivated to go ‘above and beyond’ if you are driven by a personal intrigue. Conversely, there will be fewer channels of guidance and support. This is not to say that one situation is better than the other; they are simply different, and this difference needs to be recognised early.

Audience: What are the characteristics of your viewers? Several different attributes were defined in discussing the principle of ‘accessible design’ in Chapter 1. You are primarily trying to understand their relationship with the subject matter. How informed are they about a subject and what motivation might they have towards it – is it a passing interest or a definable need? What capacity might they have to make sense of the type of visualisations you may need to create (their graphical literacy)? How could their personal traits influence your design choices? You should never let the spinning plate of concern for your audience drop.

Constraints

Pressures: The primary pressure relates to timescales: how much time have you got to work through the full process and publish a completed solution? The difference in potential ambition between a project needed in two days and one needed in two months is clear. However, the real issue is the relationship between timescales and the estimated duration of your work. Two months might sound great, but not if you have three months’ work to accomplish. Estimating project duration to any reliable degree is a difficult task. You need experience from working on a diverse range of projects to inform your expectations for how long each constituent task could take. Even then, seemingly similar projects can end up with very different task durations as a result of the slightest changes in certain circumstances, such as the inclusion of an extra variable of data, or of more significant changes, like a previously print-only project requiring a bespoke digital interactive solution as well.

In addition to project timescales, you will need to be aware of any other milestones that might have to be met. Work that you are producing for other stakeholders will often require you to present your ideas/progress at various stages. This is a good thing. It gives you the opportunity to check that you are in sync or to discover whether you have misunderstood certain needs. Note that it can be risky to present under-developed concepts to potentially inexperienced stakeholders who may not be able to extend their imagination to envision how the work will look when completed.

Other pressures may exist in tangible terms through financial restrictions. What time can you afford to spend? This is not just associated with freelancing or studio work; it can be the same for research groups, which have finite resources and need to use their time – and their costs – sensibly. It might also have an influence on occasions where you need to outsource parts of your work (e.g. paying for transcription services, third-party data sources) or make purchases (software, hardware, licences for photograph usage).

‘What is the LEAST this can be? What is the minimum result that will 1) be factually accurate, 2) present the core concepts of this story in a way that a general audience will understand, and 3) be readable on a variety of screen sizes (desktop, mobile, etc.)? And then I judge what else can be done based on the time I have. Certainly, when we’re down to the wire it’s no time to introduce complex new features that require lots of testing and could potentially break other, working features.’ Alyson Hurt, News Graphics Editor, NPR, on dealing with timescale pressures

Always note down your task durations so you can refine your estimates on future projects. These estimates are not just valuable for client work; you will need them to manage your own time regardless of the nature of the project.

The final pressure is slightly less tangible and comes in the form of what might be described as market influences. Sometimes you will find your work is competing for attention alongside other work. In this age of plenty, a desire to emulate the best or differentiate from the rest can prove to be a strong motive. For example, if you are working for a charitable organisation, how do you get your message across more loudly and more prominently than others? If you are working on an academic research project, how do you get your findings heard among all the other studies also looking to create an impact? It might be the internal dynamics within a student group or organisation or the broader competition across entire marketplaces and industries but, regardless, considerations like this introduce an extra ingredient to shape your thinking.

Rules: These are relatively straightforward matters to define and are concerned with any design rules you need to know about and follow. These might be issues around:

Layout/size restrictions: Maximum size and specific shape restrictions might apply to graphics created for articles published in journals, or to the screen dimensions of digital outputs that need to work on a tablet or smartphone. Are there printing resolution requirements around dpi (dots per inch)? The commonly used industry standard for printing is 300 dpi.

Style guidelines: In many organisations (and with some media) there are visual identity branding guides imposed on you that determine the colours, typeface and possibly logos that you need to include. If possible, try to push back on these because they can be unnecessarily restrictive and the choices imposed are often horribly ill-suited to data visualisation. Otherwise, you will have to abide by the style requirements dictated to you. Also check whether you will need to include any logos. They may take up valuable space, and you’ll need to think about their impact on the balance of your overall colour palette and composition.

Functional restrictions: The potential requirement to create outputs that are compatible with certain browsers, versions of software or programming languages will be an important consideration to establish early.
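Because print resolution is expressed in dots per inch, it can help to convert a target print size into the pixel dimensions your artwork will need. The short Python sketch below is one way to do this arithmetic; it assumes the 300 dpi standard mentioned above and page sizes given in millimetres, and the function name `print_pixels` is purely illustrative rather than part of any tool.

```python
# Millimetres per inch (exact by definition).
MM_PER_INCH = 25.4


def print_pixels(width_mm: float, height_mm: float, dpi: int = 300) -> tuple[int, int]:
    """Return the (width, height) in pixels needed to print at the given size and dpi.

    Illustrative sketch: assumes the artwork fills the page exactly,
    so bleed and margins are ignored.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    # A4 paper is 210 mm x 297 mm.
    print(print_pixels(210, 297))  # (2480, 3508)
```

For an A4 page this works out at roughly 2480 × 3508 pixels; a poster-sized output scales up in proportion, which is worth knowing before you commit to a raster (rather than vector) workflow.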

Consumption

Frequency: The issue of frequency concerns how often a particular project will be repeated and what its lifespan will be. It might be a regular (e.g. monthly report) or irregular (e.g. election polling graphic updated after each new release) product, in which case the efficiency and reproducibility of your data and design choices will be paramount. If it is a one-off, you will be free of this concern but you will have to weigh up the cost–benefit involved. Will there be any future benefits from reusing the techniques and thinking you put into this project? Can you afford to invest time and energy, for example, in programmatically automating certain parts of the creation process, or will this ultimately be wasted if it is never reused? What is the trade-off between the amount of work to create it and the expiry of its relevance as time goes by – will it very quickly become out of date as new data ‘happens’? Maybe it is a one-off project in creation terms but is to be constantly fed by real-time data updates, in which case the primary concern will be functional robustness.

Setting: This concerns the situation in which your work will be consumed. Firstly, this involves judging whether the work is going to be consumed remotely or presented in person (in which case the key insights and explanations can be verbalised). Secondly, does the nature of the engagement need to facilitate especially rapid understanding, or does it lend itself to a more extended/prolonged engagement?

‘I like to imagine that I have a person sitting in front of me, and I need to explain something interesting or important about this data to them, and I’ve only got about 10 seconds to do it. What can I say, or show them, that will keep them from standing up and walking away?’ Bill Rapp, Data Visualisation Designer, discussing an audience scenario setting he conceives in his mind’s eye

I keep four characteristic settings in mind when thinking about the situations in which my work will be consumed by viewers:

The boardroom: A setting characterised by limited time, patience or tolerance for what might be perceived as any delay in facilitating understanding: immediate insights required, key messages at a glance.

The coffee shop: A more relaxed setting that might be compatible with a piece of work that is more involving, requiring viewers to spend more time learning about the subject, familiarising themselves with how to read the display and discovering the (likely) many different parts of the content.

The cockpit: The situation that relates to the instrumentation nature of a visualisation tool or dashboard. There is a need for immediate signals to stand out at a glance whilst also offering sufficient breadth and depth to serve the likely multitude of different potential interests. Another example might be the usage of a reference map that works on all levels of enquiry, from at-a-glance, high-level orientation through to in-depth detail to aid the operational needs of navigation.

The prop: Here a visualisation plays the role of a supporting visual device to accompany a presenter’s verbal facilitation of the key understandings (via a talk) or an author’s written account of salient findings (report, article).

Deliverables

Quantity: This concerns establishing the likely workload of the project in terms of quantities. How many things am I making? How much, what type, what shape and what size? Is it going to involve a broad array of different angles of analysis or a much narrower and more focused view of the data? What are the basic quantities of the outputs? Is it, for example, going to be about producing 12 different graphics for a varied slide deck, or a 50-page report that will need two charts for each of the 20 questions in a survey and some further summaries? Perhaps it’s a web-based project with four distinct sections, each requiring four interactively adjustable views of data. It will not always be possible to determine such dimensions this early on in the process, but even a rough estimate can be helpful, especially for informing your estimate of the project’s likely duration.

Format: This concerns the output format: digital, print or physical. You will need a clear understanding of the specific format of the deliverables required to factor in how your design work will be affected:

Is it intended as a large poster-sized print or something for a standard A4-sized report?

Will it exist as a website, a video, maybe even a tool or app?

Is the digital output intended for smartphone, tablet (which ones?) as well as desktop? What ppi (pixels per inch) or resolution will it ideally need to work with?

Are you handing over to your stakeholder just the final design work or will you also be expected to provide all the background files that contributed to the final piece of work?

‘I love, love, love print. I feel there is something so special about having the texture and weight of paper be the canvas of the visualisation. It’s a privilege to be able to design for print these days, so take advantage of the strengths that paper offers – mainly, resolution and texture. Print has a lot more real estate than screen, allowing for very dense, information packed visualisations. I love to take this opportunity to build in multiple story strands, and let the reader explore on their own. The texture of paper can also play a role in enhancing the visualisation; consider how a design and colour choices might be different on a glossy magazine page versus the rougher surface of a newspaper.’ Jane Pong, Data Visualisation Designer, loves print (I think)

Resources

Skills: What capabilities exist among those who will have a role to play in the design process? This might just be you, in which case what can you do and what can’t you do? What are you good at and not good at? If you have collaborators, what blend of competencies do you collectively bring to the table? How might you allocate different roles and duties to optimise the use of your resources? To help assess your capabilities, and possibly those across any team you are part of, consider the breakdown presented in the ‘Seven hats’ section of Chapter 11.

Technology: As I have described already, there are myriad tools, applications and programming options in the data visualisation space, offering an array of different capabilities. No single package offers everything you will ever need but, inevitably, some offer more and others less. To complete the more advanced visualisation projects you will likely require a Swiss-Army-knife approach, involving a repertoire of different technology options at each of the preparatory and development stages in this process. The software and technological infrastructure you have access to will have a great influence on framing the ambitions of your work. I will be sharing more information about tools in the digital resources that accompany this book.

‘The thing is, this world, especially the digital data visualization world, is changing rapidly: new technologies, new tools and frameworks are being developed constantly. So, you need to be able to adapt. But principles are much more timeless. If you know what you want to create, then using technology is just the means to create what you have in mind. If you’re too fixed on one type of technology, you may be out of a job soon. So, keep learning new technologies, but more importantly, know your principles, as they will allow you to make the right decisions.’ Jan Willem Tulp, Data Experience Designer

A final point to make about circumstances is to recognise the value, in many cases, of limitations and constraints. Often such restrictions can prove to be a positive influence. Consider the circumstances faced by director Steven Spielberg while filming Jaws. The early attempts to create a convincing-looking shark model proved to be so flawed that for much of the film’s scheduled production Spielberg was left without a visible shark to work with. Time was running out to such an extent that he could not afford to wait for a solution before filming the action sequences, so he had to work with a combination of props and visual devices. Objects being disrupted, like floating barrels or buoys and, famously, a mock shark fin piercing the surface, were just some of the tactics he used to suggest a shark rather than actually show one. Eventually, a viable shark model was developed to serve the latter scenes but, as we all now know, because the shark could not be shown for most of the film, the suspense was immeasurably heightened. This made it one of the most enduring films of its generation. The innovation that necessarily emerged from the limited resources and increasing pressure led to a solution that surely surpassed whatever outcome might have emerged had there been no restrictions. The key message here is to embrace the constraints you face, because they can heighten your creative senses and lead to successful, innovative solutions.

Defining Your Project’s Purpose

Identifying the curiosity that motivates your work establishes the project’s origin. The circumstances you have just considered will give you a sense of the different factors that will influence your experience on the project and shape your ambitions. The final component of contextual thinking is to consider your intended destination. What is it you specifically hope to accomplish with your visualisation? This involves articulating your project’s purpose.

You now know that the overriding goal, which is non-negotiable, is to facilitate understanding, but the nature of this understanding may vary significantly. In Chapter 1, I described how – as viewers – we go through a process of understanding involving the stages of perceiving, interpreting and, finally, comprehending. The first stage, perceiving, is largely controlled by the accessibility of the visualiser’s design choices. The second stage, interpreting (establishing meaning from a visualisation), will be influenced by the viewer’s capacity to derive meaning or by the visualiser providing explanatory assistance to help the viewer form this meaning. The final stage, comprehending, is largely determined by the viewer alone, as what something means to them is so uniquely shaped by their personal context: what they know or do not know, what their beliefs are and what their intentions are for acquiring this understanding.

This three-stage model of understanding helps demonstrate the importance of defining the purpose of a visualisation upfront. Some visualisations might aim to be quite impactful, attempting to shock or inspire viewers in order to persuade them of a need to change behaviour or make significant decisions. For example, you might be seeking to demonstrate visually compelling evidence of the impact of dietary factors like sugary drinks on the rise of obesity. The purpose might not just be to inform but actively to make a difference, perhaps by targeting parents to change the foods they allow their kids to eat. To achieve this kind of outcome you might take a more emotive approach in the portrayal of your data, to attract the audience’s attention in the first place and then drive home the powerful message in a way that resonates more deeply. Affecting people to this degree can be quite ambitious.

In a different context, you might not need to go this deep. Some projects may be more modestly designed to enlighten or simply inform viewers better about a subject, even if the acquired understanding is quite small. There might be recognition that the target viewers should (and maybe are better placed to) reach their own conclusions. If you were presenting the same type of dietary data to health professionals rather than to parents, you might only be confirming what they already know or at least suspect. They probably will not need convincing about the importance of the message, so the ambitions of the visualisation itself will be considerably different. Achieving the purpose of this project would likely call for a very different design approach from the one in the previous scenario.

One size does not fit all. No single type of visualisation can deliver an experience that facilitates every flavour of understanding. Articulating your purpose is your statement of intent: a necessary sense of focus to help inform your design choices and a potential measure to determine whether you accomplish your aims.

Defining your purpose before establishing your trigger curiosity is putting the cart before the horse. Curiosity is the purest basis on which to begin a visualisation project, and a curiosity-led project is the one most likely to be guided by the clearest thinking. It is the approach that fits best with the sequence of thinking outlined in this workflow. When the desired purpose drives decisions, visualisers can be overly focused on outputs and not inputs. As you will see in Chapter 4, you need to let your data do the talking, not force the data to do your talking. The pressure to reach the desired destination can artificially distort the data, editorial and design decisions you make.

If you are working with colleagues or for clients who express their requirements from the perspective of an outcome- or purpose-led process, your skill as a visualiser will be to direct the discussions towards a more curiosity-led perspective. Sometimes you will find stakeholders who are primarily motivated by a desire to reach many viewers, and whose singular measure of success is the quantity of eyeballs that will peruse a piece of work. However, I would argue that this is not a viable motive for an effective data visualisation, where the measure of success is facilitating understanding first and foremost. Loads of visitors and social media hits (likes, retweets, upvotes) are a wonderful bonus but should only be seen as a by-product of interest, not an indicator of effectiveness in and of itself. Those who seek a viral success story rarely achieve it because it is so hard to manufacture.

3.3 Establishing Your Project’s Vision

The ‘Purpose’ Map

In compiling definitions about the curiosity, circumstances and purpose, you have helped to initiate your process with a clear idea of the origin of your work, its likely desired destination, and some of the most influential factors you will have to contend with along the way.

To supplement this contextual thinking you should take the opportunity to consider forming an initial vision for your work. The definition of vision is ‘the ability to think about or plan the future with imagination or wisdom’ and it has particular relevance for how we might foresee achieving the purpose we have stated.

There are many types of visualisation with many different characteristics. Two of the most significant concern the differences in tone and experience. Reflecting the diversity of visualisation work being produced, the ‘purpose map’ (Figure 3.1) offers a high-level view of this landscape, shaped by different relationships across those two dimensions.

Figure 3.1 The ‘Purpose Map’

Certain types of visualisation will offer a better fit for your project. Their characteristics, in terms of experience and tone, will offer the right blend to best connect your origin curiosity with your destination purpose. What you need to consider here is which type of visualisation you can envision being most capable of accomplishing what you intend.

While the more detailed design thinking won’t arrive until later in the workflow, even at this early stage it is instructive to put some thought into this matter. Let me explain in turn the meaning of each of these dimensions and the significance of the different regions within this purpose map.

Experience

The horizontal dimension of this map concerns the experience of the visualisation:

How will it practically operate as a means of communication?
Through what functional experience will understanding be achieved by the viewer?

Along this spectrum are three different states against which you may define your intentions: Explanatory, Exhibitory or Exploratory (for a mnemonic, think about this as being all about the EXs).

Explanatory visualisations are found on the left side of the map. Explanatory in this context essentially means we – as visualisers – will provide the viewer with a visual portrayal of the subject’s data and will also take some responsibility to bring key insights to the surface, rather than leave the prospect of interpreting the meaning of the information entirely to the viewer. The visualiser here is attempting to assist with the viewers’ process of understanding as much as possible, in particular with the interpretation, drawing out the meaning of the data.

The rightmost side of this explanatory region of the map (in the second column, more towards the middle of the map) might be considered the ‘mildest’ form of explanatory visualisation. Here you find projects that include simple annotation devices like value labels or visual guides that direct the eye to help with the task of interpreting the data: the use of colour can be an immediate visual cue to help separate different features of a chart, and captions might outline a key message or summary finding. An example of this kind of explanatory visualisation is seen in Figure 3.2, which was published in an article reporting on protests across US schools (in November 2015) regarding the underrepresentation of black students. Here you can see a scatter plot comparing the share of enrolled black students at different public research universities (on the vertical axis) with the share of the college-age black population in their respective states (on the horizontal axis). With protests beginning at the University of Missouri, the chart uses red to highlight this data point to enable comparison with other schools. Other notable schools are emphasised to draw out some of the main insights. Additionally, using encoded overlays, such as the trend line and the dotted line indicating proportional representation, the viewer is assisted beyond just perceiving the data to help them with the stage of interpretation: what does it mean to be higher or lower on this chart? Which locations are considered good/bad or typical/atypical?

Figure 3.2 Mizzou’s Racial Gap Is Typical On College Campuses

The best way to get your head around ‘explanatory’ visualisations is to consider how you would explain this display of analysis to a viewer if you were sitting with that person in front of a screen or with a printout. What features would you be pointing out to them? Which values would you be pointing to as being the most interesting? What things would you not need to explain? A good explanatory visualisation will accommodate these types of important descriptions, which would otherwise be verbalised, within the design of the chart itself, allowing it to stand alone without the need for in-person explanation.

Towards the leftmost region of the map, the experience generally involves more intensive attempts to enlighten an audience’s understanding of a subject. This could be through the use of a narrative structured around a compelling sequence of information and/or a dramatic experience. This type of work typically takes the form of videos or presentations, or maybe an animated or motion graphic. Some term this ‘narrative’ visualisation. This is arguably where the most tangible demonstrations of visualisation through storytelling (more on this later) are found. An example that typifies this region of the map is the very powerful and popular video (Figure 3.3) about the issue of wealth inequality in the USA. It employs a semi-animated slideshow sequence to weave together the narrative and is accompanied by an effective and affective voiceover narrating the story.

Figure 3.3 Image taken from ‘Wealth Inequality in America’

Across all explanatory visualisations the visualiser will require sufficient knowledge (or the skill and capacity to acquire this) about the topic being shown in order to identify the most relevant, interesting and worthwhile insights to present to the viewer. Creating explanatory visualisations forces you to challenge how well you actually know a subject. If you cannot explain or articulate what is insightful, and why, to others, then this probably means you do not know the reasons yourself.

Fundamentally, explanatory visualisations are the best-fit solution if the specific context dictates that saying nothing is not good enough; leaving viewers with a ‘so what?’ reaction would be seen as a failure, so in such cases one or more takeaway messages would need to be offered.

Exploratory visualisations differ from explanatory visualisations in that they are focused more on helping the viewer or – more specifically in this case – the user find their own insights. Almost universally, these types of works will be digital and interactive in nature. The ‘mildest’ forms of exploratory works are those that facilitate interrogation and manipulation of the data. You might be able to modify a view of the chart, perhaps by highlighting/filtering certain categories of interest, or maybe change data parameters and switch between different views. You might be able to hover over different features to reveal detailed annotations. All of these operations facilitate understanding as far as the perceiving stage. The tasks of interpreting and comprehending will largely be the responsibility of the viewer. This will be suitable if the intended audience have the necessary foundation knowledge of the subject and sufficient interest to translate the general and personal meaning.

An example of this type of visualisation can be seen in the interactive project shown in Figure 3.4. It was developed to allow users to explore different measures concerning the dimensional changes of wood, over time, across selected cities of the world.

Figure 3.4 Dimensional Changes in Wood

There are no captions, no indications of what is significant or insignificant, no assistance to form meaning through the use of colours or markers to emphasise ‘good’ or ‘bad’ values. This project is simply a visual window into the analysis of this data that lets users perceive the data values and interact with the different dimensions offered. To form meaning, it is up to them to determine what features of the data resonate with their existing interests, knowledge and needs.

As you look more towards the rightmost edge of the purpose map you reach far deeper exploratory experiences. You might characterise visualisations here as facilitating a more participatory or contributory experience. The prospect of greater control, a deeper array of features and the possibility of contributing one’s own data to a visualisation can be very seductive. Users are naturally drawn to challenges like quizzes and projects that allow them to make sense of their place in the world (e.g. how does my salary compare with others? how well do I know the area where I live?) – they are simply too hard to resist! The huge success of the New York Times’ so-called ‘Dialect map’ (Figure 3.5), showing the similarity or otherwise of US dialects based on users’ responses to 25 questions, is just one example of a contemporary project employing this participatory approach to great effect.

Figure 3.5 How Y’all, Youse and You Guys Talk

The biggest obstacle to the impact of an exploratory visualisation is the ‘so what?’ factor. ‘What do you want me to do with this project? Why is it relevant? What am I supposed to get out of this?’ If these are the reactions you see expressed by your intended users then there is a clear disconnect between the intentions of your project and the experience (or maybe expectations) of the audiences using it.

Exhibitory visualisations form the third and final ‘experience’ category along the horizontal dimension of the purpose map. They are characterised by being neither explicitly explanatory nor exploratory. With exhibitory visualisations the viewers have to do the work to interpret meaning, relying on their own capacity to make sense of the display of data (to perceive it) and the context of the subject matter. As well as lacking explanatory qualities, they also do not offer scope for exploratory interrogation. I generally describe them as simply being visual displays of data. Think of this term in relation to exhibiting an artwork: it takes the interpretative capacity of the viewer to be able to understand the content of a display as well as the context of a display.

When you look across the many different visualisations being published you will find that many projects mistakenly fall into the void of being exhibitory visualisations when they really need to be more supportively explanatory or functionally exploratory.

So you might wonder what the value is of an exhibitory visualisation. Well, sometimes the setting for a visualisation does not need exploration or direct explanation. As I’ve stated, exhibitory projects rely entirely on, and make assumptions about, the capacity of and interest among the target audience. If you have a very specific audience whom you know to be sufficiently knowledgeable about the domain and the analysis you have provided, it might not be necessary to surface important insights in the way you would with an explanatory visualisation. An explanatory project will mainly be for audiences who do not have the knowledge, capacity or time to find the key features of meaning (through interpreting) for themselves. Furthermore, the extent of the analysis might be so narrow that there is no fundamental need to incorporate ways of manipulating and personalising the experience as you would see with exploratory visualisations.

Figure 3.6 Spotlight on Profitability

In Figure 3.6, the analysis of the three most profitable movies by genre and year is not interactive (and so does not enable any exploration), nor does it bring to the surface any observations about notable movies or conclusions about the relationship between movie ratings and takings. It is intended as an exhibitory experience – a visual display of this data – that lets you as a user draw your own conclusions, find your own shapes of interest, and look up the movies that you want to see data for.

An exhibitory visualisation might also be a graphic that supports a written article or report. In and of itself it does not explain things in a stand-alone sense but instead exists as a visual prop for referencing. The written passages will therefore provide the explanatory narrative, separate from but still drawing on the supporting graphic.

I mentioned earlier the scenario of sitting down with someone and explaining a chart to them from a printout or a screen. As I said, the key points verbalised in this setting would, for an explanatory piece, be directly incorporated within the graphic. Conversely, I might use an exhibitory visualisation in a presentation where my narrative, observations and gestures provide the explanatory experience – I perform these myself, in person – rather than these being incorporated within or around the chart(s). This would define the visualisation as exhibitory but presented in an explanatory setting. Two of the most famous visualisation-based presentations, Al Gore’s presentation in An Inconvenient Truth and Hans Rosling’s ‘Gapminder TED talk’, are excellent demonstrations of this.

One could argue that the Rosling talk was an explanatory presentation of an exploratory tool, but some of the main narrative was delivered against a more exhibitory animation of data.

Tone

The vertical dimension of the purpose map concerns the intended tone of the visualisation, with reading tone positioned towards the top and feeling tone towards the bottom. Whereas the experience dimension had two distinct and opposite sides (Explanatory vs Exploratory) with a pivot in the middle (Exhibitory), the tone dimension is much more of a continuum with subtle – and very subjective – variations between the two ends. What you are largely considering here is a judgement of the most suitable perceptual readability of your data.

Whereas the difference between types of experience can be quite distinct once you become familiar with the characteristics of each, defining tone is a slightly harder matter to nail down, especially as a beginner. The general question you are asking yourself is: through what tone of voice in my design will the purpose of this project be accomplished? Let me elaborate by looking closely at the two ends of this continuum.

Reading tone: At the top of the purpose map the tone of your visualisation design choices will be geared towards optimising the ease with which viewers can accurately estimate the magnitude of and relationships between values. There is emphasis on the efficiency of perceiving data. The reading tone would be your best-fit approach when the purpose of your work requires you to facilitate understanding with a high degree of precision and detail. This would also be relevant in situations when there is no need to seduce an audience through your aesthetic treatment. Furthermore, it suits the needs well when the subject matter does not inherently embody or merit any form of visual stimulation to convey the essence of the message more potently. The visual quality created with this tone might be considered rather utilitarian, formed around a style that feels consistent with adjectives like analytical, pragmatic, maybe even no-frills.

Devices like bar charts, as you can see in Figure 3.7, are the poster boys for this type of display. As you will learn later, the perceptual accuracy enabled by using the size of a bar to represent quantitative values makes these charts extremely effective options for visually portraying data in a way that aids both general sense-making and accurate point-reading. That’s why they are so ubiquitous.

Most of the visualisations you will ever produce will lean towards this reading end of the tonal continuum. Indeed, you might ask why you would ever seek to create anything but the most easily and accurately readable representations of data. Surely anything that compromises on this aim is undermining the principles of trustworthy and accessible design? Well, that’s why the definitions around purpose are so significant in their influence and why we need to appreciate other perspectives.

Figure 3.7 Countries with the Most Land Neighbours

‘There’s a strand of the data viz world that argues everything could be a bar chart. That’s possibly true but also possibly a world without joy.’ Amanda Cox, Editor, The Upshot

Feeling tone: The lower end of this vertical dimension offers a contrasting tone of voice to that of reading. When I introduced in Chapter 1 the sequence of understanding – from perceiving to interpreting and through to comprehending – the illustration I gave was based on perceiving a bar chart. Here you could easily and confidently estimate the values portrayed by the bar sizes. Sometimes, though, your aims will not necessarily fit with the singular desire to provide such a perceptually precise display: sometimes you might justify placing greater importance on the feeling of your data. At this side of the tonal spectrum there is more emphasis placed on determining the gist of the big, medium and small values and a general sense of the relationships that exist. Sometimes an ‘at-a-glance’, high-level view is the most suitable way to portray a subject’s values.

Again, let me address the likely objections from those spitting their coffee at the very thought of any visualiser not giving the utmost priority to precision, efficiency and accuracy in their work. To appreciate why, on occasion, you might consider a different approach, it is worth reflecting again on the motive for visualising data. Visual forms of data unquestionably offer a more revealing and more efficient way to understand the quantities and relationships that exist within data. This cannot reasonably be achieved as effectively or efficiently through non-visual forms. By visualising data you are looking for something more and something different from what, let’s say, a table of data can offer. The bar chart, by way of example, offers that. However, on occasion you might need something different again.

In the project illustrated in Figure 3.8, you will see excerpts from an analysis of the small number of families who have the most financial clout when it comes to providing funding for presidential candidates. The data quantities are portrayed using Monopoly house pieces as a metaphor of wealth. The red houses represent the small number of families who have contributed nearly half of the initial campaign funding. The green pieces represent the total number of households in the US. You cannot count the pieces, you cannot even remotely estimate their relative proportions, but you get the gist of the scales involved as a proxy illustration of the remarkably disproportionate balance and power of wealth. Furthermore, this use of the Monopoly pieces is a symbolically strong metaphor as well as offering an appealing, almost playful approach to portraying the data.

Figure 3.8 Buying Power: The Families Funding the 2016 Presidential Election

There will be times when you will need to consider employing what might be described as aesthetic seduction: some way of creating an appealing form that attracts viewers and encourages them to engage with a subject they might not have otherwise found relevant. This could involve a novel visual ‘look’ that attracts – but also informs – or a functional feature that attracts – but also performs. The influence of fun should not be underestimated here. I repeat, we are all humans with occasionally quite base needs. Sometimes viewers crave something that stirs a more upbeat and upfront emotional engagement.

Some may argue that viewers will be encouraged to engage with a visualisation if it is relevant to them, regardless of its appearance; otherwise they should not be considered part of the target audience. That is not true, unfortunately. Perhaps in a business or operational setting the needs of individuals, roles and groups are much more clear-cut and you can apply a binary perspective like that quite easily. Outside, in the real world, there are many more nuances. A viewer’s interest in a subject may not materialise until after they have engaged with a visualisation. It may be a consequence, not the prerequisite. Had they not been somehow attracted to view it in the first place they might never have reached that point.

‘I love the idea of Edward Tufte’s assertion that “Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.” But I found that when I developed magazine graphics according to that philosophy, they were most often met with a yawn. The reality is that Scientific American isn’t required reading. We need to engage readers, as well as inform them. I try to do that in an elegant, and refined, and smart manner. To that end, I avoid illustrative details that distort the core concept. But I’m happy to include them if the topic could benefit from a welcoming gesture.’ Jen Christiansen, Graphics Editor at Scientific American

‘On the one hand we had this great idea of doing something fun – animated lifts racing up and down buildings while the user was on the web page. But on the other hand this is The Financial Times and that carries with it a responsibility to do things in a certain way. So we spent time illustrating and designing to give the graphic high production values, and it was then presented alongside an excellent piece of journalism from our manufacturing correspondent. The result? An undeniably fun user experience, but delivered in such a way that met FT subscribers’ standards for high quality visuals and high quality journalism.’ John Burn-Murdoch, Senior Data Visualisation Journalist at Financial Times, having fun visualising the speed of elevators in skyscrapers

On a similar note, sometimes you will be working with a subject – like wealth inequality, as we’ve just seen, or gun crime, as discussed earlier – that has the potential to stir strong emotions. Any visualisation of this data has to contend with decisions about how to handle the perpetual baggage of feeling that comes as standard. Depending on the purpose of your work there might be good reason to encapsulate and perhaps exploit these emotions through your visualisation in a way that arguably a bar chart simply may not be able to achieve. By embodying an emotional sensation (fear, shock, fun, power, inequity) through your display, you might be able to influence how your viewer experiences that most elusive stage of understanding, comprehending. The task of reasoning ‘what does this mean to me?’ is often a somewhat intangible notion, but far less so when an emotional chord is struck. Behaviours can be changed. The making of decisions can be stirred. The taking of actions can be expedited. So long as the audience’s needs, interest and setting are aligned, this can be an entirely suitable strategy.

For some in the visualisation field, this can be seen as manipulation and, to a certain degree, it probably is. As long as you are still faithful to the underlying data and you have not achieved an outcome through superficial, artificial or deceptive means, I believe it is an entirely appropriate motive in the right circumstances. As ever, there is a balance to be struck and you must remind yourself of the influence of the design principles I introduced earlier to ensure that none of the choices you make hinder the overall goal of facilitating the type of understanding your context decrees.

It is important to note that any visualisation work that leans more towards ‘feeling’ is typically the exception and in a minority. However, a skilled visualisation practitioner needs to have an adaptive view. They need to be able to recognise and respond to those occasions when the purpose does support an exceptional approach and a compromise beyond just serving the most perceptually accurate and efficient reading of data is required.

The Purpose Map in Practice

A simple illustration of the role of the purpose map involves momentarily focusing on a rather grave subject: data about offender executions. In 2013, the State of Texas reached the unenviable milestone of having executed its 500th death-row prisoner since the resumption of capital punishment in 1982. At the time of this landmark I came across a dataset curated by the Texas Department of Criminal Justice and published on its website. This simply structured table of data (Figure 3.9) included striking information about the offenders, their offences and their final statements – a genuinely compelling source of data. Thinking about this subject and the dataset helps to frame the essence of what role this purpose map can play, especially in the tone dimension.

Since the milestone of 500 executions in 2013 the number has grown significantly. For the purpose of this illustration, we will consider the data as it stood at the moment of this milestone.

Figure 3.9 Image taken from Texas Department of Criminal Justice Website

Imagine viewing this data from a high vantage point, like in a hot-air balloon. The big picture is that there are 500 prisoners who have been executed. That is the whole. Lowering the viewpoint, as you get a little closer, you might see a breakdown of race, showing 225 offenders were white, 187 black, 86 Hispanic and 2 defined as other. Lower still and you see that 4 offenders originated from Anderson County. Lower again, you see that 112 offenders referred to God in their last statement. Down at the lowest level – the closest vantage point – you see individuals and individual items of data, such as Charles Milton, convicted in Tarrant County, who was aged 34 when executed on 25 June 1985.

The view of the data has travelled from a figurative perspective to a non-figurative one. The former is an abstraction of the data that effectively suppresses the fact that the underlying phenomena are about people and translates – and maybe reduces – them into statistical quantities. People into numbers. The latter perspective concerns a more literal and realistic expression of what the data actually represents.
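To make the ‘people into numbers’ abstraction concrete, here is a minimal Python sketch that takes the race breakdown quoted above (the snapshot at the 500th execution in 2013) and reduces it one step further, into shares of the whole. Only the four counts come from the text; the variable names are illustrative assumptions, not part of any real dataset or tool.

```python
from collections import Counter

# Race breakdown of the 500 executed offenders, as quoted in the text (2013 snapshot).
race_counts = Counter({"White": 225, "Black": 187, "Hispanic": 86, "Other": 2})

# The 'whole' seen from the highest vantage point.
total = sum(race_counts.values())

# A further abstraction: each group expressed as a proportion of the whole.
shares = {race: count / total for race, count in race_counts.items()}

# most_common() lists the groups from largest to smallest count.
for race, count in race_counts.most_common():
    print(f"{race:<9}{count:>4}  {shares[race]:.1%}")
```

Running it prints each group with its count and its share of the total (for example, white offenders at 45.0%). Each step up in aggregation discards more of the individual detail that the lowest vantage point preserves, which is exactly the trade-off between the figurative and non-figurative perspectives.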

‘Data is a simplification – an abstraction – of the real world. So when you visualize data, you visualize an abstraction of the world.’ Dr Nathan Yau, Statistician and Author of Data Points

Going back to the discussion about judging tone, there are several different potential ways of portraying this executed offenders data depending on the purpose that has been defined.

Suppose you worked at the Texas Department of Criminal Justice as a member of staff responsible for conducting and reporting data analysis. You might be asked to analyse the resource implications of all offenders currently on death row, looking at issues around their cost ‘footprint’. In this case you might seek to strip away all the emotive qualities of the data and focus only on its statistical attributes. You would likely aim for a figurative or abstracted representation of the subject, reducing it to fundamental statistical quantities and high-level relationships. Your approach to achieve this would probably fit with the upper end of the tonal dimension, portraying your work with a utilitarian style that facilitates an efficient and precise reading of the data.

A different scenario might involve your doing some visual work for a campaign group with a pro-capital-punishment stance. The approach might be to demonise the individuals, putting a human face to the offenders and their offences. The motive is to evoke sensation, shock and anger to get people to support this cause. Would a bar chart breakdown of the key statistics accomplish this in tone? Possibly not.

Another situation could see you working for a newspaper that had a particularly liberal viewpoint and was looking to publish a graphic to mark this sober milestone of 500 executions. You might avoid using the stern imagery of the offenders’ mug shots and instead focus on some of the human sentiments expressed in their last statements or on case studies of some of the extremely young offenders for whom life was perhaps never going to follow a positive path. Humanising or demonising the individuals involved in this dataset is possible because the data offers such richness and such intimate levels of detail.

‘I have this fear that we aren’t feeling enough.’ Chris Jordan, Visual Artist and Cultural Activist

It is worth reinforcing that a figurative approach (reading) is typically what most of your work will involve and require. Only a small proportion will require a non-figurative (feeling) approach, even with emotive subjects. The whole point of introducing you to the alternative perspective of the feeling tone is to prepare you for those occasions when the desired purpose of your work requires more of a higher-level grasp of data values or a deeper connection with subject matter through its data.

To complete this discussion, here are some final points to make about the purpose map to further clarify and frame its scope.

Format: Firstly, it is important to stress that this map does not define format in terms of print, digital or physical. Exploratory visualisations will almost always be digital, but exhibitory or explanatory projects could be print or digital.

First thoughts, not final commitment: Considering the definitions of experience and tone now simply represents the beginning of this kind of design thinking. As the workflow progresses you might change (or need to change) your mind and pursue an alternative course, especially when you get deeper into data work, the nature of which may reveal a better fit with a completely different type of solution. I will state again that in these early stages the things you think about will be considered for the first time but not the last. The benefit of starting this kind of thinking now is the increased focus it affords, allowing you to eliminate from consideration any types of visualisation that will have no relevance to your context.

Collective visual quality: Decisions around tone may not be solely isolated to how data should be represented. There may be a broader sense of overall visual mood or ‘quality’ that you are trying to convey across the presentation design choices as well. As you will see, there are other media assets (photos, videos, illustrations, text) that could go towards achieving a certain tone for the project without necessarily directly influencing the tone of the data.

Not about a singular location: Some projects will involve just a single chart, and this makes it a far more straightforward prospect to define its best-fit location on this purpose map. However, there will be other projects that you work on involving multiple chart assets, multiple interactions, different pages and deeper layers. So, when it comes to considering your initial vision through the purpose map dimensions, you may recognise separate definitions for each major element. This will become much clearer as you get deeper into the project and can actually identify the need for multiple assets.

The mantra proposed by Ben Shneiderman, one of the most esteemed academics in this field – ‘Overview first, zoom and filter, then details-on-demand’ – informs the idea of thinking about different layers of readability and depth in your visualisation work, accessed through interactivity. Some of the chart types that you will meet in Chapter 6 can only ever hope to deliver a gist of the general magnitude of values (the big, the small and the medium) and not their precise details. A treemap, for example, is never going to facilitate the detailed perceiving of values because it uses rectangular areas to represent data values and our perceptual system is generally quite poor at judging differences in area. Additionally, a treemap often comprises a breakdown of many categorical values within the same chart display, so it is very busy and densely packed. However, if you have the capability to incorporate interactive features that allow the user to enter via this first overview layer and then explore beneath the surface, maybe clicking on a shape to reveal a pop-up with precise value labels, you are opening up additional details.

In effect you have moved your viewer’s readability up the tonal spectrum, beginning with more of a general feeling of the data and then moving towards the reading of data as a result of the interactive operation. Sometimes a ‘gateway’ layer is required for your primary view, to seduce your audience or to provide a big-picture overview (feeling), and then you can let the audience move on to more perceptually precise displays of the data (reading) either through interaction or perhaps by advancing through pages in a report or slide-deck sequence.

In the Better Life Index, shown in Figure 3.10, the opening layer is based around a series of charts that look like flowers. This is attractive and intriguing, and offers a nice, single-page, at-a-glance summary. Reading the petal sizes with any degree of precision is hard, but that is not the intent of this first layer. The purpose is to strike a balance between a form that attracts the user and a function that offers a general sense of where the big, medium and small values sit within the data. Those who want to read the values with more precision are only a click away (on the flowers) from an alternative display that uses a bar chart to represent the same values.

Figure 3.10 OECD Better Life Index

Figure 3.11 Losing Ground


Increasingly there is a trend for projects to incorporate both explanatory and exploratory experiences into the same overall project – the term ‘explorable explanations’ has been coined to describe them. A project like ‘Losing Ground’ by ProPublica (Figure 3.11) is an example of this, as it moves between telling a story about the disappearing coastline of Louisiana and enabling users to interrogate and adjust their view of the data at various milestone stages in the sequence.

Harnessing Ideas

The discussions so far in this chapter have involved practical reasoning. Before you move on to the next stage of the design process, working with data, it can be valuable to briefly give yourself the opportunity to harness your instinctive imagination.

Alongside your consideration of the purpose map, the other strand of thinking about ‘vision’ concerns the earliest seeds of any ideas you may have in mind for what this solution might comprise or even look like. These might be mental manifestations of ideas you have formed yourself or that have been influenced or inspired by what you have seen elsewhere.


‘I focus on structural exploration on one side and on the reality and the landscape of opportunities in the other … I try not to impose any early ideas of what the result will look like because that will emerge from the process. In a nutshell I first activate data curiosity, client curiosity, and then visual imagination in parallel with experimentation.’ Santiago Ortiz, founder and Chief Data Officer at DrumWave, discussing the role – and timing – of forming ideas and mental concepts

There are limits to the value of ideas and also to the role they are allowed to play, as I will mention shortly, but your instincts can offer a unique perspective if you choose to allow them to surface. If you have a naturally analytical approach to visualisation this activity might seem to be the wrong way round: how can legitimate ideas be formed until the data has been explored? I understand that, and it is a step that some readers will choose not to entertain until later in the process. However, do not rule it out; see if liberating your imagination now adds value to your analytical thinking later. There are several aspects to the concept and role of harnessing ideas that I feel are valuable to consider at this primary stage:

Mental visualisation: This concerns the other meaning of visualisation and is about embracing what we instinctively ‘see’ in our mind’s eye when we consider the emerging brief for our task. In Thinking, Fast and Slow, Daniel Kahneman describes two systems of thought that control our thinking activities. He calls these System 1 and System 2 thinking: the former is responsible for our instinctive, intuitive and metaphorical thoughts; the latter is, by contrast, much more ponderous, much slower, and requires more mental effort when called upon. System 1 thinking is what you want to harness right now: what are the mental impressions that form quickly and automatically in your mind when you first think about the challenge you’re facing?

You cannot switch off System 1 thoughts. You will not be able to stop mental images forming about what your mind’s eye sees when thinking about this problem instinctively. So, rather than stifling your natural mental habits, treat this earliest stage of the workflow as the best possible opportunity to allow yourself space to begin imagining.

What colours do you see? Sometimes instinctive ideas are reflections of our culture or society, especially the connotations of colour usage. What shapes and patterns strike you as being semantically aligned with the subject? This can be useful not just to inspire but also possibly to obtain a glimpse into the similarly impulsive way the minds of your audience might connect with a subject when consuming the solution.

For example, Figure 3.12 shows the size of production for different grape varieties across the wine industry. It uses a bubble chart to create the impression of a bunch of grapes. You can clearly see how this concept might have been formed in early sketches before the data even arrived, based on the mental visualisation of what a bunch of grapes looks like. It is consistent with the subject and offers an immediate metaphor, so any viewer looking at the work will immediately spot the connection between form and subject.

Figure 3.12 Grape expectations

Keywords: What terms of language come to mind when thinking about the subject or the phenomena of your data? Figure 3.13 shows some notes I made capturing the instinctive keywords and colours that came to mind when I was forming early thoughts and ideas about a project to do with psychotherapy treatment in the Arctic.

The words reflected the type of language I felt would be important to frame my design thinking, establishing a reference that could inform the tone of voice of my work. The colours were somewhat arbitrary and in the end I did not actually use them all, but they were indicative of the tones I was seeking. I did, however, see through my intention to avoid the blacks and blues (as they would carry unwelcome and clichéd connotations in this subject’s context).

Figure 3.13 Example of Keywords and Colour Swatch Ideas

Sketching: As well as taking notes, sketching ideas is of great value to us here. I mentioned earlier that this is not about being a gifted artist but about recognising the freedom and speed that sketching offers when extracting ideas from your mind onto paper. This is particularly helpful if you are working with collaborators and want a low-fidelity sketch for discussing plans, as well as in early discussions with stakeholders to better understand each other’s take on the brief. For some people, the most fluent and efficient way to ‘sketch’ is through their software application of choice rather than on paper.

‘I draw to freely explore possibilities. I draw to visually understand what I am thinking. I draw to evaluate my ideas and intuitions by seeing them coming to life on paper. I draw to help my mind think without limitations, without boundaries. The act of drawing, and the very fact we choose to stop and draw, demands focus and attention. I use drawing as my primary expression, as a sort of functional tool for capturing and exploring thoughts.’ Giorgia Lupi, Co-founder and Design Director at Accurat


Regardless of whether your tool is the pen or the computer, just sketch your ideas with whatever is the most efficient and effective option given your time and confidence (see Figure 3.14). You will likely refine your sketches later on and, indeed, eventually you will move your attention completely away from pen and paper and onto the tools you are using to create the final work.

Figure 3.14 Example of a Concept Sketch, by Giorgia Lupi

Research and inspiration: It is important to be sufficiently open to influence and inspiration from the world around you. Exposing your senses to different sources of reference both within and outside of visualisation can only help to broaden the range of solutions you might be able to conceive. Research the techniques that are being used around the visualisation field, look through books and see how others might have tackled similar subjects or curiosities (e.g. how they have shown changes over time on a map).

Beyond visualisation, consider any source of imagery that inspires you: colours, patterns, shapes, and metaphors from everyday life whose aesthetic qualities you just like. In addition to your notebook and sketch pad, start a scrapbook or project mood board that compiles the sources of inspiration you come across and helps you form ideas about the style, tone or essence of your project. These sources might not have immediate value for the current project you are working on but may prove useful for future work.

‘Recently taking up drawing has helped me better articulate the images I see in my mind, otherwise I still follow up on all different types of design and art outside information design/data visualisation. I try to look at things outside my field as often as I can to keep my mind fresh as opposed to only looking at projects from my field for inspiration.’ Stefanie Posavec, Information Designer

‘Look at how other designers solve visual problems (but don’t copy the look of their solutions). Look at art to see how great painters use space, and organise the elements of their pictures. Look back at the history of infographics. It’s all been done before, and usually by hand! Draw something with a pencil (or pen … but NOT a computer!) Sketch often: The cat asleep. The view from the bus. The bus. Personally, I listen to music – mostly jazz – a lot.’ Nigel Holmes, Explanation Graphic Designer, on inspirations that feed his approach

‘It is easy to immerse yourself in a certain idea, but I think it is important to step back regularly and recognise that other people have different ways of interpreting things. I am very fortunate to work with people whom I greatly admire and who also see things from a different perspective. Their feedback is invaluable in the process.’ Jane Pong, Data Visualisation Designer

Limitation of your ideas: There are important limitations to acknowledge around the role of ideas. Influence and inspiration are healthy: the desire to emulate what others have done is understandable. Plagiarism, copying and stealing uncredited ideas are wrong. There are ambiguities in any creative discipline about the boundaries between influence and plagiarism, and the worlds of visualisation and infographic design are not spared that challenge. Being influenced by the research you do and the great work you see around the field is not stealing, but if you do incorporate explicit ideas influenced by others in your work, at the very least you should do the noble thing and credit the authors or, even better, seek them out and ask for their approval. You do not have to credit William Playfair every time you use the bar chart, but there are certain unique visual devices that will be unquestionably deserving of attribution.

Secondly, data is your raw material, your ideas are not. As you will see later, it is vital that you let the main influence on your thinking emerge from the type, size and meaning of your data. It may be that your ideas are ultimately incompatible with these properties of the data, in which case you will need to set them aside and perhaps form new ones.

Eventually you will need to evolve from ideas and sketched concepts to developing a solution in your tool of choice. These early ideas and sparks of creativity are vital and they should be embraced, but do not be precious or stubborn: always maintain an open mind and recognise that they have a limited role. Try to ignore the voices in your head after a certain period!

Limitation of others’ ideas: Finally, there is the diplomatic challenge of being faced with the prospect of taking on board other people’s ideas. One of the greatest anxieties I face comes from working with stakeholders who are unequivocally and emphatically clear about what they think a solution should look like. Often your involvement in a project may arrive after these ideas have already been formed and have become the basis of the brief issued by the stakeholders to you (‘Can you make this, please?’). This is where your tactful but assured communicator’s skill set comes to the fore. The ideas presented may be reasonable and well intended, but it is your responsibility to lead on the creation process and guide it away from an early concept that simply may not work out. You can take these ideas on board but, as with the limitations of your own ideas, there will be other factors with a greater influence: the nature of the data, the type of curiosities you are pursuing, the essence of the subject matter and the nature of the audience, among many other things. These will be the factors that ultimately dictate whether any early vision of potential ideas ends up being of value.

Summary: Formulating Your Brief

Establishing Your Project’s Context

Defining Your Origin Curiosity

Why are we doing it: what type of curiosity has motivated the decision/desire to undertake this visualisation project?

Personal intrigue: ‘I wonder what …’
Stakeholder intrigue: ‘He/she needs to know …’
Audience intrigue: ‘They need to know …’
Anticipated intrigue: ‘They might be interested in knowing …’
Potential intrigue: ‘There should be something interesting …’

Circumstances

The key factors that will impact on your critical thinking and shape your ambitions:

People: stakeholders, audience.
Constraints: pressures, rules.
Consumption: frequency, setting.
Deliverables: quantity, format.
Resources: skills, technology.

Defining Your Purpose

The ‘so what?’: what are we trying to accomplish with this visualisation? What is a successful ‘outcome’?

Establishing Your Project’s Vision

‘Purpose Map’

Plotting your expectation of what will be the best-fit type of solution to facilitate the desired purpose:

What kind of experience? Explanatory, exhibitory or exploratory?
What tone of voice will it offer? The efficiency and perceptibility of reading data vs the high-level, affective nature of feeling data?

Harnessing Ideas

What mental images, ideas and keywords instinctively come to mind when thinking about the subject matter of this challenge? What influence and inspiration can you source from elsewhere that might start to shape your thinking?

Tips and Tactics

Do not get hung up if you are struggling with some circumstantial factors. Certain things may change in definition, some undefined things will emerge, some defined things will need to be reconsidered, and some things are just always open.
Keep notes about any thoughts you have had that express the nature of your curiosity, articulation of purpose, any assumptions, things you know and do not know, where you might need to get data from, who the experts are, questions, things to do, issues/problems, wish lists …
Keep a ‘scrapbook’ (digital bookmarks, print clippings) of anything and everything that inspires and influences you – not just data visualisations. Log your ideas and inspire yourself.
This stage is about ambition management/skills – it is to your benefit that you treat it with the thoroughness it needs. The negative impact of any corners being cut here will be amplified later on.


4 Working With Data

In Chapter 3 the workflow process was initiated by exploring the defining matters around context and vision. The discussion about curiosity, framing not just the subject matter of interest but also a specific enquiry that you are seeking an answer to, leads your thinking in particular towards this second stage of the process: working with data.

In this chapter I will start by covering some of the most salient aspects of data and statistical literacy. This section will be helpful for those readers without any – or at least with no extensive – prior data experience. For those who have more experience and confidence with this topic, maybe through their previous studies, it might merely offer a reminder of some of the things you will need to focus on when working with data on a visualisation project.

There is a lot of hard work that goes into the activities encapsulated by ‘working with data’. I have broken these down into four different groups of action, each creating substantial demands on your time:

Data acquisition: Gathering the raw material.
Data examination: Identifying physical properties and meaning.
Data transformation: Enhancing your data through modification and consolidation.
Data exploration: Using exploratory analysis and research techniques to learn.

You will find that there are overlapping concerns between this chapter and Chapter 5, where you will establish your editorial thinking. The present chapter generally focuses more on the mechanics of familiarisation with the characteristics and qualities of your data; the next chapter will build on this to shape what you will actually do with it.

As you might expect, the activities covered in this chapter are assisted by relevant tools and technology. However, the focus of the book will remain on identifying which tasks you have to undertake, rather than on exactly how you will undertake them. There will be tool-specific references in the curated collection of resources that are published in the digital companion.

4.1 Data Literacy: Love, Fear and Loathing

I frequently come across people in the field who declare their love for data. I don’t love data. For me it would be like claiming ‘I love food’ when, realistically, that would be misleading. I like sprouts but hate carrots. And don’t get me started on mushrooms.

At the very start of the book, I mentioned that data might occasionally prove to be a villain in your quest for developing confidence with data visualisation. If data were an animal it would almost certainly be a cat: it has a capacity to earn and merit love but it demands a lot of attention and always seems to be conspiring against you.

I love data that gives me something interesting to do analysis-wise and then, subsequently, also visually. Sometimes that just does not happen.

I love data that is neatly structured, clean and complete. This rarely exists. Location data will have inconsistent place-name spellings, there will be dates that have a mixture of US and UK formats, and aggregated data that does not let me get to the underlying components.
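The mixed US and UK date formats mentioned above are a classic trap: a string such as ‘03/04/2016’ parses successfully under both conventions but yields two different dates. The following is a minimal Python sketch, assuming dates arrive as slash-separated strings with four-digit years; the helper names are hypothetical, chosen for illustration.

```python
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical helpers: try both conventions on the same string.
US_FORMAT = "%m/%d/%Y"  # month first, e.g. 12/25/2016
UK_FORMAT = "%d/%m/%Y"  # day first, e.g. 25/12/2016


def parse_both_ways(text: str) -> Tuple[Optional[date], Optional[date]]:
    """Return (US reading, UK reading); a reading is None if that format fails."""
    readings = []
    for fmt in (US_FORMAT, UK_FORMAT):
        try:
            readings.append(datetime.strptime(text, fmt).date())
        except ValueError:
            readings.append(None)  # e.g. '25' is not a valid month
    return readings[0], readings[1]


def is_ambiguous(text: str) -> bool:
    """True when both conventions succeed but disagree on the date."""
    us_reading, uk_reading = parse_both_ways(text)
    return us_reading is not None and uk_reading is not None and us_reading != uk_reading


for raw in ["03/04/2016", "25/12/2016", "04/04/2016"]:
    print(raw, parse_both_ways(raw), "ambiguous" if is_ambiguous(raw) else "unambiguous")
```

Only ‘03/04/2016’ is flagged: ‘25/12/2016’ can only be day-first, and ‘04/04/2016’ reads the same either way. Flagged values cannot be resolved by code alone; they need context, such as the source of the data or the other dates in the same column.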

You don’t need to love data but, equally, you shouldn’t fear data. You should simply respect it by appreciating that it will potentially need lots of care and attention and a shift in your thinking about its role in the creative process. Just look to develop a rapport with it, embracing its role as the absolutely critical raw material of this process, and learn how to nurture its potential.

Some of you reading this book might have an interest in data but possibly not much knowledge of the specific activities involving data as you work on a visualisation design solution. It is often assumed that anyone working in data visualisation has an appreciation of data and statistical literacy. However, this is not always the case. One of the biggest causes of failure in data visualisations – especially in relation to the principle I introduced about ‘trustworthy design’ – comes from a poor understanding of these numerate literacies. This can be overcome, though.


‘When I first started learning about visualisation, I naively assumed that datasets arrived at your doorstep ready to roll. Begrudgingly I accepted that before you can plot or graph anything, you have to find the data, understand it, evaluate it, clean it, and perhaps restructure it.’ Marcia Gray, Graphic Designer

I discussed in the Introduction the different entry points from which people doing data visualisation work come. Typically – but absolutely not universally – those who join from the more creative backgrounds of graphic design and development might not be expected to have developed the same level of data and statistical knowledge as somebody from the more numerate disciplines. If you are part of this creative cohort and can identify with this generalisation, then this chapter will ease you through the learning process (and in doing so hopefully dispel any myth that it is especially complicated).

Conversely, many others may think they do not know enough about data but in reality they already do ‘get’ it – they just need to learn more about its role in visualisation and possibly realign their understanding of some of the terminology. Therefore, before delving further into this chapter’s tasks, there are a few ‘defining’ matters I need to address to cover the basics in both data and statistical literacy.

Data Assets and Tabulation Types

Firstly, let’s consider some of the fundamentals about what a dataset is as well as what shape and form it comes in.

When working on a visualisation I generally find there are two main categories of data ‘assets’: data that exists in tables, known as datasets; and data that exists as isolated values.

For the purpose of this book I describe this type of data as being raw because it has not yet been statistically or mathematically manipulated and it has not been modified in any other way from its original state.

Tabulated datasets are what we are mainly interested in at this point. Data as isolated values refers to data that exists as individual facts and statistical figures. These do not necessarily belong in, nor are they normally collected in, a table. They are just potentially useful values that are dispersed around the Web or across reports: individual facts or figures that you might come across during your data gathering or research stages. Later on in your work you might use these to inform calculations (e.g. applying a currency conversion) or to incorporate a fact into a title or caption (e.g. 78% of staff participated in the survey), but they are not your main focus for now.

Tabulated data is unquestionably the most common form of data asset that you will work with, but it too can exist in slightly different shapes and sizes. A primary difference lies between what can be termed normalised datasets (Figure 4.1) and cross-tabulated datasets (Figure 4.2).

A normalised dataset might loosely be described as looking like lists of data values. In spreadsheet parlance, you would see this as a series of columns and rows of data, while in database parlance it is the arrangement of fields and records. This form of tabulated data is generally the most detailed form of data available for you to work with. The table in Figure 4.1 is an example of normalised data where the columns of variables provide different descriptive values for each movie (or record) held in the table.

Figure 4.1 Example of a Normalised Dataset

Cross-tabulated data is presented in a reconfigured form where, instead of displaying raw data values, the cells of the table contain the results of statistical operations (like summed totals, maximums, averages). These values are aggregated calculations formed from the relationship between two variables held in the normalised form of the data. In Figure 4.2, you will see the cross-tabulated result of the normalised table of movie data, now showing a statistical summary for each movie category. The statistic under ‘Max Critic Rating’ is formed from an aggregating calculation based on the ‘Critic Rating’ and ‘Category’ variables seen in Figure 4.1.

Figure 4.2 Example of a Cross-tabulated Dataset

Typically, if you receive data in an already cross-tabulated form, you do not have access to the original data. This means you will not be able to ‘reverse-engineer’ it back into its raw form, which, in turn, means you have reduced the scope of your potential analysis. In contrast, normalised data gives you complete freedom to explore, manipulate and aggregate across multiple dimensions. You may choose to convert the data into ‘cross-tabulated’ form but that is merely an option that comes with the luxury of having access to the detailed form of your data. In summary, it is always preferable, where possible, to work with normalised data.
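To make the one-way nature of this conversion concrete, here is a minimal Python sketch that cross-tabulates normalised records by category. The movie records and field names are invented for illustration (they are not the contents of Figure 4.1), and plain dictionaries stand in for whatever spreadsheet or database tool you actually use.

```python
# Hypothetical normalised dataset: one dict per record (row), one key per field (column).
movies = [
    {"title": "Movie A", "category": "Comedy", "critic_rating": 72},
    {"title": "Movie B", "category": "Comedy", "critic_rating": 58},
    {"title": "Movie C", "category": "Drama", "critic_rating": 91},
    {"title": "Movie D", "category": "Drama", "critic_rating": 64},
    {"title": "Movie E", "category": "Horror", "critic_rating": 45},
]


def cross_tabulate(records, group_field, value_field, aggregate):
    """Group the values of value_field by group_field, then apply an aggregate to each group."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_field], []).append(record[value_field])
    return {key: aggregate(values) for key, values in groups.items()}


def mean(values):
    return sum(values) / len(values)


# From the same normalised data you can derive as many summaries as you like.
max_rating = cross_tabulate(movies, "category", "critic_rating", max)
mean_rating = cross_tabulate(movies, "category", "critic_rating", mean)
count_by_category = cross_tabulate(movies, "category", "title", len)

print(max_rating)         # {'Comedy': 72, 'Drama': 91, 'Horror': 45}
print(mean_rating)        # {'Comedy': 65.0, 'Drama': 77.5, 'Horror': 45.0}
print(count_by_category)  # {'Comedy': 2, 'Drama': 2, 'Horror': 1}
```

The reverse direction is impossible: `max_rating` no longer records which titles exist or what the other ratings were, so no function can rebuild `movies` from it.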

Data Types

One of the key parts of the design process concerns understanding the different types of data (sometimes known as levels of data or scales of measurement). Defining the types of data will have a huge influence on so many aspects of this workflow, such as determining:

the type of exploratory data analysis you can undertake;
the editorial thinking you establish;
the specific chart types you might use;
the colour choices and layout decisions around composition.

In the simplest sense, data types are distinguished by being either qualitative or quantitative in nature. Beneath this distinction there are several further separations that need to be understood. The most useful taxonomy I have found to describe these different types of data is based on an approach devised by the psychologist Stanley Stevens. He developed the acronym NOIR as a mnemonic device to cover the different types of data you may come to work with, particularly in social research: Nominal, Ordinal, Interval, and Ratio. I have extended this, adding onto the front a ‘T’ – for Textual – which, admittedly, somewhat undermines the grace of the original acronym but better reflects the experiences of handling data today. It is important to describe, define and compare these different types of data.

Textual (Qualitative)

Textual data is qualitative data and generally exists as unstructured streams of words. Examples of textual data might include:

‘Any other comments?’ data submitted in a survey.
Descriptive details of a weather forecast for a given city.
The full title of an academic research project.
The description of a product on Amazon.
The URL of an image of Usain Bolt’s victory in the 100m at the 2012 Olympics.

Figure 4.3 Graphic Language: The Curse of the CEO


In its native form, textual data is likely to offer rich potential, but it can prove quite demanding to unlock. Working with textual data in an analysis and visualisation context will generally require certain natural language processing techniques to derive or extract classifications, sentiments, quantitative properties and relational characteristics.


An example of how you can use textual data is seen in the graphic of CEOs’ swear-word usage shown in Figure 4.3. This analysis provides a breakdown of the profanities used by CEOs from a review of recorded conference calls over a period of 10 years. This work shows the two ways of utilising textual data in visualisation. Firstly, you can derive categorical classifications and quantitative measurements to count the use of certain words compared to others and track their usage over time. Secondly, the original form of the textual data can be of direct value for annotation purposes, such as captions, without the need for any analytical treatment.

Working with textual data will always involve a judgement of reward vs effort: how much effort will I need to expend in order to extract usable, valuable content from the text? There is an increasing array of tools and algorithmic techniques to help with this transformational approach, but whether you conduct it manually or with some degree of automation it can be quite a significant undertaking. However, the value of the insights you are able to extract may entirely justify the commitment. As ever, your judgement of the aims of your work, the nature of your subject and the interests of your audience will influence your decision.
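As a very rough illustration of the first approach, deriving counts from text, the Python sketch below tokenises transcripts and counts a chosen set of words per year. It is a deliberately crude stand-in for proper natural language processing: the transcripts and target words are hypothetical, and simple lower-casing plus a regular expression will miss variants (plurals, misspellings, context) that real tooling would handle.

```python
import re
from collections import Counter

# Hypothetical transcript snippets keyed by year (invented for illustration).
transcripts = {
    2014: "Headwinds remain. We faced headwinds and a challenging quarter.",
    2015: "A challenging year, but the headwinds eased.",
}
TARGET_WORDS = {"headwinds", "challenging"}


def count_target_words(text, targets):
    """Lower-case the text, split it into word tokens and count only the target words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token in targets)


# Unstructured text becomes counts per word (nominal) per year (temporal).
usage_by_year = {year: count_target_words(text, TARGET_WORDS) for year, text in transcripts.items()}

for year, counts in usage_by_year.items():
    print(year, dict(counts))
```

Even this toy version shows where the effort goes: every decision about what counts as a ‘word’ changes the numbers you end up visualising.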

Nominal (Qualitative)

Nominal data is the next form of qualitative data in the list of distinct data types. This type of data exists in categorical form, offering a means of distinguishing, labelling and organising values. Examples of nominal data might include:

The ‘gender’ selected by a survey participant.
The regional identifier (location name) shown in a weather forecast.
The university department of an academic member of staff.
The language of a book on Amazon.
An athletic event at the Olympics.

Often a dataset will hold multiple nominal variables, maybe offering different organising and naming perspectives, for example the gender, eye colour and hair colour of a class of school kids.

Additionally, there might be a hierarchical relationship existing between two or more nominal variables, representing major and sub-categorical values: for example, a major category holding details of ‘Country’ and a sub-category holding ‘Airport’; or a major category holding details of ‘Industry’ and a sub-category holding details of ‘Company Names’. Recognising this type of relationship will become important when considering the options for which angles of analysis you might decide to focus on and how you may portray them visually using certain chart types.

Nominal data does not necessarily mean text-based data; nominal values can be numeric. For example, a student ID number is a categorical device used to identify each student uniquely. The shirt number of a footballer is a way of helping teammates, spectators and officials to recognise each player. It is important to be aware of occasions when any categorical values are shown as numbers in your data, especially in order to understand that these cannot have (meaningful) arithmetic operations applied to them. You might find logic statements like TRUE or FALSE stated as a 1 and a 0, or data captured about gender may exist as a 1 (male), 2 (female) and 3 (other), but these numeric values should not be considered quantitative values – adding ‘1’ to ‘2’ does not equal ‘3’ (other) for gender.
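The gender coding example above can be made concrete with a small Python sketch: decode the stored numbers into named categories, then restrict yourself to operations that make sense for nominal data (equality and counting). The survey responses are hypothetical, and using Python’s `Enum` is one illustrative choice among many.

```python
from collections import Counter
from enum import Enum


class Gender(Enum):
    """Coding scheme from the example above: the numbers are labels, not quantities."""
    MALE = 1
    FEMALE = 2
    OTHER = 3


raw_codes = [1, 2, 2, 3, 1, 2]  # hypothetical survey responses as stored in the file
responses = [Gender(code) for code in raw_codes]  # decode numbers into categories

# Meaningful operations on nominal data: equality checks and counting.
counts = Counter(responses)
modal_category = counts.most_common(1)[0][0]

# Arithmetic is not defined for plain Enum members, so Python refuses it.
try:
    Gender.MALE + Gender.FEMALE
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

print(counts)
print(modal_category, arithmetic_allowed)
```

By contrast, averaging `raw_codes` directly would happily return about 1.83, a number with no meaning for gender. Choosing `Enum` rather than `IntEnum` is deliberate here: `IntEnum` members behave like integers and would allow the meaningless sum.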

Ordinal (Qualitative)

Ordinal data is still categorical and qualitative in nature but, instead of there being an arbitrary relationship between the categorical values, there are now characteristics of order. Examples of ordinal data might include:

The response to a survey question: based on a scale of 1 (unhappy) to 5 (very happy).
The general weather forecast: expressed as Very Hot, Hot, Mild, Cold, Freezing.
The academic rank of a member of staff.
The delivery options for an Amazon order: Express, Next Day, Super Saver.
The medal category for an athletic event: Gold, Silver, Bronze.

Whereas nominal data is a categorical device to help distinguish values, ordinal data is also a means of classifying values, usually in some kind of ranking. The hierarchical order of some ordinal values goes through a single ascending/descending rank from high or good values to low or bad values. Other ordinal values have a natural ‘pivot’ where the direction changes around a recognisable mid-point, such as the happiness scale, which might pivot about ‘no feeling’, or weather forecast data that pivots about ‘Mild’. Awareness of these different approaches to ‘order’ will become relevant when you reach the design stages involving the classifying of data through colour scales.
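A minimal Python sketch of the weather example shows both ideas: sorting by meaning rather than alphabetically, and measuring each label’s position relative to a pivot, which is the information a diverging colour scale needs. Encoding the scale as an ordered list and the helper names are assumptions for illustration.

```python
# Hypothetical encoding of the weather scale from the text, ordered low -> high.
WEATHER_SCALE = ["Freezing", "Cold", "Mild", "Hot", "Very Hot"]
PIVOT = "Mild"  # natural mid-point of this diverging scale


def rank(label):
    """Position of a label within the ordered scale (0 = lowest)."""
    return WEATHER_SCALE.index(label)


def offset_from_pivot(label):
    """Steps below (negative) or above (positive) the pivot: order, not degrees."""
    return rank(label) - rank(PIVOT)


forecasts = ["Hot", "Freezing", "Mild", "Very Hot", "Cold"]
sorted_forecasts = sorted(forecasts, key=rank)  # sort by meaning, not alphabetically
offsets = {label: offset_from_pivot(label) for label in WEATHER_SCALE}

print(sorted_forecasts)
print(offsets)
```

The sign of each offset could select a cool or warm hue and its magnitude the step of lightness; a single-direction scale such as Gold, Silver, Bronze would need only `rank`. The offsets encode order only: ‘Hot’ being one step above ‘Mild’ says nothing about how many degrees separate them.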

Interval (Quantitative)

Interval data is the less common form of quantitative data, but it is still important to be aware of and to understand its unique characteristics. An interval variable is a quantitative and numeric measurement defined by difference on a scale but not by relative scale. This means the difference between two values is meaningful but an arithmetic operation such as multiplication is not.

The most common example is the measure for temperature in a weather forecast, presented in units of Celsius. The absolute difference between 15°C and 20°C is the same difference as between 5°C and 10°C. However, the relative difference between 5°C and 10°C is not the same as the difference between 10°C and 20°C (even though in both cases you multiply by two or increase by 100%). This is because a zero value is arbitrary and often means very little or indeed is impossible. A temperature reading of 0°C does not mean there is no temperature; Celsius is a quantitative scale for measuring relative temperature. Likewise, you cannot have a shoe size or Body Mass Index of zero.
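A short Python sketch makes the point numerically: shifting the same two readings onto the kelvin scale, which has an absolute zero, leaves their difference unchanged but collapses the apparent ‘twice as hot’ ratio. The temperatures are arbitrary illustrative values.

```python
import math

KELVIN_OFFSET = 273.15  # 0 °C expressed in kelvin


def celsius_to_kelvin(celsius):
    return celsius + KELVIN_OFFSET


low_c, high_c = 5.0, 10.0
low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)

# Differences survive the change of zero point ...
celsius_difference = high_c - low_c  # 5.0
kelvin_difference = high_k - low_k   # 5.0, up to floating-point rounding

# ... but ratios do not, because 0 °C is an arbitrary zero.
celsius_ratio = high_c / low_c  # 2.0, the misleading "twice as hot"
kelvin_ratio = high_k / low_k   # about 1.018

print(celsius_difference, round(kelvin_difference, 6))
print(celsius_ratio, round(kelvin_ratio, 3))
```

Any chart or caption that implies multiplication (‘twice as warm’) should therefore only be built on a scale with a true zero.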

Ratio (Quantitative)

Ratio data is the most common quantitative variable you are likely to come across. It comprises numeric measurements that have properties of both difference and scale. Examples of ratio data might include:

The age of a survey participant in years.
The forecasted amount of rainfall in millimetres.
The estimated budget for a research grant proposal in GBP (£).
The number of sales of a book on Amazon.
The distance of the winning long jump at the 2012 Olympics in metres.

Unlike interval data, for ratio variables zero means something: it indicates a genuine absence of the quantity being measured. The absolute difference in age between a 10 and a 20 year old is the same as the difference between a 40 and a 50 year old. The relative difference between a 10 and a 20 year old is the same as the difference between a 40 and an 80 year old (‘twice as old’).

Whereas most of the quantitative measurements you will deal with are based on a linear scale, there are exceptions. Variables about the strength of sound (decibels) and the magnitude of earthquakes (Richter) are actually based on a logarithmic scale. An earthquake with a magnitude of 4.0 on the Richter scale is 1000 times stronger, based on the amount of energy released, than an earthquake of magnitude 2.0. Some consider these to be types of data distinct from ratio variables; most still define them as ratio variables but separate them as non-linear scaled variables.
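The arithmetic behind the earthquake example can be sketched as follows. This is a minimal illustration, assuming the commonly cited magnitude–energy relation log10(E) = 1.5M + constant (so each whole unit of magnitude is roughly 31.6 times more energy) and the standard definition of decibels as ten times the log of an intensity ratio. The function names are hypothetical.

```python
def richter_energy_ratio(m_high, m_low):
    """Approximate ratio of energy released by two earthquakes.

    Assumes log10(energy) = 1.5 * magnitude + constant, so the constant
    cancels when two magnitudes are compared.
    """
    return 10 ** (1.5 * (m_high - m_low))


def decibel_intensity_ratio(db_high, db_low):
    """Ratio of two sound intensities: every 10 dB is a tenfold increase."""
    return 10 ** ((db_high - db_low) / 10)


print(richter_energy_ratio(4.0, 2.0))   # 1000.0
print(decibel_intensity_ratio(80, 60))  # 100.0
```

A two-unit step on either scale therefore represents a multiplicative jump, not an additive one, which is why these variables are singled out as non-linear.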

If temperature values were measured in kelvin, where there is an absolute zero, this would be considered a ratio scale, not an interval one.
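The interval/ratio distinction for temperature can be checked numerically. In this minimal sketch (the helper name `celsius_to_kelvin` is illustrative), differences come out the same on both scales, but the apparent ‘doubling’ from 5°C to 10°C shrinks to an increase of under 2% once the readings are placed on the kelvin scale, which has a true zero.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading onto the kelvin scale, which has an absolute zero."""
    return celsius + 273.15


# Differences behave identically on both scales (the interval property).
diff_celsius = 10 - 5
diff_kelvin = celsius_to_kelvin(10) - celsius_to_kelvin(5)

# Ratios only make sense where zero means 'none' (the ratio property).
naive_ratio = 10 / 5                                       # 2.0, misleading
true_ratio = celsius_to_kelvin(10) / celsius_to_kelvin(5)  # roughly 1.018

print(diff_celsius, round(diff_kelvin, 6), naive_ratio, round(true_ratio, 3))
```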

Temporal Data

Time-based data is worth mentioning separately because it can be a frustrating type of data to deal with, especially in attempting to define its place within the TNOIR classification. The reason for this is that different components of time can be positioned against almost all data types, depending simply on what form your time data takes:

Textual: ‘Four o’clock in the afternoon on Saturday, 12 March 2016’

Ordinal: ‘PM’, ‘Afternoon’, ‘March’, ‘Q1’

Interval: ‘12’, ‘12/03/2016’, ‘2016’

Ratio: ‘16:00’

Note that time-based data is distinct from duration data, which, while often formatted in structures such as hh:mm:ss, should be seen as a ratio measure. To work with duration data it is often useful to transform it into single units of time, such as total seconds or minutes.
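Transforming a duration into a single unit is a short calculation. This sketch assumes durations arrive as ‘hh:mm:ss’ strings; the function name `duration_to_seconds` is hypothetical. It also shows why a duration is not a clock time: the hours component can exceed 24.

```python
from datetime import timedelta


def duration_to_seconds(text):
    """Convert an 'hh:mm:ss' duration string into a single total in seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(duration_to_seconds("02:15:30"))  # 8130
print(duration_to_seconds("26:00:00"))  # 93600: durations can exceed a day

# The standard library's timedelta arrives at the same total.
print(timedelta(hours=2, minutes=15, seconds=30).total_seconds())  # 8130.0
```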


Discrete vs Continuous

Another important distinction to make about your data, and something that cuts across the TNOIR classification, is whether the data is discrete or continuous. This distinction influences how you might analyse it statistically and visually.

The relatively simple explanation is that discrete data is associated with all classifying variables that have no ‘in-between’ state. This applies to all qualitative data types and to any quantitative values for which only whole values are possible. Examples might be:

Heads or tails for a coin toss.
Days of the week.
The size of shoes.
The number of seats in a theatre.

In contrast, continuous variables can hold an in-between value and, in theory, could take on any value between their natural upper and lower limits if it were possible to take measurements at ever finer degrees of detail, such as:

Height and weight.
Temperature.
Time.

One of the classifications that is hard to nail down involves data that could, on the TNOIR scale, arguably fall under both ordinal and ratio definitions depending on its usage. This makes it hard to determine whether it should be considered discrete or continuous. An example would be the star system used for rating a movie, or the happiness rating. When a star rating value is originally captured, the likelihood is that the input data is discrete in nature. However, for analysis purposes, data based on different star ratings could reasonably be treated either as discrete classifications or, feasibly, as continuous numeric values. For both star ratings and happiness ratings, decimal averages could be calculated as a way of formulating an average score. (The median and mode would still be discrete.) The suitability of this approach will depend on whether the absolute differences between the classifying values can be considered equal.
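The two treatments of star ratings can be compared with Python’s standard `statistics` module. The ratings below are invented for illustration. The mean treats the stars as continuous and produces a decimal; the mode stays discrete. One assumption worth flagging: with an even number of ratings, `statistics.median` averages the two middle values and can land between stars, so this sketch uses `median_low`, which always returns an observed rating, to keep the median discrete as the text describes.

```python
import statistics

# Hypothetical star ratings (1-5), captured as discrete whole values.
ratings = [1, 3, 4, 4, 5, 5, 5]

mean = statistics.mean(ratings)          # decimal: treats stars as continuous
median = statistics.median_low(ratings)  # always one of the observed ratings
mode = statistics.mode(ratings)          # the most common rating

print(round(mean, 2), median, mode)  # 3.86 4 5

# With an even number of ratings, median() can fall between two stars,
# whereas median_low() still returns one of the observed ratings.
even_ratings = [3, 4, 5, 5]
print(statistics.median(even_ratings), statistics.median_low(even_ratings))  # 4.5 4
```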


4.2 Statistical Literacy

If the fear of data is misplaced, I can sympathise with anybody’s trepidation towards statistics. For many, statistics can feel complicated to understand and too difficult a prospect to master. Even for those relatively comfortable with stats, it is unquestionably a discipline that can easily become rusty without practice, which can also undermine your confidence. Furthermore, the fear of making mistakes with delicate, rule-based statistical calculations pushes confidence levels lower than they need to be.

The problem is that you cannot avoid the need to use some statistical techniques if you are going to work with data. It is therefore important to better understand statistics and its role in visualisation, just as you must do with data. Perhaps you can make the problem more surmountable by breaking statistics down into smaller, manageable elements that will dispel the perception of overwhelming complexity.

I do believe that it is possible to overstate the range and level of statistical techniques most people will need to employ on most of their visualisation tasks. The caveats are important, as I know there will be people with visualisation experience who are exposed to a tremendous amount of statistical thinking in their work, but it is a relevant point.

It all depends, of course. From my experience, however, the majority of data visualisation challenges will generally involve relatively straightforward univariate and multivariate statistical techniques. Univariate techniques help you to understand the shape, size and range of quantitative values. Multivariate techniques help you to explore the possible relationships between different combinations of variables and variable types. I will describe some of the most relevant statistical operations associated with these techniques later in this chapter, at the point in your thinking where they are most applicable.

As you get more advanced in your work (and your confidence increases) you might have occasion to employ inferential techniques. These include concepts such as data modelling and the use of regression analysis: attempting to measure the relationships between variables to explore correlations and (the holy grail) causations. Many of you will likely experience visualisation challenges that require an understanding of probabilities and hypothesis testing, and familiarity with terms like confidence intervals. You might use these techniques to assist with forecasting or modelling risk and uncertainty. Above and beyond that, you are moving towards more advanced statistical modelling and algorithm design.

It is somewhat unsatisfactory to allocate only a small part of this text to discussing the role of descriptive and exploratory statistics. However, for the scope of this book, and seeking to achieve a pragmatic balance, the most sensible compromise is just to flag up which statistical activities you might need to consider and where these apply. It can take years to learn about the myriad advanced techniques that exist, and it takes experience to know when and how to deploy all the different methods.

There are hundreds of books better placed to offer the depth of detail you truly need to fulfil these activities, and there is no real need to reinvent the wheel – and indeed reinvent an inferior wheel. That statistics is just one part of the visualisation challenge, and is in itself such a prolific field, further demonstrates the variety and depth of this subject.

4.3 Data Acquisition

The first step in working with data naturally involves getting it. As I outlined in the contextual discussion about the different types of trigger curiosities, you will only have data in place before now if the opportunity presented by the data was the factor that triggered this work. You will recall this scenario was described as pursuing a curiosity born out of ‘potential intrigue’. Otherwise, you will only be in a position to know what data you need after having established your specific or general motivating curiosity. In these situations, once you have sufficiently progressed your thinking around ‘formulating your brief’, you will need to switch your thinking onto the task of acquiring your data:

What data do you need and why?
From where, how, and by whom will the data be acquired?
When can you obtain it?

What Data Do You Need?

Your primary concern is to ensure you can gather sufficient data about the subject in which you are interested to pursue your identified curiosity. By ‘sufficient’, I mean you will need to establish some general criteria in your mind for what data you do need and what data you do not need. There is little harm in getting more than you need at this stage, but it can result in wasted effort that you would do well to avoid.

Let’s propose you have defined your curiosity to be ‘I wonder what a map of McDonald’s restaurant openings looks like over time?’. In this scenario you are going to try to find a source of data that will provide you with details of all the McDonald’s restaurants that have ever opened. A shopping list of data items would probably include the date of opening, the location details (as specific as possible) and maybe even a closing date to ensure you can distinguish between restaurants still operating and those that have closed down.

You will need to conduct some research, a perpetual strand of activity that runs throughout the workflow, as I explained earlier. In this scenario you might first need to research a bit of the history of McDonald’s restaurants to discover, for instance, when the first one opened, how many there are, and in which countries they are located. This will establish an initial sense of the timeframe (number of years) and scale (outlets, global spread) of your potential data. You might also discover significant differences between what is considered a restaurant and what is just a franchise positioned in shopping malls or transit hubs. Sensitivities around the qualifying criteria or general counting rules of a subject are important to discover, as they will help significantly to substantiate the integrity and accuracy of your work.

Unless you know or have been told where to find this restaurant data, you will then need to research from where the data might be obtainable. Will this type of information be published on the Web, perhaps on the commercial pages of McDonald’s own site? You might have to get in touch with somebody (yes, a human) in the commercial or PR department to access some advice. Perhaps there will be some fast-food enthusiast in some niche corner of the Web who has already gathered and made available data like this?

Suppose you locate a dataset that includes not just McDonald’s restaurants but all fast-food outlets. This could potentially broaden the scope of your curiosity, enabling broader analysis about the growth of the fast-food industry at large to contextualise McDonald’s contribution to it. Naturally, if you have any stakeholders involved in your project, you might need to discuss with them the merits of this wider perspective.

Another judgement to make concerns the resolution of the data you anticipate needing. This is especially relevant if you are working with big, heavy datasets. You might genuinely want and need all available data. This would be considered full resolution, down to the most detailed grain (e.g. all details about all McDonald’s restaurants, not just totals per city or country). Sometimes, in this initial gathering activity, it may be more practical just to obtain a sample of your data. If this is the case, what criteria will you use to identify a sufficient sample, and how will you select or exclude records? What percentage of your data will be sufficient to be representative of its range and diversity (an important feature we will need to examine next)? Perhaps you only need a statistical, high-level summary (total number of restaurants opened by year)?
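The three resolutions described above (full resolution, a sample, a high-level summary) can be sketched in a few lines. The records here are synthetic placeholders generated by formula, not real restaurant figures, and the 10% simple random sample is only one possible selection criterion; whether it is representative of the range and diversity of your data is exactly the judgement the text describes.

```python
import random
from collections import Counter

# Synthetic placeholder records of (restaurant_id, year_opened); real values
# would come from whichever source you acquire. These are NOT real figures.
records = [(i, 1960 + (i * 7) % 50) for i in range(1, 1001)]

# Option 1: full resolution, keep every record.
full = records

# Option 2: a reproducible 10% simple random sample of the records.
rng = random.Random(42)
sample = rng.sample(records, k=len(records) // 10)

# Option 3: a high-level summary, total openings per year.
openings_by_year = Counter(year for _, year in records)

print(len(full), len(sample), len(openings_by_year))  # 1000 100 50
```

Seeding the random generator makes the sample repeatable, which helps if you later need to revisit or repeat the gathering.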

The chances are that you will not truly know what data you want or need until you at least get something to start with and learn from there. You might have to revisit or repeat the gathering of your data, so an attitude of ‘what I have is good enough to start with’ is often sensible.

From Where, How and By Whom Will the Data Be Acquired?

There are several different origins and methods involved in acquiring data, depending on whether it will involve your doing the heavy work to curate the data or whether this will be the main responsibility of others.

Curated by You

This group of data-gathering tasks or methods is characterised by your having to do most of the work to bring the data together into a convenient digital form.

Primary data collection: If the data you need does not exist, or you need to have full control over its provenance and collection, you will have to consider embarking on gathering ‘primary’ data. In contrast to secondary data, primary data involves you measuring and collecting the raw data yourself. Typically, this relates to situations where you gather quite small, bespoke datasets about phenomena that are specific to your needs. It might be a research experiment you have designed and launched for participants to submit responses. You may manually record data, such as readings from other measurement devices (your daily weight as measured by your bathroom scales) or the number of times you interacted face-to-face with friends and family. Some people take daily photographs of themselves, their family members or their gardens, in order eventually to stitch these back together to portray stories of change. This data-gathering activity can be expensive in terms of both time and cost. The benefit, however, is that you have carefully controlled the collection of the data to optimise its value for your needs.

Manual collection and data foraging: If the data you need does not exist digitally or in a convenient single location, you will need to forage for it. This again might typically relate to situations where you are sourcing relatively small datasets. An example might be researching historical data from archived newspapers that were only published in print form and are not available digitally. You might look to pull data from multiple sources to create a single dataset: for example, if you were comparing the attributes of a range of different cars and weighing up which to buy, you would probably need to source different parts of the data from several different places. Often, data foraging is something you undertake in order to finish off a dataset collected by other means that has a few missing values. It is sometimes more efficient to find the remaining data items yourself by hand to complete the dataset. This can be somewhat time-consuming, depending on the extent of the manual gathering required, but it does provide you with greater assurance over the final condition of the data you have collected.

Extracted from pdf files: A special subset of data foraging – or a variation at least – involves those occasions when your data is digital but essentially locked away in a pdf file. For many years now, reports containing valuable data have been published on the Web in pdf form. Increasingly, movements like ‘open data’ are helping to shift the attitudes of organisations towards providing additional, fully accessible digital versions of data. Progress is being made, but it will take time before all industries and government bodies adopt this as a common standard. In the meantime, there are several tools on the market (free and proprietary) that will assist you in extracting tables of data from pdf files and converting these to more usable Excel or CSV formats.

Some data acquisition tasks may be repetitive and, should you possess the skills and have access to the necessary resources, there will be scope for exploring ways to automate these. However, you always have to consider the respective effort and ongoing worth of your approach. If you do go to the trouble of authoring an automation routine (of any description), you could end up spending more time on that than you would otherwise spend collecting by more manual methods. If it is going to be a regular piece of analysis, the efficiency gains from your automation will unquestionably prove valuable going forward, but for any one-off project it may not ultimately be worth it.

Web scraping (also known as web harvesting): This involves using special tools or programs to extract structured and unstructured items of data published in web pages and convert these into tabulated form for analysis. For example, you may wish to extract several years’ worth of Test cricket results from a sports website. Depending on the tools used, you can often set routines in motion to extract data across multiple pages of a site based on the connected links that exist within it. This is known as web crawling. Using the same example (let’s imagine), you could further your gathering of Test cricket data by programmatically fetching data from the associated links pointing to the team line-ups. An important consideration to bear in mind with any web scraping or crawling activity concerns rules of access and the legalities of extracting the data held on certain sites. Always check – and respect – the terms of use before undertaking this.
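The core of a scraper, turning an HTML table into rows, can be sketched with Python’s standard `html.parser` module. This is a minimal illustration, not a production tool: the page fragment and team names are invented placeholders, fetching is deliberately left out (check the site’s terms of use first), and the `TableScraper` class is hypothetical. It assumes every cell and row is explicitly closed, which real-world HTML does not always guarantee.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a downloaded results page.
html = """
<table>
  <tr><th>Match</th><th>Home</th><th>Away</th><th>Result</th></tr>
  <tr><td>1</td><td>Team A</td><td>Team B</td><td>Team A won</td></tr>
  <tr><td>2</td><td>Team B</td><td>Team A</td><td>Draw</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(html)
header, *rows = scraper.rows
print(header)
print(rows)
```

Crawling would repeat the same parsing over pages reached by following links, subject to the same access rules.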

Curated by Others

In contrast to the methods I have profiled so far, this next set of data-gathering approaches is characterised by other people having done most of the work to source and compile the data. They will make it available for you to access in different ways without the extent of manual effort often required with the methods presented already. You might occasionally still have to intervene by hand to fine-tune your data, but others will generally have put in the core effort.

Issued to you: On the occasions when you are commissioned by a stakeholder (client, colleague) you will often be provided with the data you need (and probably much more besides), most commonly in a spreadsheet format. The main task for you is therefore less about collection and more about familiarisation with the contents of the data file(s) you are set to work with.

Download from the Web: Earlier I bemoaned the fact that there are still organisations publishing data (through, for example, annual reports) in pdf form. To be fair, facilities are increasingly being developed that enable interested users to extract data in a more structured form. More sophisticated reporting interfaces may offer users the opportunity to construct detailed queries to extract and download data that is highly customised to their needs.

System report or export: This relates more to an internal context in organisations where there are opportunities to extract data from corporate systems and databases. You might, for example, wish to conduct some analysis about staff costs, and so the personnel database may be where you can access the data about the workforce and their salaries.
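A system export usually amounts to running a query against the database and writing the result to a portable format such as CSV. The sketch below assumes a hypothetical personnel table named `staff` with invented columns and values; an in-memory SQLite database (from Python’s standard `sqlite3` module) stands in for whatever corporate system you actually have access to.

```python
import csv
import io
import sqlite3

# An in-memory SQLite database stands in for a hypothetical personnel system;
# the table, column names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (employee_id TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("E001", "Finance", 42000), ("E002", "Finance", 38000), ("E003", "IT", 51000)],
)

# Query an aggregated view of staff costs per department.
cursor = conn.execute(
    "SELECT department, COUNT(*) AS headcount, SUM(salary) AS total_cost "
    "FROM staff GROUP BY department ORDER BY department"
)

# Write the result as CSV (to a string here; to a file in practice),
# using the query's column names as the header row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([col[0] for col in cursor.description])
writer.writerows(cursor.fetchall())
conn.close()

print(buffer.getvalue())
```

Exporting the detailed rows instead (dropping the `GROUP BY`) would give the full-resolution version of the same data.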

‘Don’t underestimate the importance of domain expertise. At the Office for National Statistics (ONS), I was lucky in that I was very often working with the people who created the data – obviously, not everyone will have that luxury. But most credible data producers will now produce something to accompany the data they publish and help users interpret it – make sure you read it, as it will often include key findings as well as notes on reliability and limitations of the data.’ Alan Smith OBE, Data Visualisation Editor, Financial Times

Third-party services: There is an ever-increasing marketplace for data, and many commercial services now offer extensive sources of curated and customised data that would otherwise be impossible to obtain or very complex to gather. Such requests might include very large, customised extracts from social media platforms like Twitter based on specific keywords and geo-locations.

API: An API (Application Programming Interface) offers the means to create applications that programmatically access streams of data from sites or services, such as accessing a live feed from Transport for London (TfL) to track the current status of trains on the London Underground system.

When Can the Data Be Acquired?

The issue of when data is ready and available for acquisition is a delicate one. If you are conducting analysis of some survey results, naturally you will not have the full dataset of responses to work with until the survey is closed. However, you could reasonably begin some of your analysis work early by using an initial sample of what has been submitted so far. Ideally you will always work with data that is as complete as possible, but on occasion it may be advantageous to get an early sense of the nature of the submitted responses in order to begin preparing your final analysis routines. Working on any dataset that may not yet be complete is a risk. You do not want to progress too far ahead with your visualisation workflow if there is a real prospect that any further data that emerges could offer new insights or even trigger different, more interesting curiosities.

4.4 Data Examination

After acquiring your data, your next step is to examine it thoroughly. As I have remarked, your data is the key raw material from which the eventual visualisation output will be formed. Before you choose what meal to cook, you need to know what ingredients you have and what you need to do to prepare them.

It may be that, in the act of acquiring the data, you have already achieved a certain degree of familiarity with its status, characteristics and qualities, especially if you curated the data yourself. However, there is a definite need to go much further than you have likely gone before now. To do this you need to conduct an examination of the physical properties and the meaning of your data.

As you progress through the stages of this workflow, your data will likely change considerably: you will bring more of it in, you will remove some of it, and you will refine it to suit your needs. All these modifications will alter the physical makeup of your data, so you will need to keep revisiting this step to preserve your critical familiarity.

Data Properties

The first part of familiarising yourself with your data is to undertake an examination of its physical properties. Specifically, you need to ascertain its type, size and condition. This task is quite mechanical in many ways because you are in effect just ‘looking’ at the data, establishing its surface characteristics through visual and/or statistical observations.

What to Look For?

The type and size of your data involve assessing the characteristics and amount of data you have to work with. As you examine the data you also need to determine its condition: how good is its quality and is it fit for purpose?

Data types: Firstly, you need to identify what data types you have. In gathering this data in the first place you might already have a solid appreciation of what you have before you, but doing this thoroughly helps to establish the attention to detail you will need to demonstrate throughout this stage. Here you will need to refer to the definitions from earlier in the chapter about the different types of data (TNOIR). Specifically, you are looking to define each column or field of data based on whether it is qualitative (text, nominal, ordinal) or quantitative (interval, ratio) and whether it is discrete or continuous in nature.

Size: Within each column or field you next need to know what range of values exists and what the specific attributes/formats of the values held are. For example, if you have a quantitative variable (interval or ratio), what is the lowest and the highest value? In what number format is it presented (i.e. how many decimal places, and is it comma formatted)? If it is a categorical variable (nominal or ordinal), how many different values are held? If you have textual data, what is the maximum character length or word count?

Condition: This is the best moment to identify any data quality and completeness issues. Naturally, unidentified and unresolved issues around data quality will come back to bite hard later, undermining the scope of your work and, crucially, trust in its accuracy. You will address these issues next in the ‘transformation’ step, but for now the focus is on identifying any problems. Things to look out for may include the following:

Missing values, records or variables – Are empty cells assumed to be of no value (zero/nothing) or no measurement (n/a, null)? This is a subtle but important difference.
Erroneous values – Typos and any value that clearly looks out of place (such as a gender value in the age column).
Inconsistencies – Capitalisation, units of measurement, value formatting.
Duplicate records.
Out of date – Values that might have expired in accuracy, like someone’s age or any statistic that would reasonably be expected to have subsequently changed.
Uncommon system characters or line breaks.
Leading or trailing spaces – the invisible evil!
Date issues around format (dd/mm/yy or mm/dd/yy) and basis (systems like Excel base dates on daily counts since 1 January 1900, but not all do that).
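Several of the condition checks in this list can be automated. The sketch below runs them over a handful of invented records; the field names and the set of ‘no measurement’ markers are assumptions you would adapt to your own data. The date helper assumes Excel’s 1900 date system: counting from 30 December 1899 absorbs Excel’s deliberate treatment of 1900 as a leap year, so results are only correct from serial 61 (1 March 1900) onwards, and workbooks using the 1904 date system need a different base.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical raw records, as they might arrive from a spreadsheet export.
rows = [
    {"name": "Alice ", "city": "London", "age": "34"},
    {"name": "Bob", "city": "london", "age": ""},
    {"name": "Carol", "city": "Leeds", "age": "n/a"},
    {"name": "Alice ", "city": "London", "age": "34"},
]

# Missing values: separate truly empty cells from explicit 'no measurement'
# markers (the marker spellings here are assumptions).
NA_MARKERS = {"n/a", "na", "null"}
empty_age = [i for i, r in enumerate(rows) if r["age"] == ""]
no_measurement = [i for i, r in enumerate(rows) if r["age"].strip().lower() in NA_MARKERS]

# Leading or trailing spaces, the invisible evil.
padded_names = [i for i, r in enumerate(rows) if r["name"] != r["name"].strip()]

# Inconsistent capitalisation: several spellings of the same lower-cased value.
spellings = {}
for r in rows:
    spellings.setdefault(r["city"].lower(), set()).add(r["city"])
case_clashes = {key: vals for key, vals in spellings.items() if len(vals) > 1}

# Exact duplicate records.
counts = Counter(tuple(sorted(r.items())) for r in rows)
duplicates = [dict(rec) for rec, n in counts.items() if n > 1]


def excel_serial_to_date(serial):
    """Convert an Excel day count (1900 date system) into a calendar date.

    Valid from serial 61 (1 March 1900) onwards; see the note above.
    """
    return date(1899, 12, 30) + timedelta(days=serial)


print(empty_age, no_measurement, padded_names)  # [1] [2] [0, 3]
print(case_clashes)
print(duplicates)
print(excel_serial_to_date(42370))  # 2016-01-01
```

These checks only identify problems; resolving them belongs to the transformation step.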

How to Approach This?

I explained in the earlier ‘Data literacy’ section the difference in asset types (data that exists in tables and data that exists as isolated values) and also the difference in form (normalised or cross-tabulated data). Depending on the asset and form of data, your examination of data types may involve slightly different approaches, but the general task is the same. Performing this examination process will vary, though, based on the tools you are using. The simplest approach, relevant to most, is to describe the task as you would undertake it using Excel, given that this continues to be the most common tool people use or have the skills to use. Also, it is likely that most visualisation tasks you undertake will involve data of a size that can be comfortably handled in Excel.

‘Data inspires me. I always open the data in its native format and look at the raw data just to get the lay of the land. It’s much like looking at a map to begin a journey.’ Kim Rees, Co-founder, Periscopic

As you go through this task, it is good practice to note down a detailed overview of what data you have, perhaps in the form of a table of data descriptions. This is not as technical a duty as would be associated with the creation of a data dictionary, but its role and value are similar, offering a convenient means to capture all the descriptive properties of your various data assets.

Inspect and scan: Your first task is just to scan your table of data visually. Navigate around it using the mouse/trackpad, use the arrow keys to move up or down and left or right, and just look at all the data. Gain a sense of its overall dimensions. How many columns and how many rows does it occupy? How big a prospect might working with this be?

Data operations: Inspecting your data more closely might require the use of interrogation features such as sorting columns and applying basic filters. This can be a quick and simple way to acquaint yourself with the type of data and range of values.

Going further, once again depending on the technology (and assuming you have normalised data to start with), you might apply a cross-tabulation or pivot table to create aggregated, summary views of different angles and combinations of your data. This can also be a useful way to check the unique range of values that exist under different categories, as well as helping to establish how sub-categories may relate to other categories hierarchically. This type of inspection will be furthered in the next step of the ‘working with data’ process, when you will undertake deeper visual interrogations of the type, size and condition of your data.

If you have multiple tables, you will need to repeat this approach for each one as well as determine how they are related collectively and on what basis. It could be that just considering one table as the standard template, representative of each instance, is sufficient: for example, if each subsequent table is just a different monthly view of the same activity.

For so-called ‘Big Data’ (see the glossary definition earlier), it is less likely that you can conduct this examination work through relatively quick, visual observations using Excel. Instead it will need tools based around a statistical language that will describe for you what is there rather than let you look at what is there.

Statistical methods: The role of statistics in this examination stage generally involves relatively basic quantitative analysis methods to help describe and understand the characteristics of each data variable. The common term applied to this type of statistical approach is univariate, because it involves just looking at one variable at a time (the best opportunity to perform the analysis of multiple variables comes later). Here are some different types of statistical analyses you might find useful at this stage. These are not the only methods you will ever need to use, but will likely prove to be among the most common:

Frequency counts: applied to categorical values to understand the frequency of different instances.
Frequency distribution: applied to quantitative values to learn about the type and shape of the distribution of values.
Measurements of central tendency describe the summary attributes of a group of quantitative values, including:
the mean (the average value);
the median (the middle value if all quantities were arranged from smallest to largest);
the mode (the most common value).
Measurements of spread are used to describe how widely the values are dispersed:
Maximum, minimum and range: the highest and lowest values and the magnitude of spread between them.
Percentiles: the value below which x% of values fall (e.g. the 20th percentile is the value below which 20% of all quantitative values fall).
Standard deviation: a calculated measure used to determine how spread out a series of quantitative values are.
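Outside Excel, the cross-tabulation described under ‘Data operations’ can be approximated by counting combinations of categories. This minimal sketch uses invented normalised records (one row per sale, with hypothetical `region` and `category` values) and `collections.Counter`; a missing combination simply counts as zero, which also makes gaps in the data easy to spot.

```python
from collections import Counter

# Hypothetical normalised records: one row per sale, as (region, category).
sales = [
    ("North", "Books"), ("North", "Music"), ("South", "Books"),
    ("North", "Books"), ("South", "Games"), ("South", "Books"),
]

# Cross-tabulate: count the records for each (region, category) pair.
pivot = Counter(sales)

# The unique values under each category, useful for checking their range.
regions = sorted({region for region, _ in sales})
categories = sorted({category for _, category in sales})

# Print a simple pivot-table layout: regions down, categories across.
print("Region", *categories, sep="\t")
for region in regions:
    print(region, *(pivot[(region, c)] for c in categories), sep="\t")
```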
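The univariate measures listed above map directly onto Python’s standard `statistics` module and `collections.Counter`. The values in this sketch are invented. Two assumptions are worth flagging: `statistics.quantiles` uses an ‘exclusive’ interpolation method by default, so its percentiles may differ slightly from those produced by Excel or other tools, and `stdev` gives the sample standard deviation (use `pstdev` for a whole population).

```python
import statistics
from collections import Counter

# Hypothetical values for one categorical and one quantitative variable.
regions = ["North", "South", "North", "East", "North", "South"]
ages = [23, 35, 35, 41, 52, 29, 35, 60, 47, 38]

# Frequency counts for the categorical variable.
region_counts = Counter(regions)

# Frequency distribution for the quantitative variable, in 10-year bands.
age_bands = Counter(age // 10 * 10 for age in ages)

# Measurements of central tendency.
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Measurements of spread.
lowest, highest = min(ages), max(ages)
value_range = highest - lowest
percentiles = statistics.quantiles(ages, n=100)  # 99 cut points
p20 = percentiles[19]                            # the 20th percentile
sd = statistics.stdev(ages)                      # sample standard deviation

print(region_counts.most_common())  # [('North', 3), ('South', 2), ('East', 1)]
print(sorted(age_bands.items()))    # [(20, 2), (30, 4), (40, 2), (50, 1), (60, 1)]
print(mean, median, mode)           # 39.5 36.5 35
print(lowest, highest, value_range, p20, round(sd, 2))
```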

Data Meaning

Irrespective of whether you or others have curated the data, you need to be discerning about how much trust you place in it, at least to begin with. As discussed in the ‘trustworthy design’ principle, there are provenance issues, inaccuracies and biases that will affect its status on the journey from being created to being acquired. These are matters you need to be concerned with in order to resolve, or at least compensate for, potential shortcomings.

Knowing more about the physical properties of your data does not yet achieve full familiarity with its content, nor give you sufficient acquaintance with its qualities. You will have examined the data in a largely mechanical and probably quite detached way from the underlying subject matter. You now need to think a little deeper about its meaning, specifically what it does – and does not – truly represent.

‘A visualization is always a model (authored), never a mould (replica), of the real. That’s a huge responsibility.’ Paolo Ciuccarelli, Scientific Director of DensityDesign Research Lab at Politecnico di Milano

What Phenomenon?

Determining the meaning of your data requires that you recognise this is more than just a bunch of numbers and text values held in the cells of a table. Ask yourself, ‘What is it about? What activity, entity, instance or phenomenon does it represent?’.

One of the most valuable pieces of advice I have seen regarding this task came from Kim Rees, co-founder of Periscopic. Kim describes the process of taking one single row of data and using that as an entry point to learn carefully about what each value means individually and then collectively. Breaking down the separation between values created by the table’s cells, and then sticking the pieces back together, helps you appreciate the parts and the whole far better.

‘Absorb the data. Read it, re-read it, read it backwards and understand the lyrical and human-centred contribution.’ Kate McLean, Smellscape Mapper and Senior Lecturer Graphic Design

You saw the various macro- and micro-level views applied to the context of the Texas Department of Criminal Justice executed offenders information in the previous chapter. The underlying meaning of this data – its phenomenon – was offenders who had been judged guilty of committing heinous crimes and had faced the ultimate consequence. The availability of textual data describing the offenders’ last statements and details of their crimes heightened the emotive potential of this data. It was heavy stuff. However, it was still just a collection of values detailing dates, names, locations and categories. All datasets, whether on executed offenders or the locations of McDonald’s restaurants, share the same properties as outlined by the TNOIR data-type mnemonic. What distinguishes them is what these values mean.

What you are developing here is a more semantic appreciation of your data to substantiate the physical definitions. You are then taking that collective appreciation of what your data stands for to influence how you might decide to amplify or suppress the influence of this semantic meaning. This builds on the discussion in the last chapter about the tonal dimension, specifically the difference between figurative and non-figurative portrayals.

A bar chart (Figure 4.4) comprising two bars, one of height 43 and the other of height 1, arguably does not quite encapsulate the emotive significance of Barack Obama becoming the first black US president, succeeding the 43 white presidents who served before him. A more potent approach might be to present a chronological display of 44 photographs, one of each president, in order to visually contrast Mr Obama’s headshot in the final image in the sequence with the previous 43. Essentially, the value of 43 is almost irrelevant in its detail – it could be 25 or 55 – it is about there being ‘many’ of the same thing followed by the ‘one’ that is ‘different’. That’s what creates the impact. (What will image number 45 bring? A further striking ‘difference’ or a return to the standard mould?)

Figure 4.4 US Presidents by Ethnicity (1789 to 2015)

Learning about the underlying phenomena of your data helps you feel its spirit more strongly than just looking at the rather agnostic physical properties. It also helps you know what potential sits inside the data – the qualities it possesses – so you are then equipped with the best understanding of how you might want to portray it. Likewise, it prepares you for the level of responsibility and potential sensitivity you will face in curating a visual representation of this subject matter. As you saw with the case study of the ‘Florida Gun Crimes’ graphic, some subjects are inherently more emotive than others, so we have to demonstrate a certain amount of courage and conviction in deciding how to undertake such challenges.

‘Find loveliness in the unlovely. That is my guiding principle. Often, topics are disturbing or difficult; inherently ugly. But if they are illustrated elegantly there is a special sort of beauty in the truthful communication of something. Secondly, Kirk Goldsberry stresses that data visualization should ultimately be true to a phenomenon, rather than a technique or the format of data. This has had a huge impact on how I think about the creative process and its results.’ John Nelson, Cartographer

Completeness

Another aspect of examining the meaning of data is to determine how representative it is. I have touched on data quality already, but inaccuracies in conclusions about what data is saying arguably have a greater impact on trust, and are more damaging, than any individual missing elements of data.

The questions you need to ask of your data are: does it represent genuine observations about a given phenomenon or is it influenced by the collection method? Does your data reflect the entirety of a particular phenomenon, a recognised sample, or maybe even an obstructed view caused by hidden limitations in the availability of data about that phenomenon?

Reflecting on the published executed offenders data, there would be a certain confidence that it is representative of the total population of executions but with a specific caveat: it is all the executed offenders under the jurisdiction of the Texas Department of Criminal Justice since 1982. It is not the whole of the executions conducted across the entire USA, nor is it representative of all the executions that have taken place throughout the history of Texas. Any conclusions drawn from this data must be boxed within those parameters.


The matter of judging completeness can be less about the number of records and more a question of the integrity of the data content. This executed offenders dataset would appear to be a trusted and reliable record of each offender, but would there – could there – be an incentive for the curators of this data not to capture, for example, the last statements as they were explicitly expressed? Could they have been sanitised or edited in any way? These are the types of questions you need to pose. This is not aimless cynicism; it is about seeking assurances of quality and condition so you can be confident about what you can legitimately present and conclude from the data (as well as what you should not).

Consider a different scenario. If you are looking to assess the political mood of a nation during a televised election debate, you might consider analysing Twitter data by looking at the sentiments for and against the candidates involved. Although this would offer an accessible source of rich data, it would not provide an entirely reliable view of the national mood. It could only offer algorithmically determined insights (i.e. through the process of determining the sentiment from natural language) about the people who have a Twitter account, are watching the debate and have chosen to tweet about it during a given timeframe.

Now, just because you might not have access to a ‘whole’ population of political opinion data does not mean it is not legitimate to work on a sample. Sometimes samples are astutely reflective of the population. And in truth, if samples were not viable then most of the world’s analyses would need to cease immediately.

A final point is to encourage you to probe any absence of data. Sometimes you might choose to switch the focus away from the data you have got towards the data you have not got. If the data you have is literally as much as you can acquire but you know the subject should have more data about it, then perhaps shine a light on the gaps, making that your story. Maybe you will unearth a lack of intent or will to make the data available, which in itself may be a fascinating discovery. As transparency increases, those who are not transparent stand out the most.

‘This is one of the first questions we should ask about any dataset: what is missing? What can we learn from the gaps?’ Jer Thorp, Founder of The Office for Creative Research


Any identified lack of completeness or full representativeness is not an obstacle to progress; it just means you need to tread carefully with regard to how you might represent and present any work that emerges from it. It is about caution, not cessation.

Influence on Process

This extensive examination work gives you an initial – but thorough – appreciation of the potential of your data, the things it will offer and the things it will not. Of course, this potential is as yet unrealised. Furthering this examination will be the focus of the next activity, as you look to employ more visual techniques to help unearth the as-yet-hidden qualities of understanding locked away in the data. For now, this examination work takes your analytical and creative thinking forward another step.

Purpose map ‘tone’: Through deeper acquaintance with your data, you will have been able to further consider the suitability of the potential tone of your work. Learning more about the inherent characteristics of the subject might help to confirm or redefine your intentions for adopting a utilitarian (reading) or sensation-based (feeling) tone.

Editorial angles: The main benefit of exploring the data types is to arrive at an understanding of what you have and have not got to work with. More specifically, it guides your thinking towards which possible angles of analysis may be viable and relevant, and which can be eliminated. For example, if you do not have any location or spatial data, this rules out the immediate possibility of being able to map your data; this is not something you could pursue with the current scope of your dataset. If you do have time-based data then the prospect of conducting analysis that might show changes over time is viable. You will learn more about this idea of editorial ‘angle’ in the next chapter, but let me state now that it is one of the most important components of visualisation thinking.

Physical properties influence scale: Data is your raw material, your ideas are not. I stated towards the end of Chapter 3 that you should embrace the instinctive manifestations of ideas and seek influence and inspiration from other sources. However, with the shape and size of your data having such an impact on any eventual designs, you must respect the need to be led by your data’s physical properties and not just your ideas.


Figure 4.5 OECD Better Life Index

In particular, the range of values in your data will shape things significantly. The shape of data in the ‘Better Life Index’ project you saw earlier is a good example. Figure 4.5 presents an analysis of the quality of life across the 36 OECD member states. Each country is a flower comprising 11 petals, each representing a different quality of life indicator (the larger the petal, the better the measured quality of life). Consider this: would this design concept still be viable if there were 20 indicators? Or just 3? How about if the analysis was for 150 countries? The connection between data range and chart design involves a discerning judgement about ‘fit’. You need to identify carefully the underlying shape of the data to be displayed and what tolerances this might test in the shape of the possible design concepts used.

‘My design approach requires that I immerse myself deeply in the problem domain and available data very early in the project, to get a feel for the unique characteristics of the data, its “texture” and the affordances it brings. It is very important that the results from these explorations, which I also discuss in detail with my clients, can influence the basic concept and main direction of the project. To put it in Hans Rosling’s words, you need to “let the data set change your mindset”.’ Moritz Stefaner, Truth & Beauty Operator

Another relevant concern involves the challenge of elegantly handling quantitative measures that have hugely varied value ranges and contain (legitimate) outliers. Accommodating all the values in a single display can have a hugely distorting impact on the space it occupies. For example, note the exceptional size of the shape for Avatar in Figure 4.6, from the ‘Spotlight on profitability’ graphic you saw earlier. It is the one movie included that bursts through the ceiling, far beyond the otherwise entirely suitable 1000 million maximum scale value. As a single outlier, in this case, it was treated with a rather unusual approach. As you can see, its striking shape conveniently trespasses onto the space offered by the two empty rows above. The result emphasises this value’s exceptional quality. You might seldom have the luxury of this type of effective resolution, so the key point to stress is to always be acutely aware of the existence of ‘Avatars’ in your data.
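One way to make sure you are aware of your ‘Avatars’ before design work begins is to flag values that sit far outside the bulk of the data. The sketch below is a minimal Python illustration of Tukey’s fences: values more than k × IQR beyond the quartiles are flagged. This is a common convention, not the method used for Figure 4.6. The film labels and takings are hypothetical. Flagging is only for awareness; legitimate outliers should stay in the data.

```python
import statistics

def flag_outliers(records, k=1.5):
    """Return the labels whose value lies outside Tukey's fences.

    records: mapping of label -> numeric value.
    k: fence multiplier; 1.5 is the conventional choice.
    """
    values = list(records.values())
    q1, _, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [label for label, value in records.items() if value < lower or value > upper]

# Hypothetical takings ($ millions), for illustration only
takings = {
    "Film A": 310, "Film B": 420, "Film C": 275, "Film D": 520,
    "Film E": 610, "Film F": 390, "Film G": 455, "Film H": 2780,
}
print(flag_outliers(takings))  # ['Film H']
```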

Figure 4.6 Spotlight on Profitability

4.5 Data Transformation

Having undertaken an examination of your data you will have a good idea about what needs to be done to ensure it is entirely fit for purpose. The next activity is to work on transforming the data so it is in optimum condition for your needs.

At this juncture, the linearity of a book becomes rather unsatisfactory. Transforming your data is something that will take place before, during and after both the examination and (upcoming) exploration steps. It will also continue beyond the boundaries of this stage of the workflow. For example, the need to transform data may only emerge once you begin your ‘editorial thinking’, as covered by the next chapter (indeed you will likely find yourself bouncing forwards and backwards between these sections of the book on a regular basis). As you get into the design stage you will constantly stumble upon additional reasons to tweak the shape and size of your data assets. The main point here is that your needs will evolve. This moment in the workflow is not going to be the only or final occasion when you look to refine your data.

Two important notes to share upfront at this stage. Firstly, in accordance with the desire for trustworthy design, any treatments you apply to your data need to be recorded and potentially shared with your audience. You must be able to reveal the thinking behind any significant assumptions, calculations and modifications you have made to your data.

Secondly, I must emphasise the critical value of keeping backups. Before you undertake any transformation, make a copy of your dataset. After each major iteration remember to save a milestone version for backup purposes. Additionally, when making changes, it is useful to preserve original (unaltered) data items nearby for easy rollback should you need them. For example, suppose you are cleaning up a column of messy data to do with ‘Gender’ that has a variety of inconsistent values (such as “M”, “Male”, “male”, “FEMALE”, “F”, “Female”). Normally I would keep the original data, duplicate the column, and then tidy up this second column of values. I then have access to both the original and modified versions. If you are going to do any transformation work that might involve a significant investment of time and (manual) effort, having an opportunity to refer to a previous state is always useful in my experience.
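The backup-and-duplicate habit can be expressed in a few lines of code as well as in a spreadsheet. The sketch below is a minimal Python illustration that assumes the data is held as a list of dictionaries. The rows and the new column name ‘Gender (clean)’ are hypothetical. It takes a milestone copy first, then writes tidied values to a new column and leaves the original untouched. Unrecognised variants pass through unchanged so they can be reviewed by hand.

```python
import copy

# Hypothetical rows, for illustration only
rows = [
    {"id": 1, "Gender": "M"},
    {"id": 2, "Gender": "Male"},
    {"id": 3, "Gender": "male"},
    {"id": 4, "Gender": "FEMALE"},
    {"id": 5, "Gender": "F"},
    {"id": 6, "Gender": "Female"},
]

# Milestone backup taken before any transformation
backup = copy.deepcopy(rows)

# Every known variant, compared case-insensitively, mapped to one tidy value
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

for row in rows:
    raw = row["Gender"]
    # Original column kept as-is; the tidy value goes into a duplicate column
    row["Gender (clean)"] = GENDER_MAP.get(raw.strip().lower(), raw)

for row in rows:
    print(row["Gender"], "->", row["Gender (clean)"])
```

The lookup table plays the role of the chained ‘if this, do this, otherwise do that’ logic formula. Adding a newly discovered variant means adding one entry rather than another nested condition.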

There are four different types of potential activity involved in transforming your data: cleaning, converting, creating and consolidating.

Transform to clean: I spoke about the importance of data quality (better quality in, better quality out, etc.) in the examination section when looking at the physical condition of the data. There’s no need to revisit the list of potential observations you might need to look out for, but this is the point where you will need to begin to address them.

There is no single or best approach for how to conduct this task. Some issues can be addressed through a straightforward ‘find and replace’ (or remove) operation. Some treatments will be possible using simple functions to convert data into new states, such as using logic formulae that state ‘if this, do this, otherwise do that’. For example, if the value in the ‘Gender’ column is “M” make it “Male”, if the value is “MALE” make it “Male”, etc. Other tasks might be much more intricate, requiring manual intervention, often in combination with inspection features like ‘sort’ or ‘filter’, to find, isolate and then modify problem values.

Part of cleaning up your data involves the elimination of junk. Going back to the earlier scenario about gathering data about McDonald’s restaurants, you probably would not need the name of the restaurant manager, details of the opening times or the contact telephone number. It is down to your judgement at the time of gathering the data to decide whether these extra items of detail – if they were as easily acquirable as the other items of data that you really did need – may potentially provide value for your analysis later in the process. My tactic is usually to gather as much data as I can and then reject/trim later; later has arrived and now is the time to consider what to remove. Any fields or rows of data that you know serve no ongoing value will take up space and attention, so get rid of them. You will need to separate the wheat from the chaff to help reduce your problem.

Transform to convert: Often you will seek to create new data values out of existing ones. In the illustration in Figure 4.7, it might be useful to extract the constituent parts of a ‘Release Date’ field in order to group, analyse and use the data in different ways. You might use the ‘Month’ and ‘Year’ fields to aggregate your analysis at these respective levels in order to explore within-year and across-year seasonality. You could also create a ‘Full Release Date’ formatted version of the date to offer a more presentable form of the release date value, possibly for labelling purposes.

Figure 4.7 Example of Converted Data Transformation
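A conversion like the one in Figure 4.7 can be scripted. The sketch below is a minimal Python illustration that assumes release dates arrive as ISO ‘YYYY-MM-DD’ strings. The dates are hypothetical, and the field names simply mirror the figure. Month names come from a fixed tuple, so the formatted label does not depend on the machine’s locale settings.

```python
from collections import Counter
from datetime import date

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")

def convert_release_date(release):
    """Derive Month, Year and a presentable full date from an ISO date string."""
    d = date.fromisoformat(release)  # expects 'YYYY-MM-DD'
    return {
        "Release Date": release,
        "Month": d.month,
        "Year": d.year,
        "Full Release Date": f"{d.day} {MONTH_NAMES[d.month - 1]} {d.year}",
    }

# Hypothetical release dates, for illustration only
films = [convert_release_date(r) for r in ["2009-12-18", "2010-06-18", "2010-12-10", "2011-06-24"]]

# Aggregate at the derived 'Month' level to look for within-year seasonality
print(Counter(film["Month"] for film in films))
print(films[0]["Full Release Date"])  # 18 December 2009
```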


Extracting or deriving new forms of data will be necessary when it comes to handling qualitative ‘textual’ data. As stated in the ‘Data literacy’ section, if you have textual data you will generally need to transform this into various categorical or quantitative forms, unless its role is simply to provide value as an annotation (such as a quoted caption or label). Some would argue that qualitative visualisation involves special methods for the representation of data. I would disagree. I believe the unique challenge of working with textual data lies with the task of transforming the data: visually representing the extracted and derived properties from textual data involves the same suite of representation options (i.e. chart types) that would be useful for portraying analysis of any other data types.

Here is a breakdown of some of the conversions, calculations and extractions you could apply to textual data. Some of these tasks can be quite straightforward (e.g. using the LEN function in Excel to determine the number of characters) while others are more technical and will require more sophisticated tools or programmes dedicated to handling textual data.

Categorical conversions:

Identify keywords or summary themes from text and convert these into categorical classifications.
Identify and flag up instances of certain cases existing or otherwise (e.g. X is mentioned in this passage).
Identify and flag up the existence of certain relationships (e.g. A and B were both mentioned in the same passage, C was always mentioned before D).
Use natural language-processing techniques to determine sentiments, to identify specific word types (nouns, verbs, adjectives) or sentence structures (around clauses and punctuation marks).
With URLs, isolate and extract the different components of website addresses and sub-folder locations.
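Several of these categorical conversions need nothing more than pattern matching. The sketch below is a minimal Python illustration that uses the standard `re` and `urllib.parse` modules. It assumes whole-word, case-insensitive matching is good enough, and the passage and URL are hypothetical. Sentiment analysis and word-type tagging need dedicated natural language-processing tools and are not attempted here.

```python
import re
from urllib.parse import urlparse

def _find(text, term):
    """Return the first whole-word, case-insensitive match of term, or None."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)

def mentions(text, term):
    """Categorical flag: is the term mentioned in this passage?"""
    return _find(text, term) is not None

def co_mentioned(text, a, b):
    """Relationship flag: are a and b both mentioned in the same passage?"""
    return mentions(text, a) and mentions(text, b)

def mentioned_before(text, c, d):
    """Relationship flag: is c first mentioned before d?"""
    mc, md = _find(text, c), _find(text, d)
    return bool(mc and md and mc.start() < md.start())

def url_components(url):
    """Split a web address into its domain and sub-folder components."""
    parsed = urlparse(url)
    return {"domain": parsed.netloc,
            "folders": [part for part in parsed.path.split("/") if part]}

# Hypothetical passage and address, for illustration only
passage = "The report compares rail and bus usage, then discusses cycling."
print(mentions(passage, "rail"))                     # True
print(co_mentioned(passage, "bus", "cycling"))       # True
print(mentioned_before(passage, "cycling", "rail"))  # False
print(url_components("https://www.example.com/data/transport/2015.csv"))
```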

Quantitative conversions:

Calculate the frequency of certain words being used.
Analyse the attributes of text, such as total word count, physical length, potential reading duration.
Count the number of sentences or paragraphs, derived from the frequency of different punctuation marks.
Position the temporal location of certain words/phrases in relation to other words/phrases or compared to the whole (e.g. X was mentioned at 1m51s).
Position the spatial location of certain words/phrases in relation to other words/phrases or compared to the whole.
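Most of these quantitative conversions can be approximated with simple counting. The sketch below is a minimal Python illustration under some stated assumptions. A ‘word’ is a run of letters and apostrophes. Sentences end at `.`, `!` or `?`. Paragraphs are separated by blank lines. Reading duration assumes an average speed of 200 words per minute, a hypothetical figure you should adjust. Temporal positions such as ‘1m51s’ would need timestamped transcripts and are not covered.

```python
import re
from collections import Counter

WORDS_PER_MINUTE = 200  # assumed average reading speed

def _words(text):
    return re.findall(r"[A-Za-z']+", text.lower())

def text_metrics(text):
    """Derive simple quantitative attributes from a piece of text."""
    words = _words(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "word count": len(words),
        "characters": len(text),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "reading minutes": len(words) / WORDS_PER_MINUTE,
        "top words": Counter(words).most_common(3),
    }

def relative_position(text, word):
    """Spatial position of a word's first occurrence, as a fraction of all words (None if absent)."""
    words = _words(text)
    try:
        return words.index(word.lower()) / len(words)
    except ValueError:
        return None

# Hypothetical text, for illustration only
sample_text = "Data is raw material. Ideas are not. Data leads!"
print(text_metrics(sample_text))
print(relative_position(sample_text, "ideas"))
```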

A further challenge that falls under this ‘converting’ heading will sometimes emerge when you are working with data supplied by others in spreadsheets. This concerns the obstacles created when trying to analyse data that has been formatted visually, perhaps in readiness for printing. If you receive data in this form you will need to unpack and reconstruct it into the normalised form described earlier, comprising all records and fields included in a single table.

Any merged cells need unmerging or removing. You might have a heading that is common to a series of columns. If you see this, unmerge it and replicate the same heading across each of the relevant columns (perhaps appending an index number to each header to maintain some differentiation). Cells that have visual formatting like background shading or font attributes (bold, coloured) to indicate a value or status are useful when observing and reading the data, but for analysis operations these properties are largely invisible. You will need to create new values in actual data form that are not visual (creating categorical values, say, or status flags like ‘yes’ or ‘no’) to recreate the meaning of the formats. The data provided to you – or that you create – via a spreadsheet does not need to be elegant in appearance; it needs to be functional.
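Once a merged heading has been unmerged, it usually survives only in the first of its columns, and the cells to its right are blank. The sketch below is a minimal Python illustration of the ‘replicate and append an index’ advice. It assumes the header row has already been read into a list of strings, with blank or missing cells as empty strings or `None`. The function name and example headings are hypothetical. Reading the formatting itself, such as bold fonts, requires a spreadsheet library (openpyxl, for instance, exposes a cell’s `font.bold`) and is not shown here.

```python
from collections import Counter

def fill_merged_headers(header_row):
    """Replicate a heading across the blank cells that followed it, appending an index."""
    # Forward-fill: a blank cell inherits the nearest heading to its left
    filled, current = [], ""
    for cell in header_row:
        current = cell.strip() if cell and cell.strip() else current
        filled.append(current)

    # Where one heading now covers several columns, number them to keep names unique
    totals, seen, result = Counter(filled), Counter(), []
    for name in filled:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name} {seen[name]}")
        else:
            result.append(name)
    return result

print(fill_merged_headers(["Country", "Sales", "", "", "Notes"]))
# ['Country', 'Sales 1', 'Sales 2', 'Sales 3', 'Notes']
```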


Transform to create: This task is something I refer to as the hidden cleverness, where you are doing background thinking to form new calculations, values, groupings and any other mathematical or manual treatments that really expand the variety of data available.

A simple example might involve the need to create some percentage calculations in a new field, based on related quantities elsewhere within your existing data. Perhaps you have pairs of ‘start date’ and ‘end date’ values and you need to calculate the duration in days for all your records. You might use a logic formula to assist in creating a new variable that summarises another, maybe something like (in pseudo-code terms) IF Age < 18 THEN status = “Child”, ELSE status = “Adult”. Alternatively, you might want to create calculations that standardise some quantities: for instance, you might need to source base population figures for all the relevant locations in your data in order to convert some quantities into ‘per capita’ values. This would be particularly necessary if you anticipate wanting to map the data, as it will ensure you are facilitating legitimate comparisons.

Transform to consolidate: This involves bringing in additional data to help expand (more variables) or append (more records) your data, enhancing the editorial and representation potential of your project.

An example of a need to expand your data would be if you had details about locations only at country level but you wanted to be able to group and aggregate your analysis at continent level. You could gather a dataset that holds values showing the relationships between country and continent and then add a new variable to your dataset against which you would perform a simple lookup operation to fill in the associated continent values.

Consolidating by appending data might occur if you had previously acquired a dataset that now had more or newer data (specifically, additional records) available to bring it up to date. For instance, you might have started some analysis on music record sales up to a certain point in time, but once you had actually started working on the task another week had elapsed and more data had become available.

Additionally, you may start to think about sourcing other media assets to enhance your presentation options, beyond just gathering extra data. You might anticipate the potential value of gathering photos (headshots of the people in your data), icons/symbols (country flags), links to articles (URLs), or videos (clips of goals scored). All of these would contribute to broadening the scope of your annotation options. Even though there is a while yet until we reach that particular layer of design thinking, it is useful to start contemplating this as early as possible in case the collection of these additional assets requires significant time and effort. It might also reveal any obstacles around having to obtain permissions for usage or sufficiently high-quality media. If you know you are going to have to do something, don’t leave it too late – reduce the possibility of such stresses by acting early.
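The ‘create’ and ‘consolidate’ treatments above can be combined in one small pass over the records. The sketch below is a minimal Python illustration under these assumptions: records are dictionaries, and dates are ISO strings. The countries, populations and continent lookup are hypothetical placeholders, not real figures. A missing population raises a `KeyError` on purpose, so gaps are noticed rather than silently producing a rate. Percentage shares are calculated only after newer records have been appended, because they depend on the full total.

```python
from datetime import date

# Hypothetical reference tables, for illustration only
POPULATION = {"Country A": 2_000_000, "Country B": 500_000}
CONTINENT = {"Country A": "Europe", "Country B": "Africa"}

def enrich(record):
    """Add derived ('create') and looked-up ('consolidate') fields to one record."""
    start = date.fromisoformat(record["start"])
    end = date.fromisoformat(record["end"])
    record["duration_days"] = (end - start).days
    # Logic formula: IF age < 18 THEN "Child" ELSE "Adult"
    record["status"] = "Child" if record["age"] < 18 else "Adult"
    # Standardise the count as a rate per 100,000 people (KeyError if population is missing)
    record["cases_per_100k"] = record["cases"] * 100_000 / POPULATION[record["country"]]
    # Expand: look up the continent for each country
    record["continent"] = CONTINENT.get(record["country"], "Unknown")
    return record

records = [
    {"country": "Country A", "age": 34, "start": "2015-03-01", "end": "2015-03-15", "cases": 120},
    {"country": "Country B", "age": 12, "start": "2015-06-10", "end": "2015-07-01", "cases": 45},
]
# Append: newer records with the same fields arrive later
newer_records = [
    {"country": "Country A", "age": 18, "start": "2015-07-01", "end": "2015-07-02", "cases": 35},
]
records.extend(newer_records)
records = [enrich(r) for r in records]

# Percentage share of the total, calculated once all records are in place
total_cases = sum(r["cases"] for r in records)
for r in records:
    r["share_pct"] = 100 * r["cases"] / total_cases
    print(r["country"], r["duration_days"], r["status"], r["cases_per_100k"], r["continent"], r["share_pct"])
```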

4.6 Data Exploration

The examination task was about forming a deep acquaintance with the physical properties and meaning of your data. You now need to interrogate that data further – and differently – to find out what potential insights and qualities of understanding it could provide.

Undertaking data exploration will involve the use of statistical and visual techniques to move beyond looking at data and start seeing it. You will be directly pursuing your initially defined curiosity, to determine whether answers exist and whether they are suitably enlightening in nature. Often you will not know for sure whether what you initially thought was interesting is exactly that. This activity will confirm, refine or reject your core curiosity and perhaps, if you are fortunate, present discoveries that will encourage other interesting avenues of enquiry.

‘After the data exploration phase you may come to the conclusion that the data does not support the goal of the project. The thing is: data is leading in a data visualization project – you cannot make up some data just to comply with your initial ideas. So, you need to have some kind of an open mind and “listen to what the data has to say”, and learn what its potential is for a visualisation. Sometimes this means that a project has to stop if there is too much of a mismatch between the goal of the project and the available data. In other cases this may mean that the goal needs to be adjusted and the project can continue.’ Jan Willem Tulp, Data Experience Designer

To frame this process, it is worth introducing something that will be covered in Chapter 5, where you will consider some of the parallels between visualisation and photography. Before committing to take a photograph you must first develop an appreciation of all the possible viewpoints that are available to you. Only then can you determine which of these is best. The notion of ‘best’ will be defined in the next chapter, but for now you need to think about identifying all the possible viewpoints in your data – to recognise the knowns and the unknowns.

Widening the Viewpoint: Knowns and Unknowns

At a news briefing in February 2002, the US Secretary of Defense, Donald Rumsfeld, delivered his infamous ‘known knowns’ statement:

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

There was much commentary about the apparent lack of elegance in the language used and criticism of the muddled meaning. I disagree with this analysis. I thought it was probably the most efficient way he could have articulated what he was explaining, at least in written or verbal form. The essence of Rumsfeld’s statement was to distinguish awareness of what is knowable about a subject (what knowledge exists) from the status of acquiring this knowledge. There is a lot of value to be gained from using this structure (Figure 4.8) to shape your approach to thinking about data exploration.

The known knowns are aspects of knowledge about your subject and about the qualities present in your data that you are aware of – you are aware that you know these things. The nature of these known knowns might mean you have confidence that the original curiosity was relevant and the available insights that emerged in response are suitably interesting. You cannot afford to be complacent, though. You will need to challenge yourself to check that these curiosities are still legitimate and relevant. To support this, you should continue to look and learn about the subject through research, topping up your awareness of the most potentially relevant dynamics of the subject, and continue to interrogate your data accordingly.

Additionally, you should not just concentrate on this potentially quite narrow viewpoint. As I mentioned earlier, it is important to give yourself as broad a view as possible across your subject and its data to optimise your decisions about what other interesting enquiries might be available. This is where you need to consider the other quadrants in this diagram.

Figure 4.8 Making Sense of the Known Knowns

On occasion, though I would argue rarely, there may be unknown knowns: things you did not realise you knew or perhaps did not wish to acknowledge that you knew about a subject. This may relate to previous understandings that have been forgotten, consciously ignored or buried. Regardless, you need to acknowledge these.

For the knowledge that has yet to be acquired – the known unknowns and the even more elusive unknown unknowns – tactics are needed to help plug these gaps as far, as deep and as wide as possible. You cannot possibly achieve mastery of all the domains you work with. Instead, you need to have the capacity, and be in a position, to turn as many unknowns as possible into knowns, and in doing so optimise your understanding of a subject. Only then will you be capable of appreciating the full array of viewpoints the data offers.

To make the best decisions you first need to be aware of all the options. This activity is about broadening your awareness of the potentially interesting things you could show – and could say – about your data. The resulting luxury of choice is something you will deal with in the next stage.

Exploratory Data Analysis

As I have stated, the aim throughout this book is to create a visualisation that will facilitate understanding for others. That is the end goal. At this stage of the workflow the deficit in understanding lies with you. The task of addressing the unknowns you have about a subject, as well as substantiating what knowns already exist, involves the use of exploratory data analysis (EDA). This integrates statistical methods with visual analysis to offer a way of extracting deeper understanding and widening the view to unlock as much of the potential as possible from within your data.

The chart in Figure 4.9 is a great demonstration of the value in combining statistical and visual techniques to understand your data better. It shows the results of nearly every major and many minor (full) marathons from around the world. On the surface, the distribution of finishing times reveals the common bell shape found in plots of many natural phenomena, such as the height measurements of a large group of people. However, when you zoom in closer the data reveals some really interesting threshold patterns for finishing times on or just before the three-, four- and five-hour marks. You can see that the influence of runners setting themselves targets, often rounded to the hourly milestones, genuinely appeared to affect the results achieved.
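To get a feel for how such threshold patterns surface, the sketch below bins finishing times into whole minutes. It then does two things: it prints a crude per-minute histogram, so spikes can be seen rather than just computed, and it compares the number of finishers just before an hour mark with the number just after. This is a minimal Python illustration, not the analysis behind Figure 4.9. The finishing times are hypothetical, and the five-minute window and before/after ratio are assumptions. With real data, you would compare each ratio against neighbouring windows before reading anything into it.

```python
from collections import Counter

def minute_bins(times):
    """Bin finishing times (in minutes) into whole-minute buckets."""
    return Counter(int(t) for t in times)

def threshold_ratio(times, mark, window=5):
    """Finishers in the `window` minutes before `mark`, divided by those in the `window` minutes from it."""
    bins = minute_bins(times)
    before = sum(bins[m] for m in range(mark - window, mark))
    after = sum(bins[m] for m in range(mark, mark + window))
    return before / after if after else float("inf")

def text_histogram(times, start, stop):
    """Print one row per minute, with one '#' per finisher."""
    bins = minute_bins(times)
    for m in range(start, stop):
        print(f"{m // 60}:{m % 60:02d} | {'#' * bins[m]}")

# Hypothetical finishing times in minutes, for illustration only
finish_times = [235.5, 236.2, 238.9, 239.4, 239.8, 239.9, 241.0, 243.5, 250.0]
text_histogram(finish_times, 234, 246)
print(threshold_ratio(finish_times, 240))  # 3.0
```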

Figure 4.9 What Good Marathons and Bad Investments Have in Common

Although statistical analysis of this data would have revealed many interesting facts, these unique patterns were only realistically discoverable through studying the visual display of the data. This is the essence of EDA, but there is no instruction manual for it. As John Tukey, the father of EDA, described: ‘Exploratory data analysis is an attitude, a flexibility, and a reliance on display, not a bundle of techniques’. There is no single path to undertaking this activity effectively; it requires a number of different technical, practical and conceptual capabilities.

Instinct of the analyst: This is the primary matter. The attitude and flexibility that Tukey describes are about recognising the importance of the analyst’s traits. Effective EDA is not about the tool. There are many vendors out there pitching their devices as the magic option where we just have to ‘point and click’ to uncover a deep discovery. Technology inevitably plays a key role in facilitating this endeavour, but the value of a good analyst should not be underestimated: it is arguably more influential than the differentiating characteristics between one tool and the next. In the absence of a defined procedure for conducting EDA, an analyst needs to possess the capacity to recognise and pursue the scent of enquiry. A good analyst will have that special blend of natural inquisitiveness and the sense to know what approaches (statistical or visual) to employ and when. Furthermore, when these traits collide with strong subject knowledge, better judgements are made about which findings from the analysis are meaningful and which are not.

Reasoning: Efficiency is a particularly important aspect of this exploration stage. The act of interrogating data, waiting for it to volunteer its secrets, can take a lot of time and energy. Even with smaller datasets you can find yourself tempted into trying out myriad combinations of analyses, driven by the desire to find the killer insight in the shadows.

‘At the beginning, there’s a process of “interviewing” the data – first evaluating their source and means of collection/aggregation/computation, and then trying to get a sense of what they say – and how well they say it via quick sketches in Excel with pivot tables and charts. Do the data, in various slices, say anything interesting? If I’m coming into this with certain assumptions, do the data confirm them, or refute them?’ Alyson Hurt, News Graphics Editor, NPR

Reasoning is an attempt to help reduce the size of the prospect. You cannot afford to try everything. There are so many statistical methods and, as you will see, so many visual ways of viewing data that you simply cannot expect to have the capacity to unleash the full exploratory artillery. EDA is about being smart, recognising that you need to be discerning about your tactics.

In academia there is a distinction between two approaches to reasoning – deductive and inductive – that I feel is usefully applied in this discussion:

Deductive reasoning is targeted: You have a specific curiosity or hypothesis, framed by subject knowledge, and you are going to interrogate the data in order to determine whether there is any evidence of relevance or interest in the concluding finding. I think of this as adopting a detective’s mindset (Sherlock Holmes).
Inductive reasoning is much more open in nature: You will ‘play around’ with the data, based on your sense or instinct about what might be of interest, and wait and see what emerges. In some ways this is like prospecting, hoping for that moment of serendipity when you unearth gold.

In this exploration process you ideally need to accommodate both approaches. The deductive process will focus on exploring further targeted curiosities; the inductive process will give you a fighting chance of finding more of those slippery ‘unknowns’, often almost by accident. It is important to give yourself room to embark on these somewhat less structured exploratory journeys.

I often think of EDA by comparison with the challenge of a ‘Where’s Wally?’ visual puzzle. The process of finding Wally feels somewhat unscientific. Sometimes you let your eyes race around the scene like a dog who has just been let out of the car and is torpedoing across a field. However, after the initial burst of randomness, perhaps subconsciously, you then go through a more considered process of visual analysis. Elimination takes place by working around different parts of the scene and sequentially declaring ‘Wally-free’ zones. This aids your focus and strategy for where to look next. As you then move across each mini-scene you are pattern matching, looking out for the giveaway characteristics of the boy wearing glasses, a red-and-white-striped hat and jumper, and blue trousers.

The objective of this task is clear and singular in definition. The challenge of EDA is rarely that clean. There is a source curiosity to follow, for sure, and you might find evidence of Wally somewhere in the data. However, unlike the ‘Where’s Wally?’ challenge, in EDA you also have the chance to find other things that might change the definition of what qualifies as an interesting insight. In unearthing other discoveries you might determine that you no longer care about Wally; finding him no longer represents the main enquiry.

Inevitably you are faced with a trade-off between spare capacity in time and attention and your own internal satisfaction that you have explored as many different angles of enquiry as possible.

Chart types: This is about seeing the data from all feasible angles. The power of the visual means that we can easily rely on our pattern-matching and sense-making capabilities – in harmony with contextual subject knowledge – to make observations about data that appear to have relevance.

The data representation gallery that you will encounter in Chapter 6 presents nearly 50 different chart types, offering a broad repertoire of options for portraying data. The focus of the collection is on chart types that could be used to communicate to others. However, within this gallery there are also many chart types that help with pursuing EDA. In each chart profile, indications are given for those chart types that may be particularly useful to support your exploratory activity. As a rough estimate, I would say about half of these can prove to be great allies in this stage of discovery.

The visual methods used in EDA do not just involve charting; they also involve selective charting – smart charting, ‘smarting’ if you like? (No, Andy, nobody likes that). Every chart type presented in the gallery includes helpful descriptions that will give you an idea of its role and also what observations – and potential interpretations – it might facilitate. It is important to know now that the chart types are organised across five main families (categorical, hierarchical, relational, temporal, and spatial) depending on the primary focus of your analysis. The focus of your analysis will, in turn, depend on the types of data you have and what you are trying to see.

‘I kick it over into a rough picture as soon as possible. When I can see something then I am able to ask better questions of it – then the what-about-this iterations begin. I try to look at the same data in as many different dimensions as possible. For example, if I have a spreadsheet of bird sighting locations and times, first I like to see where they happen, previewing it in some mapping software. I’ll also look for patterns in the timing of the phenomenon, usually using a pivot table in a spreadsheet. The real magic happens when a pattern reveals itself only when seen in both dimensions at the same time.’ John Nelson, Cartographer, on the value of visually exploring his data

Research: I have raised this already but make no apology for doing so again so soon. How you conduct research and how much you can do will naturally depend on your circumstances, but it is always important to exploit as many different approaches as possible to learning about the domain and the data you are working with. As you will recall, the middle stage of forming understanding – interpreting – is about viewers translating what they have perceived from a display into meaning. They can only do this with domain knowledge. Similarly, when it comes to conducting exploratory analysis using visual methods, you might be able to perceive the charts you make, but without possessing or acquiring sufficient domain knowledge you will not know if what you are seeing is meaningful. Sometimes the only consequence of this exploratory data analysis will be that you have become better acquainted with specific questions and more defined curiosities about a subject, even if you do not yet have any answers.

The approach to research is largely common sense: you explore the places (books, websites) and consult the people (experts, colleagues) that will collectively give you the best chance of getting accurate answers to the questions you have. Good communication skills, therefore, are vital – it is not just about talking to others, it is about listening. If you are in a dialogue with experts you will have to find an approach that allows you to understand potentially complicated matters and also cut through to the most salient matters of interest.

Statistical methods: Although the value of the univariate statistical techniques profiled earlier still applies here, what you are often looking to undertake in EDA is multivariate analysis. This concerns testing for the potential existence of a correlation between quantitative variables as well as determining the possible causation variables – the holy grail of data analysis.

Typically, I find statistical analysis plays more of a supporting role than a leading role during much of the exploration activity. Visual techniques will serve up tangible observations about whether data relationships and quantities seem relevant, but to substantiate this you will need to conduct statistical tests of significance.

One of the main exceptions is when dealing with large datasets. Here the first approach might be more statistical in nature, because the sheer amount of data obstructs rapid visual approaches. Going further, algorithmic approaches using techniques like machine learning might help to scale the task of statistically exploring data with many dimensions, and the endless permutations they offer. What these approaches gain in productivity they clearly lose in human quality. The significance of this should not be underestimated.
It may be possible to take a blended approach where you might utilise machine learning techniques to act as an initial battering ram to help reduce the problem, identifying the major dimensions within the data that might hold certain key statistical attributes, and then conducting further exploration ‘by hand and by eye’.

Nothings: What if you have found nothing? You have hit a dead end, discovering no significant relationships and finding nothing interesting about the shape or distribution of your data. What do you do? In these situations you need to change your mindset: nothing is usually something. Dead ends and blind alleys are good news because they help you develop focus by eliminating different dimensions of possible analysis. If you have traits of nothingness in your data or analysis – gaps, nulls, zeroes and no insights – this could prove to be the insight. As described earlier, make the gaps the focus of your story.

There is always something interesting in your data. If a value has not changed over time, maybe it was supposed to – that is an insight. If everything is the same size, that is the story. If there is no significance in the quantities, categories or spatial relationships, make those your insights. You will only know that these findings are relevant by truly understanding the context of the subject matter. This is why you must make as much effort as possible to convert your unknowns into knowns.

‘My main advice is not to be disheartened. Sometimes the data don’t show what you thought they would, or they aren’t available in a usable or comparable form. But [in my world] sometimes that research still turns up threads a reporter could pursue and turn into a really interesting story – there just might not be a viz in it. Or maybe there’s no story at all. And that’s all okay. At minimum, you’ve still hopefully learned something new in the process about a topic, or a data source (person or database), or a “gotcha” in a particular dataset – lessons that can be applied to another project down the line.’ Alyson Hurt, News Graphics Editor, NPR

Not always needed: It is important to couch this discussion about exploration in pragmatic reality. Not all visualisation challenges will involve much EDA. Your subject and your data might be immediately understandable and you may have a sufficiently broad viewpoint of your subject (plenty of known knowns already in place). Further EDA activity may have diminishing value. Additionally, if you are faced with small tables of data, these simply will not warrant multivariate investigation. You certainly need to be ready and equipped to undertake this type of exploration activity when it is needed, but the key point here is to judge when.
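The statistical-methods point above says that a relationship spotted visually still needs a test of significance before you rely on it. The sketch below is one minimal way to do that for two quantitative variables, assuming plain Python lists of numbers: it computes a Pearson correlation coefficient and then estimates a two-sided p-value with a permutation test. The permutation test is an illustrative choice swapped in here, not a method the text prescribes, and the function names and example data are hypothetical. Note that even a highly significant correlation says nothing on its own about causation.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided permutation test (illustrative choice, not the book's method).

    Shuffling ys breaks any real pairing with xs; the p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical, invented data with a perfectly linear relationship.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
r = pearson_r(xs, ys)
p = permutation_p_value(xs, ys, n_permutations=2000)
print(round(r, 3), p)
```

A permutation test is used here because it makes no distributional assumptions and fits in a few lines of the standard library; in practice you might reach for an established statistics package instead.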

Summary: Working with Data
This chapter first introduced key foundations for the requisite data literacy involved in visualisation, specifically the importance of the distinction between normalised and cross-tabulated datasets as well as the different types of data (using the TNOIR mnemonic):

Textual (qualitative): e.g. ‘Any other comments?’ data submitted in a survey.
Nominal (qualitative): e.g. The ‘gender’ selected by a survey participant.
Ordinal (qualitative): e.g. The response to a survey question, based on a scale of 1 (unhappy) to 5 (very happy).
Interval (quantitative): e.g. The shoe size of a survey participant.
Ratio (quantitative): e.g. The age of a survey participant in years.

You then walked through the four steps involved in working with data:

Acquisition: Different sources and methods for getting your data.

Curated by you: primary data collection, manual collection and data foraging, extracted from PDF, web scraping (also known as web harvesting).
Curated by others: issued to you, downloaded from the Web, system report or export, third-party services, APIs.

Examination: Developing an intimate appreciation of the characteristics of this critical raw material:

Physical properties: type, size, and condition.
Meaning: phenomenon, completeness.

Transformation: Getting your data into shape, ready for its role in your exploratory analysis and visualisation design:

Clean: resolve any data quality issues.
Create: consider new calculations and conversions.
Consolidate: what other data (to expand or append) or other assets could be sought to enhance your project?

Exploration: Using visual and statistical techniques to see the data’s qualities. What insights does it reveal to you as you deepen your familiarity with it?


Tips and Tactics

Perfect data (complete, accurate, up to date, truly representative) is an almost impossible standard to reach (given the presence of time constraints), so your decision will be about when good enough is good enough: when do diminishing returns start to materialise?
Do not underestimate the demands on your time; working with data will always consume a great deal of your attention and effort:
Ensure you have built plenty of time into your handling of this data stage.
Be patient and persevere.
Be disciplined: it is easy to get swallowed up in the potential hunt for discovering things from your data, attempting to explore every possible permutation.
If your data does not already have a unique identifier it is often worth creating one to track your data preparation process. This is especially helpful if you need to preserve or revert to a very specific ordering of your data (e.g. if the rows have been carefully arranged in order to undertake cross-row calculations like cumulative or sub-totals).
Clerical tasks like file management are important: maintain backups of each major iteration of data, employ good file organisation of your data and other assets, and maintain logical naming conventions.
Data management practices around data security and privacy will be important in the more sensitive/confidential cases.
Keep notes about where you have sourced data, what you have done with it, any assumptions or counting rules you have applied, ideas you might have for transforming or consolidating, issues/problems, and things you do not understand.
To learn about your data, its meaning and the subject matter to which it relates, you should build in time to undertake research in order to equip yourself suitably with domain knowledge.
Anticipate and have contingency plans for the worst-case scenarios for data, such as the scarcity of data availability, null values, odd distributions, erroneous values, long values, bad formatting, data loss.
Communicate. If there is anything you do not know about your data, ask: do not assume or stay ignorant. And then listen: always pay attention to key information.
Attention to detail is of paramount importance at this stage, so get into good habits early and do not cut corners.
Maintain an open mind and do not get frustrated. You can only work with what you have. If it is not showing what you expected or hoped for, you cannot force it to say something that is simply not there.
Exploratory Data Analysis is not about design elegance. Do not waste time making your analysis ‘pretty’; it only needs to inform you.
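The tip about creating a unique identifier can be illustrated with a small sketch. It assumes the data is held as a plain Python list of dictionaries; the field names (`row_id`, `month`, `spend`, `running_total`), the helper functions and the figures are all hypothetical. The idea is to stamp each row with its original position, compute a cross-row calculation (a running total) in that order, and then restore the order after any exploratory re-sorting.

```python
from itertools import accumulate


def add_row_ids(rows, key="row_id"):
    """Return copies of the rows, each stamped with a 1-based ID recording its original position."""
    return [{**row, key: i} for i, row in enumerate(rows, start=1)]


def add_running_total(rows, value_field, total_field="running_total"):
    """Cross-row calculation: a cumulative sum that depends on the current row order."""
    totals = accumulate(row[value_field] for row in rows)
    return [{**row, total_field: total} for row, total in zip(rows, totals)]


def restore_order(rows, key="row_id"):
    """Revert to the original ordering after any re-sorting."""
    return sorted(rows, key=lambda row: row[key])


# Hypothetical monthly figures, carefully arranged in calendar order.
monthly = [
    {"month": "Jan", "spend": 120},
    {"month": "Feb", "spend": 80},
    {"month": "Mar", "spend": 150},
]

rows = add_running_total(add_row_ids(monthly), "spend")
by_spend = sorted(rows, key=lambda row: row["spend"], reverse=True)  # an exploratory re-sort
restored = restore_order(by_spend)
print([(r["row_id"], r["month"], r["running_total"]) for r in restored])
# [(1, 'Jan', 120), (2, 'Feb', 200), (3, 'Mar', 350)]
```

Because the identifier travels with each row, re-sorting for exploration never loses the arrangement that the cumulative figures were calculated against.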


5 Establishing Your Editorial Thinking

It would be very easy to introduce every chapter by claiming that each stage is the important one, but you really have now reached a critical juncture. This is the place in the process where you need to start to commit to a definitive pathway.

The data you gathered during Chapter 4 was shaped by your trigger curiosity. You may have found qualities in the data that you feel reveal relevant insights in response to that pursuit. Alternatively, through exploring your data and researching your subject, you may have discovered new enquiries that might actually offer more interesting perspectives.

Ahead of commencing the design and development of your solution you need to decide what you are actually going to do with this data: what are you going to show your audience? This is where editorial thinking becomes important. In my view it is one of the most defining activities that separate the best visualisers from the rest, possibly even more so than technical talent or design flair.

In this chapter you will learn about what editorial thinking means, the role it plays, what decisions you need to make and how you might do so.

5.1 What is Editorial Thinking?
You will have noticed the common thread of curiosity that weaves its way through the preparatory activities of this workflow process. From the opening curiosity that initiated your work, you then effectively sought, gathered and became acquainted with your data in pursuit of some kind of answer. In this third stage, you will need to make some decisions. The essence of editorial thinking is demonstrating a discerning eye for what you are going to portray visually to your audience; the matter of how follows next. This stage is the critical bridge between your data work and your design work.

In the first chapter I described how a single context can hold several legitimate views of the truth. The glass that is half full of water is also half empty. It is also half full of air. Its water contents might be increasing or decreasing. Depending on your perspective, there are several legitimate ways of portraying this situation. In a nutshell, editorial thinking is about deciding which of the many viable perspectives offered by your data you will focus on.

To translate this to data visualisation, assume you have data that breaks down total organisational spend across many geographic regions over time. Your profiling of your audience has already informed your thinking that the main interest is in how this has changed over time. But at this point, having looked at the data closely, you have found some really interesting patterns in the spatial analysis. What are you going to do? Are you going to show your audience how this spend compares by region on a map, having now established that this might be of interest to them, or are you going to stick with showing how it has changed over time by region? Perhaps you could show both. Do you need to show all the regions and include all the available time periods, or just focus on some specific key moments? You have got to decide what you are going to do because you are about to face the task of picking chart types, deciding on a layout, possible interactivity, and many other presentation matters.

When trying to explain the role of editorial thinking I find it helpful to consider some of the parallels that exist between data visualisation and photography, or perhaps more specifically, photojournalism. By translating into data visualisation some of the decisions involved in taking a photograph, you will find useful perspectives to help shape your editorial thinking. In turn this will have a huge bearing on the design choices that follow. There are three particular perspectives to consider: angle, framing and focus.

‘A photo is never an objective reflection, but always an interpretation of reality. I see data visualization as sort of a new photojournalism – a highly editorial activity.’ Moritz Stefaner, Truth & Beauty Operator

Angle
Think of a chart as being a photograph of data. As with a photograph, in visualisation you cannot show everything at once. A panoramic 360° view of data is impossible to display at any moment and certainly not through the window of a single chart. You must pick an angle.

‘When the data has been explored sufficiently, it is time to sit down and reflect – what were the most interesting insights? What surprised me? What were recurring themes and facts throughout all views on the data? In the end, what do we find most important and most interesting? These are the things that will govern which angles and perspectives we want to emphasise in the subsequent project phases.’ Moritz Stefaner, Truth & Beauty Operator

In photography the angle would be formed by the position from where you are standing when taking a shot. In visualisation this relates to the angle of analysis you intend to show: what are you measuring and by which dimension(s) are you breaking it down? Are you going to show how product sales have changed over time, how sales look organised hierarchically by region, or how they compare on a map and over time? There are many different angles you could choose. You could also choose to show data from multiple different angles using several charts presented together. Your key consideration in determining each angle is whether it is relevant and sufficient.

‘It requires the discipline to do your homework, the ability to quiet down your brain and be honest about what is interesting.’ Sarah Slobin, Visual Journalist

Relevant: Why is it worth providing a view of your data from this angle and not another one? Why is this angle of analysis likely to offer the most relevant and compelling window into the subject for your intended audience? Is it still relevant in light of the context of the origin curiosity – that is, have definitions evolved since familiarising yourself with the data, learning about its potential qualities as well as researching the subject at large?

The judgement of relevance would be similar to the notion of newsworthiness in journalism. In that context, terms like timeliness, proximity, novelty, human interest and current prominence are all ingredients that shape what ultimately becomes news content. The ecosystem in which your work is consumed is likely to be much narrower in size and diversity than it is for a newspaper, for example. Issues of human interest and novelty will seldom have a bearing on your judgement of relevance. Therefore, I believe it is realistic to reduce the list of factors that shape your thinking about relevance to three:

What does your intended audience want or need to know? The various characteristics of your audience’s profile, matters discussed in Chapter 1 (accessible design) and Chapter 3 (contextual circumstances), should provide a good sense of this. Sometimes, you can simply ask the members of your intended audience: you might know who they are personally or at least be able to gather information about their needs. On other occasions, with a larger audience, you might need to consider creating personas: a small number of imagined identities that may be demographically representative of the types of viewer you expect to target. Ask yourself, if you were them, what would you want to know?

What makes something relevant in your context? Part of your judgement will be to consider whether relevance is a product of the normal or the exceptional; often the worthiness of an item of news is based on it being exceptional rather than on the repeated reporting of normality. To borrow the famous journalistic aphorism, you need to determine if you are reporting news of ‘dog bites man!’ or ‘man bites dog!’. A lack of relevance is a curse that strikes a lot of visualisation work. What you often see is evidence of data that has been worked up into a visual output just because it is available and just because visual things are appealing. There is almost a scattergun approach in hoping that someone, somewhere will find a connection to justify it as relevant.

What do you want your audience to know? You might have the control to decide. Although you respect the possible expressed needs of your audience, you might actually be better placed to determine what is truly relevant. Depending on the context, and your proximity to the subject and its data, you might have the autonomy to dictate what it is you want to say, more so than what you think the audience want to see. Indeed, that audience may not yet know or be sufficiently domain aware to determine for itself what is relevant or otherwise.

Sufficient: This is about judging how many angles you need. If a chart (generally) offers a single angle into your data, is that sufficiently representative of what you wish to portray? As I said earlier, you cannot show everything in one chart. Maybe you need multiple charts offering a blend of different angles of analysis to sufficiently represent the most interesting dimensions of the subject matter. Perhaps showing a view of your data over time needs to be supplemented by a spatial view to provide the context for any interpretations.

It is easy to find yourself reluctant to commit to a single choice of angle. Even in a small dataset, there are typically multiple possible angles of analysis you could pursue. It is often hard to ignore the temptation to include multiple angles to serve more people’s interests.

It is important not to fall into the trap of thinking that if you throw more and more additional angles of analysis into your work you will automatically enrich that work. Just because you have 100 photographs of your holiday, that does not mean you should show me them all. When I reflect on some of the work I have created down the years, I wish I had demonstrated better selection discipline – a greater conviction to exclude angles – to avoid additional content creeping in just because it was available. I often found it far too easy to see everything as being potentially interesting. And I still do (it’s the curse of the analyst). The real art is to find just enough of those angles that respond to the core essence of your – or your inherited – curiosity.

‘I think this is something I’ve learned from experience rather than advice that was passed on. Less can often be more. In other words, don’t get carried away and try to tell the reader everything there is to know on a subject. Know what it is that you want to show the reader and don’t stray from that. I often find myself asking others “do we need to show this?” or “is this really necessary?” Let’s take it out.’ Simon Scarr, Deputy Head of Graphics, Thomson Reuters

Framing
The next perspective to define about your editorial thinking contributes to the refinement of the angles you have selected. This concerns framing decisions. In photographic parlance this relates to choices about the field of view: what will be included inside the frame of the photograph and what will be left out?


Just like a photographer, a visualiser must demonstrate careful judgement about what to show, what not to show, and how to show it. This is effectively a filtering decision concerned with which data to include and exclude:

All category values, or just a select few?
All quantitative values, or just those over a certain threshold?
All data, or just those between a defined start and end date period?
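The three framing questions above are, in practice, filters applied before charting. The sketch below shows one minimal way to express them, assuming each record is a Python dictionary; the field names (`region`, `spend`, `date`) echo the regional-spend example earlier in this chapter but are hypothetical, as are the `frame` function and the invented figures. Each criterion is optional, so leaving it as `None` keeps everything for that dimension.

```python
from datetime import date


def frame(rows, categories=None, threshold=None, start=None, end=None):
    """Keep only the rows inside the chosen frame (hypothetical helper).

    Passing None for a criterion means 'include everything' for that dimension.
    """
    kept = []
    for row in rows:
        if categories is not None and row["region"] not in categories:
            continue  # category filter: just a select few values
        if threshold is not None and row["spend"] <= threshold:
            continue  # quantitative filter: only values over the threshold
        if start is not None and row["date"] < start:
            continue  # time filter: drop anything before the start date
        if end is not None and row["date"] > end:
            continue  # time filter: drop anything after the end date
        kept.append(row)
    return kept


# Invented figures for illustration only.
spend = [
    {"region": "North", "spend": 120, "date": date(2014, 3, 1)},
    {"region": "South", "spend": 45, "date": date(2014, 6, 1)},
    {"region": "North", "spend": 80, "date": date(2015, 1, 1)},
    {"region": "East", "spend": 200, "date": date(2015, 9, 1)},
]

framed = frame(spend, categories={"North", "East"}, threshold=100,
               start=date(2014, 1, 1), end=date(2015, 12, 31))
print([(r["region"], r["spend"]) for r in framed])
# [('North', 120), ('East', 200)]
```

Keeping each filter explicit and optional makes it easy to loosen or tighten the frame while you judge the balance between removing clutter and hiding context.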

Naturally, the type and extent of the framing you might need to apply will be influenced by the nature of your trigger curiosity, as well as factors like the complexity of the subject matter and the amount of data available to show. Further considerations like the setting (need rapid insights or OK for deeper, more prolonged engagement?) and output format will also have a bearing on this matter.

One of the key motives of framing is to remove unnecessary clutter – there is only so much that can be accommodated in a single view before it becomes too busy, too detailed, and too small in resolution. There is only so much content your audience will likely be willing and able to process. Inevitably, a balance must be struck to find the most representative view of your content. If you zoom in, filtering away too much of the content, it might hide the important context required for perceiving values. Conversely, if you avoid filtering your content you may fail to make visible the most salient discoveries.

Focus
The third component of editorial thinking concerns what you might choose to focus on. This is not a function of filtering – that is the concern of framing – it is about emphasising what is more important in contrast to what is less important.

The best photographs are able to balance light and colour, not just setting the mood of a situation but illuminating key elements within the frame that help to create depth. They provide a sense of visual hierarchy through their depth as well as the sizing and arrangement of each form.

What needs to be brought into view in the foreground, left in the mid-ground, and maybe relegated to the background simply for context or orientation? What needs to be bigger and more prominent and what can be less so?

Whereas framing judgements were about reducing clutter, this is about reducing noise. If everything in a visualisation is shouting, nothing is heard; if everything is in the foreground, nothing stands out; if everything is large, nothing is dominant.

Decisions about focus primarily concern the development of explanatory visualisations, because creating such a focus – surfacing insights through the astute use of colour or annotated accentuation – is a key purpose for that type of experience. Beyond colour, focus can be achieved through composition choices such as the way elements are more prominently sized and located or the way contents are positioned within a view.

5.2 The Influence of Editorial Thinking
It is important to ground this discussion by explaining practically how these editorial perspectives will apply to your workflow process and, in particular, influence your design thinking.

I described a chart as being like a photograph of the data, displaying a visual answer to a data-driven curiosity. Determining the choice of chart (technically, ‘data representation’) is just one part of the overall anatomy of a data visualisation. There are choices to be made about four other design layers, namely features of interactivity, annotation, colour and composition.

Your decisions across this visualisation design anatomy are influenced, in a large way, by the editorial definitions you will make about angle, framing and focus. They might not lead directly or solely to the final choices – there are many other factors to consider, as you have seen – but they will signpost the type of editorial qualities the visualisation will need to accommodate. Let’s look at two illustrations of the connection between editorial and design thinking to explain this.

Example 1: The Fall and Rise of US Inequality
The first example (Figure 5.1) is a chart taken from an article published in the ‘Planet Money: The Economy Explained’ section of the US-based National Public Radio (NPR) website. The article is titled ‘The Fall and Rise of U.S. Inequality in 2 Graphs’. As the title suggests, the full article includes two charts, but I just want to focus on the second one for the purpose of this illustration.

Figure 5.1 The Fall and Rise of U.S. Inequality, in Two Graphs


Editorial Perspectives

Let’s assess the editorial perspectives of angle, framing and focus as demonstrated by this work.

Angle: The main angle of analysis can be expressed as: ‘What is the relationship between two quantitative measures (average income for the bottom 90% and for the top 1% of earners) and how has this changed over time (year)?’. This angle would be considered relevant because the relationship between the haves and the have-nots is a key indicator of wealth distribution. It is a topical and suitable choice of analysis to include with any discussion about inequality in the USA. As I mentioned, there is a second chart presented, so it would be reasonable to say that the two sufficiently cover the necessary angles to support the article.

Framing: The parameters that define the inclusion and exclusion of data in the displayed analysis involve filters for time period (1917 to 2012) and country (just the USA). The starting point of 1917 may reflect a simple arbitrary cut-off point or a significant milestone in the narrative; more likely, it represents the earliest available data. One always has a basic desire for every chart to include the most up-to-date view of data. While it only reaches as far forward in time as 2012 (despite publication in 2015), the analysis is of such historical depth that it should be considered suitably representative of the subject matter. To focus just on the USA is entirely understandable.

Focus: The visualisation includes a ‘time slider’ control that allows users to move the focus incrementally through each year, colouring each consecutive yearly marker for emphasis. The colours are organised into three classifications to draw particular attention to two main periods of noticeably different relationships between the two quantitative measures.

Influence on Design Choices

How do these identified editorial perspectives translate directly into design thinking? As you will learn in Chapters 6–10, any visualisation comprises five layers of design. Let’s have a look at how each might be influenced by editorial thinking.

Data representation: The angle is what fundamentally shapes the data representation approach. In lay terms, it determines which chart type is used. In this example, the defined angle is to show the relationship between two quantitative measures over time (average income for the bottom 90% vs. the top 1% of earners). A suitable chart type to portray this visually is the scatter plot (as selected). As you will learn in the next chapter, the scatter plot belongs to the ‘relational’ family of chart types. Given that the angle also expresses a dimension of time, a chart type from the ‘temporal’ family could have been used, but as the main emphasis is on showing the relationship, the scatter plot was the better choice. The framing perspective defines what data will be included in the chosen chart: only data for the USA and the time period 1917–2012 is displayed.

Interactivity: As you will discover in Chapter 7, the role of interactivity is to enable adjustments to what data is displayed and how it is displayed. The sole feature of interactivity in this project is offered through the ‘time slider’ control, which sequences the unveiling of the data points year by year in either a manual or automated fashion. The inclusion of such interactivity can be influenced by editorial decisions concerning focus: unveiling the yearly values in sequence places emphasis on the position – and emerging pattern – of each consecutive value.

Annotation: The primary chart annotations on show here are the two arrows and associated captions, drawing attention to the two prominent patterns that support the general fall and then rise of inequality. Again, the inclusion of the captions would be a consequence of editorial thinking (focus) determining that these respective patterns in the data should be emphasised to the viewer.

Colour: As you will learn in Chapter 9, one of the key applications of colour is to support editorial salience – how to emphasise content and direct the eye. As before, editorial focus would influence the decision to deploy four colour states within the chart: a default colour to show all points at the start of the animation and then three different emerging colours to separate the three clustered groups visually. Note that the final choices of red, green and orange tones are not directly informed by editorial thinking; what editorial thinking drives is the identified value of using four different colour states to draw out the focus.

Composition: This concerns all of the physical layout, shape and size decisions. In this example, the dimensions of editorial thinking have had limited influence over the composition choices. However, recognising again that there are two charts in the full article, the focus perspective would likely have informed the decision about the ordering of the charts: which made better sense to go first or last, and why?
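The time slider unveils points year by year, and that behaviour can be modelled as a sequence of frames. The sketch below is minimal and assumes the points arrive as hypothetical `(year, x, y)` tuples. It only computes what each frame should show and leaves the drawing to whatever charting tool is in use.

```python
def slider_frames(points):
    """Yield (year, revealed) pairs, one per time-slider position.

    `points` is an iterable of (year, x, y) tuples. Each frame reveals
    every point up to and including the current year; the last point in
    `revealed` is the newest one, i.e. the point to emphasise.
    """
    ordered = sorted(points)  # tuples sort by year first
    for i, (year, _x, _y) in enumerate(ordered):
        yield year, ordered[: i + 1]


frames = list(slider_frames([(1918, 2.0, 3.0), (1917, 1.0, 1.5)]))
for year, revealed in frames:
    print(year, len(revealed))
# 1917 1
# 1918 2
```

An automated mode could step through these frames on a timer, and a manual mode could index into them from the slider position.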

Example 2: Why Peyton Manning’s Record Will Be Hard to Beat


In this second example, published in ‘The Upshot’ section of the New York Times website, there are three charts presented in an article titled ‘Why Peyton Manning’s Record Will Be Hard to Beat’. Here I will look at all three charts.

Editorial Perspectives

Again, let’s assess the editorial perspectives of angle, framing and focus as demonstrated by this work.

Figure 5.2 Why Peyton Manning’s Record Will Be Hard to Beat

Angle: The first chart (Figure 5.2) displays the angle of analysis expressed as ‘How have quantitative values (NFL touchdown passes) broken down by category (quarterbacks) changed over time (year)?’. This analysis was relevant at the time due to the significance of Peyton Manning setting a new record for NFL quarterback touchdown passes, an historic moment and, according to the article, ‘evidence of how much the passing game has advanced through the history of the game’. Inspired by this achievement, the question posed by the article overall is whether the record will ever be bettered, which was likely the original curiosity that drove the visualisation project in the first place. The article was timely because the record had just been achieved. On its own, this analysis would be deemed insufficient to support the overarching enquiry, as evidenced by the inclusion of two further charts that we will look at shortly.

Framing: The parameters that define the inclusion and exclusion of data relate to the time period (1930 to 19 October 2014) and a qualifying quantitative threshold (minimum of 30 touchdown passes). The analysis is representative of the truth at the moment of production (i.e. up to 19 October 2014), though clearly the data would no longer be up to date as soon as the next round of games took place. The choice of the 30 touchdown passes threshold would either be informed by knowledge of the sport (30 TDs being a common measure) or, more likely, influenced by the shape of the data for every quarterback, indicating that it was a logical cut-off value.

Focus: The chart emphasises the record holder as well as the other current players, in order to convey the significance of the achievement and to highlight contemporary players who could have a chance of pursuing this record. It also emphasises previous record holders or noted players to show just how special the new record is. If you want to know the achievements of any other player, their career ‘lines’ and values come into focus through mouseover-driven interactivity.
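Both framing parameters for this chart can be expressed as filters. One subtlety: the touchdown threshold applies to an aggregate, not to individual rows. The sketch below makes two assumptions the text does not confirm:

- The 30-pass minimum applies to a player's total up to the cut-off date.
- The data arrives as rows of `(player, game_date, touchdowns)`, a hypothetical layout.

A 1930 start-of-period filter would be added in the same way as the cut-off check.

```python
from collections import defaultdict
from datetime import date


def qualifying_players(rows, cutoff=date(2014, 10, 19), threshold=30):
    """Return, sorted, the players whose total touchdown passes reach `threshold`.

    `rows` holds (player, game_date, touchdowns) tuples. Games after the
    cut-off date are ignored, the remaining touchdowns are summed per
    player, and only then is the threshold applied to each total.
    """
    totals = defaultdict(int)
    for player, game_date, touchdowns in rows:
        if game_date <= cutoff:  # time frame: nothing after production
            totals[player] += touchdowns
    return sorted(p for p, total in totals.items() if total >= threshold)


# Hypothetical rows for illustration only.
rows = [
    ("A", date(2014, 10, 19), 30),  # on the cut-off date: counted
    ("B", date(2014, 10, 20), 30),  # after the cut-off: ignored
    ("B", date(2014, 1, 1), 29),    # B's counted total stays below 30
    ("C", date(2000, 1, 1), 31),
]
print(qualifying_players(rows))  # ['A', 'C']
```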

Figure 5.3 Why Peyton Manning’s Record Will Be Hard to Beat

In the second chart (Figure 5.3), the same definitions stand for the angle and framing, but the focus has changed. This chart shows the same angle of analysis as the first chart but is now composed of several small repeated charts, each one focusing on the career trajectory of a selected previous record holder.

Focus: Colour is used to emphasise the previous record-holding players’ career lines, with an illuminating background banding used to display the duration (era) of their record standing. Value labels show the number of touchdowns achieved.

The final chart (Figure 5.4) has many similarities with the first chart. Once again it maintains the same definition for framing and it has the same focus as the first chart, but now there is a subtle difference in angle.

Angle: This is now expressed as: ‘How have cumulative quantitative values (NFL touchdown passes) broken down by category (quarterbacks) changed over time (age)?’. The difference is that the time measure is age, not year. This is relevant as it provides an alternative view of time, switching year for age to continue pursuing the curiosity over how long Manning’s record might last. More specifically, it enquires whether ‘the quarterback who will surpass Manning’s record is playing today’. Incidentally, as the article concludes, it is going to be a very difficult record to beat.

Figure 5.4 Why Peyton Manning’s Record Will Be Hard to Beat

Influence on Design Choices

Now, let’s switch the viewpoint again and look at how this visualisation’s design choices are directly informed by the editorial thinking.

Data representation: As I have stated, the angle and framing dimensions are hugely influential in determining chart type requirements. In each of the charts we are shown different perspectives on the central theme of how touchdown passes have changed over time for each qualifying quarterback. A line chart showing cumulative values for all the players was the most appropriate way of portraying this. Naturally, the line chart belongs to the ‘temporal’ family of chart types. Alternative angles of analysis may have explored the relationship between the measures of age and total touchdown passes. A scatter plot would have been ideal to display that angle, but the inclusion of the cumulative touchdown passes statistic, as portrayed using the line, made for a much more striking display of the trajectories.

Interactivity: The only feature of interaction deemed necessary here is a mouseover event in the first and third charts that reveals the names and total passes for any of the players who are presented as grey lines. This serves the interests of viewers who want to identify these background data values for ‘everyone else’. Introducing value labels only through interactivity also means the busyness of labelling all values by default is elegantly – and wisely – avoided.

Annotation: This interactive labelling is a joint decision concerned with annotation. Elsewhere, the decision to include permanent labels in each chart for category (player) and value (touchdown passes) provides emphasis in the first and third charts on the career achievements of Peyton Manning, the other current quarterbacks, and previous record holders. The second chart only labels the respective record holders who are the subject of each separate display.

Colour: Focus is further created through colour. In the main chart, emphasis is again drawn to Peyton Manning’s line as the record holder (thick blue line), the other current players (highlighted with a blue line), and previous record holders or noted players (dark grey line).
For the second chart, the light-blue banding draws out the period of the records held by selected players down the years. This really helps the viewer to perceive the duration of their records.

Composition: The further influence of the editorial decisions for focus would be seen through the sequencing of the charts in the article. Given the rigid dimensions of space in which the article exists, the decision to order the charts as presented will have been informed by the narrative required to present analysis supporting the statement articulated in the title.
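The cumulative lines in the first and third charts rest on a simple transformation: a running total per player, indexed by season or, for the third chart, by age. The sketch below is minimal and rests on two assumptions:

- Each player's data arrives as hypothetical per-season `(season_year, touchdowns)` rows.
- Age is approximated as the season year minus the birth year. A season can straddle a birthday, so real data would need more care.

```python
from itertools import accumulate


def career_line(seasons, birth_year):
    """Turn (season_year, touchdowns) rows into (age, cumulative_total) points.

    Seasons are sorted first so the running total accumulates in
    chronological order; age is approximated as season_year - birth_year.
    """
    ordered = sorted(seasons)
    ages = [year - birth_year for year, _ in ordered]
    running = accumulate(td for _, td in ordered)
    return list(zip(ages, running))


# Hypothetical seasons for illustration only.
print(career_line([(2001, 10), (2000, 5), (2002, 0)], birth_year=1976))
# [(24, 5), (25, 15), (26, 15)]
```

Replacing `year - birth_year` with `year` gives the x-values for the first chart. The cumulative y-values stay the same, which is why the two charts can share the same framing and focus.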

A closing point to make here is that the influence of editorial thinking does not just flow forwards into the design stages. Although presented as separate, consecutive stages, ‘working with data’ and ‘editorial thinking’ are strongly related and quite iterative: working with data influences your editorial perspectives, and your editorial perspectives in turn may influence activities around working with data. In the earlier stages of your development it is useful to create this sequential distinction between activities, but in reality there will be much toing and froing. The data transformation activity, in particular, is essentially the key wormhole that links these two stages. Editorial definitions may trigger the need for more data to be gathered about the specific subject matter, or for some consolidation in detail to support the desired angles of analysis and the framing dimensions. The acquisition of new data will then always trigger a need to repeat the data examination activity. Editorial definitions might also prompt further calculations, groupings or general modifications to refine the data’s preparedness for displaying the analysis.

Summary: Establishing Your Editorial Thinking

In this chapter you learnt about the three perspectives that underpin your editorial thinking.

Angle

Must be relevant in its potential interest for your audience.
Must have sufficient quantities to cover all relevant views – but no more than required.

Framing

Applying filters to your data to determine the inclusion and exclusion criteria.
Framing decisions must provide access to the most salient content but also avoid distorting the view of the data.


Focus

Which features of the display to draw particular attention to?
How to organise the visibility and hierarchy of the content?

Tips and Tactics

Data shapes the story, not the other way round: maintain this discipline throughout your work.
If your data is especially riddled with gaps, perhaps consider making this the story, inverting attention towards the potential consequence, cause and meaning behind these gaps.
There is always something interesting in your data: you just might not be equipped with sufficient domain knowledge to know this, or it may not be currently relevant. Get to know the difference between relevant and irrelevant by researching and learning more about your subject.
Communication: ask people better placed than you, who might have the subject knowledge, about what is truly interesting and relevant.
A good title will often express the main curiosity or angle of analysis from the outset, giving viewers a clear idea about what the visualisation that follows will aim to answer or reveal.


Part C Developing Your Design Solution

The Production Cycle

Within the four stages of the design workflow there are two distinct parts. The first three stages, as presented in Part B of this book, were described as ‘The Hidden Thinking’ stages, as they are concerned with undertaking the crucial behind-the-scenes preparatory work. You may have completed them in terms of working through the book’s contents, but in visualisation projects they will continue to command your attention, even if only as a background concern.

You have now reached the second distinct part of the workflow, which involves developing your design solution. This stage follows a production cycle, commencing with rationalising design ideas and moving through to the development of a final solution.

The term cycle is appropriate to describe this stage as there are many loops of iteration as you move rapidly between conceptual, practical and technical thinking. The inevitability of this iterative cycle is, in large part, again due to the nature of this pursuit being more about optimisation than about achieving that elusive notion of perfection. Trade-offs, compromises and restrictions are omnipresent as you juggle ambition and necessary pragmatism.

How you undertake this stage will differ considerably depending on the nature of your task. The creation of a relatively simple, single chart to be slotted into a report probably will not require the rigour of a formal production cycle that the development of a vast interactive visualisation for public use would demand. This is merely an outline of the most you will need to do – you should edit, adapt and prioritise the steps to fit your context.

There are several discrete steps involved in this production cycle:

Conceiving ideas across the five layers of visualisation design.
Wireframing and storyboarding designs.
Developing prototypes or mock-up versions.
Testing.
Refining and completing.
Launching the solution.

Naturally, the specific approach for developing your design solution (from prototyping through to launching) will vary hugely, depending particularly on your skills and resources: it might be an Excel chart, a Tableau dashboard, an infographic created using Adobe Illustrator, or a web-based interactive built with the D3.js library. As I explained in the book’s introduction, I’m not going to attempt to cover the myriad ways of implementing a solution; that would be impossible to achieve, as each task and tool would require different instructions.

For the scope of this book, I am focusing on taking you through the first two steps of this cycle – conceiving ideas and wireframing/storyboarding. There are parallels here with the distinction between architecture (design) and engineering (execution) – I’m effectively chaperoning you through to the conclusion of your design thinking.

To fulfil this, Part C presents a detailed breakdown of the many design options you will face when conceiving your visualisation design and provides you with an appreciation of the key factors that will influence the actual choices you make. The next few chapters are therefore concerned with the design thinking involved in each of the five layers of the visualisation design anatomy, namely:

Chapter 6: Data representation
Chapter 7: Interactivity
Chapter 8: Annotation
Chapter 9: Colour
Chapter 10: Composition

The sequencing of these layers is deliberate, based on the need to prioritise your attention: what will be included and how will it appear. Initially, you will need to make decisions about data representation (charts), interactivity and annotation. These are the layers that result in visible design content or features being included in your work. You will then complete your design thinking by making decisions about the appearance of these visible components, considering their colour and composition.


Conceiving: This will cover all your initial thinking across the various layers of design covered in the next few chapters. The focus here is on conceiving ideas based on the design options that seem to fit best with the preparatory thinking done during the first three stages. As you fine-tune your emerging design choices, the benefit of sketching re-emerges, helping you articulate your thoughts in a rough visual form. As mentioned in Chapter 3, for some people the best approach involves sketching with a pen; for others it is best expressed through the medium of technical fluency. Whichever approach suits you best, it is helpful to start translating your conceptual thinking into visual thinking, particularly when collaborating. This sketching might build on your instinctive sketched concepts from stage 1, but you should now be far better informed about the realities of your challenge to determine what is relevant and feasible.

‘I tend to keep referring back to the original brief (even if it’s a brief I’ve made myself) to keep checking that the concepts I’m creating tick all the right boxes. Or sometimes I get excited about an idea but if I talk about it to friends and it’s hard to describe effectively then I know that the concept isn’t clear enough. Sometimes just sleeping on it is all it takes to separate the good from the bad! Having an established workflow is important to me, as it helps me cover all the bases of a project, and feel confident that my concept has a sound logic.’ Stefanie Posavec, Information Designer

Wireframing and storyboarding: Wireframing involves creating a low-fidelity illustration of the potential layout for those solutions that will generally occupy a single page of space, such as a simple interactive visualisation or an infographic. There is no need to be too precise just yet; you are simply mapping out what will be on your page/screen (charts, annotations), how they will be arranged and what things (interactive functions) it will do. If your project is going to require a deeper architecture, like a complex interactive, or will comprise sequenced views, like presentations, reports or animated graphics, each individual wireframe view will be woven together using a technique called storyboarding. This maps out the relationships between all the views of your content to form an overall visual structure. Sometimes you might approach things the other way round, beginning with a high-level storyboard to provide a skeleton structure within which you can then form your more detailed thinking about the specific wireframe layouts within each page or view.


Prototypes/mock-ups: Whereas wireframing and storyboarding are characterised by the creation of low-fi ‘blueprints’, the development of mock-ups (for example, Figure C.1) or prototypes (the terms tend to be used interchangeably) involves advancing your decisions about the content and appearance of your proposed solution. This effectively leads to the development of a first working version that offers a reasonably close representation of what the finished product might look like.

Figure C.1 Mockup designs for ‘Poppy Field’

Testing: Once you have an established prototype version, you must then seek to have it tested. Firstly, you do this ‘internally’ (i.e. by you or by collaborators/colleagues) to help iron out any obvious immediate problems. In software development parlance, this would generally be consistent with alpha testing. Naturally, beta follows alpha, and this is where you will seek others to test it, evaluate it, and give feedback on it. This happens regardless of the output format; a project doesn’t need to be digital and interactive to merit being tested. There will naturally be many different aspects of your proposed solution that need checking and evaluating. The three principles of good visualisation design that I presented earlier offer a sensible high-level structure to guide this testing:

Trustworthy design testing concerns assessing the reliability of the work, in terms of the integrity of its content and performance. Are there any inaccuracies, mistakes or even deceptions? Are there any design choices that could lead to misunderstandings? Are there any aspects of how the data has been calculated or counted that could undermine trust? If it is a digital solution, what is the speed of loading and are there any technical bugs or errors? Is it suitably responsive and adaptable in its use across different platforms? Try out various user scenarios: multiple and concurrent users, real-time data, all data vs sample data, etc. Ask the people testing your solution to try to break it so you can find and resolve any problems now.

Accessible design testing relates to how intuitive or sufficiently well explained the work is. Do testers understand how to read it and what all the encodings mean? Is the viewer provided with the level of assistance required by the characteristics of the intended audience? Can testers find the answers to the questions you intended them to find, and quickly enough? Can they find answers to the questions they think are most relevant?

Elegant design testing relates to questions such as: Is the solution suitably appealing in design? Are there any redundant or superfluous design choices that are impeding the process of using the solution?

Who you invite to test your work will vary considerably from one project to the next, but generally there are several different people you might consider involving in this task:

Stakeholders: the ultimate customers/clients/colleagues who have commissioned the work may need to be included in this stage, if not for full testing then at least to engage them in providing initial concept feedback.
Recipients: you might choose a small sample of your target audience and invite those viewers to take part in initial beta testing.
Critical friends: peers/team/colleagues with suitable knowledge and appreciation of the design process may offer a more sophisticated capacity to test out your work.
You: sometimes (often) it may ultimately be down to you to undertake the testing, through either lack of access to other people or, most typically, a simple lack of time. To accomplish this effectively you have to find a way almost to detach yourself from the mindset of the creator and occupy that of the viewer: you need to see the wood and the trees.

The timing of when to seek feedback through testing/evaluation will again vary across different contexts. Sometimes the pressure from stakeholders who request to see progress will determine this. Otherwise, you will need to judge carefully the right moment to do so. You don’t want to get feedback when it is too late to make changes, or after you have invested too much effort creating a prototype that might require widespread changes in approach. Likewise, it can be risky showing far-too-undercooked concepts to stakeholders or testers when they might not have the capacity to realise this is just an early indication of the direction of travel. The least valuable form of testing feedback is when pedantic stakeholders spend time pointing out minutiae that of course need correcting but have no significance at this stage. No-one comes away with anything of value from this kind of situation.

‘We can kid ourselves that we are successful in what we “want” to achieve, but ultimately an external and critical audience is essential. Feedback comes in many forms; I seek it, listen to it, sniff it, touch it, taste it and respond.’ Kate McLean, Smellscape Mapper and Senior Lecturer Graphic Design

Refining and completing: The outcome of your testing process will likely trigger a need to revisit some of the issues that have emerged and resolve them satisfactorily. Editing your work involves:

correcting issues;
stripping away the superfluous content;
checking and enhancing preserved content;
adding extra degrees of sophistication to every layer of your design;
improving the consistency and cohesion of your choices;
double-checking the accuracy of every component.

As your work heads towards a state of completion, your mindset will need to shift from micro-level checking back to a macro-level assessment of whether you have truly delivered against the contextual requirements and purpose of your project.

In any creative process a visualiser eventually has to declare work complete. Judging this can be quite a tough call to make in many projects. As I have discussed plenty of times, your sense of ‘finished’ often needs to be based on when you have reached the status of good enough. While the presence of a looming deadline (and at times increasingly agitated stakeholders) will sharpen the focus, often it comes down to a fingertip sense of when you feel you are entering the period of diminishing returns, when the refinements you make no longer add sufficient value for the amount of effort you invest in making them.

‘You know you’ve achieved perfection in design, not when you have nothing more to add, but when you have nothing more to take away.’ Antoine de Saint-Exupéry, Writer, Poet, Aristocrat, Journalist, and Pioneering Aviator

‘Admit that nothing you create on a deadline will be perfect. However, it should never be wrong. I try to work by a motto my editor likes to say: No Heroics. Your code may not be beautiful, but if it works, it’s good enough. A visualisation may not have every feature you could possibly want, but if it gets the message across and is useful to people, it’s good enough. Being “good enough” is not an insult in journalism – it’s a necessity.’ Lena Groeger, Science Journalist, Designer and Developer at ProPublica

‘It was intimidating to release to the public a self-initiated project on such a delicate subject considering some limitation with content and data source. But I came to appreciate that it’s OK to offer a relevant way of looking at the subject, rather than provide a beginning-to-end conclusion.’ Valentina D’Efilippo, Information Designer, discussing her ‘Poppy Field’ project that looked at the history of world conflicts and the resulting loss of life

Launching: The nature of launching work will again vary significantly based, as always, on the context of your challenge. It may simply be emailing a chart to a colleague, or you might be presenting your work to an audience. In other cases it could be a graphic going to print for a newspaper, or an anxious go-live moment with the launch of a digital project on a website, to much fanfare and public anticipation. Whatever the context of your ‘launch’ stage, there are a few characteristic matters to bear in mind – these will not be relevant to all situations, but over time you might need to consider their implications for your setting:

Are you ready? Regardless of the scope of your work, as soon as you declare work completed and published you are at the mercy of your decisions. You are no longer in control of how people will interpret your work and in what way they will truly use it. If you have particularly large, diverse and potentially emotive subject matter, you will need to be ready for the questions and scrutiny that might head in your direction.

Communicating your work is a big deal. The need to publicise and sell its benefits is of particular relevance if you have a public-facing project (you might promote it strongly or leave it as a slow burner that spreads through ‘word of mouth’). For more modest and personal audiences you might need to consider directly presenting your work to these groups, coaching them through what it offers. This is particularly necessary on those occasions when you may be using a less than familiar representation approach.

What ongoing commitment exists to support the work? This applies mainly to digital projects. Do you have to maintain a live data feed? Will it need to sustain operations with variable concurrent visitors? What happens if it goes viral – have you got the necessary infrastructure? Have you got the luxury of ongoing access to the skill sets required to keep this project alive and thriving?

Will you need to revise, update and rerelease the project? As I discussed under contextual circumstances, will you need to replicate this work on a repeated basis? What can you do to make the reproduction as seamless as possible?

What is the work’s likely shelf life? Does it have a point of expiry after which it could be archived or even killed? How might you digitally preserve it beyond its useful lifespan?


6 Data Representation

In this chapter you will explore in detail the first, and arguably the most significant, layer of the visualisation design anatomy: data representation. This is concerned with deciding in what visual form you wish to show your data.

To really get under the skin of data representation, we are going to look at it from both theoretical and pragmatic perspectives. You will start by learning about the building blocks of visual encoding, the real essence of this discipline and something that underpins all data representation thinking. Whereas visual encoding is perhaps seen as the purist ‘bottom-up’ viewpoint, the ‘top-down’ perspective possibly offers more pragmatic value by framing your data representation thinking around the notion of chart types. For most people facing up to this stage of data representation, this is conceptually the more practical entry point from which to shape their decisions.

To substantiate your understanding of this design layer, you will take a tour through a gallery of 49 different chart type options, reflecting the many common and useful techniques being used to portray data visually in the field today. This gallery will then be supplemented by an overview of the key influencing factors that will inform and determine the choices you make.

6.1 Introducing Visual Encoding

As introduced in the opening chapter, data representation is the act of giving visual form to your data. As viewers, when we perceive a visual display of data we are decoding the various shapes, sizes, positions and colours to form an understanding of the quantitative and categorical values represented. As visualisers, we do the reverse through visual encoding, assigning visual properties to data values. Visual encoding forms the basis of any chart or map-based data representation, along with the components of chart apparatus that help complete the chart display.

There are many different ways of encoding data, but these always comprise combinations of two different properties, namely marks and attributes. Marks are visible features like dots, lines and areas. An individual mark can represent a record or instance of data (e.g. your phone bill for a given month). A mark can also represent an aggregation of records or instances (e.g. a summation of individual phone charges to produce the bill for a given month). A set of marks would therefore represent a set of records or instances (e.g. the 12 monthly phone bills for 2015).
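The difference between one mark per record and one mark per aggregation is easy to see in code. This sketch assumes hypothetical `(month, amount)` charge records. Each key in the result would become one mark, so a year of individual charges collapses into at most twelve marks.

```python
from collections import defaultdict


def monthly_bills(charges):
    """Aggregate individual phone charges into one bill total per month.

    `charges` is an iterable of (month, amount) pairs. Each entry of the
    returned dict is an aggregation of records that one mark would represent.
    """
    totals = defaultdict(float)
    for month, amount in charges:
        totals[month] += amount
    return dict(sorted(totals.items()))


# Hypothetical charges for illustration only.
print(monthly_bills([(1, 1.5), (2, 3.0), (1, 2.5)]))  # {1: 4.0, 2: 3.0}
```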

Attributes are variations applied to the appearance of marks, such as size, position or colour. They are used to represent the values held by different quantitative or categorical variables against each record or instance (or, indeed, each aggregation). If you had 12 marks, one for each phone bill during 2015, you could use the size attribute of each mark to represent the various phone bill totals.
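Applying the size attribute to circular marks raises one practical subtlety. A common convention is to make the circle's area, not its radius, proportional to the value. The sketch below assumes 'size' means circle area, which the text leaves open, and scales the radius by the square root of the value. `max_radius` is a hypothetical drawing parameter.

```python
import math


def radius_for(value, max_value, max_radius):
    """Radius of a circle whose area is proportional to `value`.

    Area grows with the square of the radius, so the radius scales with
    the square root of value / max_value. Scaling the radius linearly
    instead would make a value twice as large look four times as big.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("expected value >= 0 and max_value > 0")
    return max_radius * math.sqrt(value / max_value)


# A bill a quarter the size of the largest gets half the radius,
# and therefore a quarter of the area.
print(radius_for(25.0, 100.0, max_radius=20.0))  # 10.0
```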

Figure 6.1 offers a more visual illustration. In the dataset there are six records, one for each actor listed. ‘Gender’ is a categorical variable and ‘Years Since First Movie’ is a quantitative variable. ‘Male’ and ‘43’ are the specific values of these variables associated with Harrison Ford. In the associated chart, each actor from the table is represented by the mark of a line (or bar), which represents their record or instance in the table. Harrison Ford’s bar is proportionally sized in scale to represent the 43 years since his first movie and is coloured purple to distinguish his gender as ‘Male’. Each of the five other actors similarly has a bar sized according to the years since their first movie and coloured according to their gender.
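The Figure 6.1 encoding can be written as an explicit mapping from each record to a bar mark with two attributes. In this sketch only Harrison Ford's values (‘Male’, 43, purple) come from the text. The following are all hypothetical placeholders:

- the second actor and their values;
- the non-purple colour;
- the `Bar` type;
- the `max_length` parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    label: str
    length: float  # quantitative attribute: size (length) of the bar
    colour: str    # categorical attribute: colour hue


def encode_bars(records, max_length, palette):
    """Map each (name, gender, years) record to one bar mark.

    Length is scaled linearly from zero, so bar lengths stay proportional
    to 'Years Since First Movie'; colour is looked up in `palette` using
    the categorical 'Gender' value.
    """
    longest = max(years for _, _, years in records)
    return [
        Bar(name, max_length * years / longest, palette[gender])
        for name, gender, years in records
    ]


bars = encode_bars(
    [("Harrison Ford", "Male", 43), ("Actor B", "Female", 10)],  # Actor B is hypothetical
    max_length=86,
    palette={"Male": "purple", "Female": "placeholder-colour"},
)
for bar in bars:
    print(bar)
```

Each bar carries two attributes at once, length and colour, so every mark encodes both a quantitative and a categorical value.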

Figure 6.1 Illustration of Visual Encoding
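The mapping in Figure 6.1 can be expressed as a small Python sketch: each record becomes one bar mark, its length carries the quantitative variable and its colour carries the categorical one. Only Harrison Ford’s values come from the figure. ‘Actor X’, the pixel scale and the colour used for ‘Female’ are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Record:
    name: str
    gender: str          # categorical variable
    years: int           # quantitative variable: years since first movie

@dataclass
class BarMark:
    label: str
    length: float        # size attribute, encodes `years`
    colour: str          # colour attribute, encodes `gender`

# Purple for 'Male' follows the text; the 'Female' colour is an assumption.
PALETTE = {"Male": "purple", "Female": "orange"}

def encode(records, pixels_per_year=5.0):
    """Create one line mark (bar) per record, setting two attributes."""
    return [BarMark(r.name, r.years * pixels_per_year, PALETTE[r.gender])
            for r in records]

records = [
    Record("Harrison Ford", "Male", 43),   # values given for Figure 6.1
    Record("Actor X", "Female", 20),       # hypothetical record
]
for mark in encode(records):
    print(mark)
# BarMark(label='Harrison Ford', length=215.0, colour='purple')
# BarMark(label='Actor X', length=100.0, colour='orange')
```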

The objective of visual encoding is to find the right blend of marks and attributes that will most effectively portray the angle of analysis you wish to show your viewers. The factors that shape your choice and define what is considered ‘effective’ are many and varied in their influence. Before getting on to these, let’s take a closer look at the range of different marks and attributes that are commonly found in the data representation toolkit.

It is worth noting upfront that while the organisation of the ‘attributes’, in particular, suggests a primary role for each, several can be deployed to encode both categorical (nominal, ordinal) variables and quantitative variables. Furthermore, as you see in the bar chart in Figure 6.1, combinations of several attributes (such as colour and size) are often applied to marks to encode multiple values.

Although beyond the scope of this book, there are techniques being developed in the field that explore the use of non-visual senses to portray data, using variations in properties for the auditory (sound), haptic (touch), gustatory (taste) and olfactory (smell) senses.

Figure 6.2 List of Mark Encodings

Figure 6.3 List of Attribute Encodings


Grasping the basics of visual encoding and its role in data visualisation is one of the fundamental pillars of understanding this discipline. However, when it comes to the reality of considering your data representation options, you do not necessarily need to always approach things from this somewhat bottom-up perspective. For most people’s needs when creating a data visualisation, it is more pragmatic (and perhaps more comprehensible) to think about data representation from a top-down perspective in the shape of chart types.

If marks and attributes are the ingredients, a chart ‘type’ is the recipe, offering a predefined template for displaying data. Different chart types offer different ways of representing data, each one comprising a unique combination of marks and attributes onto which specific types of data can be mapped.

Recall that I am using chart type as the all-encompassing term, though this is merely a convenient singular label to cover any variation of map, graph, plot and diagram based around the representation of data.

Let’s work through a few examples of selected chart types to illustrate how each one demonstrates a different combination of marks and attributes.

To begin with, Figure 6.4 visualises the recent fortunes of the world’s billionaires. The display shows the relative ranking of each profiled billionaire in the rich list, grouping them by the different sectors of industry in which they have developed their wealth. This data is encoded using the point mark and two attributes of position. The point in this deployment is depicted using small caricature face drawings representative of each individual: effectively unique symbols to represent the distinct ‘category’ of each different billionaire. Note that these are points, as distinct from area marks, because their size is constant and carries no quantitative meaning. The horizontal position (the allocated column) signifies the industry with which each individual is associated, while the vertical position signifies the rank (higher position = higher rank, towards number 1).

For reference, this is considered a derivative of the univariate scatter plot, which usually shows the dispersal of a range of absolute values rather than rank.

Figure 6.4 Bloomberg Billionaires
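The two positional attributes in Figure 6.4 can be sketched as a function that turns each (name, industry, rank) record into x and y coordinates. This is a hypothetical illustration: the people, sectors and pixel spacings are invented, and screen coordinates are assumed, so y grows downwards and rank 1 sits at the top.

```python
def position_marks(people, industries, column_width=100, row_height=10):
    """Return (name, x, y) for each point mark.

    x encodes the categorical industry (the centre of its column);
    y encodes rank, with rank 1 at y = 0, the top of the display.
    The mark's size stays constant, so it carries no quantity.
    """
    column_of = {industry: i for i, industry in enumerate(industries)}
    return [
        (name,
         column_of[industry] * column_width + column_width / 2,
         (rank - 1) * row_height)
        for name, industry, rank in people
    ]

industries = ["Technology", "Retail", "Finance"]       # hypothetical sectors
people = [("Person A", "Retail", 1),                   # hypothetical records
          ("Person B", "Technology", 2),
          ("Person C", "Retail", 3)]
print(position_marks(people, industries))
# [('Person A', 150.0, 0), ('Person B', 50.0, 10), ('Person C', 150.0, 20)]
```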

As seen in Chapter 1, the clustered bar chart in Figure 6.5 displays a series of line marks (normally described as bars). There are 11 pairs of bars, one for each of the football seasons included in the aggregated analysis. The attribute of colour is used to distinguish the bars between the two quantitative measures displayed: blue is for ‘games’, purple is for ‘goals’. The size dimension of ‘height’ (the widths are constant) along the y-axis scale then represents the quantitative values associated with each season and each measure.

Figure 6.6 is a bubble chart and displays a series of geometric area marks to represent the top 100 blog posts on my website based on their popularity over the previous 100 days. Each circle represents an individual post, sized to show the quantitative value of ‘total visits’ and coloured according to the seven different post categories I use to organise my content.

Figure 6.5 Lionel Messi: Games and Goals for FC Barcelona


Figure 6.6 Image from the home page of visualisingdata.com


Figure 6.7 How the Insane Amount of Rain in Texas Could Turn Rhode Island Into a Lake


Figure 6.7 demonstrates the use of the form mark, which is more rarely used. My advice is that it should remain that way, as it is hard for us to judge scales of volume in 2D displays. However, it can be of merit when values are extremely diverse in size, as in this good example. The chart displayed contextualises the amount of water that had flowed into Texas reservoirs in the 30 days up to 27 May 2015. The size (volume) of a cube is used to display the amount of rain: each of 8000 small cubes represents 1000 acre-feet of water (43,560,000 cubic feet or 1233.5 megalitres), together creating the whole (8 million acre-feet), which is then compared against the heights of the Statue of Liberty and what was then the world’s tallest building, the Burj Khalifa, to orient in height terms at least.
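The arithmetic behind the Texas example, and the reason volume is hard to judge, can be checked with a short Python sketch. The conversion constants are standard: an acre-foot is 43,560 cubic feet, and a cubic foot is 0.3048³ m³, about 28.317 litres. Stacking all 8000 small cubes into one solid cube is an assumption made only to show how slowly a cube’s width grows with its volume. It is not a claim about how the figure is drawn.

```python
# Quantities from the Texas reservoir example; conversion constants are standard.
CUBIC_FEET_PER_ACRE_FOOT = 43_560        # one acre of area, one foot deep
LITRES_PER_CUBIC_FOOT = 28.316846592     # (0.3048 m) cubed, expressed in litres

ACRE_FEET_PER_SMALL_CUBE = 1_000
SMALL_CUBES = 8_000

cube_cubic_feet = ACRE_FEET_PER_SMALL_CUBE * CUBIC_FEET_PER_ACRE_FOOT
cube_megalitres = cube_cubic_feet * LITRES_PER_CUBIC_FOOT / 1_000_000
total_acre_feet = SMALL_CUBES * ACRE_FEET_PER_SMALL_CUBE

# Assumption for illustration: stack every small cube into one solid cube.
# Its side grows with the cube root of the count.
side_in_cubes = round(SMALL_CUBES ** (1 / 3))

print(cube_cubic_feet)             # 43560000
print(round(cube_megalitres, 1))   # 1233.5
print(total_acre_feet)             # 8000000
print(side_in_cubes)               # 20
```

The last line shows the perceptual problem in miniature. The assumed block holds 8000 times the water of one small cube, yet its side is only 20 times as long. A viewer who compares widths rather than volumes will badly underestimate the difference.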

6.2 Chart Types

For many people, creating a visualisation involves using tools that offer chart menus: you might select a chart type and then ‘map’ the records and variables of data against the marks and attributes offered by that particular chart type. Different tools offer different ranges of chart types, some more extensive than others.

As you develop your capabilities in data visualisation and become more ‘expressive’ – trying out unique combinations of marks and attributes – your approach might lean more towards thinking about representation from a bottom-up perspective, considering the visual encodings you wish to deploy and arriving at a particular chart type as the destination rather than the origin. This will be especially likely if you develop or possess a talent for creating visualisations through programming languages.

As the field has matured over the years, and a greater number of practitioners have been experimenting with different recipes of marks and attributes, there is now a broad range of established chart types. Once again I hesitate to use the universal label of chart type (some mapping techniques are not chart types per se) but it will suffice. While all of us are likely to be familiar with the ‘classic three’ – namely, the bar, pie and line chart – there are many other chart type options to consider.

To acquaint you with a broader repertoire of charting options, over the coming pages I present a gallery. This offers a curated collection of some of the common and useful chart types being used across the field today. The gallery aims to provide you with a valuable reference that will directly assist your judgements, helping you to pick (conceptually, at least) from a menu of options.

I have attempted to assign each chart to one of five main families based on its primary analytical purpose: what type of angle of analysis does each one principally show? Organised under the five-letter mnemonic CHRTS (Categorical, Hierarchical, Relational, Temporal and Spatial), these families should provide a useful taxonomy for organising your thinking about which chart or charts to use for your data representation needs.


I know what you’re thinking: ‘well, that’s a suspiciously convenient acronym’! Honestly, if it was as intentional as that I would have tried harder to somehow crowbar in an ‘A’ family. OK, I did spend a lot of time, but I couldn’t find it and it’s now my life’s ambition to do so. Only then will my time on this planet have been truly worthwhile. In the meantime, CHRTS is close enough. Besides, vowels are hugely overrated.

Each chart type presented is accompanied by an array of supporting details that will help you fully acquaint yourself with the role and characteristics of each option.

A few further comments about what this gallery provides:

• The primary name used to label each chart type, as well as some further alternative names that are often used
• An indication of which CHRTS family each chart belongs to, based on its specific primary role, as well as a sub-family definition for further classification
• An indicator for each chart type to show which ones I consider to be most useful for undertaking Exploratory Data Analysis (the black magnifying glass symbol)
• An indicator for whether I believe a chart would typically require interactive features to offer optimum usability (the black cursor symbol)
• A description of the chart’s representation: what it shows and what encodings (marks, attributes) it comprises
• A working example of the chart type in use, with a description of what it specifically shows
• A ‘how to read’ guide, advising on the most effective and efficient approach to making sense of each chart type and what features to look out for
• Presentation tips offering guidance on some of the specific choices to be considered around interactivity, annotation, colour or composition design
• ‘Variations and alternatives’ offering further derivatives and chart ‘siblings’ to consider for different purposes

Exclusions: It is by no means an exhaustive list: the vast permutations of different marks and attributes prevent any finite limit to how one might portray data visually. I have, however, consciously excluded some chart types from the gallery, mainly because they were not different enough from other charts that have been profiled in detail. I have mentioned charts that represent legitimate derivatives of other charts where necessary but simply did not deem it worthy to assign a whole page to profile them separately. The Voronoi treemap, for example, is really just a circular treemap that uses different algorithms to arrange its constituent pieces. While the construction task is different, its usage is not. The waterfall chart is a single stacked bar chart broken down into sequenced stages.

Inclusions: I have, unquestionably, wrestled with the rights and wrongs of including some chart types. The radar chart, for example, has many limitations and flaws but is not entirely without merit if deployed in a very specific way and only for certain contexts. By including profiles of partially flawed charts like these I am using the gallery as much to signpost their shortcomings, so that you know to use them sparingly. There will be some purists gathering in angry mobs and foaming at the mouth in reaction to the audacity of my including the pie chart and word cloud. These have limited roles, absolutely, but a role nonetheless. Put down your pitchforks, return to your homes and have a good read of my caveats. Rather than being the poacher of all bad stuff, I think a gamekeeper role is equally important.

Although I have excluded several charts on the grounds that they demonstrate only a slight variation on profiled charts, some of the types included do exhibit only small derivations from other charts (such as the bar chart and the clustered bar chart, or the scatter plot and the bubble plot). In these cases I felt there was sufficient difference in their practical application, and they were in common enough usage, to merit their separate inclusion, despite sharing many similarities with other profiled siblings.

‘Interestingly, visualisations of textual data are not as developed as one would expect. There is a great need for such visualisations given the amount of textual information we generate daily, from social media to news media and so on, not to mention all the materials generated in the past and that are now digitally available. There are opportunities to contribute to the research efforts of humanists as well as social scientists by devising ways to represent not only frequencies of words and topics, but also semantic content. However, this is not at all trivial.’ Isabel Meirelles, Professor, OCAD University (Toronto), discussing one of the many remaining unknowns in visualisation

Categorical comparisons: All chart types can feasibly facilitate comparisons between categories, so why have a separate C family? Well, the distinction is that those charts belonging to the H, R, T and S families offer an additional dimension of analysis as well as providing comparison between categories.

Dual families: Some charts do not fit into just a single family. Showing connected relationships (e.g. routes or flows) on a map ticks the requirements of at least two family groups (Relational, Spatial). In each case I have tried to best-fit the family classifications around the primary angle of analysis portrayed by each chart: what is the most prominent aspect that characterises each representation technique?

Text visualisation: As I noted in the discussion about data types, when it comes to working with textual data you are almost always going to need to perform some transformation, maybe through value extraction or by applying a statistical technique. Otherwise, the text itself can largely function only as an annotation device. Chart types used to visualise text actually visualise the properties of text. For example, the word cloud visualises the quantitative frequency of the use of words: text might be the subject, but categories (words) and their quantities (counts) are the data mappings. Varieties of network diagrams might show the relationships between word usage, such as the sequence of words used in sentences (word trees), but these are still only made possible through some quantitative, categorical or semantic property being drawn from the original text.

Dashboard: These methods are popular in corporate settings or any context where you wish to create instrumentation that offers both at-a-glance and detailed views of many different analytical and information-monitoring dimensions. Dashboards are not a unique chart type themselves but rather should be considered projects that comprise multiple chart types from across the repertoire of options presented in the gallery. Some of the primary demands of designing dashboards concern editorial thinking (what angles to show and why) and composition choices (how to get it all presented in a unified page layout).

Small multiples: This is an invaluable technique for visualising data but not necessarily a chart type per se and, once again, more a concern for editorial thinking and composition design. Small multiples involve repeated display of the same chart type but with adjustments to the framing of the data in each panel. For example, each panel may show the same angle of analysis but for different categories or different points in time. Small multiples are highly valued because they exploit the capabilities of our visual perception system when it comes to comparing charts in a simultaneous view, overcoming our weakness at remembering and recalling chart views when consumed through animated sequences or across different pages.

A note about ‘storytelling’: Storytelling is an increasingly popular term used around data visualisation but I feel it is often misused and misunderstood, which is quite understandable as we all have different perspectives. I also feel it is worth clarifying my take on what I believe storytelling means practically in data visualisation, especially in this discussion about data representation, which is where it perhaps most logically resides in terms of how it is used.

Stories are constructs based on the essence of movement, change or narrative. A line chart shows how a series of values has changed over a temporal plane. A flow map can reveal what relationships exist across a spatial plane between two points separated by distance; they may be evidence of a journey.
However, aside from the temporal and spatial families of charts, I would argue that no other chart family realistically offers this type of construct in and of itself.

The only way to create a story from other types of charts is to incorporate a temporal dimension (video/slideshow) or provide a verbal/written narrative that itself involves a dimension of time through the sequence of its delivery.

For example, a bar chart alone does not represent a story, but if you show a ‘before’ and ‘after’ pair of bar charts side by side or between slides, you have essentially created ‘change’ through sequence. If you show a bar chart with a stack on top of it to indicate growth between two points in time, well, you have added a time dimension. A network diagram shows relationships, but, standing alone, this is not a story: its underlying structure and arrangement are in abstract space. Just as you do when showing friends a photograph from your holiday, you might use this chart as a prop to explain how relationships between some of the different entities presented are significant. Making the chart a prop allows you to provide a narrative. In this case it is the setting and delivery that are consistent with the notion of storytelling, not the chart itself. I made a similar observation about the role of exhibitory visualisations used as props within explanatory settings.

A further distinction to make is between stories as presented and stories as interpreted. The famous six-word story ‘for sale: baby shoes, never worn’ by Ernest Hemingway is not presented as a story; the story is triggered in our mind when we dissect this passage and start to infer meaning, implication and context. The imagined bar chart I mentioned earlier in the book, which could show the 43 white presidents and 1 black president, is only presenting a story if it is accompanied by an explanatory narrative (in which case the chart was again really just a prop) or if you understand the significance of this statistic without this description and are able to form the story in your own mind.

Sabin Bajracharya
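The text visualisation note above makes the point that a word cloud maps categories (words) to quantities (counts) rather than showing the text itself. A minimal Python sketch of that transformation follows. The sample sentence is hypothetical, the linear font-size mapping is an assumption, and a real word cloud would also need a layout algorithm, which is not shown.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Transform raw text into (word, count) pairs: categories and quantities."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

def font_sizes(frequencies, min_size=10.0, max_size=40.0):
    """Map each count linearly onto a font-size range (an assumed encoding)."""
    counts = [count for _, count in frequencies]
    low, high = min(counts), max(counts)
    span = (high - low) or 1          # avoid dividing by zero when all counts tie
    return {word: min_size + (count - low) / span * (max_size - min_size)
            for word, count in frequencies}

sample = "bar chart, pie chart, line chart: the bar chart is the most common chart"
freqs = word_frequencies(sample, top_n=3)
print(freqs)              # [('chart', 5), ('bar', 2), ('the', 2)]
print(font_sizes(freqs))  # {'chart': 40.0, 'bar': 10.0, 'the': 10.0}
```

The appearance of ‘the’ in the output suggests why common words are often filtered out before counting.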

Charts Comparisons

Bar chart

ALSO KNOWN AS Column chart, histogram (wrongly)

REPRESENTATION DESCRIPTION

A bar chart displays quantitative values for different categories. The chart comprises line marks (bars) – not rectangular areas – with the size attribute (length or height) used to represent the quantitative value for each category.


EXAMPLE Comparing the number of Oscar nominations for the 10 actors who have received the most nominations without actually winning an award.

Figure 6.8 The 10 Actors with the Most Oscar Nominations but No Wins

HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know with which categorical value each bar is associated and what the range of the quantitative values is (min to max). Think about what high and low values mean: is it ‘good’ to be large or small? Glance across the entire chart to locate the big, small and medium bars and perform global comparisons to establish the high-level ranking of biggest > smallest. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring bars to identify larger-than and smaller-than relationships and estimate the relative proportions. Estimate (or read, if labels are present) the absolute values of specific bars of interest. Where available, compare the quantities against annotated references such as targets, forecasts, last year, average, etc.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines, in particular, can be helpful to increase the accuracy of the reading of the quantitative values. If you have axis labels you should not need direct labels on each bar – this will lead to label overload – so generally decide between one or the other.


COMPOSITION: The quantitative value axis should always start from the origin value of zero: a bar should be representative of the true, full quantitative value, nothing more, nothing less, otherwise the perception of bar sizes will be distorted when comparing relative sizes. There is no significant difference in perception between vertical and horizontal bars, though horizontal layouts tend to make it easier to accommodate and read the category labels for each bar. Unlike the histogram, there should be a gap, even if very small, between bars to keep each category’s value distinct. Where possible, try to make the categorical sorting meaningful.
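The reason a bar’s axis must start at zero can be shown numerically. In this hypothetical Python sketch, the drawn length of each bar is its value minus the axis baseline. With a zero baseline the ratio of lengths matches the ratio of values. With a truncated baseline it does not.

```python
def drawn_length_ratio(a, b, baseline=0.0):
    """Ratio of bar lengths as drawn when the value axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)

a, b = 50.0, 40.0                                 # hypothetical values: a is 25% larger
print(drawn_length_ratio(a, b))                   # 1.25 (zero origin: honest)
print(drawn_length_ratio(a, b, baseline=30.0))    # 2.0  (truncated: a looks twice as big)
```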

VARIATIONS & ALTERNATIVES

A variation in the use of bar charts is to show changes over time. You would use a bar chart when the focus is on individual quantitative values over time rather than (necessarily) the trend/change between points, for which a line chart would be best. ‘Spark bars’ are mini bar charts that aim to occupy only a word’s length of space. They are often seen in dashboards where space is at a premium and there is a desire to optimise the density of the display. To show further categorical subdivisions, you might consider the ‘clustered bar chart’, or a ‘stacked bar chart’ if there is a part-to-whole angle. ‘Dot plots’ offer a particularly useful alternative to the bar chart for situations where you have to show large quantitative values with a narrow range of differences.
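The ‘spark bar’ idea, a bar chart squeezed into roughly a word’s worth of space, can be approximated in plain text. This hypothetical Python sketch uses the Unicode block characters U+2581 to U+2588, with a space for zero. Heights are scaled from zero, as for a full bar chart, but are rounded to only eight visible levels.

```python
# Nine levels: a space for zero, then eight block heights (U+2581 to U+2588).
LEVELS = " ▁▂▃▄▅▆▇█"

def spark_bars(values):
    """Render non-negative values as a one-line string of mini bars."""
    top = max(values)
    if top == 0:
        return LEVELS[0] * len(values)
    steps = len(LEVELS) - 1
    # Scale from zero (not from the minimum) so heights stay proportional.
    return "".join(LEVELS[round(v / top * steps)] for v in values)

print(spark_bars([1, 2, 4, 8]))   # ▁▂▄█
```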

Charts Comparisons

Clustered bar chart

ALSO KNOWN AS Clustered column chart, paired bar chart

REPRESENTATION DESCRIPTION

A clustered bar chart displays quantitative values for different major categories with additional categorical dimensions included for further breakdown. The chart comprises line marks (bars) – not rectangular areas – with the size attribute (length or height) used to represent the quantitative value for each category and colours used to distinguish further categorical dimensions.

EXAMPLE Comparing the number of Oscar nominations with the number of Oscar awards for the 10 actors who have received the most nominations.

Figure 6.9 The 10 Actors who have Received the Most Oscar Nominations

HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know with which categorical value each bar is associated and what the range of the quantitative values is (min to max). Learn the colour associations to understand what sub-categories the bars within each cluster represent. Glance across the entire chart to locate the big, small and medium bars and perform global comparisons to establish the high-level ranking of biggest > smallest. Identify any noticeable exceptions and/or outliers. Perform local comparisons within clusters to identify the size relationship (which is larger and by how much?) and estimate (or read, if labels are present) the absolute values of specific bars of interest.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines, in particular, can be helpful to increase the accuracy of the reading of the quantitative values. If you have axis labels you should not need direct labels on each bar – this will lead to label overload – so generally decide between one or the other.

COMPOSITION: The quantitative value axis should always start from the origin value of zero: a bar should be representative of the true, full quantitative value, nothing more, nothing less, otherwise the perception of bar sizes will be distorted when comparing relative sizes. If your categorical clusters involve a breakdown of more than three bars, the display becomes a little too busy, so you might consider giving each cluster its own separate bar chart and using small multiples to show a chart for each major category. Sometimes one bar might be slightly hidden behind the other, implying a before-and-after relationship, often when space is at a premium – just do not hide too much of the back bar. There is no significant difference in perception between vertical and horizontal bars, though horizontal layouts tend to make it easier to accommodate and read the category labels for each bar. The individual bars within each cluster should be positioned adjacent to each other, with a noticeable gap between each cluster, to help direct the eye towards the clustering patterns first and foremost. Where possible, try to make the categorical sorting meaningful.
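The spacing advice for clustered bars (bars within a cluster touching, a clear gap between clusters) can be expressed as a small layout function. This Python sketch is hypothetical: the bar width and gap are arbitrary pixel values, and it returns only the left edge of each bar.

```python
def cluster_positions(n_clusters, bars_per_cluster, bar_width=10.0, cluster_gap=15.0):
    """Return the left x-coordinate of every bar, grouped by cluster.

    Bars inside a cluster sit directly next to each other; a wider
    `cluster_gap` separates clusters so the eye reads the groups first.
    """
    cluster_width = bars_per_cluster * bar_width
    layout = []
    for c in range(n_clusters):
        start = c * (cluster_width + cluster_gap)
        layout.append([start + i * bar_width for i in range(bars_per_cluster)])
    return layout

print(cluster_positions(2, 2))   # [[0.0, 10.0], [35.0, 45.0]]
```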

VARIATIONS & ALTERNATIVES

Clustered bar charts are also sometimes used to show how two associated sub-categories have changed over time (like the Lionel Messi bar chart discussed in Chapter 1). Alternatives would include the ‘dot plot’ or, if you have just two categories forming the clusters and these categories have a binary state (male, female or yes %, no %), the ‘back-to-back bar chart’ would be effective.

Charts Comparisons

Dot plot


ALSO KNOWN AS Dot chart

REPRESENTATION DESCRIPTION

A dot plot displays quantitative values for different categories. In contrast to the bar chart, rather than using the size of a bar, point marks (typically circles but any ‘symbol’ is legitimate) are used, with the position along a scale indicating the quantitative value for each category. Sometimes an area mark is used to indicate one value through position and another value through size. Additional categorical dimensions can be accommodated in the same chart by including additional marks differentiated by colour or symbol.

EXAMPLE Comparing the number and percentage of PhDs awarded by gender across different academic subjects.

Figure 6.10 How Nations Fare in PhDs by Sex

HOW TO READ IT & WHAT TO LOOK FOR

For single-series dot plots (i.e. just one dot per row), look at the axes so you know with which categorical value each row is associated and what the range of the quantitative values is (min to max). Where you have multiple-series dot plots (i.e. more than one dot per row), establish what the different colours/symbols represent in terms of categorical breakdown. Glance across the entire chart to locate the big, small and medium values and perform global comparisons to establish the high-level ranking of biggest > smallest. Identify any noticeable exceptions and/or outliers. Where you have multiple series, look across each series of dot values separately and then perform local comparisons within rows to identify the relative position of each dot, observing the gaps, big and small. Estimate the absolute values of specific dots of interest. Where available, compare the quantities against annotated references such as targets, forecasts, last year, average, etc.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines, in particular, can be helpful to increase the accuracy of the reading of the quantitative values.

COMPOSITION: Because the quantitative value axis does not need to start from a zero origin, it is important to label the axis values clearly when the baseline is not zero. There is no significant difference in perception between vertical and horizontal arrangements, though horizontal layouts tend to make it easier to accommodate and read the category labels for each row. Where possible, try to make the categorical sorting meaningful, maybe organising values in ascending/descending size order.
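Because a dot plot encodes values through position rather than length, its axis can start at any value, provided positions stay linear. A minimal Python sketch of such a linear position scale, with a hypothetical domain of 60 to 80 drawn across 400 pixels:

```python
def linear_scale(domain, pixel_range):
    """Return a function mapping data values to positions along an axis."""
    (d0, d1), (r0, r1) = domain, pixel_range
    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)
    return scale

x = linear_scale((60, 80), (0, 400))     # non-zero baseline, to be clearly labelled
print(x(60), x(70), x(80))               # 0.0 200.0 400.0
```

Equal differences in value still produce equal distances on screen, which is what position encoding relies on. Only the starting point moves, which is why the axis labels must make that starting point obvious.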

VARIATIONS & ALTERNATIVES

Alternatives would include the ‘bar chart’, to show the size of quantitative values for different categories. The ‘connected dot plot’ would be used to focus on the difference between two measures. The ‘univariate scatter plot’ would be used to show the range of multiple values across categories, to display the diversity and distribution of values.

Charts Comparisons

Connected dot plot


ALSO KNOWN AS Barbell chart, dumb-bell chart

REPRESENTATION DESCRIPTION

A connected dot plot displays absolute quantities and quantitative differences between two categorical dimensions for different major categories. The display is formed by two points (normally circles but any ‘symbol’ is legitimate) marking the quantitative value positions for two comparable categorical dimensions. There is a row of connected dots for each major category. Colour or difference in symbol is generally used to distinguish these points. Joining the two points together is a connecting line, which effectively represents the ‘delta’ (difference) between the two values.

EXAMPLE Comparing the typical salaries for women and men across a range of different job categories in the US.

Figure 6.11 Gender Pay Gap US

HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know with which major categorical value each row is associated and what the range of the quantitative values is (min to max). Determine which dots represent which categorical dimension (this could be shown by colour, symbol or a combination) and see if there is any meaning behind the colouring of the connecting bars. Think about what the quantitative values mean to determine whether it is a good thing to be higher or lower. Glance across the entire chart to locate the big, small and medium connecting bars in each direction. Perform global comparisons to establish the high-level ranking of biggest > smallest differences as well as the highest and lowest values. There may be deliberate sorting of the display based on one of the quantitative measures. Identify any noticeable exceptions and/or outliers. Estimate (or read, if labels are present) the absolute values, direction and size of differences for specific categories of interest.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines, in particular, can be helpful to increase the accuracy of the reading of the quantitative values. Consider labelling categories adjacent to the plotted points rather than next to the axis line (and possibly far away from the values) to make it easier for the reader to understand the category–row association.

COLOUR TIPS: Colour may be used to indicate and emphasise thedirectional basis of the connecting line differences.

COMPOSITION: If the two plotted measures are very similar, and the point markers effectively overlap, you will need to decide which should be positioned on top. As the representation of the quantitative values is through position along a scale and not size (it is the difference that is sized, not the absolutes), the quantitative axis does not need to have a zero origin. However, a zero origin can be helpful to establish the scale of the differences. Where possible, try to make the sorting meaningful, using any one of the three quantitative measures (either of the two values or their difference) to optimise the layout.
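The three quantitative measures available for sorting a connected dot plot (the two plotted values and the delta between them) can be shown with a short Python sketch. The categories and values are hypothetical. The ‘direction’ label is one assumed way of driving the colour of the connecting line.

```python
from typing import NamedTuple

class Row(NamedTuple):
    category: str
    first: float     # value for the first categorical dimension
    second: float    # value for the second categorical dimension

    @property
    def delta(self):
        """Signed difference represented by the connecting line."""
        return self.second - self.first

    @property
    def direction(self):
        return "up" if self.delta > 0 else "down" if self.delta < 0 else "none"

def sort_rows(rows, by="delta", descending=True):
    """Sort rows by 'first', 'second' or 'delta'."""
    return sorted(rows, key=lambda row: getattr(row, by), reverse=descending)

rows = [Row("A", 40, 50), Row("B", 30, 45), Row("C", 60, 55)]   # hypothetical
print([r.category for r in sort_rows(rows)])                # ['B', 'A', 'C']
print([r.category for r in sort_rows(rows, by="second")])   # ['C', 'A', 'B']
print([r.direction for r in rows])                          # ['up', 'up', 'down']
```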

VARIATIONS & ALTERNATIVES

Variations in the use of the ‘connected dot plot’ would show before-and-after analysis between two points in time, possibly using the ‘arrow chart’ to indicate the direction of change explicitly. Similarly, the ‘carrot chart’ uses line-width tapering to indicate direction, with the fatter end representing the more recent value. The ‘univariate scatter plot’ would be used to show the range of multiple values across categories, to display the diversity and distribution of values rather than comparing differences between values.

Charts Comparisons


Pictogram

ALSO KNOWN AS Isotype chart, pictorial bar chart, stacked shape chart, tally chart

REPRESENTATION DESCRIPTION

A pictogram displays quantitative values for different major categories with additional categorical dimensions included for further breakdown. In contrast with the bar chart, rather than using the size of a bar, quantities of point marks, in the form of symbols or pictures, are stacked to represent the quantitative value for each category. Each point may be representative of one or many quantitative units (e.g. a single shape may represent 1000 people) but note that, unless you use symbol portions, you will not be able to represent decimals. Pictograms may be used to offer a more emotive (humanising or more light-hearted) display than a bar can offer. Additional categorical dimensions can be accommodated in the same chart by using marks differentiated by variations in colour, symbol or picture. Always ensure the markers used are as intuitively recognisable as possible and consider minimising their variety, as a wider variety makes it cognitively harder for the viewer to identify associations easily and make sense of the quantities.
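The note about units and decimals can be made concrete. When each symbol stands for many units, a value splits into whole symbols plus a leftover fraction that can only be shown with a symbol portion. A hypothetical Python sketch, using the text’s example of one shape per 1000 people (the value 4500 is invented):

```python
def symbol_counts(value, unit):
    """Split `value` into whole symbols and the fraction of one more symbol.

    Each symbol represents `unit` items; the fraction is what would be lost
    (or rounded away) if only whole symbols could be drawn.
    """
    whole, remainder = divmod(value, unit)
    return int(whole), remainder / unit

print(symbol_counts(4500, 1000))   # (4, 0.5): four shapes and half a shape
print(symbol_counts(3000, 1000))   # (3, 0.0)
```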

EXAMPLE Comparing the number of players with different facial hair types across the four teams in the NHL playoffs in 2015.

Figure 6.12 Who Wins the Stanley Cup of Playoff Beards?


HOW TO READ IT & WHAT TO LOOK FOR

Look at the major categorical axis to establish with which category each row is associated. Establish the mark associations to understand what categorical dimensions each colour/shape variation represents. Glance across the entire chart to locate the big, small and medium stacks of shapes and perform global comparisons to establish the high-level ranking of biggest > smallest. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring categories to identify larger than and smaller than relationships and estimate the relative proportions. Estimate (or read, if labels are present) the absolute values of specific groups of markers of interest.

PRESENTATION TIPS

ANNOTATION: The choice of symbol/picture should be as intuitively recognisable as possible, and any legends should be located as close as possible to the display.

COLOUR TIPS: Maximise the distinction between markers by using different combinations of both colour and shape, rather than varying just one attribute.

COMPOSITION: If the quantities of markers exceed a single row, try to make the number of units per row logically ‘countable’, such as displaying them in groups of 5, 10 or 100. To aid readability, make sure there is a sufficiently noticeable gap between rows; otherwise the eye can struggle to form the distinct clusters of shapes for each category displayed. Where possible, try to make the categorical sorting meaningful, perhaps organising values in ascending/descending size order.
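One way to keep rows ‘countable’ is sketched below in Python. It assumes rows of ten symbols, with a visible gap after every five. The `layout_rows` helper and its defaults are hypothetical choices, not a standard.

```python
def layout_rows(count, per_row=10, group=5, symbol="●"):
    """Arrange `count` symbols in rows of `per_row`, with a space after every
    `group` symbols so each row can be counted in fives."""
    rows = []
    for start in range(0, count, per_row):
        n = min(per_row, count - start)          # symbols in this row
        chunks = [symbol * min(group, n - i) for i in range(0, n, group)]
        rows.append(" ".join(chunks))
    return rows

for row in layout_rows(23):
    print(row)
```

Twenty-three markers become two full rows of ten and a partial row of three. The reader can then count in fives and tens rather than one by one.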


VARIATIONS & ALTERNATIVES

Extending the idea of using repeated quantities of representative symbols, some applications take this further by using large quantities of individual symbols to convey a feeling of magnitude and scale. When showing a part-to-whole relationship, the ‘waffle chart’ can use simple symbol devices to differentiate the constituent parts of a whole.

Charts Comparisons

Proportional shape chart

ALSO KNOWN AS Area chart (wrongly)

REPRESENTATION DESCRIPTION

A proportional shape chart displays quantitative values for different categories. The chart is based on the use of different area marks, one for each category, sized in proportion to the quantities they represent. By using the quadratic dimension of area size rather than the linear dimension of bar length or dot position, the shape chart offers scope for displaying a diverse range of quantitative values within the same chart. Typically the layout is quite free-form, with no baseline or central gravity binding the display together.

EXAMPLE Comparing the market capitalisation ($) of companies involved in the legal sale of marijuana across different industry sectors.

Figure 6.13 For These 55 Marijuana Companies, Every Day is 4/20


HOW TO READ IT & WHAT TO LOOK FOR

Look at the shapes and their associated labels so you know with which major categorical values each is associated. If there are only direct labels, find the largest shape to establish its quantitative value as the maximum and do likewise for the smallest – this will help calibrate the size judgements. Otherwise, if it exists, acquaint yourself with the size key. Glance across the entire chart to locate the big, small and medium shapes and perform global comparisons to establish the high-level ranking of biggest > smallest. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring shapes to identify larger than and smaller than relationships and estimate the relative proportions. Estimate (or read, if labels are present) the absolute values of specific shapes of interest.

PRESENTATION TIPS

ANNOTATION: Sometimes a quantitative size key will be included rather than direct labelling (usually when there are many shapes and limited empty space), though direct labels will help overcome some of the limitations of judging area size. You will have to decide how to handle label positioning for those shapes with exceptionally small sizes.

COLOUR TIPS: Colours are not fundamentally necessary to encode category (the position/separation of different shapes achieves that already) but they can be useful as redundant encodings to make the categories even more immediately distinguishable.

COMPOSITION: Estimating and comparing the size of areas with accuracy is not as easy as it is for judging bar length or dot position, so only use this chart type if you have a diverse range of quantitative values. The geometric accuracy of the size calculations is paramount. Mistakes are often made, in particular, with circle size calculations: it is the area you are modifying, not the diameter/radius. Arrangement approaches vary: sometimes you see the shapes anchored to a common baseline (bottom or central alignment), while on other occasions they might just ‘float’. If you use an organic shape, like a human figure, to represent different quantities, you need to adjust the entire shape area, not just the height. Often the approach for this type of display is to treat the figure as a rudimentary rectangular shape. Sometimes the volume of a shape is used rather than area to represent quantitative values (especially if the values to show differ by orders of magnitude), but this increases the perceptual difficulty in estimating and comparing values. Where possible, try to make the categorical sorting meaningful, perhaps organising values in ascending/descending size order.
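The area rule can be checked with a little arithmetic. Because a circle's area grows with the square of its radius, the radius must grow with the square root of the value. The same square-root factor applies to both the width and the height of a figure treated as a rectangle. The Python sketch below is a minimal illustration, and its function names are hypothetical.

```python
import math

def circle_radius(value, max_value, max_radius):
    """Radius of a circle whose AREA is proportional to `value`.
    Area grows with radius squared, so radius grows with sqrt(value)."""
    return max_radius * math.sqrt(value / max_value)

def scale_figure(width, height, value, reference_value):
    """Scale a figure, treated as a rectangle, so that its whole AREA (not
    just its height) is proportional to `value`."""
    factor = math.sqrt(value / reference_value)
    return width * factor, height * factor

# A value one quarter of the maximum gets half the radius, hence a quarter of the area.
r = circle_radius(25, 100, max_radius=50)
# The common mistake: scaling the radius linearly shrinks the area to 1/16.
wrong_r = 50 * 25 / 100
print(r, wrong_r, (r / 50) ** 2, (wrong_r / 50) ** 2)
```

The last two printed numbers are the area ratios: 0.25 when sized correctly and 0.0625 when the radius is scaled linearly. The linear approach makes the smaller value look four times smaller than it should.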

VARIATIONS & ALTERNATIVES

The ‘bubble chart’ uses clusters of sized bubbles to compare categorical values and, sometimes, to represent part-to-whole analysis. The ‘nested shape chart’ might include secondary, smaller area sizes nested within each shape to display local part-to-whole relationships.

Charts Comparisons

Bubble chart

ALSO KNOWN AS Circle packing diagram

EXAMPLE Comparing the public sector capital expenditure (£million) on services by function of the UK Government during 2014/15.

REPRESENTATION DESCRIPTION


A bubble chart displays quantitative values for different major categories, with additional categorical dimensions included for further breakdown. It is based on the use of circles, one for each category, sized in proportion to the quantities they represent. Sometimes several separate clusters may be used to display further categorical dimensions; otherwise the colouring of each circle can achieve this. It is similar in concept to the proportional shape chart but differs in that its typical layout is based on clustering, which also makes it suitable for showing part-to-whole relationships.

Figure 6.14 UK Public Sector Capital Expenditure, 2014/15

HOW TO READ IT & WHAT TO LOOK FOR

Look at the shapes and their associated labels so you know with which major categorical values each is associated, noting any size and colour legends to assist in forming associations. If there are multiple clusters, learn about the significance of the grouping/separation in each case. If there are direct labels, find the largest shape to establish its quantitative value as the maximum and do likewise for the smallest – this will help calibrate other size judgements. Glance across the entire chart to locate the big, small and medium shapes and perform global comparisons to establish the high-level ranking of biggest > smallest. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring shapes to identify larger than and smaller than relationships and estimate the relative proportions. Estimate (or read, if labels are present) the absolute values of specific shapes of interest. If there are multiple clusters, note the general relative size and number of members in each case.


PRESENTATION TIPS

INTERACTIVITY: Bubble charts are often accompanied by interactive features that let users select or mouse over individual circles to reveal annotated values for the quantity and category.

ANNOTATION: If interactivity is not achievable, include either a quantitative size key or direct labelling; the latter may make the display busy (and labels can be hard to fit into smaller circles) but will help overcome some of the limitations of judging area size.

COLOUR TIPS: Colours are sometimes used as redundant encodings to make the quantitative sizes even more immediately distinguishable.

COMPOSITION: Estimating and comparing the size of areas with accuracy is not as easy as it is for judging bar length or dot position, so only use this chart type if you have a diverse range of quantitative values. The use of this chart will primarily be about facilitating a gist, a general sense of the largest and smallest values. The geometric accuracy of the circle size calculations is paramount. Mistakes are often made with circle size calculations: it is the area you are modifying, not the diameter/radius. If you wish to make your bubbles appear as 3D spheres, you are essentially no longer representing quantitative values through the size of a geometric area mark; rather, the mark will be a ‘form’ and so the size calculation will be based on volume, not area. No categorical or quantitative sorting is applied to the layout of the bubble chart; instead, the tools that offer these charts will generally use a layout algorithm that applies a best-fit clustering to arrange the circles radially about a central ‘gravity’ force.
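The effect of switching from area to volume can be quantified. A sphere's volume grows with the cube of its radius, so the drawn radius grows only with the cube root of the value, and visible differences between bubbles shrink. The Python sketch below compares the two rules. The `drawn_radius` helper is hypothetical.

```python
import math

def drawn_radius(value, max_value, max_radius, encoding="area"):
    """Radius to draw for `value` when the largest value is drawn at
    `max_radius`. 'area' (flat circles): r grows with sqrt(value).
    'volume' (3D spheres): r grows with the cube root of value."""
    ratio = value / max_value
    if encoding == "area":
        return max_radius * math.sqrt(ratio)
    if encoding == "volume":
        return max_radius * ratio ** (1 / 3)
    raise ValueError("encoding must be 'area' or 'volume'")

# A value one-eighth of the largest, with the largest drawn at radius 40:
print(drawn_radius(1, 8, 40, "area"))    # about 14.1
print(drawn_radius(1, 8, 40, "volume"))  # about 20.0
```

The same value appears noticeably larger as a ‘sphere’. This is one reason volume-based sizing makes values harder to estimate and compare.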

VARIATIONS & ALTERNATIVES

When the collection of quantities represents a whole, this evolves into a chart known as a ‘circle packing diagram’, which usually involves many parts that pack neatly into a circular layout representing the whole. Another variation of the packing diagram is when the adjacency between circle ‘nodes’ indicates a connected relation, offering a variation of the node–link diagram for showing networks of relationships. The ‘bubble plot’ also uses differently sized circles, but the position in each case is overlaid onto a scatter plot structure, based on two further quantitative variables. Removing the size attribute (effectively replacing area with a point mark), you could simply use the quantity of points clustered together for different categories to create a ‘tally chart’.


Charts Comparisons

Radar chart

ALSO KNOWN AS Filled radar chart, star chart, spider diagram, web chart

EXAMPLE Comparing the global competitiveness scores (out of 7) across 12 ‘pillars’ of performance for the United Kingdom.

REPRESENTATION DESCRIPTION

A radar chart shows values for three or more different quantitative measures in the same display for, typically, a single category. It uses a radial (circular) layout comprising several axes emerging from the centre like spokes on a wheel, one for each measure. The quantitative values for each measure are plotted through position along each scale and then joined by connecting lines to form a unique geometric shape. Sometimes this shape is then filled with colour. A radar chart should only be considered in situations where the cyclical ordering (and neighbourly pairings) has some significance (such as data that might be plotted around the face of a clock or compass) and when the quantitative scales are the same (or similar) for each axis. Do not plot values for multiple categories on the same radar chart; use small multiples formed of several radar charts instead.
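The geometry of a radar chart can be sketched in Python as follows. This assumes a common scale from zero to `max_value` on every axis, with the first axis pointing to 12 o'clock and the rest spaced evenly clockwise. The `radar_vertices` helper and the scores are hypothetical.

```python
import math

def radar_vertices(values, max_value, radius=1.0):
    """(x, y) vertices of a radar polygon on a shared 0..max_value scale.
    Axis i points 2*pi*i/n radians clockwise from 12 o'clock, and each value
    is encoded by its position along that axis."""
    n = len(values)
    points = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n
        r = radius * v / max_value
        points.append((r * math.sin(angle), r * math.cos(angle)))
    return points

# Hypothetical scores out of 7 on five measures; joining the points in order
# (and back to the first) gives the polygon.
shape = radar_vertices([5.2, 4.8, 6.1, 3.9, 5.5], max_value=7)
# A straight-line gridline at the value 3.5 is just a polygon of equal values.
gridline = radar_vertices([3.5] * 5, max_value=7)
```

Reordering the input values changes which points are joined, and so changes the shape. This is why the cyclical ordering needs to be meaningful.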

Figure 6.15 Global Competitiveness Report 2014–2015


HOW TO READ IT & WHAT TO LOOK FOR

Look around the chart and acquaint yourself with the quantitative measure represented by each axis and note the sequencing of the measures around the display. Is there any significance in this arrangement that can assist in interpreting the overall shape? Note the range of values along each independent axis so you understand what positions along the scales mean in a value sense for each measure. Scan the shape to locate the outliers both towards the outside (larger values) and inside (smaller values) of the scales. It is more important to pay attention to the position of values along an axis than to the nature of the connecting lines between axes, unless the axis scales are consistent or at least the relative position along each scale has the same implied meaning. If the variable sequencing has cyclical relevance, the spiking, bulging or contracting shape formed will give you some sense of the balance of values. Perform local comparisons between neighbouring axes to identify larger than and smaller than relationships. Estimate (or read, if labels are present) the absolute values of specific points of interest.

PRESENTATION TIPS


ANNOTATION: The inclusion of visible annotated features like axis lines, tick marks, gridlines and value labels can naturally aid the readability of the radar chart. Gridlines are only relevant if there are common scales across each quantitative variable. If so, the gridlines must be presented as straight lines, not concentric arcs, because the connecting lines joining up the values are themselves straight lines.

COLOUR TIPS: Often the radar shapes are filled with a colour, sometimes with a degree of transparency to allow the background apparatus to be partially visible.

COMPOSITION: The cyclical ordering of the quantitative variables has to be of optimum significance, as the connectors and shape change with every different ordering permutation. This will have a major impact on the readability and meaning of the resulting chart shape. As the axes will be angled all around the radial display, you will need to make sure all the associated labels are readable (i.e. not upside down or at difficult angles).

VARIATIONS & ALTERNATIVES

A ‘polar chart’ is an alternative to the radar chart that removes some of the main shortcomings caused by connecting lines in the radar chart. If you have consistent value scales across the different quantitative measures, a ‘bar chart’ or ‘dot plot’ would be a better alternative. While not strictly a variation, ‘parallel coordinates’ display a similar technique for plotting several independent quantitative measures in the same chart. The main difference is that parallel coordinates use a linear layout and can accommodate many categories in one display.

Charts Comparisons

Polar chart

ALSO KNOWN AS Coxcomb plot, polar area plot


REPRESENTATION DESCRIPTION

A polar chart shows values for three or more different quantitative measures in the same display. It uses a radial (circular) layout comprising several equal-angled circular sectors like slices of a pizza, one for each measure. In contrast to the radar chart (which uses position along a scale), the polar chart uses variation in the size of the sector areas to represent the quantitative values. It is, in essence, a radially plotted bar chart. Colour is an optional attribute, sometimes used visually to indicate further categorical dimensions. A polar chart should only be considered in situations where the cyclical ordering (and neighbourly pairings) has some significance (such as data that might be plotted around the face of a clock or compass) and when the quantitative scales are the same (or similar) for each axis.

EXAMPLE Comparing the quantitative match statistics across 14 different performance measures for a rugby union player.

Figure 6.16 Excerpt from a Rugby Union Player Dashboard

HOW TO READ IT & WHAT TO LOOK FOR


Look around the chart and acquaint yourself with the quantitative measures each sector represents and note the sequencing of the measures around the display. Is there any significance in this arrangement that can assist in interpreting the overall shape? Note the range of values included on the quantitative scale and acquaint yourself with any colour associations. Glance across the entire chart to locate the big, small and medium sectors and perform global comparisons to establish the high-level ranking of biggest > smallest. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring variables to identify the order of magnitude and estimate the relative sizes. Estimate (or read, if labels are present) the absolute values of specific sectors of interest. Where available, compare the quantities against annotated references such as targets, forecasts, last year, average, etc. If there is significance behind the sequencing of the variables, look out for any patterns that emerge through spiking, bulging or contracting shapes.

PRESENTATION TIPS

ANNOTATION: The inclusion of visible annotated features like tick marks and value labels can naturally aid the readability of the polar chart. Gridlines are only relevant if there are common scales across each quantitative variable. If so, the gridlines must be presented as arcs reflecting the outer shape of each sector (unlike the radar chart, whose connecting lines joining up the values are straight lines). Each sector typically uses the same quantitative scale for each quantitative measure but, on the occasions when this is not the case, each axis will require its own clear value scale.

COLOUR TIPS: Often polar chart sectors are filled with a meaningful colour, sometimes with a degree of transparency to allow the background apparatus to be partially visible.

COMPOSITION: The cyclical ordering of the quantitative variables has to be of some significance to legitimise the value of the polar chart over the bar chart. As the sectors will be angled all around the radial display, you will need to make sure all the associated labels are readable (i.e. not upside down or at difficult angles). The quantitative values represented by the size of the sectors need to be carefully calculated. It is the area of the sector, not the radius length, that will be modified to portray the values accurately. If you make the maximum quantitative value equivalent to the largest sector area, all other sector sizes can be calculated accordingly. Knowing how many different quantitative variables you are showing means you can easily calculate the angle of any given sector (360° divided by the number of variables). The quantitative measure axes should always start from the origin value of zero: a sector should be representative of the true, full quantitative value, nothing more, nothing less, otherwise the perception of size will be distorted when comparing relative sizes.
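The sector calculation can be sketched in Python. A sector spanning angle θ (in radians) with radius r has area θr²/2. When every sector shares the same angle, making the area proportional to the value means scaling the radius by the square root of the value relative to the maximum. The helper names and data below are hypothetical.

```python
import math

def polar_sectors(values, max_radius):
    """(start_deg, end_deg, radius) for each sector of a polar chart.
    Every sector spans 360/n degrees. A sector's area is theta * r**2 / 2,
    so for area to be proportional to value the radius must scale with
    sqrt(value); the largest value gets `max_radius`. The scale starts at 0."""
    n = len(values)
    angle = 360 / n
    v_max = max(values)
    return [(i * angle, (i + 1) * angle, max_radius * math.sqrt(v / v_max))
            for i, v in enumerate(values)]

def sector_area(start_deg, end_deg, radius):
    """Area of a circular sector: theta * r**2 / 2, with theta in radians."""
    return math.radians(end_deg - start_deg) * radius ** 2 / 2

sectors = polar_sectors([100, 25, 50, 80], max_radius=10)  # hypothetical values
for s in sectors:
    print(s, round(sector_area(*s), 2))
```

A value a quarter of the maximum gets half the radius, so its sector area is a quarter of the largest one.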

VARIATIONS & ALTERNATIVES

Unless the radial layout provides meaning through the notion of a ‘whole’ or through the cyclical arrangement of measures, you might be best using a ‘bar chart’. Variations in approach tend to involve modifications of the sector shape, with measure values represented by individual bar lengths or, in the example of the Better Life Index project, through variations in ‘petal’ sizes.

Charts Distributions

Range chart

ALSO KNOWN AS Span chart, floating bar chart, barometer chart

REPRESENTATION DESCRIPTION

A range chart displays the minimum-to-maximum distribution of a series of quantitative values for different categories. The display is formed by a bar, one for each category, with the lower and upper positions of the bar set by the minimum and maximum quantitative values in each case. The resulting bar lengths thus represent the range of values between the two limits.
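The data preparation behind a range chart is small: a minimum and a maximum per category, with the difference giving the bar length. The Python sketch below also sorts by that range, as one possible meaningful ordering. The readings and the `ranges` helper are hypothetical.

```python
def ranges(readings):
    """readings: {category: [values]} -> list of (category, low, high, span)
    tuples, sorted by span (high - low), widest range first."""
    rows = []
    for category, values in readings.items():
        low, high = min(values), max(values)
        rows.append((category, low, high, high - low))
    return sorted(rows, key=lambda row: row[3], reverse=True)

# Hypothetical temperature readings (°F), not real city data
temps = {
    "City A": [28, 45, 71, 90],
    "City B": [55, 62, 80],
    "City C": [10, 33, 95],
}
for city, low, high, span in ranges(temps):
    print(f"{city}: {low} to {high} (range {span})")
```

Sorting by the low or high value instead would simply mean changing the index used in the sort key.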

EXAMPLE Comparing the highest and lowest temperatures (°F) recorded across the top 10 most populated cities during 2015.

Figure 6.17 Range of Temperatures Recorded in Top 10 Most Populated Cities (2015)


HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know with which major categorical values each range bar is associated and what the range of the quantitative values is (min to max). Glance across the entire chart to locate the big, small and medium bars and perform global comparisons to establish the high-level ranking of biggest > smallest differences as well as the highest and lowest values. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring bars to identify larger than and smaller than relationships and estimate the relative proportions. There may be deliberate sorting of the display based on one of the quantitative measures. Estimate (or read, if labels are present) the absolute values of specific bars of interest. Where available, compare the quantities against annotated references such as targets, forecasts, last year, average, etc.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines, in particular, can be helpful to increase the accuracy of the reading of the quantitative values. If you have axis labels you may not need direct labels on each bar – having both will lead to label overload, so generally decide between one or the other.

COMPOSITION: The quantitative value axis does not need to commence from zero, unless it means something significant to the interpretation, as the range of values itself does not necessarily start from zero and the focus is more on the range and difference between the outer values. There is no significant difference in perception between vertical or horizontal layouts, though the latter tend to make it easier to accommodate and read the category labels. Where possible, try to make the categorical sorting meaningful, perhaps organising values in ascending/descending size order.

VARIATIONS & ALTERNATIVES

‘Connected dot plots’ will also emphasise the difference between two selected measure values (as opposed to min/max), or between two observations when the underlying data describes a change over time. ‘Band charts’ will often be used to show how the range of data values has changed over time, displaying the minimum and maximum bands at each time unit. These are often used in displays like weather forecasts.

Charts Distributions

Box-and-whisker plot

ALSO KNOWN AS Box plot

REPRESENTATION DESCRIPTION

A box-and-whisker plot displays the distribution and shape of a series of quantitative values for different categories. The display is formed by a combination of lines and point markers to indicate (through position and length), typically, five different statistical measures. Three of the statistical values are common to all plots: the first quartile (25th percentile), the second quartile (or median) and the third quartile (75th percentile) values. These are displayed with a box (effectively a wide bar) positioned and sized according to the first and third quartile values, with a marker indicating the median. The remaining two statistical values vary in definition: usually either the minimum and maximum values or the 10th and 90th percentiles. These statistical values are represented by extending a line beyond the bottom and top of the main box to join with a point marker indicating the appropriate position. These are the whiskers. A plot will be produced for each major category.
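The five statistics can be computed with Python's standard `statistics` module, as in the sketch below. It uses the module's `'inclusive'` quantile method, which interpolates linearly. Other tools may define quartiles slightly differently, so exact values can vary. The `box_stats` helper and its `whiskers` option names are hypothetical.

```python
import statistics

def box_stats(values, whiskers="minmax"):
    """Five statistics for a box-and-whisker plot. Quartiles use the
    'inclusive' linear-interpolation method; other tools may define
    quartiles slightly differently. Whiskers are either min/max or the
    10th/90th percentiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if whiskers == "minmax":
        low, high = min(values), max(values)
    elif whiskers == "p10p90":
        deciles = statistics.quantiles(values, n=10, method="inclusive")
        low, high = deciles[0], deciles[-1]
    else:
        raise ValueError("whiskers must be 'minmax' or 'p10p90'")
    return {"low": low, "q1": q1, "median": median, "q3": q3, "high": high}

sample = list(range(1, 10))  # hypothetical values 1..9
print(box_stats(sample))
print(box_stats(sample, whiskers="p10p90"))
```

The box is drawn from `q1` to `q3` with a marker at `median`. The whiskers extend to `low` and `high`, so a legend should state which whisker definition was used.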

EXAMPLE Comparing the distribution of annual earnings 10 years after starting school for graduates across the eight Ivy League schools.

Figure 6.18 Ranking the Ivies

HOW TO READ IT & WHAT TO LOOK FOR

Begin by looking at the axes so you know with which category each plot is associated and what the range of quantitative values is (min to max). Establish the specific statistics being displayed by consulting any legends or descriptions, especially in order to identify what the ‘whiskers’ are representing. Glance across the entire chart to locate the main patterns of spread, identifying any common or noticeably different patterns across categories. Look across the shapes formed for each category to learn about the dispersal of values: starting with the median, then observing the extent and balance of the ‘box’ (the interquartile range between the 25th and 75th percentiles) and then checking the ‘whisker’ extremes. Is the shape balanced or skewed around the median? Is the interquartile range wide or narrow? Are the whisker extremes far away from the edges of the box? Then return to comparing shapes across all categories to identify more precisely any interesting differences or commonalities for each of the five statistical measures.

PRESENTATION TIPS

ANNOTATION: If you have axis labels you may not need direct labels on each plot – having both will lead to label overload, so generally decide between one or the other.

COMPOSITION: The quantitative value axis does not need to commence from zero, unless it means something significant to the interpretation, as the range of values itself does not necessarily start from zero and the focus is on the statistical properties between the outer values. There is no significant difference in perception between vertical or horizontal box-and-whisker plots, though horizontal layouts tend to make it easier to accommodate and read the category labels. Try to keep a noticeable gap between plots to enable greater clarity in reading. When you have several or many plots in the same chart, where possible try to make the categorical sorting meaningful, perhaps organising values in ascending/descending order based on the median value.

VARIATIONS & ALTERNATIVES

Variations involve reducing the number of statistical measures included in the display by removing the whiskers to show just the 25th and 75th percentiles through the lower and upper parts of the box. The ‘candlestick chart’ (or OHLC chart) involves a similar approach and is often used in finance to show the distribution and milestone values of stock performances during a certain time frame (usually daily), plotting the opening, highest, lowest and closing prices and using colour to indicate an up or down trend.

Charts Distributions


Univariate scatter plot

ALSO KNOWN AS 1D scatter plot, jitter plot

REPRESENTATION DESCRIPTION

A univariate scatter plot displays the distribution of a series of quantitative values for different categories. In contrast to the box-and-whisker plot, which shows selected statistical values, a univariate scatter plot shows all values across a series. For each category, a range of points (typically circles, but any ‘symbol’ is legitimate) is used to mark the position along the scale of the quantitative values. From this you can see the range, the outliers and the clusters and form an understanding of the general shape of the data.

EXAMPLE Comparing the distribution of average critics score (%) from the Rotten Tomatoes website for each movie released across a range of different franchises and movie theme collections.

Figure 6.19 Comparing Critics Scores for Major Movie Franchises


HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know which category each scatter row/column is associated with and what the range of the quantitative values is (min to max). If colour has been used to emphasise or separate different marks, establish what the associations are. Also, learn how the design depicts multiple marks on the same value – these may appear darker or indeed larger. Glance across the entire chart to observe the main patterns of clustering and identify any noticeable exceptions and/or outliers across all categories. Then look more closely at the patterns within each scatter to learn about each category’s specific dispersal of values. Look for empty regions where no quantitative values exist. Estimate the absolute values of specific dots of interest. Where available, compare the quantities against annotated references such as the average or median.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like gridlines can be helpful to increase the accuracy of the reading of the quantitative values. Direct labelling is normally restricted to including values for specifically noteworthy points only.

COLOUR: Colour may be used to establish focus on certain points and/or distinction between different sub-category groups to assist with interpretation. When several points have the exact same value, you might need to use unfilled or semi-transparent filled circles to facilitate a sense of value density.

COMPOSITION: The representation of the quantitative values is based on position and not size; therefore the quantitative axis does not need to have a zero origin. There is no significant difference in perception between vertical or horizontal arrangement, though horizontal layouts tend to make it easier to accommodate and read the category labels. Where possible, try to make the categorical sorting meaningful, perhaps organising values in ascending/descending size order.

VARIATIONS & ALTERNATIVES

To overcome occlusion caused by plotting several marks at the same value, a variation of the univariate scatter plot may see the points replaced by geometric areas (like circles), where the position attribute is used to represent a quantitative value along a scale and the size attribute is used to indicate the frequency of observations of similar value. Adding a second quantitative variable axis would lead to the use of a ‘scatter plot’.
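The size-by-frequency variation can be sketched in Python. The sketch collapses identical values into one mark each and sizes each mark so that its area grows with the count. It only merges exactly equal values. Merging ‘similar’ values would need a rounding or binning step first, and the `frequency_marks` helper and scores are hypothetical.

```python
import math
from collections import Counter

def frequency_marks(values, base_radius=2.0):
    """One (value, count, radius) mark per distinct value. The circle's
    AREA grows with the count, so its radius grows with sqrt(count)."""
    counts = Counter(values)
    return [(value, count, base_radius * math.sqrt(count))
            for value, count in sorted(counts.items())]

scores = [72, 85, 85, 85, 85, 90]  # hypothetical critics scores (%)
for value, count, radius in frequency_marks(scores):
    print(value, count, radius)
```

Four identical scores become a single mark with twice the radius, and therefore four times the area, of a single observation. The position along the scale is unchanged.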

Charts Distributions

Histogram

ALSO KNOWN AS Bar chart (wrongly)

REPRESENTATION DESCRIPTION

A histogram displays the frequency and distribution for a range of quantitative groups. Whereas bar charts compare quantities for different categories, a histogram technically compares the number of observations across a range of value ‘bins’, using the size of lines/bars (if the bins relate to values with equal intervals) or the area of rectangles (if the bins have unequal value ranges) to represent the quantitative counts. With the bins arranged in meaningful order (effectively forming ordinal groupings), the resulting shape reveals the overall pattern of the distribution of observations.
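The binning step can be sketched in plain Python. Each bin includes its lower edge, and the last bin also includes its upper edge. Each bin also gets a ‘density’ (count divided by bin width), which is the bar height to draw when bins are unequal so that bar area represents the count. The release years below are invented for illustration and are not a real filmography. The `histogram` helper is hypothetical.

```python
def histogram(values, edges):
    """Count observations per bin. Bin i covers edges[i] <= v < edges[i+1];
    the last bin also includes its upper edge. Returns (low, high, count,
    density) tuples, where density = count / bin width is the bar height
    to draw when bins are unequal, so that bar AREA represents the count."""
    last = len(edges) - 2
    bins = []
    for i in range(len(edges) - 1):
        low, high = edges[i], edges[i + 1]
        count = sum(1 for v in values
                    if low <= v < high or (i == last and v == high))
        bins.append((low, high, count, count / (high - low)))
    return bins

# Hypothetical release years (not a real filmography).
years = [1964, 1966, 1966, 1967, 1969, 1971, 1972, 1975, 1978,
         1983, 1986, 1986, 1988]

equal = histogram(years, list(range(1960, 1995, 5)))   # five-year bins
unequal = histogram(years, [1960, 1970, 1975, 1990])   # unequal bins

for low, high, count, density in equal:
    print(f"{low}-{high}: {'#' * count}")
```

Re-running with different `edges` is a quick way to try out bin sizes before settling on one.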

EXAMPLE Comparing the distribution of movies released over time starring Michael Caine across five-year periods based on the date of release in the US.

Figure 6.20 A Career in Numbers: Movies Starring Michael Caine

HOW TO READ IT & WHAT TO LOOK FOR

Begin by looking at the axes so you know what the chart depicts in terms of the categorical bins and the range of the quantitative values (zero to max). Glance across the entire chart to establish the main pattern. Is it symmetrically shaped, like a bell or pyramid (around a median or average value)? Is it skewed to the left or right? Does it dip in the middle and peak at the edges (known as bimodal)? Does it have several peaks and troughs? Or does it appear entirely random? All these characteristics of ‘shape’ will inform you about the underlying distribution of the data.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines, in particular, can be helpful to increase the accuracy of the reading of the quantitative values. Axis labels tend to be used more than direct value labels so as not to crowd the shape of the histogram.

COMPOSITION: Unlike the bar chart, there should be no (or at most a very thin) gap between bars, to help the collective shape of the frequencies emerge. The quantitative bins must be sorted in ascending order so that the reading of the overall shape preserves its meaning. The number of value bins and the range of values covered by each have a prominent influence over the appearance of the histogram and the usefulness of what it might reveal: too few bins may disguise interesting nuances, patterns and outliers; with too many bins, the most interesting shapes may be obscured by noise. There is no single best approach; the right choice simply arrives through experimentation and iteration.

VARIATIONS & ALTERNATIVES

For analysis that looks at the distribution of values across two dimensions, such as the size of populations for age across genders, a ‘back-to-back histogram’ (with male on one side, female on the other), also commonly known as a ‘violin plot’ or ‘population pyramid’, is a useful approach to see and compare the respective shapes. A ‘box-and-whisker plot’ reduces the distribution of values to five key statistical measures to describe key dimensions of the spread of values.

Charts Distributions

Word cloud


ALSO KNOWN AS Tag cloud

REPRESENTATION DESCRIPTION

A word cloud shows the frequency of individual word items used in textual data (such as tweets, comments) or documents (passages, articles). The display is based around an enclosed cluster of words with the font (not the word length) sized according to the frequency of usage. Modifying the font size effectively increases the area of the whole word. All words have a different shape and size, so it can be quite difficult to avoid the prominence of long words, irrespective of their font size. Word clouds are therefore only useful when you are trying to get a quick and rough sense of some of the dominant keywords used in the text. They can be an option for working with qualitative data during the data exploration stage, more so than as a means of reporting analysis to others.
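The counting step behind a word cloud can be sketched in Python. The sketch tokenises the text, drops a small stop-word list, counts the rest, and maps counts linearly onto a range of font sizes. The stop-word list, the linear mapping and the point sizes are all assumptions for illustration. Real generators use much longer stop-word lists and their own sizing rules.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; real tools use far longer ones).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}

def word_sizes(text, top=50, min_pt=10, max_pt=48):
    """Return (word, count, font_size) for the `top` most frequent words,
    mapping counts linearly onto font sizes between min_pt and max_pt."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    common = counts.most_common(top)
    if not common:
        return []
    most, least = common[0][1], common[-1][1]
    sized = []
    for word, n in common:
        if most == least:
            size = max_pt
        else:
            size = min_pt + (n - least) / (most - least) * (max_pt - min_pt)
        sized.append((word, n, round(size)))
    return sized

sample = ("Data visualisation is the representation of data. "
          "Data design and visualisation design.")
for word, n, pt in word_sizes(sample):
    print(f"{word:<15} {n}  {pt}pt")
```

Only the font size carries the count here. In the rendered cloud, ‘visualisation’ and ‘design’ get the same font size but occupy very different areas. This is the long-word problem described above.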

EXAMPLE Comparing the frequency of words used in Chapter 1 of this book.

Figure 6.21 Word Cloud of the Text from Chapter 1

HOW TO READ IT & WHAT TO LOOK FOR

The challenge with reading word clouds is to avoid being drawn to the length and/or area of a word: these are simply attributes of the word, not a meaningful representation of frequency. It is the size of the font that you need to focus on. Scan the display to spot the larger text showing the more frequently used words. Consider any words of specific interest to see if you can find them; if they are not significantly visible, that in itself could be revealing. While most word cloud generators will dismiss many irrelevant words, you might still need to mentally filter out certain dominantly sized words that carry little significance.

PRESENTATION TIPS

INTERACTIVITY: Interactive features that let users interrogate, filter and scrutinise the words in more depth, perhaps presenting examples of their usage in a passage, can be quite useful to enhance the value of a word cloud.

ANNOTATION: While the absolute values are generally of less interest than relative comparisons, a simple legend explaining how font size equates to frequency can help viewers get as much out of the display as possible.

COLOUR: Colours may be used as redundant encoding to further accentuate the larger frequencies, or categorically to create useful visual separation.

COMPOSITION: The arrangement of the words within a word cloud is typically based on a layout algorithm. Although not random, this will generally prioritise placing words to occupy the optimum collective space and preserve an overall shape (essentially with a central gravity) over any arrangement that might better enable direct comparison.

VARIATIONS & ALTERNATIVES

The alternative approach would be to use any other method in this categorical family of charts that would more usefully display the counts of text, such as a bar chart.

Charts Part-to-whole

Pie chart


ALSO KNOWN AS Pizza chart

REPRESENTATION DESCRIPTION

A pie chart shows how the quantities of different constituent categories make up a whole. It uses a circular display divided into sectors for each category, with the angle representing each percentage proportion. The resulting size of the sector (in area terms) is a spatial by-product of the angle applied to each part and so offers an additional means for judging the respective values. The role of a pie chart is more about being able to compare a part to the whole than comparing one part to another part. Pie charts therefore work best when there are only two or three parts. There are a few important rules for pie charts. Firstly, the sector values must total 100%; if the aggregate is greater or less than 100%, the chart will be corrupted. Secondly, the whole has to be meaningful: often people just add up independent percentages, but that is not what a pie chart is about. Finally, the category values must represent mutually exclusive quantities; nothing should be counted twice or overlap across different categories. Despite all these warnings, do not be afraid of the pie chart: just use it with discretion.
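The first rule above translates directly into the arithmetic behind a pie: check that the parts sum to 100%, then convert each percentage into a sweep of 3.6° per percentage point. This Python sketch (the function name, tolerance and sample parts are hypothetical) measures angles clockwise from the 12 o’clock position.

```python
def pie_angles(parts, tolerance=0.01):
    """Return (label, start_deg, end_deg) per sector, clockwise from 12 o'clock.

    Raises ValueError if the percentages do not form a whole (100%).
    """
    total = sum(parts.values())
    if abs(total - 100) > tolerance:
        raise ValueError(f"sector values sum to {total}%, not 100%")
    sectors, start = [], 0.0
    for label, pct in parts.items():
        sweep = pct / 100 * 360  # each percentage point is 3.6 degrees
        sectors.append((label, start, start + sweep))
        start += sweep
    return sectors

# Hypothetical parts that form a meaningful, non-overlapping whole
for label, start, end in pie_angles({"Part A": 50, "Part B": 30, "Part C": 20}):
    print(f"{label}: {start:.0f}° to {end:.0f}°")

try:
    pie_angles({"Yes": 60, "No": 55})  # independent percentages added together
except ValueError as err:
    print("Rejected:", err)
```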

EXAMPLE Comparing the proportion of eligible voters in the 2015 UK election who voted for the Conservative Party, for other parties and who did not vote.

Figure 6.22 Summary of Eligible Votes in the UK General Election 2015


HOW TO READ IT & WHAT TO LOOK FOR

Begin by establishing which sectors relate to which categories. This may involve referring to a colour key legend or to labels directly adjacent to the pie. Quickly scan the pie to identify the big, medium and small sectors. Notice if there is any significance behind the ordering of the parts. Unless there are value labels, you will next attempt to judge the individual sector angles. This usually involves mentally breaking the pie into 50% halves (180°) or 25% quarters (90°) and using those guides to perceptually measure the category values. Comparing parts against other parts with any degree of accuracy will only be possible once you have formed estimates of the individual sector sizes. If you are faced with the task of judging the size of many parts, it is quite understandable if you decide to give up quite soon.

PRESENTATION TIPS

ANNOTATION: Local labelling of category values can be useful, but too many labels can become cluttered, especially when attempting to label sectors with very small angles.

COLOUR: Colour is generally vital to create categorical separation and association of the different sectors, so aim to use differences in colour hue, not colour saturation, to maximise the visible difference.

COMPOSITION: Positioning the first slice at the vertical 12 o’clock position gives a useful baseline to help judge the first sector angle value. Ordering sectors by descending value or by ordinal characteristics helps with overall readability and the allocation of effort. Avoid gratuitous decoration (like 3D, gradient colours or exploding slices).

VARIATIONS & ALTERNATIVES

Sometimes a pie chart has a hole in the centre, in which case it is known as a ‘doughnut chart’, continuing the food-related theme. The function is exactly the same as a pie, but removing the centre, often to accommodate a label, removes the possibility of the reader judging the angles at the origin. One therefore has to derive the angles from the resulting arc lengths. If you want to display multiple parts (more than three), the bar chart will be a better option and, for many parts, the ‘treemap’ is best. Depending on the allocated space, a ‘stacked bar chart’ may provide an alternative to the pie. Unlike most chart types, the pie chart does not work well in the form of small multiples (unless there is only a single part being displayed). A ‘nested shape chart’, typically based on embedded square or circle areas, enables comparison across a series of one-part-to-whole relationships based on absolute numbers, rather than percentages, where the wholes may vary in size.

Charts Part-to-whole

Waffle chart

ALSO KNOWN AS Square pie, unit chart, 100% stacked shape chart

REPRESENTATION DESCRIPTION


A waffle chart shows how the quantities of different constituent categories make up a whole. It uses a square display, usually representing 100 point ‘cells’ through a 10 × 10 grid layout. Each constituent category proportion is displayed by colour-coding a proportional number of cells. The role of the waffle chart is to simplify the counting of proportions, in contrast to the angle judgements of the pie chart, though the display is limited to rounded integer values. Counting is easier still when the grid layout facilitates quick recognition of units of 10. As with the pie chart, the waffle chart works best when you are showing how a single part compares to the whole, and it perhaps offers greater visual impact when there are especially small percentages of a whole. Rather than just colouring in the grid cells, sometimes different symbols will be used to associate with different categories. For example, you might see figures or gender icons used to show the makeup of a given sample population.
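Because a waffle chart can only show whole cells, proportions have to be rounded so that the cells still add up to the full grid. The Python sketch below assumes the largest-remainder method for that rounding (one reasonable choice among several) and fills a 10 × 10 grid row by row from the same side; the function names and shares are hypothetical.

```python
import math

def waffle_cells(shares, cells=100):
    """Allocate whole cells to each category so the counts always total `cells`.

    Rounds with the largest-remainder method: floor every exact allocation,
    then hand the leftover cells to the largest fractional parts.
    """
    total = sum(shares.values())
    exact = {name: value / total * cells for name, value in shares.items()}
    counts = {name: math.floor(x) for name, x in exact.items()}
    leftover = cells - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

def waffle_grid(counts, width=10):
    """Lay the cells out row by row, always starting each row from the left."""
    sequence = [name for name, n in counts.items() for _ in range(n)]
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]

# Hypothetical shares: plain rounding would give 33 + 33 + 33 = 99 cells
counts = waffle_cells({"A": 33.4, "B": 33.3, "C": 33.3})
print(counts)
for row in waffle_grid(counts):
    print(" ".join(row))
```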

EXAMPLE Comparing the proportion of total browser usage for Internet Explorer and Chrome across key milestone moments.

Figure 6.23 The Changing Fortunes of Internet Explorer and Google Chrome

HOW TO READ IT & WHAT TO LOOK FOR

Begin by establishing how the different shapes or colours are associated with different categories. Assess the grid layout to understand the dimensions of the chart and the quantity of cell ‘units’ forming the display (e.g. is it a 10 × 10 grid?). Quickly scan the chart to identify the big, medium and small parts. Notice if there is any significance behind the ordering of the parts. Unless there are value labels, you will need to count or estimate the number of units representing each category value. Comparing parts against other parts will only be possible once you have established the individual part sizes. If several related waffle charts are shown, possibly for different categories or points in time, identify the related colours/shapes in each chart and establish the patterns of size between and across the various charts, looking for trends, declines and general differences.

PRESENTATION TIPS

ANNOTATION: Direct labelling can become very cluttered and hard to incorporate elegantly without the need for long arrows.

COLOUR: Borders around each square cell are useful to help establish the individual units, but do not make the borders so thick that they dominate attention.

COMPOSITION: Always start each row of values from the same side, for consistency and to make it easier for people to estimate the values. When you have several parts in the same waffle chart, try where possible to make the categorical sorting meaningful, maybe organising values in ascending/descending size order or based on a logical categorical order.

VARIATIONS & ALTERNATIVES

Sometimes the waffle chart approach is used to show stacks of absolute unit values, and indeed there are overlaps in concept between this variation of the waffle chart and potential applications of the pictogram. Aside from the pie chart, a ‘nested shape chart’ provides an alternative way of showing a part-to-whole relationship while also occupying a squarified layout.

Charts Part-to-whole

Stacked bar chart


ALSO KNOWN AS

REPRESENTATION DESCRIPTION

A stacked bar chart displays a part-to-whole breakdown of quantitative values for different major categories. Each categorical dimension or ‘part’ is represented by a separate bar, distinguished by colour and sized according to its proportion; these bars are then stacked to create the whole. Sometimes the whole is standardised to represent 100%; at other times the whole will represent absolute values. Stacked bar charts work best when the parts are based on ordinal dimensions, which enables ordering of the parts within the stack to help establish the overall shape of the data. If the parts represent nominal data, it is best to keep the number of constituent categories quite low, as estimating the size of individual stacked parts becomes quite hard when there are many.
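The geometry of a stacked bar is simply a running total: each part starts where the previous one ended. This Python sketch (the function name and counts are hypothetical) computes segment positions for both the 100% and the absolute variants, relying on dictionary insertion order to keep ordinal parts in sequence.

```python
def stack_segments(parts, as_percentage=True):
    """Return (label, start, end) for each part, bottom (zero baseline) first.

    With as_percentage=True the whole is standardised to 100; otherwise the
    segments keep their absolute values.
    """
    total = sum(parts.values())
    scale = 100 / total if as_percentage else 1
    segments, base = [], 0.0
    for label, value in parts.items():
        length = value * scale
        segments.append((label, base, base + length))
        base += length  # the next part starts where this one ends
    return segments

# Hypothetical counts for one bar; dict order keeps the ordinal levels in sequence
bar = {"Level 1": 40, "Level 2": 100, "Level 3": 60}
print(stack_segments(bar))                       # 100% stacked
print(stack_segments(bar, as_percentage=False))  # absolute stacked
```

Only ‘Level 1’ starts at zero; every other segment sits on the one beneath it, which is why comparing the upper parts across bars is harder.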

EXAMPLE Comparing the percentage of adults (16–65 year olds) achieving different proficiency levels in literacy across different countries.

Figure 6.24 Literacy Proficiency: Adult Levels by Country


HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know with which major categorical values each bar is associated and what the quantitative values are, determining whether it is a 100% stacked bar or an absolute stacked bar (in which case identify the min and the max). Establish the colour association to understand what categories the bars within each stack represent. Glance across the entire chart. If the categorical data is ordinal, and the sorting/colour of the stacks is intuitive, you should be able to derive meaning from the overall balance of colour patterns, especially where any annotated gridlines help to guide your value estimation. If the categorical data is nominal, seek to locate the dominant colours and the least noticeable ones. Comparing across different stacked bars is made harder by the lack of a common baseline for anything other than the bottom stack on the zero baseline (and, for 100% stacked bars, the final ones at the top), so a general sense of magnitude will be your focus. Study the constituent parts within each stack more closely to establish the high-level ranking of biggest > smallest. Estimate (or read, if labels are present) the absolute values of specific stacked parts of interest.

PRESENTATION TIPS

ANNOTATION: Direct value labelling can become very cluttered when there are many parts or stacks and you are comparing several different major categories; if precise values are your aim, a table might serve you better. Definitely include value axis labels with logical intervals, and it is very helpful to annotate key units, such as the 25%, 50% and 75% positions, through gridlines when working with a 100% stacked bar chart.

COLOUR: If you are representing categorical ordinal data, colour can be astutely deployed to give a sense of the general balance of values within the whole, but this will only work if the parts are sorted logically within the stack. For categorical nominal data, ensure the stacked parts have sufficiently different colours so that their distinct bar lengths can be efficiently observed.

COMPOSITION: Across the main categories, once again consider the optimum sorting option, maybe organising values in ascending/descending size order or based on a logical categorical order. Judging the size of the stacks accurately is harder for those that are not on the zero baseline, so consider which parts are most important to read easily and place those on the baseline.

VARIATIONS & ALTERNATIVES

The main alternative would be to use ‘multi-panel bar charts’, where separate bar charts each include just one ‘stack’/part and are then repeated for each subsequent constituent category. In the world of finance the ‘waterfall chart’ is a common approach based on a single stacked bar broken up into individual elements, almost like a step-by-step narrative of how the components of income look on one side and then how the components of expenditure look on the other, with the remaining space representing the surplus or deficit. Like their unstacked siblings, stacked bar charts can also be used to show how categorical composition has changed over time.

Charts Part-to-whole


Back-to-back bar chart

ALSO KNOWN AS Paired bar chart

REPRESENTATION DESCRIPTION

A back-to-back bar chart displays a part-to-whole breakdown of quantitative values for different major categories. As with any bar chart, the length of a bar represents a quantitative proportion or absolute value for each part and across all major categories. In contrast to the stacked bar chart, where the constituent bars are simply stacked to form a whole, in a back-to-back bar chart the constituent parts are based on diverging categorical dimensions with a ‘directional’ essence, such as yes/no, male/female or agree/disagree. The values for each dimension are therefore presented on opposite sides of a shared zero baseline to help reveal the shape of, and contrast the differences across, all major categories.
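The shared zero baseline is the defining feature, and a quick text rendering makes it visible. This Python sketch is purely illustrative: the function name, the one-‘#’-per-5-points unit and the data are hypothetical, and bars are truncated to whole units.

```python
def back_to_back_lines(rows, unit=5):
    """Render (category, left_value, right_value) rows as text bars either side of '|'.

    Each '#' stands for `unit` points; partial units are dropped for simplicity.
    """
    left_width = max(left for _, left, _ in rows) // unit
    lines = []
    for category, left, right in rows:
        left_bar = "#" * (left // unit)
        right_bar = "#" * (right // unit)
        # Right-align the left bars so both sides grow outwards from one shared baseline
        lines.append(f"{left_bar:>{left_width}}|{right_bar} {category}")
    return lines

# Hypothetical percentages answering 'disagree' (left) and 'agree' (right)
rows = [("Group A", 40, 55), ("Group B", 60, 35), ("Group C", 25, 70)]
for line in back_to_back_lines(rows):
    print(line)
```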

EXAMPLE Comparing the responses to a survey question asking for opinions about ‘the government collection of telephone and Internet data as part of anti-terrorism efforts’ across different demographic categories.

Figure 6.25 Political Polarization in the American Public


HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know with which major categorical values each bar is associated and what the range of the quantitative values is (min to max). Establish what categorical dimensions are represented by the respective sides of the display and any colour associations. Glance across the entire chart to locate the big, small and medium bars and perform global comparisons to establish the high-level ranking of biggest > smallest. Repeat this for each side of the display, noticing any patterns of dominance of larger values on either side. Identify any noticeable exceptions and/or outliers. Perform local comparisons for each category value to estimate the relative sizes (or read, if labels are present) of each bar.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines in particular can be helpful to increase the accuracy of the reading of the quantitative values.

COLOUR: The bars either side of the axis do not need to be coloured, but often are, to create further visual association.

COMPOSITION: The quantitative value axis should always start from the origin value of zero: a bar should be representative of the true, full quantitative value, nothing more, nothing less, otherwise the perception of bar sizes will be distorted when comparing relative sizes. There is no significant difference in perception between vertical or horizontal bars, though horizontal layouts tend to make it easier to accommodate and read the category labels. Where possible try to make the categorical sorting meaningful, maybe organising values in ascending/descending size order or based on a logical categorical order.

VARIATIONS & ALTERNATIVES

Back-to-back bar charts facilitate a general sense of the shape of diverging categorical dimensions. However, if you want to facilitate direct comparison, a ‘clustered bar chart’ showing adjacent bars helps to compare respective heights more precisely. For analysis that looks at the distribution of values across two dimensions, such as the size of populations for age across genders, a ‘back-to-back histogram’ (with male on one side, female on the other), also commonly known as a ‘violin plot’ or ‘population pyramid’, is a useful approach to see and compare the respective shapes. Some back-to-back applications do not show a part-to-whole relationship but simply compare quantities for two categorical values. Further variations may appear as ‘back-to-back area charts’ showing mutual change over time for two contrasting states.

Charts Part-to-whole

Treemap

ALSO KNOWN AS Heat map (wrongly)

REPRESENTATION DESCRIPTION

A treemap is an enclosure diagram providing a hierarchical display to show how the quantities of different constituent parts make up a whole. It uses a contained rectangular layout (often termed ‘squarified’) representing the 100% total, divided into proportionally sized rectangular tiles for each categorical part. Colour can be used to represent an additional quantitative measure, such as the amount of change over a time period. The absolute position and dimensions of each rectangle are organised by an underlying tiling algorithm to optimise the overall space usage and to cluster related categories into larger rectangular containers. Treemaps are most commonly used, and of most value, when there are many parts to the whole, but they are only valid if the constituent units are legitimately part of the same ‘whole’.
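Squarified tiling is fairly involved, but the core idea (each rectangle’s area is proportional to its value, nested inside its group’s rectangle) can be shown with the simpler ‘slice-and-dice’ layout. The following Python sketch uses slice-and-dice as a deliberately simpler stand-in for squarified tiling, so its rectangles can become long and thin; the function names and the hierarchy are hypothetical.

```python
def node_total(node):
    """A leaf is a number; a group's value is the sum of everything inside it."""
    return sum(node_total(child) for child in node.values()) if isinstance(node, dict) else node

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, (x, y, width, height)) for every leaf in a nested hierarchy.

    Alternates between splitting the container horizontally and vertically at
    each level, so every leaf's area is proportional to its value.
    """
    if out is None:
        out = []
    total = node_total(node)
    offset = 0.0  # fraction of the container already used
    for name, child in node.items():
        share = node_total(child) / total
        if depth % 2 == 0:
            rect = (x + offset * w, y, share * w, h)  # split across the width
        else:
            rect = (x, y + offset * h, w, share * h)  # split down the height
        offset += share
        if isinstance(child, dict):
            slice_and_dice(child, *rect, depth + 1, out)  # tile inside the group
        else:
            out.append((name, rect))
    return out

# Hypothetical sectors and stocks, with values as shares of a 100-unit whole
market = {"Tech": {"A": 30, "B": 20}, "Energy": {"C": 40, "D": 10}}
for name, (x, y, w, h) in slice_and_dice(market, 0, 0, 100, 100):
    print(f"{name}: x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f} area={w * h:.0f}")
```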

EXAMPLE Comparing the relative value of, and the daily performance of, stocks across the S&P 500 index grouped by sectors and industries.

Figure 6.26 FinViz: Standard and Poor’s 500 Index

HOW TO READ IT & WHAT TO LOOK FOR

Look at the high-level groupings to understand the different containing arrangements and establish what the colour association is. Glance across the entire chart to seek out the big, small and medium individual rectangle sizes and perform global comparisons to establish a general ranking of biggest > smallest values. Also identify the largest through to smallest container groups of rectangles. If the colour coding is based on quantitative variables, look out for the most eye-catching patterns at the extreme ends of the scale(s). If labels are provided (or offered through interactivity), browse around the display looking for categories and values of specific interest. As with any display based on the size of the area of a shape, precise reading of values is hard to achieve, so it is important to understand that treemaps can only aim to provide a single-view gist of the properties of the many components of the whole.

PRESENTATION TIPS

INTERACTIVITY: Typically, a treemap will be presented with interactive features to enable selection/mouseover events to reveal further annotated details and/or drill-down navigation.

ANNOTATION: Group/container labels are often allocated a cell of space, but these are not to be read as proportional values. Effective direct value labelling becomes difficult as the rectangles get smaller, so often only the most prominent values are annotated. Interactive features will generally offer visibility of the relevant labels where possible.

COLOUR: Colour can also be used to provide further categorical grouping distinction if not already assigned to represent a quantitative measure of change.

COMPOSITION: As the tiling algorithm is focused on optimising the dimensions and arrangement of the rectangular shapes, treemaps may not always be able to facilitate much internal sorting of high to low values. However, generally you will find the larger shapes appear in the top left of each container, working outwards towards the smaller constituent parts.

VARIATIONS & ALTERNATIVES

A variation of the treemap sees the rectangular layout replaced by a circular one and the rectangular tiles replaced by organic shapes. These are known as ‘Voronoi treemaps’, as the tiling algorithm is informed by a Voronoi tessellation. The ‘circle packing diagram’, a variation of the ‘bubble chart’, similarly shows many parts to a whole but uses a non-tessellating circular shape/layout. The ‘mosaic plot’ or ‘Marimekko chart’ is similar in appearance to a treemap but, in contrast to the treemap’s hierarchical display, presents a detailed breakdown of quantitative value distributions across several categorical dimensions, essentially formed by stacked bars of varying width.

Charts Part-to-whole


Venn diagram

ALSO KNOWN AS Set diagram, Euler diagram (wrongly)

REPRESENTATION DESCRIPTION

A Venn diagram shows collections of, and relationships between, multiple sets. It typically uses round or elliptical containers to represent all the different ‘membership’ permutations, including all independent and intersecting regions. The size of each contained area is (typically) not important: what matters is in which containing region a value resides, which may be represented through the mark of a text label or ‘point’.
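The ‘membership permutations’ a Venn diagram must display can be enumerated with Python sets. In this sketch (the function name and the sets are hypothetical) every element lands in exactly one region, and empty regions are kept, because a Venn diagram shows every combination even when nothing resides there. Elements outside every set would need a separate universe set, which this sketch leaves out.

```python
from itertools import combinations

def venn_regions(sets):
    """Map every combination of set names to the elements found in exactly those sets."""
    names = list(sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            regions[combo] = inside - outside  # may be empty, but is still a region
    return regions

# Hypothetical memberships for three sets
sets = {"X": {"a", "b", "c"}, "Y": {"b", "c", "d"}, "Z": {"c", "e"}}
for combo, members in venn_regions(sets).items():
    print(" & ".join(combo), sorted(members))
```

Keeping only the non-empty regions (`{k: v for k, v in venn_regions(sets).items() if v}`) gives the smaller set of regions an Euler diagram could get away with drawing.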

EXAMPLE Comparing sets of permutations for legalities around marijuana usage and same-sex marriage across states of the USA.

Figure 6.27 This Venn Diagram Shows Where You Can Both Smoke Weed and Get a Same-Sex Marriage


HOW TO READ IT & WHAT TO LOOK FOR

To read a Venn diagram, first establish what membership each of the different containers represents. Assess the membership of the intersections (first ‘all’, then ‘partial’ intersections when more than two sets are involved), then work outwards towards the independent container regions, where values are part of one set but not part of others. Occasionally there will be a further grouping outside of the containers that represents values with no membership of any set at all.

PRESENTATION TIPS

ANNOTATION: Unless you are using point markers to represent membership values, clear labels are vital to indicate how many or which elements hold membership with each possible set combination.

COLOUR: Colour is often used to create more immediate distinction between the intersections and independent parts or members of each container.

COMPOSITION: As the size and shape of the containers are of no significance, there is more flexibility to manipulate the display to fit the number of sets into the real estate you have and to get across the set memberships you are attempting to show. The complexity of creating containers to accommodate all combinations of intersection and independence increases as the number of sets increases, especially to preserve all possible combinations of intersections between, and independencies from, all sets. As the number of sets increases, the symmetry of shape reduces and the circular containers are generally replaced with ellipses. While it is theoretically possible to create diagrams with more than four or five sets, the ability of readers to make sense of the displays diminishes, so they commonly involve only two or three different sets.

VARIATIONS & ALTERNATIVES

A common variation or alternative to the Venn (but one often mistakenly called a Venn) is the ‘Euler diagram’. The difference is that an Euler diagram does not need to present all possible intersections with, and independencies from, all sets. A different approach to visualising sets (especially larger numbers of them) can be achieved using the ‘UpSet’ technique.

Charts Hierarchies

Dendrogram

ALSO KNOWN AS Node–link diagram, layout tree, cluster tree, tree hierarchy

REPRESENTATION DESCRIPTION

A dendrogram is a node–link diagram that displays the hierarchical relationship across multiple tiers of categorical dimensions. It displays a hierarchy based on multi-generational ‘parent-and-child’ relationships. Starting from a singular origin root node (or ‘parent’), each subsequent set of constituent ‘child’ nodes, a tier below and represented by points, is connected by lines (curved or straight) to indicate the existence of a relationship. Each constituent node may have further sub-constituencies represented in the same way, continuing down through to the lowest tier of detail. Each ‘generational’ tier is presented at the same relative distance from the origin. The layout can be based on either a linear tree structure (typically left to right) or a radial tree (outwards from the centre).
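Underneath any dendrogram layout is a list of parent-and-child links plus a tier for every node, with each tier drawn at the same distance from the origin. This Python sketch derives both from a nested dictionary; the names, the spacing constant and the hierarchy are hypothetical, and real layout algorithms also compute the spacing between siblings, which is omitted here.

```python
def links_and_tiers(tree, parent=None, depth=0, links=None, tiers=None):
    """Walk a nested {name: children} dict, collecting parent->child links and each node's tier."""
    if links is None:
        links, tiers = [], {}
    for name, children in tree.items():
        tiers[name] = depth
        if parent is not None:
            links.append((parent, name))
        links_and_tiers(children, name, depth + 1, links, tiers)
    return links, tiers

# Hypothetical hierarchy: root > continents > brands
hierarchy = {"Group": {"Europe": {"Brand 1": {}, "Brand 2": {}}, "Asia": {"Brand 3": {}}}}
links, tiers = links_and_tiers(hierarchy)

# In a left-to-right linear layout, every node in a tier shares the same x position
TIER_SPACING = 100
for name, tier in tiers.items():
    print(f"{'  ' * tier}{name} (x = {tier * TIER_SPACING})")
print(links)
```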

EXAMPLE Showing a breakdown of the 200+ beer brands belonging to SAB InBev across different countries grouped by continent.

Figure 6.28 The 200+ Beer Brands of SAB InBev

HOW TO READ IT & WHAT TO LOOK FOR


Reading a dendrogram will generally be a highly individual experience based on your familiarity with the subject and your interest in exploring certain hierarchical pathways. The main focus of attention will likely be to find the main clusters from which most constituent parts branch out and to contrast these with the thinner, lighter paths comprising fewer parts. Work left to right (linear) or from the centre outwards (radial) through the different routes that stoke your curiosity.

PRESENTATION TIPS

ANNOTATION: With labelling required for each node, and depending on the number of tiers and nodes, the size of the text will need to be carefully considered to ensure readability and minimise the effect of clutter.

COLOUR: Colour would be an optional choice for accentuating certain nodes or applying some further visual categorisation.

COMPOSITION: There are several different layout options for displaying tree hierarchies like the dendrogram. The common choice is a cluster layout based on the ‘Reingold–Tilford’ tree algorithm, which offers a tidying and optimisation treatment for the efficiency of the arrangement of the nodes and connections. The sub-constituencies under each node could be sequenced in some more meaningful way than just alphabetically, though the cataloguing nature of A–Z may suit your purpose. The choice of a linear or radial tree structure will be informed largely by the space you have to work in, as well as by whether the content in your data is cyclical in nature. The main issue is likely to be one of legibility if and when you have numerous layers of divisions and many constituent parts to show in a single view.

VARIATIONS & ALTERNATIVES

More advanced applications of dendrograms are used to present hierarchical clustering (in fields such as computational biology) and apply more quantitative meaning to the length of the links and the positioning of the nodes. The ‘tree hierarchy diagram’ offers a similar tree structure but introduces quantitative attributes to the nodes using area marks, such as circles, sized according to a quantitative value. An alternative approach to the dendrogram could involve a ‘linear bracket’. This might show hierarchical structures for data relating to sporting competitions with a knock-out format. The outer nodes would be the starting point, representing all the participating competitors/teams. Each subsequent tier would represent those participants who progressed to the next round, continuing through to the finalists and eventual victors.


Charts Hierarchies

Sunburst

ALSO KNOWN AS Adjacency diagram, icicle chart, multi-level pie chart

EXAMPLE Showing a breakdown of the types of companies responsible for extracting different volumes of carbon-based fuels through various activities.

REPRESENTATION DESCRIPTION

A sunburst chart is an adjacency diagram that displays the hierarchical and part-to-whole relationships across multiple tiers of categorical dimensions. In contrast to the dendrogram, the sunburst uses layers of concentric rings, one layer for each generational tier. Each ring layer is divided into parts based on the constituent categorical dimensions at that tier. Each part is represented by a different circular arc section that is sized (in length; width is constant) according to its relative proportion. Starting from the centre ‘parent’ tier, the outward adjacency of the constituent parts of each tier represents the ‘parent-and-child’ hierarchical composition.
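The adjacency principle (each child’s arc spans part of its parent’s arc, in proportion to its size) can be sketched recursively in Python. The names, the use of degrees and the sample hierarchy are hypothetical; ring 1 here is the first ring around the centre.

```python
def size(node):
    """A leaf is a number; an inner node's size is the sum of its descendants."""
    return sum(size(child) for child in node.values()) if isinstance(node, dict) else node

def sunburst_arcs(tree, start=0.0, end=360.0, ring=1, out=None):
    """Return (name, ring, start_deg, end_deg) for every node below the (implicit) root.

    Each child's arc is a share of its parent's arc proportional to its size,
    so children always sit directly outside their parent.
    """
    if out is None:
        out = []
    total = size(tree)
    angle = start
    for name, child in tree.items():
        sweep = (end - start) * size(child) / total
        out.append((name, ring, angle, angle + sweep))
        if isinstance(child, dict):
            sunburst_arcs(child, angle, angle + sweep, ring + 1, out)
        angle += sweep
    return out

# Hypothetical two-tier hierarchy of volumes
volumes = {"Type A": {"Coal": 30, "Oil": 20}, "Type B": {"Gas": 50}}
for name, ring, a0, a1 in sunburst_arcs(volumes):
    print(f"ring {ring}: {name:7} {a0:5.1f}° to {a1:5.1f}°")
```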

Figure 6.29 Which Fossil Fuel Companies are Most Responsible for Climate Change?


HOW TO READ IT & WHAT TO LOOK FOR

Reading a sunburst chart will be a highly individual experience based on your familiarity with the subject and your interest in exploring certain hierarchical pathways. The main focus of attention will likely be to find the largest arc lengths, representing the largest single constituent parts, and those layers or tiers with the most constituent parts. Work from the centre outwards through the different routes that stoke your curiosity. Depending on how colour is deployed, it may also help you identify certain additional categorical patterns.

PRESENTATION TIPS

INTERACTIVITY: Interactive mouseover/selection events are often the only way to reveal the annotations here.

ANNOTATION: Labels can be quite difficult to fit into the narrow spaces afforded by small-proportion ‘parts’. If interactivity is not an option, you may decide to label only those parts that can accommodate the text.

COLOUR: Colours are often used to achieve further categorical distinction.

COMPOSITION: Sometimes the parent–child (and other generational) relationships could legitimately be reversed, so decisions need to be made about the best hierarchy sequencing to suit the curiosities of the audience. The sub-constituencies under each node could also be arranged in a logical, meaningful order rather than just alphabetically, unless the cataloguing nature of A–Z ordering suits your purpose.

VARIATIONS & ALTERNATIVES

Where the sunburst chart uses a radial layout, the ‘icicle chart’ uses a vertical, linear layout, starting from the top and moving downwards. The choice of a linear or radial tree structure will be informed largely by the space you have to work in, as well as by whether the content in your data is legitimately cyclical in nature. A variation on the sunburst chart would be the ‘ring bracket’. This might show a reverse journey for hierarchical data based on something like sporting competitions with knock-out formats. The outer concentric partitions would represent the participating competitors/teams at the start of the process. The lengths of these arcs would be equally distributed across all constituent parts, with each subsequent tier representing the ‘participants’ who progress to the next ‘round’, continuing through to the finalists and eventual victors in the centre.

Charts Correlations

Scatter plot chart

ALSO KNOWN AS Scatter graph


REPRESENTATION DESCRIPTION

A scatter plot displays the relationship between two quantitative measures for different categories. Scatter plots are used to explore visually the potential existence, extent or absence of a significant relationship between the plotted variables. The display is formed by points (usually a dot or circle), representing each category and plotted positionally along quantitative x- and y-axes. Sometimes colour is used to distinguish categorical dimensions across all the points. Scatter plots do not work too well if one or both of the quantitative measures has limited variation in value, as this especially causes problems of ‘occlusion’, whereby multiple instances of similar values are plotted on top of each other and essentially hidden from the reader.

EXAMPLE Exploring the relationship between life expectancy and the percentage of healthy years across all countries.

Figure 6.30 How Long Will We Live — And How Well?

HOW TO READ IT & WHAT TO LOOK FOR

Learn what each quantitative axis relates to and make a note of the range of values in each case (min to max). Look at what category or observation each plotted value on the chart refers to and look up any colour associations being used for categorical distinction. Scan the chart looking for the existence of any diagonal trends that might suggest a linear correlation between the variables, or note the complete absence of any pattern, to mean no correlation. Annotations will often assist in determining the significance of any patterns like this. Identify any clusters of points and also look at the gaps, which can be just as revealing. Some of the most interesting observations come from individual outliers standing out separately from others. Look out for any patterns formed by points with similar categorical colour. One approach to reading the ‘meaning’ of the plotted positions involves trying to break down the chart area into a 2 × 2 grid, translating what marks positioned in those general areas might mean – which corner is ‘good’ or ‘bad’ to be located in? Remember that ruling out significant relationships can be just as useful as ruling them in.
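When scanning for diagonal trends it can help to have a number alongside the visual impression. The sketch below computes Pearson's correlation coefficient r in plain Python; it is one common measure rather than the only one, it captures only *linear* association, and the function name `pearson_r` is our own. Python 3.10 and later also provide `statistics.correlation`, which returns the same coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns a value from -1 (perfect negative linear relationship)
    through 0 (no linear relationship) to +1 (perfect positive)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

A value near 0 rules out only a *linear* relationship; a curved pattern can still be plainly visible in the plot, which is why the chart itself remains the primary reading tool.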

PRESENTATION TIPS

ANNOTATION: Gridlines can be useful to help make the value estimates clearer, and reference lines (such as a trend line of best fit) might aid interpretation. It is usually hard to make direct labelling of all values work well. Firstly, it can be tricky making it clear which value relates to which point, especially when several points may be clustered together. Secondly, it creates a lot of visual clutter. Labelling choices should be based on the values that are of most editorial interest, unless interactive features enable annotations to be revealed through selection or mouseover events. Where several points share exactly the same position, you might consider putting a number inside the marker to indicate how many points it represents.
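Finding the positions that need such a count is a simple grouping task. A minimal sketch, assuming points arrive as `(x, y)` tuples; the optional rounding parameter (our own addition) treats points whose rounded coordinates match as sharing a position.

```python
from collections import Counter

def overplot_counts(points, ndigits=None):
    """Return {(x, y): count} for every position shared by two or more points.

    If ndigits is given, coordinates are rounded first so that near-identical
    values are treated as the same plotted position."""
    if ndigits is not None:
        points = [(round(x, ndigits), round(y, ndigits)) for x, y in points]
    counts = Counter(points)
    return {position: n for position, n in counts.items() if n > 1}

# Hypothetical data: three points, two of which share a position.
print(overplot_counts([(1, 2), (1, 2), (3, 4)]))  # {(1, 2): 2}
```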

COLOUR: If colours are being used to distinguish the different categories, ensure these are as visibly different as possible. On the occasion where multiple values may be plotted close to or on top of each other, you might need to use semi-transparency to enable overlapping points to build up a recognisably darker colour compared to other points, indicating an underlying stack of values at the same location on the chart.
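How quickly overlapping semi-transparent points darken can be worked out directly. Under standard ‘over’ alpha compositing, n identical marks of opacity a stacked on one another have a combined opacity of 1 - (1 - a)^n. The sketch tabulates this; the opacity of 0.3 is an arbitrary illustrative choice.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n identical marks, each with opacity alpha (0-1),
    drawn on top of one another using standard 'over' alpha compositing."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - alpha) ** n

for n in (1, 2, 3, 5, 10):
    print(n, round(stacked_opacity(0.3, n), 3))
```

The increments shrink quickly (roughly 0.97 by ten overlapping points at 0.3 opacity), so beyond a handful of overlaps further stacking becomes hard to tell apart; a lower opacity keeps larger stacks distinguishable for longer, at the cost of fainter single points.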

COMPOSITION: As the encoding of the plotted point values is based on position along an axis, it is not necessary to start the axes from a zero baseline, so just make the scale ranges as representative as possible of the range of values being plotted. Ideally a scatter plot will have a 1:1 aspect ratio (equally as tall as it is wide), creating a squared area to help patterns surface more evidently. If one quantitative variable (e.g. weight) is likely to be affected by the other variable (e.g. height), it is general practice to place the former (the dependent variable) on the y-axis and the latter on the x-axis. If you have to use a logarithmic quantitative scale on either or both axes, you need to make this clear to readers so they avoid drawing incorrect conclusions from the resulting patterns (for example, a straight-line pattern on logarithmic axes does not indicate a linear relationship between the raw values).

VARIATIONS & ALTERNATIVES

A ‘ternary plot’ is a variation of the scatter plot through the inclusion of a third quantitative variable axis. The ‘bubble plot’ also incorporates a third quantitative variable, this time through encoding the size of a geometric shape (replacing the point marker). A ‘scatter plot matrix’ involves a single view of multiple scatter plots presenting different combinations of plotted quantitative variables, used to explore possible relationships among larger multivariate datasets. A ‘connected scatter plot’ compares the shifting state of two quantitative measures over time.

Charts Correlations

Bubble plot

ALSO KNOWN AS Bubble chart

REPRESENTATION DESCRIPTION

A bubble plot displays the relationship between three quantitative measures for different categories. Bubble plots are used to explore visually the potential existence, extent or absence of a significant relationship between the plotted variables. In contrast to the scatter plot, the bubble plot uses proportionally sized circular areas for each category, plotted across two quantitative axes, with the size representing a third quantitative measure. Sometimes colour is used to distinguish categorical dimensions across all the shapes.

EXAMPLE Exploring the relationship between murder and burglary rates (per 100,000 population) and population size across states of the USA.


Figure 6.31 Crime Rates by State

HOW TO READ IT & WHAT TO LOOK FOR

Learn what each quantitative axis relates to and make a note of the range of values in each case (min to max). Look at what category or observation each plotted value on the chart refers to. Establish the quantitative size associations for the bubble areas and look up any colour associations being used for categorical distinction. Scan the chart looking for the existence of any diagonal trends that might suggest a linear correlation between the variables, or note the complete absence of any pattern, to mean no correlation. Annotations will often assist in determining the significance of any patterns like this. Identify any clusters of points and also look at the gaps, which can be just as revealing. Some of the most interesting observations come from individual outliers standing out separately from others. Look out for any patterns formed by points with similar categorical colour. What can you learn about the distribution of small, medium or large circles: are they clustered together in similar regions of the chart or quite randomly scattered? One approach to reading the ‘meaning’ of the plotted positions involves trying to break down the chart area into a 2 × 2 grid, translating what marks positioned in those general areas might mean – which corner is ‘good’ or ‘bad’ to be located in? Remember that ruling out significant relationships can be just as useful as ruling them in.


Estimating and comparing the size of areas is not as easy as judging bar length or dot position. This means that the use of this chart type will primarily be about facilitating a gist – a general sense of the hierarchy of the largest and smallest values.

PRESENTATION TIPS

ANNOTATION: Gridlines can be useful to help make the value estimates clearer, and reference lines (such as a trend line of best fit) might aid interpretation. It is usually hard to make direct labelling of all values work well. Firstly, it can be tricky making it clear which value relates to which point, especially when several points may be clustered together. Secondly, it creates a lot of visual clutter. Labelling choices should be based on the values that are of most editorial interest, unless interactive features enable annotations to be revealed through selection or mouseover events.

COLOUR: If colours are being used to distinguish the different categories, ensure these are as visibly different as possible. When a circle has a large value, its size will often cause it to overlap spatially with other values. The use of outline borders and semi-transparent colours helps with the task of avoiding occlusion (visually hiding values behind others).

COMPOSITION: As the bubbles are positioned according to their values along each axis, it is not necessary to start the axes from a zero baseline – just make the scale ranges as representative as possible of the range of values being plotted. Make sensible decisions about how large to make the maximum bubble size; this will usually require trial-and-error experimentation to find the right balance. Ideally a bubble plot will have a 1:1 aspect ratio (equally as tall as it is wide), creating a squared area to help patterns surface more evidently. If one quantitative variable (e.g. weight) is likely to be affected by the other variable (e.g. height), it is general practice to place the former on the y-axis and the latter on the x-axis. Geometric accuracy of the circle size calculations is paramount, since mistakes are often made here: it is the area you are modifying, not the diameter/radius. If you wish to make your bubbles appear as 3D spheres, you are essentially no longer representing quantitative values through the size of a geometric area mark; rather, the mark will be a ‘form’ and so the size calculation will be based on volume, not area.
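The area rule can be made concrete. A circle's area is πr², so for the area to be proportional to a value the radius must scale with the square root of that value; for the 3D ‘form’ case, a sphere's volume is (4/3)πr³, so the radius scales with the cube root. A minimal sketch: `max_radius` stands for the trial-and-error maximum bubble size mentioned above, and the function names are illustrative.

```python
from math import sqrt

def bubble_radius(value, max_value, max_radius):
    """Radius for a circle whose AREA is proportional to value.

    max_radius is the radius given to max_value (the largest bubble)."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * sqrt(value / max_value)

def sphere_radius(value, max_value, max_radius):
    """Radius for a sphere whose VOLUME is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * (value / max_value) ** (1 / 3)

# A value a quarter of the maximum gets half the radius...
r = bubble_radius(25, 100, max_radius=20)
print(r)                # 10.0
# ...and therefore a quarter of the area (pi cancels in the ratio).
print((r / 20) ** 2)    # 0.25
# Scaling the radius linearly instead would give radius 5: only 1/16 of the area.
```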

VARIATIONS & ALTERNATIVES

If the third quantitative variable is removed, the display would just become a ‘scatter plot’. Variations on the bubble plot might see the use of different geometric areas as the markers, maybe introducing extra meaning from the underlying data through the shape, size and dimensions used.

Charts Correlations

Parallel coordinates

ALSO KNOWN AS Parallel sets

REPRESENTATION DESCRIPTION

Parallel coordinates display multiple quantitative measures for different categories in a single display. They are used to explore visually the relationships and characteristics of multi-dimensional, multivariate data. Parallel coordinates are based on a series of parallel axes representing different quantitative measures with independent axis scales. The quantitative values for each measure are plotted and then connected to form a single line. Each connected line represents a different categorical record. Colour may be used to differentiate further categorical dimensions. As more data is added, the collective ‘shape’ of the data emerges and helps to inform the possibility of relationships existing among the different measures. Parallel coordinates look quite overwhelming, but remember that they are almost always used to assist in exploratory work with large and varied datasets, rather than for explanatory presentations of data. Generally, the greater the number of measures, the more difficult the task of making sense of the underlying patterns will be, so be discerning in your choice of which variables to include. This method does not work for showing categorical (nominal) measures, nor does it really offer value with the inclusion of low-range, discrete quantitative variables (e.g. number of legs per human). Patterns will mean very little when intersecting with such axes (these variables may be better deployed as a filtering parameter or a coloured categorical separator).
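Because each axis has an independent scale, a common first step is to rescale every measure to a shared 0 to 1 range, so each record becomes a polyline across equally spaced axes. A minimal sketch, assuming records are dictionaries keyed by measure name; min-max scaling is one common choice (others, such as z-scores, are also used), and the nutrient names and values are hypothetical.

```python
def normalise_measures(records, measures):
    """Rescale each measure to 0-1 independently, so every parallel axis
    runs from its own minimum (0) to its own maximum (1)."""
    ranges = {}
    for m in measures:
        values = [record[m] for record in records]
        ranges[m] = (min(values), max(values))
    scaled = []
    for record in records:
        row = {}
        for m in measures:
            lo, hi = ranges[m]
            # A constant measure has no spread; place it mid-axis.
            row[m] = 0.5 if hi == lo else (record[m] - lo) / (hi - lo)
        scaled.append(row)
    return scaled

def polyline(row, measures):
    """Vertices of one record's line: axis i sits at x = i."""
    return [(i, row[m]) for i, m in enumerate(measures)]

# Hypothetical food records with two nutrient measures.
foods = [
    {"protein": 10, "fat": 5},
    {"protein": 20, "fat": 1},
    {"protein": 15, "fat": 3},
]
measures = ["protein", "fat"]
for row in normalise_measures(foods, measures):
    print(polyline(row, measures))
# [(0, 0.0), (1, 1.0)]
# [(0, 1.0), (1, 0.0)]
# [(0, 0.5), (1, 0.5)]
```

In this toy data the first two lines cross between the axes and all three meet at the midpoint, the criss-crossing pattern that signals a negative relationship between neighbouring measures.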


EXAMPLE Exploring the relationship between nutrient contents for 14 different attributes across 1,153 different items of food.

Figure 6.32 Nutrient Contents — Parallel Coordinates

HOW TO READ IT & WHAT TO LOOK FOR

Look around the chart and acquaint yourself with what each quantitative measure axis represents. Also note what kind of sequencing of measures has been used: are neighbouring measures significantly paired? Note the range of values along each independent axis so you understand what positions along the scales represent and can determine what higher and lower positions mean. If colour has been used to group related records, identify what these groups represent. Scan the overall mass of lines to identify any major patterns. Study the patterns in the space between each pair of adjacent axes. This is where you will really see the potential presence or absence, and nature, of relationships between measures. The main patterns to identify involve the presence of parallel lines (showing consistent relationships), lines converging in similar directions (some correlation) and complete criss-crossing (negative relationship). Look out for any associations in the patterns across colour groupings. Remember that ruling out significant relationships can be just as useful as ruling them in.

PRESENTATION TIPS

INTERACTIVITY: Parallel coordinates are particularly useful when offered with interactive features, such as filtering techniques, enabling the user to interrogate and manipulate the display to facilitate visual exploration. Additionally, the option to rearrange the sequence of the measures can be especially useful.


ANNOTATION: The inclusion of visible annotated features like axis lines, tick marks, gridlines and value labels can naturally aid the readability of the data, but be aware of the impact of clutter.

COLOUR: When you are plotting large quantities of records, there will inevitably be over-plotting, and this might disguise the real weight of values, so variation in the darkness of colour can be used to establish the density of observations.

COMPOSITION: The ordering of the quantitative variables needs careful thought, as the connections between adjacent axes offer the main way of seeing the local relationships: the patterns will change with every different ordering permutation. Remember that the line directions connecting records are often inconsequential in their meaning unless neighbouring measures have a common scale and similar meaning: the connections are more about establishing commonality of pattern across records than about anything significant in the absolute slope direction/length.

VARIATIONS & ALTERNATIVES

The ‘radar chart’ has similarities with parallel coordinates in that it includes several independent quantitative measures in the same chart, but on a radial layout and usually only showing data for one record in the same display. A variation on parallel coordinates would be the ‘Sankey diagram’, which displays categorical composition and quantitative flows between different categorical dimensions or ‘stages’.

Charts Correlations

Heat map

ALSO KNOWN AS Matrix chart, mosaic plot

REPRESENTATION DESCRIPTION


A heat map displays quantitative values at the intersection between two categorical dimensions. The chart comprises two categorical axes, with each possible value presented across the row and column headers of a table layout. Each corresponding cell is then colour-coded to represent a quantitative value for each combination of category pairing. It is not easy for the eye to determine the exact quantitative values represented by the colours, even if there is a colour scale provided, so heat maps mainly facilitate a gist of the order of magnitude.

EXAMPLE Exploring the connections between different Avengers characters appearing in the same Marvel comic book titles between 1963 and 2015.

Figure 6.33 How the ‘Avengers’ Line-up Has Changed Over the Years

HOW TO READ IT & WHAT TO LOOK FOR

Learn what each categorical dimension relates to and make a note of the range of values in each case, paying attention to the significance of any ordering. Establish the quantitative value associations for the colour scales, usually found via a legend. Glance across the entire chart to locate the big, small and medium shades (generally darker = larger) and perform global comparisons to establish the high-level ranking of biggest > smallest. Scan across each row and/or column to see if there are specific patterns associated with either set of categories. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring cells to identify larger than and smaller than relationships and estimate the relative proportions. Estimate (or read, if labels are present) the absolute values of specific cells of interest.

PRESENTATION TIPS

ANNOTATION: Direct value labelling is possible; otherwise a clear legend to indicate colour associations will suffice.

COLOUR: Sometimes multiple different colour hues may be used to subdivide the quantitative values into further distinct categorical groups. Decisions about how many colour-scale levels to use, and what range of values each level relates to, will affect the patterns that emerge. There is no single right answer – you will arrive at it largely through trial and error/experimentation – but it is important to consider, especially when you have a diverse distribution of values.
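Two common ways of setting those levels are equal-interval classes (the value range split into equal-width bands) and quantile classes (each band holding roughly the same number of cells). The sketch below, using hypothetical and deliberately skewed values with function names of our own, shows how differently the two can carve up the same data.

```python
from bisect import bisect_right

def equal_interval_breaks(values, k):
    """Break points for k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]

def quantile_breaks(values, k):
    """Break points for k classes holding (roughly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]

def classify(value, breaks):
    """Colour-scale level (0 .. len(breaks)) that a value falls into."""
    return bisect_right(breaks, value)

# Hypothetical, heavily skewed cell values.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
for breaks in (equal_interval_breaks(values, 3), quantile_breaks(values, 3)):
    print(breaks, [classify(v, breaks) for v in values])
# [34.0, 67.0] [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
# [4, 7] [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Neither result is ‘right’: equal intervals preserve the prominence of the outlier but flatten every other cell into one shade, while quantiles reveal variation among the small values but hide how extreme the largest one is.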

COMPOSITION: Logical sorting (and maybe even sub-grouping) of the categorical values along each axis will aid readability and may help surface key relationships.

VARIATIONS & ALTERNATIVES

A ‘radial heat map’ offers a structural variation whereby the table may be portrayed using a circular layout. As with any radial display, this is only really of value if the cyclical ordering means something for the subject matter. A variation would see colour shading replaced by a measure of pattern density, using a scale of ‘packedness’ to indicate increasing quantitative values. An alternative approach would be the ‘matrix chart’, using the size of a shape to indicate the quantitative value, or a range of point markers to display categorical characteristics.

Charts Connections

Matrix chart


ALSO KNOWN AS Table chart

REPRESENTATION DESCRIPTION

A matrix chart displays quantitative values at the intersection between two categorical dimensions. The chart comprises two categorical axes, with each possible value presented across the row and column headers of a table layout. Each corresponding cell is then marked by a geometric shape with its area sized to represent a quantitative value, and colour is often used to distinguish a further categorical dimension visually. While they are most commonly seen using circles, you can use other proportionally sized shapes.

EXAMPLE Exploring the perceived difficulty of fixtures across the season for teams in the Premier League 2013–14.

Figure 6.34 Interactive Fixture Molecules

HOW TO READ IT & WHAT TO LOOK FOR

Learn what each categorical dimension relates to and make a note of the range of values in each case, paying attention to the significance of any ordering. Establish the quantitative size associations for the area marks and look up any colour associations being used, both usually found via a legend. Glance across the entire chart to locate the big, small and medium areas and perform global comparisons to establish the high-level ranking of biggest > smallest. Scan across each row and/or column to see if there are specific patterns associated with either set of categories. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring circular areas to identify larger than and smaller than relationships and estimate the relative proportions. Estimate (or read, if labels are present) the absolute values of specific geometric areas of interest.

PRESENTATION TIPS

ANNOTATION: Direct value labelling is possible; otherwise be sure to include a clear size legend. Normally this will be more than sufficient, as the reader may simply be looking to get a gist of the order of magnitude.

COLOUR: If colours are being used to distinguish the differentcategories, ensure these are as visibly different as possible.

COMPOSITION: If there are large outlier values, there may be occasions when the size of a few circles outgrows the cell they occupy. You might editorially decide to allow this, as the striking shape may create a certain impact; otherwise you will need to limit the largest quantitative value to be represented by the maximum space available within the table’s cell layout. Logical sorting (and maybe even sub-grouping) of the categorical values along each axis will aid readability and may help surface key relationships. The geometric accuracy of the circle size calculations is paramount. Mistakes are often made with circle size calculations: it is the area you are modifying, not the diameter/radius.

VARIATIONS & ALTERNATIVES

A variation may be to remove the quantitative attribute of the area marker, replacing it with a point marker representing a categorical status: either simply a yes/no observation through the presence/absence of a point, or a total through the quantity of points. An application of this might be in calendar form, whereby a marker in a date cell indicates an instance of something. It could also employ a broader range of different categorical options; in practice any kind of marker (symbol, colour, photograph) could be used to show a characteristic of the relationship at each coordinate cell. An alternative might be the ‘heat map’, which colour-codes the respective cells to indicate a relationship based on a quantitative measure.
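In the calendar form, the matrix's two categorical dimensions become week and weekday. A minimal sketch, assuming a one-year grid with weeks starting on Monday (a design choice; other conventions exist) and hypothetical event dates: it maps each date to a cell and counts the markers per cell, covering both the presence/absence reading and the quantity-of-points reading.

```python
from collections import Counter
from datetime import date

def calendar_cell(d):
    """(column, row) of a date in a one-year calendar matrix:
    one column per week (weeks start on Monday) and one row per
    weekday (Monday = 0 ... Sunday = 6)."""
    jan1 = date(d.year, 1, 1)
    column = ((d - jan1).days + jan1.weekday()) // 7
    return column, d.weekday()

def calendar_markers(event_dates):
    """Number of point markers to draw in each occupied cell; cells absent
    from the result are left empty (the 'no' observations)."""
    return Counter(calendar_cell(d) for d in event_dates)

# Hypothetical events: two on 1 January 2024 (a Monday), one a week later.
events = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 8)]
print(calendar_markers(events))  # Counter({(0, 0): 2, (1, 0): 1})
```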

Charts Connections


Node–link diagram

ALSO KNOWN AS Network diagram, graph, hairballs

REPRESENTATION DESCRIPTION

Node–link diagrams display relationships through the connections between categorical ‘entities’. The entry-level version of this type of diagram displays entities as nodes (represented by point marks and usually including a label) with links or edges (represented by lines) depicting the existence of connections. The connecting lines will often display an attribute of direction to indicate the influencer relationship. In some versions a quantitative weighting is applied to show relationship strength, perhaps through increased line width. Replacing point marks with a geometric shape and using attributes of size and colour is a further variation. Often the complexity seen in these displays is merely a reflection of the underlying complexity of the subject and/or system upon which the data is based, so oversimplifying can compromise the essence of such content.

EXAMPLE Exploring the connections of voting patterns for Democrats and Republicans across all members of the US House of Representatives from 1949 to 2012.

Figure 6.35 The Rise of Partisanship and Super-cooperators in the U.S.


HOW TO READ IT & WHAT TO LOOK FOR

The first thing to consider is what entity each node (point or circular area) represents and what the links mean in relationship terms. There may be several other properties to acquaint yourself with, including attributes like the size of the node areas, the categorical nature of colouring, and the width and direction of the connections. Across the graph you will mainly be seeking out the clusters that show the nodes with the most relationships (representative of influencers or hubs) and those without (including outliers). Small networks will generally enable you to look closely at specific nodes and connections and easily see the emerging relationships. When datasets are especially large, consisting of thousands of nodes and greater numbers of mutual connections, the displays can seem overwhelmingly cluttered and will be too dense to make many detailed observations at node–link level. Instead, just relax and know that your reading will be about higher-level sense-making of the clusters/hubs and main outliers.
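Counting each node's links (its degree) is the simplest numeric counterpart to spotting hubs and isolated outliers by eye. A sketch, assuming an undirected list of `(a, b)` edges and hypothetical node names; for directed links you would count in-links and out-links separately, and degree is only one of several centrality measures.

```python
def degree_ranking(nodes, edges):
    """Rank nodes by how many links they have (their degree).

    High-degree nodes are candidate hubs; degree-0 nodes are isolated."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], str(item[0])))

# Hypothetical network of five entities.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(degree_ranking(nodes, edges))
# [('A', 3), ('B', 2), ('C', 2), ('D', 1), ('E', 0)]
```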

PRESENTATION TIPS

INTERACTIVITY: Node–link diagrams are particularly useful when offered with interactive features, enabling the user to interrogate and manipulate the display to facilitate visual exploration. The option to apply filters to reduce the busy-ness of the visual and enable isolation of individual node connections helps users to focus on specific parts of the network of interest.


ANNOTATION: The extent of annotated features tends to be the inclusion of value labels for each entity. Accommodating the relative word sizes on each node can be difficult to achieve with real elegance (once again this is where interactivity adds value, through the select/mouseover event to reveal the label).

COLOUR: Aside from the possible categorical colouring of each node, decisions need to be made about the colour of the connecting lines, especially on deciding how prominent these links will be in contrast to the nodes.

COMPOSITION: Composition decisions are where most of the presentation customisation exists. There are several common algorithmic treatments used to compute custom arrangements to optimise network displays, such as force-directed layouts (using the physics of repulsion and springs to amplify relationships) and simplifying techniques (such as edge bundling to aggregate/summarise multiple similar links).
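A force-directed layout can be sketched in a few dozen lines. The version below loosely follows the Fruchterman–Reingold scheme: every pair of nodes repels, every link pulls its two ends together like a spring, and a cooling ‘temperature’ caps how far nodes move each step so the layout settles. It is an illustration under simplifying assumptions (deterministic circular start, no boundary box, O(n²) work per iteration); libraries such as NetworkX (`spring_layout`) or d3-force provide tuned implementations.

```python
import math

def force_layout(nodes, edges, iterations=200, size=1.0):
    """Spring-embedder layout loosely following Fruchterman-Reingold.

    Repulsion between every pair of nodes: k*k/d.
    Attraction along every link: d*d/k."""
    nodes = list(nodes)
    n = len(nodes)
    if n == 0:
        return {}
    k = math.sqrt(size * size / n)  # ideal distance between linked nodes
    # Deterministic start: nodes spaced evenly around a circle.
    pos = {
        v: [size / 2 + 0.4 * size * math.cos(2 * math.pi * i / n),
            size / 2 + 0.4 * size * math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    temperature = size / 10
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Spring attraction along each link.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, by at most the temperature.
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[v][0] += dx / length * step
                pos[v][1] += dy / length * step
        temperature *= 0.97  # cool down so the layout settles
    return {v: (x, y) for v, (x, y) in pos.items()}

# Hypothetical graph: A and B are linked, C is not.
layout = force_layout(["A", "B", "C"], [("A", "B")])
print(math.dist(layout["A"], layout["B"]) < math.dist(layout["A"], layout["C"]))  # True
```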

VARIATIONS & ALTERNATIVES

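There are many derivatives of the node–link diagram, as explained, based on variations in the use of different attributes. ‘Hive plots’ and ‘BioFabric’ offer alternative approaches that arrange nodes along axes rather than freely in space: hive plots place nodes on radially arranged linear axes, while BioFabric draws each node as a horizontal line and each link as a vertical line.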

Charts Connections

Chord diagram

ALSO KNOWN AS Radial network diagram, arc diagram (wrongly)

REPRESENTATION DESCRIPTION

A chord diagram displays relationships through the connections between and within categories. They are formed around a radial display with different categories located around the edge: either as individual nodes or as proportionally sized segments (arcs) of the circumference according to a part-to-whole breakdown. Emerging inwards from each origin position are curved lines that join with other related categorical locations around the edge. The connecting lines are normally proportionally sized according to a quantitative measure, and a directional or influencing relationship is often indicated. The perceived readability of the chord diagram will always be influenced by the quantity and range of values being plotted. Small networks will enable a reader to look closely at specific categories and their connections to see the emerging relationships easily; larger systems will look busy through the network of lines, but they can still provide windows into complex networks of influence. Often the complexity seen in these displays is merely a reflection of the underlying complexity of the subject and/or system upon which the data is based, so oversimplifying can compromise the essence of such content.
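The part-to-whole arcs around the circumference can be computed from each category's share of the grand total, leaving a small gap after every arc. A minimal sketch: here each category's arc size is assumed to be its total outgoing flow (other conventions, such as combined in- and out-flows, are also used, and under this one a category with only incoming flows gets no arc), and the region names and values are hypothetical.

```python
import math

def arc_angles(totals, gap=0.02):
    """(start, end) angles in radians of each category's arc.

    Arc length is proportional to the category's share of the grand total;
    a fixed gap (radians) follows every arc so neighbours stay distinct."""
    available = 2 * math.pi - gap * len(totals)
    grand_total = sum(totals.values())
    angles, start = {}, 0.0
    for name, value in totals.items():
        span = available * value / grand_total
        angles[name] = (start, start + span)
        start += span + gap
    return angles

def outgoing_totals(flows):
    """Each category's total outgoing flow from {(origin, destination): value}."""
    totals = {}
    for (origin, _destination), value in flows.items():
        totals[origin] = totals.get(origin, 0) + value
    return totals

# Hypothetical flows between and within two regions.
flows = {("North", "South"): 30, ("North", "North"): 10,
         ("South", "North"): 20, ("South", "South"): 40}
angles = arc_angles(outgoing_totals(flows), gap=0.0)
for name, (start, end) in angles.items():
    print(name, round(math.degrees(start)), round(math.degrees(end)))
# North 0 144
# South 144 360
```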

EXAMPLE Exploring the connections of migration between and within 10 world regions based on estimates across five-year intervals between 1990 and 2010.

Figure 6.36 The Global Flow of People


HOW TO READ IT & WHAT TO LOOK FOR

First determine how categories are displayed around the circumference, either as nodes or part-to-whole arcs, and identify each one individually. Consider the implication of the radial sorting of these categorical values and, if based on part-to-whole sizes, establish a sense of the largest > smallest arc lengths. Colour-coding may be applied to the categories, so note any associations. Look inside the display to determine what relationships the connecting lines represent and check for any directional significance. Look closer at the tangled collection of lines criss-crossing this space, noting the big values (usually through line weight or width) and the small ones. Avoid being distracted by the distance a line travels, which is just a by-product of the outer categorical arrangement: a long connecting line represents just as significant a relationship as a short one. For this reason, pay close attention to any connecting lines that have very short looping distances to adjacent categories. Are there any patterns of lines heading towards or leaving certain categories?


PRESENTATION TIPS

INTERACTIVITY: Chord diagrams are particularly useful when offered with interactive features, enabling the user to interrogate and manipulate the display to facilitate visual exploration. The option to apply filters to reduce the busy-ness of the visual and enable isolation of individual node connections helps users to focus on specific parts of the network of interest.

ANNOTATION: Annotated features tend to be limited to value labelling of the categories around the circumference and, occasionally, directly onto the base or ends of the connecting lines (usually just those that are large enough to accommodate them).

COLOUR: Aside from the categorical colouring of each node, decisions need to be made about the colour of the connecting lines, especially on deciding how prominent these links will be in contrast to the nodes. Sometimes the connections will match the origin or destination colours, or they will combine the two (with a start and end colour to match the relationship).

COMPOSITION: The main arrangement decisions come through sorting, firstly by generating as much logical meaning as possible from the categorical values around the edge of the circle and secondly by deciding on the sorting of the connecting lines in the z-dimension – if many lines are crossing, there is a need to think about which will be on top and which will be below. Showing the direction of connections can be difficult, as there is so little room for manoeuvre to add further visual attributes, such as arrows or colour changes. One common, subtle solution is to pull the destination join back a bit, leaving a small gap between the connecting line and the destination arc. This then contrasts with connecting lines that emerge directly from the categorical arcs, showing that those arcs are their origin.

VARIATIONS & ALTERNATIVES

The main alternatives would be to consider variations of the ‘node–link diagram’ or, specifically, the ‘arc diagram’, which offers a further variation on the theme of networked displays, placing all the nodes along a baseline and forming connections using semi-circular arcs, rather than using a graph or radial layout.

Charts Connections


Sankey diagram

ALSO KNOWN AS Alluvial diagram

REPRESENTATION DESCRIPTION

Sankey diagrams display categorical composition and quantitative flows between different categorical dimensions or ‘stages’. The most common contemporary form involves a two-sided display, with each side representing different (but related) categorical dimensions or different states of the same dimension (such as ‘before and after’). On each side there is effectively a stacked bar chart displaying proportionally sized and differently coloured (or spaced apart) constituent parts of a whole. Curved bands link each side of the display to represent connecting categories (origin and destination), with the proportionally sized band (its thickness) indicating the quantitative nature of this relationship. Some variations involve multiple stages and might present attrition through the diminishing size of subsequent stacks. Traditionally the Sankey has been used as a flow diagram to visualise energy or material usage across engineering processes. It is closely related to the ‘alluvial diagram’, which tends to show changes in composition and flow over time, but the Sankey label is often applied to these displays also.
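The geometry of the basic two-sided form follows directly from a table of flows: each side's stacked bar is the per-category total, and each band takes the next free slice of its origin bar and of its destination bar, with thickness equal to its value. A minimal sketch with hypothetical parties and seat counts; stacks here follow first-seen order, whereas in practice you would sort them as discussed under composition.

```python
def stack(totals, gap=0.0):
    """Top-down (start, end) positions for a stacked bar of category totals."""
    positions, y = {}, 0.0
    for name, value in totals.items():
        positions[name] = (y, y + value)
        y += value + gap
    return positions

def sankey_layout(flows, gap=0.0):
    """Two-sided Sankey geometry from {(origin, destination): value}.

    Returns the left stack, the right stack and, for each band, its slice
    of the origin bar and of the destination bar (thickness = value)."""
    left, right = {}, {}
    for (origin, destination), value in flows.items():
        left[origin] = left.get(origin, 0) + value
        right[destination] = right.get(destination, 0) + value
    left_pos, right_pos = stack(left, gap), stack(right, gap)
    # Next free position within each bar, filled top-down band by band.
    next_left = {name: span[0] for name, span in left_pos.items()}
    next_right = {name: span[0] for name, span in right_pos.items()}
    bands = []
    for (origin, destination), value in flows.items():
        o, d = next_left[origin], next_right[destination]
        bands.append((origin, destination, (o, o + value), (d, d + value)))
        next_left[origin] += value
        next_right[destination] += value
    return left_pos, right_pos, bands

# Hypothetical seat movements between two elections for parties A and B.
flows = {("A", "A"): 5, ("A", "B"): 3, ("B", "B"): 2}
left, right, bands = sankey_layout(flows)
print(left)   # {'A': (0.0, 8.0), 'B': (8.0, 10.0)}
print(right)  # {'A': (0.0, 5.0), 'B': (5.0, 10.0)}
for band in bands:
    print(band)
```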

EXAMPLE Exploring the seat changes among political parties between the 2010 and 2015 UK General Elections.

Figure 6.37 UK Election Results by Political Party, 2010 vs 2015


HOW TO READ IT & WHAT TO LOOK FOR

Based on the basic two-sided version of the Sankey diagram, look down both sides of the chart to learn what states are represented and what the constituent categories are. Pay close attention to the categorical sorting and pick out the large and small values on each side. Then look at the connecting lines, making observations about the largest and narrowest bands and noting any that seem to be mostly redistributed into a different category compared to those that just join with the same. Notice any small break-off bands that seem to cross the height of the whole chart, perhaps representing a more dramatic change or diversion between states. As with most network-type visualisations, the perceived readability of the Sankey diagram will always be influenced by the quantity and range of values being plotted, as well as the number of different states presented.

PRESENTATION TIPS

INTERACTIVITY: Sankey diagrams are particularly useful when offered with interactive features, enabling the user to interrogate and manipulate the display to facilitate visual exploration. The option to apply filters to reduce the busy-ness of the visual and enable isolation of individual node connections helps users to focus on specific parts of the network of interest.

ANNOTATION: Annotated features tend to be limited to value labelling of the categories that make up each ‘state’ stack.

COLOUR: Colouring is often used to indicate the categories of the connecting bands visually, though it can get a little complicated when trying to convey a sense of change by blending an origin category colour with a destination category colour where there has been a switch.

COMPOSITION: The main arrangement decisions come through sorting, firstly by generating as much logical meaning as possible from the categorical values within the stacks and, secondly, by deciding on the sorting of the connecting lines in the z-dimension – if many lines are crossing, there is a need to think about which will be on top and which will be below. There is no significant difference between a landscape or portrait layout; the choice will depend on the subject matter ‘fit’ and the space within which you have to work. Try to ensure that the sorting of the categorical dimensions is as logical and meaningful as possible.

VARIATIONS & ALTERNATIVES

The concept of a Sankey diagram showing composition and flow can also be mapped onto a geographical projection as one of the variations of the ‘flow map’. As an alternative, a ‘chord diagram’ can show how larger networks are composed proportionally and how they are connected. If you simply need to show how component parts have changed over time, a ‘stacked area chart’ may suffice. A ‘funnel chart’ is a much simplified display showing how a single value changes (usually diminishing) across states, for topics like sales conversion. It typically takes a funnel-like shape: a wide bar at the top (those entering the system) is followed by gradually narrower bars, stage by stage, towards the end state.
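The narrowing bars of a funnel chart correspond to two simple ratios: each stage compared with the previous stage, and each stage compared with the entry stage. The sketch below computes both. It assumes that stages are supplied in order from entry to end state and that every count is positive. The stage names and counts are hypothetical.

```python
def funnel_conversion(stages):
    """Given (stage name, count) pairs ordered from entry to end state,
    return (name, count, rate from previous stage, rate from first stage)
    for each stage. Assumes every count is positive."""
    rows = []
    first = stages[0][1]
    previous = first
    for name, count in stages:
        rows.append((name, count, count / previous, count / first))
        previous = count
    return rows

# Hypothetical sales funnel: each bar is narrower than the one above it.
stages = [("Visited", 1000), ("Added to basket", 250),
          ("Checked out", 100), ("Paid", 80)]
for name, count, step_rate, overall_rate in funnel_conversion(stages):
    print(f"{name:<16}{count:>6}{step_rate:>8.0%}{overall_rate:>8.0%}")
```

The bar lengths in a funnel chart would normally encode the raw counts. The rates are candidates for annotation.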

Charts Trends


Line chart

ALSO KNOWN AS Fever chart, stock chart

REPRESENTATION DESCRIPTION

A line chart shows how quantitative values for different categories have changed over time. It is typically structured around a temporal x-axis with equal intervals from the earliest to the latest point in time. Quantitative values are plotted using joined-up lines that effectively connect consecutive points positioned along a y-axis. The resulting slopes formed between the two ends of each line indicate the local trends between points in time. As this sequence is extended to plot all values across the time frame, it forms an overall line representing the story of quantitative change over time for a single categorical value. Multiple categories can be displayed in the same view, each represented by a unique line. Sometimes a point (circle/dot) is also used to make individual values more visible. The lines used in a line chart will generally be straight. However, curved line interpolation is sometimes used as a method of estimating values between known data points, which can help to emphasise a general trend. This may slightly compromise the visual accuracy of the discrete values, though if your values are already approximations the impact will be smaller.
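Curved line interpolation can be done in several ways. The sketch below uses a uniform Catmull-Rom spline; this particular curve is an assumption chosen for illustration and is not necessarily what a given tool uses. The spline passes exactly through every known point and estimates the values in between. The sketch works on y-values at equally spaced time intervals, matching the equal-interval x-axis described above. It duplicates the end points so that the first and last segments can also be curved.

```python
def catmull_rom(values, steps_per_segment=4):
    """Return a denser list of y-values that passes through every original
    value, with curved estimates between them (uniform Catmull-Rom spline).
    Assumes the values are equally spaced in time."""
    if len(values) < 2:
        return list(values)
    # Pad with duplicated end points so every segment has four control points.
    padded = [values[0]] + list(values) + [values[-1]]
    curve = []
    for i in range(1, len(padded) - 2):
        p0, p1, p2, p3 = padded[i - 1], padded[i], padded[i + 1], padded[i + 2]
        for step in range(steps_per_segment):
            t = step / steps_per_segment  # 0 at p1, approaching 1 at p2
            curve.append(0.5 * (2 * p1
                                + (p2 - p0) * t
                                + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                                + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3))
    curve.append(values[-1])  # finish exactly on the final known value
    return curve

smooth = catmull_rom([10, 14, 9, 12], steps_per_segment=4)
print(len(smooth))  # 3 segments x 4 steps + final point = 13
```

Every fourth value in `smooth` is one of the original data points. The values between them are estimates, which is why the text cautions about visual accuracy.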

EXAMPLE Showing changes in percentage income growth for the Top 1% and Bottom 90% of earners in the USA between 1917 and 2012.

Figure 6.38 The Fall and Rise of U.S. Inequality, in 2 Graphs


HOW TO READ IT & WHAT TO LOOK FOR

Firstly, learn about the axes. What time period is presented on the x-axis (and in what order), and what range of quantitative values is shown on the y-axis? Pay particular attention to the origin value, which may not be zero. Inside the chart, determine what category each line represents: for single lines this will usually be clear from the chart title; for multiple lines you might have direct labelling or a legend to learn colour associations. Think about what high and low values mean: is it ‘good’ to be large/small, increasing or decreasing? Glance at the general patterns (especially if there are many lines) and look for any trends (short or long term), any sudden moments of a rise or fall (V- or W-shapes, or inverted), any sense of seasonal or cyclical patterns, and any points where lines cross each other or key thresholds are reached/exceeded. Can you mentally extrapolate from the values shown any sense of a forecast trend? Avoid jumping to spurious interpretations if you see two line series following a similar pattern; this does not necessarily mean that one thing has caused the other, it might just be coincidence. Then look more closely at categories of interest and at patterns around specific moments in time, and pick out the peak, low, earliest and latest values for each line. Where available, compare the changing quantities against annotated references such as targets, forecasts, previous time periods, range bands, etc.

PRESENTATION TIPS


INTERACTIVITY: Interactivity may be especially helpful if you have many categories and wish to enable the user to isolate, or bring into focus, a particular line of interest.

ANNOTATION: Chart apparatus devices like tick marks and gridlines can be particularly helpful for increasing the accuracy with which the quantitative values are read. If you have axis labels you should not need direct labels on each value point; this would be label overload. You might choose to annotate specific values of interest (highest, lowest, specific milestones). Think carefully about what is the most useful and meaningful interval for your time axis labelling. When several categories are being shown, try, if possible, to label the categories shown by each line directly, perhaps at the start or end position.

COLOUR: When many categories are shown, it may be that only certain emphasised lines of interest possess a colour and a label; the rest are left in greyscale for context.

COMPOSITION: Composition choices are mostly concerned with the chart’s dimensions: its aspect ratio, how high and wide to make it. Values tend to be sequenced from left to right along the time-based x-axis and from low to high up the y-axis; you will need a good (and clearly annotated) reason to break this convention. Line charts do not always need the y-axis to start at zero, as we are not judging the size of a bar, rather the position along an axis. You should expect to see a zero baseline if zero has some critical significance in the interpretation of the trends. If your y-axis origin is not going to be zero, you might include a small gap between the x-axis and the minimum value so that a zero origin is not implied. Be aware that upward and downward trends on a line chart can seem more significant if the chart is narrow and less significant if it is more stretched out. There is no single rule to follow here, but a useful notion is ‘banking to 45°’, whereby the aspect ratio is chosen so that the average slope angle across your chart is close to 45°. While it is usually impractical to measure this precisely, judging by eye tends to be more than sufficient.
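Judging 45° banking by eye is usually enough. If you want a number to start from, the sketch below computes a height-to-width ratio under one simplified assumption: that the mean absolute slope of the line segments, measured after scaling each axis to its data range, should equal 1. Stricter formulations of banking average the segment angles or use the median slope instead, and these can give somewhat different ratios. Treat the result as a starting point, not a rule. The function name is hypothetical.

```python
def banked_aspect_ratio(xs, ys):
    """Return a height/width ratio that makes the mean absolute slope of the
    line segments equal to 1 (an average of roughly 45 degrees).
    Slopes are measured after scaling each axis to its data range, so the
    ratio applies directly to the plotting area. Assumes xs increase and
    that ys are not all equal."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = []
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        dx = (x1 - x0) / x_range
        dy = (y1 - y0) / y_range
        if dx != 0:
            slopes.append(abs(dy / dx))
    mean_slope = sum(slopes) / len(slopes)
    # On screen, slope = data slope * (height / width); solve for slope = 1.
    return 1 / mean_slope

ratio = banked_aspect_ratio([0, 1, 2], [0, 1, 0])
print(ratio)        # 0.5
print(600 * ratio)  # suggested height for a 600-pixel-wide chart: 300.0
```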

VARIATIONS & ALTERNATIVES

Variations of the line chart may include the ‘cumulative line chart’ or ‘step chart’. ‘Spark lines’ are mini line charts that aim to occupy little more than a word’s length of space. They are often seen in dashboards, where space is at a premium and there is a desire to optimise the density of the display. ‘Bar charts’ can also show how values change over time. They suit cases where the quantitative values are more volatile across the time period and where the focus is on the absolute value at each point in time rather than on trends. Sometimes a line chart can show quantitative trends over continuous space rather than time. For showing ranking over time, consider the ‘bump chart’, and for before and after comparisons, the ‘slope graph’.

Charts Trends

Bump chart

ALSO KNOWN AS

REPRESENTATION DESCRIPTION

A bump chart shows how quantitative rankings for categories have changed over time. It is typically structured around a temporal x-axis with equal intervals from the earliest to the latest point in time. Rankings are plotted using joined-up lines that effectively connect consecutive points positioned along a y-axis (typically top = first). The resulting slopes formed between the two ends of each line indicate the local ranking trends between points in time. As this sequence is extended to plot all values across the time frame, it forms an overall line representing the ranking story for a single categorical value. Multiple categories are often displayed in the same view, showing how rankings have collectively changed over time. Sometimes a point (circle/dot) mark is also used to make the connected category lines easier to follow, as is colour (for the lines and/or the points).
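A bump chart plots ranks rather than raw values, so the data usually need converting before plotting. The sketch below turns hypothetical values per period into ranks, where rank 1 is the largest value and is drawn at the top. Ties are broken alphabetically by category name. That rule is an arbitrary assumption of this sketch, and your data may call for a different one.

```python
def rank_by_period(values):
    """Convert {period: {category: value}} into {period: {category: rank}},
    where rank 1 is the largest value (plotted at the top of a bump chart).
    Ties are broken alphabetically by category name (an arbitrary choice)."""
    ranks = {}
    for period, by_category in values.items():
        ordered = sorted(by_category, key=lambda c: (-by_category[c], c))
        ranks[period] = {category: position
                         for position, category in enumerate(ordered, start=1)}
    return ranks

# Hypothetical populations (thousands) for three fictional cities.
populations = {
    1800: {"Alpha": 60, "Beta": 41, "Gamma": 26},
    1810: {"Alpha": 96, "Beta": 53, "Gamma": 62},
}
ranks = rank_by_period(populations)
print(ranks[1810])  # {'Alpha': 1, 'Gamma': 2, 'Beta': 3}
```

Reading each category's rank across the periods gives the points for one line. Here Gamma's line rises from 3 to 2 while Beta's line falls from 2 to 3, so the two lines cross.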

EXAMPLE Showing changes in rank of the most populous US cities at each census between 1790 and 1890.

Figure 6.39 Census Bump: Rank of the Most Populous Cities at Each Census, 1790–1890


HOW TO READ IT & WHAT TO LOOK FOR

Firstly, you need to learn about the axes. What time period is presented on the x-axis (and in what order)? What range of rankings is shown on the y-axis (check that the ranks start at 1 from the top downwards)? Inside the chart, determine what category each line represents: this might be explained through direct labelling, a colour legend, interactivity or differentiating point marker attributes of colour/shape/pattern. Think about what high and low ranks mean: is it ‘good’ to be high up the rankings, and is it better to be moving up or down? Consider the general patterns, looking for consistent trends (largely parallel lines) or completely non-relational patterns (lines moving in all directions). Are there any prominent stories of categories that have had a sudden rise or fall (V- or W-shapes, or inverted)? Is there any evidence of seasonal or cyclical patterns, or any key points of interest where lines cross each other or key thresholds are reached/exceeded? Next, look more closely at categories of interest and at patterns around specific moments in time, and pick out the peak, low, earliest and latest values for each line.

PRESENTATION TIPS

INTERACTIVITY: Interactivity is usually necessary with bump charts, especially if you have many categories and wish to enable the user to isolate, or bring into focus, a particular line of interest.


ANNOTATION: The ranking labels can be derived from the vertical position along the scale, so direct labelling is usually unnecessary. You might choose to annotate specific values of interest (highest, lowest, specific milestones). Think carefully about what is the most useful and meaningful interval for your time axis labelling.

COLOUR: With many categories to show in the same chart, the big challenge is often to distinguish each line, especially as the lines are likely to criss-cross frequently. Colour association can be useful for fewer than 10 categories. Beyond that, you really need to offer interactivity, or to decide that only certain emphasised lines of interest will possess a colour while the rest are left in greyscale for context.

COMPOSITION: Values tend to be sequenced from left to right along the time-based x-axis, with high rankings (low numbers) at the top of the y-axis and lower rankings moving downwards. You will therefore need a good (and clearly annotated) reason to break this convention.

VARIATIONS & ALTERNATIVES

Alluvial diagrams (similar to Sankey diagrams) can show how rankings have changed over time while also incorporating a component of quantitative magnitude. This approach effectively merges the ‘bump chart’ with the ‘stacked area chart’. Consider ‘line charts’ and ‘area charts’ if the ranking is of secondary interest to the absolute values.

Charts Trends

Slope graph chart

ALSO KNOWN AS Slope chart

REPRESENTATION DESCRIPTION

A slope graph shows a ‘before and after’ display of changes in quantities for different categories. The display is (typically) based on two parallel quantitative axes with a consistent scale range to cover all possible quantitative values. A line is plotted for each category to connect the two axes, with the vertical position on each axis representing the respective quantitative values. Sometimes a dot is also used to make the value positions more visible. These connecting lines form slopes that indicate the upward, downward or stable trend between the two points in time. The resulting display incorporates absolute values, reveals rank and, of course, shows change between the two points in time. Colour is often used to distinguish the different categorical lines or, alternatively, to make the major trend states (up, down, no change) visible. A slope graph works less well when all values (or the majority) are moving in the same direction; consider alternatives if this is the case.

EXAMPLE Showing changes in the share of power sources across all US states between 2004 and 2014.

Figure 6.40 Coal, Gas, Nuclear, Hydro? How Your State Generates Power

HOW TO READ IT & WHAT TO LOOK FOR


Firstly, learn about the axes. What are the two points in time being presented, and what is the possible range of quantitative values shown on the y-axis? Check that both axes share the same scale. Inside the chart, determine what category each line represents: this might be explained through direct labelling, a colour legend or interactivity. Think about what upward, downward and stable trends mean: is it ‘good’ to be moving up or down? Is it more interesting to show no change? Look at the general patterns to observe such things as consistent trends (largely parallel lines in either direction) or completely non-relational patterns (lines moving in all directions). Colour may be used to accentuate the distinction between upward and downward trends. Are there any prominent stories of categories that have had a dramatic rise or fall? Even if no values have dramatically altered, that in itself can be an important finding, especially if change was expected. Next, look more closely at categories of interest and pick out the highest and lowest values on each side to learn about those stories. Look for the gaps where there are no values, and at outlier values too, to see if some sit outside the normal value clusters.

PRESENTATION TIPS

INTERACTIVITY: Depending on the number of category values being presented, slope graphs can become quite busy, especially if there are bunches of similar values and slope transitions. This also makes it hard to accommodate multiple labels on the same value. On these occasions, interactive features that let users filter or exclude certain values can help.

ANNOTATION: Labelling of each category will get busy, especially when there are shared values, so you might choose to annotate specific values of interest (highest, lowest, of editorial interest).

COLOUR: When you have many categories to show in the same chart, the big challenge is often to distinguish each line, especially as the lines are likely to criss-cross frequently. Colour association can be useful for fewer than 10 categories, usually with direct labelling on the left and/or right of the chart.

COMPOSITION: The aspect ratio of the slope graph (height and width) will often be determined by the space you have to work with.

VARIATIONS & ALTERNATIVES

Rather than showing a before and after story, some slope graphs are used to show the relationship between different quantitative measures for linked categories. In this case the connecting line is not indicative of a directional relationship, just the relationship itself. An alternative option would be the ‘connected dot plot’, which can also show before and after stories and is a better option when all values are moving in the same direction.

Charts Trends

Connected scatter plot

ALSO KNOWN AS Trail chart

REPRESENTATION DESCRIPTION

A connected scatter plot displays the relationship between two quantitative measures over time. The display is formed by plotting marks like a dot or circle for each point in time at the respective coordinates along two quantitative x- and y-axes. The collection of individual points is then connected (think of a dot-to-dot drawing puzzle) using lines joining each consecutive point in time to form a sequence of change. Generally there would only be a single connected line plotted on a chart to avoid the great visual complexity of overlaying several in one display. However, if multiple categories are to be included, colour is typically used to distinguish each series.

EXAMPLE Showing changes in the daily price and availability of Super Bowl tickets on the secondary market over the four weeks before the event, across five Super Bowl finals.

Figure 6.41 Holdouts Find Cheapest Super Bowl Tickets Late in the Game


HOW TO READ IT & WHAT TO LOOK FOR

Learn what each quantitative axis relates to and make a note of the range of values in each case (min to max). Look at what each plotted value on the chart refers to in terms of its date label and determine the meaning of line direction. It usually helps to parse your thinking by considering what higher/lower values mean for each quantitative axis individually and then combining the joint meaning thereafter. Try to follow the chart from the start to the end, mapping out in your mind the sequence of a narrative as the values change in all directions and noting the extreme values at the outer edges of your line’s reach. Look at the overall pattern of the connected line: is it consistently moving in one direction? Does it ebb and flow in all directions? Does it create a spiral shape? Compare consecutive points for a more focused view of change between two points.
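The advice to read each axis individually and then combine the two can be expressed directly. The sketch below orders hypothetical (date, x, y) observations in time and describes each connecting segment by its horizontal and vertical direction. You would then translate each combination into meaning for your own measures. The function name and the figures are illustrative only.

```python
from datetime import date

def connected_path(observations):
    """Sort (date, x, y) observations by date and describe each step of
    the connected line by its direction on each axis separately."""
    ordered = sorted(observations)
    steps = []
    for (d0, x0, y0), (d1, x1, y1) in zip(ordered, ordered[1:]):
        horizontal = "right" if x1 > x0 else "left" if x1 < x0 else "flat"
        vertical = "up" if y1 > y0 else "down" if y1 < y0 else "flat"
        steps.append((d0, d1, horizontal, vertical))
    return steps

# Hypothetical weekly readings: x = tickets available, y = median price ($).
observations = [
    (date(2015, 1, 20), 5500, 3100),
    (date(2015, 1, 6), 8000, 2900),
    (date(2015, 1, 13), 6500, 3200),
]
for start, end, horizontal, vertical in connected_path(observations):
    print(start, "to", end, horizontal, vertical)
```

In this invented data, the first step moves left and up (fewer tickets, higher price). The second moves left and down (fewer tickets, lower price).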

PRESENTATION TIPS

INTERACTIVITY: The biggest challenge is making the connections and the sequence as visible as possible. This becomes much harder when values change very little and/or they loop back almost in spiral fashion, crossing back over themselves. It is especially hard to label the sequential time values elegantly. One option to overcome this is through interactivity and particularly through animated sequences which build up the display, connecting one line at a time and unveiling the date labels as time progresses. It is often the case that only one series will be plotted. However, interactive options may allow the user to overlay one or more for comparison, switching them on and off as required.

ANNOTATION: Connected scatter plots are generally seen as one of the hardest chart types for unfamiliar readers to learn to read, given the number of different attributes working together in the display. It is therefore vital to give the reader as much help as possible, with ‘how to read’ guides and illustrations of what the different directions of change mean.

COLOUR: Colour is generally used only to accentuate certain sections of a sequence that might represent a particularly noteworthy stage of the narrative.

COMPOSITION: As the encoding of the plotted point values is based on position along an axis, it is not necessary to start the axes from a zero baseline; just make the scale ranges as representative as possible of the range of values being plotted. Ideally a connected scatter plot will have a 1:1 aspect ratio (as tall as it is wide), creating a square area to help patterns surface more evidently. If one quantitative variable (e.g. weight) is likely to be affected by the other variable (e.g. height), it is general practice to place the former on the y-axis and the latter on the x-axis.

VARIATIONS & ALTERNATIVES

The ‘comet chart’ is to the connected scatter plot what the ‘slope graph’ is to the ‘line chart’: a summarised view of the changing relationships across two quantitative values between just two points in time. Naturally, a reduced variation of the connected scatter plot is simply the ‘scatter plot’, where there is no time dimension or element of connectedness.

Charts Trends


Area chart

ALSO KNOWN AS

REPRESENTATION DESCRIPTION

An area chart shows how quantitative values for different categories have changed over time. It is typically structured around a temporal x-axis with equal intervals from the earliest to the latest point in time. Quantitative values are plotted using joined-up lines that effectively connect consecutive points positioned along a y-axis. The resulting slopes formed between the two ends of each line indicate the local trends between points in time. As this sequence is extended to plot all values across the time frame, it forms an overall line representing the story of quantitative change over time for a single categorical value. To accentuate the magnitude of the quantitative values and the change through time, the area beneath the line is filled with colour. The height of each coloured layer at each point in time reveals its quantity. Area charts can display values for several categories, using stacks, to also show the changing part-to-whole relationship.
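In a stacked area chart, each layer is drawn between a lower and an upper edge. The lower edge is the running total of the layers beneath it, so the layer's height at each time point equals its own value. The sketch below computes those edges from a zero baseline. It assumes every category has a value at every time point. The stacking order is passed in explicitly because, as with other stacked displays, ordering is a design choice. The names and data are hypothetical.

```python
def stack_layers(series, order):
    """Compute (lower, upper) edges for each category layer of a stacked
    area chart. `series` maps category -> values at the same time points;
    layers are stacked upwards from a zero baseline in the given order."""
    n_points = len(series[order[0]])
    baseline = [0] * n_points
    layers = {}
    for category in order:
        upper = [base + value for base, value in zip(baseline, series[category])]
        layers[category] = (baseline, upper)
        baseline = upper  # the next layer sits on top of this one
    return layers

series = {"A": [1, 2, 3], "B": [4, 4, 4]}
layers = stack_layers(series, ["A", "B"])
print(layers["B"])  # ([1, 2, 3], [5, 6, 7])
```

The top edge of the final layer is the overall total. This is why the quantitative axis of a stacked area chart needs to start at zero.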

EXAMPLE Showing changes in the average monthly price ($ per barrel) of crude oil between 1985 and 2015.

Figure 6.42 Crude Oil Prices (West Texas Intermediate), 1985–2015


HOW TO READ IT & WHAT TO LOOK FOR

Firstly, learn about the axes: what is the time period range presented on the x-axis (and in what order) and what is the range of quantitative values shown on the y-axis, paying particular attention to whether it is a percentage or absolute based scale? Inside the chart, determine what categories each area layer represents: for single areas this will usually be clear from the chart title; for multiple areas you might have direct labelling or a nearby legend to learn colour associations. Think about what high and low values mean: is it ‘good’ to be large/small, increasing or decreasing? Glance at the general patterns (especially if there are many layers), looking at the visible ‘thickness’ of the coloured layers. At what points are the values highest or lowest? When are they growing or shrinking as the time axis moves along? If there are multiple categories, which ones take up the largest and smallest slices of the overall total? Are there any trends (short or long term), any sudden moments of a rise or fall, any sense of seasonal or cyclical patterns? If there are multiple categories, look more closely at individual layers of interest.

PRESENTATION TIPS

ANNOTATION: Direct labelling of quantitative values will get far too busy, so you might choose to annotate specific values of interest (highest, lowest, specific milestones). Think carefully about what is the most useful and meaningful interval for your time axis labelling. As ever there is no single rule, so adopt the Goldilocks principle of not too many, not too few. If you have a stacked area chart, try to label the category layers directly and as closely as possible (if the heights allow it). Failing that, at least ensure any colour associations are easily identifiable through a nearby legend.

COLOUR: If you are using a stacked area chart, ensure the categorical layers have sufficiently different colours so that each layer can be read distinctly and efficiently.

COMPOSITION: As with the line chart, the area chart’s dimensions should ideally use an aspect ratio that optimises readability through banking to 45° (roughly judging the average slope angle). Values tend to be sequenced from left to right along the time-based x-axis and from low to high up the y-axis; you will need a good (and clearly annotated) reason to break this convention. Unlike the line chart, the quantitative axis for area charts must start at zero, as it is the height of the coloured areas under each line that helps readers to perceive the quantitative values. Avoid overlapping categories on the same chart, because they become very difficult to see (imagine hills behind hills, peeking out and then hiding behind each other). Rather than stacking categories, you might consider using small multiples. These present each display from a common baseline, which makes sizes a little easier to read.

VARIATIONS & ALTERNATIVES

Like area charts, ‘alluvial diagrams’ display proportional stacked layers for multiple categories showing the absolute value change over time. However, they also show the evolving ranks, switching the relative ordering of each layer based on its current magnitude. Some deployments of the area chart are not plotted over time but over continuous dimensions of space, perhaps showing the changing nature of a given quantitative measure along a given route. When you have many concurrent layers to show and these layers start and stop at different times, a ‘stream graph’ is worth considering.

Charts Trends


Horizon chart

ALSO KNOWN AS

EXAMPLE Showing percentage changes in price for selected food items in the USA between 1990 and 2015.

REPRESENTATION DESCRIPTION

Horizon charts show how quantitative values for different categories have changed over time. They are valuable for showing changes over time for multiple categories within space-constrained formats (such as dashboards). They are structured around a series of rows, each showing changes in quantitative values for a single category. The temporal x-axis has equal intervals from the earliest to the latest point in time. Quantitative values are plotted using joined-up lines that connect consecutive points positioned along a value y-axis. The resulting slopes formed between the ends of each line indicate the local trends between two points in time. As this sequence is extended to plot all values across the time frame, it forms an overall line representing the quantitative changes. To accentuate the magnitude of the quantitative values, the area beneath the line is filled with colour. Negative values are highlighted in one colour and positive values in another. Variations in colour lightness indicate different degrees, or bands, of magnitude, with the extremes getting darker. Negative value areas are then flipped from underneath the baseline to above it, joining the positive values but differentiated in their polarity by colour. Finally, like slicing off layers of a mountain, each distinct threshold band that sits above the imposed maximum y-axis scale is chopped off and dropped down to the baseline, in front of its foundation base. The final effect shows overlapping layers of increasingly darker colour-shaded areas, all occupying the same vertical space, with combinations of height, colour and shade representing the values.
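The flipping and slicing steps described above can be sketched for a single value. The sketch records the value's polarity (which sets the colour), takes its absolute size (the flip above the baseline) and then works out how much of each band is filled (the slices that are dropped down to the baseline). The equal-width bands and the simple rule for choosing their width are assumptions for illustration. The composition tips note that choosing thresholds is often a trial-and-error process. The function names are hypothetical.

```python
def horizon_layers(value, band_size, n_bands):
    """Fold one value into horizon-chart layers.

    Returns (polarity, heights): polarity ('positive' or 'negative') sets
    the colour, and heights[k] is how much of band k is filled, between 0
    and band_size. Band 0 is the lightest; higher bands are darker and are
    drawn in front, all from the same baseline. Values beyond
    n_bands * band_size are clipped (the darkest band is simply full).
    """
    polarity = "negative" if value < 0 else "positive"
    magnitude = abs(value)  # negative areas are flipped above the baseline
    heights = [min(max(magnitude - k * band_size, 0), band_size)
               for k in range(n_bands)]
    return polarity, heights

def band_size_for(values, n_bands):
    """A simple starting point for thresholds: split the largest absolute
    value into n_bands equal bands."""
    return max(abs(v) for v in values) / n_bands

print(horizon_layers(-25, 10, 3))         # ('negative', [10, 10, 5])
print(band_size_for([-30, 12, 45], 3))    # 15.0
```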

Figure 6.43 Percentage Change in Price for Select Food Items, Since 1990

HOW TO READ IT & WHAT TO LOOK FOR

Firstly, learn about the category rows: what do they represent and in what order are they presented? Next, the chart axes: what is the time period range presented on the x-axis (and in what order) and what is the range of quantitative values shown on the y-axis, paying attention to whether it is a percentage or absolute value scale? Next, what are the colour associations (for positive and negative values) and the different shaded banding thresholds? Think about what high and low values mean: is it ‘good’ to be large/small, increasing or decreasing? Glance at the general patterns over time, looking at the most visible dark areas of each colour polarity: where have values reached a peak in either direction? Maybe then separate your reading between looking at the positive value insights and then the negative ones: which chunks of colour are increasing in value (getting darker) or shrinking (getting lighter) as the time axis moves along? Where can you see most empty space, indicating low values? Are there any trends (short or long term), any sudden moments of a rise or fall, any sense of seasonal or cyclical patterns, any points of interest where lines cross each other or key thresholds that are reached/exceeded? Then look more closely at categories of interest, assessing their own patterns around specific moments in time and picking out the peak, low, earliest and latest values for each row.

PRESENTATION TIPS

ANNOTATION: The decisions around annotations are largely reduced to labelling the category rows. Such is the busy-ness of the chart areas that any direct labelling is going to clutter the display too much. Horizon charts are less about precise value reading and more about getting a sense of the main patterns, so avoid the temptation to over-label. Think carefully about what is the most useful and meaningful interval for your time axis labelling.

COLOUR: Colour decisions mainly concern the choices of quantitative scale bandings to show the positive and negative value ranges.

COMPOSITION: The height of the chart area in which you can accommodate a single row of data will influence the entire construction of the horizon chart. It will often involve an iterative, trial-and-error process: looking at the range of quantitative values across each category, establishing the most sensible and meaningful thresholds within these ranges and then fixing the y-axis scales accordingly. Try to ensure the sorting of the main categorical rows is as logical and meaningful as possible.

VARIATIONS & ALTERNATIVES

An alternative to the horizon chart is the entry-level single category ‘area chart’, which does not suffer the same restrictions on the vertical scale. For space-constrained displays, ‘spark lines’ offer a suitable option and easily accommodate multiple category displays.

Charts Trends


Stream graph

ALSO KNOWN AS Theme river

REPRESENTATION DESCRIPTION

A stream graph shows how quantitative values for different categories have changed over time. Stream graphs are generally used when there are many constituent categories at any given point in time. These categories may start and stop at different points in time rather than continuing throughout the presented time frame. As befits the name, their appearance is characterised by a flowing, organic display of meandering layers. They are typically structured around a temporal x-axis with equal intervals from the earliest to the latest point in time. Quantitative values are plotted using joined-up lines that effectively connect consecutive points, quantifying each layer's height above a local baseline. That baseline is not a stable zero line but a shifting shape formed out of the other category layers. To accentuate the size of the category’s height at any given point, the area beneath the line is filled with colour. The height of each coloured layer at each point in time reveals its quantity. This colour is often used to further represent a quantitative value scale or to associate with categorical colours. The stacking arrangement of the different categorical streams goes above and below the central axis line to optimise the layout, but without any implication of polarity.
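A stream graph does not stack its layers from zero but from a shifting baseline. The simplest such baseline starts the stack at minus half the total at each time point, so the whole stream is symmetric about the central axis. This is usually called a 'silhouette' offset. Many stream graphs use other offsets instead, for example ones that minimise the 'wiggle' of the layers, and these give different shapes. The sketch below shows only the simple symmetric case. It assumes that a category which has not started, or has stopped, is recorded as 0 at that time point. The names and data are hypothetical.

```python
def stream_layers(series, order):
    """Compute (lower, upper) edges for each stream layer, centring the
    whole stack on zero at every time point (a 'silhouette' offset).
    Being above or below the axis carries no meaning of polarity."""
    n_points = len(series[order[0]])
    totals = [sum(series[c][i] for c in order) for i in range(n_points)]
    baseline = [-total / 2 for total in totals]  # half the stack sits below zero
    layers = {}
    for category in order:
        upper = [base + value for base, value in zip(baseline, series[category])]
        layers[category] = (baseline, upper)
        baseline = upper
    return layers

# Hypothetical data: "B" has not started at the first point, "A" has stopped
# by the last one (both recorded as 0).
series = {"A": [2, 3, 0], "B": [0, 3, 4]}
layers = stream_layers(series, ["A", "B"])
print(layers["A"])  # ([-1.0, -3.0, -2.0], [1.0, 0.0, -2.0])
print(layers["B"])  # ([1.0, 0.0, -2.0], [1.0, 3.0, 2.0])
```

At every time point the bottom edge of the first layer mirrors the top edge of the last, which is what makes the stream symmetric about the axis.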

EXAMPLE Showing changes in the total domestic gross takings ($US) and the longevity of all movies released between 1986 and 2008.

Figure 6.44 The Ebb and Flow of Movies: Box Office Receipts 1986–2008


HOW TO READ IT & WHAT TO LOOK FOR

Firstly, determine what time period is presented on the x-axis (and in what order). Most stream graphs do not show the quantitative y-axis scale, because the reading is more about getting the gist of the main patterns in a relative rather than an absolute sense. You might find that the colouring of layers has a quantitative scale or categorical association, so look for any keys. Also, you will often find guides to help estimate the quantitative heights of each layer. Think about what high and low values mean: is it ‘good’ to be large/small, increasing or decreasing? Glance at the general patterns over time. Remember that above or below means nothing in the sense of polarity of values, so your focus is on the entirety of the collective shape. Look for the largest peaks and the shallowest troughs, possible seasonal patterns or the significant moments of change. Note where these patterns occur in relation to the timescale. Can you see any prominently tall (big values) or wide (long-duration) layers? Notice when layers start and end, noting times when there are many concurrent categories and when there are few. Pick out the layers of personal interest and assess their patterns over time. Do not spend too much effort trying to estimate precise values of height; keep your focus at the bigger picture level. It is often useful to rotate the display so the streams travel vertically. This offers a different perspective and removes the instinct to see positive values above and negative values below the central axis.

PRESENTATION TIPS


INTERACTIVITY: If interactivity is a possibility, this could enable selection or mouseover events to reveal annotated values at any given point in time or to filter the view.

ANNOTATION: Chart apparatus devices are generally of limited use in a stream graph, with the priority on a general sense of pattern more than precise value reading. Direct labelling of categories is likely to be quite busy but may be required, at least to annotate the most interesting patterns (highest, lowest, specific milestones). Think carefully about what is the most useful and meaningful interval for your time axis labelling.

COLOUR: Ensure any colour associations or size guides are easily identifiable through a nearby legend.

COMPOSITION: Composition choices are firstly concerned with the landscape or portrait layout. This will largely be informed by the format and space of your outputs and the meaning of the data. The stream layers are often smoothed, giving them an aesthetically organic appearance, both individually and collectively. This is achieved via curved line interpolation.

VARIATIONS & ALTERNATIVES

The fewer categorical series you have in your data, the more likely a stacked ‘area chart’ is to best fit your needs. You could also consider a stacked ‘bar chart’ over time, but it is less likely to maintain the connected visibility of continuous categorical series via a singular shape.

Charts Activities

Connected timeline

ALSO KNOWN AS Relationship timeline, storyline visualisations, swim-lane chart

REPRESENTATION DESCRIPTION

A connected timeline displays the duration, milestones and categorical relationships across a range of categorical ‘activities’. It is a particularly diverse and creative way of showing changes over time, so there are many variations in approach. The structure is generally formed of a time-based quantitative x-axis and categorical y-axis lanes. Each categorical activity commences at a point in time and from within a vertical category ‘family’. Over time, the line progresses, possibly switching to a different categorical lane position as the nature of the activity alters. The lines may be of fixed width or proportionally weighted to represent a quantitative measure. Some activity lines may cease, restart or merge with others to build a multi-faceted narrative. Colour can also be used to present further relevant detail. The main issue with any connected timeline approach is simply the complexity of the content and the number of moving parts crossing over the display. Because there are many entry points into reading such a timeline, the reading process can be inefficient. This is usually proportional to the subject at hand, however, and you may not wish to see these nuances removed.

EXAMPLE Showing changes in US major college football programme allegiance to different conferences between 1965 and 2015.

Figure 6.45 Tracing the History of N.C.A.A. Conferences

HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know what the major categorical ‘lanes’ represent and what the range of date values is (min to max). Then try to determine what each categorical activity line represents. As there are so many derivatives there is no single reading strategy. Generally, though, glance across the entire chart noting the sequence of the activities; there is usually a sequential logic to their sorting, based on the start date milestone in particular. Follow the narrative from left to right, noting observations about any big, small and medium weighted lines and spotting any moment when they connect with, overlap or detach from other activities. Are there any major convergences or divergences in pattern? Any hubs of dense activity and other sparse moments? Look at the length of lines to determine the long, medium and short durations of activity. Where available, compare the activities against annotated references about other key milestone dates that might hold some significance or influence.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines in particular can help increase the accuracy of reading both the quantitative values and the activity ‘lanes’, which may be coloured to help recognise divisions between categories. Direct labelling is common in these timelines to help maintain associations across the display with the categories of characters or activities, perhaps annotating the consequence or cause of lines merging. Think carefully about the most useful and meaningful interval for your time axis labelling.

COLOUR: Even if colour does not have a direct association with given activities, it can be a useful property to highlight certain features of the narrative. It sometimes acts as a container device to group activities together, even if only for a momentary time period.

COMPOSITION: Where possible, try to make the categorical sorting meaningful, maybe organising values in ascending/descending size order. The vertical (y) or horizontal (x) sequencing of time will depend on the amount of data to show and the space you have to work with. Also, depending on the narrative, the past > present ordering may be reversed.

VARIATIONS & ALTERNATIVES

There are similarities with the organic nature of the ‘alluvial diagram’, which shows ranking and quantitative change over time for a number of concurrent categories. When there are fewer inter-activity relationships and more discrete categories, the ‘Gantt chart’ offers an alternative way of showing this analysis.

Charts Activities

Gantt chart

ALSO KNOWN AS Range chart, floating bar chart

REPRESENTATION DESCRIPTION

A Gantt chart displays the start and finish points and durations for different categorical ‘activities’. The display is commonly used in project management to illustrate the breakdown of a schedule of tasks, but it can be a useful device to show any data based on milestone dates and durations. The chart is structured around a time-based quantitative x-axis and a categorical y-axis. Each categorical activity is represented by lines positioned at the start moment and then stretched out to the finish point. There may be several start/finish durations within the same activity row. Sometimes points are used to accentuate the start/finish positions, and the line may be coloured to indicate a relevant categorical value (e.g. separating completed vs ongoing).
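A Gantt chart only needs a start and finish for each span. The position and length of each bar then follow from simple date arithmetic. This minimal sketch assumes hypothetical activity names and dates. It measures each bar in days from the earliest start across all activities and sorts rows by their earliest start date. The function name is illustrative only.

```python
from datetime import date


def gantt_rows(activities):
    """Return (name, bars) rows, each bar an (offset_days, length_days) pair.

    Offsets are measured from the earliest start across all activities,
    and rows are sorted by their earliest start date.
    """
    origin = min(start for spans in activities.values() for start, _ in spans)
    rows = []
    for name, spans in activities.items():
        # One row can hold several start/finish spans; keep them in time order.
        bars = sorted(((start - origin).days, (finish - start).days) for start, finish in spans)
        rows.append((name, bars))
    rows.sort(key=lambda row: row[1][0][0])  # bars are sorted, so [0][0] is the earliest offset
    return rows


# Hypothetical activities with start and finish dates.
activities = {
    "Build": [(date(2016, 2, 1), date(2016, 3, 25)), (date(2016, 4, 4), date(2016, 4, 29))],
    "Design": [(date(2016, 1, 4), date(2016, 2, 12))],
    "Review": [(date(2016, 1, 18), date(2016, 1, 29))],
}
rows = gantt_rows(activities)
```

Sorting by duration instead would only require changing the sort key, for example to the total of each row's bar lengths.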

EXAMPLE Showing the events of birth, death and period serving in office for the first 44 US Presidents.

Figure 6.46 A Presidential Gantt Chart

HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know which major categorical value each Gantt bar is associated with and what the range of the date values is (min to max). Follow the narrative, noting the sequence of the categories – there is usually a sequential sorting based on the start date milestone. Glance across the entire chart and perform global comparisons to establish the high-level ranking of biggest > smallest durations (based on the length of the line) as well as early and late milestones. Identify any noticeable exceptions and/or outliers. Perform local comparisons between neighbouring bars to identify proportional differences and any connected dependencies. Estimate (or read, if labels are present) the absolute values for specific categories of interest. Where available, compare the activities against annotated references about other key milestone dates that might hold some significance or influence.

PRESENTATION TIPS

ANNOTATION: Chart apparatus devices like tick marks and gridlines (or row band-shading) in particular can help increase the accuracy of reading the start point and duration of activities along the timeline. If you have axis labels, you may not need direct labels for the values shown with each duration bar. Using both leads to label overload, so generally choose one or the other. Think carefully about the most useful and meaningful interval for your time axis labelling.

COMPOSITION: There is no significant difference in perception between vertical and horizontal Gantt charts, though horizontal layouts are more metaphorically consistent with the concept of reading time. These layouts also tend to make the category labels easier to accommodate and read. Where possible, sequence the categorical ‘activities’ in the way that makes for the most logical reading, organised either by the start/finish dates or by the durations (depending on which is most relevant).

VARIATIONS & ALTERNATIVES

Variations might add different point markers (represented by combinations of symbols and/or colours) along each activity row to indicate further milestone details, as in the ‘instance chart’. An emerging technique positions activity lines adjacent to other concurrent activities, rather than fixing them within discrete rows. Sometimes the relationships between activities are much more fluid and less ‘discrete’, and approaches like the ‘connected timeline’ may then be more fitting.

Charts Activities

Instance chart

ALSO KNOWN AS Milestone map, barcode chart, strip plot

REPRESENTATION DESCRIPTION

An instance chart displays individual moments or instances of categorical ‘activities’. There are many variations in approach for this kind of display, but generally you will find a structure based on a time-based quantitative x-axis and a categorical y-axis. For each categorical activity, instances of note are represented by point markers that indicate when along the timeline something happened. The point markers may use different combinations of symbols and colours to represent different types of occurrence. Avoid having too many combinations, though, so that viewers do not have to learn an entirely new alphabet of meaning.

EXAMPLE Showing the instances of different Avengers characters appearing in Marvel’s comic book titles between 1963 and 2015.

Figure 6.47 How the ‘Avengers’ Line-up Has Changed Over the Years

HOW TO READ IT & WHAT TO LOOK FOR

Look at the axes so you know which major categorical value each row of instances is associated with and what the range of the date values is (min to max). Look up any legend that explains what associations, if any, exist between the instance markers and their colour/symbol. Glance down the y-axis noting the sequence of the categories; there is usually a sequential logic to their sorting, based on the start date milestone in particular. Follow the narrative, noting observations about the type and frequency of instances being plotted. Look across the entire chart to locate the headline patterns of clustering and identify any noticeable exceptions and/or outliers. Look at the patterns within each row individually to learn about each category’s dispersal of instances. Look for empty regions where no marks appear. How do all these patterns relate to the time frame displayed? Where available, compare the activities against annotated references about other key milestone dates that might hold some significance or influence.

PRESENTATION TIPS

ANNOTATION: The main role of annotation here is to explain the associations between marks and attributes through clear legends/keys.

COMPOSITION: Where possible, sequence the categorical ‘activities’ in the way that makes for the most logical reading, organised either by the start/finish dates or by the durations (depending on which is most relevant).

VARIATIONS & ALTERNATIVES

Some variations use a sized geometric shape instead of just a point, so that each instance also indicates a quantitative measure. The ‘when’ moment that marks an instance could also be based on data about positions within a sequence. If the basic activity is reduced to a start/finish moment, the ‘Gantt chart’ will be the best-fit option.

Charts Overlays

Choropleth map

ALSO KNOWN AS Heat map

REPRESENTATION DESCRIPTION

A choropleth map displays quantitative values for distinct, definable spatial regions on a map. Each geographic region is represented by a polygonal area based on its outline shape, and the distinct shapes are then arranged collectively to form the entire landscape. (Note that most mapping tools have a predetermined reference between a region name and the dimensions of the regional polygon.) Each area is colour-coded to represent a quantitative value, using a scale whose colour variation intervals (typically) go from a light tint for smaller values to a dark shade for larger values. Only use choropleth maps when the quantitative measure is directly associated with, and continuously relevant across, the spatial region on which it will be displayed.

Similarly, if your quantitative measure is about, or related to, the consequence of more people living in an area, interpretations may be distorted. Consider transforming your data to per capita or per acre (or another spatial denominator) to standardise the analysis accordingly.
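The per capita advice and the light-to-dark colour intervals described for choropleth maps can be sketched together. This is a minimal sketch that assumes hypothetical region names, counts and populations, and an equal-interval classification. Quantile and natural-breaks classifications are common alternatives and can change the look of the map considerably. The hex colours are only an illustrative light-to-dark ramp.

```python
def per_capita_rates(counts, populations, per=100_000):
    """Convert raw counts per region into rates per `per` residents."""
    rates = {}
    for region, count in counts.items():
        pop = populations[region]
        if pop <= 0:
            raise ValueError(f"population for {region!r} must be positive")
        rates[region] = count * per / pop
    return rates


def equal_interval_classes(rates, n_classes):
    """Map each region to a class 0..n_classes-1 (0 = lightest, top = darkest)."""
    lo, hi = min(rates.values()), max(rates.values())
    width = (hi - lo) / n_classes
    classes = {}
    for region, rate in rates.items():
        idx = 0 if width == 0 else int((rate - lo) / width)
        classes[region] = min(idx, n_classes - 1)  # the maximum belongs in the top class
    return classes


# Illustrative light-to-dark ramp; any sequential palette would do.
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]

# Hypothetical data: North and South have identical raw counts.
counts = {"North": 500, "South": 500, "East": 300, "West": 50}
populations = {"North": 250_000, "South": 1_000_000, "East": 200_000, "West": 100_000}
rates = per_capita_rates(counts, populations)
classes = equal_interval_classes(rates, len(PALETTE))
fills = {region: PALETTE[c] for region, c in classes.items()}
```

North and South have the same raw count, but once standardised by population they fall at opposite ends of the colour scale.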

EXAMPLE Mapping the percentage change in the populations of Berlin’s districts across new and native Berliners since the fall of the Berlin Wall.

Figure 6.48 Native and New Berliners — How the S-Bahn Ring Divides the City

HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the quantitative measure being represented. Establish the colour-scale value associations, usually found via a legend. Glance across the entire chart to locate the dark, light and medium shades (generally darker = larger) and perform global comparisons to establish the high-level ranking of biggest values > smallest. Identify any noticeable exceptions and/or outliers. Beware of making judgements about significance based on prominent large geographical areas: size is an attribute of the underlying region, not of the significance of the measure displayed. Gradually zoom in your focus to perform increasingly local comparisons between neighbouring regional areas and identify any noticeable consistencies or inconsistencies between their values. Estimate (or read, if labels are present) the absolute values of specific regions of interest.

PRESENTATION TIPS

ANNOTATION: Directly labelling the regional areas with geographical details and the values they hold is likely to lead to too much clutter. You might include only a limited number of regional labels to provide spatial context and orientation.

COLOUR: Legends explaining the colour scales should ideally be placed as close to the map display as possible. The border colour and stroke width for each spatial area should be distinguishable enough to define the shape but not so prominent that they dominate attention – usually a subtle grey- or white-coloured thin stroke will be fine. As well as variation in colour scales, patterns or textures may sometimes add an extra layer of detail to the value status of each region. When including a projected mapping layer image in the background, make it light in colour and possibly semi-transparent so that it does not compete for visual prominence. Do not include unnecessary geographical details (e.g. roads, building structures) that add no value to the spatial orientation or interpretation and only clutter the display.

COMPOSITION: Because the Earth is a sphere, there are many different mapping projections for representing the regions of the world on a plane surface. Be aware that the transformations made by some map projections can distort the size of regions of the world, inflating them relative to other regions.

VARIATIONS & ALTERNATIVES

Some choropleth maps may be used to indicate categorical association rather than quantitative measurements. Alternative thematic mapping approaches for representing quantitative values include the ‘proportional symbol map’ and the ‘dot density map’. The latter plots a representative quantity of dots equally (but randomly) across and within a defined spatial region. The position of individual dots is therefore not to be read as indicating precise locations; together the dots form a measure of quantitative density. This offers a useful alternative to the choropleth map, especially when separating categories of dots through colour is of value. ‘Dasymetric mapping’ is similar in approach to choropleth mapping, but it breaks the constituent regional areas into much more specific, almost custom-drawn, sub-regions. These better represent the realities of how human and physical phenomena are distributed within a given spatial boundary.

Charts Overlays

Isarithmic map

ALSO KNOWN AS Contour map, isopleth map, isochrone map

REPRESENTATION DESCRIPTION

An isarithmic map displays distinct spatial surfaces on a map that share the same quantitative classification. All spatial regions (transcending geo-political boundaries) that share a certain quantitative value or interval are formed by interpolated ‘isolines’, which connect points of similar measurement to form distinct surface areas. Each area is then colour-coded to represent the relevant quantitative value. The scale of colour variation intervals differs between deployments but will typically range from a light tint for smaller values to a dark shade for larger values. Use an isarithmic map in preference to a choropleth map when the patterns of data being displayed transcend the distinct regional polygons. Such maps could be used to show temperature bandings or smoothed regions of political attitudes.
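An isarithmic map rests on two steps: interpolating a continuous surface from scattered measurements, then grouping that surface into value bands. The sketch below assumes inverse distance weighting (IDW), which is only one of several interpolation methods (kriging and spline-based methods are others). The sample points, grid size and band thresholds are hypothetical. Tracing the actual isolines between bands (for example with a marching squares algorithm) is left out.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # the location coincides with a sample point
        w = 1.0 / d ** power  # nearer samples get more weight
        num += w * value
        den += w
    return num / den


def band_grid(samples, width, height, thresholds):
    """Interpolate onto a width x height grid and return a band index per cell.

    Band 0 holds values below thresholds[0]; band i holds values in
    [thresholds[i-1], thresholds[i]); the top band holds everything above.
    """
    grid = []
    for row in range(height):
        line = []
        for col in range(width):
            v = idw(col, row, samples)
            line.append(sum(1 for t in thresholds if v >= t))
        grid.append(line)
    return grid


# Two hypothetical measurements and a single band boundary at 5.0.
samples = [(0, 0, 0.0), (4, 0, 10.0)]
grid = band_grid(samples, 5, 1, [5.0])
```

Each band index would then map to a colour on a light-to-dark scale. The smoothing step is also why the resulting surfaces are less precise than the sample points they come from.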

EXAMPLE Mapping the degree of dialect similarity across the USA.

Figure 6.49 How Y’all, Youse and You Guys Talk

HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the quantitative measure being represented. Establish the colour scale value associations, usually found via a legend. Glance across the entire chart to locate the dark, light and medium shades (generally darker = larger) and perform global comparisons to establish the high-level ranking of biggest values > smallest. Identify any noticeable exceptions and/or outliers. These include regions that appear isolated from their otherwise related values and regions notable for sitting next to very differently shaded regions. Note that any interpolation used to smooth the joins between data points into organic surfaces will inevitably make the surfaces less precise in their relationship to land position. Gradually zoom in your focus to perform increasingly local comparisons between neighbouring regional areas and identify any noticeable consistencies or inconsistencies between their values. Estimate the absolute values of specific regions of interest.

PRESENTATION TIPS

ANNOTATION: Directly labelling the surface areas with the quantitative value or range they represent will be too cluttered. You might include only a limited number of regional labels to provide spatial context and orientation.

COLOUR: Legends explaining the colour scales should ideally be placed as close to the map display as possible. Visible contour or boundary lines clearly imply that a location is either inside or outside the line, so make these lines only as prominent in colour as the precision of their representation justifies. If the surface locations have been smoothed, the representation of these areas should similarly avoid looking definitive. You might therefore consider subtle colour gradation or overlapping between regions to capture the underlying ‘fuzziness’ of the data appropriately. As well as colour scales, patterns or textures may sometimes add an extra layer of detail to the value status of each surface region. When including a projected mapping layer image in the background, make it light in colour and possibly semi-transparent so that it does not compete for visual prominence. Do not include unnecessary geographical details (e.g. roads, building structures) that add no value to the spatial orientation or interpretation and only clutter the display.

COMPOSITION: Be aware that the transformations made by some map projections can distort the size of regions of the world, inflating them relative to other regions.

VARIATIONS & ALTERNATIVES

There are specific applications of isarithmic maps for showing elevation (‘contour maps’), atmospheric pressure (‘isopleth maps’) or travel-time distances (‘isochrone maps’). Sometimes you might use isarithmic maps to show a categorical status (perhaps even a binary state) rather than a quantitative scale.

Charts Overlays

Proportional symbol map

ALSO KNOWN AS Graduated symbol map

REPRESENTATION DESCRIPTION

A proportional symbol map displays quantitative values for locations on a map. The values are represented via proportionally sized areas (usually circles), each positioned with its centre over a given location coordinate. Colour is sometimes used to introduce further categorical distinction.
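The value is encoded in the circle's area, so the radius has to grow with the square root of the value rather than in direct proportion to it. This minimal sketch assumes a hypothetical function name and a chosen maximum radius for the largest value in the data.

```python
import math


def symbol_radius(value, max_value, max_radius):
    """Radius such that circle AREA, not radius, is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    # area = pi * r^2, so r must scale with sqrt(value) for area to scale with value
    return max_radius * math.sqrt(value / max_value)


big = symbol_radius(100, 100, 20)    # the largest value gets the maximum radius
small = symbol_radius(25, 100, 20)   # a quarter of the value, half the radius
```

Scaling the radius linearly instead would make a value four times larger appear sixteen times larger in area, which overstates the big values.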

EXAMPLE Mapping the origin and size of funds raised across the 22 major candidates running for US President during the first half of 2015.

Figure 6.50 Here’s Exactly Where the Candidates’ Cash Came From

HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the quantitative measure being represented. Establish the area size value associations, usually found via a legend. Glance across the entire chart to locate the large, medium and small shapes and perform global comparisons to establish the high-level ranking of biggest values > smallest. Identify any noticeable exceptions and/or outliers. Gradually zoom in your focus to perform increasingly local comparisons between neighbouring regional areas and identify any noticeable consistencies or inconsistencies between their values. Estimate (or read, if labels are present) the absolute values of specific regions of interest. Also note where there are no markers. If colour is being used to further break down the categories of the values shown, identify any grouped patterns that emerge.

PRESENTATION TIPS

INTERACTIVITY: Interaction may help to reveal location and value labels through selection or mouseover events.

ANNOTATION: Directly labelling the shapes with geographical details and the values they hold is likely to lead to too much clutter. You might therefore include only a limited number of regional labels to provide spatial context and orientation. Legends explaining the size scales – and any colour associations – should ideally be placed as close to the map display as possible. Avoid including unnecessary geographical details (e.g. roads, building structures) that add no value to the spatial orientation or interpretation and only clutter the display.

COLOUR: Sometimes the circular shapes are filled; at other times they remain unfilled. If colours are being used to distinguish the different categories, make them as visibly different as possible. A circle with a large value will extend well beyond the origin of its geographical location, intruding on and overlapping with neighbouring values. Outline borders and semi-transparent colours help avoid occlusion (values visually hidden behind others). When including a projected mapping layer image in the background, make it light in colour and possibly semi-transparent so that it does not compete for visual prominence.

VARIATIONS & ALTERNATIVES

Variations may replace the typical circle with squares, or geographical space with anatomical regions. Alternatives to the proportional symbol map include the ‘choropleth map’, which colour-codes regions, and the ‘dot map’, which uses a dot to represent an instance of something. Avoid the temptation to turn the circle symbols into pie charts; it is not a good look. If you absolutely positively have to show a part-to-whole relationship, only show two categories, as per the recommended practice for pies.

Charts Overlays

Prism map

ALSO KNOWN AS Isometric map, spike map, datascape

REPRESENTATION DESCRIPTION

A prism map displays quantitative values for locations on a map. The values are represented via proportionally sized lines that appear as 3D bars. Each bar typically covers a fixed surface area of space and is extended in height in proportion to the quantitative value for that location. Judging the dimensions of 3D forms in a 2D view is very difficult, so prism maps are really only used to convey a gist of the profile of values, in particular to help recognise the main peaks.

EXAMPLE Mapping the population of trees for each 180 square km of land across the globe.

Figure 6.51 Trillions of trees

HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the quantitative measure being represented. Establish the height value associations, usually found via a legend. Glance across the entire chart to locate the large, medium and small shapes and perform global comparisons to establish the high-level ranking of biggest values > smallest. Identify any noticeable exceptions and/or outliers. Gradually zoom in your focus to perform increasingly local comparisons between neighbouring regional areas and identify any noticeable consistencies or inconsistencies between their values. Estimate (or read, if labels are present) the absolute values of specific regions of interest. Also note where no bars emerge from the surface.

PRESENTATION TIPS

INTERACTIVITY: Ideally, prism maps would be provided with interactive features that allow panning around the map region. Different viewing angles help overcome the perceptual difficulties of judging the dimensions of 3D forms in a 2D view. Without this, smaller values will be hidden behind the larger forms, just as smaller buildings are hidden by skyscrapers in a city.

ANNOTATION: Directly labelling the prism shapes is infeasible – at most you might include a limited number of labels to provide spatial context and orientation against the largest forms. Legends explaining the size scales should ideally be placed as close to the map display as possible.

COLOUR: Most tools that enable this type of mapping will likely have visual property settings for a faux light effect, which helps the physical shapes emerge more prominently through light and shadow. Ensure colour makes the shapes of the forms as visible as possible, perhaps using opacity so that smaller values are not entirely hidden behind larger ones. When including a mapping layer image on the surface, make it light in colour and possibly semi-transparent so that it does not compete for visual prominence. Do not include unnecessary geographical details (e.g. roads, building structures) that add no value to the spatial orientation or interpretation and only clutter the display.

COMPOSITION: Be aware that the transformations made by some map projections can distort the size of regions of the world, inflating them relative to other regions.

VARIATIONS & ALTERNATIVES

Alternatives to the prism map, especially if you want to avoid 3D form, include the ‘proportional symbol map’, which uses proportionally sized geometric shapes, and the ‘choropleth map’, which colour-codes regional shapes.

Charts Overlays

Dot map

ALSO KNOWN AS Dot distribution map, pointillist map, location map, dot density map

REPRESENTATION DESCRIPTION

A dot map displays the geographic density and distribution of phenomena on a map. It uses a point marker to indicate a categorical ‘observation’ at a geographical coordinate, such as instances of people, notable sites or incidences. The point marker is usually a small, filled dot. Colour can be used to distinguish categorical classifications. Sometimes a dot represents a one-to-one phenomenon (i.e. a single record at that location). At other times a dot represents one-to-many phenomena (i.e. an aggregated statistic whose location is a logical mid-point). As GPS recording devices proliferate, the accuracy and prevalence of detailed, location-marked incidences are increasing the potential for this type of approach. However, think carefully about the potential sensitivity of directly plotting a phenomenon or data incidence at a given location.

EXAMPLE Mapping each resident of the USA, across different ethnicities, based on the location at which they were counted during the 2010 Census.

Figure 6.52 The Racial Dot Map

HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the phenomenon being represented. Establish the unit of this measure (is it a one-to-one or a one-to-many relationship?) by referring to a legend. If categorical colours have been used, establish the different classifications and associations. Scan the chart for noticeable clusters as well as widely dispersed (and maybe empty) regions. Some of the most interesting observations come from individual outliers that stand apart from the others. Are there any patterns between the presence of dots and their geographical location? Are there any patterns across points of similar categorical colour? Gradually zoom in your focus to perform increasingly local assessments between neighbouring regional areas and identify any noticeable consistencies or inconsistencies between their patterns.

PRESENTATION TIPS

INTERACTIVITY: One way to deal with plotting large numbers of observations is to provide interactive semantic zoom features. Each time a user zooms in by one level of focus, the unit quantity represented by each dot decreases, moving from a one-to-many towards a one-to-one relationship.
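Semantic zoom amounts to a rule that links the zoom level to the number of observations each dot stands for. The rule below is a hypothetical choice for illustration only: the unit is divided by ten at each zoom level until it reaches one-to-one. The function names and default values are assumptions, not taken from any particular mapping tool.

```python
def dots_per_unit(zoom_level, base_unit=1000, factor=10):
    """Hypothetical rule: observations represented by one dot at a zoom level.

    Zoom level 0 shows one dot per `base_unit` observations; each level divides
    the unit by `factor`, never going below one-to-one.
    """
    return max(1, base_unit // factor ** zoom_level)


def dot_count(population, zoom_level, base_unit=1000, factor=10):
    """Number of dots to draw for a region (floor division drops any remainder)."""
    return population // dots_per_unit(zoom_level, base_unit, factor)


# A hypothetical region of 12,345 people at successive zoom levels.
counts_by_zoom = [dot_count(12_345, z) for z in range(5)]
```

Floor division is a simplification: a real implementation might round, or carry remainders into a neighbouring region, so that small populations do not disappear at low zoom levels.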

ANNOTATION: Direct labelling is not necessary; just provide a limited number of regional labels to offer spatial context and orientation. Legends explaining the dot unit scale and any colour associations should ideally be placed as close to the map display as possible.

COLOUR: If colours are being used to distinguish the different categories, make them as visibly different as possible. When including a mapping layer image in the background, make it light in colour and possibly semi-transparent so that it does not compete for visual prominence. Do not include unnecessary geographical details (e.g. roads, building structures) that add no value to the spatial orientation or interpretation and only clutter the display.

COMPOSITION: Dot maps must always be displayed on a map that uses an equal-area projection, because the precision of the plotted locations, and of the densities they form, is paramount. For readability, balance the size of the dots: small enough to preserve their individuality but not so tiny as to be indecipherable.

VARIATIONS & ALTERNATIVES

A ‘dot density map’ is a variation that plots a representative quantity of dots equally (but randomly) across and within a defined spatial region. The position of individual dots is therefore not to be read as indicating precise locations; together the dots form a measure of quantitative density. This offers a useful alternative to the choropleth map, especially when separating categories of dots through colour is of value. Plotting the location of an incidence of a phenomenon can also go beyond geographical mapping to any spatial display, such as seat layout and availability at a theatre or on a flight, or the key patterns of play across a sports pitch.
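Dots can be placed ‘equally (but randomly)’ within a region using rejection sampling. Draw uniform random points inside the region's bounding box and keep only those that fall inside the polygon, tested here by ray casting. This minimal sketch assumes a simple polygon without holes, given as a list of (x, y) vertices in already-projected coordinates. The function names are hypothetical. Real mapping tools also have to handle multipolygons and projections.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: odd number of edge crossings to the right means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dot_density_points(polygon, value, unit, rng=None):
    """Scatter round(value / unit) uniformly random dots inside the polygon."""
    if rng is None:
        rng = random.Random()
    n_dots = round(value / unit)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n_dots:
        # Sample the bounding box and reject points that fall outside the region.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical triangular region holding 5,000 people, one dot per 100.
triangle = [(0, 0), (10, 0), (0, 10)]
dots = dot_density_points(triangle, 5000, 100, random.Random(42))
```

Because the dots are random within the region, two runs with different seeds give different pictures with the same density, which is why individual dot positions should not be read as precise locations.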

Charts Overlays

Flow map

ALSO KNOWN AS Connection map, route map, stream map, particle flow map

REPRESENTATION DESCRIPTION

A flow map shows the characteristics of the movement or flow of a phenomenon across spatial regions. It is often formed using line marks to map the flow and combinations of attributes to display the characteristics of this flow. Examples might include the patterns of traffic and travel across or between given routes, the dynamics of weather patterns, or the movement patterns of people or animals. There is no fixed template for a flow map. It generally displays origin and destination (positions on a map), route (using organic or vector paths), direction (arrow or tapered line width), categorical classification (colour) and some quantitative measure (line weight or motion speed).

EXAMPLE Mapping the average number of vehicles using Hong Kong’s main network of roads during 2011.

Figure 6.53 Arteries of the City


HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the phenomenon that is being displayed. Establish the association of all visible attributes to understand fully their classification and representation, such as the use of quantitative scales (colour, line size or width) or categorical associations (colour). Scan the chart looking for the existence of patterns of movement, maybe through clustering or common direction, and identify any main hubs and densities within the network. Find the large and the small, the dense and the sparse, and draw out any patterns formed by colour classifications. Gradually zoom in your focus to perform increasingly local assessments between neighbouring regional areas to identify any noticeable consistencies or inconsistencies between their patterns.

PRESENTATION TIPS

INTERACTIVITY: Animated sequences will be invaluable to convey motion if the nature of the flow being presented has the relevant physics of movement.


ANNOTATION: Annotation needs will be unique to each approach and to the inherent complexity (or otherwise) of the display. Often the general patterns will offer sufficient readability without the need to impose large numbers of value labels.

COLOUR: The colour relationship needs careful consideration to get the right balance between the intricacies of the foreground data layer and the background mapping layer image. Ensure the background is not overly competing for visual prominence by making it light in colour and possibly semi-transparent. Do not include any unnecessary geographical details that add no value to the spatial orientation or interpretation, but do include those features that have a direct association with the subject matter (such as roads, routes, etc.).

COMPOSITION: Some degree of geographic distortion of routes or connecting lines may be practically required to display flow data. Choices like interpolating lines to smooth an activity’s route or merging relatively similar pathways may be entirely legitimate, but ensure that this is made clear to the reader.
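One common way of smoothing a jagged route is Chaikin's corner-cutting algorithm. The text does not prescribe a method, so treat this as one illustrative choice, with a hypothetical function name. Each pass replaces every segment with two points, a quarter and three-quarters of the way along it, which rounds off the corners.

This sketch keeps the original start and end points, so the route's origin and destination do not move. The smoothed line no longer passes through the intermediate recorded positions, which is exactly the kind of change the text says should be made clear to the reader.

```python
def chaikin_smooth(points, iterations=2):
    """Smooth an open polyline of (x, y) points with Chaikin's corner cutting.

    Each pass replaces every segment P_i -> P_i+1 with points at 1/4 and 3/4
    of its length. The first and last points are kept unchanged.
    """
    for _ in range(iterations):
        if len(points) < 3:
            break  # nothing to smooth on a single straight segment
        smoothed = [points[0]]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        smoothed.append(points[-1])
        points = smoothed
    return points

# Hypothetical route with a sharp right-angle turn at (10, 0)
route = [(0, 0), (10, 0), (10, 10)]
one_pass = chaikin_smooth(route, iterations=1)
```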

VARIATIONS & ALTERNATIVES

There are naturally many variations in how you might show flow. The main distinction is whether you are showing point A to point B ‘connection maps’, more nuanced ‘route maps’ or surface phenomena such as ‘particle flow maps’.

Charts Distortions

Area cartogram

ALSO KNOWN AS Contiguous cartogram, density-equalizing map

EXAMPLE Mapping the measures of climate change responsibility compared to vulnerability across all countries.


REPRESENTATION DESCRIPTION

An area cartogram displays the quantitative values associated with distinct definable spatial regions on a map. Each geographic region is represented by a polygonal area based on its outline shape, with the collective regional shapes forming the entire landscape. (Note that most tools for mapping have a predetermined reference between a region name and the dimensions of the regional polygon.) Quantitative values are represented by proportionately distorting (inflating or deflating) the relative size and, to some degree, the shape of the respective regional areas. Traditionally, area cartograms strictly aim to preserve the neighbourhood relationships between different regions. Colour is sometimes used to further represent the same quantitative value or to associate the region with a categorical classification. Area cartograms require the reader to be relatively familiar with the original size and shape of regions in order to be able to establish the degree of relative change in their proportions. Without this it is almost impossible to assess the degree of distortion and indeed to identify the regions themselves.
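The first step of any density-equalising cartogram is working out how large each region should become. This minimal sketch assumes you already know each region's original map area and its value; the region names and numbers are hypothetical.

- Each region's target area is its share of the total value multiplied by the total map area.
- The ratio of target area to original area is its inflation or deflation factor.

Actually reshaping the polygons while preserving neighbourhood relationships is the hard part. That needs a dedicated algorithm, such as the diffusion-based method of Gastner and Newman, and is not attempted here.

```python
def cartogram_target_areas(values, total_area):
    """Area each region should occupy so that area is proportional to value."""
    total_value = sum(values.values())
    return {region: total_area * v / total_value for region, v in values.items()}

def distortion_factors(values, original_areas):
    """Target area divided by original area: >1 means inflate, <1 means deflate."""
    total_area = sum(original_areas.values())
    targets = cartogram_target_areas(values, total_area)
    return {region: targets[region] / original_areas[region] for region in values}

# Hypothetical regions: A is small on the map but holds most of the value
values = {"A": 30, "B": 10}
original_areas = {"A": 10.0, "B": 30.0}
factors = distortion_factors(values, original_areas)
```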

Figure 6.54 The Carbon Map

HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the quantitative measure that is being represented. Establish the quantitative value scales or categorical classifications associated with the colour scale, usually found via a legend. Glance across the entire chart to locate the big-, small- and medium-sized shapes according to their apparent distortion. Identify any noticeable exceptions and/or outliers. Gradually zoom in your focus to perform increasingly local comparisons between neighbouring regional areas to identify any noticeable consistencies or inconsistencies between their values. Estimate (or read, if labels are present) the absolute values of specific regions of interest.

PRESENTATION TIPS

INTERACTIVITY: Animated sequences enabled through interactive controls can help to better identify instances and degrees of change, but usually only over a small set of regions and only if the change is relatively smooth and sustained. Manual animation will help provide more control over the experience.

ANNOTATION: Directly labelling the regional areas with geographical details and the value they hold is likely to lead to too much clutter. You might include only a limited number of regional labels to provide spatial context and orientation.

COLOUR: Legends explaining any colour scales should ideally be placed as close to the map display as possible. The border colour and stroke width for each spatial area should be distinct enough to define the shape but not so prominent as to dominate attention; usually a subtle grey or white thin stroke will be fine.

COMPOSITION: To aid the readability of the size of the distortions, it can be useful to present a thumbnail view of the undistorted original geographical layout to help readers orient themselves with the changes.

VARIATIONS & ALTERNATIVES

Unlike contiguous cartograms, non-contiguous cartograms tend to preserve the shape of the individual polygons but modify the size and the neighbouring connectivity to other adjacent regional polygon areas. The best alternative ways of showing similar data would be to consider using the ‘choropleth map’ or ‘Dorling cartogram’.

Charts Distortions


Dorling cartogram

ALSO KNOWN AS Demers cartogram

REPRESENTATION DESCRIPTION

A Dorling cartogram displays the quantitative values associated with distinct, definable spatial regions on a map. Each geographic region is represented by a circle which is proportionally sized to represent a quantitative value. The placement of each circle generally resembles the region’s geographic location with general preservation of neighbourhood relationships between adjacent shapes. Colour is used to associate the region with a categorical classification.

EXAMPLE Mapping the predicted electoral voting results for each state in the 2012 Presidential Election.

Figure 6.55 Election Dashboard


HOW TO READ IT & WHAT TO LOOK FOR

Acquaint yourself with the geographic region you are presented with and carefully consider the quantitative measure that is being represented. Establish the quantitative value scales or categorical classifications associated with the colour scale, usually found via a legend. Glance across the entire chart to locate the big-, small- and medium-sized shapes. Identify any noticeable exceptions and/or outliers. Gradually zoom in your focus to perform increasingly local comparisons between neighbouring regional areas to identify any noticeable consistencies or inconsistencies between their values. Estimate (or read, if labels are present) the absolute values of specific regions of interest.

PRESENTATION TIPS

INTERACTIVITY: Interactive features that enable annotation for category and value labelling can be useful to overcome the difficulties associated with the geographic distortion.

ANNOTATION: Directly labelling the shapes with geographical details and the value they hold is common, though you might restrict this to the circles that are large enough to hold such annotation. Otherwise you will need to decide how to handle the labelling of small values.

COLOUR: Legends explaining the size scales and colour associations should ideally be placed as close to the map display as possible. If colours are being used to distinguish the different categories, ensure these are as visibly different as possible.

COMPOSITION: Remember that preserving the adjacency with neighbouring regions is important. Dorling cartograms tend not to allow circles to overlap or occlude, so some accommodation of large values might result in location distortion.
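The no-overlap rule can be illustrated with a simple relaxation loop. It repeatedly finds pairs of circles that overlap and pushes each pair apart symmetrically along the line joining their centres. This is a minimal sketch, not Dorling's published algorithm, which also pulls circles back towards their neighbours to keep the layout recognisable. The function name and the `padding` parameter are hypothetical. Centres are assumed to start at each region's geographic centroid in projected map units, with radii already sized so that circle area encodes the value.

```python
import math

def separate_circles(centres, radii, iterations=200, padding=0.0):
    """Nudge overlapping circles apart; return the adjusted centre positions.

    Each overlapping pair is moved apart symmetrically along the line joining
    their centres until they just touch (plus optional padding).
    """
    pos = [list(c) for c in centres]
    for _ in range(iterations):
        moved = False
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy)
                min_dist = radii[i] + radii[j] + padding
                if dist < min_dist:
                    push = (min_dist - dist) / 2  # each circle moves half the overlap
                    if dist == 0:
                        ux, uy = 1.0, 0.0  # identical centres: pick an arbitrary direction
                    else:
                        ux, uy = dx / dist, dy / dist
                    pos[i][0] -= ux * push
                    pos[i][1] -= uy * push
                    pos[j][0] += ux * push
                    pos[j][1] += uy * push
                    moved = True
        if not moved:
            break  # no overlaps left
    return [tuple(p) for p in pos]

# Hypothetical pair of overlapping unit circles
out = separate_circles([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0])
```

With many circles, pushing one pair apart can create a new overlap elsewhere. That is why the loop repeats until nothing moves or the iteration limit is reached. The further circles drift, the more location distortion the reader has to accommodate.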

VARIATIONS & ALTERNATIVES

A variation on the approach, called the ‘Demers cartogram’, involves the use of squares or rectangles instead of circles, which offers an alternative way of connecting adjacent shapes. Other approaches would be through the ‘area cartogram’ and the ‘choropleth map’.

Charts Distortions

Grid map

ALSO KNOWN AS Cartogram, bin map, equal-area cartogram, hexagon bin map

REPRESENTATION DESCRIPTION

A grid map displays the quantitative values associated with distinct definable spatial regions on a map. Each geographic region (or a statistically consistent interval of space, known as a ‘bin’) is represented by a fixed-size uniform shape, sometimes termed a ‘tile’. The shapes used tend to be squares or hexagons (though in theory any tessellating shape would work), arranged so that the collective shape of all the regional tiles roughly fits the real-world geographical adjacency. Colours are applied to each regional tile either to represent a quantitative value or to associate the region with a categorical classification. Note that the mark used for this chart type is a point rather than an area mark, as its size attributes are constant.
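When tile colour represents a quantitative value, each value first has to be assigned to a colour class. The sketch below uses equal-interval classes, which is only one of several classification schemes; quantiles and natural breaks are common alternatives. The tile codes, values and hex colours are hypothetical and purely illustrative.

```python
def equal_interval_classes(values, n_classes):
    """Map each region's value to a class index 0..n_classes-1 using equal-width intervals."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / n_classes
    classes = {}
    for region, v in values.items():
        k = int((v - lo) / width) if width > 0 else 0
        classes[region] = min(k, n_classes - 1)  # the maximum value belongs to the top class
    return classes

# Hypothetical light-to-dark sequential palette (lighter = lower values)
PALETTE = ["#edf8e9", "#bae4b3", "#74c476", "#238b45"]

# Hypothetical tile codes and percentage values
tiles = {"T1": 18.0, "T2": 25.0, "T3": 31.0, "T4": 42.0}
classes = equal_interval_classes(tiles, len(PALETTE))
colours = {region: PALETTE[k] for region, k in classes.items()}
```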

EXAMPLE Showing the percentage of household waste recycled in each council region across London between April 2013 and March 2014.

Figure 6.56 London is Rubbish at Recycling and Many Boroughs are Getting Worse

HOW TO READ IT & WHAT TO LOOK FOR


Acquaint yourself with the geographic region you are presented with and carefully consider the quantitative measure that is being represented. Identify the general layout of the constituent tiles to determine how good a fit they are with their adjacent regions in absolute and relative geographical terms. Establish the categorical or quantitative classifications associated with the colour scale, usually found via a legend. Glance across the entire chart to locate the big, small and medium shaded tiles (if quantitative) or the main patterns formed by the categorical colouring. Identify any noticeable exceptions and/or outliers. Gradually zoom in your focus to perform increasingly local comparisons between neighbouring regional areas to identify any noticeable consistencies or inconsistencies between their values. Estimate (or read, if labels are present) the absolute values of specific regions of interest.

PRESENTATION TIPS

INTERACTIVITY: Interactive features that enable annotation for category and value labelling can be useful to overcome the difficulties associated with the geographic distortion.

ANNOTATION: Directly labelling the shapes with geographical details is usually too hard. Some versions of the grid map will include abbreviated labels, perhaps of two characters, to indicate the region each tile represents and to aid orientation. Otherwise it may require interactivity to facilitate such annotations. Legends explaining the colour associations should ideally be placed as close to the map display as possible.

COLOUR: If colour is being used to distinguish the different categories, ensure they are as visibly different as possible.

COMPOSITION: The main challenge is to find the most appropriate and representative tile–region relationship (what is the right amount and geographical level for each constituent tile?) and to optimise the best-fit collective layout that preserves as many of the legitimate neighbouring regions as possible.

VARIATIONS & ALTERNATIVES

‘Hexagon bin maps’ are specific deployments of the grid map that offer a layout formed by a high resolution of smaller hexagons to preserve localised details. Beyond geographical space, the grid map approach is applicable to any spatial analysis, such as in sports.
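Binning individual point locations into hexagons can be done with a widely used approach for pointy-top hexagon grids. Each point is converted to fractional hexagonal (axial) coordinates, which are then rounded to the nearest hexagon.

This minimal sketch assumes points are already in projected map units. `size` is taken to be the hexagon's centre-to-corner distance; smaller sizes give the higher resolution the text describes. The function names are hypothetical. The resulting counts per hexagon can then be coloured like any other grid map tile.

```python
import math
from collections import Counter

def cube_round(q, r):
    """Round fractional axial coordinates (q, r) to the nearest whole hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so q + r + s == 0
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def pixel_to_hex(x, y, size):
    """Axial (q, r) of the pointy-top hexagon of the given size containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return cube_round(q, r)

def hexbin(points, size):
    """Count how many (x, y) points fall into each hexagon."""
    return Counter(pixel_to_hex(x, y, size) for x, y in points)

# Hypothetical points: two near the origin, one at the centre of the next hexagon east
counts = hexbin([(0.0, 0.0), (0.1, 0.2), (math.sqrt(3), 0.0)], size=1.0)
```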


6.3 Influencing Factors and Considerations

Having covered the fundamentals of visual encoding and profiled many chart type options that deploy different encoding combinations, you now need to consider the general factors that will influence your specific choices for which chart or charts to use for your data representation.

Choosing which chart type(s) to use is, inevitably, not a single-factor decision. Rather, as ever with data visualisation, it is an imperfect recipe made up of many ingredients. A pragmatic balance has to be found somewhere between taking on board the range of influencing factors that shape selections and not becoming frozen with indecision caused by the burden of having to consider so many different issues.

Firstly, you need to reflect on the relevant factors that emerge from the first three ‘preparatory’ stages of the design process and then supplement this by addressing the guidance offered by the three visualisation design principles introduced in Chapter 1. It must be emphasised that there are no direct answers provided for you here, simply guidance. How you might resolve the unique challenges posed by your project has to be something you arrive at yourself.

Formulating Your Brief

Skills and resources, frequency: What charts can you actually make and how efficiently can you create them? This is the big question. Having the ability to create a broad repertoire of different chart types is the vocabulary of this discipline; judging when to use them is the literacy. What will have a great influence on the ambitions of the type of charts you might employ is the ‘expressiveness’ of your abilities and that of the technology (applications, programs, tools) you have access to. Expressiveness is a term I first heard used in this context by Arvind Satyanarayan, a Computer Science PhD candidate at Stanford University. It describes the amount of variety and extent of control you are provided with by a given technology in the construction of your visualisation solution, so long as you also possess the necessary skills to exploit such features, of course:

In a data representation context, maximum expressiveness means you can create any combination of mark and attribute encoding to display your data – that is, you can create many different charts. Programming libraries like D3.js and open source tools like R offer broad libraries of different chart options and customisations. The drawing-by-hand nature of Adobe Illustrator would similarly enable you to create a wide range of solutions (though unquestionably more manual in effort and less replicable).

Restricted expressiveness means you have much more limited scope to adapt different mark and attribute encodings. Indeed you might be faced with assigning data to the fixed encoding options afforded by a modest menu of chart types. A tool like Excel has a relatively limited range of (useful) chart types in its menu. While there are ways of enhancing the options through plugins and different ‘workaround’ techniques that broaden its scope, it is a relatively limited tool. It may, however, suffice for most people’s visualisation ambitions. Elsewhere, there are many web-based visualisation creation tools which are of value for those who want quick and simple charting, though they certainly reduce the range of options and the capability to customise their appearance.

‘The capability to cope with the technological dimension is a key attribute of successful students: coding – more as a logic and a mindset than a technical task – is becoming a very important asset for designers who want to work in Data Visualization. It doesn’t necessarily mean that you need to be able to code to find a job, but it helps a lot in the design process. The profile in the (near) future will be a hybrid one, mixing competences, skills and approaches currently separated into disciplinary silos.’ Paolo Ciuccarelli, discussing students on his Communication Design Master Programme at Politecnico di Milano

As you reflect on the gallery of charts, my advice would be to perform an assessment of the charts you can make using a scoring system as follows:


For any of the charts that fail to score 3 points, here are some strategies for dealing with this:

Tools are continually being enhanced. The applications you use now that cannot create, for example, a Sankey diagram, may well offer that in the next release. So wait it out!

For those charts that currently score 1 or 0 points, look around the web for examples of workaround approaches that will help you achieve them. For example, you might use conditional formatting in an Excel worksheet to create a rudimentary heat map. This is not a chart type offered as standard within the tool but represents an innovative solution through appropriating existing features intended to serve other purposes. Any such solutions, though, have to be framed by the frequency of your work – will this work realistically need to be replicable and repeatable (for example, every month) and does my solution make that achievable?

Invest time in developing skills in the other tools to broaden your repertoire. Tools like R have a large community of users sharing code, tutorials and examples, resources that would greatly help to facilitate your learning.

Lower your ambitions. Sometimes the most significant discipline to demonstrate is acknowledging what you cannot do and accepting that (at least, for now) you might need to sacrifice the ideal choices you would make for more pragmatic ones.

Purpose: Should you even seek to represent your data in chart form? Will it add any value, enabling new insights or greater perceptual efficiency compared with its non-visualised form? Will portraying your data via an elegantly presented table, offering the viewer the ability to look up and reference values, actually offer a more suitable solution? Do not rule out the value of a table. Additionally, perhaps you are trying to represent something in chart form that would actually be better displayed through information-based (rather than data-based) explanations using imagery, textual anecdotes, video and photos? Most of the time the charting of data will be fit for purpose, but just keep reminding yourself that you do not have to chart everything – just make sure you are doing it to add value.

‘I was in the middle of this huge project, juggling as fast and as focused as I could, and I had this idea of a set of charts stuck in my head that kept resurfacing. And then, as we were heading close to deadline, I realized I couldn’t do it. I failed. I couldn’t make it work. Because we had pictures of the children, and that was enough … I had to let it go.’ Sarah Slobin, Visual Journalist, discussing a project profiling a group of families with children who have a fatal disease

Purpose map: In defining the ‘tone’ of the project, you were determining what the optimum perceptibility of your data would be for your audience. Were you aiming to facilitate the reading of the data or more a general feeling for the data? Were you concerned with enabling precise and accurate perceptions of values, or was it more about the sense-making of the big, medium and small judgements – getting the ‘gist’ of values more than reading back the values? Were there emotional qualities that you wanted to emphasise, perhaps at the expense of perceptual efficiency? Maybe there was a balance between the two?

How these tonal definitions apply specifically to data representation requires our appreciation of some fundamental theory about data visualisation. In his book Sémiologie graphique, published in 1967, Jacques Bertin was the first, most notable author to propose the idea that different ways of encoding data might offer varying degrees of effectiveness in perception. In 1984 William Cleveland and Robert McGill published a seminal paper, ‘Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods’, that offered more empirical evidence of Bertin’s thoughts. They produced a general ranking that explained which attributes used to encode quantitative values would facilitate the highest degree of perceptual accuracy. In 1986, Jock Mackinlay’s paper, ‘Automating the Design of Graphical Presentations of Relational Information’, further extended this to include proposed rankings for encoding categorical nominal and categorical ordinal data types as well as quantitative ones. The table shown in Figure 6.57, adapted from Mackinlay’s paper, presents the ‘Ranking of Perceptual Tasks’.

In a nutshell, this ancestry of studies reveals that certain attributes used to encode data may make it easier, and others may make it harder, to judge accurately the values being portrayed. Let’s illustrate this with a couple of examples. Looking at Figure 6.58, ask yourself: if A is 10, how big is B in the respective bar and circular displays?

In both cases the answer is B = 5, but while the B ‘bar’ being 5 feels about right, the idea that the B ‘circle’ is 5 does not feel quite right. That is because our ability to perform relative judgements for the length of bars is far more precise and accurate than the relative judgements for the area of circles. This is explained by the fact that when judging the variation in size of a line (bar) you are detecting change in a linear dimension, whereas the variation in size of a geometric area (circle) occurs across a quadratic dimension. If you look at the rankings in Figure 6.57 in the ‘Quantitative’ column, you will see the encoding attribute of Length is ranked higher than the attribute of Area.

Figure 6.57 The Ranking of Perceptual Tasks


Figure 6.58 Comparison of Judging Line Size vs Area Size

Now let’s consider an example (Figure 6.59) that shows the relative accuracy of using different attributes (colour hue and shape) to represent categorical nominal values. In the next pair of charts you can see different attributes being used to represent the categorical groupings of the points in the respective scatter plots. On the left you can see variation in the attribute of colour hue (blue, orange and green) to separate the categories visually; on the right you will see the attribute of shape (diamond, circle and square) applied to the same category groupings. What you should be experiencing is a far more immediate, effortless and accurate sense of the groupings of the coloured category markers compared with the shaped category markers. It is simply easier to spot the associations through variation in colour than variation in shape. This explains why colour hue is much higher in the proposed rankings for nominal data than shape.

Figure 6.59 Comparison of judging related items using variation in colour (hue) vs variation in shape

So you can see from these simple demonstrations that there are clearly ways of encoding data that will make it easier to read values accurately and efficiently. However, as Cleveland and McGill stress in their paper, this ranking should be taken as only one ingredient of guidance: ‘The ordering does not result in a precise prescription for displaying data but rather is a framework within which to work’.

This is important to note because you have to take into account other factors. You have to decide whether precise perceiving is actually what you need to facilitate for your readers. If you do, then the likes of the bar chart – through the variation in length of a bar – will evidently offer a very precise approach. As stated in Chapter 3, that is why they are such an important part of your visual artillery.

However, sometimes getting a ‘gist’ of the data is sufficient. A few pages ago I presented an image of a bubble chart on my website’s home page, showing the popularity of my blog posts over the previous 100-day period. The purpose of this display was purely to give visitors a sense of the general order of magnitude from the most popular to the least popular posts. I do not need visitors to form a precise understanding of absolute values or exact rankings. I just want them to get a sense of the ranking hierarchy. I can therefore justify moving down the quantitative attribute rankings proposed and deploy a series of circles that encode the visitor totals through the size of their area (colour is used to represent different article categories). The level of perceptibility (accuracy and efficiency) that I need to facilitate is adequately achieved by the resulting ‘frogspawn’-like display. Furthermore, it offers an appealing and varied display that suits the purpose of this front-page navigation device.

In practice, what all this shows is that chart types vary in the relative efficiency and accuracy of perception offered to a viewer. Moreover, many of the charts shown in the gallery can only ever facilitate a gist of the values of data, due to the complexity of their mark and attribute combinations and the amount of data values they might typically contain (e.g. the treemap often has many parts of a whole in a single display). It is up to you to judge what the right threshold is for your purpose.

Working With Data

Data examination: Inevitably, the physical characteristics of your data are especially influential. What types of data you are trying to display will have a significant impact on how you are able to show them. Only certain types of data can fit into certain chart types; only certain chart types can accommodate certain types of data. That is why it is often most useful practically to think of this task in terms of chart types, and particularly in terms of these as templates, able to accommodate specific types of data.

For example, representing data through a bar chart requires one categorical variable (e.g. department) and one quantitative variable (e.g. maximum age). If you want to show a further categorical variable (let’s say, to break down departments by gender) you are going to need to switch ‘template’ and use something like a clustered bar chart which can accommodate this extra dimension.

I explained earlier how the shape of data influenced the viability of the flower metaphor used in the ‘Better Life Index’. The range of categorical and quantitative values will certainly influence the most appropriate chart type choice. For example, suppose you want to show some part-to-whole analysis and you have only three parts (three sub-categories belonging to the major category or whole). A treemap really does not make a great deal of sense here, as treemaps are better at representing many parts of a whole. The unloved pie chart would probably suffice if the percentage values were quite diverse; otherwise the bar chart would be best.


Beyond the size and shape of your data you also might be influenced by its inherent meaning. Sometimes, you will have scope in your encoding choices to incorporate a certain amount of visual immediacy in accordance with your topic. The flowers of the Better Life Index feel consistent in metaphor with the idea of better life: the more in bloom the flowers, the more colourful and proud each petal appears and the better the quality of life in that country. There is a congruence between subject matter and visual form. Think about the billionaires’ project from earlier in the chapter, with rankings displayed by industry. Each point marking each billionaire was a small caricature face. This is not necessary – a small circular mark for each person would have been fine – but using a face for the mark creates a more immediate recognition that the subject matter is about people.

Data exploration: One consistently useful pointer towards how you might visually communicate your data to others is to consider which techniques helped you to unearth key insights when you were visually exploring the data. What chart types have you already tried out and maybe found to reveal interesting patterns? Exploratory data analysis is, in many ways, a bridge to visual communication: the charts you use to inform yourself often represent prototype thinking on how you might communicate with others. The design execution may end up being different once you introduce the influence of audience characteristics into your thinking, naturally, but if a method is already working, why not utilise the same approach again?

‘Effective graphics conform to the Congruence Principle according to which the content and format of the graphic should correspond to the content and format of the concepts to be conveyed.’ Barbara Tversky and Julie Bauer Morrison, taken from Animation: Can it Facilitate?

Establishing Your Editorial Thinking

Angle: When articulating the angles of analysis you intend to portray to your viewers, you are effectively dictating which chart types might be most relevant. If you intend to show how quantities have changed over time, for example, there will be certain charts best placed to portray that and many others that will not. Expressing your desired editorial angles of analysis in language terms will be extremely helpful in identifying the primary families of charts across the CHRTS taxonomy that will provide the best option.

It is vital to treat every representation challenge on its own merits – do not fall into the trap of going through the motions. Just because you have spatial data does not mean that the most useful portrayal of that data will be via a map. If the interesting insights are not regionally and spatially significant, then the map may not provide the most relevant window on that data. The composition of a map – the shape, size and positioning of the world’s regions – is so diverse, inconsistent and truly non-uniform that it may hinder your analysis rather than illuminate it. So always make sure you have carefully considered the relevance of your chosen angle through your editorial thinking.

Trustworthy Design

Avoiding deception: In the discussion about tone I explained how variations in the potential precision of perception may be appropriate for the purpose and context of your work. Precision in perception is one thing, but precision in design is another. Being truthful and avoiding deception in how you portray data visually are fundamental obligations.

There are many ways in which viewers can be deceived through incorrect and inappropriate encoding choices. The main issues around deception tend to concern encoding the size of quantities. For beginners, these mistakes can be entirely innocent and unintended but need to be eradicated immediately.

Geometric calculations – When using the area of shapes to represent different quantitative values, the underlying geometry needs to be calculated accurately. One of the common mistakes when using circles, for example, is simply to modify the diameters: if a quantitative value increases from 10 to 20, just double the diameter, right? Wrong. That geometric approach would be a mistake because, as viewers, when perceiving the size of a circle, it is the area, not the width, of the circle upon which we base our estimates of the quantitative value being represented.

Figure 6.60 Illustrating the correct and incorrect circle size encoding


The illustration in Figure 6.60 shows the incorrect and correct ways of encoding two quantitative values through circle size, where the value of A is twice that of B. The orange circle for B has half the diameter of A; the green circle for B has half the area of A. The green circle area calculations are the correct way to encode these two values, whereas the orange circle calculations disproportionately shrink circle B by halving the diameter rather than halving the area. Because area grows with the square of the diameter, halving the diameter leaves B with only a quarter of A’s area, which makes it appear much smaller than its true value.
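The correct geometry can be checked numerically. This minimal sketch assumes circle A is drawn with a reference radius. Because area is pi times the radius squared, the radius must scale with the square root of the value ratio. The hypothetical helper names below contrast that with the mistaken approach of scaling the radius (and so the diameter) directly.

```python
import math

def area_scaled_radius(value, reference_value, reference_radius):
    """Correct: radius scales with sqrt(value ratio), so circle AREA is proportional to value."""
    return reference_radius * math.sqrt(value / reference_value)

def diameter_scaled_radius(value, reference_value, reference_radius):
    """Incorrect: radius (and diameter) scale linearly with the value."""
    return reference_radius * value / reference_value

def circle_area(radius):
    return math.pi * radius ** 2

# A = 20 drawn with radius 10; B = 10 is half of A
r_a = 10.0
r_b_correct = area_scaled_radius(10, 20, r_a)    # about 7.07
r_b_wrong = diameter_scaled_radius(10, 20, r_a)  # 5.0

correct_ratio = circle_area(r_b_correct) / circle_area(r_a)  # about one half
wrong_ratio = circle_area(r_b_wrong) / circle_area(r_a)      # about one quarter
```

When a plotting tool sizes markers for you, check whether its size setting is mapped to area or to radius before relying on it. For example, the `s` argument of Matplotlib’s `scatter` is specified as an area, in points squared.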

3D decoration – In the vast majority of circumstances the use of 3D charts is at best unnecessary and at worst hugely distorting in the display of data. I have some empathy for those who might volunteer that they have made and/or like the look of 3D charts. In the past I did too. Sometimes we don’t know not to do something until we are told. So this is me, here and now, telling you.

The presence of 3D in visualisation tends to be motivated by a desire to demonstrate technical competence with the features of a tool in terms of ‘look how many things I know how to do with this tool!’ (users of Excel, I am pointing an accusatory finger at you right now). It is also driven by the appetite of rather unsophisticated viewers who are still attracted by the apparent novelty of 3D skeuomorphic form. (Middle and senior management of the corporate world, with your ‘make me a fancy chart’ commands, my finger of doom is now pointing in your direction.)

Using pseudo-3D effects in your charts when you have only two dimensions of data means you are simply decorating data. And when I say ‘decorating’, I mean this with the same sneer that would greet memories of avocado green bathrooms in 1970s Britain. A 3D visualisation of 2D data is gratuitous and distorts the viewer’s ability to read values with any acceptable degree of accuracy. As illustrated in Figure 6.61, when estimating the angles and segments in the respective pie charts, the 3D version makes it much harder to form accurate judgements. The tilting of the isometric plane amplifies the front part of the chart and diminishes the back. It also introduces a raised ‘step’ which is purely decorative, thus further distorting the judgement of the segment sizes.

Figure 6.61 Illustrating the Distortions Created by 3D Decoration

Furthermore, for charts based on three dimensions of data, 3D effects should be considered if, and only if, the viewer is provided with the means to move around the chart object to establish different 2D viewing angles, and the collective representation of all three dimensions of data makes sense in showing a whole ‘system’.

Truncated axis scales – When quantitative values are encoded through the height or length components of size (e.g. for bar charts and area charts), truncating the value axis (not starting the range of quantitative values from the true origin of zero) distorts the size judgements. I will look at this in more detail in the chapter on composition because it is ultimately more about the size considerations of scales and the deployment of chart apparatus than just the representation choices.
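To see how much distortion truncation introduces, the short Python sketch below compares the ratio of two drawn bar lengths with and without a zero origin. The values 52 and 48 and the truncated baseline of 45 are hypothetical, chosen only for illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of two drawn bar lengths when the value axis starts at `baseline`.

    On screen a bar's length is proportional to (value - baseline), so any
    baseline above zero exaggerates the visual difference between bars.
    """
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (a - baseline) / (b - baseline)


print(round(apparent_ratio(52, 48), 2))               # 1.08: true ratio, zero origin
print(round(apparent_ratio(52, 48, baseline=45), 2))  # 2.33: axis truncated at 45
```

With a zero origin the longer bar is drawn about 8% longer, matching the data; starting the axis at 45 makes it look more than twice as long.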

Accessible Design


The bullet chart is a derivative of the bar chart – the older, more sophisticated brother of the idiot gauge chart – but I didn’t think it was necessary to profile it as a separate chart type.

Encoded overlays: Beyond the immediate combinations of marks and attributes that comprise a given chart type, you may find value in incorporating additional detail to help viewers with the perceiving and interpretation task. Encoded overlays are useful to help explain further the context of values and amplify the interpretation of the good and the bad, the normal and the exceptional. In some ways these features might be considered forms of annotation, but as they represent data values (and therefore require encoding choices) it makes sense to locate these options within this chapter. There are many different types of visual overlays that may be useful to include:

Figure 6.62 Example of a Bullet Chart Using Banding Overlays

Figure 6.63 Excerpt from ‘What’s Really Warming the World?’

Bandings – These are typically shaded areas that provide some sense of contrast between the main data value marks and contextual judgements of historic or expected values. In a bullet chart (Figure 6.62) there are various shaded bands that might help to indicate whether the bar’s value should be considered bad, average or good. In the line chart (Figure 6.63) you can see the observed rise in global temperatures. To facilitate comparison with potentially influencing factors, in the background there is a contextual overlay showing the change in greenhouse gases, with banding to indicate the 95% confidence interval.

Markers – Adding points to a display might be useful to show comparisons against a target, a forecast or a previous value, or to highlight actual versus budget. Figure 6.64 shows a chart that facilitates comparisons against a maximum value marker.
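The logic behind a banded bullet chart with a target marker can be sketched in a few lines of Python. The band boundaries (40 and 70), the target (75) and the scores are hypothetical; only the ‘bad, average, good’ labels come from the description of Figure 6.62.

```python
from bisect import bisect_right


def band_for(value, band_starts, labels):
    """Return the qualitative band that a value falls into.

    `band_starts` holds the value at which each band after the first begins,
    in ascending order; `labels` therefore has one more entry.
    """
    if len(labels) != len(band_starts) + 1:
        raise ValueError("need exactly one more label than band boundaries")
    return labels[bisect_right(band_starts, value)]


# Hypothetical bullet chart: bands start at 40 and 70, target marker at 75
band_starts = [40, 70]
labels = ["bad", "average", "good"]
target = 75

for score in (35, 55, 82):
    versus_target = "meets target" if score >= target else "below target"
    print(score, band_for(score, band_starts, labels), versus_target)
```

Using `bisect_right` means a value sitting exactly on a boundary falls into the higher band; swap in `bisect_left` if your bands are defined the other way round.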

Figure 6.64 Example of Using Markers Overlays

Figure 6.65 Why Is Her Paycheck Smaller?


Reference lines – These are useful in any display that uses position or size along an axis as an attribute for a quantitative value. Line charts or scatter plots (Figure 6.65) are particularly enhanced by the inclusion of reference lines, helping to direct the eye towards calculated trends, constants or averages and, with scatter plots specifically, the lines of best fit or correlation.
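For a scatter plot, the line of best fit is commonly an ordinary least-squares line. The Python sketch below is an illustration under that assumption: it computes the line's endpoints across the plotted x range, plus an average line. The sample points are made up.

```python
def line_of_best_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reference_line(xs, ys):
    """Endpoints of the fitted line across the plotted x range."""
    slope, intercept = line_of_best_fit(xs, ys)
    x0, x1 = min(xs), max(xs)
    return (x0, slope * x0 + intercept), (x1, slope * x1 + intercept)


def average_line(ys):
    """A horizontal reference line drawn at the mean of the y values."""
    return sum(ys) / len(ys)


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(reference_line(xs, ys))
print(average_line(ys))
```

On Python 3.10 or later, the standard library's `statistics.linear_regression(xs, ys)` returns the same slope and intercept.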

Elegant Design

Visual appeal: This fits again with the thinking about ‘tone’ and may also be informed by some of the mental visualisations that might have formed in the initial stages of the process. Although you should not allow yourself to be consumed by ideas over the influence of the data, sometimes there is scope to squeeze out an extra sense of stylistic association between the visual and the content. For example, the ‘pizza’ pie chart in Figure 6.66 presents analysis about the political contributions made by companies in the pizza industry. The decision to use pizza slices as the basis of a pie chart makes a lot of sense. The graphic in Figure 6.67 displays the growth in online sales of razors. Like the pizzas, the notion of creating bar charts by scraping away lengths of shaving foam offers a clever, congruent and charming solution.

Figure 6.66 Inside the Powerful Lobby Fighting for Your Right to Eat Pizza

Figure 6.67 Excerpt from ‘Razor Sales Move Online, Away From Gillette’


Summary: Data Representation

Visual Encoding

All charts are based on combinations of marks and attributes:

Marks: represent records (or aggregations of records) and can be points, lines, areas or forms.

Attributes: represent variable values held for each record and can include visual properties like position, size, colour, connection.

Chart Types

If visual encoding is the fundamental theoretical understanding of data representation, chart types are the practical application. There are five families of chart types (CHRTS mnemonic): categorical, hierarchical, relational, temporal and spatial.


Influencing Factors and Considerations

Formulating the brief: skills and resources – what charts can you make and how efficiently? From the definitions across the ‘purpose map’, what ‘tone’ did you determine this project might demonstrate?

Working with data: what is the shape of the data and how might that impact on your chart design? Have you already used a chart type to explore your data that might prove to be the best way to communicate it to others?

Establishing your editorial thinking: what is the specific angle of the enquiry that you want to portray visually? Is it relevant and representative of the most interesting analysis of your data?

Trustworthy design: avoid deception through mistaken geometric calculations, 3D decoration, truncated axis scales and corrupt charts.

Accessible design: the use of encoded overlays, such as bandings, markers and reference lines, can aid readability and interpretation.

Elegant design: consider the scope of certain design flourishes that might enhance the visual appeal through the form of your charts whilst also preserving their function.

Tips and Tactics

Data is your raw material, not your ideas, so do not arrive at this stage desperate and precious about wanting to use a certain data representation approach.

Be led by the preparatory work (stages 1 to 3) but do use the chart type gallery for inspiration if you need to unblock!

Be especially careful in how you think about representing instances of zero, null (no available data) and nothing (no observation).

Do not be too proud to acknowledge when you have made a bad call or gone down a dead end.


7 Interactivity

The advancement of technology has entirely altered the nature of how we consume information. Whereas only a generation ago most visualisations would have been created exclusively for printed consumption, developments in device capability, Internet access and bandwidth performance have created an incredibly rich environment for digital visualisation to become the dominant output. The potential now exists for creative and capable developers to produce powerful interactive and engaging multimedia experiences for cross-platform consumption.

Unquestionably there is still a fundamental role for static (i.e. not interactive) and print-only work: the scope offered by digital simply enables you to extend your reach and broaden the possibilities. In the right circumstances, incorporating features of interactivity into your visualisation work offers many advantages:

It expands the physical limits of what you can show in a given space.

It increases the quantity and broadens the variety of angles of analysis to serve different curiosities.

It facilitates manipulations of the data displayed to handle varied interrogations.

It increases the overall control and potential customisation of the experience.

It amplifies your creative licence and the scope for exploring different techniques for engaging users.

The careful judgements that distinguish this visualisation design process must be especially discerning when handling this layer of the anatomy. Well-considered interactivity supports, in particular, the principle of ‘accessible’ design, ensuring that you are adding value to the experience, not obstructing the facilitation of understanding. Your main concern in considering potential interactivity is to ensure the features you deploy are useful. This is an easy thing to say about any context, but just because you can does not mean you should. For some who possess a natural technical flair, there is often too great a temptation to create interactivity where it is neither required nor helpful.


Having said that, beyond the functional aspects of interactive design thinking, depending on the nature of the project there can be value attached to the sheer pleasure created by thoughtfully conceived interactive features. Even if these contribute only ornamental benefit there can be merit in creating a sense of fun and playability, so long as such features do not obstruct access to understanding.

There is a lot on your menu when it comes to considering potential interaction design features. As before, ahead of your decision making about what you should do, you will first consider what you could do. To help organise your thinking, your options are divided into two main groups of features:

Data adjustments: Affecting what data is displayed.

Presentation adjustments: Affecting how the data is displayed.

There is an ever-increasing range of interfaces to enable interaction events beyond the mouse/touch, through gesture interfaces like the Kinect device, the Oculus Rift, wands and control pads. These are beyond the scope of this book, but it is worth watching out for developments in the future, especially with respect to the growing interest in exploring the immersive potential of virtual reality (VR).

When considering potential interactive features you first need to recognise the difference between an event, the control and the function. The event is the input interaction (such as a click), applied to a control (maybe a button) or element on your display, with the function being the resulting operation that is performed (such as filtering the data).
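The event, control and function distinction maps naturally onto an event-handler registry. The Python sketch below is hypothetical: the `Control` class and the sample records are invented for illustration and are not the API of any particular interface toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[dict]


@dataclass
class Control:
    """A display element (e.g. a button) that links events to functions."""
    name: str
    handlers: Dict[str, Callable[[Rows], Rows]] = field(default_factory=dict)

    def on(self, event: str, function: Callable[[Rows], Rows]) -> None:
        """Assign the function to run when this event hits the control."""
        self.handlers[event] = function

    def trigger(self, event: str, data: Rows) -> Rows:
        """Run the function for an event; unassigned events change nothing."""
        function = self.handlers.get(event)
        return function(data) if function else data


records = [{"country": "UK", "value": 3}, {"country": "US", "value": 8}]

button = Control("show-large-values")  # the control
# event ("click") -> function (filter the data)
button.on("click", lambda rows: [r for r in rows if r["value"] > 5])

print(button.trigger("click", records))      # filter runs: only the US record remains
print(button.trigger("mouseover", records))  # no function assigned: data unchanged
```

Keeping the three parts separate makes it easier to reassign a function to a different event, for example when a hover has no equivalent on a touch-screen.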

Where once we were limited to the mouse or the trackpad as the common peripheral, over the past few years the emergence of touch-screens in the shape of smartphones and tablets has introduced a whole new event vocabulary. For the purposes of this chapter we focus on the language of the mouse or trackpad, but here is a quick translation of the equivalent touch events. Note that arguably the biggest difference in assigning events to interactive data visualisations lies in the inability of touch-screens to register a mouseover (or ‘hover’) action.


7.1 Features of Interactivity: Data Adjustments

This first group of interactive features covers the various ways in which you can enable your users to adjust and manipulate your data. Specifically, they influence what data is displayed at a given moment.

I will temporarily switch nomenclature to ‘user’ in this chapter because a more active role is needed than ‘viewer’.

Framing: There is only so much one can show in a single visualisation display, and thus giving users the ability to modify criteria to customise what data is visible at any given point is a strong advantage. Going back to the discussion on editorial thinking in Chapter 5, this set of adjustments would specifically concern the ‘framing’ of what data to isolate, include or exclude from view.

For those of you familiar with databases, think of this group of features as similar in scope to modifying the criteria when querying data in a database.
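To make the database analogy concrete, here is a small Python sketch using the standard library's `sqlite3`. A value range set by axis handles and a set of ticked groups become a `WHERE` clause. The table, columns and records are invented for illustration.

```python
import sqlite3


def frame(conn, min_value, max_value, groups):
    """Return only the records that match the user's current criteria.

    Values are bound through '?' placeholders rather than pasted into the SQL.
    """
    if not groups:
        return []  # nothing ticked in the check-box list
    placeholders = ",".join("?" for _ in groups)
    sql = (
        "SELECT label, grp, value FROM records "
        f"WHERE value BETWEEN ? AND ? AND grp IN ({placeholders}) "
        "ORDER BY label"
    )
    return conn.execute(sql, (min_value, max_value, *groups)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (label TEXT, grp TEXT, value REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [("a", "north", 12.0), ("b", "south", 45.0),
     ("c", "north", 78.0), ("d", "east", 30.0)],
)

# Axis handles set to 10..50, and two groups ticked in a check-box list
print(frame(conn, 10, 50, ["north", "east"]))  # [('a', 'north', 12.0), ('d', 'east', 30.0)]
```

Every change to a filter control simply re-runs the query with new parameters, and the chart is redrawn from the rows returned.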


In ‘Gun Deaths’ (Figure 7.1), you can use the filters in the pop-up check-box lists at the bottom to adjust the display of selected categorical data parameters. The filtered data is then shown above the line, in isolation from all non-selected groups, which are shown below the line. The ‘Remove filters’ link can be used to reset the display to the original settings.

Figure 7.1 US Gun Deaths

In the bubble map view of the ‘FinViz’ stock market analysis site, you can move the handles along the axes to modify the maximum and minimum axis range, which effectively allows you to zoom in on the records that match these criteria. You can also use the dropdown menus to change the variables plotted on each axis.

Notice the subtle transparency of the filter menu (in Figure 7.1) so that it doesn’t entirely occlude the data displayed beneath.


Figure 7.2 FinViz: Standard and Poor’s 500 Index

Navigating: There are dynamic features that enable users to expand or explore greater levels of detail in the displayed data. This includes lateral movement and vertical drill-down capabilities.

You will see that many of these interactive projects include links to share the project (or a view of the project) with others via social media, or offer code to embed the work into other websites. This helps to mobilise distribution and open up wider access to your work.

Figure 7.3 The Racial Dot Map


The dot map in Figure 7.3, showing the 2010 Census data, displays population density across the USA. As a user you can use a scrollable zoom or scaled zoom to zoom in and out of different map view levels. The map can also be navigated laterally to explore different regions at the same resolution.

This act of zooming to increase the magnification of the view is known as a geometric zoom. This is considered a data adjustment because through zooming you are effectively re-framing the window of the included and excluded data at each level of view.

In the ‘Obesity Around the World’ visualisation (Figure 7.4), selecting a continent connector expands the sub-category display to show the marks for all constituent countries. Clicking on the same connector collapses the countries to revert to the main continent-level view.

The ‘Social Progress Imperative’ project (Figure 7.5) provides an example of features that enable users to view the tabulated form of the data – the highest level of detail – by selecting the ‘Data Table’ tab. The data adjustment taking place here is through providing access to the data in a non-visual form. Users can also export the data by clicking on the relevant button to conduct further local analysis.

Figure 7.4 Obesity Around the World

Animating: Data with a temporal component often lends itself to being portrayed via animated sequences. The data adjustment taking place here involves the shifting nature of the timeframe in view at any given point. Operations used to create these sequences may be automatic and/or manual in nature.

Figure 7.5 Excerpt from ‘Social Progress Index 2015’


This next project (Figure 7.6) plots NFL players’ height and weight over time using an animated heat map. When you land on the web page the animation automatically triggers. Once it has completed, you can also select the play button to recommence the animation, as well as moving the handle along the slider to manually control the sequence. The gradual growth in the physical characteristics of players is clearly apparent through the resulting effect.

Sequencing: In contrast to animated sequences of the same phenomena changing over time, there are other ways in which a more discrete sequenced experience can suit your needs. This commonly takes the form of letting users navigate through predetermined, different angles of analysis about a subject. As you navigate through the sequence a narrative is constructed. This is a quintessential example of storytelling with data, drawing on the metaphor of the anecdote: ‘this happened’ and then ‘this happened’…

Figure 7.6 NFL Players: Height & Weight Over Time


The project ‘How Americans Die’ (Figure 7.7) offers a journey through many different angles of analysis. Clicking on the series of ‘pagination’ dots and/or the navigation buttons will take you through a pre-prepared sequence of displays to build a narrative about this subject.

Figure 7.7 Excerpt from ‘How Americans Die’


Sometimes data exists in only two states: a before and after view. Using normal animated sequences would be ineffective – too sudden and too jumpy – so one popular technique, usually involving two images, employs the altering of the position of a handle along a slider to reveal/fade the respective views. This offers a more graduated sequence between the two states and facilitates comparisons far more effectively, as exhibited by the project shown in Figure 7.8.

A different example of sequencing – and an increasingly popular trend – is the vertical sequence. This article from the Washington Post (Figure 7.9) profiles the beauty of baseball player Bryce Harper’s swing and uses a very slick series of illustrations to break down four key stages of his swing action. As you scroll down the page it acts like a lenticular print or flip-book animation. Notice also how well judged the styles of the illustrations are.
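The two-state slider described for Figure 7.8 can work either as a fade or as a swipe that reveals one view over the other. The Python sketch below shows both, treating each state as a grid of numbers that stands in for image pixels or gridded values. The function names and the slider position `t` (0 to 1) are assumptions, not details of that project.

```python
def fade(before, after, t):
    """Cross-fade two equal-sized grids as the slider handle moves.

    t = 0 shows only `before`, t = 1 only `after`; positions in between
    mix the two linearly, cell by cell (pixel by pixel for an image).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("slider position t must lie between 0 and 1")
    return [
        [(1 - t) * b + t * a for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def reveal(before, after, t):
    """Swipe variant: columns to the left of the handle show `after`."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("slider position t must lie between 0 and 1")
    cut = round(t * len(before[0]))
    return [
        row_after[:cut] + row_before[cut:]
        for row_before, row_after in zip(before, after)
    ]


before = [[0, 0, 0, 0]]
after = [[10, 10, 10, 10]]
print(fade(before, after, 0.25))   # every cell a quarter of the way to `after`
print(reveal(before, after, 0.5))  # left half from `after`, right half from `before`
```

Because the user controls `t` directly, the transition is as gradual as they want it to be. This is the advantage over an automatic animation between the two states.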

Figure 7.8 Model Projections of Maximum Air Temperatures Near the Ocean and Land Surface


Figure 7.9 Excerpt from ‘A Swing of Beauty’


Contributing: So far, the features covered have let you modify the criteria for what data is included or excluded, dive deeper into the data, and move through sequenced views of that data. The final component of ‘data adjustment’ concerns contributing data. Some projects require user input, either for collecting further records to append and save to an original dataset or just for temporary (i.e. not held beyond the moment of usage) participation. Additionally, there may be scope to invite users to modify certain data in order to inform calculations or customise a display. In each case, the events and controls associated with this kind of interaction are designed to achieve one function: input data.


The first example, ‘How well do you know your area?’ (Figure 7.10) by ONS Digital, employs simple game/quiz dynamics to challenge your knowledge of your local area in the UK. Using the handle to modify the position along the slider, you input a quantitative response to the questions posed. Based on your response, it then provides feedback revealing the accuracy of your estimate.

Figure 7.10 How Well Do You Know Your Area?

In the next project (Figure 7.11), by entering personal details such as your birth date, country and gender into the respective input boxes, you learn about your place in the world’s population, with some rather sobering details about your past, present and future on this planet.

Figure 7.11 Excerpt from ‘Who Old Are You?’


Figure 7.12 shows an excerpt from ‘512 Paths to the White House’. In this project the toggle buttons are used to switch between three categorical data states (unselected, Democratic and Republican) to build up a simulated election outcome based on the user’s predictions for the winners in each of the key swing states. As each winner is selected, only the remaining possible pathways to victory for either candidate are shown.

Inevitably, data privacy and intended usage are key issues of concern for any project that involves personal details being contributed, so be careful to handle this with integrity and transparency.

Adjusting the position of the handle along the slider in the Better Life Index project (Figure 7.13) modifies the quantitative data value representing the weighting of importance you would attach to each quality of life topic. In turn, this modifies the vertical positioning of the country flowers based on the recalculated average quality of life.

Figure 7.12 512 Paths to the White House


Figure 7.13 OECD Better Life Index

7.2 Features of Interactivity: Presentation Adjustments

In contrast to the features of ‘data adjustment’, this second group of interactive features does not manipulate the data but rather lets you configure the presentation of your data in ways that assist interpretation and enhance the overall experience.


Focusing: Whereas the ‘framing’ features outlined previously modified what data would be included and excluded, ‘focus’ features control what data is visually emphasised and, sometimes, how it is emphasised. Applying such filters helps users select the values they wish to bring to the forefront of their attention. This may be through modifying the effect of depth through colour (foreground, mid-ground and background) or through a sorting arrangement. The main difference from the framing features is that no data is eliminated from the display; it is simply relegated in prominence or position.

Figure 7.14 Nobel Laureates

The example in Figure 7.14 provides a snapshot of a project which demonstrates the use of a focus filter. It enables users to select a radio button from the list of options to emphasise different cohorts of all Nobel Laureates (as of 2015). As you can see, the selections include filters for women, shared winners and those who were still living at the time. The selected Laureates are not coloured differently; rather, the unselected values are significantly lightened to create the contrast.
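The difference between a focus filter and a framing filter can be shown in a few lines of Python. The records, the `woman` and `shared` fields and the dimmed opacity of 0.2 are hypothetical placeholders, not the data or settings behind Figure 7.14.

```python
def focus(records, is_selected, dim_opacity=0.2):
    """Focus filter: attach an opacity to every record, removing none.

    Selected records stay fully opaque; unselected ones are lightened.
    """
    return [
        {**r, "opacity": 1.0 if is_selected(r) else dim_opacity}
        for r in records
    ]


def frame_filter(records, keep):
    """For contrast, a framing filter removes non-matching records."""
    return [r for r in records if keep(r)]


laureates = [
    {"name": "Laureate A", "woman": True, "shared": False},
    {"name": "Laureate B", "woman": False, "shared": True},
    {"name": "Laureate C", "woman": False, "shared": False},
]

focused = focus(laureates, lambda r: r["woman"])
print([(r["name"], r["opacity"]) for r in focused])
print(len(focused), len(frame_filter(laureates, lambda r: r["woman"])))  # 3 vs 1
```

All three marks survive the focus filter, so the unselected Laureates still provide context, just with less prominence.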

Figure 7.15 Geography of a Recession

The project shown in Figure 7.15, titled ‘Geography of a Recession’, allows users to select a link from the list of filters provided on the left to emphasise different cohorts of counties across the USA. Once again, the selected counties are not coloured differently here; instead, the unselected regions are de-emphasised by washing out their original shades.

Figure 7.16 How Big Will the UK Population be in 25 Years’ Time?


‘Brushing’ data is another technique used to apply focus filters. In this next example (Figure 7.16), looking at the UK Census estimates for 2011, you use the cursor to select a range of marks from within the ‘violin plot’ display in order to view calculated statistics of those chosen values below the chart.

The next example (Figure 7.17), portraying increases or cuts in Workers’ Compensation benefits by US state, demonstrates a technique known as ‘linking’, whereby hovering over a mark in one chart display highlights an associated mark in another chart to draw attention to the relationship. In this case, hovering over a state circle in any of the presented ‘grid maps’ highlights the same state in the other two maps to draw your eye to their respective statuses. You might also see this technique combined with a brushing event to choose multiple data marks and then highlight all associations between charts, as also demonstrated in the population ‘violin plot’ in Figure 7.16.
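Brushing and linking can be sketched in Python as follows. A brushed range selects records and produces summary statistics, and the same record identifiers are then looked up in a second chart. The ages and identifiers are made up for illustration; the statistics shown (count, mean, median) are an assumption about what such a display might report.

```python
import statistics


def brush(values_by_id, low, high):
    """Select the records whose values fall inside the brushed range."""
    if low > high:
        low, high = high, low  # the brush may be dragged right-to-left
    return {k: v for k, v in values_by_id.items() if low <= v <= high}


def summarise(selection):
    """Statistics for the brushed values, to show beneath the chart."""
    values = list(selection.values())
    if not values:
        return None
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def linked_highlight(selection, other_chart_ids):
    """Linking: the same records to highlight in a second chart."""
    return [i for i in other_chart_ids if i in selection]


ages = {"p1": 3, "p2": 17, "p3": 25, "p4": 34,
        "p5": 41, "p6": 58, "p7": 66, "p8": 79}
chosen = brush(ages, 60, 20)  # dragged from 60 back to 20
print(summarise(chosen))      # count 4, mean 39.5, median 37.5
print(linked_highlight(chosen, ["p6", "p1", "p4"]))  # ['p6', 'p4']
```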


Figure 7.17 Excerpt from ‘Workers’ Compensation Reforms by State’

Sorting is another way of emphasising the presentation of data. In Figure 7.18, featuring work by the Thomson Reuters graphics team, ‘ECB bank test results’, you see a tabular display with sorting features that allow you to reorder the data by clicking on the column headers. For categorical data this will sort values alphabetically; for quantitative data, by value order. You can also hand-pick individual records from the table to promote them to the top of the display to facilitate easier comparisons through closer proximity.
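The sorting behaviour described for Figure 7.18 can be sketched in Python as below. Text columns sort alphabetically, numeric columns sort by value, and hand-picked rows are promoted to the top. The bank names and the `ratio` column are hypothetical, not the Reuters data, and the sketch assumes each row has a unique `name`.

```python
def sort_table(rows, column, descending=False, pinned=()):
    """Sort table rows by a clicked column, keeping hand-picked rows on top.

    Text columns sort alphabetically (ignoring case); numeric columns sort
    by value. Pinned rows stay at the top in the order they were picked.
    """
    by_name = {r["name"]: r for r in rows}
    top = [by_name[name] for name in pinned if name in by_name]
    rest = [r for r in rows if r["name"] not in pinned]

    def sort_key(row):
        value = row[column]
        return value.casefold() if isinstance(value, str) else value

    return top + sorted(rest, key=sort_key, reverse=descending)


banks = [
    {"name": "Bank C", "ratio": 8.1},
    {"name": "Bank A", "ratio": 12.4},
    {"name": "bank B", "ratio": 5.6},
]
print([r["name"] for r in sort_table(banks, "name")])
print([r["name"] for r in sort_table(banks, "ratio", descending=True,
                                     pinned=("bank B",))])
```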

Linking and brushing are particularly popular approaches used for exploratory data analysis where you might have several chart panels and wish to see how a single record shows up within each display.

Annotating: As you saw in the previous chapter on data representation, certain combinations of marks and attributes may only provide viewers with a sense of the order of magnitude of the values presented. This might be entirely consistent with the intended tone of the project. However, with interactivity, you can at least enable viewers to interact with marks to view more details momentarily.


This temporary display is especially useful because most data representations are already so busy that permanently including certain annotated apparatus (like value labels, gridlines, map layers) would overly clutter the display.

Figure 7.18 Excerpt from ‘ECB Bank Test Results’

The example in Figure 7.19 profiles the use of language throughout the history of US Presidents’ State of the Union addresses, using circle sizes to encode the frequency of different word mentions, giving a gist of the overall quantities and how patterns have formed over time. By hovering over each circle you get access to a tooltip dialogue box which reveals annotations such as the exact word-use quantities and extra contextual commentary.

One issue to be aware of when creating pop-up tooltips is ensuring that where they appear does not obstruct the view of important data in the chart beneath. This can be especially intricate to handle when you have a lot of annotated detail to share. One tactic is to utilise otherwise-empty space on your page display, occupying it with temporary annotated captions only when triggered by a select or hover event from within a chart.

Orientating: A different type of interactive annotation comes in the form of orientation devices, helping you to make better sense of your location within a display – where you are or what values you are looking at. Some of these functions naturally supplement the navigation features listed in the previous section on ‘data adjustment’.
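One simple placement rule for tooltips is to put the box beside the hovered mark rather than on top of it, flipping sides when it would run off the screen. The Python sketch below assumes screen coordinates with y increasing downwards and an 8-pixel gap. Both are illustrative choices. It only keeps the hovered mark's own position uncovered in ordinary cases and keeps the box on screen; avoiding other important data would need further checks.

```python
def place_tooltip(mark_x, mark_y, tip_w, tip_h, view_w, view_h, gap=8):
    """Return the top-left corner for a tooltip next to a hovered mark.

    Prefers the space to the right of and above the mark, flips to the
    left or below when the box would overflow the viewport, then clamps
    the result so the box always stays on screen (y increases downwards).
    """
    x = mark_x + gap
    if x + tip_w > view_w:
        x = mark_x - gap - tip_w   # flip to the left of the mark
    y = mark_y - gap - tip_h
    if y < 0:
        y = mark_y + gap           # flip below the mark
    x = max(0, min(x, view_w - tip_w))
    y = max(0, min(y, view_h - tip_h))
    return x, y


print(place_tooltip(100, 100, 50, 20, 400, 300))  # (108, 72): right of and above
print(place_tooltip(380, 100, 50, 20, 400, 300))  # (322, 72): flipped to the left
print(place_tooltip(100, 10, 50, 20, 400, 300))   # (108, 18): flipped below
```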

Figure 7.19 Excerpt from ‘History Through the President’s Words’

This snapshot, again from the ‘How Americans Die’ project (Figure 7.20), dynamically reveals the values of each mark (both x and y values) in this line chart depending on the hover position of the cursor. This effect is reinforced by visual guides extending out to the axes from the current position.
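A hover readout like this typically converts the cursor's pixel position back into a data value and then finds the nearest data point. The Python sketch below assumes a linear x scale and sorted x values; the plot dimensions, years and values are hypothetical.

```python
from bisect import bisect_left


def pixel_to_data(px, plot_left, plot_width, x_min, x_max):
    """Invert a linear x scale: screen pixel -> data value."""
    return x_min + (px - plot_left) / plot_width * (x_max - x_min)


def nearest_point(xs, ys, cursor_x):
    """Return the (x, y) point whose x is closest to the cursor.

    `xs` must be sorted ascending; bisection keeps the lookup fast even
    for long time series.
    """
    i = bisect_left(xs, cursor_x)
    if i == 0:
        return xs[0], ys[0]
    if i == len(xs):
        return xs[-1], ys[-1]
    j = i if xs[i] - cursor_x < cursor_x - xs[i - 1] else i - 1
    return xs[j], ys[j]


years = [2000, 2005, 2010]
values = [1.0, 2.0, 3.0]
# Cursor 140 px across a plot whose x axis starts at 50 px and is 200 px wide
cursor_year = pixel_to_data(140, 50, 200, 2000, 2010)
print(cursor_year, nearest_point(years, values, cursor_year))
```

The returned point gives both the x and y values for the readout, and the positions at which to draw the guide lines out to each axis.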


Figure 7.20 Excerpt from ‘How Americans Die’

Figure 7.21 Twitter NYC: A Multilingual Social City

Figure 7.21 displays the language of tweets posted over a period of time from the New York City area. Given the density and number of data points, displaying the details of the mapping layer would be quite cluttered, yet this detail would provide useful assistance for judging the location of the data patterns. The effective solution employed lets you access both views by providing an adjustable slider that allows you to modify the transparency of the network of roads to reveal the apparatus of the mapping layer.

Figure 7.22 Killing the Colorado: Explore the Robot River


Finally, as mentioned in the previous section, navigating through digital visualisation projects increasingly uses a vertical landscape to unfold a story (some term this ‘scrollytelling’). Navigation is often seamlessly achieved by using the scroll wheel to move up and down through the display. To assist with orientation, especially when you have a limited field of view of a spatial display, a thumbnail image might be used to show your current location within the overall journey to give a sense of progress. The project featured in Figure 7.22 is a great example of the value of this kind of interface, providing a deep exploration of some of the issues impacting on the Colorado River.
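The thumbnail progress indicator comes down to two proportions. First, how far the reader has scrolled through the scrollable height of the page. Second, where the ‘you are here’ box should sit within the thumbnail. The Python sketch below uses hypothetical page, window and thumbnail sizes in pixels.

```python
def scroll_progress(scroll_top, page_height, viewport_height):
    """Fraction of the vertical story scrolled through, from 0.0 to 1.0."""
    scrollable = page_height - viewport_height
    if scrollable <= 0:
        return 1.0  # the whole page fits in view
    return max(0.0, min(1.0, scroll_top / scrollable))


def thumbnail_marker_top(progress, thumb_height, marker_height):
    """Top offset of the 'you are here' box inside the thumbnail image."""
    return progress * (thumb_height - marker_height)


# A 5,000 px story in a 1,000 px window, scrolled 2,000 px down
p = scroll_progress(2000, 5000, 1000)
print(p, thumbnail_marker_top(p, thumb_height=200, marker_height=20))
```

Dividing by the scrollable height rather than the full page height means the marker reaches the bottom of the thumbnail exactly when the reader reaches the end of the story.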

7.3 Influencing Factors and Considerations

You now have a good sense of the possibilities for incorporating interactive features into your work, so let’s turn to consider the factors that will have most influence on which of these techniques you might need, or choose, to apply.

Formulating Your Brief

Skills and resources: Interactivity is unquestionably something that many people aspire to create in their visualisation work, but it is greatly influenced by the skills you possess, the technology you have access to and what that technology offers. These will be the factors that ultimately shape your ambitions. Remember, even in common desktop tools like Excel and PowerPoint, which may appear more limited on this front, there are ways to incorporate interactive controls (e.g. using VBA in Excel) to offer various adjustment features (e.g. links within PowerPoint slides to create sequences and navigate to other parts of a document).

Timescales: It goes without saying that if you have a limited timeframe in which to complete your work, even with extensive technical skills you are going to be rather pushed in undertaking any particularly ambitious interactive solutions. Just because you want to does not mean that you will be able to.

Setting: Does the setting in which the visualisation solution will be consumed lend itself to the inclusion of an interactive element to the experience? Will your audience have the time and know-how to take full advantage of multi-interactive features, or is it better to look to provide a relatively simpler, singular and more immediate static solution?

Format: What will be the intended output format that this project needs to be created for? What device specifications will it need to work across? How adaptable will it need to be?

The range and varied characteristics of modern devices present visualisers (or perhaps more appropriately, at this stage, developers) with real challenges. Getting a visualisation to work consistently, flexibly and portably across device types, browsers and screen dimensions (smartphone, tablet, desktop) can be something of a nightmare. Responsive design is concerned with integrating automatic or manually triggered modifications to the arrangement of contents within the display, and also to the type and extent of interactive features that are on offer. Your aim is to preserve as much continuity in the core experience as possible but also ensure that the same process and outcome of understanding can be offered to your viewers.

While the general trend across web design practice is heading towards a mobile-first approach, for web-based data visualisation developments there is still a strong focus on maximising the capabilities of the desktop experience and then, in some way, compromising the richness of the mobile experience.

For ProPublica’s work on ‘Losing Ground’ (Figure 7.23), the approach to cross-platform compatibility was based around the rule of thumb ‘smallify or simplify’. Features that worked on ProPublica’s primary platform of the desktop would have to be either simplified to function practically on the smartphone or simply reduced in size. You will see in the pair of contrasting images how the map display is both shrunk and cropped, and the introductory text is stripped back to include only the most essential information.
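A ‘smallify or simplify’ rule can be expressed as a layout decision keyed on screen width. The Python sketch below is a hypothetical illustration, not ProPublica’s implementation. The 600-pixel breakpoint, the 960-pixel maximum map width and the `Layout` fields are all assumptions. Hover tooltips are dropped on small screens because touch-screens cannot register a mouseover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    map_width: int
    crop_map: bool
    intro: str
    hover_tooltips: bool


def layout_for(viewport_width, full_intro, short_intro,
               breakpoint=600, max_map_width=960):
    """Pick a layout for the screen width: 'smallify or simplify'.

    At or above the breakpoint the desktop design is kept. Below it the map
    is shrunk to the screen and cropped, the introduction is cut back to its
    essentials, and hover tooltips are dropped because touch-screens cannot
    register a mouseover.
    """
    if viewport_width >= breakpoint:
        return Layout(min(viewport_width, max_map_width), False, full_intro, True)
    return Layout(viewport_width, True, short_intro, False)


print(layout_for(1280, "Full introduction text", "Key point only"))
print(layout_for(375, "Full introduction text", "Key point only"))
```

A real responsive design would usually also adapt the type of interaction on offer, for example replacing hover readouts with tap-to-select.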

Figure 7.23 Losing Ground

Other format considerations include whether your solution, though primarily intended for the Web, will also need to work in print. The proverb ‘horses for courses’ comes to mind here: each solution needs to be fit for the format in which it will be consumed. The design features that make up an effective interactive project are unlikely to translate directly into a static, print version. You might need to pursue two parallel solutions to suit the respective characteristics of each output format.

Another illustration of good practice comes from ‘History through the Presidents’ words’ (Figure 7.24), which includes a novel ‘Download graphic’ function that, when selected, opens up an entirely different static graphic designed to suit a printable PDF format.

Figure 7.24 Excerpt from ‘History Through the President’s Words’


Purpose map: Interactivity does not only come into your thinking when you are seeking to create ‘Exploratory’ experiences. You may also employ interactive features for creating ‘Explanatory’ visualisations, such as portraying analysis across discrete sequenced views or interactively enabling focus filters to emphasise certain characteristics of the data. The general position defined on the purpose map will not singularly define the need for interactivity; rather, it will inform the type of interactivity you may seek to incorporate to create the experience you desire.

There will also often be scope for an integrated approach whereby you might lead with an explanatory experience based around showing headline insights and then transition into a more exploratory experience by offering a set of functions to let users interrogate the data in more detail.

Working With Data

Data examination: As profiled with the functions to facilitate drill-down navigation, one of the key benefits of interactivity arises when you have data that is too big and too broad to show in one view. To repeat, you can only show so much in a single-screen display. Often you will need to slice up views across and within the various hierarchies of your data.

One particular way the physical properties of the data will inform your interaction design choices is with animation. To justify an animated display over time, you will need to consider the nature of the change that exists in your data. If your data is not changing much, an animated sequence may simply not prove to be of value. Conversely, if values are rapidly changing in all dimensions, an animated experience will prove chaotic and a form of change blindness will occur. It may be that the intention is indeed to exhibit this chaos, but the value of animated sequences is primarily to help reveal progressive or systematic change rather than random variation.

The speed of an animation is also a delicate matter to judge as you seek to avoid the phenomenon of change blindness. Rapid sequences will cause the stimulus of change to be missed; a tedious pace will dampen the stimulus of change and key observations may be lost. The overall duration will, of course, be informed by the range of values in your temporal data variable. There is no right or wrong here; it is something that you will get the best sense of by prototyping and trialling different speeds.
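To make the point about overall duration concrete: the running time of an animation is roughly the number of distinct time steps multiplied by how long each step is held on screen. The Python below is a minimal sketch with a hypothetical helper, `animation_duration`, that is not taken from any tool discussed here. It tabulates total durations for a few candidate paces, which gives you a starting point before prototyping and trialling speeds. It assumes one frame per distinct time step and ignores transition easing.

```python
from datetime import date


def animation_duration(time_steps, seconds_per_step):
    """Total running time in seconds, assuming one frame per distinct time step."""
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    n_steps = len(set(time_steps))
    return n_steps * seconds_per_step


# Hypothetical temporal variable: the first day of each month over two years.
months = [date(2014 + m // 12, m % 12 + 1, 1) for m in range(24)]

# Compare a few candidate paces before committing to one in a prototype.
for pace in (0.25, 0.5, 1.0, 2.0):
    total = animation_duration(months, pace)
    print(f"{pace} s per step -> {total} s in total ({len(set(months))} steps)")
```

The arithmetic only frames the options. Whether a pace feels rushed or tedious still has to be judged by watching the prototype.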

Establishing Your Editorial Thinking


Angle, framing and focus: If you have multiple different angles of analysis you wish to portray then these will have to be accommodated within the space allocated. Alternatively, using interactivity, you could provide access to them via sequenced views or menus enabling their selection. The value of incorporating the potential features to achieve this – and the specific range of different options you do wish to facilitate – will be informed by the scope of the decisions you made in the editorial thinking stages.

Thinking again about animations, you must consider whether an animated sequence will ultimately convey the clearest answer to an angle of interest about how something has changed over time. This really depends on what it is you want to show: the dynamics of a ‘system’ that changes over time, or a comparison between different states over time?

The animated project in Figure 7.25 shows the progressive clearing of snow across the streets of New York City during the blizzard of February 2014. The steady and connected fluidity of progress of the snow-clearing is ideally illustrated through the intervals of change across the 24 hours shown.

Figure 7.25 Plow: Streets Cleared of Snow in New York City


Sometimes, you might wish to compare one moment directly against another. With animated sequences, there is a reliance on memory to conduct this comparison of change. However, our ability to recall is fleeting at best and weakens the further apart (in time) the compared moments occur. Therefore, to facilitate such a comparison you ideally need to juxtapose individual frames within the same view.

The most common technique used to achieve this is small multiples, where you repeat the same representation for each moment in time of interest and present them collectively in the same view, often through a grid layout. This enables far more incisive comparisons, as you can see through ‘The Horse in Motion’ work by Eadweard Muybridge, which was used to learn about the galloping form of a horse by seeing each stage of the motion through individually framed moments.
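The grid arrangement of small multiples can be sketched as a simple layout calculation. The Python below is a minimal, hypothetical helper. It assumes a near-square grid, with the column count set to the ceiling of the square root of the panel count and optionally capped. Panels are filled in reading order, left to right and top to bottom, so that chronological frames read naturally. A real layout would also need to respect each panel's aspect ratio and the space available.

```python
import math


def small_multiples_grid(n_panels, max_columns=None):
    """Return (rows, columns) for a near-square grid that holds n_panels."""
    if n_panels < 1:
        raise ValueError("need at least one panel")
    columns = math.ceil(math.sqrt(n_panels))
    if max_columns is not None:
        if max_columns < 1:
            raise ValueError("max_columns must be at least 1")
        columns = min(columns, max_columns)
    rows = math.ceil(n_panels / columns)
    return rows, columns


def panel_position(index, columns):
    """(row, column) of the index-th panel when filled left-to-right, top-to-bottom."""
    return divmod(index, columns)


# Hypothetical example: one panel per moment in time, kept in chronological order.
moments = [f"t{i}" for i in range(10)]
rows, columns = small_multiples_grid(len(moments))
layout = {moment: panel_position(i, columns) for i, moment in enumerate(moments)}
print(rows, columns, layout["t5"])
```

Filling panels in reading order keeps neighbouring moments adjacent, so the eye can move directly from one frame to the next when comparing.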

‘Generations of masterpieces portray the legs of galloping horses incorrectly. Before stop-gap photography, the complex interaction of horses’ legs simply happened too fast to be accurately apprehended … but in order to see the complex interaction of moving parts, you need the motion.’ [Paraphrasing] Barbara Tversky and Julie Bauer Morrison, taken from Animation: Can it Facilitate?

Figure 7.26 The Horse in Motion

Data Representation

Chart type choice: Some charts are inherently visually complex and ideally need interactivity to make them more accessible and readable for the viewer. The bump chart, chord diagram and Sankey diagram are just a few of the charts that are far more readable and, by extension, usable if they can offer users the means to filter or focus on certain selected components of the display through interactivity.

Trustworthy Design

Functional performance: Faith in the reliability, consistency and general performance of a visualisation is something that impacts on the perception of a project as ‘trustworthy’. Does it do what it promises and can I trust the functions that it performs? Projects that involve the collection of user-inputted data will carry extra risk around trust: how will the data be used and stored? You need to alleviate any such concerns upfront.

‘Confusing widgets, complex dialog boxes, hidden operations, incomprehensible displays, or slow response times … may curtail thorough deliberation and introduce errors.’ Jeff Heer and Ben Shneiderman, taken from Interactive Dynamics for Visual Analysis

Accessible Design

Useful: Does it add value? Resort to interactivity only when you have exhausted the possibility of an appropriate and effective static solution. Do not underestimate how effective a well-conceived and executed static presentation of data can be. This is not about holding a draconian view about any greater merits offered by static or print work, but instead recognising that the brilliance of interactivity lies in introducing new means of engaging with data that simply could not be achieved in any other way.

Unobtrusive: As with all decisions, an interactive project needs to strive for the optimum ease of usability: minimise the friction between the act of engaging with interactive features and the understanding they facilitate. Do not create unnecessary obstacles that stifle sparks of curiosity and the scent of intrigue that stirs within the user. The main watchword here is affordance: making interactive features seamless and either intuitive or at least efficiently understandable.

Visual accessibility: To heighten the accessibility levels of your work you may offer different presentations of it. For people with visual impairments you might offer options to magnify the view of your data and all accompanying text. For those with colour deficiencies, as you will learn about shortly, you could offer options to apply alternative, colour-blind friendly palettes. A further example of this is seen with satellite navigation devices, whereby the displayed colour combinations change to better suit the surrounding lightness or darkness at a given time of day.
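One way to offer the alternative-palette option described under visual accessibility is to keep two palettes and swap between them on request. The Python below is illustrative only. The default palette is an assumed house style made of four colours from matplotlib's default 'tab10' cycle, and its red and green are hard to tell apart for many people with red-green colour deficiency. The alternative is the widely used Okabe-Ito colour-blind-friendly palette, with black omitted. The function name `category_colours` is hypothetical.

```python
# Assumed default palette: four colours from matplotlib's 'tab10' cycle
# (blue, orange, green, red). Red and green are hard to separate for many
# viewers with red-green colour deficiency.
DEFAULT_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]

# Okabe-Ito colour-blind-friendly palette (black omitted, as it is often reserved for text).
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]


def category_colours(categories, colour_blind_safe=False):
    """Map each category to a colour, using the alternative palette when requested."""
    palette = OKABE_ITO if colour_blind_safe else DEFAULT_PALETTE
    if len(categories) > len(palette):
        raise ValueError(f"only {len(palette)} distinct colours available")
    return dict(zip(categories, palette))


# Hypothetical usage: the same categories, with and without the accessible option switched on.
groups = ["North", "South", "East", "West"]
print(category_colours(groups))
print(category_colours(groups, colour_blind_safe=True))
```

Keeping the category-to-colour mapping in one place means the chart marks and their legend switch palettes together when the viewer toggles the option.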

Elegant Design


Feature creep: The discipline required to avoid feature creep is indisputable. The gratuitous interactive operation of today is the equivalent of the flashy, overbearing web design trends of the late 1990s and early 2000s. People were so quick and so keen to show how competent and expressive they could be through this (relatively) new technology that they forgot to judge if it added value.

If your audience is quite broad you may be (appropriately) inclined to cover more combinations of features than are necessary, in the hope of responding to as many of the anticipated enquiries as possible and serving the different types of viewer. Judging the degree of flexibility is something of a balancing act within a single project: you do not want to overwhelm the user with more adjustments than they need, nor do you want to narrow the scope of their likely interrogations. For a one-off project you have to form your own best judgement; for repeatedly used projects you might have scope to accommodate feedback and iteration.

Minimise the clicks: With visualisation you are aiming to make the invisible (insights) visible. Conversely, to achieve elegance in design you should be seeking to make visible design features as seamlessly inconspicuous as possible. As Edward Tufte stated, ‘the best design is invisible; the viewer should not see your design. They should only see your content’.

Fun: A final alternative influence is to allow yourself room for at least a little bit of fun. So long as the choices do not gratuitously interrupt the primary objective of facilitating understanding, one should not downplay the heightened pleasure that can be generated by interactive features that incorporate an essence of playability.

Summary: Interactivity

Data adjustments affect what data is displayed and may include the following features:

Framing: isolate, include or exclude data.
Navigating: expand or explore greater levels of detail in the displayed data.
Animating: portray temporal data via animated sequences.
Sequencing: navigate through discrete sequences of different angles of analysis.
Contributing: customise experiences through user-inputted data.

Presentation adjustments affect how the data is displayed and may include the following features:

Focusing: control what data is visually emphasised.
Annotating: interact with marks to bring up more detail.
Orientating: make better sense of your location within a display.

Influencing Factors and Considerations

Formulating the brief: skills and resources, timescales, setting, and format will all influence the scope of interactivity. What experience are you facilitating and how might interactive options help achieve this?
Working with data: what range of data do you wish to include? Large datasets with diverse values may need interactive features to help users filter views and interrogate the contents.
Establishing your editorial thinking: choices made about your chosen angle, as well as definitions for framing and focus, will all influence interactive choices, especially if users must navigate to view multiple angles of analysis or representations portrayed through animated sequences.
Data representation: certain chart choices may require interactivity to enable readability.
Trustworthy design: functional performance and reliability will substantiate the perception of trust from your users.
Accessible design: any interactive feature should prove to be useful and unobtrusive. Interactivity can also assist with challenges around visual accessibility.
Elegant design: beware of feature creep, minimise the clicks, but embrace the pleasure of playability.

Tips and Tactics

Initial sketching of concepts will be worth doing first before investing too much time jumping into prototype mode.
Project management is critical when considering the impact of development of an interactive solution.
Backups, contingencies, version control.
Do not be precious about – nor overly impressed with – ‘cool’-sounding interaction features that will disproportionately divert precious resources (time, effort, people).
Beware of feature creep: keep focusing on what is important and relevant. A technical achievement is great for you, but is it great for the project?
Version control and file management will be important here.


8 Annotation

Annotation is the third layer of the visualisation design anatomy and is concerned with the simple need to explain things: what is the right amount and type of help your viewers will need when consuming the visualisation?

Annotation is unquestionably the most often neglected layer of the visualisation anatomy. Maybe this is because it involves the least amount of pure design thinking relative to the other matters requiring attention, like interactivity and colour. More likely, it is because effective annotation requires visualisers truly to understand their intended audience. This can be a hard frame of mind to adopt, especially when your potential viewers are likely to have diverse knowledge, interests and capabilities.

In contrast to the greater theoretical and technical concerns around data representation, colour and interactivity, I find thinking about annotation relatively refreshing. It is not only uncomplicated and based on a huge dose of common sense, but also hugely influential, especially in directly facilitating understanding.

Annotation choices often conform to the Goldilocks principle: too much and the display becomes cluttered, overwhelming and potentially patronising; too little and viewers may be inappropriately faced with the prospect of having to find their own way around a visualisation and form their own understanding of what it is showing.

Later in this chapter we will look at the factors that will influence your decision making, but to begin with here is a profile of some of the key features of annotated design, which fall into two main groups:

Project annotations: helping viewers understand what the project is about and how to use it.
Chart annotations: helping viewers perceive the charts and optimise their potential interpretations.

8.1 Features of Annotation: Project Annotation

This collection of annotation options is related to decisions about how much and what type of help you might need to offer your audiences in their understanding of the background, function and purpose of your project.

Headings: The titles and subtitles occupy such prime real estate within your project’s layout, yet more often than not visualisers fail to exploit these to best effect. There are no universal practices for what a heading should do or say; this will vary considerably between subject areas and industries, but headings should always prove fundamentally useful.

Figure 8.1 A Sample of Project Titles

The primary aim of a title (and often subtitle combination) is to inform viewers about the immediate topic or display, giving them a fair idea about what they are about to see. You might choose to articulate the essence of the curiosity that has driven the project by framing it around a question, or maybe a key finding you unearthed following the work.

Subheadings, section headings and chart titles will tend to be more functional in their role, making clear to the viewer the contents or focus of attention associated with each component of the display. Your judgement surrounds the level of detail and the type of language you use in each case to fit cohesively with the overall tone of the work.

Introductions: Essentially working in conjunction with titles, introductions typically exist as short paragraphs that explain, more explicitly than a title can, what the project is about. This introduction might usefully explain, in clear language, some of the components you considered during the editorial thinking activity, such as:

details of the reason for the project (source curiosity);
an explanation of the relevance of this analysis;
a description of the analysis (angle, framing) that is presented;
an expression of the main message or finding that the work is about to reveal (possibly focus).

Some introductions will extend beyond a basic description of the project to include thorough details of where the data comes from and how it has been prepared and treated in advance of its analysis (including any assumptions, modifications or potential shortcomings). There may also be further links to ‘read more’ detail or related articles about the subject.

Figure 8.2 Excerpt from ‘The Color of Debt’

Introductions may be presented as fixed text located near the top (or start) of a project (usually underneath a title), as in Figure 8.2, or, through interactivity, may be hidden from view and brought up in a separate window or pop-up to provide the details upon request.

User guides: As you have seen, some projects can incorporate many different features of interactivity. While they may not necessarily be overly technical – and therefore not that hard to learn how to use them – the full repertoire of features may be worth walking through, as in Figure 8.3. This is important to consider so that, as a visualiser, you can be sure your users are acquainted with the entire array of options they have to explore, interrogate and control their experience. You should want people to fully utilise all the different features you have carefully curated and created, so it is in everyone’s interest to think about including these types of user guides.

Figure 8.3 Excerpt from ‘Kindred Britain’



Multimedia: There is increasing potential for, and usage of, broader media assets in visualisation design work beyond charts, such as video and imagery. In visualisation this is perhaps a relatively contemporary trend (infographics have incorporated such media but visualisations generally have done so far less) and, in some ways, reflects the ongoing blurring of boundaries between this and other related fields. Incorporating good-quality and sympathetically styled assets like illustrations or photo-imagery can be a valuable complementary device alongside your data representation elements.

In the ‘Color of Debt’ (Figure 8.4) project, different neighbourhoods of Chicago that have been hardest hit by debt are profiled using accompanying imagery to show more graphic context of the communities affected, including a detailed reference map of the area and an animated panel displaying a sequence of street view images.

Imagery, in particular, will be an interesting option to consider when it adds value to help exhibit the subject matter in tangible form, offering an appealing visual hook to draw people in or simply to aid immediate recognition of the topic. In Bloomberg’s billionaires project (Figure 8.5), each billionaire is represented by a pen-and-ink caricature. This is elegant in choice and also dodges the likely flaws of having to compose the work around individual headshot photographs that would have been hard to frame and colour consistently.

Figure 8.4 Excerpt from ‘The Color of Debt’

It was worth Bloomberg investing the time and cost involved in commissioning these illustrations, given that the project was not a one-off but an ongoing resource, updated daily.

Problems with the integration of such media within a visualisation project will occur when unsuitable attempts are made to combine imagery within the framework of a chart. Often the lack of cohesion creates a significant hindrance whereby the data representations are obscured or generally made harder to read, as the inherent form and colour clashes undermine the functional harmony.

Researching, curating, capturing or creating assets of imagery requires skill and a professional approach, otherwise the resulting effect will look amateurish. Incorporating these media into a data visualisation is not about quickly conducting some Google Image fishing exercise. Determining what imagery you will be able to use involves careful consideration of image suitability, quality and, critically, usage rights. Beware the client or colleague who thinks otherwise.

‘Although all our projects are very much data driven, visualisation is only part of the products and solutions we create. This day and age provides us with amazing opportunities to combine video, animation, visualisation, sound and interactivity. Why not make full use of this? … Judging whether to include something or not is all about editing: asking “is it really necessary?”. There is always an aspect of “gut feel” or “instinct” mixed with continuous doubt that drives me in these cases.’ Thomas Clever, Co-founder CLEVER°FRANKE, a data driven experiences studio

A frequent simple example of incorporated imagery is when you have to include logos according to the needs of the organisation for whom your work is being created. Remember to consider this early so you at least know in advance that you will have to assign some space to accommodate this component elegantly.

Footnotes: Often the final visible feature of your display, footnotes provide a convenient place to share important details that further substantiate the explanation of your work. Sometimes this information might be stored within the introduction component (especially if that is interactively hidden/revealed to allow it more room to accommodate detail):

Data sources should be provided, ideally in close proximity to the relevant charts.
Credits will list the authors and main contributors of the work, often including the provision of contact details.

Figure 8.5 Excerpt from ‘Bloomberg Billionaires’

Attribution is also important if you wish to recognise the influence of other people’s work in shaping your ideas or to acknowledge the benefits of using an open source application or free typeface, for example.
Usage information might explain the circumstances in which the work can be viewed or reused, and whether there are any confidentialities or copyrights involved.
Time/date stamps are often forgotten but they will give viewers an indication of the moment of production; from that they might be able to ascertain the work’s current accuracy and contextualise their interpretations accordingly.

Figure 8.6 Excerpt from ‘Gender Pay Gap US’

8.2 Features of Annotation: Chart Annotation

This second group of annotated features concerns the ways you provide viewers with specific assistance for perceiving and interpreting the charts. Think of these as the features that refer directly to your charts or exist within, or in immediate proximity to, each chart.

Reading guides: These are written or visual instructions that provide viewers with a guide for how to read the chart or graphic and offer more detailed assistance than a legend (see later). The idea of learnability in visualisation is an important consideration. It is a two-way commitment requiring will and effort from the viewer and sufficient assistance from the visualiser. This will be discussed in Chapter 11 under ‘Visualisation literacy’.

Recognising that their readership may not necessarily understand connected scatter plots, Bloomberg’s visual data team offer a ‘How to Read this Graphic’ guide immediately as you land on the project shown in Figure 8.7. This can be closed but a permanent ‘How to’ button remains for those who may need to refer to it again. The connected scatter plot was the right choice for this angle of analysis, so rather than use a different ‘safer’ representation approach (and therefore alter what analysis was shown) it is to their credit that they respected the capacity of their viewers to be willing to learn how to read this unfamiliar graphical form.

Figure 8.7 Excerpt from ‘Holdouts Find Cheapest Super Bowl Tickets Late in the Game’


Figure 8.8 Excerpt from ‘The Life Cycle of Ideas’


The second example shown (Figure 8.8) is from the ‘How to Read’ guide taken from the ‘Life Cycle of Ideas’ graphic created by Accurat, a studio renowned for innovative and expressive visualisation work. Given the relative complexity of the encodings used in this piece, it is necessary to equip the viewer with as much guidance as possible to ensure its potential is fully realised.

Chart apparatus: Options for chart apparatus relate to the structural components found in different chart types. Every visualisation displayed in this book has different elements of chart apparatus (Figure 8.9), specifically visible axis lines, tick marks or gridlines to help viewers orient their judgements of size and position. There is no right or wrong for including or excluding these features; it tends to be informed by your tonal definitions, based on how much precision in the perceiving of values you wish to facilitate. I will discuss the range of different structures underlying each chart type (such as Cartesian, Radial or Spatial) in Chapter 10 on composition, as these have more to do with issues of shape and dimension.

‘Labelling is the black magic of data visualization.’ Gregor Aisch, Graphics Editor, The New York Times

Labels: There are three main labelling devices you will need to think about using within your chart: axis titles, axis labels and value labels:

Axis titles describe what values are being referenced by each axis. This might be a single word or a short sentence depending on what best fits the needs of your viewers. Often the role of an axis is already explained (or implied) by project annotations elsewhere, such as titles or sub-headings, but do not always assume this will be instantly clear to your viewers.

Axis labels provide value references along each axis to help identify the categorical value or the date/quantitative value associated with that scale position. For categorical axes (as seen in bar charts and heat maps, for example) one of the main judgements relates to the orientation of the label: you will need to find sufficient room to fit the label but also preserve its readability. For non-categorical data the main judgement will be what scale intervals to use. This has to be a combination of what is most useful for referencing values by your viewer, what is the most relevant interval based on the nature of the data (e.g. maybe a year-level label is more relevant than marking each month), and also what feels like it achieves the best-looking visual rhythm along the chart’s edge. This will be another matter that is discussed more in the composition chapter.

Figure 8.9 Mizzou’s Racial Gap Is Typical On College Campuses


Value labels will appear in proximity to specific mark encodings inside the chart. Typically, these labels will be used to reveal a quantity, such as showing the percentage sizes of the sectors in a pie chart or the height of bars. Judging whether to include such annotations will refer back to your definition of the appropriate tone: will viewers need to read off exact values or will their perceived estimates of size and/or relationship be sufficient? The need to include categorical labels will be a concern for maps (whether to label locations?) or charts like the scatter plot seen in Figure 8.9, where you may wish to draw focus to a select sample of the categories plotted across the display.

As you have seen, one way of providing detailed value labels is through interaction, maybe offering a pop-up/tooltip annotation that is triggered by a hover or click event on different mark encodings. Having the option for interactivity here is especially useful as it enables you to reduce the clutter that can develop as more annotated detail is added to your display.
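For the choice of scale intervals on quantitative axes, discussed under axis labels above, many charting tools use some variant of a 'nice numbers' heuristic, which rounds the interval to 1, 2 or 5 times a power of ten. The Python below is a minimal sketch of the approach described by Paul Heckbert in Graphics Gems (1990). It illustrates that one heuristic and is not the method of any particular tool. The target of about six ticks is an assumption you would tune to suit the visual rhythm of your own chart.

```python
import math


def nice_number(x, round_result):
    """Round x (> 0) to 1, 2, 5 or 10 times a power of ten."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # between 1 and 10
    if round_result:
        thresholds = ((1.5, 1), (3, 2), (7, 5))
        nice = next((n for limit, n in thresholds if fraction < limit), 10)
    else:
        thresholds = ((1, 1), (2, 2), (5, 5))
        nice = next((n for limit, n in thresholds if fraction <= limit), 10)
    return nice * 10 ** exponent


def nice_ticks(low, high, target_ticks=6):
    """Evenly spaced, 'round' tick values that cover the range low..high."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if target_ticks < 2:
        raise ValueError("target_ticks must be at least 2")
    span = nice_number(high - low, round_result=False)
    step = nice_number(span / (target_ticks - 1), round_result=True)
    start = math.floor(low / step) * step
    stop = math.ceil(high / step) * step
    count = round((stop - start) / step)
    # round() tidies floating-point noise such as 0.6000000000000001
    return [round(start + i * step, 10) for i in range(count + 1)]


print(nice_ticks(0, 87))   # [0, 20, 40, 60, 80, 100]
print(nice_ticks(0, 1.0))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

The heuristic only proposes candidates. Whether year-level or month-level marks, say, are more relevant to your data remains an editorial judgement.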

Redundancy in labelling occurs when you include value labels for the quantities of all marks whilst also including axis-scale labelling. You are effectively doubling the assistance being offered unnecessarily and so, ideally, you should choose to include one or the other.

Legend: A legend is an annotated feature within or alongside your chart that presents one or several keys to help viewers understand the categorical or quantitative meaning of different attributes.

Figure 8.10 Excerpt from ‘The Infographic History of the World’

For quantitative data, a legend is mainly needed when the attribute of area size has been used to encode values, as found on the bubble plot chart type. The keys displayed there will provide a reference for the different size scales. Which selection of sizes to show needs careful thought: what is the most useful guide to help your viewers make their perceptual judgements from a chart? This might not entail showing only even interval sizes (50, 100, 150 etc.); rather, you might offer viewers an indicative spread of sizes to best represent the distribution of your data values. The example in Figure 8.10 shows logical interval sizes to reflect the range of values in the data and also helpfully includes reference to the maximum value size to explain that no shape will be any larger than this. For categorical data you also see a key showing the meaning of different colours and shapes and their associated values.
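The size key for an area-encoded chart relies on one piece of arithmetic. If circle area represents value, the radius must scale with the square root of the value; otherwise large values are visually exaggerated. The Python below is a minimal sketch with hypothetical helpers. It picks legend keys from the data's quartiles plus its maximum, echoing the idea of an indicative spread that also shows the largest possible size. The quartile choice, the function names and the 30-pixel maximum radius are all assumptions for illustration.

```python
import math
import statistics


def radius_for(value, max_value, max_radius):
    """Radius that makes circle AREA (not radius) proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def size_legend(values, max_radius):
    """Legend keys as (value, radius): lower quartile, median, upper quartile and maximum."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    if min(values) < 0 or max(values) == 0:
        raise ValueError("values must be non-negative and not all zero")
    v_max = max(values)
    # 'inclusive' keeps the quartiles within the observed range of the data.
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    keys = sorted(set(quartiles + [v_max]))
    return [(v, radius_for(v, v_max, max_radius)) for v in keys]


# Hypothetical data values encoded as bubble areas.
data = [10, 20, 30, 40, 50]
for value, radius in size_legend(data, max_radius=30):
    print(f"value {value:>5} -> radius {radius:.1f} px")
```

In practice you might round the quartile values to tidier figures before displaying them, in the spirit of the logical intervals used in Figure 8.10.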

Figure 8.11 Twitter NYC: A Multilingual Social City

A nice approach to getting more out of your legends is shown in Figure 8.11. Here you will see a key explaining the colour associations combined with a bar chart to display the distribution of quantities for each language grouping from this analysis of tweets posted around New York City.

Captions: These exist typically as small passages of written analysis that bring to the surface some of the main insights and conclusions from the work. These might be presented close to related values inside the chart or in separate panels to provide commentary outside the chart.

In ‘Gun Deaths’ (Figure 8.12), there is a nice solution that combines annotated captions with interactive data adjustments. Below the main chart there is a ‘What This Data Reveals’ section which summarises some of the main findings from the analysis of the gun death data. The captions double up as clickable shortcuts so that you can quickly apply the relevant framing filters and update the main display to see what the captions are referring to.

Figure 8.12 Excerpt from ‘US Gun Deaths’


As creative tools become more ubiquitous, the possibility for incorporating non-visual data in your work increases. As an alternative to the written caption there is greater scope to consider using audio as a means of verbally narrating a subject and explaining key messages. Over the past few years one of the standout projects using this feature was the video profiling ‘Wealth Inequality in America’ (Figure 8.13), as introduced in Chapter 3, where the voiceover provides a very compelling and cohesive narrative against the backdrop of the animated visuals that present the data being described.

Figure 8.13 Image taken from ‘Wealth Inequality in America’


8.3 Typography

As you have seen, many features of annotation utilise text. This means your choices will be concerned not just with what text to include, but also with how it will look. This naturally merits a brief discussion about how typography will have a significant role in the presentation of your work.

Firstly, some clarity about language. A typeface is a designed collection of glyphs representing individual letters, numbers and other symbols of language based on a cohesive style. A font is the variation across several physical dimensions of the typeface, such as weight, size, condensation and italicisation. A typeface can have one or many different fonts in its family. Type effectively represents the collective appearance formed by the choice of typeface and the font.

Tahoma and Century Gothic are different typefaces. This font and this font both belong to the Georgia typeface family but display variations in size, weight and italicisation.

I discussed earlier the distinction between definitions of data visualisation and other related fields. I mentioned how the person creating their design is not necessarily conscious or concerned about what label is attached to their work; they are simply doing their work regardless. The same could be applied to people’s interchangeable use and meaning of the terms typeface and font, the clarity of which has been irreparably confused by Microsoft’s desktop tools in particular.

Serif typefaces add an extra little flourish in the form of a small line at the end of the stroke in a letter or symbol. Garamond is an example of a serif typeface. Serif typefaces are generally considered to be easier to read for long sequences of text (such as the full body text) and are especially used in print displays.

Sans-serif typefaces have no extra line extending the stroke for each character. Verdana is an example of a sans-serif typeface. These typefaces are commonly used for shorter sections of text, such as axis or value labels or titles, and for screen displays.

In making choices about which type to use, there are echoes with the thinking you are about to face on using colour. As you will see, colour decisions concern legibility and meaning first, decoration last. With typeface choices you are not dressing up your text; you are optimising its readability and meaning across your display. The desired style of typeface only comes into your thinking after legibility and meaning.

In terms of legibility, you need to choose a typeface and font combination that will be suitable for the role of each element of text you are using. Viewers need to be able to read the words and numbers on display without difficulty. Quite obvious, really. Some typefaces (and specifically fonts) are more easily read than others. Some work better to make numbers as clearly readable as possible, others work better for words. There are plenty of typefaces that might look cool and contemporary but if they make text indecipherable then that is plain wrong.

Typeface decisions will often be taken out of your hands by the visual identity guidelines of organisations and publications, as well as by technical constraints relating to browser type, software compatibility and availability.

Just as variation in colour implies meaning, so does variation in typeface and font. If you make some text capitalised, large and bold-weight, this will suggest it carries greater significance and occupies a higher position in the object hierarchy than any text presented in lower case, at a smaller size and with a thinner weight. So you should seek to limit the variation in font where possible.

Text-based annotations should be considered part of the supporting cast, and the way you consider typeface and font choices should reflect this role. Typography in visualisation should be seen but really not heard. Deciding on the most suitable type is something that can ultimately come down to experience and influence through exposure to other work. Every individual has their own relied-upon preferences. In practice, I find a good chunk of trial and error, as well as viewer testing, goes into resolving the final selection. Across the spectrum of data visualisation work being produced there are no significant trends to be informed by, largely because judging the most suitable typography choices will be unique to the circumstances influencing each project.

Typography is just another of the many individual ingredients relevant to data visualisation that exists as a significant subject in its own right. It is somewhat inadequate to allocate barely two pages of this book to discussing its role in visualisation, but these will at least offer you a bite-sized window into the topic.

‘Never choose Times New Roman or Arial, as those fonts are favored only by the apathetic and sloppy. Not by typographers. Not by you.’ Matthew Butterick, Typographer, Lawyer and Writer

8.4 Influencing Factors and Considerations

Having become familiar with the principal options for annotating, you now have to decide which features to incorporate into your work and how you might deploy them.

Formulating Your Brief

‘Think of the reader – a specific reader, like a friend who’s curious but a novice to the subject and to data-viz – when designing the graphic. That helps. And I rely pretty heavily on that introductory text that runs with each graphic – about 100 words, usually, that should give the new-to-the-subject reader enough background to understand why this graphic is worth engaging with, and sets them up to understand and contextualize the takeaway. And annotate the graphic itself. If there’s a particular point you want the reader to understand, make it! Explicitly! I often run a few captions typeset right on the viz, with lines that connect them to key elements in the design.’ Katie Peek, Data Visualisation Designer and Science Journalist, on making complex and/or complicated subject matter accessible and interesting to her audience

Audience: Given that most annotations serve the purpose of viewer assistance, your approach will inevitably be influenced by the characteristics of your intended audience. Having an appreciation of, and empathy towards, the knowledge and capabilities of the different cohorts of viewers is especially important with this layer of design. How much help will they need to understand the project and also the data being portrayed? You will need to consider the following:

Subject: how well acquainted will they be with this subject matter? Will they understand the terminology, acronyms and abbreviations? Will they recognise the relevance of this particular angle of analysis on this subject?
Interactive functions: how sophisticated are they likely to be in terms of understanding and using the different features of interactivity made possible through your design?
Perceiving: how well equipped are they to work with this visualisation? Is it likely that the chart type(s) will be familiar or unfamiliar; if the latter, will they need support to guide them through the process of perceiving?
Interpreting: will they have the knowledge required to form legitimate interpretations of this work? Will they know how to understand what is good or bad, what big and small mean, what is important, or not? Alternatively, will you need to provide some level of assistance to address this potential gap?

Purpose map: The defined intentions for the tone and experience of your work will influence the type and extent of annotation features required.

If you are working towards a solution that leans more towards the ‘reading’ tone, you are placing an emphasis on the perceptibility of the data values. It therefore makes sense to provide as much assistance as possible (especially through extensive chart annotations) to maximise the efficiency and precision of this process. If it is more about a ‘feeling’ tone then you may be able to justify the absence of the same annotations. Your intent may be to provide more of a general sense – a ‘gist’ – of the order of magnitude of values.

If you are seeking to provide an ‘explanatory’ experience, it would be logical to employ as many devices as possible that will help inform your viewers about how to read the charts (assisting with the ‘perceiving’ stage of understanding) and also bring some of the key insights to the surface, making clear the meaning of the quantities and relationships displayed (thus assisting with the stage of ‘interpreting’). The use of captions and visual overlays will be particularly helpful in achieving this, as will the potential for audio accompaniments if you are seeking to push the explanatory experience a step further.

‘Exploratory’ experiences are less likely to include layers of insight assistance; instead the focus will be more on project-level annotation, ensuring that viewers (and particularly here, users) have as much understanding as possible about how to use the project for their exploratory benefit. You might find, however, that devices like ‘How to read this graphic’ are still relevant irrespective of the definition of your intended experience.

Characteristically, ‘exhibitory’ work demonstrates far less annotated assistance because, by intention, it is more about providing a visual display of the data than offering an explanatory presentation or the means for exploratory interrogation. The assumption here is that audiences will have sufficient domain and project knowledge not to require extensive additional assistance. Common chart annotations like value labels and legends, and project annotations like titles and introductions, are still likely to be necessary, but their extent should reflect what is actually required.

Establishing Your Editorial Thinking

Focus: During your editorial thinking you considered focus and its particular role in supporting explanatory thinking. Are there specific value labels that you wish to display over others? Rather than labelling all values, for example, have you determined that only certain marks and attributes will merit labelling? As you saw earlier in the example scatter plot about the under-representation of black students in US colleges, only certain points were labelled, not all. These would have been judged to be the most relevant and interesting elements to emphasise through annotation.

Trustworthy Design

Transparency: Annotation is one of the most important aids to ensure that you secure and sustain trust from your viewers by demonstrating integrity and openness:

Explain what the project is and is not going to show.
Detail where the data came from and what framing criteria were used during the process of acquisition, and also make clear what has ultimately been included in the chart(s).
Outline any data transformation treatments, assumptions and calculations. Are there any limitations that viewers need to be aware of?
Highlight and contextualise any findings to ensure accuracy in interpretation.
With digital projects in particular, provide access to coding repositories to lay open all routines and programmatic solutions.

Accessible Design

Understandable: If you recall, in the section profiling circumstances you considered the characteristics of the setting or situation in which your audience might consume your visualisation. Well-judged project and chart annotations are entirely concerned with providing a sufficient level of assistance to achieve understanding. The key word there is ‘sufficient’ because there is a balance: too much assistance makes the annotations feel overburdening; too little leaves far more room for wrong assumptions and misconceptions to prosper. A setting that is consistent with the need to deliver immediate insights will need suitable annotations to fulfil this; there will be no time or patience for long introductions or explanations in that setting. Conversely, a visualisation about a subject matter that is inherently complex may warrant such assistance.

Elegant Design

Minimise the clutter: A key concern about annotations is judging the merits of including structural or textual assistance against the potential disruption and obstruction they cause to the view of the data. Any annotation device added to your display has a spatial and visual consequence that needs to be accommodated. Of course, as mentioned, with the benefit of interactivity it is possible to show and hide layers of detail. Overall, you will have to find the most elegant solution for presenting your annotations to ensure you do not inadvertently undermine the help you are trying to provide.

Summary: Annotation

Project annotations help viewers understand what the project is about and how to use it, and may include the following features:

Headings: titles, sub-titles and section headings.
Introductions: providing background and aims of the project.
User guides: advice or instruction for how to use any interactive features.
Multimedia: the potential to enhance your project using appropriate imagery, videos or illustrations.
Footnotes: potentially including data sources, credits, usage information, and time/date stamps.

Chart annotations help viewers perceive the charts and optimise their potential interpretations, and may include the following features:

Chart apparatus: axis lines, gridlines, tick marks.
Labels: axis titles, axis labels, value labels.
Legend: providing detailed keys for colour or size associations.
Reading guides: detailed instructions advising readers how to perceive and interpret the chart.
Captions: drawing out key findings and commentaries.

Typography: Most of the annotation features you include are based on text, and so you will need to consider carefully the legibility of the typeface you choose and the logic behind the font-size hierarchy you display.

Influencing Factors and Considerations

Formulating the brief: consider the characteristics and needs of the audience. Certain chart choices and subjects may require more explanation. From the ‘purpose map’, what type of tone and experience are you trying to create and what role might annotation play?
Establishing your editorial thinking: what things do you want to emphasise or direct the eye towards (focus)?
Trustworthy design: maximise the information viewers have to ensure all your data work is transparent and clearly explained.
Accessible design: what is the right amount and type of annotation suitable to the setting and complexity of your subject?
Elegant design: minimise the clutter.

Tips and Tactics

Attention to detail is imperative: all instructions, project information, captions and value labels need to be accurate. Always spell-check digitally and manually, and ask others to proofread if you are too ‘close’ to see.
Do not forget to check on permission to use any annotated asset, such as imagery, photos, videos, quotations, etc.


9 Colour

Having established which charts you will use, the potential interactive functions that might be required and the annotation features that will be especially useful, you have effectively determined all the visible elements that will be included in your project. The final two layers of design concern not what elements will be included or excluded, but how they will appear. After this chapter you will look at issues of composition, but before that comes the rather weighty matter of colour.

As one of the most powerful sensory cues, colour is a highly influential visual property. It is arguably the design decision that has the most immediate impact on the eye of the viewer. All the design features of your visualisation display hold some attribute of colour; otherwise they would be invisible:

Every mark and item of apparatus in your charts will be coloured; indeed, colour in itself may be an attribute that represents your data values.
Interactive features do not always have an associated visible property (some are indeed invisible and left to be discovered intuitively). However, those features that involve buttons, menus, navigation tabs and value sliders will always have a colour.
Annotation properties such as titles, captions and value labels will all be coloured.
Composition design mainly concerns the arrangement of all the above features, though you might use colour to help achieve a certain design layout. As you will see, emptiness is a useful organising device – leaving something blank is a colour choice.

Thankfully, there is a route through all of this potential complexity that relies on just a little bit of science mixed in with lots of common sense. By replacing any arbitrary judgements that might previously have been based on taste, and by increasing the sensitivity of your choices, colour becomes one of the layers of visualisation design that can be most quickly and significantly improved.


‘Colors are perhaps the visual property that people most often misuse in visualization without being aware of it.’ Robert Kosara, Senior Research Scientist at Tableau Software

The key factor in thinking about colour is to ensure you establish meaning first and decoration last. That is not to rule out the value of certain decorative benefits of colour, but to advise that these should be your last concern. Besides, in dealing with meaningful applications of colour you will already have gone a long way towards establishing the ‘decorative’ qualities of your project’s aesthetic appearance.

This chapter begins with a look at some of the key components of colour science, offering a foundation for your understanding of this topic. After that you will learn about the ways and places in which colour could be used. Finally, you will consider the main factors that influence colour decisions.

Colour thinking begins from inside the chart(s), working outwards across the rest of the visualisation anatomy:

Data legibility.
Editorial salience.
Functional harmony.

9.1 Overview of Colour Theory

Colour in visualisation is something of a minefield. As with many of these design layer chapters, an introduction to colour involves judging the right amount of science and the right amount of practical application. What does justice to the essence of the subject and gives you the most relevant content to work with is a delicate balance.

When you lift the lid on the science behind colour you open up a world of brain-ache. By the time this chapter is finalised I will have spent a great deal of time agonising over how to explain this subject and what to leave in or leave out, because there is so much going on with colour. And it is tricky. Why? Because you almost come face to face with philosophical questions like ‘what is white?’ and the sort of mathematical formulae that you really rather hoped had been left behind at school. You learn how the colours you specify in your designs as X might be perceived by some people as Y and by others as Z. You discover that you are not just selecting colours from a neat linear palette but rather from a multi-dimensional colour space occupying a cubic, cylindrical or spherical conceptual shape, depending on the definition used.

The basis of this topic is the science of optics – the branch of physics concerned with the behaviour and properties of light – as well as colorimetry – the science and technology used to quantify and describe human colour perception. Two sciences, lots of maths, loads of variables, endless potential for optical illusions and impairment: that is why colour is tricky and why you need to begin this stage of thinking with an appreciation of some colour theory.

The most relevant starting point is to recognise that when dealing with issues of colour in data visualisation you will almost always be creating work on some kind of computer. Unless you are creating something by hand using paints or colouring pencils, you will be using software viewed through an electronic display.

This is important because a discussion about colour theory needs to be framed around the RGB (Red, Green, Blue) colour model. This is used to define the combination of light that forms the colours you see on a screen, conceptually laid out in a cubic space based on variations across these three attributes.

The output format of your work will vary between screen display and print display. If you are creating something for print you will have to shift your colour output settings to CMYK (Cyan, Magenta, Yellow and Black). This is the model used to define the proportions of inks that make up a printed colour. It is known as a subtractive model, which means that combining all four inks produces black, whereas RGB is additive, as the three screen colours combine to produce white.
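To make the additive/subtractive contrast concrete, here is a minimal Python sketch of the naive textbook conversion from RGB (0 to 255) to CMYK proportions (0 to 1). It is only an illustration: real print workflows convert through device and paper colour profiles, so the function name `rgb_to_cmyk` and its output should not be treated as print-ready values.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (0.0-1.0) conversion.

    Ignores ink behaviour, paper and colour profiles, so treat the
    result as a rough illustration rather than print-ready values.
    """
    # Normalise the additive light channels to the 0-1 range.
    r_, g_, b_ = r / 255, g / 255, b / 255
    # Black ink covers whatever the brightest light channel leaves unlit.
    k = 1 - max(r_, g_, b_)
    if k == 1:
        # Pure black: only the K ink is needed.
        return (0.0, 0.0, 0.0, 1.0)
    # Each subtractive ink absorbs its complementary light channel
    # (cyan absorbs red, magenta absorbs green, yellow absorbs blue).
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return (c, m, y, k)

print(rgb_to_cmyk(255, 0, 0))      # → (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(255, 255, 255))  # → (0.0, 0.0, 0.0, 0.0)
```

Full red light needs no cyan ink but full magenta and yellow, and white on screen corresponds to no ink at all on paper, which is the additive/subtractive inversion described above.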

When you are creating work to be consumed on the Web through screen displays, you will often program using HEX (hexadecimal) codes to specify the mix of red, green and blue light (in the form #RRGGBB, using codes 00 to FF for each channel).
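A short Python sketch of how a #RRGGBB code maps to the three light channels. Each pair of hexadecimal digits encodes one channel as a value from 0 (00) to 255 (FF). The helper names `rgb_to_hex` and `hex_to_rgb` are illustrative, not from any particular library.

```python
def rgb_to_hex(r, g, b):
    """Format 0-255 integer red, green and blue values as a #RRGGBB code."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # :02X writes each channel as two upper-case hexadecimal digits (00 to FF).
    return f"#{r:02X}{g:02X}{b:02X}"

def hex_to_rgb(code):
    """Parse a #RRGGBB code back into a (red, green, blue) tuple."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Each pair of digits is one channel, read as a base-16 integer.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(rgb_to_hex(255, 165, 0))  # → #FFA500
print(hex_to_rgb("#1F77B4"))    # → (31, 119, 180)
```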

While CMYK communicates from your software to a printer, telling it what colours to print as an output, it does not really offer a logical model for thinking about the input decisions you will make about colour. Neither, for that matter, does RGB: it just is not realistic to think in those terms when considering what choices are needed in a visualisation design. There are different levers to adjust and different effects being sought that require an alternative model of thinking.

Figure 9.1 HSL Colour Cylinder

Figure 9.2 Colour Hue Spectrum

I share the belief with many in the field that the most accessible colour model, in terms of considering the application of colour in data visualisation, is HSL (Hue, Saturation, Lightness). It was formalised for computer graphics in the late 1970s and organises colour along perceptual dimensions much like those of the colour system devised by Albert Munsell in the early twentieth century. These three dimensions combine to make up what is known as a cylindrical-coordinate representation of the RGB colour model (I did warn you about the cylinders).

Hue is considered the true colour. With hue there are no shades (adding black), tints (adding white) or tones (adding grey) – a consideration of these attributes follows next. When you are describing or labelling colours you are most commonly referring to their hue: think of the colours of the rainbow ranging through various mixtures of red, orange, yellow, green, blue, indigo and violet. Hue is considered a qualitative colour attribute because it is defined by difference and not by scale.

Saturation defines the purity or colourfulness of a hue. This does have a scale, from intense pure colour (high saturation) through increasing tones (adding grey) to the no-colour state of grey (low saturation). In language terms think vivid through to muted.

Figure 9.3 Colour Saturation Spectrum

Lightness defines the contrast of a single hue from dark to light. It is not a measure of brightness – there are other models that define that – but rather a scale from light tints (adding white) through to dark shades (adding black). In language terms I actually think of lightness more as degrees of darkness, but that is just a personal mindset.

Figure 9.4 Colour Lightness Spectrum

Technically speaking, black, white and grey are not considered colours.

I have deliberately described these dimensions separately because, as you will see when looking at the applications of colour in visualisation, your decisions will often be defined by how you might employ these distinct dimensions of colour to form your visual display. The main choices tend to fall between employing difference in hue and variation in lightness, with the different levels of saturation often being a by-product of the definitions made for the other two dimensions.
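To see the three dimensions acting independently, the following Python sketch uses the standard library's `colorsys` module. Note that `colorsys` calls the model HLS and orders its arguments as (hue, lightness, saturation), each scaled 0 to 1. The wrapper `hsl_to_hex` and the chosen values (a blue at 210 degrees, saturation 0.6) are illustrative assumptions.

```python
import colorsys

def hsl_to_hex(hue_degrees, saturation, lightness):
    """Convert HSL (hue in degrees; saturation and lightness 0-1) to #RRGGBB.

    Python's colorsys calls the model HLS and takes its arguments in
    (hue, lightness, saturation) order, each scaled 0-1.
    """
    r, g, b = colorsys.hls_to_rgb((hue_degrees % 360) / 360, lightness, saturation)
    # Scale the 0-1 channels back to 0-255 and format as two hex digits each.
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in (r, g, b)))

# Hold hue (a blue at 210 degrees) and saturation fixed; vary only lightness
# to move from a dark shade towards a light tint.
ramp = [hsl_to_hex(210, 0.6, light) for light in (0.25, 0.45, 0.65, 0.85)]
print(ramp)

# Hold hue and lightness fixed; lowering saturation mutes the colour towards grey.
muted = [hsl_to_hex(210, sat, 0.5) for sat in (0.9, 0.6, 0.3, 0.0)]
print(muted)
```

The last entry of `muted` is the neutral grey #808080: with zero saturation, hue no longer has any effect.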

Alternative models exist offering variations on a similar theme, such as HSV (Hue, Saturation, Value), HSI (Hue, Saturation, Intensity), HSB (Hue, Saturation, Brightness) and HCL (Hue, Chroma, Luminance).

These are all primarily representations of the RGB model space but involve differences in the mathematical translation into and from RGB, and offer subtle differences in the meaning of the same terms (local definitions of hue and saturation vary). The biggest difference relates to their emphasis as a means of specifying either a colour quality (in an input, created sense) or a colour perception (in how a colour is ultimately experienced).

Pantone is another colour space that you might recognise. It offers a proprietary colour-matching, identification and communication service for print, essentially giving ‘names’ to colours based on the CMYK process.

The argument against using the HSL model for defining colour is that, while it is fine for colour setting (i.e. an intuitive way to think about and specify the colours you want to set in your visualisation work), the resulting colours will not be perceived uniformly from one device to the next. This is because there are many variables at play in the projection of light to display colour and in the light conditions present at the moment of perception, so the same perceptual experience cannot be guaranteed. It is argued that more rigorous models (such as CIELAB) offer an absolute (as opposed to a relative) definition of colour for both input and output. My view is that they are just a little bit too hard to translate easily into visualisation design thinking. Furthermore, trying to control for all the subtleties of variation in consumption conditions is an extra burden you should ideally avoid.
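A related, easily demonstrated limitation is that equal HSL lightness does not mean equal perceived lightness across hues. The Python sketch below swaps in the WCAG 2.x relative-luminance formula (not CIELAB, which the text mentions) as a simple stand-in for perceived brightness on an sRGB display. Pure yellow and pure blue share an HSL lightness of 0.5 yet differ enormously in luminance.

```python
import colorsys

def relative_luminance(r, g, b):
    """Relative luminance of an sRGB colour (channels 0-1), per the WCAG 2.x formula."""
    def linearise(c):
        # Undo the sRGB transfer curve to get a value proportional to light intensity.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    # Green contributes most to perceived brightness, blue the least.
    return 0.2126 * linearise(r) + 0.7152 * linearise(g) + 0.0722 * linearise(b)

samples = {"yellow": (1.0, 1.0, 0.0), "blue": (0.0, 0.0, 1.0)}
for name, rgb in samples.items():
    _, hsl_lightness, _ = colorsys.rgb_to_hls(*rgb)
    print(name, hsl_lightness, round(relative_luminance(*rgb), 4))
# → yellow 0.5 0.9278
# → blue 0.5 0.0722
```

For everyday colour setting HSL remains the convenient model, but this gap is worth remembering when colours of different hues sit side by side.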

At this stage, it is important to be as pragmatic about colour as possible. The vast majority of your colour manipulation and perceptual needs should be nicely covered by the HSL model. As and when you develop a deeper, purist interest in colour you should then seek to learn more about the nuances in the differences between the definitions of these models and their application.

9.2 Features of Colour: Data Legibility

Data legibility concerns the use of the attribute of colour to encode data values in charts. The objective here is to make the data being represented by differences in colour as clearly readable and as meaningful as possible.

While you have probably already decided by now the chart or charts you intend to use, you still need to think carefully – and separately – about how you will specifically employ colour. To do this we first need to revisit the classification of data types and consider how best to use colour for representing each different type.

Nominal (Qualitative)

With nominal data colour is used to classify different categorical values. The primary motive for the choice of colour is to create a visible distinction between each unique categorical association, helping the eye to discern the different categories as efficiently and accurately as possible.

Creating contrast is the main aim of representing nominal data. What you are not seeking to show or even imply is any sense of an order of magnitude. You want to help differentiate one category from the next – and make each easily identifiable – but to do so in a way that preserves the sense of equity among the colours deployed.

Figure 9.5 Excerpt from ‘Executive Pay by the Numbers’

Variation in hue is typically the colour dimension to consider using for differentiating categories. Additionally, you might explore different tones (variations in saturation across the hues). You should not, though, consider using variations in the lightness dimension, because the result is insufficiently discernible. As you can see demonstrated in Figure 9.5, the lightness variation of a blue tone makes it quite hard to connect the colour scale presented in the key at the top with the colours displayed in the stacked bars underneath. With the shading in the column header and the 2011 grey bar also contributing similar tones to the overall aesthetic of the table, our visual processing system has to work much harder than it should to determine the associations.
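One simple way to vary hue while holding the other dimensions steady is to space hues evenly around the HSL wheel. The Python sketch below does exactly that. It is an assumption-laden starting point rather than a recommended palette: hues at the same HSL lightness do not look equally light (yellows and cyans appear brighter), which is why curated categorical palettes are often preferable in practice. The function name and default values are hypothetical. The upper limit of 12 follows the rule of thumb discussed in this section.

```python
import colorsys

def categorical_palette(n, saturation=0.65, lightness=0.5):
    """Return n #RRGGBB colours whose hues are spaced evenly round the wheel.

    Saturation and lightness are held constant so that only the hue
    changes from one category to the next.
    """
    if not 1 <= n <= 12:
        raise ValueError("beyond roughly 12 categories hues stop being distinguishable")
    colours = []
    for i in range(n):
        # i / n spreads the hues evenly over the 0-1 hue range.
        r, g, b = colorsys.hls_to_rgb(i / n, lightness, saturation)
        colours.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours

# For example, one colour per continent in a scatter plot.
print(categorical_palette(5))
```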

Often the categories you will be differentiating with colour will be relatively few in number, maybe two or three, such as in the separation between political parties or plotting different values for gender, as seen in Figure 9.6.

Figure 9.6 How Nations Fare in PhDs by Sex

Figure 9.7 How Long Will We Live – And How Well?


Beyond these small numbers, you still might typically only need to contend with assigning colours to around four to six categories, perhaps in analysis that needs to visually distinguish values for different continents of the world, as seen in the scatter plot in Figure 9.7.

As the range of different categories grows, preserving clear differentiation becomes harder. In expanding your required palette, the colours used become decreasingly unique. The general rule of thumb is that once you have more than 12 categories it will not be possible to find a sufficiently different colour to assign to categories from 13 upwards. Additionally, you are really increasing the demands of learning and recognition for viewers. This becomes quite a cognitive burden and delays the process of understanding.

Figure 9.8 Charting the Beatles: Song Structure


There are two approaches for dealing with this. Firstly, consider offering interactive filters to modify which categories are displayed in a visualisation, thus potentially reducing the impact of so many being available. Secondly, think about transforming your data by excluding or combining categories into a reduced number of aggregate groupings.
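The second approach, combining categories into fewer groupings, can be sketched in Python as keeping the most frequent categories and folding the rest into a single ‘Other’ group. Ranking by frequency is an assumption made for illustration; in a real project the categories worth keeping might be chosen on editorial grounds instead. The function name is hypothetical.

```python
from collections import Counter

def top_n_with_other(values, n, other_label="Other"):
    """Keep the n most frequent categories and fold the rest into one group.

    The result needs at most n + 1 colours, however many categories the
    raw data contains.
    """
    counts = Counter(values)
    keep = {category for category, _ in counts.most_common(n)}
    return [v if v in keep else other_label for v in values]

records = ["a", "a", "a", "b", "b", "c", "d"]
print(top_n_with_other(records, 2))
# → ['a', 'a', 'a', 'b', 'b', 'Other', 'Other']
```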

Depending on the subject of your data, you can sometimes supplement the use of colour with texture or pattern to create further visible distinctions. In Figure 9.8 you can see two patterns being used occasionally as additive properties to show the structure of tracks on The Beatles’ album.

Ordinal (Qualitative)

With ordinal data you are still dealing with categories, but now they have a natural hierarchy or ordering that can be exploited. The primary motive for using colour in this case is not only to create a visible distinction between each unique category association but also to imply some sense of an order of magnitude through the colour variation. The colour dimensions used to achieve this tend to be variations of either the saturation or the lightness (or a combination of both). You might also introduce different hues when dealing with diverging (dual-direction) scales rather than simply converging (single-direction) ones.


Figure 9.9 displays a simple example of colour used to display a converging ordinal variable. This is the teacup that I use in my office. On the inside you can see it has a colour guide to help ascertain how much milk you might need to add: going through Milky, Classic British, Builder’s Brew, and finally Just Tea (zero milk).
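A converging ordinal scale like the mug’s milk guide can be sketched by assigning evenly spaced lightness steps of a single hue to the ordered levels. In this Python sketch the hue (25 degrees, a tea-like orange-brown), saturation and lightness limits are arbitrary assumptions, and `ordinal_colours` is a hypothetical helper.

```python
import colorsys

def ordinal_colours(levels, hue_degrees=25, saturation=0.7,
                    lightest=0.85, darkest=0.25):
    """Map ordered category labels to lightness steps of a single hue.

    levels must already be in their natural order; the first gets the
    lightest tint and the last the darkest shade.
    """
    n = len(levels)
    mapping = {}
    for i, level in enumerate(levels):
        # Fraction of the way along the order (a lone level sits in the middle).
        t = i / (n - 1) if n > 1 else 0.5
        lightness = lightest + t * (darkest - lightest)
        r, g, b = colorsys.hls_to_rgb(hue_degrees / 360, lightness, saturation)
        mapping[level] = "#{:02X}{:02X}{:02X}".format(
            *(round(c * 255) for c in (r, g, b)))
    return mapping

# The milk guide, ordered from most milk (lightest) to no milk (darkest).
for level, colour in ordinal_colours(
        ["Milky", "Classic British", "Builder's Brew", "Just Tea"]).items():
    print(f"{level:16} {colour}")
```

Because only lightness changes, the colour order reads as an order of magnitude, which is exactly what a nominal palette is designed to avoid.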

Figure 9.9 Photograph of MyCuppa Mug

A typical example of a diverging ordinal scale might be seen in the stacked bar chart showing the results of a survey question (Figure 9.10). The answers are based on the strength of feeling: strongly agree, agree, neutral, disagree, strongly disagree. Colouring the agreement in red (‘hot’ sometimes used to represent ‘good’) and the disagreement in blue (‘cold’ to mean ‘bad’) means a viewer can quickly perceive the general balance of feelings being expressed.

Figure 9.10 Example of a Stacked Bar Chart Based on Ordinal Data

Another example of ordinal data might be to represent the notion of recency. In Figure 9.11 you see a display plotting the 2013 Yosemite National Park fire. Colour is used to display the recorded day-by-day progress of the fire’s spread. The colour scale is based on recency, with darker = most recent, lighter = furthest away (think faded memory).

Figure 9.11 The Extent of Fire in the Sierra Nevada Range and Yosemite National Park, 2013


Interval and Ratio (Quantitative)

With quantitative data (ratio and interval) your motive, as it is with ordinal data, is to demonstrate the differences among a set of values and their order of magnitude. In the choropleth map in Figure 9.12, showing the variation in electricity prices across Switzerland, the darker shades of blue indicate the higher prices and the lighter tints the lower prices. This approach makes the viewer’s perception of the map’s values immediate – it is quite intuitive to recognise the implication of the general patterns of light and dark shades.

Figure 9.12 What are the Current Electricity Prices in Switzerland [Translated]


Typically, using colour to represent quantitative data will involve breaking up your data values into discrete classifications or ‘bins’. This makes the task of reading value ranges from their associated colour shade or tone a little easier than when using a continuous gradient scale. While our capacity to judge exact variations in colour is relatively low (even with a colour key for reference), we are very capable of detecting local variations of colour through differences in tint, shade or tone. Assessing the relative contrast between two colours is generally how we construct a quantitative hierarchy.
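Binning can be sketched in a few lines of Python: compute class boundaries, then look up each value’s class with the standard library’s `bisect_right`, which gives the index of the colour to use. This sketch assumes equal-width classes over a 0 to 100 range. The four-step blue palette is purely illustrative, and values outside the range simply fall into the end classes.

```python
from bisect import bisect_right

def equal_interval_breaks(lo, hi, n_bins):
    """Inner boundaries of n_bins equal-width classes spanning lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def classify(value, breaks):
    """0-based class index for value; a value equal to a boundary goes up a class."""
    return bisect_right(breaks, value)

# Illustrative four-step ramp from a light tint to a dark shade of blue.
palette = ["#EFF3FF", "#BDD7E7", "#6BAED6", "#2171B5"]
breaks = equal_interval_breaks(0, 100, len(palette))  # [25.0, 50.0, 75.0]
for v in (5, 30, 55, 99):
    print(v, palette[classify(v, breaks)])
```

With one colour per class, a viewer only has to distinguish four tints rather than judge a position on a continuous gradient.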

Look at the fascinating local patterns that emerge in the next map (Figure 9.13), comparing increases in the percentage of people gaining health insurance in the USA (during 2013–14). The data is broken down to county-level detail, with a colour scale showing a darker red for the higher percentage increases.

Some of the most relevant colour practices for data visualisation come from the field of cartography (as do many of the most passionate colour purists). Just consider the amount of quantitative and categorical detail shown in a reference map that relies on colour to differentiate types of land, indicate the depth of water or the altitude of high ground, present route features of road and rail networks, etc. The best maps pack an incredible amount of detail into a single display and yet somehow they never feel disproportionately overwhelming.

Figure 9.13 Excerpt from ‘Obama’s Health Law: Who Was Helped Most’

Aside from the big-picture observations of the darker shades in the west and the noticeably lighter tints to the east and parts of the mid-west, take a closer look at some of the interesting differences at a more local level. For example, notice the stark contrast across state lines between the dark regions of southern Kentucky (to the left of the annotated caption) and the light regions in the neighbouring counties of northern Tennessee. Despite their spatial proximity there are clearly strong differences in enrolment on the programme amongst residents of these regions.

Both of these previous examples use a convergent colour scale, moving through discrete variations in colour lightness to represent an increasing scale of quantitative values, from zero or small through to large. As illustrated with the stacked bar chart example shown earlier, portraying the range of feelings from an ordinal dataset, sometimes you may need to employ a divergent colour scale. This is when you want to show how values are changing in two directions either side of a set breakpoint.

Figure 9.14 Daily Indego Bike Share Station Usage

Figure 9.14 shows a cropped view of a larger graphic comparing the relative peaks and troughs of usage across all bike share stations in Philadelphia over a 24-hour period. The divergent colour scale uses two hues and variations in lightness to show the increasingly busy and increasingly slow periods of station activity either side of a breakpoint, represented by a very light grey to indicate the average point. The darkest red means the station is full; the darkest blue means the station is empty.
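The logic of a divergent scale (one hue per direction, lightness for distance, a light grey at the breakpoint) can be sketched in Python. This is not how Figure 9.14 was built. The hues, the number of steps, the lightness range and the sample trip counts are all hypothetical choices for illustration.

```python
import colorsys

def diverging_colour(value, breakpoint, max_distance, steps=4,
                     below_hue=215, above_hue=5, saturation=0.7):
    """Colour a value by its side of, and distance from, a breakpoint.

    Values close to the breakpoint get a very light grey; the further a
    value sits above it the darker the red, and the further below it the
    darker the blue. Distances beyond max_distance share the darkest step.
    """
    distance = value - breakpoint
    # Position of the distance on a 0..steps scale, rounded to the nearest step.
    step = round(min(abs(distance) / max_distance, 1.0) * steps)
    if step == 0:
        return "#EEEEEE"  # neutral midpoint, e.g. the average
    hue = above_hue if distance > 0 else below_hue
    # Darker the further from the breakpoint: about 0.71 at step 1, 0.30 at the last.
    lightness = 0.85 - 0.55 * step / steps
    r, g, b = colorsys.hls_to_rgb(hue / 360, lightness, saturation)
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in (r, g, b)))

# Hypothetical station usage compared with an average of 50 trips.
for trips in (10, 40, 50, 60, 95):
    print(trips, diverging_colour(trips, breakpoint=50, max_distance=40))
```

Because both sides use the same lightness steps, equal distances above and below the breakpoint carry equal visual weight.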

Regardless of whether you are plotting a converging or diverging scale, judging how you might divide up your colour scales into discrete value bins needs careful thought. The most effective colour scales help viewers perceive not just the relative order of magnitude – higher or lower – but also a sense of the absolute magnitude – how different a value might be compared to another value.

There is no universal rule about the number of value bins. Indeed, it is not uncommon to see entirely continuous colour scales. However, a general rule of thumb I use is that somewhere between four and nine meaningful – and readable – value intervals should suffice. There are two key factors to consider when judging your scales:

Are you plotting observed data or observable data? You might only have collected data for a narrow range of quantities (e.g. 15 to 35), so will your colour classifications be based on this observed range or on the potentially observable data range, i.e. the values you know would/could exist with a wider sample size or on a different collection occasion (e.g. 0 to 50)?

What are the range and distribution of your data? Does it make sense to create equal intervals in your colour classifications or are there more meaningful intervals that better reflect the shape of your data and the nature of your subject? Sometimes, you will have legitimate outliers that, if included, will stretch your colour scales far beyond the meaningful concentration of most of your data values.
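To make the observed-versus-observable choice concrete, here is a minimal Python sketch that builds equal-interval bin edges over either range and assigns a value to a bin. The function names and the choice of five bins are assumptions for illustration, not a prescribed method; the 15 to 35 and 0 to 50 ranges are the ones mentioned above.

```python
def equal_interval_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from lo to hi (hypothetical helper)."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def bin_index(value, edges):
    """Return the 0-based bin for value; values at or above the top edge join the last bin."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


observed = equal_interval_edges(15, 35, 5)    # [15.0, 19.0, 23.0, 27.0, 31.0, 35.0]
observable = equal_interval_edges(0, 50, 5)   # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]

# The same value lands in different bins depending on which range you commit to.
print(bin_index(22, observed), bin_index(22, observable))  # 1 2
```

The observed scale spends all five bins on a 20-unit span, so a value collected later outside 15 to 35 would have no meaningful class to fall into; the observable scale trades some detail for that headroom.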

Figure 9.15 Battling Infectious Diseases in the 20th Century: The Impact of Vaccines

You can see this effect in Figure 9.15, showing the incidence of Hepatitis A per 100,000 population. There are only three values that exceed 100 (you can see them on the top line for Alaska in the late 1970s). To accommodate these outliers the colour scale becomes somewhat stretched out, with a wide range of potential values being represented by a dark yellow to red colour. With 99.9% of the values being under 100, there is little discernible variation in the blue/green shades used for the lower values. If outliers are your focus, it makes sense to include these and colour accordingly to emphasise their exceptional quality. Otherwise, if they risk compromising the discrete detail of the lower values, you might look to create a broad classification that uses a single colour for any value beyond a threshold of maybe 75, with even value intervals of maybe 15 below that to help show the patterns of smaller values.
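One way to express that kind of capped classification is sketched below in Python: even 15-unit intervals from zero up to a threshold of 75, and a single open-ended top class for anything beyond it. The threshold and interval are the illustrative figures from the paragraph above, not values taken from Figure 9.15; the sketch assumes non-negative rates and a threshold that is a multiple of the interval, and the sample rates are made up.

```python
from collections import Counter


def capped_class(value, threshold=75, interval=15):
    """Label a non-negative value with an even interval below threshold, or one open-ended class above it."""
    if value >= threshold:
        return f"{threshold}+"
    lower = int(value // interval) * interval
    return f"{lower}-{lower + interval}"


# Hypothetical incidence rates: most values are small, two are extreme outliers.
rates = [2.5, 8.0, 14.9, 22.0, 37.5, 61.0, 74.5, 120.0, 410.0]
print(Counter(capped_class(r) for r in rates))
```

The two outliers no longer stretch the scale; they simply share the top class, leaving five even classes to separate the smaller values.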

For diverging scales, the respective quantitative shades either side of a breakpoint need to imply parity in both directions. For example, a shade of colour that means +10% on one side of the breakpoint should have an equal shade intensity in a different hue on the other side to indicate the same interval, i.e. −10%. Additionally, the darkest shades of hues at the extreme ends of a diverging scale must still be discernible. Sometimes the darkest shades will be so close to black that you will no longer be able to distinguish the differences in their underlying hues when plotted in a chart or map.

As well as considering the most appropriate discrete bins for your values, for diverging scales one must also pay careful attention to the role of the breakpoint. This is commonly set to separate values visually above or below zero or those either side of a meaningful threshold, such as a target, average or median.
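The parity requirement and the role of the breakpoint can be sketched in Python as below. The class intensity depends only on the distance from the breakpoint, so +10 and −10 receive the same intensity in opposite hues. The step size, the four classes per side and the hex values are illustrative assumptions (the colours resemble a red–blue diverging palette with a very light grey at the breakpoint), and the darkest classes deliberately stop short of near-black.

```python
# Illustrative colours, light to dark, for each side of the breakpoint.
ABOVE = ["#fddbc7", "#f4a582", "#d6604d", "#b2182b"]  # reds
BELOW = ["#d1e5f0", "#92c5de", "#4393c3", "#2166ac"]  # blues
NEUTRAL = "#f7f7f7"                                   # very light grey at the breakpoint


def diverging_class(value, breakpoint=0.0, step=10.0, n_steps=4):
    """Return (side, intensity); intensity 1..n_steps depends only on |value - breakpoint|."""
    offset = value - breakpoint
    if offset == 0:
        return ("neutral", 0)
    intensity = min(n_steps, int(abs(offset) // step) + 1)
    return ("above" if offset > 0 else "below", intensity)


def diverging_colour(value, **kwargs):
    """Look up the fill for a value; n_steps must not exceed the palette length."""
    side, intensity = diverging_class(value, **kwargs)
    if side == "neutral":
        return NEUTRAL
    palette = ABOVE if side == "above" else BELOW
    return palette[intensity - 1]


# Equal distances either side of the breakpoint get equal intensity in opposite hues.
print(diverging_colour(10), diverging_colour(-10))
```

Passing a different `breakpoint` (a target, average or median rather than zero) moves the neutral grey to that value without changing the symmetry of the classes around it.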

One of the most common mistakes in using colour to represent quantitative data comes with use of the much-derided rainbow scale. Look at Figure 9.16, showing the highest temperatures across Australia during the first couple of weeks in 2013. Consider the colour key to the right of the map and ask yourself if this feels like a sufficiently intuitive scale. If the key was not provided, would you be able to perceive the order of magnitude relationship between the colours on the map? If you saw a purple colour next to a blue colour, which would you expect to mean hotter and which colder?

Figure 9.16 Highest Max Temperatures in Australia


While the general implication of blue = ‘colder’ through to red = ‘hotter’ is included within sections of this temperature colour scale, it is the presence of many other hues that obstructs the accessibility and creates inconsistency in logic. For instance, do the colours used to show 24°C (light blue) jumping to 26°C (dark green) make sense as a means for showing an increasing temperature? How about 18°C (grey) to 20°C (dark blue), or the choice of the mid-brown used for 46°C which interrupts the increasingly dark red sequence? If you saw on the map a region with the pink tone used for 16°C, would you be confident that you could easily distinguish this from the lighter pink used to represent 38°C? Unless there are meaningful thresholds within your quantitative data – justifiable breakpoints – you should only vary your colour scales through the lightness dimension, not the hue dimension.
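A quick way to check whether a scale varies through lightness in a single direction is to compute a perceptual lightness value for each step and confirm that it only ever rises or only ever falls. The Python sketch below uses CIELAB L* derived from sRGB with the standard D65 conversion formulas; the helper names are hypothetical and the two example scales are made up for illustration, not taken from Figure 9.16.

```python
def _linear(channel_8bit):
    """Undo the sRGB transfer curve for one 0-255 channel."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lightness(hex_colour):
    """CIELAB L* (0 = black, 100 = white) of an sRGB hex colour such as '#3366cc'."""
    h = hex_colour.lstrip("#")
    r, g, b = (_linear(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance
    f = y ** (1 / 3) if y > (6 / 29) ** 3 else y / (3 * (6 / 29) ** 2) + 4 / 29
    return 116 * f - 16


def is_monotonic_in_lightness(colours):
    """True if L* strictly increases or strictly decreases along the scale."""
    ls = [lightness(c) for c in colours]
    steps = [b - a for a, b in zip(ls, ls[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


greys = ["#f0f0f0", "#bdbdbd", "#737373", "#252525"]
rainbow = ["#0000ff", "#00ff00", "#ffff00", "#ff0000"]
print(is_monotonic_in_lightness(greys), is_monotonic_in_lightness(rainbow))
```

This only tests the ordering of lightness: a scale could pass while still swapping hues, so it complements rather than replaces looking critically at the key.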

One of the interesting recurring challenges faced by visualisers is how to represent nothing. For example, if a zero quantity or no category is a meaningful state to show, you still need to represent this visually somehow, even though it might possess no size, no position and no area. How do you distinguish between no data and a zero value?

Figure 9.17 State of the Polar Bear


Typically, using colour is one of the best ways to portray this. Figure 9.17 shows one solution to making ‘no data’ a visible value. This map displays the population trends of the polar bear. Notice those significant areas of grey representing ‘data deficient’. A subtle but quite effective political point is being made here by including this status indicator. As I mentioned before, sometimes the absence of data can be the message itself.

Figure 9.18 Excerpt from ‘Geography of a Recession’


When considering colour choices for quantitative classifications, you will need to think especially carefully about the lowest value grouping: is it to be representative of zero, an interval starting from zero up to a low value, or an interval starting only from the minimum value and never including zero? In this choropleth map (Figure 9.18) looking at the unemployment rate across the counties of the USA, no value is as low as zero. There might be values that are close, but nowhere is the unemployment rate at 0%. As you can see, the lowest tint used in this colour key is not white but rather a light shade of orange, so as not to imply zero. Whilst not relevant to this example, if you wanted to create a further distinction between the lowest value interval and the ‘null’ or ‘no data’ state, you could achieve this by using a pure white/blank.
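The distinction between ‘no data’, zero and the lowest value interval can be kept explicit in code. The Python sketch below reserves pure white for missing values and starts the classified scale at a light orange. The class boundaries and hex values are hypothetical, chosen only to illustrate the idea, and are not the classes used in Figure 9.18.

```python
import math

NO_DATA = "#ffffff"  # pure white/blank, reserved for missing values only

# (exclusive upper bound, fill) pairs for a hypothetical unemployment-rate scale, in %.
# The lowest class is a light orange rather than white, so it does not read as zero.
CLASSES = [
    (3.0, "#fee6ce"),
    (6.0, "#fdae6b"),
    (9.0, "#e6550d"),
    (math.inf, "#a63603"),
]


def fill_for(rate):
    """Return the fill for a rate, treating None or NaN as 'no data'."""
    if rate is None or math.isnan(rate):
        return NO_DATA
    for upper, colour in CLASSES:
        if rate < upper:
            return colour
    return CLASSES[-1][1]  # only reached for an infinite rate
```

Keeping white out of the classified scale means a county with a genuinely low rate can never be mistaken for one with no reported value.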

9.3 Features of Colour: Editorial Salience

Having considered options for the application of colour in facilitating data legibility, the next concern is colour used for editorial salience. Whereas data legibility was concerned with helping to represent data, using colour for editorial salience is about drawing the viewer’s attention to the significant or meaningful features of your display. Colour offers a potent visual stimulus and an influential means for drawing out key aspects of your data and project that you might feel are sufficiently relevant to make prominent.

Consider again the idea of photography and the effect of taking a photograph of a landscape. You will find the foreground objects are darker and more prominent than the faded view of the background in the distance, as light and colour diminish. Using colour to achieve editorial salience involves creating a similar effect of depth across your visualisation’s contents: if everything is shouting, nothing is heard.

The goal of using colour to facilitate editorial salience is a suitable contrast. In deciding which things will stand out, you are in turn determining which other things will not.

The degree of contrast you might seek to create will vary. Often you will be seeking to draw a significant contrast, maximising the emphasis of a value or subset of values so the viewer can quickly home in on what you have elevated for their attention relative to everything else.

For this reason, grey will prove to be one of your strongest allies in data visualisation. When contrasted with reasonably saturated hues, grey helps to create depth. Elements coloured in greyscale will sit quietly at the back of the view, helping to provide a deliberately subdued context that enables the more emphasised coloured properties to stand proudly in the foreground.

In Figure 9.19, the angle of analysis shows a summary of the most prevalent men’s names featuring among the CEOs of the S&P 1500 companies. As you can see, there are more guys named ‘John’ or ‘David’ than there are women CEOs combined. With the emphasis of the analysis on this startling statement of inequality, the bar for ‘All women’ is emphasised in a burgundy colour, contrasting with the grey bars of all the men’s names. Notice also that the respective axis and bar value labels are both presented using a bold font, which further accentuates this emphasis. It is also editorially consistent with the overriding enquiry of the article. As discussed in Chapter 3, bringing to the surface key insights from data displays in this way contributes towards facilitating an ‘explanatory’ experience.
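The grey-plus-accent treatment described above is simple to express in code. This Python sketch assigns a burgundy fill and bold labels to the emphasised bar and a neutral grey to everything else; the hex values, the style dictionary and the example labels are illustrative assumptions rather than the specification behind Figure 9.19.

```python
ACCENT = "#8c1d40"   # illustrative burgundy for the emphasised bar
CONTEXT = "#bdbdbd"  # neutral grey for the contextual bars


def bar_styles(labels, emphasise):
    """Map each bar label to a fill colour and label weight."""
    return {
        label: (
            {"fill": ACCENT, "bold": True}
            if label in emphasise
            else {"fill": CONTEXT, "bold": False}
        )
        for label in labels
    }


styles = bar_styles(["John", "David", "Michael", "All women"], emphasise={"All women"})
for label, style in styles.items():
    print(f"{label:<10} {style['fill']} bold={style['bold']}")
```

Because the contextual bars share one grey, adding more names to the chart never competes with the single emphasised bar.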


Figure 9.19 Fewer Women Run Big Companies Than Men Named John

Figure 9.20 NYPD, Council Spar Over More Officers

Sometimes only a noticeable contrast – not shouting, just being slightly more distinguishable – may be appropriate. Compared with the previous bar chart example, Figure 9.20 creates a more subtle distinction between the slightly darker shade of green (and emboldened text) emphasising the New York figures and the other listed departments in a slightly lighter green. As with the CEOs’ example, the object of our attention is the subject of focus in the analysis, in this case regarding a drive for more NYPD officers. This does not need to be any more contrasting; it is just as noticeable as the visualiser wishes it to be.

Sometimes you will seek to create several levels of visual ‘urgency’ in the relative contrast of your display. The colour choices in Figure 9.21 give foreground prominence to the yellow-coloured markers and values (the dots are also larger) and then mid-ground/secondary prominence to the slightly muted red markers. In perceiving the values of the yellow markers, the viewer is encouraged to concentrate primarily on comparing these with the red markers. The subtle grey markers are far less visible – closer in shade to the background than the foreground – and deliberately relegated to a tertiary level so they do not clutter up the display and attract unwarranted attention. They provide further context for the distribution of the values but do not need to be any more prominent in their relationship with the foreground and mid-ground colours.

Figure 9.21 Excerpt from a Football Player Dashboard

I touched on the use of encoded overlays earlier, where coloured areas or bandings can be used to help separate different regions of a display in order to facilitate faster interpretation of the meaning of values. In the bubble plot in Figure 9.22, you can see the circle markers are colour coded to help viewers quickly ascertain the significance of each location on the chart according to the quadrants in which they fall. Notice how in the background the diagonal shading further emphasises the distinction between above-the-line ‘improvement’ and below-the-line ‘worsening’, a very effective approach.

Figure 9.22 Elections Performance Index

9.4 Features of Colour: Functional Harmony

After achieving data legibility and editorial salience through astute colour choices, functional harmony is concerned with ensuring that any remaining colour choices will aid, and not hinder, the functional effectiveness and elegance of the overall visualisation.

‘When something is not harmonious, it’s either boring or chaotic. At one extreme is a visual experience that is so bland that the viewer is not engaged. The human brain will reject under-stimulating information. At the other extreme is a visual experience that is so overdone, so chaotic, that the viewer can’t stand to look at it. The human brain rejects what it can not organise, what it cannot understand.’ Jill Morton, Colour Expert and Researcher

You must judge the overall balance and suitability of your collective colour choices and not just see these as isolated selections. This is again primarily a judgement about contrast – what needs to be prominent and what needs to be less so. The calming quality of a well-judged and cohesive colour palette is demonstrated by Stefanie Posavec’s choices in visualising the structure of Walter Benjamin’s essay ‘Art in the age of mechanical reproduction’ (Figure 9.23). There is effortless harmony here between the colour choices extending across the entire anatomy of design: the petals, branches, labels, titles, legend, and background.

Remember that every design feature you incorporate into your display will have a property of colour; otherwise it would be invisible. In looking at data legibility and editorial salience you have considered your colour choices for representing data. A desire to achieve functional harmony means considering further colour decisions that will help establish visual relationships across and between the rest of your visualisation’s anatomy: its interactive features, annotations and composition.

Figure 9.23 Art in the Age of Mechanical Reproduction: Walter Benjamin

Interactive features: Visible interactive features will include controls such as dropdown menus, navigation buttons, time sliders and parameter selectors. The colour of every control used will need to be harmonious with the rest of the project but also, critically, must be functionally clear. How you use colour to help the user discern what is selected and what is not will need to be carefully judged.

To illustrate this, Figure 9.24 shows an interactive project that examines the connected stories of the casualties and fatalities from the Iraqi and Afghan conflicts. Here you can see that there are several interactive features, all of which are astutely coloured in a way that feels both consistent with the overall tone of the project and makes it functionally evident what each control’s selected status or defined setting is. This is achieved through very subtle but effective combinations of dark and light greys that help create intuitive clarity about which values the user has selected or highlighted. When a button has a toggle setting (on/off, something/something else), such as the ‘Afghanistan’ or ‘Iraq’ tabs at the top, the selected tab is highlighted in bright grey and the unselected tab in a more subdued grey. Filters can either frame (include/exclude) or focus (highlight/relegate) the data. The same approach of using brighter greys for the selected parameter values makes it very clear what you have chosen, but also what you have excluded (while making evident the other currently unselected values from which you can potentially choose).

Figure 9.24 Casualties

Annotations: Chart annotations such as gridlines, axis lines and value labels all need colouring in a way that will be sympathetic to the colour choices already made for the data representation and, possibly, editorial contrasting. As mentioned in the last chapter, many annotation devices exist in the form of text and so the relative font colour choices will need to be carefully considered. For any annotation device the key guiding decision is to find the level at which these are suitably prominent. Not loud, not hidden, just at the right level. This will generally take a fair amount of trial and error to get right but, once again, depending on your context, your first thought should be to consider the merits offered by different shades of grey.

You might be starting to suspect I’m a lobbyist for the colour grey. Nobody wants to live in a world of only grey. The point is more about how its presence enables other colours to come alive. The great Bill Shankly once said ‘Football is like a piano, you need 8 men to carry it and 3 who can play the damn thing’. In data visualisation, grey does the heavy lifting so the more vibrant colours can bring the energy and vibrancy to your design.

Figure 9.25 First Fatal Accident in Spain on a High-speed Line [Translated]

Another example of the role of greyscale is demonstrated by Figure 9.25, illustrating key aspects of the tragic rail crash in Spain in 2013. The sense of foreground and background is clearly achieved by the prominence of the scarlet-coloured annotations and visual cues offset against the backdrop of an otherwise greyscale palette.

Figure 9.26 Lunge Feeding


There are other features of annotation that will have an impact on functional harmony through their colouring. Multimedia assets like photos, embedded videos, images and illustrations need to be consistent in tone according to their relative role on the display. If they are to dominate the page then unleash the vibrancy of their colours to achieve this; if they are playing more of a secondary or supporting role then relegate their constituent colours to allow other primary features due prominence.

Figure 9.26 includes small illustrations of a whale, showing how it goes through the stages of lunge feeding. The elegance of the colours used in these illustrations is entirely harmonious with the look and feel of the overall piece. They are entirely at one with the rest of the graphic.

Composition: The clarity in layout of a project will often be achieved by the use of background colour to create logical organisation. In the ‘Lunge Feeding’ graphic the shading of the blue sea getting darker as it moves down is not attempting to offer a precise representation of the sea, but it gives a sense of depth and draws maximum attention to that panel. It is also naturally congruent with the subject matter.

Figure 9.27 Examples of Common Background Colour Tones

In general, there are no fixed rules on the benefits of any particular colour for background shading. Your choices will depend mostly on the circumstances and conditions in which your viewers are consuming the work. Usually, when there is no associated congruence for a certain background colour, your options will tend to come from a selection of neutral and/or non-colours (Figure 9.27). This is because they particularly help to aid accentuation in combination with foreground colours.

Typically, though, a white background (at least for your chart area) gives viewers the best chance of being able to accurately perceive the different colour attributes used in your data representation and the effect of any editorial contrast.

White – or more specifically emptiness – is one of your most important options for creating functional meaning for nothingness, something I touched on earlier. The emptiness of uncoloured space can be used very effectively to direct the eye’s attention. It organises the relationship between space on a page without the need for visible apparatus, as seen in the left-hand column of the lunge feeding graphic. It can also be used to represent or emphasise values that might have the state of ‘null’ or ‘zero’ to maximise contrast.

‘The single most overlooked element in visual design is emptiness. Space must look deliberately used.’ Alex White, Author, The Elements of Graphic Design

9.5 Influencing Factors and Considerations

Having mapped out the ways and places where colour could be used, you will now need to consider the factors that will influence your decisions about how colour should be used.

Formulating Your Brief

Format: This is a simple concern but always worth pointing out: if you are producing something for screen display you will need to set your colour output to RGB; if it is for print you will need CMYK. Additionally, when you are preparing work for print, running off plenty of proofs before finalising a design is imperative. What you are preparing digitally is a step away from the form of its intended output. What looks like a perfect colour palette on screen may not ultimately look the same when printed.

Print quality and consistency is also a factor. Graphics editors who create work for print newspapers or magazines will often consider using colours as close in tone as possible to pure CMYK, especially if their work is quite intricate in detail. This is because the colour plates used in printing presses will not always be 100% aligned and thus mixtures of colours may be slightly compromised.

As black and white printing is still commonplace, you need to be aware of how your work might look if printed without colour. If you are creating a visualisation that might possibly be printed by certain users in black and white, the only colour property that you can feasibly utilise will be the lightness dimension. Sometimes, as a designer or author, you will be unaware of this intent and the colourful design that you worked carefully towards will end up not being remotely readable.

We all refer to black and white printing, but technically printers do not actually print using white ink; it is just less black or no black.

Furthermore, there is an important difference in how colours appear when published in colour and how they appear when published in black and white. Hues inherently possess different levels of brightness: the purest blue is darker than the purest yellow. If these were printed in black and white, blue would therefore appear a darker, more prominent shade of grey. If your printed work will need to be compatible for both colour and black and white output, before finalising your decisions check that the legibility and intended meaning of your colour choices are being maintained across both forms.
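You can get a rough, screen-based preview of how hues will translate into greys by converting each colour to its relative luminance and re-encoding that as a grey level. The Python sketch below uses the standard sRGB luminance weights; the helper names are hypothetical, and the result is an approximation under that assumption and no substitute for a printed proof, since inks, paper and printer settings all affect the outcome.

```python
def _to_linear(c8):
    """Undo the sRGB transfer curve for one 0-255 channel."""
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _to_srgb8(c):
    """Re-apply the sRGB transfer curve and return a 0-255 level."""
    s = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
    return round(s * 255)


def grey_level(hex_colour):
    """Approximate 0-255 grey a colour becomes when reduced to luminance only."""
    h = hex_colour.lstrip("#")
    r, g, b = (_to_linear(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return _to_srgb8(y)


print(grey_level("#0000ff"), grey_level("#ffff00"))  # 76 247
```

Pure blue comes out far darker than pure yellow, which is the asymmetry described above: in a black and white print the blue would read as the heavier, more prominent mark.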

Setting: For digital displays, the conditions in which the work will be consumed will have some influence over the choice between light and dark backgrounds. The main factor is the relative contrast and the stresses this can place on the eye to adjust against the surroundings. If your work is intended for consumption in a light environment, lighter backgrounds tend to be more fitting; likewise darker backgrounds will work best for consuming in darker settings. For tablets/smartphones, the bordering colour of the devices can also influence the most suitable choice of background tone to contrast most sympathetically with the surroundings.

Colour rules and identities: In some organisations there are style guidelines or branding identities that require the strict use of only certain colour options. Similar guidelines may exist if you are creating work for publication in a journal, magazine, or on certain websites. Guidelines like these are well intended, driven by a desire to create conformity and consistency in style and appearance. However, in my experience, the basis of such colour guides rarely incorporates consideration for the subtleties of data visualisation. This means that the resulting palettes are often a bad fit for ideal visualisation colour needs, providing limited scope for the variation and salience you might seek to portray.

Your first task should always be to find out if there is any compromise – any chance of not having these colour restrictions imposed. If there is no flexibility, then you will just have to accept this and begin acquainting yourself with the colours you do have to work with. Taking a more positive view, achieving consistency in the use of colour for visualisation within an organisation does have merits if the defined palettes offer suitably rich variety. Developing a recognisable ‘brand’ and not having to think from scratch about what colours to use every time you face a new project is something that can be very helpful, especially across a team.

Purpose map: Does it need to be utilitarian or decorative? Should it be functional or appealingly seductive? Does it lend itself to being vivid and varied in colour or more muted and distinguished? Colour is the first thing we notice as viewers when looking at a visualisation, so your choices will play a huge part in setting the visible tone of voice. How you define your thinking across the vertical dimension of your purpose map will therefore have an influence on your colour thinking.

Along the horizontal dimension, the main influencing consideration will be a desire to offer an ‘explanatory’ experience. As mentioned, some of the tactics for incorporating editorial salience will be of specific value if you are seeking to emphasise immediately apparent, curated insights.

Ideas and inspiration: In the process of sketching out your ideas and capturing thoughts about possible sources of influence, maybe there were already certain colours you had identified as being consistent with your thinking about this subject? Additionally, you might have already identified some colours you wish to avoid using.


Working With Data

Data examination: The characteristics of your data will naturally have a huge impact on the decisions you make around data legibility. Firstly, the type of data you are displaying (primarily nominal vs all other types) will require a different colour treatment, as explained. Secondly, the range of categorical colour associations (limits on discernible hues) and the range and distribution of quantitative values (number of divisions and definition of the intervals across your classification scale) will be directly shaped by the work you did in the examination stage.

In Figure 9.28 you can see a census of the prevalence and species of trees found around the boroughs of New York City. This initial big-picture view creates a beautiful tapestry made up of tree populations across the region (notice the big void where JFK Airport is located).

Figure 9.28 Excerpt from ‘NYC Street Trees by Species’

To observe patterns for individual tree types is harder: with 52 different tree species there are simply too many classifications to be able to allocate sufficiently unique colours to each. To overcome this, the project features a useful pop-up filter list which then allows you to adjust the data on view to reveal the species you wish to explore.

It is often the case when thinking about colour classifications that you may need to revisit the data transformation actions to find new ways of grouping your data to create better-fit quantitative value classifications or to look at ways of grouping your categories. For the latter, actions such as combining less important categories in an ‘other’ bin to reduce the variability or eliminating certain values from your analysis may be necessary.
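Combining less important categories into an ‘other’ bin takes only a few lines of Python, as sketched below. Keeping the n most frequent categories is just one possible rule (you might instead keep categories by editorial importance), and the function name and tree names are made-up sample data, not the New York census.

```python
from collections import Counter


def keep_top_n(categories, n, other_label="Other"):
    """Keep the n most frequent categories and relabel the rest as other_label.

    Ties at the cut-off are resolved by first appearance, as Counter.most_common does.
    """
    keep = {name for name, _ in Counter(categories).most_common(n)}
    return [c if c in keep else other_label for c in categories]


trees = ["plane", "plane", "plane", "oak", "oak", "linden", "ginkgo", "maple"]
grouped = keep_top_n(trees, 2)
print(Counter(grouped))
```

With five labels reduced to three, each remaining category can be given a clearly distinguishable hue.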

‘If using colour to identify certain data, be careful to not accidentally apply the same identity to a nearby part of the graphic. Don’t allow colour to confuse just for the sake of aesthetics. I also like to use colour to highlight. A single colour highlight on a palette of muted colours can be a strong way to draw attention to key information.’ Simon Scarr, Deputy Head of Graphics, Thomson Reuters

Establishing Your Editorial Thinking

Focus: When considering the perspective of ‘focus’ in the editorial thinking stage, you were defining which, if any, elements of content would merit being emphasised. Are there features of your analysis that you might wish to accentuate? How might colour be used to accentuate key insights in the foreground and push other (less important) features into the background? What are the characteristics of your data that you might want to emphasise through changes in colour? For example, are there certain threshold values that will need to be visually amplified if exceeded? Your decisions here will directly influence your thinking about using colour to facilitate editorial salience.

Data Representation

Chart type choice: Specifically in relation to data legibility, the chart type you selected to portray your data may have attributes requiring decisions about colour. The heat map and choropleth map are just two examples that use variation in colour to encode quantitative value. Almost every chart has the potential to use colour for categorical differentiation.

Trustworthy Design

Data classification: The decisions you make about how to encode data through colour have a great bearing on the legibility and accuracy of your design, especially with quantitative data. You will need to ensure the classifications present a true reflection of the shape and characteristics of your data and do not suppress any significant interpretations.

‘Start with black and white, and only introduce color when it has relevant meaning. In general, use color very sparingly.’ Nigel Holmes, Explanation Graphic Designer

Meaningful: Eliminating arbitrary decisions is not just about increasing the sophistication of your design thinking, it is also an essential part of delivering a trustworthy design. If something looks visually significant in its data or editorial colouring it will be read as such, so make sure it is significant; otherwise remove it. You especially want to avoid any connotation of significant meaning across your functional or decorative colour choices. This will be confusing at best, or will appear deceptive at worst.

Do not try to make something look more interesting than it fundamentally is. Colour should not be used to decorate data. You might temporarily boost the apparent appeal of your work in the eye of the viewer but this will be short-lived and artificial.

Illusions: The relationship between a foreground colour and a background one can create distorting illusions that modify the perceived judgement of a colour. You saw an effect of this earlier with the inverted area chart showing ‘Gun deaths in Florida’, whereby the rising white mountain was seen by some as the foreground data, when in fact it was the background emptiness framed by the red area of data and the axis line. Illusions can affect all dimensions of colour perception. There are simply too many to mention here and they are hard to legislate for entirely; the point is that you need to be aware of them as a consequence of your colour choices.

Accessible Design

Consistency: Consistency in the use of colours helps to avoid visual chaos and confusion and minimises cognitive effort. When you establish association through colour you need to maintain that meaning for as long as possible. Once a viewer has allocated time and effort to learn what colours represent, that association becomes locked down in the eye and the mind. However, if you then allocate the same colour(s) to mean something different (within the same graphic or on a different page/screen view) this creates an additional cognitive burden. The viewer almost has to disregard the previous association and learn the new one. This demands effort that undermines the accessibility of your design.
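One practical safeguard for consistency is to assign category colours once, from the full list of categories, and reuse that mapping in every view, so that filtering a chart never reshuffles which colour means what. A minimal Python sketch follows; the function name, palette values and category names are illustrative assumptions.

```python
# Illustrative categorical hues; any palette of clearly distinct colours would do.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]


def build_colour_map(all_categories, palette=PALETTE):
    """Fix one colour per category, based on the complete (sorted) category list."""
    ordered = sorted(set(all_categories))
    if len(ordered) > len(palette):
        raise ValueError("more categories than distinct colours; group some first")
    return dict(zip(ordered, palette))


colour_map = build_colour_map(["north", "south", "east", "west"])

# A filtered view looks colours up in the shared map instead of re-assigning them,
# so 'west' keeps the same colour whether or not 'east' is shown.
filtered_view = ["south", "west"]
print({region: colour_map[region] for region in filtered_view})
```

Calling `build_colour_map` on just the filtered subset would instead give ‘west’ a different colour, which is exactly the kind of reassignment that forces viewers to relearn the key.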

Sometimes this can prove difficult, especially if you have a restricted colour palette. The main advice here is to try to maximise the ‘space’ between occasions of the same colour meaning different things. This space may be physical (different pages, interactive views), time (the simple duration of reading between the associations being changed) or editorial (new subject matter, new angle of analysis). Such space effectively helps to clean the palate (pun intended). Of course, at the point of any new assignment in your colour usage, clear explanations are mandatory.

Visual accessibility: Approximately 5% of the population have visual impairments that compromise their ability to discern particular colours and colour combinations. The most common form is red–green colour blindness, of which deuteranopia is one type; it is a genetic condition that predominantly affects men. The traffic light scheme of green = ‘good’, red = ‘bad’ is a widespread approach for using colour as an indicator. It is a convenient and common metaphor and the reasons for its use are entirely understandable. However, as demonstrated in the pair of graphics in Figure 9.29, looking at some word-usage sentiment analysis, the reds and greens that most of us would easily discern (from the left graphic) are often not at all distinguishable for those with colour blindness (simulated on the right).
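If you want to check your own red–green pairings, you can approximate what a person with deuteranopia sees by applying a simulation matrix to linear RGB values. The Python sketch below uses the full-severity deuteranopia matrix as commonly reproduced from Machado, Oliveira and Fernandes (2009); treat the coefficients and their application to linearised sRGB as assumptions to verify against that paper, and the output as an approximation rather than a clinical simulation. The helper names are hypothetical.

```python
# Full-severity deuteranopia matrix as commonly reproduced from
# Machado, Oliveira & Fernandes (2009); applied to linear (not gamma-encoded) RGB.
DEUTERANOPIA = (
    (0.367322, 0.860646, -0.227968),
    (0.280085, 0.672501, 0.047413),
    (-0.011820, 0.042940, 0.968881),
)


def _to_linear(c8):
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _to_srgb8(c):
    c = min(1.0, max(0.0, c))  # clip results that fall outside the displayable range
    s = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
    return round(s * 255)


def simulate_deuteranopia(rgb):
    """Approximate how an (r, g, b) colour with 0-255 channels appears to a deuteranope."""
    lin = [_to_linear(c) for c in rgb]
    out = [sum(m * v for m, v in zip(row, lin)) for row in DEUTERANOPIA]
    return tuple(_to_srgb8(c) for c in out)


def distance(a, b):
    """Straight-line distance between two RGB triples (a crude proxy for difference)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


red, green = (255, 0, 0), (0, 255, 0)
before = distance(red, green)
after = distance(simulate_deuteranopia(red), simulate_deuteranopia(green))
print(round(before), round(after))  # the pair moves much closer together
```

Distance in RGB is only a rough stand-in for perceived difference, but it is enough to show the red and the green collapsing towards similar olive-yellow tones, while neutral greys pass through almost unchanged.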

Figure 9.29 Demonstrating the Impact of Red-green Colour Blindness (deuteranopia)


Of course, if you have a known, finite and fixed audience then you can easily discover whether any colour-blindness issues do in fact exist. However, if your audience is much larger and undefined you risk alienating potentially 1 person in every 14, in which case the use of the default red–green colour combination is not acceptable. Be more sensitive to your viewers by considering other options:

Figure 9.30 Colour-blind Friendly Alternatives to Green and Red

If you are working on an interactive solution, you may consider having a toggle option to switch between different colour modes. For print outputs you will normally have less flexibility, but in certain circumstances the option of creating dual versions (a second output for colour-impaired viewers) may be legitimate.

Connotations and congruence: Whether it is in politics, sport, brands or in nature, there are many subjects that already have established colour associations you can possibly look to exploit. This association may sit directly with the data, such as the normal colour associations for political party categories, or more through the meaning of the data, such as perhaps through the use of green to present analysis about ecological topics.

In support of accessible design, exploiting pre-existing colour associations in your work can create more immediacy in subject recognition. You might also benefit from the colour learning experiences your viewers may already have gone through. This provides a shortcut to understanding through familiarity.

However, while some colour connotations can help, others can hinder and are best avoided. You need to be considerate of and sensitive to any colour usage to ensure that you do not employ connotations that carry negative implications or evoke strong emotions and reactions from people.

Sometimes a colour is simply incongruent with a subject. You would not use bright, happy colours if you were portraying data about death or disease. Earlier, in the ‘Vision: Ideas’ section, I described a project context where I knew I wanted to avoid the use of blue colours in a particular project about psychotherapy treatment in the Arctic, because it would carry an unwelcome clichéd association given the subject matter. The use of ‘typical’ skin colours to represent ethnic groups in a visualisation is something that would be immediately clumsy (at best) and offensive (at worst).

Cultural sensitivities and inconsistencies are also important to consider.
In China, for example, red is a lucky colour and so the use of red in their stock market displays indicates rising values. A sea of red on the FTSE or Dow Jones implies the opposite. In Western society red is often the signal for a warning or danger.

Occasionally established colour associations are out of sync with contemporary culture or society. For example, when you think about colour and the matter of gender, because the pairing has been used so endlessly down the years, it is almost impossible not to think instinctively about the use of blue (boys) and pink (girls). My personal preference is to avoid this association entirely. I agree with so many commentators out there that the association of pink to signify the female gender, in particular, is clichéd, outdated and no longer fit for purpose. It is not too much to expect viewers to learn the association of, at most, two new colours for representing gender.

Elegant Design

Unity: As I alluded to in the discussion about using colours for editorial salience, colour choices are always about contrast. The effect of using one colour is not isolated to just that instance of colour: choosing one colour will automatically create a relationship with another. There is always a minimum of two colours in any visualisation, a foreground and a background colour, but generally there are many more.

We notice the impact of colour decisions more when they are done badly. Inconsistent and poorly integrated colour combinations create jarring and discordant results. If we do not consciously notice colour decisions this probably means they have been seamlessly blended into the fabric of the overall communication.

Neutral colouring: Even if there is no relevance in the use of colour for quantitative or categorical classifications, you still have to give your chart some colour, otherwise it will be invisible. The decision you make will again depend on the relative harmony with other colour features but should also avoid unnecessarily ‘using up’ a useful colour. Suppose you colour your bars in blue but elsewhere across your visualisation project blue would have been a useful colour to show something meaningful; you have then unnecessarily taken blue out of the reckoning. My default choice is to start with grey (Figure 9.31), which also suits any feature that needs to be left as a back- or mid-ground artefact to preserve prominence elsewhere in the display, and only to use a colour if there is a suitable and available colour not used elsewhere.

Figure 9.31 Excerpt from ‘Psychotherapy in The Arctic’


Justified: Achieving elegant design is about eliminating the arbitrary. In thinking about colour usage I often get quite tough with myself. If I want to show any feature on my visualisation display I have to seek permission from myself to unlock access to the more vibrant colours by justifying why I should be allowed to use and apply that colour (I know what you’re thinking, ‘what a fun existence this guy leads’). Elegance in visualisation design is often about using only the colours you need to use and avoiding the temptation to inject unnecessary decoration. The Wind Map project (Figure 9.32) demonstrates unquestionable elegance and yet uses only a monochromatic palette. There is no colouring of the sea, no topographic detail, no emphasising of any extreme wind speed thresholds being reached. The resulting elegance is quite evident: the map has artistic and functional beauty.

To emphasise again, I am not advocating a need to pursue minimalism: while you can create incredibly elegant and detailed works from a limited palette of colours, justifying the use of colours is not the same as unnecessarily restricting the use of colour.

Feels right: The last component of influence is yourself. Sometimes you will just find colours that feel right and look good when you apply them to your work. There may be no underlying science behind such choices, and as such you will simply need to back your own instinctive judgement as an astute visualiser and know when something looks good. Creating the right type of visual appeal, something that is pleasing to the eye and equally fit for purpose in all the functional ways I have outlined, is a hard balance to achieve, but you will find that weighing up all these different components of influence alongside your own flair for design judgement will give you the best chance of getting there.

Figure 9.32 Wind Map

Summary: Colour

Data legibility involves using colours to represent different types of data. The most appropriate colour association or scale decisions will depend on the data type: nominal (qualitative), ordinal (qualitative), interval and ratio (quantitative).

Editorial salience is about using colour to direct the eye. For which features and to what degree of emphasis do you want to create contrast?

Functional harmony concerns deciding about every other colour property as applied to all interactive features, annotations and aspects of your composition thinking.

Influencing Factors and Considerations

Formulating the brief: format, setting, colour rules and imposed guidelines all have a significant impact. Your definitions about both tone and experience, on the purpose map, will lead to specific choices being more suitable than others. What initial ideas did you form? Have any sources of inspiration already implanted ideas inside your head about which colours you could use?

Working with data: what type of data and what range of values/number of classifications have you got?

Establishing your editorial thinking: what things do you want to emphasise or direct the eye towards (focus)?

Data representation: certain chart type choices will already include colour as an encoded attribute.

Trustworthy design: ensure that your colour choices are faithful to the shape of your data and the integrity of your insights. If something looks meaningful it should be, otherwise it will confuse or deceive.

Accessible design: once you’ve committed colour to mean something, preserve the consistency of association for as long as possible. Be aware of the sensitivities around visual accessibility and positive/negative colour connotations.

Elegant design: the perception of colours is relative so the unity of your choices needs to be upheld. Ensure that you can justify every dot of colour used and, ultimately, rely on your own judgement to determine when your final palette feels right.

Tips and Tactics

Use the squint test: shrink things down and/or half close your eyes to see what coloured properties are most prominent and visible. Are these the right ones?

Experimentation: trial and error is still often required in colour, despite the common sense and foundation of science attached to it.

Developing a personal style guide for colour usage saves you the pain of having to think from scratch every time and will help your work become more immediately identifiable (which may or may not be an important factor).

Make life easier by ensuring your preferred (or imposed) colour palettes are loaded up into any tool you are using, even if it is just the tool you are using for analysis rather than for the final presentation of your work.

If you are creating for print, make sure you do test print runs of the draft work to see how your colours are looking. Do not wait for the first print when you (think you) have finished your process.


10 Composition

Composition concerns making careful decisions about the physical attributes of, and relationships between, every visual property to ensure the optimum readability and meaning of the overall, cohesive project.

Composition is the final layer of your design anatomy, but this should not imply that it is the least important part of your design workflow. Far from it. It is simply that now is the most logical time to think about it, because only at this point will you have established clarity about what content to include in your work. As I explained, this final layer of design thinking, along with colour, is no longer about what elements will be included but how they will appear. Composition is a critical component of any design discipline. The care and attention afforded in the precision of your composition thinking will continue until the final dot or pixel has been considered.

Visual assets such as your chart(s), interactive controls and annotations all occupy space. In this chapter you will be judging the best way to use space in terms of the position, size and shape of every visible property. In many respects these individual dimensions of thought are inseparable and so, similar to the discussion about annotation, the thinking is divided between project- and chart-level composition options:

Project composition: defining the layout and hierarchy of the entire visualisation project.

Chart composition: defining the shape, size and layout choices for all components within your charts.

10.1 Features of Composition: Project Composition

This first aspect of composition design concerns how you might lay out and size all the visual content in your project to establish a meaningful hierarchy and sequence. Content, in this case, means all of your charts, interactive operations and elements of annotation.


Where will you put all of this, what size will it be and why? How will the hierarchy (across views) and sequencing (within a view) best fit the space you have to work in? How will you convey the relative importance and provide a connected narrative where necessary?

I will shortly run through all the key factors that will influence your decisions, but it is worth emphasising that so much about composition thinking is rooted in common sense and involves a process of iteration towards what feels like an optimum layout. Of course, there are certain established conventions, such as positioning titles first or at the top (usually left or centrally aligned). Introductions are inevitably useful to offer early, whereas footnotes detailing data sources and credits might be of least importance, relatively speaking. You might choose to show the main features first, exploiting the initial attention afforded by your audience, or you may wish to build up to this, starting off with contextual content before the big ‘reveal’.

Figure 10.1 City of Anarchy


The hierarchy of content is not just a function of relative position through layout design; it can also be achieved through the relative variation in size of the contents. Just as variation in colour implies significance, so too does variation in size: a chart that is larger than another chart will imply that the analysis it is displaying carries greater importance.

The ‘City of Anarchy’ infographic demonstrates a clear visual hierarchy across its design. There is a primary focal point of the main subject ‘cutaway’ illustration in the centre with a small thumbnail image above it for orientation. At the bottom there are small supplementary illustrations to provide further information. Their placement at the bottom of the page and their more diminutive stature make it clear that they are of somewhat incidental import compared with the main detail in the centre.

There are generally two approaches for shaping your ideas about this project-level composition activity, depending on your entry-point perspective: wireframing and storyboarding. I profiled these at the start of this part of the book, but it is worth reinforcing their role now that you are focusing on this section of design thinking.

Wireframing involves sketching the potential layout and size of all the major contents of your design thinking across a single-page view. This might be the approach you take when working on an infographic or any digital project where all the interactive functions are contained within a single-screen view rather than navigating users elsewhere. Any interactive controls included would have a description within the wireframe sketch to explain the functions they would trigger. Figure 10.2 is an early wireframe drawn by Giorgia Lupi when shaping up her thoughts about the potential layout of a graphic exploring various characteristics of Nobel prizes and laureates between 1901 and 2012.

Figure 10.2 Wireframe Sketch


Storyboarding is something you would undertake alongside wireframing if you have a project that will entail multiple pages or many different views and you want to establish a high-level feel for the overall architecture of content, its navigation and sequencing. This approach is relevant for linear outputs like discrete sequences in reports, presentation slides or video graphics, or for non-linear navigation around different pages of a multi-faceted interactive. The individual page views included as cells in this big-picture hierarchy will each merit more detailed wireframes to determine how their within-page content will be sized and arranged, and how the navigation between views would operate.

With both wireframing and storyboarding activities, all you are working towards at this stage are low-fidelity sketched concepts. Whether this sketching is on paper or using a quick layout tool does not matter; it just needs to capture with moderate precision the essence of your early thinking about the spatial consequence of bringing all your design choices together. Gradually, through further iteration, the precision and finality of your solution will emerge.


10.2 Features of Composition: Chart Composition

After establishing your thoughts about the overall layout, you will now need to go deeper in your composition thinking and contemplate the detailed spatial matters local to each chart, to optimise its legibility and meaning. There are many different components to consider.

Chart size: Do not be afraid to shrink your charts. Even at quite small resolutions the eye can still detect, with great efficiency, chart attributes such as variation in size, position, colour, shape and pattern. This supports the potential value of the small-multiples technique, an approach that tends to be universally loved in data visualisation. As I explained earlier, this technique offers an ideal solution for when you are trying to display the same analysis for multiple categories or multiple points in time. Providing all the information in a simultaneous view means that viewers can efficiently observe overall patterns as well as perform a more detailed inspection. Figure 10.3 provides a single view of a rugby team’s match patterns across the first 12 matches of a season. Each line chart panel portrays the cumulative scoring for the competing teams across the 80 minutes of a match. The 12 match panels are arranged in chronological order, from top left to bottom right, based on the date of the match.
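The chronological panel arrangement described above (top left to bottom right, ordered by match date) amounts to a simple mapping from reading order to grid cell. The Python sketch below is illustrative only: it assumes a fixed number of columns (four here, a hypothetical choice, not taken from Figure 10.3), uses invented weekly dates, and `small_multiple_positions` is a hypothetical helper rather than part of any charting library.

```python
from datetime import date, timedelta


def small_multiple_positions(match_dates, n_cols):
    """Map each match date to a (row, col) grid cell.

    Dates are sorted chronologically and placed left to right, then top to
    bottom, so the earliest match lands top left and the latest bottom right.
    """
    if n_cols < 1:
        raise ValueError("n_cols must be at least 1")
    ordered = sorted(match_dates)
    # divmod(i, n_cols) gives (row, col) for the i-th panel in reading order
    return {d: divmod(i, n_cols) for i, d in enumerate(ordered)}


# Hypothetical season: 12 weekly matches, supplied in reverse order on purpose.
season = [date(2015, 9, 5) + timedelta(weeks=i) for i in range(12)]
layout = small_multiple_positions(reversed(season), n_cols=4)

for match_date, (row, col) in sorted(layout.items()):
    print(match_date, "-> row", row, "col", col)
```

Changing `n_cols` reshapes the grid (three columns, for instance, gives four rows) without altering the chronological reading order.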

Figure 10.3 Example of the Small Multiples Technique


The main obstacle to shrinking chart displays is the impact on text. The eye will not cope well with small fonts for value or category labels, so there has to be a trade-off, as always, between the amount of detail you show and the size at which you show it.

Chart scales: When considering your chart scales, try to think about how you might use them to tell the viewer something meaningful. This can be achieved through astute choices around the maximum value ranges and also in the choice of suitable intervals for labelling and gridline guides.

The maximum values that you assign to your chart scales, informed by decisions around editorial framing, can be quite impactful in surfacing key insights. You may recall the chart from earlier that looked at the disproportionality of women CEOs amongst the S&P 1500 companies. Figure 10.4 is another graphic on a similar subject, which contextualises the relative progress in the rise of women CEOs amongst the Fortune 500 companies. By setting the maximum y-axis value range to reflect the level at which equality would exist, the resulting empty space emphasises the significant gap that still persists.
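Interval choices for labels and gridlines can be checked with simple arithmetic before any chart is drawn. The Python sketch below uses a hypothetical helper, `axis_ticks` (not taken from any charting library), that lists gridline positions from zero to a chosen maximum and rejects steps that do not divide the maximum evenly. It uses a 1440-minute day, the scale discussed in the Figure 10.5 example, to contrast 160-minute and 60-minute intervals.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def axis_ticks(maximum, step):
    """Return gridline positions from 0 to maximum inclusive at a fixed integer step."""
    if step <= 0:
        raise ValueError("step must be positive")
    if maximum % step != 0:
        # Otherwise the final gridline would not land on the maximum.
        raise ValueError(f"step {step} does not divide the maximum {maximum}")
    return list(range(0, maximum + 1, step))


arbitrary = axis_ticks(MINUTES_PER_DAY, 160)  # 9 intervals with no everyday meaning
hourly = axis_ticks(MINUTES_PER_DAY, 60)      # 24 intervals, one per hour

print(len(arbitrary) - 1, "intervals:", arbitrary)
print(len(hourly) - 1, "intervals, starting", hourly[:4])
```

Both steps divide 1440 exactly, so arithmetic alone cannot choose between them; the case for 60 comes from how viewers think about a day, which is the editorial judgement the text describes.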

Figure 10.4 Reworking of ‘The Glass Ceiling Persists’


Figure 10.5 Fast-food Purchasers Report More Demands on Their Time


Figure 10.5 shows how a lack of careful thought about your scales can undermine the ease of readability. This chart shows how American adults spend their time on different activities. The analysis is broken down into minutes and so the maximum is set at 1440 minutes in a day. For some reason, the y-axis labels and the associated horizontal gridlines are displayed at intervals of 160 minutes. This is an entirely meaningless quantity of time, so why divide the day up into nine intervals? To help viewers perceive the significance and size of the different stacked activities it would have been far more logical to use 60-minute time intervals, as that is how we tend to think when dividing our daily schedule.

Chart orientation: Decisions about the orientation of your chart and its contents can sometimes help squeeze out an extra degree of readability and meaning from your display.

Figure 10.6 Illustrating the Effect of Chart Orientation Decisions


The primary concern about chart orientation is the legibility of labels along the axis. A vertical bar chart, with multiple categories along the x-axis, will present the challenge of making the labels legible and avoiding them overlapping. Ideally you would want to preserve label reading in line with the eye, but you might need to adjust their orientation to either 45° or 90°. My preference for handling this with bar charts is to switch the orientation of the chart and then have much more compatible horizontal space to accommodate the labels.

The meaning of your subject’s data may also influence your choice. While there may have been constraints on the dimension of space in its native setting, Figure 10.6, portraying the split of political parties in Germany, feels like a missed opportunity to display a political axis of the Left and the Right by using a landscape rather than portrait layout.

As you saw earlier, the graphic about ‘Iraq’s bloody toll’ (Figure 1.11) uses an inverted bar chart to create a potent display of data that effectively conveys the subject matter, but importantly does so without introducing any unnecessary obstacles to readability.

In the previous section I presented a wireframe sketch of a graphic about Nobel prize winners. Figure 10.7 shows the final design. Notice how the original concept of the novel diagonal orientation was accomplished in the final composition, exploiting the greater room that this dimension of space offers within the page. It feels quite audacious to do this in a newspaper setting.

Figure 10.7 Nobels no Degrees


Figure 10.8 Kasich Could Be The GOP’s Moderate Backstop

Figure 10.8, from FiveThirtyEight, rotates the scatter plot by 45° and then overlays a 2 × 2 grid, which helps to guide the viewer’s interpretation by making it easier to observe which values are located in each quadrant. It is also used to emphasise the distinction between location in the top and bottom halves of the chart along the axis of popularity, essentially the primary focus of the analysis.

Although the LATCH and CHRTS acronyms share some similarities, the application of each concerns entirely different aspects of your design thinking. They are independent of one another. A bar chart, which belongs to the categorical (C) family of charts, could have its data potentially sorted by location, alphabet, time, category or hierarchy.


Chart value sorting: Sorting content within a chart is important for helping viewers to find and compare the most relevant content quickly. One of the best ways to consider the options for value sorting comes from using the LATCH acronym, devised by Richard Saul Wurman, which stands for the five ways of organising displays of data: Location, Alphabet, Time, Category or Hierarchy.

Location sorting involves sequencing content according to the order of a spatial dimension. This does not refer to sorting data on a map, where locations are fixed; rather, it could be sorting data by geographical spatial relationships (such as presenting data for all the stops along a subway route) or by a non-geographical spatial relationship (like a sequence based on the position of major parts of the body from head to toe). You should order by location only when you believe it offers the most logical sequence for the readability of the display or if there is likely to be interest or significance in the comparison of neighbouring values. An example of location sorting is displayed in ‘On Broadway’ (Figure 10.9), an interactive installation that stitches together a sequenced compilation of data and media related to 30 metre intervals of life along the 13 miles (21 km) of Broadway that stretches across the length of Manhattan. This continuous narrative offers compelling views of the fluctuating characteristics as you transport yourself down the spine of the city.

Figure 10.9 On Broadway

Alphabetical sorting is a cataloguing approach that facilitates efficient lookup and reference. Only on rare occasions, when you are especially keen to offer convenient ordering for looking up categorical values, will you find that alphabetical sorting alone offers the best sequence. In Figure 10.10, investigating different measures of waiting times in emergency rooms across the United States, the bar charts are presented based on the alphabetical sorting of each state. This is the default setting, but users can also choose to reorder the table hierarchically based on the increasing/decreasing values across the four columns.

Data representation techniques that display overlapping connections, like Sankey diagrams, slope graphs and chord diagrams, also introduce the need to contemplate value sorting in the z-dimension: that is, which of these connections will be above and which will be below, and why.

Alphabetical sorting might be seen as a suitably diplomatic option should you not wish to imply the ranking significance that sorting by any other dimension would convey. Additionally, there is a lot of sense in employing alphabetical ordering for values listed in dropdown menus, as this offers the most immediate way for viewers to find the options they are interested in selecting.

Figure 10.10 ER Wait Watcher: Which Emergency Room Will See You the Fastest?


Time-based sorting is used when the data has a relevant chronological sequence and you wish to display and compare how changes have progressed over time. In Figure 10.11, you can see a snapshot of a graphic that portrays the rain patterns in Hong Kong since 1990. Each row of data represents a full year of 365/366 daily readings running from left to right. The subject matter and likely interest in the seasonality of patterns make chronological ordering a common-sense choice.

Figure 10.11 Rain Patterns

Categorical sorting can be usefully applied to a sequence of categories that have a logical hierarchy implied by their values or unique to the subject matter. For example, if you were presenting analysis about football players you might organise a chart based on the general order of their typical positions in a team (goalkeeper > defenders > midfielders > forwards) or use seniority levels as a way to present analysis about staff numbers. Alternatively, if you have ordinal data you can logically sort the values according to their inherent hierarchy. In Figure 10.12, which you saw earlier in the profile of ordinal colours, the columns are sequenced left to right in order from ‘major deterioration’ to ‘major improvement’, to help reveal the balance of treatment outcomes from a sample of psychotherapy clients.

Figure 10.12 Excerpt from ‘Psychotherapy in The Arctic’


Hierarchical sorting organises data by increasing or decreasing quantities so that a viewer can efficiently perceive the size, distribution and underlying ranking of values. In Figure 10.13, showing the highest typical salaries for women in the US, based on analysis of data from the US Bureau of Labor Statistics, the values are sorted in descending order to reveal the highest-ranking values.
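Three of the LATCH orderings described above reduce to choosing a sort key. The Python sketch below is illustrative only: the labels and numbers are invented placeholders (not data from any figure), and the function names are hypothetical. Location and time sorting would follow the same pattern, with a route position or a timestamp as the key.

```python
# Invented values, used only to contrast the orderings.
records = [("Delta", 42), ("Alpha", 15), ("Charlie", 87), ("Bravo", 63)]

# Hypothetical squad counts keyed by playing position.
squad = [("forward", 3), ("goalkeeper", 2), ("midfielder", 5), ("defender", 6)]
POSITION_ORDER = ["goalkeeper", "defender", "midfielder", "forward"]


def sort_alphabetical(rows):
    """A in LATCH: order by label (case-insensitive) for quick lookup."""
    return sorted(rows, key=lambda row: row[0].casefold())


def sort_categorical(rows, order):
    """C in LATCH: order by a sequence that is meaningful to the subject."""
    rank = {label: i for i, label in enumerate(order)}
    return sorted(rows, key=lambda row: rank[row[0]])


def sort_hierarchical(rows, descending=True):
    """H in LATCH: order by quantity to reveal size and ranking."""
    return sorted(rows, key=lambda row: row[1], reverse=descending)


print([label for label, _ in sort_alphabetical(records)])
print([label for label, _ in sort_hierarchical(records)])
print([label for label, _ in sort_categorical(squad, POSITION_ORDER)])
```

The same records produce different sequences depending only on the key, which is why the choice of sort is an editorial decision rather than a technical one.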

Figure 10.13 Excerpt from ‘Gender Pay Gap US’

In Figure 10.12 the bubbles in each column do not need to be coloured, as their position already provides a visual association with the ‘deterioration’ through to ‘improvement’ ordinal categories. The attribute of colour, specifically, can therefore be considered redundant encoding. However, you might still choose to include this redundancy if you believed it aided the immediacy of association and distinction. In this case, the chart was part of a larger graphic that employed the same colour associations across several different charts and therefore it made sense to preserve this association.

10.3 Influencing Factors and Considerations

You are now familiar with the array of different aspects of composition thinking. At this point you will need to weigh up your decisions on how you might employ these in your own work. Here are some of the specific factors to bear in mind.

Formulating Your Brief

Format: Naturally, as composition is about spatial arrangement, the nature and dimensions of the canvas you have to work with will have a fundamental bearing on the decisions you make. There are two concerns here: what will be the shape and size of the primary format, and how transferable will your solution be across the different platforms on which it might be used or consumed?

Another factor surrounding format concerns the mobility of viewing the work. If the form of your output enables viewers to easily move a display, or move around a display, in a circular plane (such as looking at a printout or work on a tablet), issues such as label orientation can be largely cast aside. If your output is going to be consumed in a relatively fixed setting (desktop/laptop or via a presentation) the flexibility of viewing positions will be restricted.

Working With Data

Data examination: Not surprisingly, the shape and size of your data will directly influence your chart composition decisions. When discussing physical properties in Chapter 4, I described the influence of quantitative values with legitimate outliers distorting ideal scale choices. One solution for dealing with this is to use a non-linear logarithmic (often just known as a ‘log’) scale. Essentially, on the common base-10 log scale, each major interval increases the value at that marked position by a factor of 10 (or by one order of magnitude) rather than by equal increments. In Figure 10.14, looking at ratings for thousands of different board games, the x-axis is presented on a log scale in order to accommodate the wide range of values for the ‘Number of ratings’ measure and to help fit the analysis into a square chart layout. Had the x-axis remained a linear scale, preserving a square layout would have meant squashing values below 1000 into such a tightly packed space that you would hardly see the patterns. Alternatively, a wide rectangular chart would have been necessary but impractical given the limitations of the space this chart would occupy.

I have great sympathy for the challenges faced by designers like Zimbabwe-based Graham van de Ruit, who in 2014 was typesetting a book titled Millions, Billions, Trillions: Letters from Zimbabwe, 2005−2009. The book was all text, apart from one or two tables. One of the tables of data supplied to Graham showed Zimbabwe’s historical monthly inflation rates, which, as you can see (Figure 10.15), included some incredibly diverse values.

I love the subtle audacity of Graham’s solution. Even though it is presented in tabular form, there is a strong visual impact created by allowing the sheer spatial consequence of the exceptional mid-2008 numbers to cause the awkward widening of the final column. I think this makes the point much more effectively than a chart might, in this case.

Figure 10.14 The Worst Board Games Ever Invented


Figure 10.15 From Millions, Billions, Trillions: Letters from Zimbabwe,2005−2009


‘I thought that a graph might be more effective, but I quickly realised that the scale would be a big challenge… The whole point of graphing would have been to show the huge leap in 2008, something that I felt the log scale would detract from and was impractical with the space constraints. I also felt that a log scale might not be intuitive to the target audience.’ Graham van de Ruit, Editorial and Information Designer

Establishing Your Editorial Thinking

Angles: The greater the number of different angles of analysis you wish to cover in your work, the greater the challenge will be to seamlessly accommodate the resulting chart displays in one view. The more content you include, the greater the need to contemplate reductions in the size of charts or a non-simultaneous arrangement, perhaps through multi-page sequences with interactive navigation.

In defining your editorial perspectives, you will likely have established some sense of hierarchy that might inform which angles should be more prominent (regarding layout position and size) and which less so. There might also be some inherent narrative binding each slice of analysis that lends itself to being presented in a deliberate sequence.

Data Representation

Chart type choice: Different charts have different spatial consequences. A treemap generally occupies far more space than a pie chart simply because there are many more ‘parts’ being shown. A polar chart is circular in shape, whereas a waffle chart is square. With each chart you include you will have a uniquely shaped piece that will form part of the overall jigsaw puzzle. Inevitably there will be some shuffling of content to find the right size and placement balance.

The table in Figure 10.16 summarises the main chart structures and the typical shapes they occupy. This list is based only on the charts included in the Chapter 6 gallery but still offers a reasonable compilation of the main structures. They are ordered by descending frequency, according to how often each structure appears among the charts in the gallery.

Figure 10.16 List of chart structures


Trustworthy Design

Chart-scale optimisation: Decisions about chart scales concern the maximum, minimum and interval choices that ensure integrity in the representation as well as optimise readability.

Firstly, let’s look at decisions around the minimum value used on the quantitative value axis, known as the origin, and the reasons why it is not OK for you to truncate the axis in methods like the bar chart. Any data representation where the attribute of size is used to encode a quantitative value needs to show the full, true size, nothing more and nothing less. The origin needs to be zero. When you truncate a bar chart’s quantitative value axis you distort the perceived length or height of the bars. Visualisers are often tempted to crop axis scales when values are large and the differences between categories are small. However, as you can see in Figure 10.17, the consequence is an impression of highly noticeable relative differences between values that the absolute values do not support.
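To make this distortion concrete, here is a minimal Python sketch comparing the ratio of two bar lengths when the axis starts at zero and when it is cropped. The function name `apparent_ratio` and the values 1000, 980 and 950 are hypothetical, chosen only for illustration; the sketch simply assumes each bar is drawn from the axis origin up to its value.

```python
def apparent_ratio(a: float, b: float, origin: float = 0.0) -> float:
    """Return the ratio of the drawn lengths of two bars for values a and b.

    Each bar is drawn from the axis origin up to its value, so its visible
    length is (value - origin). With origin = 0 this equals the true ratio a / b.
    """
    if a <= origin or b <= origin:
        raise ValueError("both values must exceed the axis origin")
    return (a - origin) / (b - origin)


# Illustrative (made-up) values: the larger category is only about 2% bigger.
larger, smaller = 1000.0, 980.0

true_ratio = apparent_ratio(larger, smaller)                     # axis starts at zero
truncated_ratio = apparent_ratio(larger, smaller, origin=950.0)  # axis cropped at 950

print(f"zero origin:   the larger bar looks {true_ratio:.2f}x as long")
print(f"origin at 950: the larger bar looks {truncated_ratio:.2f}x as long")
```

With the zero origin the larger bar is about 1.02 times as long as the smaller one; cropping the axis at 950 draws bars of 50 and 30 units, so the same 2% difference is shown as a bar about 67% longer.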


Figure 10.17 Illustrating the Effect of Truncated Bar Axis Scales

The single instance in which it is remotely reasonable to truncate an axis would be if you had a main graphic that effectively offered a thumbnail view of the whole chart for orientation, positioned alongside a separate associated chart (similar to that on the right). This separate chart might have a truncated axis that would provide a magnified view of the main chart, showing just the tips of the bars, to help viewers see the differences close up.

In contrast to the bar chart, a line chart does not always need a zero origin for the value axis (normally the y-axis). A line chart’s encoding involves a series of connected lines (marks) joining up continuous values based on their absolute position along a scale (attribute). It therefore does not encode quantitative values through size, as the bar chart does, so truncating the value axis will not unduly affect the perception of relative values against the scale or of the general trajectory. For some data contexts the notion of a zero quantity might be impossible to achieve. In Figure 10.18, showing 100m sprint record times, no human is ever going to be able to run 100m in anywhere near zero seconds. Times have improved, of course, but there is a physical limit to what can be achieved. Showing this analysis with the y-axis starting from zero would be unnecessary, and even more so if you plotted similar analysis for longer-distance races.

However, if you were to plot the 100m results and the 400m results on the same chart, you would need to start from zero to enable orientation of the scale of comparable values. This sense of comparable scale is missing from the chart in Figure 10.19, where including the full quantitative value range down to zero would be necessary to perceive the relative scale of attitudes towards same-sex marriage. The chart’s y-axis appears to start from an origin of 20, but as we are looking at part-to-whole analysis, the y-axis should really be displayed from an origin of zero. The maximum doesn’t need to go up to 100%, as the highest observed value is fine in this case, but it could be interesting to set the maximum to 100% in order to create a similar sense of the gap to be bridged before 100% of respondents are in agreement.
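The origin rules above can be summarised as a small decision helper. This is a minimal Python sketch. The function name `value_axis_limits`, the `include_zero` flag and the 5% padding are hypothetical choices for illustration, not a prescribed method. You would set `include_zero=True` for bar charts, and also for line charts where comparable scale matters, such as plotting 100m and 400m results together.

```python
def value_axis_limits(values, include_zero: bool, pad_fraction: float = 0.05):
    """Return (minimum, maximum) for a quantitative value axis.

    include_zero=True: size-encoded marks (bars, areas), or any chart where
    viewers must judge magnitudes against zero, keep the origin at zero.
    include_zero=False: position-encoded marks (lines, dots) may fit the
    observed data range, padded slightly so marks do not touch the frame.
    """
    lo, hi = min(values), max(values)
    if include_zero:
        # Extend the range so that zero is always on the axis.
        return (min(0.0, lo), max(0.0, hi))
    pad = (hi - lo) * pad_fraction
    return (lo - pad, hi + pad)


# Illustrative sprint-style times in seconds (made up, not real records).
times = [10.8, 10.4, 10.1, 9.8]

print(value_axis_limits(times, include_zero=False))  # axis fitted to the data
print(value_axis_limits(times, include_zero=True))   # axis anchored at zero
```

The fitted range runs from roughly 9.75 to 10.85 seconds, which keeps the trajectory legible, while the zero-anchored range runs from 0 to 10.8 and would be needed once a second, much slower event shares the same axis.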

Figure 10.18 Excerpt from ‘Doping under the Microscope’

Figure 10.19 Record-high 60% of Americans Support Same-sex Marriage

Aspect ratios: The aspect ratio of a line chart, as derived from the height and width dimensions of the chart area, can have a large impact on the perceived trends presented. If the chart is too narrow, the steepness of connections will be exaggerated and look more significant; if the chart is stretched out too wide, the steepness of slopes will be much more dampened and key trends may be somewhat disguised. There is no absolutely right or wrong approach here, but clearly there is a need for sensitivity to avoid the possibility of unintended deception. A general rule of thumb is to seek a chart area that enables the average slope to be presented at 45°, though this is not something that can be easily and practically applied, especially as there are many other variables at play, such as the range of quantitative and time values and the scales being used. My advice is just to make a pragmatic judgement by eye to find the ratio that you think is faithful to the significance of the trends in your data.

Mapping projections: One of the most contentious matters in the visual representation of data relates to thematic mapping and specifically to the choice of map projection used. The Earth is not flat (hopefully no contention there, otherwise this discussion is rather academic), yet the dominant form through which maps are presented portrays the Earth as being just that. Features such as size, shape and distance can be measured accurately on the Earth’s surface, but when that surface is projected onto a flat plane a compromise has to occur. Only some of these qualities can be preserved and represented accurately.

I qualify this with ‘dominant’ because, increasingly, advances in technology (such as WebGL) mean we can now interact with spherical portrayals of the Earth within a 2D space.

There are lots of exceptionally complicated calculations attached to the variety of spatial projections. The main things you need to know about map projections are that:

every type of map projection has some sort of distortion;
the larger the area of the Earth portrayed as a flat map, the greater the distortion;
there is no single right answer – it is often about choosing the least-worst case.

Thematic mapping (as opposed to mapping spatially for navigation or reference purposes) is generally best portrayed using map projections based on ‘equal-area’ calculations (so the sacrifice is more on the shape, not the size). This ensures that the phenomena per unit – the values you are typically plotting – are correctly represented by proportion of regional area. For choosing the best specific projection, in the absence of a perfect option, damage limitation is often the key: that is, which choice will distort the spatial truth the least given the level of mapping required. There are so many variables at play, however, based on the scope of view (world, continent, or country/sub-region), the distance of your region of focus from the equator and whether you are focusing on land, sea or sky (atmosphere), to name but a few. As with many other topics in this field, a discussion about map projections requires a dedicated text, but let me at least offer a brief outline of five different projections to begin your acquaintance:

Many tools that offer rudimentary mapping options tend to come with only a default (non-adjustable) projection, often the Mercator (or Web Mercator). The more advanced geospatial analysis tools offer pre-loaded or add-in options to broaden and customise the range of projections. Hopefully, in time, an increasing range of the more pragmatic desktop tools will enhance projection customisations.
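Because many tools default to Mercator, it helps to see how far it departs from the equal-area ideal for thematic maps. The following minimal Python sketch assumes a perfect unit sphere and the normal (equatorial) aspect of each projection; the function names are hypothetical. For a cylindrical projection that maps longitude directly to x, the local area distortion is the north-south stretch divided by the cosine of the latitude. For spherical Mercator this works out to 1/cos²(latitude), so regions at 60° latitude are inflated fourfold relative to regions on the equator, whereas the Lambert cylindrical equal-area projection keeps the factor at 1 everywhere by distorting shapes instead.

```python
import math


def mercator_y(lat: float) -> float:
    """Northing of the spherical Mercator projection on a unit sphere (lat in radians)."""
    return math.log(math.tan(math.pi / 4 + lat / 2))


def lambert_equal_area_y(lat: float) -> float:
    """Northing of the Lambert cylindrical equal-area projection on a unit sphere."""
    return math.sin(lat)


def area_scale(project_y, lat_deg: float, h: float = 1e-6) -> float:
    """Local area distortion of a normal-aspect cylindrical projection with x = longitude.

    East-west stretch: a parallel is cos(lat) times the equator's length on the
    globe but is drawn at full width, so it is stretched by 1 / cos(lat).
    North-south stretch: dy/dlat, estimated here with a central difference.
    A value of 1.0 means areas are preserved at that latitude.
    """
    lat = math.radians(lat_deg)
    dy_dlat = (project_y(lat + h) - project_y(lat - h)) / (2 * h)
    return dy_dlat / math.cos(lat)


for lat_deg in (0, 30, 45, 60, 75):
    merc = area_scale(mercator_y, lat_deg)
    equal = area_scale(lambert_equal_area_y, lat_deg)
    print(f"{lat_deg:>2} deg: Mercator area x{merc:6.2f}, equal-area x{equal:5.2f}")
```

Running it prints one row per latitude: the Mercator factor climbs steeply towards the poles while the equal-area factor stays at 1.00, which is why a default web map can make high-latitude regions look far larger than equatorial regions of similar size.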

Figure 10.20 A Selection of Commonly Deployed Mapping Projections


Accessible Design

Good design is unobtrusive: One of the main obstructions to facilitating understanding through a visualisation design is when viewers are required to rely on their memory to perform comparisons between non-simultaneous views. When the composition layout requires viewers to flick between pages or interactively generated views, they have to try to store one view in their mind and then mentally compare it against the live view that has arrived on the screen. This is too hard and too likely to fail, given the relatively weak performance of the brain’s working memory. Direct comparison between related content should be enabled through proximity and alignment. I mentioned in the section on animation that if you want to compare different states over time, rather than see the connected system of change, you will need to have access to the ‘moment’ views simultaneously and without a reliance on memory.

‘Using our eyes to switch between different views that are visible simultaneously has much lower cognitive load than consulting our memory to compare a current view with what was seen before.’ Tamara Munzner, from Visualization Analysis and Design

Elegant Design

‘I’m obsessed with alignments. Sloppy label placement on final files causes my confidence in the designer to flag. What other details haven’t been given full attention? Has the data been handled sloppily as well? … On the flip side, clean, layered and logically built final files are a thing of beauty and my confidence in the designer, and their attention to detail, soars.’ Jen Christiansen, Graphics Editor at Scientific American

Unity: As I discussed with colour, composition decisions are always relative: an object’s place and the space it occupies within a display immediately create a relationship with everything else in the display. Unity in composition provides a similar sense of harmony and balance between all objects on show as was sought with colour. The flow of content should feel logical and meaningful.

The enduring idea that elegance in design is most appreciated when it is absent is just as relevant with composition. Look around and open your eyes to composition that works and does not work, and recognise the solutions that felt effortless as you read them and those that felt punctured and confusing. This is again quite an elusive concept and one that only comes with a mixture of common-sense judgement, experience and exposure to inspiration from elsewhere.

Thoroughness: Precision positioning is the demonstration of thoroughness and care that is so important in the pursuit of elegance. You should aim to achieve pixel-perfect accuracy in the position and size of every single property. Think of the importance of absolute positioning in the context of detailed architectural plans that outline the position of every fine detail down to power sockets, door handles and the arc of a window’s opening manoeuvre. A data visualiser has to commit to ultimate precision and consistency because any shortcomings will be immediately noticeable and will fundamentally impact on the function of the work. If you do not feel a warm glow from every emphatic snap-to-grid resize operation or upon seeing the results of a mass alignment of page objects, you are not doing it right. (Honestly, I am loads of fun to be around.)

Summary: Composition

Project composition defines the layout and hierarchy of the entire visualisation project and may include the following features:

Visual hierarchy – layout: how to arrange the position of elements?
Visual hierarchy – size: how to manage the hierarchy of element sizes?
Absolute positioning: where specifically should certain elements be placed?

Chart composition defines the shape, size and layout choices for all components within your charts and may include the following features:

Chart size: don’t be afraid to shrink charts, so long as any labels are still readable, and especially embrace the power of small multiples.
Chart scales: what is the most meaningful range of values given the nature of the data?
Chart orientation: which way is best?
Chart value sorting: consider the most meaningful sorting arrangement for your data and editorial focus, based on the LATCH acronym.

Influencing Factors and Considerations

Formulating the brief: what space have you got to work within?


Working with data: what is the shape and size of your data and how might this affect your chart design architecture?
Establishing your editorial thinking: how many different angles (charts) might you need to include? Is there any specific focus for these angles that might influence a sequence or hierarchy between them?
Data representation: every chart has a spatial consequence – different charts have different structures that will create different dimensions that will need to be accommodated.
Trustworthy design: the integrity and meaning of your chart scales, chart dimensions and (for mapping) your projection choices are paramount.
Accessible design: remember that good design is unobtrusive – if you want to facilitate comparisons between different chart displays, these ideally need to be presented within a simultaneous view.
Elegant design: unity of arrangement is another of the fingertip-sense judgements, but it will be achieved through careful thinking about the relationships between all components of your work.

Tips and Tactics

You will find that as you reach the latter stages of your design process, the task of nudging things by fractions of a pixel and realigning features will dominate your attention. As energy and attention start to diminish you will need to maintain a commitment to thoroughness and a pride in precision right through to the end!
Empty space is like punctuation in visual language: use it to break up content when it needs a momentary pause, just as a comma or full stop is needed in a sentence. Do not be afraid to use empty space more extensively across larger regions as a device to create impact. Like the notes not played in jazz, effective visualisation design can also be about the relationship between something and nothing.


Part D Developing Your Capabilities


11 Visualisation Literacy

This final chapter explores some of the important ingredients and tactics that will help you continue to develop and refine your data visualisation literacy. By definition, literacy is the ability to read and write. Applied to data visualisation, this means possessing the literacy both to create visualisations (write) and to consume them (read).

Data visualisation literacy is increasingly an essential capability regardless of the domain in which we work and the nature of our technical skills. Just as computer literacy is now a capability that is expected of everyone, one can imagine a time in the not-too-distant future when having data visualisation capabilities will be viewed as a similarly ‘assumed’ attribute across many different roles.

In exploring the components of visualisation literacy across this chapter we will look at two sides of the same coin: first, the tactics and considerations required to be an effective and efficient viewer of data visualisation and, second, the competencies that make up the all-round talents of a visualiser.

11.1 Viewing: Learning to See

Learning how to understand a data visualisation, as a viewer, is not a topic that had been much discussed in the field until recently. For many, the idea that there are possible tactics and efficient ways to approach this activity is unlikely to have crossed their mind. We just look at charts and read them, don’t we? What else is there to consider?

Many of the ideas for this section emerged from the Seeing Data visualisation literacy research project (seeingdata.org) on which I collaborated.

The fact is we are all viewers. Even if you never create a visualisation again you will always be a viewer, and you will be widely exposed to different visual forms of data and information across your daily life. You cannot escape them. Therefore, it seems logical that visualisation literacy as a consumer is a competency worth developing.


Let’s put this into some sort of context. As children we develop the ability to read numbers and words. These are only understandable because we are taught how to recognise the association between numeric digits and the numbers they represent, and the connection between alphabetical characters, letters and words. From there we begin to understand sentences and eventually, as we build up a broader vocabulary, we acquire the literacy of language. This is all a big effort. We are not born knowing a language but we are born with the capacity to learn one.

Beyond written language, something as simple and singular as, for example, the Wi-Fi symbol is now a universally recognised form of visual language, but one that only exists in contemporary culture. For millions of people today, this symbol is a signal of relief and tangible celebration – ‘Thank God, Wi-Fi is available here!’ This symbol would have meant nothing to people in the 1990s: it is a symbol of its time and we have learnt to recognise its use and understand its meaning.

Across all aspects of our lives, there are things that once seemed complicated and inaccessible but are now embedded within us as automatic competencies: driving a car, using a keyboard, cooking a meal. I often think back to growing up as a kid in the 1980s and my first (functioning) computer, the mighty Commodore 64 (C64). One of the most famous games in the UK from this period was Daley Thompson’s Decathlon. Of particular nostalgic fame was the brutally simple operation of maniacally waggling the single joystick arm left and right to control the running events (if memory serves me correctly, the single button came into use when there were hurdles to jump over).

Compare the universally and immediately understandable control configuration of that game with the frankly ludicrous number of options and combinations that exist on modern football games, such as the FIFA series on contemporary consoles like the Xbox or PS4. The control combinations required to master the array of attacking moves alone require an entire page of instruction and remarkable levels of finger dexterity. Yet young kids today are almost immediate masters of this game. I should know – I have been beaten by some awfully young opponents. It hurts. But they have simply utilised their capacity to learn through reading and repeated practice.

As discussed in Chapter 1 when looking at the principle of ‘accessible’ design, many data visualisations will be intended – and designed – for relatively quick consumption. These might be simple to understand and offer immediately clear messages for viewers to easily comprehend. They are the equivalent of the C64 joystick controls. However, there will be occasions when you as a viewer are required to invest a bit more time and effort to work through a visualisation that might be based on subject matter or analysis of a more complex nature, perhaps involving many angles of analysis or numerous rich features of interactivity. This is the equivalent prospect of mastering the Xbox controls. Without having the confidence or capability to extract as much understanding from the viewing experience as possible, and to do so as efficiently as possible, you are potentially missing out.

‘Though I consider myself a savvy consumer of bar charts, line graphs, and other traditional styles of data display, I’m totally at sea when trying to grasp what’s going on in, say, arc diagrams, circular hierarchy graphs, hyperbolic tree charts, or any of the seemingly outlandish visualisations … I haven’t thought much about this flip side, except that I do find I now view other people’s visualisations with a more critical eye.’ Marcia Gray, Graphic Designer

As viewers, we therefore need to acknowledge that there might be a need to learn and a reward from learning. We should not expect every type of visualisation to signpost every pearl of insight that is relevant to us. We might have to work for it. And we have to work for it because we are not born with the ability or the right to understand everything that is presented to us. Few of us will have ever been taught how to go about effectively consuming charts and graphics. We might be given some guidance on how to read charts and histograms, maybe even a scatter plot, if we study maths or the sciences at school. Otherwise, we get by.

But ‘getting by’ is not really good enough, is it? Even if, through exposure and repetition, we hope gradually to become more familiar with the most common approaches to visualising data, this does not sufficiently equip us with the breadth and range of literacy that will be required.

I mentioned earlier the concept, proposed by Daniel Kahneman, of System 1 and System 2 thinking. The distinctions between these modes of thought manifest themselves again here. Remember how System 1 was intuitive and rapid whereas System 2 was slow, deliberate and almost consciously undertaken? For example, you are acutely aware of thinking when trying to run a mathematical calculation through your mind. That is System 2 at work. Partly due to the almost hyperactive and instinctive characteristics of System 1, when there is a need for System 2 thinking to kick into action, we might try to avoid whatever that activity entails. We get lazy and resort to shortcut solutions or decisions based on intuition. System 1 almost persuades System 2 to sit back and let it look after things. Anything to avoid having to expend effort thinking deeply and rationally.

The demands of learning anything new or hard can trigger that kind of response. It is understandable that somebody facing a complex or unfamiliar visualisation that needs learning might demonstrate antipathy towards the effort required to learn.

Of course, there are other factors involved in learning, such as having the time, receiving assistance or tuition, and recognising the incentive. These are all enablers and therefore their absence can create obstacles to learning. Without assistance from the visualiser, viewers are left to fend for themselves. The role of this book has primarily been to try to raise the standard of the design choices that visualisers make when creating visualisations. Visualisers do not want to obstruct viewers from being able to read, interpret and comprehend. If work is riddled with design errors and misjudgements then viewers are naturally going to be disadvantaged.

However, even with a technically perfect design, as I explained in the definition section of the first chapter, we as visualisers can only do so much to control this experience. There are things we can do to make our work as accessible as possible, but there is also a partial expectation of the viewer to be willing to make some effort (so long as it is ‘proportional’) to get the most out of the experience. The key point, however, is that this effort should be rewarded.

Many of the visualisations that you will have seen in this book, particularly in Chapter 6, may have been unfamiliar and new to you. They need learning. Your confidence in being able to read different types of charts is something that will develop through practice and exposure. It will be slow and deliberate at first, probably a little consciously painful, but then, over time, as familiarity increases and the experiential benefits kick in, perceiving these different types of representations will become quite effortless and automatic. System 2 thinking will then transform into a reliably quick form of System 1 thinking.


Over the next few pages I will present a breakdown of the components of effectively working with a visualisation from the perspective of a viewer. This will provide you with a strategy for approaching any visualisation with the best chance of understanding how to read it, and for gaining the understanding that comes from being able to read it.

To start with I will outline the instinctive thoughts and judgements you will need to make before you begin working with a visualisation. I will then separate the different features of a visualisation, first by considering the common components that sit outside the chart and then some pointers for how to go about perceiving what is presented inside the chart. This part will also connect with the content included in the chart type gallery found in Chapter 6, which describes how to read each unique chart type. Finally, I will touch on the attributes that will lead you, in the longer term, to becoming a more sophisticated viewer.

It is important to note that not all data visualisation and infographic designs will have all the design features and apparatus items that I describe over the next few sections. There may be good reasons for this in each case, depending on the context. However, if you find there are significant gaps in the work you are consuming, or features of assistance have been deployed without real care or quality, that would point to flawed design. In these cases the viewer is not really being given all the assistance required: the visualiser has failed to facilitate understanding.

Figure 11.1 The Pursuit of Faster


To illustrate this process I will refer to a case-study project titled ‘The Pursuit of Faster: Visualising the Evolution of Olympic Speed’. As the title suggests, the focus of this work was to explore how results have changed (improved or declined) over the years of the Olympics for those events where speed (as measured by a finishing time) was the determinant of success.

Before You Begin

Here are some of the instinctive, immediate thoughts that will cross your mind as soon as you come face to face with a data visualisation. Once again, these are consistent with the impulsive nature of the System 1 thoughts mentioned earlier.

Setting: Think about whether the setting you are in is conducive to consuming a visualisation at that moment in time. Are you under any pressure of time? Are you on a bumpy train trying to read this on your smartphone?

Visual appeal? In this early period of engaging with the work you will be making a number of rapid judgements to determine whether you are ‘on board’. One of the ingredients of this is to consider whether the look and feel (the ‘form’) of the visualisation attracts you and motivates you to want to spend time with it.

Relevance? In addition to the visual appeal, the second powerful instinct is to judge whether the subject matter interests you. You might have decided you are on board with your instinctive reaction to the visuals, but the key hurdle is whether it is even interesting or relevant to you. Ask yourself if this visualisation is going to deliver some form of useful understanding that confirms, enlightens or thrills you about the topic.

If you respond positively to both those considerations you will likely be intent on continuing to work with the visualisation. Even if you are positive about just one of these factors (form or subject) you will most probably persevere despite your indifference towards the other. If your thoughts are leaning towards a lack of interest in both the relevance of this work and its visual appeal then, depending on circumstances, your tolerance may not be high enough to continue and it will be better to abandon the task there and then.

Initial scan? It is inevitable that your eyes will be instinctively drawn to certain prominent features. This might be the title or even the chart itself. You may be drawn to a strikingly large bar or a sudden upward rise on a line chart. You might see a headline caption that captures your attention or maybe some striking photo imagery. It is hard to fight our natural instincts, so don’t. Allow yourself a brief glance at the things you feel compelled to look at: these are probably the same things the visualiser hopes you are drawn to. Quickly scanning the whole piece, just for an initial period of time, gives you a sense of orientation about what is in store.

In ‘The Pursuit of Faster’ project you might find yourself drawn to this only if you have a passing interest in the Olympics and/or the history of athletic achievement. On the surface, the visuals might look quite analytical in nature, which might turn some people off. The initial scan probably focuses on elements like the Olympic rings and the upward direction of the lines in the chart, which might offer a degree of intrigue, as might the apparent range of interactive controls.


Outside the Chart

Before getting into the nuts and bolts of understanding the chart displays, you will first need to seek assistance from the project at large to understand in more detail what you are about to take on and how you might need to go about working with it.

The Proposition

Considering the proposition offered by the visualisation is about determining how big a task of consumption, and possibly interaction, you have ahead of you. What is its shape, size and nature?

Format: Is it presented in a print, physical or digital format, and how does this make you feel about your potential appetite and the level of your engagement? Is it static or interactive, and what does this imply in terms of the task?

If it is a static graphic, how large and varied is the content – is it a dense display with lots of charts and text, or quite a small and compact one? Does the sequence of content appear logical?
If it is interactive, how much potential interactivity does there appear to be – are there many buttons, menus, options, etc.? Where do the interactive events take you? Are there multiple tabs, pages or layers beneath this initial page? Have a click around.

Shape and size: Do you think you will have to put in a lot of work just to scan the surface insights? Is there a clear hierarchy or sequence derived through the size and position of elements on the page? Does it feel like there is too much or too little content? If the project layout exceeds the dimensions of your screen display, how much more scrolling, or how many different pages, will you have to look through to see the whole?

This initial thinking helps you establish how much work and effort you are going to be faced with to explore the visualisation thoroughly. In ‘The Pursuit of Faster’ project, it does not feel like there is too much content and all the possible analysis seems to be located within the boundaries of the immediate screen area. However, with a number of different selectable tabs, interactive options and collapsible content areas lurking beneath the surface, it could be more involving than it first appears.


What’s this Project About?

Although you have already determined the potential relevance of this subject matter (or otherwise), you will now look to gain a little more insight into what the visualisation is specifically about.

Title: You will probably have already glanced at the title, but now have another look at it to see if you can learn more about the subject matter, the specific angle of enquiry or perhaps a headline finding. In the sample project (Figures 11.2 and 11.3), the presence of the Olympic rings logo on the right provides an immediate visual cue about the subject matter, as you might have observed in the initial scan. The title, ‘The Pursuit of Faster’, is quite ambiguous, but the supporting subtitle, ‘Visualising the evolution of Olympic speed’, helps to explain what the visualisation is about.

Figure 11.2 Excerpt from ‘The Pursuit of Faster’

Source: If it is a web-based visualisation the URL is worth considering. You might already know where you are on the Web, but if not you can learn plenty from the site on which this project is being hosted. An initial sense of trust in the data, the author and the possible credibility of insights can be drawn from this single bit of information. This particular project is hosted on my website, visualisingdata.com, and so may not carry the same immediate recognition that an established Olympics or sport-related site might command. There is nothing provided in the main view of the visualisation that informs the viewer who created the project. Normally this might have been detailed towards the bottom of the display or underneath the title, but in this case viewers have to click on a ‘Read more…’ link to find this out. If no details are provided about the author/visualiser, this anonymity might affect your trust, as a viewer, in the work’s motives and quality.

Introduction: While some visualisation projects will be relatively self-explanatory, depending on the familiarity of the audience with the subject matter, others will need to provide a little extra guidance. The inclusion of introductory text will often help ‘set the scene’, providing some further background about the project. If, as the viewer, you find that the introduction fails to equip you with all the information you need about the visualisation, then the visualiser has neglected to include all the assistance that might be necessary.

In ‘The Pursuit of Faster’ project, the introductory text provides sufficient initial information about the background of the project, which grew out of a curiosity about what improvements in speed have been seen throughout the history of the Olympics. As mentioned, there is a ‘Read more…’ link leading to further information that was perhaps too much to include in the main opening paragraph. This includes a comprehensive ‘How to use it’ guide providing a detailed account of the content and role of each section of the project, including advice on how to read the chart and use the interactive features.

Figure 11.3 Excerpt from ‘The Pursuit of Faster’

What Data?

Any visualisation of data should include clear information to explain the origin of the data and what has been done with it in preparation for its visual portrayal.


Data source: Typically, details of the data source will be located in the introduction, as a footnote beneath a chart or at the bottom of a page. It is important to demonstrate transparency and give credit to the origin of your data. If none is provided, that lowers trust.

Data handling: It is also important to explain how the data was gathered and what, if any, criteria were applied to include or exclude certain aspects of the subject matter. This explanation might also mention any assumptions, calculations or transformations that have been applied to the data and that are important for the reader to appreciate.

In ‘The Pursuit of Faster’ project, the ‘Read more…’ link you saw earlier provides details about the origin of the data and explains that it only includes medal winners from summer Olympic events that have a time-based measure.

What Interactive Functions Exist?

As you have seen in Chapter 7, interactive visualisations (typically hosted on the Web or in an app) aim to provide users with a range of features to interrogate and customise the presentation of the data.

Sometimes, interactive features are enabled but not visible on the surface of a project. This might be because visualisers feel that users will be experienced enough to expect certain interactive capabilities without these needing to be made conspicuous through labelling or signposting. For example, rather than showing all the value labels on a bar chart, a project might let you move the mouse over a bar of choice so that a pop-up reveals the value. The project might not tell you that you can do this, but you may intuitively expect to. Always fully explore the display with the mouse or through touch in order to gain a sense of all the different visible and possibly invisible ways you can interact with the visualisation.

In ‘The Pursuit of Faster’ project (Figure 11.4), you will see multiple tabs at the top, one for each of the four sports being analysed. Clicking on each one opens up a new set of sub-tabs beneath for each specific event within the chosen sport.

Figure 11.4 Excerpt from ‘The Pursuit of Faster’


Choosing an event will present the results in the main chart area (Figure 11.5). Once a chart has loaded, you can then filter for male/female and also for each of the medals using the buttons immediately below the chart. Within the chart, hovering over a marker will reveal the specific time value for that result. Clicking on the marker will show the full race results and offer further analysis comparing those results with the all-time results for context.

Figure 11.5 Excerpt from ‘The Pursuit of Faster’

Finally, the collapsible menus below the chart show further detailed analysis and comparisons within and between each sporting event (Figure 11.6). Their position below the chart implies that this content is of lower relative importance than the chart, or perhaps that it offers a more detailed view of the data.

Figure 11.6 Excerpt from ‘The Pursuit of Faster’


Inside the Chart

Now you have acquainted yourself with the key features of a visualisation outside the chart, the next stage is to start the process of deriving understanding from the chart.

The process of consuming a chart varies considerably between different chart types: the approach to drawing observations from a chart showing trends over time is very different from how you might explore a map-based visualisation. The charts I profiled in Chapter 6 were each accompanied by detailed information on the type of observations you should be looking to extract in each case.

In Chapter 1 you learnt that there are three stages involved in understanding a chart: perceiving, interpreting and comprehending. Let’s work through these steps by looking at the analysis shown for the 100m Finals.

Perceiving: The first task in perceiving a chart is to establish your understanding of the role of every aspect of the display. Here we have a line chart (Figure 11.7) which shows how quantitative values for categories have changed over time. This chart is structured around a horizontal x-axis showing equal intervals from the earliest Olympics (1896) on the left through to the most recent (2012) on the right, although the latest values in the data only seem to reach 2008. Depending on your interest in this topic, the absence of data for the more recent Olympics may undermine your sense of its completeness and representativeness.

Figure 11.7 Excerpt from ‘The Pursuit of Faster’

The vertical y-axis is different from what you might normally see for two reasons. Firstly, it moves downwards below the x-axis (rather than upwards, as is more common), and secondly, there is no labelling, either of the variable plotted or of scale values.

I can see that the encoding is formed by points (marking the race results) and connecting lines showing the change over time. Through the use of colour there are plotted lines for the gold, silver and bronze medal winning times for each Olympics. There are two sets of medal lines but there is no obvious distinction to explain what these are.

With no direct labelling of the values, I hover over the point (‘medal’) markers and a tooltip annotation comes up with the athlete’s name and time in a medal-coloured box. I compare the tooltip information for the lines at the top with that for the lines below and discover that the lower lines are the women’s results and the upper lines are the men’s results.

From the tooltip information I can determine that the quicker times (the gold medal line) are at the top, which suggests that the y-axis scale is inverted, with quicker (smaller) times at the top and slower (larger) times at the bottom. This also reveals that the vertical axis has no zero origin; rather, the quickest time is anchored just below the top of the chart, the slowest stretches down to the bottom of the chart, and all the values in between are distributed proportionally.

Interjecting as the visualiser responsible for this project, let me explain that the focus was on patterns of relative change over time, not necessarily absolute result times. As every event has a different distance and duration behind its final timed results, a common scale for all results needed to be established, which is why the decision was taken to standardise all results and plot them across the vertical chart space provided.
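The vertical positioning described above (the fastest time just below the top, the slowest at the bottom, everything in between placed proportionally) behaves like an inverted linear scale computed separately for each event. The following Python sketch illustrates that idea under stated assumptions: a simple min-max mapping, with hypothetical function names, pixel dimensions and sample times. It is not the code behind ‘The Pursuit of Faster’.

```python
from typing import List


def time_to_y(time_s: float, fastest_s: float, slowest_s: float,
              top_px: float, bottom_px: float) -> float:
    """Map a race time to a vertical pixel position on an inverted linear scale.

    The fastest time lands at top_px, the slowest at bottom_px, and every time
    in between is placed proportionally. Screen y-coordinates grow downwards,
    so a smaller value is higher up the chart.
    """
    if slowest_s == fastest_s:
        return top_px  # degenerate case: only one distinct time to plot
    # 0.0 for the fastest time, 1.0 for the slowest time
    fraction = (time_s - fastest_s) / (slowest_s - fastest_s)
    return top_px + fraction * (bottom_px - top_px)


def standardise_event(times_s: List[float], top_px: float = 20.0,
                      bottom_px: float = 400.0) -> List[float]:
    """Scale every result for one event into the same vertical chart space.

    top_px and bottom_px are hypothetical chart dimensions (assumption).
    """
    fastest, slowest = min(times_s), max(times_s)
    return [time_to_y(t, fastest, slowest, top_px, bottom_px) for t in times_s]


if __name__ == "__main__":
    # Hypothetical times in seconds (not real results) for a short and a long event.
    print(standardise_event([10.0, 11.0, 12.0]))     # [20.0, 210.0, 400.0]
    print(standardise_event([215.0, 230.0, 245.0]))  # [20.0, 210.0, 400.0]
```

Because each event is scaled against its own fastest and slowest results, a sprint and a longer race whose times are spread in the same proportions occupy the same vertical space, which is what makes patterns of relative change comparable across events while giving up any reading of absolute times from position alone.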

Inside the chart, I now try clicking on the markers and this brings up details about the event (for that gender), including the three medal winners, their times and small flags for the countries they represented. I can also read an interesting statistic explaining that, if the time for the medallist I selected had been achieved throughout the event’s history, it would have secured gold, silver or bronze medals on x number of occasions.

I now know enough about the chart’s structure and encodings to be able to start the process of perceiving the patterns and making some observations about what the data is showing me:

I can see that there is a general rise across all Olympics for the event in both men’s and women’s results.
It feels like the women’s times are getting closer to the men’s, with Florence Griffith Joyner’s victory time in 1988 being the closest that the respective times have been – her result there would have been good enough for a men’s bronze in 1956.
There are no real patterns between medal times; they are neither always closely packed together nor always spread out – it changes on each occasion.
I notice the gaps where there were no events, during the First and Second World Wars, and also the presence of an obscure 1906 event, the only Olympic Games that did not follow the four-year interval.
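The pop-up statistic mentioned earlier (how many times a selected result would have won gold, silver or bronze across an event’s history) amounts to comparing one time against each Games’ medal-winning times. Here is a minimal Python sketch, assuming the history is stored as one (gold, silver, bronze) tuple of times in seconds per Games. The function name, the data layout, the tie-handling rule and the sample values are hypothetical illustrations, not the project’s actual implementation.

```python
from typing import Dict, List, Tuple

# One entry per Games: (gold, silver, bronze) winning times in seconds.
MedalTimes = Tuple[float, float, float]


def medal_counts(time_s: float, history: List[MedalTimes]) -> Dict[str, int]:
    """Count the Games at which time_s would have earned each medal.

    Assumption: a time equal to a medal-winning time counts as matching it
    (ties go to the selected time), and each Games contributes at most one medal.
    """
    counts = {"gold": 0, "silver": 0, "bronze": 0}
    for gold, silver, bronze in history:
        if time_s <= gold:        # as fast as or faster than the winner
            counts["gold"] += 1
        elif time_s <= silver:    # beaten by the winner, but not by second place
            counts["silver"] += 1
        elif time_s <= bronze:    # only the first two places were faster
            counts["bronze"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical medal times for three imaginary Games (not real results).
    history = [(10.8, 10.9, 11.0), (10.3, 10.4, 10.5), (9.9, 10.0, 10.1)]
    print(medal_counts(10.45, history))  # {'gold': 1, 'silver': 0, 'bronze': 1}
```

Checking the thresholds in gold, silver, bronze order means each Games is credited with the best medal the selected time would have earned there, which matches the ‘gold, silver or bronze on x occasions’ phrasing.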

Interpreting: As someone who follows a lot of sport and, like most people, is particularly familiar with the 100m event, I feel there is a lot of information I can get out of this display, both at a general level, looking at the relative patterns of change, and at a local level, checking up on individual medallists and their absolute values. Thinking about what these patterns mean, the times from the first Olympic Games in 1896 until the 1960s show a lot of improvement, yet since the 1960s the shape has generally been much flatter, with only a gradual improvement in the times for both genders. This suggests to me that athletes may be getting closer to the limit of how fast they can run. Even with all the contributions of sports science over the past few generations, the increase in speed is only ever marginal. That was until Usain Bolt blew the world away in 2008 and, likewise for women, Shelly-Ann Fraser improved the women’s results for the first time in 20 years.

Comprehending: What does this all mean to me? Well, it is interesting and informative and, while I have no direct investment in this information in terms of needing to make decisions, nor does it trigger any sense of emotion in me, in outcome terms I feel I have learnt more about the topic through this chart than I would have done just looking at the data. My understanding of the history of the Olympic 100m final has been expanded and, in turn, I have a better appreciation of the advancements in speed across and between both genders.

Becoming a More Sophisticated Consumer

Effective visualisation requires the visualiser and viewer to operate in harmony, otherwise the possibility of facilitating understanding is compromised. Beyond the mechanics of perceiving a visualisation, there are softer ‘attitudinal’ differences you can make to give yourself even more of a chance of gaining understanding. This is about modifying your mindset to be more critically appreciative of the challenges faced by the visualiser responsible for producing the work, as well as of its intended purpose. It is about showing empathy in your critical evaluation, which will markedly help you become an increasingly sophisticated consumer.

Appreciation of context: When consuming a visualisation, try to imagine some of the circumstances and constraints that might have influenced the visualiser’s decisions:

You might not find the subject matter interesting, but other people might. You have the right not to read or interact with a visualisation that has no relevance to you. If it should have relevance, then that’s when there may be some problems!
If you are struggling to understand a visualisation, it could be that the project was aimed more at specialists, people with specific domain knowledge. Your struggles are not necessarily a reflection of an ineffective visualisation or of any deficit in your expected knowledge – it just was not intended for you.
If the size of the text is frustratingly tiny on your screen, maybe it was intended primarily for printing as a poster and would have been the right size if consumed in its native format.
When criticising a work, spare a thought for what could have been done differently. How would you imagine an alternative way to represent the data? What other design solutions would you have tried? Sometimes what is created is a reflection of crippling constraints and might more closely resemble the least-worst solution than the best.

Overview first, details if provided: Sometimes a visualiser only aims to offer a sense of the big picture – the big values, the medium and the small ones. It is important not to get frustrated just because we cannot instantly read precise values from a chart. Our default state as viewers is often to want every detail available. Sometimes, we just need to accept that a gist of the hierarchy of values is of more worth than the decimal-point precision of specific values. It may be that it was not feasible to use a chart that would deliver such detailed reading of the data – many charts simply cannot fulfil this. We might not even realise that we are just a mouseover or click away from bringing up the details we desire.

False consciousness: Do you really like the things you like? Sometimes we can be too quick to offer a ‘wow’ or a ‘how cool is that?’ summary judgement before even consuming the visualisation properly. It is quite natural to be charmed by superficial appeal (occasionally, dare I say it, by following the crowd). Ask yourself whether it is the subject, the design or the data you like. Would any portrayal of such compelling data have resulted in an equally compelling presentation of that content?

Curiosities answered, curiosities not answered: Just because the curiosity you had about a subject is not answered does not make the visualisation a bad one. Statements like ‘This is great but I wish they’d shown it by year …’ are valid because they express your own curiosity, to which you are entirely entitled. However, a visualiser can only serve up responses to a limited number of different angles of analysis in one project. The things you wanted to know about, which might be missing, may simply not have been possible to include or may have been deemed less interesting than the information provided. If you are thinking ‘this would have been better on a map’, maybe there was no access to spatial data? Or maybe the geographical details were too vague or inaccurate to generate sufficient confidence to use them?

11.2 Creating: The Capabilities of the Visualiser

Now that you are reaching the end of this journey, it will be quite evident that data visualisation design is truly multidisciplinary. It is this variety that fuels the richness of the subject and makes it a particularly compelling challenge. To prepare you for your ongoing development, the second part of this final chapter aims to help you reflect on the repertoire of skills, knowledge and mindsets required to achieve excellence in data visualisation design.

The Seven Hats of Data Visualisation

Inspired by Edward de Bono’s Six Thinking Hats, the ‘Seven hats of data visualisation’ is a breakdown of the different capabilities that make up the multi-talented visualiser. The attributes listed under each of these hats can be viewed as a wish-list of personal or team capabilities, depending on the context of your data visualisation work.

Project Manager

The coordinator – oversees the project
Initiates and leads on formulating the brief
Identifies and establishes definitions of key circumstances
Organises the resources according to the ambition of a project
Manages progress of the workflow and keeps it cohesive
Has a ‘thick skin’, patience and empathy
Gets things done: checks, tests, finishes tasks
Pays strong attention to detail

Communicator


The broker – manages the people relationships
Helps to gather and understand requirements
Manages expectations and presents possibilities
Helps to define the perspective of the audience
Is a good listener with a willingness to learn from domain experts
Is a confident communicator with laypeople and non-specialists
Possesses strong copy-editing abilities
Launches and promotes the final solution

Scientist

The thinker – provides scientific rigour
Brings a strong research mindset to the process
Understands the science of visual perception
Understands visualisation, statistical and data ethics
Understands the influence of human factors
Verifies and validates the integrity of all data and design decisions
Demonstrates a systems thinking approach to problem solving
Undertakes reflective evaluation and critique

Data Analyst

The wrangler – handles all data work
Has strong data and statistical literacy
Has the technical skills to acquire data from multiple sources
Examines the physical properties of the data
Undertakes initial descriptive analysis
Transforms and prepares the data for its purpose
Undertakes exploratory data analysis
Has database and data modelling experience

Journalist

The reporter – pursues the scent of an enquiry
Defines the trigger curiosity and purpose of the project
Has an instinct to research, learn and discover
Driven by a desire to help others understand
Possesses or is able to acquire salient domain knowledge
Understands the essence of the subject’s data
Has empathy for the interests and needs of an audience
Defines the editorial angle, framing and focus

Designer

The conceiver – provides creative direction
Establishes the initial creative pathway through the purpose map
Forms the initial mental visualisation: ideas and inspiration
Has strong creative, graphic and illustration skills
Understands the principles of user interface design
Is fluent with the full array of possible design options
Unifies the decision-making across the design anatomy
Has a relentless creative drive to keep innovating

Technologist

The developer – constructs the solution
Possesses a repertoire of software and programming capabilities
Has an appetite to acquire new technical solutions
Possesses strong mathematical knowledge
Can automate otherwise manually intensive processes
Has the discipline to avoid feature creep
Works on the prototyping and development of the solution
Undertakes pre- and post-launch testing, evaluation and support

Assessing and Developing Your Capabilities

Data visualisation is not necessarily a hard subject to master, but there are plenty of technical and complicated matters to handle. A trained or natural talent in areas like graphic design, computer science, journalism and data analysis is advantageous, but very few people have all these hats. Those who do cannot be exceptional at everything listed, but may be sufficiently competent at most things and brilliant at some. Developing mastery across the full collection of attributes is probably unachievable, but the collection offers a framework for guiding an assessment of your current abilities and a roadmap for addressing any current shortcomings.

I am painfully aware of the things I am simply not good enough at (programming), the things I have no direct education in (graphic design) and the things I do not enjoy (finishing, proofreading, note-taking).


Compromise is required with the things you do not like – there are always going to be unattractive tasks, so just bite the bullet and get on with them. Otherwise, you must either address your skills gap through learning and/or intensive practice, find support from elsewhere through collaboration, or simply limit your ambitions based on what you can do.

Regardless of their background or previous experience, everyone has something to contribute to data visualisation. Talent is important, of course, but better thinking is, in my view, the essential foundation to focus on first. Mastering the demands of a systems thinking approach to data visualisation – being aware of the options and the mechanics behind making choices – arguably has a greater influence on effective work. Thereafter, the journey from good to great, as with anything, involves hard work, plenty of learning, lots of guidance and, most importantly, relentless practice.

‘Invariably, people who are new to visualisation want to know where to begin, and, frankly, it’s understandably overwhelming. There is so much powerful work now being done at such a high level of quality, that it can be quite intimidating! But you have to start somewhere, and I don’t think it matters where you start. In fact, it’s best to start wherever you are now. Start from your own experience, and move forward. One reason I love this field is that everyone comes from a different background – I get to meet architects, designers, artists, coders, statisticians, journalists, data scientists … Data vis is an inherently interdisciplinary practice: that’s an opportunity to learn something about everything! The people who are most successful in this field are curious and motivated. Don’t worry if you feel you don’t have skills yet; just start from where you are, share your work, and engage with others.’
Scott Murray, Designer

The Value of the Team

The idea of teamwork is important. There are advantages to pursuing data visualisation solutions collaboratively, bringing together different abilities and perspectives to a shared challenge. In workplaces across industries and sectors, as the field matures and becomes more embedded, I would expect to see a greater shift towards recognising the need for interdisciplinary teams to fulfil data visualisation projects collectively.


The best-functioning visualisation team will offer a collective blend of skills across all these hats, inevitably with some overlap, but also, critically, without skewing its sensibilities towards one dominant talent. Success will be hard to achieve if a team is dominated by technologists or by a concentration of ‘ideas’ people whose work never progresses past the sketchbook. You need the right blend in any team.

We have seen plenty of great examples of visualisation and infographic work from newspaper and media organisations. In the larger organisations that have the fortune of (relatively) large graphics departments, teamwork is an essential ingredient behind much of their success. Relentlessly producing multiple high-quality, innovative projects in parallel, within the demands of the news environment, is no mean feat. Such organisations might have the most people and also some of the best people, but their output still represents them punching above their weight, however considerable that base.

Developing Through Evaluating

There are two components in evaluating the outcome of a visualisation solution that will help refine your capabilities: what was the outcome of the work, and how do you reflect on your performance?

Outcome: Measuring effectiveness in data visualisation remains an elusive task – in many ways it is the field’s ‘Everest’ – largely because it must be defined according to local, contextual measures of success. This is why establishing an early view of the intended ‘purpose’, and then refining it if circumstances changed, was necessary to guide your thinking throughout this workflow.

Sometimes effectiveness is tangible, but most times it is entirely intangible. If the purpose of the work is to further the debate about a subject or to establish one’s reputation or voice of authority, then those are hard things to pin down in terms of a yes/no outcome. One option may be to flip the measure of effectiveness on its head and seek out evidence of tangible ineffectiveness. For example, there may be significant reputational impacts should decisions be made on inaccurate, misleading or inaccessible visual information.

There are, of course, some relatively free quantitative measures available for digital projects, including web-based measures such as visitor counts and social media metrics (likes, retweets, mentions). These, at least, provide a surface indicator of success in terms of the project’s apparent appeal and spread. Ideally, however, you should also aspire to collect more reliable qualitative and value-added feedback, even if this can, at times, be rather expensive to secure. Some options include:

capturing anecdotal evidence from comments submitted on a site, opinions expressed in tweets or other social media posts, and feedback shared in emails or in person;
informal feedback through polls or short surveys;
formal case studies which might offer more structured interviews and observations about documented effects;
experiments with controlled tasks/conditions and tracked performance measures.

Your performance: A personal reflection on, or assessment of, your contribution to a project is important for your own development. The best way to learn is by considering the things you enjoyed and/or did well (and doing more of those things) and identifying the things you did not enjoy/do well (and doing less of those things or doing them better). So look back over your project experience and consider the following:

Were you satisfied with your solution? If yes, why; if no, why and what would you do differently?
In a different context, what other design solutions might you have considered?
Were there any skill or knowledge shortcomings that restricted your process and/or solution?
Are there aspects of this project that you might seek to recycle or reproduce in other projects? For instance, ideas that did not make the final cut but could be given new life in other challenges?
How well did you use your time? Were there any activities on which you feel you spent too much time?

Developing effectiveness and efficiency in your data visualisation work will take time and will require your ongoing efforts to learn, apply, reflect and repeat. I am still learning new things every day. It is a journey that never stops because data visualisation is a subject that has no ending.

‘There is not one project I have been involved in that I would execute exactly the same way second time around. I could conceivably pick any of them – and probably the thing they could all benefit most from? More inter-disciplinary expertise.’
Alan Smith OBE, Data Visualisation Editor, Financial Times

However, to try to offer a suitable conclusion to this book, I will leave you with this wonderful passage transcribed from a video of Ira Glass, host and producer of ‘This American Life’.

Nobody tells this to people who are beginners, I really wish someone had told this to me. All of us who do creative work, we get into it because we have good taste… [but] there is this gap and for the first couple of years that you’re making stuff, what you’re making is just not that good… It’s trying to be good, it has potential, but it’s not. But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you. A lot of people never get past this phase, they quit. Most people I know who do interesting, creative work went through years of this. We know our work doesn’t have this special thing that we want it to have. We all go through this. And if you are just starting out or you are still in this phase, you gotta know it’s normal and the most important thing you can do is do a lot of work. Put yourself on a deadline so that every week you will finish one story. It is only by going through a volume of work that you will close that gap, and your work will be as good as your ambitions. And I took longer to figure out how to do this than anyone I’ve ever met. It’s gonna take awhile. It’s normal to take awhile. You’ve just gotta fight your way through.

Summary: Visualisation Literacy

Viewing: Learning to See

Before You Begin

Setting: is the situation you are in conducive to the task of consuming a visualisation? In a rush? Travelling?
Visual appeal: are you sufficiently attracted to the appearance of the work?
Relevance: do you have an interest or a need to engage with this topic?
Initial scan: quickly orientate yourself around the page or screen, and allow yourself a brief moment to be drawn to certain features.

Outside the Chart

The proposition: what task awaits? What format, function, shape and size of visualisation have you got to work with?
What’s the project about?: look at the titles and source, and read through any introductory explanations.
What data?: look for information about where the data originated and what might have been done to it.
What interactive functions exist?: if it is a digital solution, browse quickly and acquaint yourself with the range of interactive devices.

Inside the Chart

Refer to the Chart Type Gallery in Chapter 6 to learn about the approaches to perceiving and interpreting different chart types.

Perceiving: what does it show?
Interpreting: what does it mean?
Comprehending: what does it mean to me?

Becoming a More Sophisticated Consumer

Appreciation of context: what circumstances might the visualiser have been faced with that are hidden from you as a viewer?
Overview first, details if provided: accept that sometimes a project only aims to (or maybe only can) provide a big-picture gist of the data, rather than precise details.
False consciousness: don’t be too quick to decide that you like a visualisation. Challenge yourself: do you really like it? Do you really gain understanding from it?
Curiosities answered, curiosities not answered: just because it does not answer your curiosity does not mean it will not answer the curiosities of plenty of others.

Creating: The Capabilities of the Visualiser

The Seven Hats of Data Visualisation Design


Project Manager: the coordinator – oversees the project.
Communicator: the broker – manages the people relationships.
Scientist: the thinker – provides scientific rigour.
Data analyst: the wrangler – handles all the data work.
Journalist: the reporter – pursues the scent of enquiry.
Designer: the conceiver – provides creative direction.
Technologist: the developer – constructs the solution.

Assessing and Developing Your Capabilities

The importance of reflective learning: evaluating the outcome of the work you have created and assessing your own performance during its production.

Tips and Tactics

The life and energy of data visualisation are online: keep on top of blogs and the websites of major practitioners and agencies creating great work. On social media (especially Twitter and Reddit) you will find a very active and open community that is willing to share and help.
Practise, practise, practise: experience is the key – identify personal projects to explore different techniques and challenges.
Learn about yourself: take notes, reflect, self-critique, recognise your limits.
Learn from others: consume case studies and process narratives, and evaluate the work of others (‘what would I do differently?’).
Expose yourself to the ideas and practices of other related creative and communication fields: writing, video games, graphic design, architecture, cartooning.


References

These references relate to content mentioned in the body text and/or attributed quotes that do not come from individual interviews with the author. Extensive further reading lists to support each chapter’s content are provided in the companion digital resources.

Bertin, Jacques (2011) Semiology of Graphics: Diagrams, Networks, Maps. Redlands, CA: ESRI Press.

Beveridge, Harriet and Hunt-Davis, Ben (2011) Will it Make the Boat Go Faster? Olympic-winning Strategies for Everyday Success. Leicester: Matador.

Booker, Christopher (2004) The Seven Basic Plots: Why We Tell Stories. New York: Continuum.

Buckle, Catherine (2014) Millions, billions, trillions: Letters from Zimbabwe, 2005–2009. http://www.cathybuckle.com/Millions-Billions-Trillions.php

Butterick, Matthew (2013) Practical Typography. http://practicaltypography.com

Buxton, Bill (2007) Sketching User Experiences. San Francisco, CA: Elsevier.

Cairo, Alberto (2012) The Functional Art. San Francisco, CA: Peachpit.

Chimero, Frank (2012) The Shape of Design. http://shapeofdesignbook.com/

Cleveland, William S. and McGill, Robert M. (1984) ‘Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods’. Journal of the American Statistical Association, vol. 79, no. 387, pp. 531–54.

Coats, Emma (2011) Originally via Twitter, collated via http://www.aerogrammestudio.com/2013/03/07/pixars-22-rules-of-storytelling/

Cox, Amanda (2013) Harvard Business Review. https://hbr.org/2013/03/power-of-visualizations-aha-moment/

Crawford, Kate (2013) Harvard Business Review. http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html

de Bono, Edward (1985) Six Thinking Hats. New York: Little, Brown.

Dilnott, Andrew (2013) Presented at a conference about the Office for National Statistics. https://twitter.com/GuardianData/status/313965008478425089

Fernandez, Manny (2013) New York Times. http://www.nytimes.com/2013/06/30/us/from-americas-busiest-death-chamber-a-catalog-of-last-rants-pleas-and-apologies.html?_r=0

Glass, Ira (2009) Open Culture. http://www.openculture.com/2009/10/ira_glass_on_the_art_of_story_telling.html

Gore, Al (2006) Presentation from An Inconvenient Truth, directed by Davis Guggenheim.

Heer, Jeffrey and Shneiderman, Ben (2012) ‘Interactive Dynamics for Visual Analysis’. ACM Queue, vol. 10, no. 2, p. 30.


Ive, Jonny, Kemp, Klaus and Lovell, Sophie (2011) Dieter Rams: As Little Design As Possible. London: Phaidon Press.

Jordan, Chris (2006) TEDtalk. http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html

Kahneman, Daniel (2011) Thinking, Fast and Slow. New York: Farrar, Straus & Giroux.

Kosara, Robert (2013) eagereyes: How The Rainbow Color Map Misleads. http://eagereyes.org/basics/rainbow-color-map

Lupi, Giorgia (2014) Green Futures Magazine. https://www.forumforthefuture.org/greenfutures/articles/why-i-draw-giorgia-lupi-art-visual-understanding

Mackinlay, Jock (1986) ‘Automating the Design of Graphical Presentations of Relational Information’. ACM Transactions on Graphics (TOG), vol. 5, no. 2, pp. 110–41.

Meyer, Robinson (2014) ‘The New York Times’ Most Popular Story of 2013 Was Not an Article’. The Atlantic. http://www.theatlantic.com/technology/archive/2014/01/-em-the-new-york-times-em-most-popular-story-of-2013-was-not-an-article/283167/

Morton, Jill (2015) Color Matters. http://www.colormatters.com/color-and-design/basic-color-theory

Munzner, Tamara (2014) Visualization Analysis and Design. Boca Raton, FL: CRC Press.

Reichenstein, Oliver (2013) Information Architects. https://ia.net/know-how/learning-to-see


Rosling, Hans (2006) TEDtalk. https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen

Rumsfeld, Donald (2002) US DoD News Briefing. https://en.wikipedia.org/wiki/There_are_known_knowns

Satyanarayan, Arvind and Heer, Jeffrey (2014) ‘Lyra: An Interactive Visualization Design Environment’. Computer Graphics Forum (Proceedings of EuroVis, 2014).

Shneiderman, Ben (1996) ‘The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations’. Proceedings of the IEEE Symposium on Visual Languages. Washington, DC: IEEE Computer Society Press, pp. 336–43.

Slobin, Sarah (2014) Source. https://source.opennews.org/en-US/learning/what-if-data-visualization-actually-people

Stefaner, Moritz (2014) Well-Formed Data. http://well-formed-data.net/archives/1027/worlds-not-stories

Tukey, John W. (1980) ‘We Need Both Exploratory and Confirmatory’. The American Statistician, vol. 34, no. 1, pp. 23–5.

Tversky, Barbara and Bauer Morrison, Julie (2002) ‘Animation: Can it facilitate?’. International Journal of Human-Computer Studies – Special issue: Interactive graphical communication, vol. 57, no. 4, pp. 247–62.

Vitruvius Pollio, Marcus (15 BC) ‘De architectura’.

White, Alex (2002) The Elements of Graphic Design. New York: Allworth Press.


Wolfers, Justin (2014) The Upshot. http://www.nytimes.com/2014/04/23/upshot/what-good-marathons-and-bad-investments-have-in-common.html

Wooton, Sir Henry (1624) The Elements of Architecture. London: Longmans, Green.

Wurman, Richard Saul (1997) Information Architects. New York: Graphis.

Yau, Nathan (2013) Data Points. Chichester: Wiley.


Index

Titles of charts are printed in italics

absent data 115, 275
accessibility 30, 37–42, 218–219
  and angles of analysis 133
  of annotation 259–260
  audience influence on 38–41
  and colour 265, 275, 286–288
  in composition 307–309
  of data 108, 115
  of interactive design 223, 243, 244
  and purpose 83
  testing 147
  and visual impairment 244
  visualiser’s influence on 41–42
  see also audiences
Accurat viii, 252
aesthetic qualities see elegance
aims/uses of this book 3, 5, 6–7
Aisch, Gregor 29, 252
Aisch, Gregor and Kevin Quealy (The New York Times) viii, 138–140
Al-Jamea, Sohail, Wilson Andrews, Bonnie Berkowitz and Todd Lindeman (Washington Post) xi, 231
Albers Equal-area Conic projection 308
analysis vs communication 8–9
Andrews, Wilson, Amanda Cox et al vii, 84
angle/slope as attribute 153
angles of analysis 116, 132–134, 136, 216
  and composition 303
  number of 133–134
  relevance 132–133
animation 241, 242
  speed 241
annotation 137, 247–261
  absence of 37, 42
  audiences 258
  captions 255–256
  chart apparatus 252, 253
  clutter 260
  focus 259
  footnotes 251
  headings, subheadings and titles 248
  imagery 249, 250
  introductions 248–249
  labels 47, 252–254
  legends 254–255
  multimedia 249–251, 250
    attribution 250
    credits 250
    data sources 250
    integration 250–251
    time/date stamps 250
    usage 250
  project annotations 260
  reading guides 251–252
  tone and experience 258–259
  transparency 259
  typography 256–257
  understanding 259–260
  user guides 249
  voiceover 255–256
area marks 153
Art in the Age of Mechanical Reproduction: Walter Benjamin 279
Arteries of the City 206
Asia Loses Its Sweet Tooth for Chocolate 45
attention to detail 58–59
attributes and marks 21, 151–152
audiences 1, 70
  and annotation 258
  and anticipated intrigue 67
  and data handling 36
  definition 49
  feelings and emotion 85
  interests 40, 70, 85
  moods and attitudes 41
  needs of 38, 39, 40, 66, 70, 133
  personal tastes 41
  and relevance 133
  size of 75
  stakeholders 66
  of this book 3
    prerequisites 4–5
  time needed to view and understand 40
  understanding/knowledge 39–41, 70
  see also accessibility
axes 51
  line charts 305
  truncated 37, 304–305

bandings charts 218
bar charts 31, 51, 161, 215
Battling Infectious Diseases in the 20th Century: The Impact of Vaccines 273–274
Beer Brands of SAB InBev 181
Benjamin, Walter 278, 279
Berkowitz, Bonnie, Emily Chow and Todd Lindeman (Washington Post)
Berliner Morgenpost xiii, 268
Bertin, Jacques 212
bias 36, 37
Big Data 50
black and white printing 282–283
Bloch, Matthew, Lee Byron, Shan Carter and Amanda Cox (New York Times) x, 197
Bloomberg Billionaires 155, 215, 250, 251
Bloomberg Visual Data vii, ix, x, 101, 166, 194, 203, 218, 251
Bocoup and the World Economic Forum ix, 168
Boice, Jay, Aaron Bycoffe, Andrei Scheinkman and Simon Jackman x, 208
Bolt, Usain 325
Booker, Christopher 1
Bostock, Mike and Jason Davies x, 182
Bostock, Mike, Shan Carter and Kevin Quealy (New York Times) x, 198
Bostock, Mike and Shan Carter (New York Times) xi, 233
brushing 234, 235
Buckle, Catherine, and Graham van de Ruit xiv, 303
Buckminster Fuller, Richard 43
Bui, Quoctrung viii, 136
bullet charts 218
Bump, Phillip (Washington Post) ix, 180
Burn-Murdoch, John 85
Butterick, Matthew 257
Buxton, Bill 68
Buying Power: The Families Funding the 2016 Presidential Election 84

Cairo, Alberto 29, 64
Cameos, Jorge 29
Carbon Map 207
Carli, Luis vii, 79
cartesian charts 304
Casualties 279–280
categorical attributes 154
categorical charts 158
Census Bump: Rank of the Most Populous Cities at Each Census, 1790-1890 192
central tendency 113
challenges 10
Chang, Kai ix, 185
Chart Structures 304
chart types 21, 42, 50–51, 126, 137, 157–160
  acronym CHRTS 157, 158, 216, 220
  choosing 210–220, 243
    accessibility 218–219
    angles of analysis 216
    assessing chart types 211
    data examination 215
    data exploration 215
    elegance 219–220
    perception 214
    purpose 211–212
    ranking of tasks 212, 213
    skills and tools 211
    tone 212
    trustworthiness 216–218
  examples
    area cartogram 207
    area chart 195
    back-to-back bar chart 178
    bar chart 161
    box-and-whisker plot 171
    bubble chart 167
    bubble plot 184
    bump chart 192, 212
    chord diagram 189, 299
    choropleth map 201, 271
    clustered bar chart 162
    connected dot plot 164
    connected scatter plot 194
    connected timeline 198
    dashboards 159
    dendrogram 181
    Dorling cartogram 208
    dot map 205
    dot plot 161, 162, 163
    dual families 159
    flow map 206
    Gantt chart 199
    grid map 209
    heat map 186
    histogram 173
    horizon chart 196
    instance chart 200
    isarithmic map 202
    line chart 191, 305
    matrix chart 187
    node–link diagram 188
    parallel coordinates 185
    pictogram 165
    pie chart 37, 157, 158, 175, 303
    polar chart 169, 304
    prism map 204
    proportional shape chart 166
    proportional symbol map 203
    radar chart 168
    Sankey diagram 190, 211, 299
    scatter plot 137, 183
    slope graph 193, 299
    stacked bar chart 177, 270
    stream graph 197
    sunburst chart 182
    treemaps 45, 88–89, 179, 303
    univariate scatter plot 172
    Venn diagram 180
    waffle chart 176
    word clouds 158, 174
  storytelling 159–160
  text visualisation 159
Charting the Beatles: Song Structure 269
Chen, Lina and Anita Rundles viii, 155
Cheshire, James, Ed Manley, John Barratt and Oliver O’Brien xi, 238
Chimero, Frank 43
Christiansen, Jan 309
circle size encoding 216–217
City of Anarchy 294
Ciuccarelli, Paolo 113
Clark, Duncan and Robin Houston (Kiln) ix, 182, 207
Cleveland, William and McGill, Robert 212, 214
Clever, Thomas 44, 250
clutter 260
Coal, Gas, Nuclear, Hydro? How Your State Generates Power 193
Coats, Emma 59
Color of Debt 249, 250
colour 77, 91, 263–291
  accessibility
    colour associations 287–288
    colour incongruence 288
    consistency 286
    cultural sensitivities 288
    visual 286–287
  black and white printing 282–283
  categorical colours 267–269, 268
  chart-type choice 285
  choices 33–34, 42, 137
  CIELAB 266
  CMYK (Cyan, Magenta, Yellow and Black) 265, 282
  contrast 276–278, 280
  data encoding 285
  data examination 284
  diverging/converging scales 269, 270–272
    rainbow scale 274–275
  editorial salience 276–278
  elegance
    justification for colour use 289
    neutral colouring 288–289
    unity 288
  format 282–283
  functional harmony 278–282
    annotations 280–281
    composition 281–282
    interactivity 279
    multimedia 280–281
  greyscale 280
  HSL (Hue, Saturation, Lightness) 265, 266, 267
  hue 213, 265
  ideas and inspiration 283–284
  illusions 286
  inappropriate usage 37
  interval and ratio 270–276
  lightness 266, 267
  meaningfulness 285
  nominal data 267–269
  ordinal data 269–270
  purpose map 283
  quantitative data 270–276
  RGB (Red, Green, Blue) colour model 264, 265, 266, 282
  saturation 153, 265, 266
  setting 283
  style guidelines of organisations 283
  and texture 269
  theory 264–267
  white background 283
colour blindness 286–287
Colour-blind Friendly Alternatives to Red and Green 287
columnar charts 304
communication 58, 149
comparison of judging line size vs area size 213
Comparison of Judging Related Items using Variation in Colour (hue) vs Variation in Shape 214
completion 148
complex subjects 39, 41
complexity of data visualisation 2
complicated subjects 39
composition 138, 293–311
  angles of analysis 303
  aspect ratios 306–307
  chart composition 295–301
    acronym LATCH 298–299
    chart orientation 297–298
    chart scales 296–297
    chart size 296
    value sorting 298–301
  chart type choice 303–304
  chart-scale optimisation 304–306
  data examination 302–303
  elegance
    thoroughness 309
    unity 309
  format 302
  mapping projections 307
  project composition 293–295
    hierarchy of content 294
    hierarchy and size 294
  unobtrusive design 307, 308–309
comprehending 22, 23, 26–27, 74, 79
conceiving ideas 144, 145
consumption
  definition 49
  frequency 71
  settings 72
context 36, 64–75
  audiences 70
  circumstances 68–69
  constraints and limitations 74
  consumption 71–72
  curiosity 64–68
  format 73
  pressures 70–71
  purpose 74–75
  quantities/workload 72
  resources 73–74
  rules 71
  stakeholders 69
correlations 52
Corum, Jonathan (New York Times) and Nicholas D. Pyenson xiii, 281
Countries with the Most Land Neighbours 83
Cox, Amanda 83
Crawford, Kate 36
creation: definition 49
Crime Rates by State 184
critical friends 147
critical thinking 7–8, 10
Critics Scores for Major Movie Franchises 172
Crude Oil Prices 195
Culp, S. xiv, 297
curiosity 64–68, 131
  anticipated intrigue 67
  audience intrigue 66
  personal intrigue 65
  potential intrigue 67–68
  and purpose 75
  stakeholder intrigue 66
Current Electricity Prices in Switzerland 271

Daily Indego Bike Share Station Usage 272
Daily Mail 33–34
data 97–129
  absence 275
  acquisition 36, 106–110, 128
    quantity needed 106–107
    resolution 107
    sources 50, 107–110
      API (Application Programme Interface) 110
      foraging 108
      issued by client 109
      pdf files 108
      raw (primary) data 108
      system report or export 109
      third-party services 109–110
      web 108–109
      when available 110
  data literacy 97–98
  examination 110–117, 128
    absence of data 115
    animation 241
    and choice of chart types 215
    completeness 115–116
    data operations 112
    identification of type 111
    influence of examination 116–117
    inspection and scanning 112
    for interactivity 241
    meaning of data 113–114
    quality 111
    size 111
    statistical methods 112–113
    underlying phenomena of data 114
  exploratory data analysis (EDA) 121–128
    analyst instinct 124–125
    chart types 126
    choosing chart types 215
    datasets 127
    efficiency 125
    finding nothing 127
    interrogating the data 125
    knowns and unknowns 122–124, 123
    machine learning 127
    need 128
    reasoning 125
    research 126–127
    statistical methods 127
  filtering 134
  fundamental role in visualisation 20
  and goals 121, 124
  ‘open data’ 108
  range 117
  raw (primary) 49–50, 99, 108
  representativeness 115
  samples 115
  source 322
  statistics 98, 105–106
    in data examination 112–113
    inference techniques 105
    univariate and multivariate techniques 105
  tabulated datasets 99
    cross-tabulated datasets 99, 100
    normalised datasets 99
  transformation 118–121, 128, 141
    backups 118
    cleaning 118–119
    consolidation 121
    conversions 119–121
      example 119
      quantitative 120
      textual data 119–120
    creating 120–121
    junk 119
  transparency in handling 36
  types 20, 100–105, 111
    acronym TNOIR 100, 128
    discrete and continuous 104–105
    qualitative
      nominal 101–102
      ordinal 102–103
      textual 100–101, 119–120
    quantitative
      interval 103
      ratio 103–104
    temporal 104
data art 48
data journalism 48
data privacy 233
data representation 19, 21, 151–221
  chart types 157–160, 161–209
  choosing chart types 210–220
  deception 36, 36–37, 216–218
  definition 19
  unfamiliar 40
  visual encoding 151–157, 152–156
data science 48
data visualisation: definition 19–20
  and data vis 47
data presentation
  forms of deception 37
datasets: definition 50
Deal, Michael xiii, 269
deception 36, 36–37, 216–218
decisions/choices 1, 8, 10, 29
  heuristic judgements 57
  significance 53
  and truth 32
  see also chart types; design
decoration 44–46
deductive reasoning 125
D’Efilippo, Valentina and James Ball xii, 254
D’Efilippo, Valentina and Nicolas Pigelet viii, 146
D’Efilippo, Valentina 149
depth 41
design 28
  Dieter Rams’s general principles 37, 41, 42
  environmentally friendly 31
  guiding principles 29–30, 146–147
  innovative design 30
  ‘invisibility’ of 244–245
  long lasting 31
  rules 71
  see also chart types
deuteranopia 286–287
diagrams 51
digital resources 9
Dilnott, Andrew 32
Dimensional Changes in Wood 79–80
distortions created by 3D decoration 217–218
Doping under the Microscope 306
duration of task 70
dynamic of need 39

Ebb and Flow of Movies: Box Office Receipts 1986-2008 197
ECB Bank Test Results 236
Economic Research Service (USDA) xiv, 297
editorial thinking 36, 131–142, 139, 140
  angle/s 132–134, 136, 138–139, 142, 216
  annotation 137, 141
  colour 137, 139, 141
  and design choices 137–141, 140–141
  example: Fall and Rise of U.S. Inequality, in Two Graphs 135–138, 136
  example: Why Peyton Manning’s Record Will Be Hard to Beat 138–141, 139, 140
  focus 135, 137, 139, 142
  framing 134, 136–137, 139, 142
  influences 135–141
  interactivity 137, 140–141
  representation 137, 140
Election Dashboard 208
Elections Performance Index 278
elegance 30, 42–46, 147
  in composition 309
  decoration 44–46
  definition 42–43
  eliminating the arbitrary 43–44
  in interactive design 244–245
  style 44–46
  thoroughness 44
  visual appeal 219–220
Elliot, Kennedy 35, 55
Elliot, Kennedy, Ted Mellink and Richard Johnson (Washington Post) xi, 237
emptiness/nothingness, representations of 275, 282
enclosure charts 304
encoded overlays 78, 218, 277
environmentally friendly design 31
ER Wait Watcher: Which Emergency Room Will You See the Fastest? 299, 300
errors 58–59
Excel 112, 239
Executive Pay By Numbers 267
exhibitory visualisation 25, 76, 77, 81–82, 259
experimental 361
explanatory visualisation 25, 76, 77–79, 241, 258
exploratory visualisation 76, 77, 79–80, 241, 259
expressiveness 210–211

facilitating understanding 21, 28, 38
  see also accessibility
Fairfield, Hannah 67
Fairfield, Hannah and Graham Roberts (New York Times) xi, 219
Fall and Rise of U.S. Inequality, in 2 Graphs 191
Fast-food Purchasers Report More Demands on Their Time 297
Fedewa, Peter A. vii, 35
feedback 147–148
Few, Stephen 29
Fewer Women Run Big Companies Than Men Named John 276, 296
figure-ground perception 34
filtering 134
financial restraints 70
Financial Times 85
Finviz 225, 226
First Fatal Accident in Spain on a High-Speed Line 280
fit 117
Five Hundred and Twelve Paths to the White House 233
FiveThirtyEight vii, 78, 298
flat design 31
Florida: Murders by Firearms 34–35, 114
flow maps 160
focus 135, 137, 139, 142
fonts 256, 257
Football Player Dashboard 277
footnotes 251
For These 55 Marijuana Companies, Every Day is 4/20 166
form marks 153
formats 41, 42, 51, 88
  restrictions 69
formulating briefs 36, 63–95
  context 36, 64–75
  curiosity 64–68
  definitions 63, 64
  establishing vision 75–94
framing 134, 136–137, 139, 142, 225–226
Fraser, Shelly-Ann 325
Frequency of Words Used in Chapter 1 of This Book 174
fun 84, 85
functional restrictions 71
functionality 43
functions 51

Gagnon, Francis xiv, 297
Gallup xiv, 306
gateway layers 89
Gemignani, Zach 39
Gender Pay Gap US? 164, 251, 301
Geography of a Recession 234, 275–276
geometric calculations 216–217
geometric zoom 226
Glass Ceiling Persists 296, 297
Glass, Ira 332
Global Competitiveness Report 2014-2015 168
Global Flow of People 189
Goddemeyer, Daniel, Moritz Stefaner, Dominikus Baur and Lev Manovich xiv, 299
Goldsberry, Kirk 114
Gore, Al 82
graphics 51
Grape Expectations 91
Graphic Language: The Curse of the CEO 101
graphs 51
Gray, Marcia 98, 316
greyscale 280
Groeger, Lena 41, 148
Groeger, Lena, Mike Tigas and Sisi Wei (ProPublica) xiv, 300
Groskopf, Christopher, Alyson Hurt and Avie Schneider x, 193
guiding principles
  accessibility 37–42
  elegance 42–46
  trustworthiness 30, 32–37
Gun Deaths in Florida 34, 286
Gun Deaths in Florida redesign 35

Harper, Bryce 231
Hemingway, Ernest 160
Here’s Exactly Where the Candidates’ Cash Came From 203
heuristic techniques 57
HEX codes 265
hierarchical charts 158
Highest Max Temperatures in Australia (1st to 14th January 2013) 274
historical context 8
History Through the President’s Words 237, 240
Hobbs, Amanda viii, 163
Hobbs, Amanda 40, 58
Holdouts Find Cheapest Super Bowl Tickets Late in the Game 194, 252
Holmes, Nigel 93, 285
home owners
  Falling Number of Young Homeowners 33–34
  Housing and Home Ownership in the UK 33
Horse in Motion 243
How Americans Die 230, 237
How the ‘Avengers’ Line-up Has Changed Over the Years 186, 200
How Big Will the UK Population be in 25 Years’ Time? 234
How the Insane Amount of Rain in Texas Could Turn Rhode Island Into a Lake 156
How Long Will We Live – And How Well? 183, 268
How Nations Fare in PhDs by Sex 163, 268
How Old Are You? 233
How Well Do You Know Your Area? 232
How Y’all, Youse and You Guys Talk 80, 202
Hubley, Jill xiii, 284
Hunt-Davis, Ben 30
Hurt, Alyson 70, 125, 127

ideas
  conceiving 145
  keywords 91–92
  limitations 93
  mental visualisation 90–91
  other people’s 94
  research and inspiration 93
  Sketch by Giorgia Lupi 92
  sketching ideas 92, 145
  sources of imagery 93
If Vienna Would be an Apartment 45
Image From the Home Page of visualisingdata.com 156
Images from Wikipedia Commons 308
Impact of colour blindness 286
inductive reasoning 125
inference techniques 105
info-posters 47
Infographic History of the World 254
infographics 46, 47–48
information design 48
information visualisation 47
Ingold, David, Keith Collins and Jeff Green vii, 101
Ingraham, Christopher (Washington Post) viii, 156
innovative design 30
Inside the Powerful Lobby Fighting for Your Right to Eat Pizza 220
Interactive Fixture Molecules 187
interactivity 21, 42, 51, 137, 223–246
  advantages 223
  data adjustments
    animating 228–229, 241, 242, 243
    contributing data 232–233
    framing 225–226, 242
    navigating 226–228
    sequencing 230–231
  event, control and function 224
  influencing factors
    angle 241
    chart-type choice 243
    data examination 241
    ease of usability 244
    feature creep 244
    format 239–240
    fun 245
    purpose map 241
    setting 239
    skills and resources 238–239
    timescales 239
    trustworthiness 243
    visual accessibility 244
  presentation adjustments
    annotating 235–236
    focusing 234–235
    orientating 236–238
  usefulness 223, 244
interests of audiences 40
interpreting 22–23, 24–26, 74, 79, 126
  factors 25
  and previous knowledge 25–26, 27
Iraq’s Bloody Toll 34, 35, 298

Jaws (film) 74
Jenkins, Nicholas and Scott Murray xii, 249
Jones, Ben x, 199
Jordan, Chris 87

Kahneman, Daniel 90–91, 317
Kane, Wayne 25, 26
Kasich Could Be The GOP’s Moderate Backstop 298
Katz, Josh (New York Times) vii, 80
Keegan, Jon (Wall Street Journal) ix, 186
Killing the Colorado: Explore the Robot River 238
Kindred Britain 249
Kirk, Andy 289, 301, 318–324
Kirk, Andy and Andy Witherley xiv, 318–324
Klein, Matthew C. and Bloomberg Visual Data xi, 230
knowns and unknowns 122–124, 123
Kosara, Robert 263

labels 47, 252–254
  axis labels 252, 253
  axis titles 252
  categorical labels 253–254
  value labels 253–254
Lambert Azimuthal Equal-area 308
Lambrechts, Maarten (Mediafin) ix, 181
launching 144, 148–149
layouts 37, 71
legends 51
levels of data see data, types
Life Cycle of Ideas 152
line charts 140, 157
  aspect ratios 37, 306–307
  truncated axes 305
line marks 153
linking data 234, 235
Lionel Messi: Games and Goals for FC Barcelona 23, 24, 25, 27, 28, 155, 156
listening 58
Literacy Proficiency 177
London is Rubbish at Recycling and Many Boroughs are Getting Worse 209
long lasting design 31
Losing Ground 89, 90, 239
Lunge Feeding 281
Lupi, Giorgia viii, 46, 92, 295
Lustgarten, Abrahm, Al Shaw, Jeff Larson, Amanda Zamora, Lauren Kirchner and John Grimwade xii, 238

McCandless, David 29
McCandless, David, Miriam Quick and Philippa Thomas xii, 251
McCandless, David and Tom Evans xi, 233
McKinlay, J.D. x, 213
Mackinlay, Jock 212
McLean, Kate 113, 147
managing progress 56
maps/mapping 37, 51, 157, 216
  projections 307, 308
  thematic 307
  zooms 226
  see also under chart types
markers overlays 219
market influences 71
Marshall, Bob, The Lens, Brian Jacobs and Al Shaw (ProPublica) vii, 89
Martin, Andrew and Bloomberg Visual Data xi, 220
Martino, Mauro, Clio Andris, David Lee, Marcus J. Hamilton, Christian E. Gunning and John Armistead Selde ix, 188
meaning 74
Meirelles, Isabel 38
memorability 31
Mercator projection 307, 308
Messi, Lionel 23, 24, 25, 27, 28
Mider, Zach, Christopher Cannon, and Adam Pearce (Bloomberg Visual Data) x, 203
Minard, Charles Joseph 8
Mizzou’s Racial Gap Is Typical On College Campuses 77–78, 253
mock-ups see prototypes
Model Projections of Maximum Air Temperatures Near the Ocean and Land Surface on the June Solstice in 2014 and 2099 231
Mollweide projection 308
Morton, Jill 278
mouse/trackpad events 224
Movies Starring Michael Caine 173
multiple assets 88
multivariate analysis 127
Munsell, Albert 265
Munzner, Tamara 29
Murray, Scott 69, 330
Muybridge, Eadweard xii, 243
MyCuppa Mug 269

narrative visualisation 78
National Public Radio (NPR) 135, 136
Native and New Berliners – How the S-Bahn Ring Divides the City 201
needs of audiences 38, 39, 40, 66, 70, 133
Nelson, John 114, 126
network diagrams 159, 160
NFL Players: Height and Weight Over Time 229
Nightingale, Florence 8
Nobel Laureates 234
Nobels no Degrees 298
non-linear logarithmic scales 302
note-taking 57–58
Nutrient Contents 185
NYC Street Trees by Species 284, 285
NYPD, Council Spar Over More Officers 277

Obama, Barack 114
Obama’s Health Law: Who Was Helped Most 271–272
Obesity Around the World 226, 228
objectives of this book 9–11
objectivity 32
OECD Better Life Index 89–90, 116, 117, 215, 233
Olson, Randy xiii, 272
On Broadway 299
On NSA, 30 Percent Either Want No Limits on Surveillance or Say ‘Shut It Down’ 178
ONS Digital Content team vii, 33
‘open data’ 108
organisation and contents of this book 11–15
Ortiz, Santiago 90
outliers 51, 117

Peek, Katie 58, 258
pen and paper 57
Per Capita US Cheese Consumption 20
perceiving 22, 23–24, 26, 74, 79
Percentage Change in Price for Select Food Items, Since 1990 196
perfection, pursuit of 53–54
personal tastes 41
pie charts 37, 157, 158, 175, 215
plagiarism 93
planning 56
Playfair, William 8
plots 51
Plow 242
point marks 153
Political Polarization in the American Public 170
Pong, Jane 73, 93
Pong, Jane (South China Morning Post) xiv, 300
Posavec, Stefanie 43, 93, 145, 278
PowerPoint 239
pragmatic approach 7–8, 54
precision 216
prerequisites for data visualisation 4–5
presentation of work in progress 70
Presidential Gantt Chart 199
pressures 70–71
print 73
production cycle 56, 144–149
  conceiving ideas 144, 145
  launching 144, 148–149
  prototypes 144, 146
  refining and completion 144, 148
  testing 144, 146–148
  wireframing and storyboarding 144, 145–146
project: definition 49
Proportion of Total Browser Usage for Internet Explorer and Chrome 176
ProPublica 239, 240
prototypes 144, 146
Psychotherapy in the Arctic 289, 301
publicising 149
purpose 74–75
  and curiosity 75
  purpose map 76–90
    experience 76–82
    in practice 86–90
    tone 82–86, 87, 88, 116, 212
Pursuit of Faster 318–326

Qiu, Yue and Michael Grabell (ProPublica) xi, 235
quantitative attributes 153–154
quantity 153

Racial Dot Map 205, 227
radial charts 304
Rain Patterns 300
Rams, Dieter 29, 33, 37, 41, 42
range charts 170
Rapp, Bill 72
Razor Sales Move Online, Away From Gillette 220
reading guides 251–252
reasoning 125
Record-high 60 Percent of Americans Support Same-sex Marriage 306
redundant encoding 301
Rees, Kim 112, 113
reference lines 219
refining 148
Reichenstein, Oliver 44, 55
relational attributes 154
relational charts 158
Relative Value of the Daily Performance of Stocks Across the 500 Index 179
relevance of content 41
research and inspiration 93, 126–127
resources 73–74
responsive design 239
Rise of Partisanship and Super-cooperators in the US House of Representatives 188
Rosling, Hans 82
Roston, Eric and Blacki Migliozzi (Bloomberg Visual Data) x, 218
Rumsfeld, Donald 122

Saint-Exupéry, Antoine de 148
sampling 115
Sander, Nikola, Guy J. Abel and Ramon Baur ix, 189
Satyanarayan, Arvind 210
scales 51
scales of measurement see data, types
Scarr, S., C. Chan, and F. Foo (Reuters Graphics) viii, 91
Scarr, Simon vii, 34, 134, 284
Schneiderman, Ben 88
Scientific American 85
scientific visualisation 48
scope of data visualisation 3–4
Seeing Data Visualisation project 39, 315
series of values 50
settings 42, 72
Shankley, Bill 280
Shaw, Al, Annie Waldman, and Paul Kiel (ProPublica) xii, 249
shelf life 149
Silva, Rodrigo, Antonio Alonso, Mariano Zafra, Yolanda Clemente and Thomas Ondarra (El Pais) xiii, 280
Simmon, Robert xiii, 270
simplicity 39, 40
simplification 39, 41
size 153, 304–305
sketching 92, 145, 295
skeuomorphism 31
skills 73
Slobin, Sarah vii, 20, 30, 43, 132
small multiples technique 296
Smith, Alan 109, 332
Snow, John 8
Social Progress Imperative 228
sources of data 322
spark bars 161
spatial charts 158, 304
speaking 58
Spielberg, Steven 74
Spotlight on Profitability 81–82, 117
stakeholders 1, 107, 147
  as audience 66
Stalemate 297
State of the Polar Bear 275
statistics 98, 105–106
  in data examination 112–113
  and data exploration 127
  inference techniques 105
  multivariate analysis 127
  univariate and multivariate techniques 105
Stefaner, Moritz 117, 132
Stefaner, Moritz, Dominikus Baur and Raureif GmbH vii, 89
Stevens, Joshua xi, 231
Stevens, Stanley 100
storyboarding 144, 145–146, 146, 295
storytelling 1, 51, 78, 159–160
style 44–46
style guidelines 71
subject neutrality of data visualisation 4
subject-matter appeal 39
subject-matter knowledge 25–26, 27, 39
subjectivity 32, 36
Summary of Eligible Voters in the UK General Election 2015 175
support for the work 149
Swing of Beauty 231
Sydney Olympics 2000 30
System 1 and System 2 thinking 90–91, 317
Szücs, Krisztina vii, 81

tables 20–21
tabulation 50
teamwork 330–331
technical skills needed 5–6
technological tools 6, 73
temporal charts 158
Ten Actors with the Most Oscar Nominations but No Wins (2015) 161, 162
testing 144, 146–148
‘Texas Department of Criminal Justice’ Website 86–87
  meaning and completeness of data 114, 115
text visualisation 159
theoretical context 7
This Chart Shows How Much More Ivy League Grads Make Than You 171
Thorp, Jer 115
three D data representation 37, 217
time slider control 137
time to consume a visualisation 40
timescales 56, 70
tone 76, 259
Total Sightings of Winglets and Spungles 26, 27
tower graphics 47
Tracing the History of N.C.A.A. Conferences 198
treemaps 45, 88–89, 179, 215
Tribou, Alex and Adam Pearce (Bloomberg Visual Data) ix, 166
Tribou, Alex, David Ingold and Jeremy Diamond (Bloomberg Visual Data) xii, 251
Trillions of Trees 204
Tröger, Julius, André Pätzold, David Wendler and Moritz Klack x, 201
trustworthiness 30, 32–37, 98, 113
  annotation 259
  and colour 285–286
  in composition 304–307
  and deception 216–218, 217
  of interactive designs 243
  key matters 35–37
  testing 147
truth 32, 33, 131
Tufte, Edward 29, 85, 244–245
Tukey, John 124
Tulp, Jan Willem x, 204
Tulp, Jan Willem 73, 121
Twitter NYC: A Multilingual Social City 238, 255
typefaces 256–257

UK Election Results by Political Party, 2010 vs 2015 190
UK Office for National Statistics 33, 34
Ulmanu, Monica, Laura Noonan and Vincent Flasseur (Reuters Graphics) xi, 236
UN Global Pulse Survey 270
understanding 19, 21
  and complexity 42
  facilitating 21, 28, 38
  and needs of audiences 38
  of new symbols/technology 316–317
  process 74
  and purpose 74
  stages of 22–28
unfamiliar representation 40
univariate and multivariate techniques 105
updating 149
US Gun Deaths 225, 255
US Presidents by Ethnicity (1789 to 2015) 114
user guides 249

Vallandingham, Jim x, 192
value sorting
  LATCH 298–301
    location 298–299
    alphabetical 299, 300
    time-based 300
    categorical 300–301
    hierarchical 301
values 111, 112–113
  frequency counts 112
  frequency distribution 112
  range 117
  spread/dispersion 113
Van de Ruit, Graham: Millions, Billions, Trillions 302–303
variables 50, 111, 112–113
Veltman, Noah xi, 229
Viégas, Fernanda and Martin Wattenberg xiii, 289
viewers
  definition 49
  effort needed by 317
  learning new symbols/technology 316–317
  number of 75
  see also audiences
vision: definition 76
visual analytics 48
visual appeal 219–220
visual encoding 151–157
  attributes 152, 153–154
  form 156, 157
  marks 21, 151–152, 153, 155, 156
  see also chart types
visual mood 88
visualisation
  experience 76–82
    exhibitory 76, 77, 81–82, 88
    explanatory 76, 77–79, 88
    exploratory 76, 77, 79–80, 88
    narrative 78
  harnessing ideas 90–92
  tone 82–86, 87–88
    feeling tone 83–86
    reading tone 82–83
visualisation literacy 315–334
  first thoughts
    initial scan 319
    relevance 319
    setting 319
    visual appeal 319
  key features
    data 322
    format 320
    interactive functions 322–323
    introductory text 321–322
    shape and size 320
    source 321
    studying the project 320
    subject matter 320
  tasks
    comprehending 326
    interpreting 325
    observations 325
    perceiving 324–325
  understanding symbols 315–316
  viewer–visualiser harmony 326
    appreciation of context 326
    false consciousness 326–327
    curiosities 327
    overview 326
visualisers
  capabilities 327–329
    communicator 328
    data analyst 328
    designer 329
    journalist 328–329
    project manager 327
    scientist 328
    technologist 329
  definition 49
  developing capabilities 329–330
    evaluating
      outcome 331
      performance 331–332
    teamwork 330–331
Vitruvius Pollio, Marcus 46
voiceover 255–256
Voronoi treemap 158

Washington Post 231
waterfall charts 158
Wayne Kane: Games and Points for Toronto Rangers 25–26
Wealth Inequality in America 78, 256
web design 239
Weber, Matthew (Reuters Graphics) xi, 234
Weldon Cooper Center for Public Service, Rector and Visitors of the University of Virginia (Dustin A. Cable, creator) x, 205
What Good Marathons and Bad Investments Have in Common 124
What’s Really Warming the World? 218
Where You Can Both Smoke Weed and Get a Same-sex Marriage 180
Which Fossil Fuel Companies are Most Responsible for Climate Change? 182
White, Alex 282
Who Wins the Stanley Cup Playoff Beards 165
Why Is Her Paycheck Smaller 219
Wind Map 289
Winkel-Tripel projection 308
Wireframe Sketch 295
wireframing 144, 145–146, 294–295
Wolfers, Justin viii, 124
Wooton, Sir Henry 46
workflow
  process
    adaptability 55
    experimentation 54–55
    four stages 54
    importance of process 53–54
    mindset activities: thinking, doing, making 56
    ongoing tasks 55, 88
    pragmatism 54
    purpose 54
  process in practice
    attention to detail 58–59
    communication 58
    heuristics 57
honesty with yourself 59making it work 59management 56note-taking 57–58pen and paper 57reflective learning 59research 58thinking 56–57

workload 72Worst Board Games Ever Invented 303Wurman, Richard Saul 298

Yau, Nathan ix, 184
Years Since First Movie table 152
YouTube user ‘Politizane’ vii, 78

zero quantity 305
zooms 226


Table of Contents


  • Half Title
  • Publisher Note
  • Title Page
  • Copyright Page
  • Contents
  • Illustration List
  • Acknowledgements
  • About the Author
  • Introduction
  • Part A Foundations
  • 1 Defining Data Visualisation
  • 2 Visualisation Workflow
  • Part B The Hidden Thinking
  • 3 Formulating Your Brief
  • 4 Working With Data
  • 5 Establishing Your Editorial Thinking
  • Part C Developing Your Design Solution
  • 6 Data Representation
  • 7 Interactivity
  • 8 Annotation
  • 9 Colour
  • 10 Composition
  • Part D Developing Your Capabilities
  • 11 Visualisation Literacy
  • References
  • Index