Information Visualization Reading List

Below is the most recent iteration of the reading list for my spring course in Information Visualization at Columbia Journalism School. Because I teach in a journalism school but all of my courses involve programming, I start every semester with a discussion about the role and culture of programming in general, facilitated by the first few pages of Douglas Rushkoff’s book “Program or be Programmed.” Unfortunately, this brief but powerful preface is not part of the digital edition, but I recommend locating a print copy even just for the sake of these pages.

Links provided are to the New York Public Library (which now has purchase options!) where possible; otherwise to the publisher.


Program or Be Programmed: Ten Commands for a Digital Age by Douglas Rushkoff. Soft Skull Press, 2011.
Preface


Thinking With Type by Ellen Lupton. Princeton Architectural Press, 2010.
Text: Linearity, Birth of the User


Information Visualization: Perception for Design by Colin Ware. Morgan Kaufmann, 2004.
Chapter 1: Foundation for a Science of Information Visualization



“Visualization” by Tamara Munzner, in Fundamentals of Computer Graphics, 3rd ed. A K Peters/CRC Press, 2009.
Chapter 27: Visualization


Visual Explanations by Edward Tufte. Graphics Press, 1997.
Chapter 2: Visual and Statistical Thinking: Displays of Evidence for Making Decisions


Now You See It: Simple Visualization Techniques for Quantitative Analysis by Stephen Few. Analytics Press, 2009.
Chapter 5: Analytical Techniques and Practices


The Wall Street Journal Guide to Information Graphics by Dona Wong. W. W. Norton & Company, 2013.
Chapter 2: Chart Smart, Chapter 3: Ready Reference, Chapter 4: Tricky Situations


Interview with Amanda Cox. Substratum, Issue 6.


The Functional Art: An introduction to information graphics and visualization by Alberto Cairo. New Riders, 2012.
Profile 3: Steve Duenes, Profile 4: Hannah Fairfield, Profile 5: Jan Schwochow


The Design of Everyday Things by Donald Norman. Basic Books, 2002.
Chapter 6: The Design Challenge, Chapter 7: User-Centered Design


Now You See It: Simple Visualization Techniques for Quantitative Analysis by Stephen Few. Analytics Press, 2009.
Chapter 4: Analytical Interaction and Navigation


Thinking With Type by Ellen Lupton. Princeton Architectural Press, 2010.
Letter: Anatomy, Size, Scale, Type Classification, Type Families, Superfamilies; Text: Tracking, Line Spacing, Alignment, Hierarchy; Grid: Grid as Program, Grid as Table, Multicolumn Grid, Modular Grid



Content-Out Layout by Nathan Ford. A List Apart, 2014.


“Five Common but Questionable Principles of Multimedia Learning” by Richard E. Clark & David F. Feldon, in The Cambridge Handbook of Multimedia Learning, 1st Ed. Cambridge University Press, 2005.


“The Split-Attention Principle in Multimedia Learning” by Paul Ayres & John Sweller, in The Cambridge Handbook of Multimedia Learning, 1st Ed. Cambridge University Press, 2005.


“The Worked-Out Examples Principle” by Alexander Renkl, in The Cambridge Handbook of Multimedia Learning, 1st Ed. Cambridge University Press, 2005.

Exploring America: Visualizing Data with Google Fusion Tables

Interactive maps and charts can be a great way to add interest and visual appeal to a primarily textual work of journalism, and in a newsroom there are many times when it’s only after the article has been largely written that we think to create them. But visualizations can also serve as essential reporting tools in and of themselves, allowing journalists to see patterns in otherwise impenetrable data sets – patterns that can provide essential leads to finding interesting stories in the first place.

This habit of pattern-finding and data analysis is already familiar in the area of journalism known as “computer-assisted reporting” (CAR), which, where it exists, is usually the purview of perhaps one or two specialist reporters. Typically, the tools associated with CAR have been both reasonably expensive and time-consuming to learn: database technologies like Microsoft Access; mapping technologies like ArcGIS. Few individuals and not all organizations could realistically afford the software, training and personnel required to do this kind of work.

At the infamously (and in this case, happily) exponential pace of technological evolution, however, there are now free, user-friendly tools whose power and versatility are rapidly surpassing the more “traditional” tools of CAR. A prime example of this is Google Fusion Tables, which combines the essential functions of a database (large-scale data storage, powerful sorting and filtering, the ability to merge tables) with the robustness of Google’s mapping resources (and their many charting tools).

A quick walk-through of how to use the merging and mapping functions is provided through the Fusion Tables Help, of which the below is an edited and annotated version. To follow along, you’ll need to be logged in to a Gmail account.

To start, click on each of the links below (these should come up in new browser tabs):

110th US Congressional District Outlines

2008 1-year American Community Survey (ACS) Data

You’ll notice that these pages basically look like giant spreadsheets, and this is essentially what they are. Looking at the column headers, take a moment to notice that the formats of the Outlines table’s “id” column and the ACS table’s “Two-Digit District” column are very similar.

Looking a little further, you’ll see that in the upper-right there is a count of the number of rows in the table (e.g. “1-100 of 436” for the ACS data); clicking on the “Next>>” link will show you the next 100 rows.

Just below the “Next>>” link, in the gray title bar, you’ll see a square button with a small triangle in it – this is the subtle clue you’re given that there are more columns of data than what you currently see. If you begin clicking this in the ACS data set, you’ll quickly discover the many, many columns that this table contains.

Browsing that many columns of data is tedious, though, and makes the data nearly impossible to analyze. So let’s get it mapped instead so we can really see what’s here.

In the Outlines table, click the “Merge” link in the blue bar. In the box on the left, you see radio buttons for each of the columns in that table, with “id” selected. Above the empty box on the right are the instructions: “Merge with” followed by an input box and a GET button. Ignoring the dropdown, paste the URL of the ACS table (http://www.google.com/fusiontables/DataSource?dsrcid=237928) in the input box and click “GET”.

The right-hand box now contains radio buttons for every column in the ACS table – but what it wants you to do is tell it how to match up the information in the two tables. Remember how “id” and “Two-Digit District” looked pretty similar? Make sure those are the radio buttons selected and then type a name for your about-to-be-merged table into the input box labeled “Save as a new table named”. Finally, click the “Merge Tables” button.
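
In database terms, the merge described here is a join on a shared key column. As a rough sketch of the idea only (this is not how Fusion Tables itself is implemented), the same matching can be written in a few lines of Python. The key values, the "geometry" placeholders and the "example_value" column are all invented for illustration; only the two key column names come from the tables above. The sketch also assumes that outline rows with no matching ACS row are kept, which may or may not be what the tool does.

```python
# Conceptual sketch of a key-based merge (join); not Fusion Tables' actual code.
# Every row value below is an invented placeholder.

outlines = [
    {"id": "AA-01", "geometry": "<polygon A>"},
    {"id": "BB-02", "geometry": "<polygon B>"},
    {"id": "ZZ-99", "geometry": "<polygon C>"},  # has no matching ACS row
]

acs = [
    {"Two-Digit District": "AA-01", "example_value": 1000},
    {"Two-Digit District": "BB-02", "example_value": 2000},
]


def merge_tables(left, right, left_key, right_key):
    """Attach each right-hand row to the left-hand row that shares its key."""
    # Index the right-hand table by its key so each lookup is a dict access.
    lookup = {row[right_key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[left_key], {})
        combined = dict(row)
        # Copy over every matched column except the duplicate key column.
        combined.update({k: v for k, v in match.items() if k != right_key})
        merged.append(combined)
    return merged


merged = merge_tables(outlines, acs, "id", "Two-Digit District")
for row in merged:
    print(row)
```

The point of the sketch is simply that the two columns you select with the radio buttons play the role of `left_key` and `right_key`: rows are paired wherever those two values are equal.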

Very often a red box with the warning “Could not merge tables” will appear for a moment. Ignore this and wait for the page to finish loading.

You’ll now be looking at a new Fusion Table with the name you entered earlier. The columns from the first table have a white background; those from the merged or “joined” table are pale yellow. Because we have connected outline information with data, we can now see that data visualized by selecting “Visualize >> Map” in the blue bar. After several seconds, you’ll see a bright red Google Map of the U.S., with gray outlines marking the 110th Congressional District boundaries.

Click somewhere on the map. After a moment, a balloon pops up with a readout of the first 10 columns of table data for that area. That’s a little useful, but if we were interested in data only about one district, it would have been just as easy to read from the table. Instead, click the “Configure styles” link above the map.

In the “Configure map styles” popup, click on “Fill color” under the “Polygons” header at left. To the right, click “Buckets” and then select the “Divide into 2 buckets” radio button. Next, click on the “2” dropdown and change your number of buckets to “4”. Open the “Column” dropdown below this and you can quickly scroll through all of the data columns available in the ACS table. To start exploring, select one and click the “Save” button at the bottom. After a moment, a yellow “Map style saved” label will appear above the map, and it will be recolored according to your selection. To see how other columns of data map, simply click “Configure styles” again, select a different column of data from the dropdown, and click “Save”.

A few notes:

  • Keep in mind that the default “bucket” ranges (0-25, 25-50, 50-75 and 75-100) may not be ideal for the particular data column you’ve chosen. You may need to adjust these values in order for the map to be meaningful (or even show any color variation at all).
  • Also note that the default “bucket” colors should really be shades of a single color, rather than 4 distinct colors. Any time you are mapping the intensity of a single value, it should be indicated by intensity of a single color, not multiple colors.
  • Why doesn’t your chosen data point show up in the little balloon? You need to adjust its contents by clicking the “Configure info window” link above the map. There you can select with the check boxes the exact column information you want to appear. Select a few on the left, click “Save” and then click on the map again.
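
The first two notes can be made concrete with a small Python sketch. It assumes that a value sitting exactly on a boundary belongs to the higher bucket (the tool’s own boundary rule isn’t stated here), and the data values, the custom edges and the base color are invented for illustration.

```python
from bisect import bisect_right

# Interior edges of the default ranges 0-25, 25-50, 50-75 and 75-100.
DEFAULT_EDGES = [25, 50, 75]


def bucket_index(value, edges):
    """Return the 0-based bucket for a value; a value on an edge goes to the higher bucket."""
    return bisect_right(edges, value)


def single_hue_ramp(n, rgb=(200, 0, 0)):
    """Return n hex colors running from a light tint of rgb up to rgb itself."""
    colors = []
    for i in range(n):
        t = (i + 1) / n  # how far to move from white toward the full color
        r, g, b = (round(255 + (c - 255) * t) for c in rgb)
        colors.append(f"#{r:02x}{g:02x}{b:02x}")
    return colors


# Invented values on a small scale: with the default ranges they all share one bucket,
# so the map would show no color variation.
values = [3.1, 4.8, 7.2, 9.9]
default_buckets = {bucket_index(v, DEFAULT_EDGES) for v in values}
print(default_buckets)

# Hypothetical custom edges chosen to fit the data instead.
custom_edges = [4, 6, 8]
custom_buckets = [bucket_index(v, custom_edges) for v in values]
print(custom_buckets)

# Four shades of a single color, lightest to darkest.
ramp = single_hue_ramp(4)
print(ramp)
```

Pairing each bucket index with the shade at the same position in the ramp gives the “intensity of a single color” effect described above.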

Having played around with the data for a little while, you’ve probably found some interesting data points. I’m always interested in rent burden and housing affordability, so I chose the very last column of data, “Percent of Renter-occupied Units Spending 30 Percent or More of Household Income on Rent and Utilities”. After using the table view and (by clicking on the column header) sorting the data to find its minimum and maximum values, I adjusted my ranges to be 0-30, 30-45, 45-60, and 60-100, colored in shades of red. The result is a map that shows a few interesting things – at first glance, we note that Nebraska’s 3rd Congressional district is the only one in the country where more than 70% of people live in affordable housing, and two of the most rent-burdened districts are California’s 45th and Florida’s 25th – not districts in New York, San Francisco, Los Angeles or other notoriously expensive cities. What’s going on here? This data alone won’t tell us, but it has given us a lead towards what might be an interesting story.
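
If you have the column outside the browser, the same check of minimum, maximum and how many rows land in each range can be sketched in Python. The percentages below are invented stand-ins, not the real ACS figures; only the four ranges come from the paragraph above, and the rule that a boundary value goes to the higher range is an assumption.

```python
from bisect import bisect_right
from collections import Counter

# Invented percentages standing in for the rent-burden column.
rent_burden = [28.5, 41.0, 47.3, 52.8, 58.1, 63.4, 44.9, 49.0]

# The equivalent of sorting the column and reading off both ends.
lowest, highest = min(rent_burden), max(rent_burden)
print(lowest, highest)

# Interior edges for the ranges 0-30, 30-45, 45-60 and 60-100.
edges = [30, 45, 60]
labels = ["0-30", "30-45", "45-60", "60-100"]

# Count how many rows fall in each range to see whether the ranges spread the data out.
counts = Counter(labels[bisect_right(edges, v)] for v in rent_burden)
for label in labels:
    print(label, counts[label])
```

A tally that puts nearly every row into one range is a sign the ranges need another adjustment before the map will show much.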

Once you’ve done the rest of your research and discovered some of the “why” behind your visualization, you’ll want to make sure your readers have your data at their disposal. To add the map to your page/site you’ll need to do 2 things:

1. Share it. In the upper-right corner of the map or table view, you’ll see a “Share” button that brings up a popup. In the bottom half of this window, three radio buttons list the “Visibility options”. To add your map to a webpage, you’ll need to make it at least “Unlisted”, if not “Public”.

2. Embed it. Click “Get embeddable link” above the map and a small scrolling window is revealed above the map with code that you can paste into an HTML page. To embed it in WordPress, as above, you’ll need an iframe plugin like Easy iFrame Loader installed. Using the revealed code, follow the directions to add the map to your post or page.
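
For anyone who hasn’t worked with iframes before, here is a minimal Python sketch of what such an embed amounts to: a single HTML tag wrapped around a link. The URL is a placeholder and the width and height are arbitrary; the markup the tool actually gives you may differ, so treat its code as the authority.

```python
from html import escape


def iframe_snippet(src, width=500, height=300):
    """Build a basic HTML iframe tag around an embeddable link."""
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{int(width)}" height="{int(height)}"></iframe>'
    )


# Placeholder URL: substitute the link revealed by "Get embeddable link".
snippet = iframe_snippet("https://example.com/your-embeddable-link?a=1&b=2")
print(snippet)
```

Note that the ampersand in the link is escaped when it is placed inside the HTML attribute, which is one reason to paste the tool’s generated code rather than retyping it by hand.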

If you want only to email your link to a few people (and not have others be able to view it), you can leave the “Visibility options” on “Private” and share it either through the “Share” popup or by emailing them the link made visible when you click “Get KML network link”.

So that’s a first round on using Google Fusion Tables to generate, explore and share data sets through interactive mapping. There are many other features available here, though, so there will undoubtedly be more Google Fusion Tables fun to come.

Fractured Feverlines

NYT's Budget Projection Graphic

Their page link suggests that the term for the New York Times’ budget projection graphic at left is a “porcupine” – though it’s not clear whether this could apply to visualizations other than feverlines.

Either way, however, this frizzy format is inspired, providing an insightful balance between mapping the actual data of annual budgets while retaining the impression of where analysts thought the trends were going in any given year.

There are certainly other ways that this data could have been presented; it could have been rendered as a bar chart with the range of projections for that particular year overlaid, for example. But the visualization as it stands elegantly solves several problems that the latter approach cannot: it retains the dimension of what year a projection was made as well as the year it was made for – without relying on labels, rollovers, or – worse – some kind of maxed-out color code.

Because the saturation change and departure point not only clearly distinguish actual values from projections but also maintain the continuity of the year in which those projections were made, the viewer can truly inspect the data and assess the projection trends over time. At this point of analysis the use of feverlines does us another favor, because their angles are informational and can be meaningfully compared.

This graphic’s innovative form is a credit to its author, Amanda Cox, and has deservedly garnered her the attention of even the inimitable Edward Tufte, who highlights another of her pieces on his own website, here.

Inaccurate Inmates

Good Magazine inmate abuse graphic

Good Magazine publishes a wide range of infographics produced by different folks (I think they get submissions and/or hire freelancers), many of which are lovely.

Some of them, however, are problematic, such as the example at left (click on the photo to view full size).

Though the subject matter is interesting, there is a fundamental scaling problem: each figure represents an arbitrary total percentage, making any visual comparison of the color segments inherently misleading.

This makes the graphic essentially inaccurate, despite the percentage labels – visual elements especially must be treated just as rigorously as “hard” numbers, because they can often register more powerfully (further explanation of this TK). Note that this graphic also lacks any kind of color key – another big no-no when using color to ostensibly represent numerical values.
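
To make the scaling problem concrete, here is a small Python sketch with invented numbers (they are not taken from the Good graphic). It shows how the same percentage fills a different share of a figure depending on what total that figure’s full height stands for.

```python
def filled_fraction(segment_pct, figure_total_pct):
    """Share of a figure's height taken up by a segment of the given percentage."""
    return segment_pct / figure_total_pct


# Invented example: the same 20% segment drawn in two figures whose
# full heights stand for different totals.
in_figure_a = filled_fraction(20, 40)  # full height stands for 40%
in_figure_b = filled_fraction(20, 80)  # full height stands for 80%
print(in_figure_a, in_figure_b)

# On a common scale, where every full figure stands for 100%,
# equal percentages take up equal space.
on_common_scale = filled_fraction(20, 100)
print(on_common_scale)
```

In the first case an identical value looks twice as large in one figure as in the other, which is exactly the kind of comparison a reader’s eye will make regardless of the labels.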