Thursday 1 October 2015

Are map legends too lazy?

A somewhat click-baity blog title, but I wanted to crowdsource some knowledge from proper carto/viz people, so if you have any insights on what I write, please feel free to get in touch via twitter or e-mail. No doubt what I write about below already has a name but I don't know what that is and I haven't seen this functionality in proprietary or open source GIS. By asking 'are map legends too lazy', what I really mean is are GIS-made choropleth map legends doing enough for us in their current form - and is there an opportunity for us to add some new functionality which enhances the communicative power of the humble choropleth legend? An example... look at the map below, which I created in QGIS. It's a map of a new deprivation* dataset for England, focused on the local authority of Birmingham.

Deprivation choropleth, with legend and inset map

This dataset is typically understood and discussed in terms of deciles, hence the classification used above. The dataset goes from decile 1 (most deprived) to decile 10 (least deprived) - within the context of England as a whole. Cities like Birmingham tend to have a higher proportion of their small areas in the most deprived decile, and in map form this results in lots of red and not much blue, as you can see above. If you wanted to find out how many areas were in decile 1 (most deprived) you would know that it was 'a lot' but because the inner-urban areas tend to be smaller in size (relative to the blue ones), making an accurate assessment visually is quite difficult. In fact, owing to the different sizes of the spatial units, you could quite easily take the wrong message away from a choropleth like this.

My solution? Make the legend do more work. Make it tell us not just what the colours represent but also what proportion of areas are in each category by scaling the colour patches relative to the proportion of areas in each choropleth class - in the form of a bar chart - what I call a 'bargend' (jump in at this point if you already have a name for this). You could, without much effort, add in a table or a separate chart, but I want the legend to actually be the bar chart. In part, I was inspired to attempt this in QGIS because of Andy Tice's prototype scatterplot layout and his comment that he'd like to get it working in the QGIS Atlas tool. Here are some results, followed by further thoughts.

This time, I've added in a 'bargend'

A closer look at the bargend for Birmingham

When I do a visual comparison of the Birmingham map, I'm surprised that the least deprived (i.e. richest) areas only account for 1.7% of the total, because I'm drawn to the blue of the choropleth. This could be solved though a cartogram approach, but I wanted to preserve geographical accuracy here. I'm not surprised that almost 40% of areas are in the poorest decile - that's what I'd expect from what I know about deprivation in English inner-cities. Let's look at another example below.

The London Borough of Tower Hamlets

This time I've shown one of the poorest parts of London - Tower Hamlets. An interesting aside here is the emergence of one area in decile 9 (i.e. richer area) compared to the pattern from 2010. This is almost certainly linked to gentrification and displacement rather than individuals becoming 'less deprived'. I find the extra information provided in the bargend very useful analytically/cognitively compared to the simple legend we would normally use.

Now let's look at a few more...

Liverpool contains relatively few 'non-deprived' areas

Like Liverpool, Manchester has many poor areas

Middlesbrough has the highest % in the most deprived decile

One of the benefits of this approach, in my view, is when you compare different places - you can click on an image above and then go forward and backward to make comparisons. The added value of the bargend approach means that you have precise details of the proportion of areas in each decile and you can make more meaningful comparisons. You could just do this with a table or chart and dispense with the map altogether, but then you'd lose the very important ability to identify where precisely individual areas are and where spatial concentrations of deprivation (and affluence) exist. Talking of affluence, it's only fair that I show you some maps of places that are at the opposite end of the scale. Two prime examples...

A beautiful part of the world, but very blue

Hart, you almost broke my chart (highest % in decile 10)

I'll wrap up with a few points.

1. I'd love it if someone could find a way to add in this functionality natively in QGIS. I had to do a bit of thinking and tinkering to automate this in the Atlas tool, but I now have it working well and everything dynamically updates and re-positions itself once you set it up.

2. I wouldn't always want to use a bargend, but I think it's something that adds value without taking up much more space (if any) in map layouts.

3. I'm trying to think of any drawbacks of this approach, but I can't. I'm happy for others to chip in with ideas on this.

4. I think 'bargend' is a terrible word. Please tell me it already has a nice sounding name, or invent one for me. [update: in my rush to coin a phrase, and because I was mapping deciles as categories - as in a bar chart - I was thinking about bar charts rather than histograms. This is really a histogram but it uses named categories (deciles) which in theory could be re-ordered and the chart would still make sense, so perhaps the bargend retains qualities of both and, anyway, a histogram still uses bars]

5. Are map legends too lazy? Not really, but they can sometimes work harder.

Andrew Wheeler very kindly got in touch to share a few relevant papers on the subject. The Kumar paper is very close to what I propose (though he does the chart for the entire dataset rather than a subset) and he calls it a 'Frequency Histogram Legend' - more accurate perhaps, but less catchy. The Dykes et al. paper is very interesting and I like the treemap approach.

Hannes (@cartocalypse) also got in touch to say he likes the idea and he's suggested 'legumns', which is also useful (but more difficult to pronounce!).

I'll add more on the topic if people respond.

* Just in case the use of this word sounds odd to you, we use the word 'deprivation' in the British context in studies of urban poverty/disadvantage but it's not exactly the same thing. I've written about this in previous academic papers but to all intents and purposes more deprived means 'poorer' and less deprived means 'richer'. In the maps above, you could say red: poor and blue: rich and you wouldn't be wrong (ecological fallacies notwithstanding).