Sunday 26 October 2014

How we read maps and dataviz - new research needed?

There's a fairly long academic tradition of looking at how humans interact with maps but, in my view, there is a need to revisit such research in relation to the new wave of digital mapping and dataviz currently available online. Some of it is fantastic and some less so, but this isn't about being critical of the bad stuff. Instead, I'm hoping others will share what they've been doing or what they've seen (via @undertheraedar) to try to understand the effect of new dataviz/mapping on how we perceive/read maps - and what impact this might have on cognition/understanding of underlying issues. 

Early last year I had some discussions about this with a very helpful colleague in psychology at Sheffield - Megan Freeth - and I gave her one of my blog images to test with her eye tracking technology. The results are shown below, in sequence (click to enlarge). I've also put them together in a slide show if you want to download them all at once.

The original 3D image

Scan path from first 10 seconds of map viewing

Scan path for one minute of map viewing

Heat map showing areas focused on most

'Region of interest' analysis

I am probably just not up to date with the kind of research being done in this area. I do know of people across the world who have done work in these fields - e.g. Alan M. MacEachren and others at the GeoVISTA Center at Penn State, and this study from Brodersen et al at Risø National Laboratory in Denmark - but I'm not aware of what's been done in the last 4 or 5 years in particular to help us understand the effects of new approaches to mapping and visualisation on cognition and perception.

Are we understanding more because of the new wave of mapping and dataviz? Are we understanding less? Are we just enjoying how things look and being wowed by the technology more than we are critically engaging with the underlying content? Has the method become the message?

I'm as guilty as anyone of posting maps and images on Twitter and this blog without necessarily thinking too much, though my aim is always to inform and engage. But as the protagonist in David Lodge's Changing Places says, "Every decoding is another encoding", and my visual 'decodings' of spatial data will always be 'encoded' by the viewer in ways I might not have expected - or even wanted. It's always interesting to see how people interpret things and whether this aligns with what we'd hoped. This perception issue might also come up tomorrow when one of my maps appears in the new HS2 report in the UK - we'll see.

Anyway, thoughts and insights welcome via @undertheraedar.

Thursday 16 October 2014

The Urban Fabric of English Cities

[now updated, thanks to @udlondon - scroll to bottom of page]
Inspired by some mapping in the US by Seth Kadish, the availability of new GIS open data, and the fact that I love looking at patterns of urban form, structure and density, I have created a comparative graphic showing the building footprints of nine English cities, with London at the centre (just because it's the biggest). I have done this in a very simple way, with all cities mapped at a scale of 1:125,000 in the full size versions (which are massive), plus one small scale bar and a little explanatory text. Here's what it looks like:

The urban fabric of English cities (black/red, medium res)

This graphic does a good job - in my view - of demonstrating the compactness or otherwise of the cities in question. It also illustrates how tightly-bounded some places are and how under-bounded others are. For example, Liverpool is very dense and compact in contrast to Leeds but this really is a boundary effect because the size of the local authorities differs so much. The urban area of 'Liverpool' extends far beyond the boundaries of the local authority area, which is what I show above. I wanted to compare the local authority areas rather than the wider city-region because I wanted to highlight this boundedness issue and compare like with like in terms of formal administrative areas. London is obviously a bit different so I've shown the 33 constituent parts of Greater London.

Take a closer look at the graphic by clicking on the two larger images below - one in white and one in black. They are both just a bit bigger than A0 paper size in their full size versions in the zipped folder below so if you want to take a really close look, download them. I've also uploaded smaller-sized versions in the same folder. I deliberately didn't include more information on the graphic itself, but at the bottom of the post you'll see the population of each city in 2011 (which relates to the individual city images), plus its urban area and metropolitan area population. The population of Greater London in 2011 was 8.2 million (compared to 4.4 million for the other cities shown). The cities I selected are the English members of the Core Cities group, which now also includes Glasgow and Cardiff.

Click here for a full screen white version

Click here for a full screen black version

Download a zipped folder with black and white versions in different sizes.

Update: the @udlondon people got in touch via Twitter to show their attempt at fitting the core cities inside the London boundary - as below - so this inspired me to try the same with the original data. The first image below is the original @udlondon artwork and the next one is my attempt using GIS. Finally, as a reminder that nothing is ever really new, I have added a similar map from the JR James urban image archive, which we launched last year. This version has 13 different cities.

A manual approach to GIS!

My attempt at the same thing, using QGIS - full size

Some of the boundaries were a bit different in those days

Urban area
Metropolitan area

Totals: the population of the 8 city local authority areas is 4.4 million, for their urban areas it is 9.8 million and for their metropolitan areas it is 16.5 million. I may compare metropolitan areas next time, but mapping this is a little more time consuming.

Saturday 11 October 2014

Flow mapping with QGIS

[Now updated with sample data file - see Step 1.]
I've written quite a bit about flow mapping with GIS in the past, including on this blog, and in a couple of academic papers. Previously, I'd used ArcView 3.2, ArcGIS 9 or 10 and MapInfo. MapInfo in particular has been my 'go to' GIS for mapping large flow matrices, thanks to a very short line of MapBasic code explained to me by Ed Ferrari. Others, such as James Cheshire, have used R to great effect, but this post is instead about flow mapping with QGIS, which I am extremely impressed with for its flow map capabilities. I've posted many of my QGIS flow maps on my Twitter account but in this post I want to explain a little bit about the method so others can experiment with their own data. Here's an example of a flow map created in QGIS - though in this case it's not a very satisfying result because of population distribution, county shape and so on*.

US county to county commuting

So, to the method. If you want to create these kinds of maps in QGIS, it's mostly about data preparation. I should also add that I currently use QGIS version 2.4 but I believe the method is the same in any version. Here are the ingredients you need.

1. A file with some kind of flow data, such as commuting, migration, flight paths, trade flows or similar. There should be columns with an origin x coordinate, origin y coordinate, destination x coordinate, destination y coordinate, some other number (such as total commuters) and any other attributes your dataset has (such as area codes and names). Here's an example csv file of global airline flows, if you want to experiment - it's the one from the screenshots below. I put it together using data from OpenFlights - by combining the airports.dat and routes.dat files. 

2. Once you have a file with the above ingredients, you then need to create a new column containing the line geometry as text. Each cell should have the word 'LINESTRING' in it, followed by a space, an open bracket, then the origin coordinates separated by a space, followed by a comma and a space, then the destination coordinates separated by a space and then a close bracket - as you can see below. In other words, each cell follows the pattern LINESTRING (origin_x origin_y, destination_x destination_y). You don't actually need to call the column 'Geom' as I have below, but when you import the file into QGIS it will ask you which column holds the geometry. You can create the new column in Excel by using the 'concatenate' function. If you're not familiar with it, there are loads of explainers online.

This bit probably takes the most time

3. Once you have your data in this format, you need to save it as a CSV so it's ready to import into QGIS. From within QGIS, you simply click on the 'Add Delimited Text Layer' button (the one that looks like a comma) and then make sure your settings look like the example below.

Make sure you click the right import button
Import CSV dialogue in QGIS - should be on WKT

4. Once you've done this, you simply click OK and wait a few seconds for QGIS to ask which CRS (coordinate reference system) you want to use. Select your preferred option here, wait a few more seconds, and QGIS will display the results of the import. You can then right click on the new layer and save it as a shapefile, or in your other preferred format. In the screenshot example above, the file with c. 60,000 airline flows took only about 10 seconds to appear on my fairly average PC running 64 bit Windows 7. I also tried it with 2.4 million lines and it only took about a minute. In my experience, ArcGIS normally doesn't work with that many flows, and MapInfo will handle them okay but takes longer. QGIS will also render the result more nicely, as it handles transparency in a more sophisticated way - and with hundreds of thousands of flows you usually have to set the layer transparency to 90% or higher.
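
If you'd rather script the data preparation in step 2 than do it in Excel, the geometry column can be built with a few lines of Python. This is only a minimal sketch and not the Excel method described above: the column names origin_x, origin_y, dest_x and dest_y, the function names, and the two sample rows (rough coordinates, made-up flow totals) are all assumptions for illustration, so swap in whatever your own file uses.

```python
import csv
import io


def make_linestring(ox, oy, dx, dy):
    """Return WKT for a straight line from origin (ox, oy) to destination (dx, dy)."""
    return "LINESTRING ({} {}, {} {})".format(ox, oy, dx, dy)


def add_geom_column(rows, geom_name="Geom"):
    """Return copies of the row dicts with an extra WKT column appended."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[geom_name] = make_linestring(
            row["origin_x"], row["origin_y"], row["dest_x"], row["dest_y"]
        )
        out.append(new_row)
    return out


def write_flows_csv(rows, handle, geom_name="Geom"):
    """Write the rows, plus the WKT column, as CSV to an open text handle."""
    rows = add_geom_column(rows, geom_name)
    if not rows:
        return  # nothing to write
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample flows: rough longitude/latitude pairs and made-up totals.
sample = [
    {"origin_x": -0.45, "origin_y": 51.47, "dest_x": -73.78, "dest_y": 40.64, "flow": 10},
    {"origin_x": 2.55, "origin_y": 49.01, "dest_x": 139.78, "dest_y": 35.55, "flow": 5},
]

buffer = io.StringIO()  # swap for open("flows.csv", "w", newline="") to write a file
write_flows_csv(sample, buffer)
print(buffer.getvalue())
```

Because each LINESTRING value contains a comma, the csv module wraps it in double quotes when writing. The QGIS import dialogue should still read it as a single field, but it's worth checking the preview before you click OK.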

The results, once you've done a bit of symbolisation and layer ordering, will look like some of the examples below.

Rail flows

All commuter flows

Bus flows - no labels, obviously

* I'm still trying to make sense of the US county to county flow map. The spatial structure of the counties and the distribution of the population make it more difficult to filter, so the map above is just a very rough (and not very satisfying) example.

Addendum: since a few people have asked, I've done a new post on how to make the lines appear to glow