Saturday 11 October 2014

Flow mapping with QGIS

[Now updated with sample data file - see Step 1.]
I've written quite a bit about flow mapping with GIS in the past, including on this blog and in a couple of academic papers. Previously, I'd used ArcView 3.2, ArcGIS 9 or 10 and MapInfo. MapInfo in particular has been my 'go to' GIS for mapping large flow matrices, thanks to a very short line of MapBasic code explained to me by Ed Ferrari. Others, such as James Cheshire, have used R to great effect, but this post is about flow mapping with QGIS, whose flow map capabilities have really impressed me. I've posted many of my QGIS flow maps on Twitter, but in this post I want to explain a little about the method so others can experiment with their own data. Here's an example of a flow map created in QGIS - though in this case it's not a very satisfying result because of population distribution, county shape and so on*.

US county to county commuting

So, to the method. If you want to create these kinds of maps in QGIS, it's mostly about data preparation. I should also add that I currently use QGIS version 2.4 but I believe the method is the same in any version. Here are the ingredients you need.

1. A file with some kind of flow data, such as commuting, migration, flight paths, trade flows or similar. There should be columns with an origin x coordinate, origin y coordinate, destination x coordinate, destination y coordinate, a number measuring the flow (such as total commuters) and any other attributes your dataset has (such as area codes and names). Here's an example CSV file of global airline flows, if you want to experiment - it's the one from the screenshots below. I put it together using data from OpenFlights, by combining the airports.dat and routes.dat files.
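If you'd rather script this kind of join than do it by hand, here is a minimal Python sketch of one way the two OpenFlights files could be combined. It isn't necessarily how the sample file was built. The column positions are an assumption about the OpenFlights layout that you should check against your own download, and the function name, dictionary keys and sample rows are made up for illustration.

```python
import csv
import io

# Assumed 0-based column positions in the OpenFlights files; verify these
# against your own download before relying on them.
AIRPORT_IATA, AIRPORT_LAT, AIRPORT_LON = 4, 6, 7
ROUTE_SRC, ROUTE_DST = 2, 4


def join_routes_to_airports(airports_file, routes_file):
    """Return one dict per route whose two airports both have coordinates."""
    coords = {}
    for row in csv.reader(airports_file):
        # x is longitude, y is latitude
        coords[row[AIRPORT_IATA]] = (float(row[AIRPORT_LON]), float(row[AIRPORT_LAT]))

    flows = []
    for row in csv.reader(routes_file):
        src, dst = row[ROUTE_SRC], row[ROUTE_DST]
        if src in coords and dst in coords:
            (ox, oy), (dx, dy) = coords[src], coords[dst]
            flows.append({"origin": src, "dest": dst,
                          "ox": ox, "oy": oy, "dx": dx, "dy": dy})
    return flows


# Tiny made-up sample standing in for airports.dat and routes.dat.
sample_airports = io.StringIO(
    '1,"Airport A","City A","Country A","AAA","AAAA",10.0,20.0,0,0,"N","Etc/UTC"\n'
    '2,"Airport B","City B","Country B","BBB","BBBB",30.5,-40.25,0,0,"N","Etc/UTC"\n'
)
sample_routes = io.StringIO(
    "XX,1,AAA,1,BBB,2,,0,320\n"
    "XX,1,AAA,1,ZZZ,9,,0,320\n"  # ZZZ is not in the airports sample, so it is skipped
)

flows = join_routes_to_airports(sample_airports, sample_routes)
print(flows)
```

With real files you would pass in file objects opened with newline="" and a suitable encoding instead of the in-memory samples. Routes whose airport codes can't be matched are simply dropped here, which may or may not be what you want.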

2. Once you have a file with the above ingredients, you then need to create a new column containing the line geometry as text: the word 'LINESTRING', followed by a space, an open bracket, the origin coordinates separated by a space, a comma and a space, the destination coordinates separated by a space, and then a close bracket - as you can see below. For a flow from (20, 10) to (-40.25, 30.5), for instance, the cell would read LINESTRING (20 10, -40.25 30.5), with x before y in each pair. You don't actually need to call the column 'Geom' as I have below, but when you import the file into QGIS it will ask you which column is the 'geom' one. You can create the new column in Excel by using the 'concatenate' function. If you're not familiar with it, there are loads of explainers online.

This bit probably takes the most time
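If Excel isn't your thing, the same column can be built with a few lines of Python. This is only a sketch: the key names (ox, oy, dx, dy), the 'flow' attribute and the sample row are hypothetical, so swap in whatever your own file uses.

```python
import csv
import io


def make_linestring(ox, oy, dx, dy):
    """Build the LINESTRING text for a straight line from origin to destination."""
    return f"LINESTRING ({ox} {oy}, {dx} {dy})"


def add_geom_column(rows, out_file, geom_name="Geom"):
    """Write rows (dicts with ox, oy, dx, dy keys) as CSV with an extra geometry column."""
    rows = list(rows)
    if not rows:
        return  # nothing to write
    fieldnames = list(rows[0].keys()) + [geom_name]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        out = dict(row)
        out[geom_name] = make_linestring(row["ox"], row["oy"], row["dx"], row["dy"])
        writer.writerow(out)


# One made-up flow record, written to an in-memory buffer instead of a real file.
rows = [{"origin": "AAA", "dest": "BBB",
         "ox": 20.0, "oy": 10.0, "dx": -40.25, "dy": 30.5, "flow": 12}]
buffer = io.StringIO()
add_geom_column(rows, buffer)
print(buffer.getvalue())
```

One thing worth noticing is that the LINESTRING text contains a comma, so in a comma-separated file the whole cell has to be wrapped in quotes. Python's csv module does this automatically, as does Excel when saving as CSV.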

3. Once you have your data in this format, you need to save it as a CSV so it's ready to import into QGIS. From within QGIS, you simply click on the 'Add Delimited Text Layer' button (the one that looks like a comma) and then make sure your settings look like the example below.

Make sure you click the right import button
Import CSV dialogue in QGIS - should be on WKT

4. Once you've done this, you simply click OK and wait a few seconds for QGIS to ask which CRS (coordinate reference system) you want to use. Select your preferred option here, wait a few more seconds, and QGIS will display the results of the import. You can then right click on the new layer and save it as a shapefile, or in your other preferred format. In the screenshot example above, the file with around 60,000 airline flows took only about 10 seconds to appear on my fairly average PC running 64 bit Windows 7. I also tried it with 2.4 million lines and it only took about a minute. In my experience, ArcGIS normally doesn't work with that many flows, whereas MapInfo will handle it okay but takes longer. QGIS also renders the result more nicely, as it handles transparency in a more sophisticated way, and with hundreds of thousands of flows you usually have to set the layer transparency to 90% or higher.

The results, once you've done a bit of symbolisation and layer ordering, will look like some of the examples below.

Rail flows


All commuter flows


Bus flows - no labels, obviously

* I'm still trying to make sense of the US county to county flow map. The spatial structure of the counties and the distribution of the population make it more difficult to filter, so the map above is only a very rough (and not very satisfying) example.


Addendum: since a few people have asked, I've done a new post on how to make the lines appear to glow