Tuesday 1 December 2015

Blog retrospective

This blog is now officially archived because I'm moving on to other things (including a new blog, which I launched at the end of 2015). However, I thought it would be good to do one final post here to wrap things up - and to encourage anyone thinking about blogging to give it a go. I've not exactly set the world alight with the stuff on here, but quite a few people have found it useful and it's led to some very interesting work. Someone once even said they were a 'big fan'. Anyway, back in 2007 when I was a Research Associate at the University of Manchester, my colleague and car-sharing partner Alex Hardman encouraged me to start blogging. So I did, and here I am eight years later. It wouldn't be undertheraedar without a map, so here's one showing page views by country (click all images to enlarge).

The United States was going to overtake the UK, so I had to stop.

The blog has led to collaboration with people like Simon Rogers at Google (previously with The Guardian and Twitter), and to lots of media contacts, news stories and idea-sharing with other academics. It's also enabled me to get some of the findings from my academic papers out to a wider audience - which is partly why I began writing it in the first place. But which posts have been the most popular? Well, that's an interesting story and here's a little chart showing the top ten - including a major outlier at number one.

Note to self - write stuff about London on new blog

Twitter obviously helped a lot. Alex Hardman also encouraged me to get on Twitter early, so I signed up almost straight away but just couldn't get it, so I abandoned it and came back to it relatively late, in July 2011. You can kind of see how my blog traffic has developed since then in this line graph, which peaked at over 13,000 page views in October 2015.

Peaks and troughs, but reasonable level of growth over time

Where did my traffic come from? A typically eclectic range of sources but a lot from Google. You can see this in the Referring URLs and Referring Sites charts below. 

Am I big in Wisconsin?

What about the tech battles? The battle of the browsers? You can see from the data below that Chrome wins the browser war, Windows wins the operating system war and that 596 page views came from Unix. 

William Playfair invented the pie chart in 1801

Total page views all time? 403,984 (as of 2 Sept 2016), but of course there are all sorts of views on the accuracy of Blogger stats vs Google Analytics, which normally comes out with lower figures (often around half). But even if it was just 300,000 bots viewing my pages, I know for sure some actual humans looked at my posts because they told me, so that makes it worthwhile.

In a world where an academic paper with 50 citations is quite a big deal, getting a few hundred thousand page views is a nice way to make you feel like you can reach a wider audience. Some people may not like some of the stuff I posted and some of it is - looking back - slightly embarrassing but then that's all part of the learning experience. 

If you've followed this site, or just looked once, thanks for taking an interest. I'll be back with new stuff - and more maps and stats - in the new year (or possibly before). If you're thinking about blogging, give it a bash. It can be hard to keep publishing content but I've found it really worthwhile.

Alasdair Rae
1 December 2015
(updated 2 September 2016)

Monday 9 November 2015

Premier league poverty, 2015

Over the past decade I've spent a lot of time looking at patterns of deprivation across the UK. One thing I've often noticed is the way football grounds regularly appear in the very poorest neighbourhoods. I've blogged on this topic a few times in the past, most notably in 2012 when I looked at the location of English Premier League grounds in relation to the deprivation level of their areas. I also noticed this in relation to the Scottish Index of Multiple Deprivation when looking at the East End of Glasgow in 2009. Given the history of football, its industrial working class origins, the development of British cities, and land values (to name just a few factors), none of this should be a surprise. But since a new English deprivation dataset was released in September, I wanted to revisit the topic and make a few maps, just to see if anything has changed. That's what I've done here - one map for each Premier League ground, showing its location and the deprivation level of both the area it sits in and its wider neighbourhood. Further explanation follows below.

Note the blue area to the north east - now Highbury Square

Most areas in the neighbourhood are in the 20% most deprived

Bournemouth is the one big exception - it's at the opposite end of the scale

Stamford Bridge is in a much more mixed area than most

Selhurst Park sits right beside some more deprived areas

Goodison and Anfield look very similar - only about half a mile apart

The wider area of Leicester's ground is more mixed

This is quite typical of much of north Liverpool

Manchester City play in the most deprived area of any top flight team

Manchester United are situated in a more mixed land use area

Newcastle United's pitch is split between areas - I've based this on majority area

Norwich City's ground is also in a more mixed neighbourhood

St Mary's is situated in one of the city's most deprived zones

Stoke also play in quite a deprived area - though there's more variation nearby

Like many newer stadiums, this ground is in a slightly more mixed area

The Welsh deprivation dataset is used here - but similar story to be told

Post-riots, much has been said of regeneration in this area of London

Watford play in a much less deprived locality

West Brom's ground is in the most deprived decile of England

Another ground split between areas - but still more deprived than not

What does all this tell us?
The most obvious thing to emerge from this simple mapping exercise is that more than half of all Premier League grounds are located in areas among the 20% most deprived in the country, but a good few are not. Two in particular - Bournemouth and Watford - are in much less deprived areas. Nonetheless, if you scroll through the maps quickly, the main colour you'll see is red (for the 20% most deprived). When I see stories in the news about the ability of sport to tackle deprivation, I'm generally all for it, but then sometimes I make a mental comparison between the wage bills of some teams and the neighbourhoods they're located in and I think we've barely scratched the surface of what's possible when we talk of the potential for elite sport to help transform poorer areas. Post-Olympics, this has kind of been forgotten. Having said this, it is good to see that the Premier League and FA's Football Foundation provides money for grass roots development in the most deprived areas as defined by this very same dataset. 

What does it not tell us?
Quite a lot, and I wouldn't want anyone to think that I've done this to pick on any one team. I'm just curious about the relationship between these football grounds and underlying patterns of deprivation because when I look at the data as I map it, I often notice the stadia. It doesn't tell us anything about cause and effect, whether teams are trying to do anything to boost the fortunes of their local areas or what the areas themselves are like to live in. If you want to know more about the underlying data, read this briefing from the Government. Does having a Premier League football team in your area make you poor? Of course not. 

Some of the grounds look the wrong shape - why?
I used building footprint data from the Ordnance Survey in the maps above and the shapes of the grounds are as they were in the original dataset - with the exception of Vicarage Road, which for some reason wasn't enclosed on one side so I made my own version. I've just added a little glow around each ground to make it stand out and then added in the footprints of all other buildings in the wider neighbourhood to help people identify nearby features and roads.

What about when a ground is split between areas?
I could have taken the average deprivation rank here and used that figure but instead I chose to use the deprivation rank of the area that the majority of the playing surface was located in. This was only really an issue for Arsenal, Newcastle and West Ham - and only really notable in Newcastle. 
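The difference between the two options can be shown with a small sketch. The area codes, overlap shares and ranks below are made-up illustrative values, not figures from the maps, and the function names are hypothetical.

```python
def majority_rank(pitch_shares, ranks):
    """Rank of the area containing the largest share of the playing surface."""
    majority_area = max(pitch_shares, key=pitch_shares.get)
    return ranks[majority_area]


def mean_rank(pitch_shares, ranks):
    """Simple average of the ranks of every area the pitch touches."""
    touched = [ranks[area] for area in pitch_shares]
    return sum(touched) / len(touched)


# Hypothetical pitch: 60% in area "A", 40% in area "B" (illustrative only).
pitch_shares = {"A": 60, "B": 40}
ranks = {"A": 500, "B": 9000}

chosen = majority_rank(pitch_shares, ranks)   # uses area "A" only
averaged = mean_rank(pitch_shares, ranks)     # blends both areas
```

The majority approach always reports a rank that belongs to a real area, whereas the average can land on a value that describes neither side of the pitch.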

Explain that 'deprivation percentile' thing again please
In England, there are 32,844 areas known as Lower Super Output Areas. These LSOAs are small areas which the government use to report all kinds of statistics, including Census data. When they publish their Indices of Deprivation, they give each one of the 32,844 areas a rank, from 1 (most deprived in England) to 32,844 (least deprived in England). Therefore, it's a relative measure that allows us to compare one area with another, all across the country. The data are often split into five or ten chunks (quintiles or deciles) for reporting purposes but here I've decided to use 'percentiles' as it's more precise. If an area is in percentile 5, it's among the 5% most deprived in England, and so on. If it's in percentile 95 (like Bournemouth's ground) then we can say it's not very deprived at all and actually highly likely to be a very affluent area. In the case of Swansea City, I've used Welsh deprivation data from 2014. This classifies places in almost exactly the same way, although there are 1,909 areas in Wales rather than 32,844. These areas have an average population of around 1,600.
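As a rough sketch, the step from rank to percentile could look like this. The function name is hypothetical and the handling of boundaries (rounding up, so that rank 1 falls in percentile 1) is an assumption rather than the official method.

```python
ENGLAND_LSOAS = 32_844  # number of LSOAs in England, as stated above
WALES_LSOAS = 1_909     # number of areas in the Welsh dataset, as stated above


def deprivation_percentile(rank, total_areas=ENGLAND_LSOAS):
    """Return a percentile from 1 (most deprived) to 100 (least deprived)."""
    if not 1 <= rank <= total_areas:
        raise ValueError(f"rank must be between 1 and {total_areas}")
    # Integer ceiling division: rank 1 maps to percentile 1,
    # and the last rank maps to percentile 100.
    return (rank * 100 + total_areas - 1) // total_areas
```

Under these assumptions, an area ranked 24 in England would be in percentile 1, and the same function works for Wales by passing `total_areas=WALES_LSOAS`.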

Isn't this all just pointless area classification?
You might think so, but the Government use these Indices to make all sorts of important decisions, in healthcare and education for example. If you're in an area classified as being among the 20% most deprived, for example, you might find that you're eligible for some kind of funding - there are loads of examples of uses, with sport being one of many. You can find quite a few other examples in section 1.4 (p. 8) of this report. We must also remember that not all people living in areas classified as 'deprived' or 'not deprived' match that description - this dataset classifies areas not people.

When are you going to expand this to include my team?
I'm not planning to, but I'm sure it would be even more interesting than the Premier League.

On all the maps, north is up so I couldn't help but notice that Manchester United seem to be the only team playing on an east to west pitch. I'm guessing most grounds don't do this so that they can avoid the setting sun problem - and in fact Old Trafford cricket ground rotated their pitch 90 degrees to avoid this problem in 2010. Shades of blue - representing the 40% least deprived areas - appear on only 7 of the maps, and only two grounds are in such areas. Red (20% most deprived areas) appears on 19 maps - Watford is the only exception. The maps for Everton, Man City, Tottenham and West Brom are entirely red - which indicates that these grounds and surrounding areas (a few hundred metres in each direction) are within wider areas classified as the most deprived in England. The very most deprived areas to appear on any of the maps are ranked 24 (beside Goodison) and 29 (beside Anfield).

Which team do you support?
ICTFC, of course. But not very enthusiastically. 

Wednesday 28 October 2015

Mapping property conditions in Detroit

I've written here a couple of times before about Detroit, in relation to my family history and my interest from a data/urban point of view. Last year I did a short piece on the amazing Motor City Mapping project, a comprehensive effort to digitize the city's property information and provide clear data which can help the city move forward. In the first phase of the project 150 Detroiters surveyed the entire city (about 380,000 land parcels) and this information was used in the Blight Elimination Task Force's report. You can read more about all this here. This post is about mapping the results of their work and property conditions in particular. Properties are categorised as being in 'good', 'fair', or 'poor' condition, or else demolition is suggested. I mapped this for all 54 of the city's neighbourhoods, as in the image below.

One of the 54 maps I created for the City of Detroit

The 'neighbourhoods' I used are the 54 'Master Plan Neighborhoods' provided by Data Driven Detroit, and available here as open data. For each area I've shown the proportion of properties in each category, in addition to the total number of structures for each area. The image above shows the Mt. Olivet neighbourhood to the north of the city, whereas the map below shows Chadsey, to the south of the city. One of the points I wanted to make here is that although Detroit has its problems - well documented - it's not necessarily the burning relic that some portray it to be. Anyone looking for an insight into all of this would do well to listen to fifth generation Detroiter George Galster's Driving Detroit lecture (which he also gave here in Sheffield last year).

Detroit is mostly 'good' - see, it's not so bad!

The other reason that I'm returning to the topic is because I teach using this fantastic dataset. It's fascinating in itself but it's also a great example of how cities can create, and then use, open data to help turn things around - or at least begin to. Half the battle is knowing where to start and the Motor City Mapping project provided a solid base for this.

My family history has always been tied to Detroit and although I've never lived there every time I've visited I liked what I saw and the people were really friendly. I'm probably biased but aren't we all? Last winter I found a few pieces of Detroit history in the family archives, which I'll share here as they are a little slice of the city that no longer exists. First, one of my grandparents' wedding photos, from late 1929, then a description of the event from a Scottish paper - followed by a photo of the Danish Brotherhood Hall mentioned in the piece. 

1920s style!

Yes, they had Comic Sans in 1929...

The 'Danish Brotherhood Hall'

Note that in the image above of the Danish Brotherhood Hall a Danish flag has actually been painted over, and if you look at the building on Google Street View you can see the 'DB' inscription on its upper centre section. Detroit Urbex published a really fascinating history of this building, which is well worth a read. Anyway, it's interesting for me that the story of Detroit, as it were, is also wrapped up a little bit in my family history and can be seen in the images associated with my grandparents' wedding.

Back to the maps - I've shared them all via Google Drive so that anyone can access them and use them if they so wish.

Property condition maps for all 54 neighborhoods

The only other thing to say is that if you download the original dataset you'll see that each land parcel also has a URL with a photo of it - quite an achievement when you realise there are nearly 380,000 of them. 

Sunday 18 October 2015

Glowing lines in QGIS

In one of my previous QGIS posts, on flow mapping, I outlined a method for mapping origin-destination data related to movements, rendered as a collection of straight lines from point a to b. One thing I didn't do in that post was explain how you get the 'glow' effect to make the lines appear brighter at higher densities (example below).

A little glowing flow map example from my US commuting map

Since a few people have asked about it, I thought I'd share it - and thanks to Nyall Dawson and all the other QGIS developers for making this possible. If I begin with a commuting flow dataset I made for England and Wales and just add it to QGIS, here's what I get (click on the individual images to see them full size):

We can see the country outline, that's about it

Next, let's try reducing the default line width from 0.26 to 0.1 and see what happens...

This is a bit clearer, but still not very useful.

We could darken the background (via Project > Project Properties > General) to make the lines stand out more...

This is getting a bit better now, but still not great

Okay, let's now change the colour and introduce some feature transparency and see how this looks:

Definitely an improvement, but not great

Note how this was done, if you don't already know:

So far, so good. But what about the glow effects? That's where feature blending mode comes in - as you can see below:

With a line width of 0.1, transparency of 90% (because I have a couple of million lines here) and a Feature blending mode set to 'Addition' here's what I get:

You may need a different transparency % in your data
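To see why the transparency value matters so much, here is a simplified, single-channel model of what I understand 'Addition' to do: each overlapping line adds its (opacity-scaled) colour to whatever is already there, capped at full brightness. This is an illustrative sketch only. QGIS works on full RGBA pixels and its exact arithmetic may differ, and the function names are hypothetical.

```python
def additive_stack(n_lines, line_value=1.0, transparency=0.9, background=0.0):
    """Brightness (0-1) of one channel after n_lines overlap with additive blending."""
    opacity = 1.0 - transparency
    value = background
    for _ in range(n_lines):
        # Each line adds its contribution; the result is capped at full brightness.
        value = min(1.0, value + line_value * opacity)
    return value


def normal_stack(n_lines, line_value=1.0, transparency=0.9, background=0.0):
    """Same stack using ordinary alpha blending, for comparison."""
    opacity = 1.0 - transparency
    value = background
    for _ in range(n_lines):
        # Each line replaces a fraction of what is underneath.
        value = value * (1.0 - opacity) + line_value * opacity
    return value
```

In this model, at 90% transparency about ten overlapping lines are enough to reach full brightness with additive blending, whereas ordinary blending only creeps towards it. With far fewer lines than my couple of million, a lower transparency would be needed to get the same effect.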

What on earth do all the different blending modes do? There's 'Screen', 'Multiply', 'Dodge' and many more but it's not immediately obvious so here's a little summary from the QGIS 2.8 documentation pages on the subject:

To see the different impact each feature blending mode has, it's best to try them - for example, if you want a less 'glowy' version of the previous example, you could use 'Dodge', as shown below:

Similar to the previous one, but this is 'Dodge'

Of course, you could also decide that you want the lines to be different colours and symbolise them differently based on their length. With this, you take a different approach and it would look something like the image below, where I've used reds:

No feature blending here, just layer symbology and ordering

To achieve the above, you'd have to have a line length field (but that's easy in QGIS) and then colour different lengths slightly differently and then use layer ordering. This too requires a good bit of experimenting to get right (and the ones shown here are far from perfect examples) but here's an example from the layer properties dialogue:

Note: click 'Advanced' to see symbol levels
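Outside QGIS, the line length field and the length classes could be sketched like this. It assumes origin and destination coordinates in a projected system measured in metres, and the class breaks are hypothetical values chosen for illustration, not the ones used in the map above.

```python
import math

# Hypothetical upper bounds for each length class, in km (illustrative only).
LENGTH_BREAKS_KM = [10, 25, 50]


def line_length_km(x1, y1, x2, y2):
    """Straight-line length in km between two points given in metres."""
    return math.hypot(x2 - x1, y2 - y1) / 1000.0


def length_class(length_km, breaks=LENGTH_BREAKS_KM):
    """Return the index of the first class whose upper bound covers the length."""
    for index, upper_bound in enumerate(breaks):
        if length_km <= upper_bound:
            return index
    # Anything longer than the last break goes in a final open-ended class.
    return len(breaks)
```

Each class index could then be given its own shade of red and its own position in the drawing order.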

The only other thing to mention is that when you zoom in you'll see things differently and perhaps need to change the symbology to suit the zoom level. You can see this for the example below where I've zoomed in to London and reduced the transparency to 70%:

Now we can begin to make more sense of the flows

If you want to know how to create the flow lines in the first place, check out my previous post on the subject, where I also provide a sample dataset to work with. Once you've got things looking as you want them, you can then add labels and all sorts of other things to make your map more informative. Note that I used QGIS 2.10 here but this should work from QGIS 2.2 and above.

Thursday 1 October 2015

Are map legends too lazy?

A somewhat click-baity blog title, but I wanted to crowdsource some knowledge from proper carto/viz people, so if you have any insights on what I write, please feel free to get in touch via Twitter or e-mail. No doubt what I write about below already has a name but I don't know what that is and I haven't seen this functionality in proprietary or open source GIS. By asking 'are map legends too lazy', what I really mean is: are GIS-made choropleth map legends doing enough for us in their current form - and is there an opportunity for us to add some new functionality which enhances the communicative power of the humble choropleth legend? An example... look at the map below, which I created in QGIS. It's a map of a new deprivation* dataset for England, focused on the local authority of Birmingham.

Deprivation choropleth, with legend and inset map

This dataset is typically understood and discussed in terms of deciles, hence the classification used above. The dataset goes from decile 1 (most deprived) to decile 10 (least deprived) - within the context of England as a whole. Cities like Birmingham tend to have a higher proportion of their small areas in the most deprived decile, and in map form this results in lots of red and not much blue, as you can see above. If you wanted to find out how many areas were in decile 1 (most deprived) you would know that it was 'a lot' but because the inner-urban areas tend to be smaller in size (relative to the blue ones), making an accurate assessment visually is quite difficult. In fact, owing to the different sizes of the spatial units, you could quite easily take the wrong message away from a choropleth like this.

My solution? Make the legend do more work. Make it tell us not just what the colours represent but also what proportion of areas are in each category by scaling the colour patches relative to the proportion of areas in each choropleth class - in the form of a bar chart - what I call a 'bargend' (jump in at this point if you already have a name for this). You could, without much effort, add in a table or a separate chart, but I want the legend to actually be the bar chart. In part, I was inspired to attempt this in QGIS because of Andy Tice's prototype scatterplot layout and his comment that he'd like to get it working in the QGIS Atlas tool. Here are some results, followed by further thoughts.

This time, I've added in a 'bargend'

A closer look at the bargend for Birmingham

When I do a visual comparison of the Birmingham map, I'm surprised that the least deprived (i.e. richest) areas only account for 1.7% of the total, because I'm drawn to the blue of the choropleth. This could be solved through a cartogram approach, but I wanted to preserve geographical accuracy here. I'm not surprised that almost 40% of areas are in the poorest decile - that's what I'd expect from what I know about deprivation in English inner-cities. Let's look at another example below.

The London Borough of Tower Hamlets

This time I've shown one of the poorest parts of London - Tower Hamlets. An interesting aside here is the emergence of one area in decile 9 (i.e. richer area) compared to the pattern from 2010. This is almost certainly linked to gentrification and displacement rather than individuals becoming 'less deprived'. I find the extra information provided in the bargend very useful analytically/cognitively compared to the simple legend we would normally use.
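The numbers behind a bargend are simple to work out. Here is a minimal sketch, assuming you have one decile value (1 to 10) per small area in the place being mapped. The function names and the 40 mm maximum bar width are hypothetical, and this is not the Atlas set-up I actually used.

```python
from collections import Counter


def decile_shares(deciles):
    """Percentage of areas in each decile 1-10, including empty deciles."""
    if not deciles:
        raise ValueError("need at least one area")
    counts = Counter(deciles)
    total = len(deciles)
    return {d: 100.0 * counts.get(d, 0) / total for d in range(1, 11)}


def bar_widths(shares, max_width_mm=40.0):
    """Scale each share so the biggest class fills the available legend width."""
    largest = max(shares.values())
    return {d: max_width_mm * share / largest for d, share in shares.items()}
```

Each legend patch is then drawn at its computed width and labelled with its percentage, so the legend doubles as the chart.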

Now let's look at a few more...

Liverpool contains relatively few 'non-deprived' areas

Like Liverpool, Manchester has many poor areas

Middlesbrough has the highest % in the most deprived decile

One of the benefits of this approach, in my view, is when you compare different places - you can click on an image above and then go forward and backward to make comparisons. The added value of the bargend approach means that you have precise details of the proportion of areas in each decile and you can make more meaningful comparisons. You could just do this with a table or chart and dispense with the map altogether, but then you'd lose the very important ability to identify where precisely individual areas are and where spatial concentrations of deprivation (and affluence) exist. Talking of affluence, it's only fair that I show you some maps of places that are at the opposite end of the scale. Two prime examples...

A beautiful part of the world, but very blue

Hart, you almost broke my chart (highest % in decile 10)

I'll wrap up with a few points.

1. I'd love it if someone could find a way to add in this functionality natively in QGIS. I had to do a bit of thinking and tinkering to automate this in the Atlas tool, but I now have it working well and everything dynamically updates and re-positions itself once you set it up.

2. I wouldn't always want to use a bargend, but I think it's something that adds value without taking up much more space (if any) in map layouts.

3. I'm trying to think of any drawbacks of this approach, but I can't. I'm happy for others to chip in with ideas on this.

4. I think 'bargend' is a terrible word. Please tell me it already has a nice sounding name, or invent one for me. [update: in my rush to coin a phrase, and because I was mapping deciles as categories - as in a bar chart - I was thinking about bar charts rather than histograms. This is really a histogram but it uses named categories (deciles) which in theory could be re-ordered and the chart would still make sense, so perhaps the bargend retains qualities of both and, anyway, a histogram still uses bars]

5. Are map legends too lazy? Not really, but they can sometimes work harder.

Andrew Wheeler very kindly got in touch to share a few relevant papers on the subject. The Kumar paper is very close to what I propose (though he does the chart for the entire dataset rather than a subset) and he calls it a 'Frequency Histogram Legend' - more accurate perhaps, but less catchy. The Dykes et al. paper is very interesting and I like the treemap approach.

Hannes (@cartocalypse) also got in touch to say he likes the idea and he's suggested 'legumns', which is also useful (but more difficult to pronounce!).

I'll add more on the topic if people respond.

* Just in case the use of this word sounds odd to you, we use the word 'deprivation' in the British context in studies of urban poverty/disadvantage but it's not exactly the same thing. I've written about this in previous academic papers but to all intents and purposes more deprived means 'poorer' and less deprived means 'richer'. In the maps above, you could say red: poor and blue: rich and you wouldn't be wrong (ecological fallacies notwithstanding).