Friday, 28 August 2015

Mapping the American Commute

One of my summer projects this year has been attempting to map the American commute, following earlier work on a similar subject. Put simply, I've attempted to put together a map which shows commuting connections between locations in the contiguous United States, using the most fine-grained data I could find. Some of the results of this went into a recent piece in WIRED, and also CityMetric, and the larger piece of work it's based on is part of ongoing research into the best ways of mapping commuting flows. The main images are below, followed by some more technical information. For now, all you need to know is that these images show commuting connections of 100 miles or less between Census tracts in the lower 48 states. You'll have to forgive me if your city isn't labelled! 

Higher resolution image available here

And now some zoomed-in versions...

Zoom in of the west coast

Texas, and beyond!

Interesting patterns of connectivity in the Midwest

Look closely for some interesting inter-connections

The famous BosWash megalopolis

But this just shows where people live, doesn't it? Yes it does. But it also shows how the places where people live connect with other places from a functional economic point of view, at a fairly fine-grained level. It offers a slightly different view than just looking at the urban fabric alone which, I might add, is interesting in itself. Mapping flows like this is not exactly new, as this paper from Arthur Robinson (1955) on Henry Drury Harness (1837) demonstrates. Nonetheless, I haven't seen anyone map travel to work at this resolution for the United States, so I thought I'd have a go myself. 

If you spend some time looking at the big version of the map you can begin to see how places connect and where there are obvious disconnections, even between places that are not that far apart. One thing that you can pick up from the complete dataset (but not this batch of maps) is the growth of mega-commuting, as explained by Melanie Rapino and Alison Fields of the United States Census Bureau. 

Background information: the data I used is the most recent tract-to-tract journey to work dataset from the American Community Survey. This dataset covers journeys to work between the c.74,000 census tracts in the United States, and the complete dataset has around 4 million interactions. I mapped this in QGIS, using methods I've described previously on this blog. The tricky bits were dealing with the messy FIPS codes, dealing with the size of the dataset, and trying to decide what to label. There is quite a bit of error in the dataset (as acknowledged by the ACS people) and each individual flow line has a margin of error value associated with it, from which I also calculated the coefficient of variation. This is explained in a more detailed working paper, which I expect to publish in the coming months.
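For anyone wrestling with the same data, here is a minimal Python sketch of two of those chores. It is not the QGIS workflow described above. It assumes the "messy" FIPS codes are mainly ones that have lost their leading zeros (for example after a trip through a spreadsheet), and that the margins of error are published at the 90% confidence level, as ACS margins are, so the standard error is MOE / 1.645. The function names are hypothetical, for illustration only.

```python
# ACS margins of error are published at the 90% confidence level,
# so standard error = MOE / 1.645.
Z_90 = 1.645


def pad(code, width):
    """Restore leading zeros lost from a numeric FIPS component (hypothetical helper)."""
    s = str(code).strip()
    if not s.isdigit() or len(s) > width:
        raise ValueError(f"unexpected code {code!r} for width {width}")
    return s.zfill(width)


def tract_geoid(state, county, tract):
    """11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return pad(state, 2) + pad(county, 3) + pad(tract, 6)


def coefficient_of_variation(estimate, moe):
    """CV as a percentage of the estimate, or None when the estimate is zero."""
    if estimate == 0:
        return None
    standard_error = moe / Z_90
    return 100.0 * standard_error / estimate
```

For example, `tract_geoid(6, 75, 10100)` gives `"06075010100"`, a San Francisco County tract, and a flow of 100 commuters with a margin of error of 32.9 has a CV of about 20%.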

Tuesday, 4 August 2015

"The Regional World", version 2

I recently came back to CartoDB to do a bit of experimenting for some GIS work I'm doing this autumn, so I decided to revisit a topic I looked at before: sub-national regions of the world. In a previous version, which I posted via Twitter, I took sub-national boundaries of the world and put together an interactive map (in about 15 minutes, so it wasn't very good). I've now produced a better one. It's not perfect, but I have managed to add in an equal area projection version and other simple features, such as scale-dependent labelling and line styling.

The Regional World - version 2

According to Wikipedia, the largest sub-national divisions in the world are the Sakha Republic (Yakutiya) in Russia, Western Australia, and Krasnoyarsk Krai, also in Russia. The first two are more than ten times the size of the UK (which is 244,000 sq km) and number three almost is. If you click on the link above to go to the map then you'll see that you can also click on the equal area version. I did this because web maps often default to the Mercator projection, which causes massive distortion towards the poles and leads people into thinking Greenland is bigger than Africa, which of course it isn't.

The Regional World - equal area projection
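To put a number on the Mercator problem: on a sphere, Mercator stretches distances by sec φ in both directions at latitude φ, so areas are inflated by sec²φ. An equal area projection keeps areas true by squashing things vertically instead. Lambert's cylindrical equal-area projection is used below purely as an example, because the post doesn't say which equal area projection the map uses. This is a small Python sketch that assumes a spherical Earth of radius 6,371 km.

```python
import math

R_KM = 6371.0  # assumed spherical Earth radius


def mercator_area_scale(lat_deg):
    """Areal exaggeration of spherical Mercator at a latitude: sec^2(phi)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


def lambert_cea(lon_deg, lat_deg, radius=R_KM):
    """Lambert cylindrical equal-area: x = R * lambda, y = R * sin(phi)."""
    return radius * math.radians(lon_deg), radius * math.sin(math.radians(lat_deg))


def band_area_on_sphere(lat1, lat2, radius=R_KM):
    """True area of a full 360-degree latitude band: 2*pi*R^2*(sin(lat2) - sin(lat1))."""
    return 2 * math.pi * radius ** 2 * (
        math.sin(math.radians(lat2)) - math.sin(math.radians(lat1))
    )


def band_area_on_lambert(lat1, lat2, radius=R_KM):
    """Area of the same band measured on the Lambert map (width 2*pi*R)."""
    _, y1 = lambert_cea(0, lat1, radius)
    _, y2 = lambert_cea(0, lat2, radius)
    return 2 * math.pi * radius * (y2 - y1)
```

At 60° the Mercator inflation factor is about 4, and at around 72°N, well inside Greenland, it is about 10, which goes a long way to explaining the Greenland/Africa illusion. The last two functions show that a latitude band has the same area on the sphere as on the Lambert map.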

The equal area projection does of course mean that shapes towards the poles are extremely distorted, but that's part of the deal with some map projections: you can preserve area or shape, but not both. I've taken the administrative boundaries at face value, but of course they may not be 100% accurate, as the authors of the data acknowledge:

"This is the toughest dataset to keep current. Unlike the United States, other countries constantly rearrange their admin-1 units, slicing and combining them on a regular basis."

Read more about the data

You'll notice that I have put links to a small number of countries on the main map. I chose these because I find them interesting, that's all. This was part experiment with CartoDB and a little SQL (projection) and CSS (scale-dependent styling), part GIS project, part teaching material, partly driven by my interest in regions more generally, and part pre-holiday wind-down. In relation to the latter, just for fun, I have hidden two little artefacts in the main map that only appear when you zoom to a certain level at two places on earth. 

Can you find them? 

Answers via Twitter or e-mail...

Monday, 20 July 2015

Urban footprints: some building outline data sources

This is an informational post about where to find building outline data, which I've used a lot in previous GIS projects. It might also be of interest to architects, engineers and anyone interested in urban studies and planning more generally. I like using this kind of data to explore cities as it gives us a good idea of the layout of the urban fabric, as in the example below (New Orleans). The links mainly refer to data from the US, Canada and Great Britain but other parts of the world are covered to various extents by OpenStreetMap.

New Orleans

Let's start big, with OSM... Steve Bernard has produced an excellent video which explains how you can get OpenStreetMap data directly into QGIS very simply - he uses Madrid in the example. The accuracy and coverage vary a great deal across the world, so you need to bear this in mind when downloading and using it - but on the whole it is a fantastic resource. The example below shows Mogadishu, where the coverage is incomplete for buildings but pretty good for the road network. 

© OpenStreetMap contributors

Another useful OSM-related resource with decent global city coverage is CAD Mapper, where you can download areas up to 1km square for free. However, I'm focusing on open data today so will not go into detail on this. The best OSM download source is, I think, GEOFABRIK (German for 'geo factory'), a German GIS consultancy that extracts and processes OSM data and then makes it available for free online. It's really nicely structured and it's easy to find what you're looking for. Here's the download page for New Zealand, for example, followed by the contents, where you can see the building data on top of a current OSM base map. At the time of writing, the zipped shp folder for the whole of New Zealand was 146MB.

The New Zealand GEOFABRIK download page (20 July 2015)

Auckland, NZ - very good building coverage here

The OSM sources are great, since the licence is very generous and you can use the data for just about anything, so long as it's properly cited. However, many towns, cities and counties across the world also provide building footprint or outline data (the terminology varies from place to place) so I've put together a list below of ones I know about. Some of them (e.g. Detroit, NYC) cover land parcels or tax lots so are slightly different but in the main it's just building outlines. I've included visuals for some of the datasets, so you can get an idea of what they look like.

New York City - from the BYTES of the BIG APPLE website you can download the MapPLUTO dataset, for all 5 Boroughs in New York City. Tax lot level rather than building outlines, but it's an extremely rich dataset with loads of useful land use planning variables in it, including 'year built' and number of floors. A little sample of the data is shown below, for the area around Central Park.

A little sample of the data (using Qgis2threejs)

Chicago - the building footprints layer is available in two versions online, one of which says it is deprecated, but I've heard from the Chicago GIS team that this isn't the case. It's just that, due to limited staff, the dataset is only edited when necessary. It also contains 'year built' and height variables.

San Francisco - another really good city buildings dataset, from SF OpenData. Also lots of useful variables in this dataset, including height. I really like this one.

Dallas - you'll probably get a disclaimer box in a pop-up when you go to download this. I've linked to the general GIS page and the file you want is called Structures (Building Footprints) in the Planimetric Data section - it's about 81MB to download and the unzipped file is well over 100MB.

Atlanta - again, I've linked to the GIS page, this time from the City of Atlanta and you need to download the 'Impervious Buildings' layer. If you're looking to map the sprawl of Atlanta, this won't work as it covers the City area only. Still, a very useful dataset.

Denver - excellent open data from Denver. This dataset covers all permanent structures and buildings for a 152 square mile area of the City and County of Denver. Available in a number of different formats.

Seattle - this dataset was created in 2009 by Pictometry International Corp but is now in the public domain. It is available via the City of Seattle's data website.

Los Angeles - this is a fantastic dataset for the County (not just the City) of Los Angeles, which is the most populous county in the United States (just over 10 million people). Made available via the LA County GIS Data Portal. It is a little hefty (581MB) so be careful! In the example below I show all the buildings in LA County but the City of Los Angeles in dark shading, just to emphasise its crazy shape.

Boston - this was created in 2012 and is available via the City of Boston. Contains a number of different fields, including base elevation of the structures, the elevation of the highest point above sea level and fields on building type.

Detroit - like New York, not strictly a buildings outline file but instead a property lot level dataset. Very impressive dataset produced by Data Driven Detroit's Motor City Mapping project. I've used this data a lot in talks and teaching as it's a really good example of its type.

Now some links to further datasets which I know of but haven't used that much...

Washington DC - link is to the download page, but direct link to zip is here (559MB unzipped)

Baltimore - the top link on this page

Philadelphia - via OpenDataPhilly

Massachusetts - buildings for a wide range of towns and cities in the state

Boulder - this is from Boulder County, Colorado. Available in a number of different file formats.

Bloomington, Indiana - one of many smaller cities with excellent geodata

New Orleans - an excellent dataset, not just because of the unusual shape of the city!

Toronto - don't be confused by the '3D Massing' terminology here. Scroll down to the 'Data download' section

Vancouver - doesn't cover the whole city and the outlines were digitised in 1999, but it's still a useful dataset.

Waterloo - this is from the Region of Waterloo and was up to date as of January 2014.

Hobart, Tasmania - a nice example of building data from Hobart in Australia. Contains a 'year constructed' variable.

Wellington, NZ - can't overlook New Zealand! I think you need to register to download this, but it's Creative Commons 3.0 so it's still open. 

The list wouldn't be complete without mentioning OS OpenData for Great Britain, provided by Ordnance Survey. In March a new dataset with detailed buildings became available: OS Open Map - Local. The building data is a very small part of this collection but one I find very interesting. I've patched together a few cities here to get the ball rolling, but you can download your own. There's also a 'tile finder' to help you identify which OS tile you need to cover your area of interest. 

This could save you some time 
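If you'd rather work out the tile yourself, the two-letter 100 km squares of the British National Grid can be computed from an easting and northing. This is a Python sketch of the standard letter arithmetic. It assumes the Open Map - Local download tiles are indexed by these 100 km squares, so check against the tile finder before relying on it.

```python
def os_grid_square(easting, northing):
    """Two-letter 100 km OS National Grid square for a BNG coordinate in metres."""
    if not (0 <= easting < 700_000 and 0 <= northing < 1_300_000):
        raise ValueError("coordinate outside the National Grid")
    e100k = int(easting // 100_000)
    n100k = int(northing // 100_000)
    # Positions of the 500 km letter and the 100 km letter in a 25-letter alphabet
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    # The grid alphabet skips 'I', so shift anything after 'H' along by one
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    return chr(ord("A") + l1) + chr(ord("A") + l2)
```

For instance, a point in central London at easting 530000, northing 180000 falls in square TQ.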

I think this just about covers it. Get in touch if you have any other great data sources for building outlines.

Sunday, 12 July 2015

Mapping the Polycentric Metropolis: journeys to work in the Bay Area

I’ve recently been writing and thinking about polycentric urban regions, partly because I’m interested in how places connect (or not) for one of my research projects, and partly because I’ve been experimenting with ways to map the connections between places in polycentric urban regions. There was quite a lot of the latter in Peter Hall and Kathy Pain’s ‘The Polycentric Metropolis’ from 2006 but given that the technology has moved on a little since then I thought I’d explore the topic in more detail. Mind you, I’ve also been looking back on Volumes 1 to 3 of the Chicago Area Transportation Study of 1959 as a reminder that technology hasn’t moved on as much as we think – their ‘Cartographatron’ was capable of mapping over 10 million commuting flows even then (though it was the size of a small house and required a team of technicians to operate it – see bottom of post for a photo).

Are you part of the big blue blob?

Anyway, to the point… What’s the best way of mapping polycentricity in an urban region? For this, I decided to look at the San Francisco Bay Area since it has been the subject of a few studies by one of my favourite scholars, Prof Robert Cervero of UC Berkeley. Also, a paper by Melanie Rapino and Alison Fields of the US Census Bureau identified the Bay Area as the region with the highest percentage of ‘mega commuting’ in the United States (traveling 90 or more minutes and 50 or more miles to work). Therefore, I decided to look at commuting flows between census tracts in the 9 counties of the Bay Area, from Sonoma County in the north to Santa Clara County in the south. I’ve used a cut-off of 30 miles here instead of the more generous 50 mile cut-off used by Rapino and Fields. I also mapped the whole of the United States in this way, but that’s for another day.
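A distance cut-off like this is easy to sketch in Python. The version below assumes each tract is represented by a (latitude, longitude) centroid and uses the straight-line (great-circle) haversine distance, which will be shorter than the road distance. The post doesn't say which distance measure the maps use, so treat this as an illustration. The mega-commute test simply encodes the 90-minute / 50-mile definition quoted above, which needs travel time as well as distance. The function names and the `min_count` threshold argument are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def is_mega_commute(minutes, miles):
    """Rapino and Fields' definition: 90+ minutes and 50+ miles to work."""
    return minutes >= 90 and miles >= 50


def long_flows(flows, centroids, min_miles=30.0, min_count=1):
    """Keep (origin, destination, count) flows whose tract centroids are at
    least min_miles apart and which carry at least min_count commuters."""
    kept = []
    for origin, dest, count in flows:
        if count < min_count:
            continue
        lat1, lon1 = centroids[origin]
        lat2, lon2 = centroids[dest]
        miles = haversine_miles(lat1, lon1, lat2, lon2)
        if miles >= min_miles:
            kept.append((origin, dest, count, miles))
    return kept
```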

The series of maps below illustrates both patterns of commuting in the Bay Area and the different approaches I’ve taken in an attempt to capture the essence of polycentrism in the area. I don’t attempt to capture the misery of some of these commutes, since for that I’d need a different kind of technology. But I do think the animations in particular capture the polycentric nature of commuter flows. If you’re represented by one of the dots in the images below, thanks a lot for taking part!

Let’s start with a simple representation of commutes of over 30 miles from San Francisco County (which is coterminous with the City of San Francisco). The animated gif is shown below and you can click the links to view the sharper video file (mp4) in your browser (so long as you're on a modern browser). The most noticeable thing here is the big blue blob© making its way down from San Francisco to Palo Alto, Mountain View and Cupertino in Santa Clara County. In total, the blue dots represent just over 15,000 commuters going to 803 different destination census tracts. I’m going to take a wild guess and suggest that some of these commutes are by people who work at Stanford, Google and Apple. But it probably also includes people working at NASA Ames Research Center, Santa Clara University and locations in San Jose. 

View video file in browser - or click image to enlarge gif
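Totals like "15,000 commuters going to 803 destination tracts" fall out of a simple aggregation once the flows have been filtered by distance. Here's a Python sketch. It assumes flows are (origin GEOID, destination GEOID, commuters) tuples with 11-digit tract GEOIDs, so the first five digits give the state + county FIPS code (06075 is San Francisco County). The function name is hypothetical.

```python
from collections import defaultdict


def summarise_by_origin_county(flows):
    """Return {county_fips: (total commuters, distinct destination tracts)}.
    flows: iterable of (origin_geoid, destination_geoid, commuters),
    assumed to be already filtered (e.g. to 30+ miles)."""
    totals = defaultdict(int)
    destinations = defaultdict(set)
    for origin, dest, count in flows:
        county = origin[:5]  # 2-digit state + 3-digit county FIPS
        totals[county] += count
        destinations[county].add(dest)
    return {county: (totals[county], len(destinations[county])) for county in totals}
```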

These patterns aren’t particularly surprising, since there has been a lot of press coverage about San Francisco’s bus wars and commutes of this kind. However, there is a fairly significant dispersal of San Francisco commuters north and east, even if the numbers don’t match those of the big blue blob. By the way, from San Francisco it's about 33 miles to Palo Alto, 39 miles to Mountain View, 42 to Cupertino and 48 to San Jose. 

The first example above doesn’t reveal anything like the whole story, though. There are actually quite a lot of commuters who travel in the opposite direction, from Santa Clara County to San Francisco, but more widely the commuting patterns in the Bay Area – a metro area of around 7.5 million people – resemble a nexus of mega-commuting. This is what I’ve attempted to show below, for all tract-to-tract connections of 10 people or more, with no distance cut-off. The point is not to attempt to display all individual lines, though you can see some. I’m attempting to convey the general nature of connectivity (with the lines) and the intensity of commuting in some areas (the orange and yellow glowing areas). Even when you look at tract-to-tract connections of 50 or more, the nexus looks similar.

Click image to view larger version

Stronger connections - click image to view larger version

If we zoom in on a particular location, using a kind of ‘spider diagram’ of commuting interactions, we can see the relationships between one commuter destination and its range of origins. In the example below I’ve taken the census tract where the Googleplex is located and looked at all Bay Area commutes which terminate there, regardless of distance. In the language of the seminal Chicago Area Transportation Study I mentioned above, these are ‘desire lines’, since each one represents ‘the shortest line between origin and destination, and expresses the way a person would like to go, if such a way were available’ (CATS, 1959, p. 39) instead of, for example, sitting in traffic on US Route 101 for 90 minutes. According to the data, this example includes just over 23,000 commuters from 585 different locations across the Bay Area. I've also done an animated line version and a point version, just for comparison.

Commuting connections for the Googleplex census tract

Animated spider diagram of flows to Mountain View

Just some Googlers going to work (probably) mp4
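Desire lines like these are straightforward to generate. This Python sketch assumes the same (origin, destination, commuters) flows and a dict of tract centroids as (latitude, longitude). It produces one straight line per origin as WKT in a CSV, which QGIS can load as a delimited text layer. The function names are hypothetical and the tract IDs in the test data are made up.

```python
import csv
import io


def desire_lines(flows, centroids, destination):
    """Straight 'desire lines' from every origin to one destination tract.
    flows: (origin, destination, commuters); centroids: id -> (lat, lon).
    Returns (origin, commuters, WKT) rows; WKT uses lon/lat (x y) order."""
    dest_lat, dest_lon = centroids[destination]
    rows = []
    for origin, dest, count in flows:
        if dest != destination or origin == destination:
            continue  # skip other destinations and zero-length within-tract flows
        lat, lon = centroids[origin]
        rows.append((origin, count, f"LINESTRING ({lon} {lat}, {dest_lon} {dest_lat})"))
    return rows


def rows_to_csv(rows):
    """CSV text with a WKT geometry column (fields containing commas get quoted)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["origin", "commuters", "wkt"])
    writer.writerows(rows)
    return buffer.getvalue()
```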

Looking further afield now, to different parts of the Bay Area, I also produced animated dot maps of commutes of 30 miles or more for the other three most populous counties – Alameda, Contra Costa and Santa Clara. I think these examples do a good job of demonstrating the polycentric nature of commuting in this area since the points disperse far and wide to multiple centres. Note that I decided to make the dots return to their point of origin – after a slight delay – in order to highlight the fact that commuting is a two-way process. The Alameda County animation represents over 12,000 commuters going to 751 destinations, Contra Costa 25,000 commuters and 1,351 destinations, and Santa Clara nearly 28,000 commuters and 1,561 destinations. For the Bay Area as a whole, the totals are about 3.3 million commuters and 110,000 origin-destination links.

Alameda County commutes of 30+ miles mp4

Contra Costa County commutes of 30+ miles mp4

Santa Clara County commutes of 30+ miles mp4
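The out-and-back dot movement can be expressed as a simple interpolation. This is not how MMQGIS does it internally, just a generic Python sketch with a hypothetical function name. Each dot moves in a straight line from origin to destination over `travel_frames` frames, pauses for `pause_frames`, then returns, and the cycle repeats.

```python
def dot_position(origin, destination, frame, travel_frames, pause_frames):
    """(x, y) of a commuter dot at a given frame: travel out, pause, travel back.
    origin and destination are (x, y); one cycle lasts 2*travel_frames + pause_frames."""
    cycle = 2 * travel_frames + pause_frames
    f = frame % cycle
    if f < travel_frames:
        t = f / travel_frames  # outbound leg: 0 -> 1
    elif f < travel_frames + pause_frames:
        t = 1.0  # waiting at the destination
    else:
        t = 1.0 - (f - travel_frames - pause_frames) / travel_frames  # return leg: 1 -> 0
    x = origin[0] + t * (destination[0] - origin[0])
    y = origin[1] + t * (destination[1] - origin[1])
    return x, y
```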

Finally, I’ve attempted something which is a bit much for one map, but here it is anyway; an animated dot map of all tract-to-tract flows of 30 or more miles in the Bay Area, with dots coloured by the county of origin. Although this gets pretty crazy half way through I think the mixing of the colours does actually tell its own story of polycentric urbanism. For this final animation I’ve added a little audio into the video file as well, just for fun.

A still from the final animation - view here

What am I trying to convey with the final animation? Like I said, it's too much for a single map animation but it's kind of a metaphor for the messy chaos of Bay Area commuting (yes, let's go with that). You can make more sense of it if you watch it over a few times and use the controls to pause it. It starts well and ends well, but the bits in the middle are pretty ugly - just like the Bay Area commute, like I said.

My attempts to understand the functional nature of polycentric urbanism continue, and I keep borrowing from pioneers like Waldo Tobler and the authors of the Chicago Area Transportation Study. This is just a little map-based experimentation in an attempt to bring the polycentric metropolis to life, for a region plagued by gruesome commutes. It’s little wonder, therefore, that a recent poll suggested Bay Area commuters were in favour of improving public transit. If you're interested in understanding more about the Bay Area's housing and transit problems, I suggest watching this Google Talk from Egon Terplan (54:44).

Notes: the data I used for this are the 2006-2010 5-year ACS tract-to-tract commuting file, published in 2013. Patterns may have changed a little since then, but I suspect they are very similar today, possibly with more congestion. There are severe data warnings associated with individual tract-to-tract flows from the ACS data but at the aggregate level they provide a good overview of local connectivity. I used QGIS to map the flows. I actually mapped the entire United States this way, but that’s going into an academic journal (I hope). I used Michael Minn’s MMQGIS extension in QGIS to produce the animation frames and then I patched them together in GIMP (gifs) and Camtasia (for the mp4s), with IrfanView doing a little bit as well (batch renaming for reversing file order). Not quite a 100% open source workflow but that’s because I just had Camtasia handy. The images are low res and only really good for screen. If you’re looking for higher resolution images, get in touch. It was Ebru Sener who gave me the idea to make the dots go back to their original location. I think this makes more sense for commuting data.
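Reversing the order of a folder of numbered frames (the job IrfanView did here) can also be scripted. This Python sketch assumes zero-padded names such as `frame_0000.png`, so that alphabetical order is frame order. It renames in two passes so that no file is overwritten part-way through. The function name and the `frame_` prefix are hypothetical.

```python
from pathlib import Path


def reverse_frame_order(folder, prefix="frame_", suffix=".png"):
    """Rename prefix*suffix frames so the last frame becomes frame 0.
    Assumes zero-padded numbering, so sorting by name gives frame order."""
    folder = Path(folder)
    frames = sorted(folder.glob(f"{prefix}*{suffix}"))
    width = max(4, len(str(len(frames))))
    # Pass 1: move everything to temporary names so nothing gets overwritten
    temp_paths = []
    for i, path in enumerate(frames):
        temp = folder / f"__tmp_{i:0{width}d}{suffix}"
        path.rename(temp)
        temp_paths.append(temp)
    # Pass 2: give the frames their final names in reverse order
    for i, temp in enumerate(reversed(temp_paths)):
        temp.rename(folder / f"{prefix}{i:0{width}d}{suffix}")
    return len(temp_paths)
```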

The Cartographatron: Information and images on the 'Cartographatron' used in the Chicago Area Transportation Study (1959) are shown below.

From p.39 of CATS, 1959, Vol 1

From p.98 of CATS, 1959, Vol 1

Monday, 22 June 2015

Where is all London's new housing?

Some of my recent work on housing markets, mortgage lending and housing search has led me to consider the question of where, exactly, London's new housing is located. On a recent visit to King's Cross I was amazed by the sheer scale of development, particularly all the new flats. Because I've been working with the data for another project - and recently re-examined it for a project proposal which explicitly didn't focus on London - I thought it would be interesting to see whether my perception of the flats boom is based in reality. Of course it is! 

The maps below are based on all new build homes sold in London from 1 June 2010 to the end of April 2015 (the most recent data). During this time, according to HM Land Registry 'price paid' data, there were 42,938 transactions on 42,813 properties. This indicates that quite a few properties are not being picked up in this dataset - e.g. compare it to the completions data from the London Datastore. Nevertheless, the pattern and distribution of property types are revealing: 88.8% of transactions were for flats, 7.1% for terraced houses, 2.4% for semi-detached properties and 1.7% for detached houses.

All property types


Terraced houses

Semi-detached houses

Detached houses
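The percentages above come from filtering the Price Paid data to new builds and counting by property type. This Python sketch assumes each record is a dict whose keys (`old_new`, `property_type`, `postcode`, `paon`, `saon`) are hypothetical labels for the relevant Price Paid columns. In the published data, new builds are flagged 'Y' in the old/new column and property types use single-letter codes such as F (flats/maisonettes) and T (terraced). Treating postcode plus primary and secondary address as "one property" is also an assumption.

```python
from collections import Counter

# Single-letter Price Paid property type codes
PROPERTY_TYPES = {"D": "Detached", "S": "Semi-detached", "T": "Terraced", "F": "Flats/Maisonettes"}


def new_build_summary(records):
    """Return (transactions, distinct properties, % share by property type)
    for new-build sales, i.e. records whose old/new flag is 'Y'."""
    new = [r for r in records if r["old_new"] == "Y"]
    if not new:
        return 0, 0, {}
    counts = Counter(r["property_type"] for r in new)
    shares = {
        PROPERTY_TYPES.get(code, code): round(100 * n / len(new), 1)
        for code, n in counts.items()
    }
    # One property = same postcode + primary and secondary addressable object names
    properties = {(r["postcode"], r["paon"], r["saon"]) for r in new}
    return len(new), len(properties), shares
```

Counting transactions and distinct properties separately is what allows a figure like 42,938 transactions on 42,813 properties, since some new homes were sold more than once in the period.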

Clearly, the mix of new housing - and its relatively low volume - is something that many people have commented on before, but I've not seen many people look at the geographical distribution in this way. The important question arising from these maps - as ever - is why things are the way they are. That's something the maps can't tell us, but they do provide an interesting starting point for debate. The discrepancy between the Land Registry data and data on completions is also not surprising, owing to the way new build housing is sometimes sold, but it would be interesting to explore this more in future. If you do happen to have a few million quid to spare, good luck finding a new detached London house to live in!

A note on the maps... I've geocoded the price paid data using Ordnance Survey's Code-Point Open dataset, which can match sales to postcode units, rather than street addresses. The transparent bubble map is of course far from perfect but I've used it here to convey the scale and location of new housing, rather than to offer a precise fix. So long as it gives the impression of there being a massive splodge of newbuild flats in Central London that'll do for now. I am aiming to highlight the general scale and geography of development as a fairly quick experiment to see what might be done with the data. No plans to make it interactive (update: the best laid plans of... see map point datadump below).

Update (1715, 22 June 2015): I fixed the glitches, which were caused by a rogue space here and there in the codepoint open file. Moral of the story? Build more houses (I think). Always a giveaway when there aren't many dots in Wandsworth. The numbers were correct all along though. Finally, I've added in some information from HM Land Registry on properties not included in the price paid data.

Data excluded from Price Paid dataset - link
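Rogue spaces are a classic postcode-matching problem. One defensive approach is to strip all whitespace from both sides of the join and re-insert a single space before the final three characters, since the inward part of a UK postcode is always three characters long. In this Python sketch, `codepoint` is assumed to be a dict mapping postcode to (easting, northing), and both function names are hypothetical.

```python
import re


def normalise_postcode(raw):
    """Return a UK postcode as 'OUTWARD INWARD' with one space, or None if the
    length is impossible. 'E1  6AN', 'e16an' and ' E1 6AN ' all become 'E1 6AN'."""
    compact = re.sub(r"\s+", "", raw or "").upper()
    if not 5 <= len(compact) <= 7:
        return None
    # The inward code (e.g. '6AN') is always the last three characters
    return f"{compact[:-3]} {compact[-3:]}"


def geocode(sales, codepoint):
    """Attach coordinates to sales by postcode.
    sales: list of dicts with a 'postcode' key.
    codepoint: dict of postcode -> (easting, northing), in any spacing."""
    lookup = {}
    for postcode, xy in codepoint.items():
        key = normalise_postcode(postcode)
        if key is not None:
            lookup[key] = xy
    matched, unmatched = [], []
    for sale in sales:
        key = normalise_postcode(sale.get("postcode"))
        if key in lookup:
            matched.append((sale, lookup[key]))
        else:
            unmatched.append(sale)
    return matched, unmatched
```

Keeping the unmatched sales, rather than silently dropping them, makes gaps like the missing Wandsworth dots easy to spot.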

Friday, 19 June 2015

Creating an English green belt atlas

UPDATE: I've fixed the glitches in version 1 and compiled a spreadsheet with the data. See new download at the bottom of the post.

I've blogged before about green belt, and also written about the underlying data in the press. Now that the data are open, I've finally got round to finishing a little project I meant to complete ages ago. I was prompted to do this during a recent visit to my department by Prof Bob Barr, a legend in the data and GIS worlds. Bob said it would be good to know what percentage of the land area in each local authority in England was covered by green belt. I agree, so here are the results of my analysis (using 2014 green belt data) from Version 1 of my English green belt 'atlas' (actually lots of individual images to keep the file size down). Here's a snapshot of one of the maps...

Green belt land in Cheshire East

And another, this time from Birmingham. You can see that I've dimmed the background so that you can get a sense of other green belt land in the areas I've mapped.

Birmingham green belt land

Finally, a few more from around the country...

There are some glitches in the data but my initial overview suggests the numbers are pretty accurate (see exceptions below). I hope that people might find these maps useful. If you want to use any of them, be my guest.

Download all the files here (154MB): Green Belt Atlas 2014 (version 2) (186 individual map files, plus spreadsheet)

Download just the spreadsheet: percent green belt figures for each of the 186 local authorities:

Contents of the spreadsheet (download above)

Warnings: A few issues with version 1... 1. The West Lancashire green belt area extends into the sea on the green belt shapefile available from DCLG, so the figures here are incorrect (working on a fix). 2. The figure for Ashfield is clearly wrong - not sure why, so I will fix that too. 3. Some areas have extremely low values and may not actually be in the green belt - it may instead be down to the accuracy of underlying data. 4. Mole Valley is currently missing; I am looking into why. UPDATE: I looked again at the original Green Belt shapefile from DCLG and found that Mole Valley had the same code as Ashfield, so I fixed that and there's now a map for Mole Valley. New Forest was also assigned two different codes, so I've fixed that too. Also, in the percent figure, I've excluded the part of the West Lancashire green belt that is not on land, so this gives an accurate figure now. You can see from the image below that part of the green belt goes into the Ribble Estuary.
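The percentage figures boil down to two areas per local authority. This Python sketch assumes the green belt polygons have already been clipped to each authority's land area in a GIS, which is also what excludes estuary sections like the Ribble. It also assumes coordinates are in a projected system measured in metres, such as British National Grid. It uses the shoelace formula, ignores holes in polygons, and the function names are hypothetical.

```python
def polygon_area(ring):
    """Shoelace area of a simple polygon ring [(x, y), ...] in projected units."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def percent_green_belt(la_land_rings, green_belt_rings):
    """Percentage of a local authority's land area covered by green belt.
    green_belt_rings are assumed to be already clipped to that land area."""
    la_area = sum(polygon_area(r) for r in la_land_rings)
    gb_area = sum(polygon_area(r) for r in green_belt_rings)
    return 100.0 * gb_area / la_area
```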

Technical stuff: I did this in QGIS 2.8 (open source GIS software) using the Atlas tool and a very heavy laptop, plus a bit of trickery I picked up here and there. I blogged about this before, with a little tutorial. Perhaps I should actually be using the term 'green belts', as Richard Blyth pointed out, but forgive me for this.