Monday, 20 July 2015

Urban footprints: some building outline data sources

This is an informational post about where to find building outline data, which I've used a lot in previous GIS projects. It might also be of interest to architects, engineers and anyone interested in urban studies and planning more generally. I like using this kind of data to explore cities as it gives us a good idea of the layout of the urban fabric, as in the example below (New Orleans). The links mainly refer to data from the US, Canada and Great Britain but other parts of the world are covered to various extents by OpenStreetMap.

New Orleans


Let's start big, with OSM... Steve Bernard has produced an excellent video which explains how you can get OpenStreetMap data directly into QGIS very simply - he uses Madrid in the example. The accuracy and coverage vary a great deal across the world, so you need to bear this in mind when downloading and using the data - but on the whole it is a fantastic resource. The example below shows Mogadishu, where the coverage is incomplete for buildings but pretty good for the road network.

© OpenStreetMap contributors

Another useful OSM-related resource with decent global city coverage is CAD Mapper, where you can download areas up to 1km square for free. However, I'm focusing on open data today so will not go into detail on this. The best OSM download source, I think, is GEOFABRIK (German for 'geo factory'), a German GIS consultancy which extracts and processes OSM data and then makes it available for free online. It's really nicely structured and it's easy to find what you're looking for. Here's the download page for New Zealand, for example - followed by the contents, where you can see the building data on top of a current OSM base map. At time of writing, the zipped shp folder for the whole of New Zealand was 146MB.

The New Zealand GEOFABRIK download page (20 July 2015)


Auckland, NZ - very good building coverage here


The OSM sources are great, since the licence is very generous and you can use the data for just about anything, so long as it's properly cited. However, many towns, cities and counties across the world also provide building footprint or outline data (the terminology varies from place to place) so I've put together a list below of ones I know about. Some of them (e.g. Detroit, NYC) cover land parcels or tax lots so are slightly different but in the main it's just building outlines. I've included visuals for some of the datasets, so you can get an idea of what they look like.


New York City - from the BYTES of the BIG APPLE website you can download the MapPLUTO dataset, for all 5 Boroughs in New York City. Tax lot level rather than building outlines, but it's an extremely rich dataset with loads of useful land use planning variables in it, including 'year built' and number of floors. A little sample of the data is shown below, for the area around Central Park.

A little sample of the data (using Qgis2threejs)

Chicago - the building footprints layer is available in two versions online, one of which says it is deprecated but I've heard from the Chicago GIS team that this isn't the case. It's just that due to limited staff the dataset is only edited when necessary. Also contains a 'year built' and height variable.



San Francisco - another really good city buildings dataset, from SF OpenData. Also lots of useful variables in this dataset, including height. I really like this one.



Dallas - you'll probably get a disclaimer box in a pop-up when you go to download this. I've linked to the general GIS page and the file you want is called Structures (Building Footprints) in the Planimetric Data section - it's about 81MB to download and the unzipped file is well over 100MB.


Atlanta - again, I've linked to the GIS page, this time from the City of Atlanta and you need to download the 'Impervious Buildings' layer. If you're looking to map the sprawl of Atlanta, this won't work as it covers the City area only. Still, a very useful dataset.


Denver - excellent open data from Denver. This dataset covers all permanent structures and buildings for a 152 square mile area of the City and County of Denver. Available in a number of different formats.


Seattle - this dataset was created in 2009 by Pictometry International Corp but is now in the public domain. It is available via the City of Seattle's data website.


Los Angeles - this is a fantastic dataset for the County (not just the City) of Los Angeles, which is the most populous county in the United States (just over 10 million). Made available via the LA County GIS Data Portal. It is a little hefty (581MB) so be careful! In the example below I show all the buildings in LA County but the City of Los Angeles in dark shading, just to emphasise its crazy shape.


Boston - this was created in 2012 and is available via the City of Boston. Contains a number of different fields, including base elevation of the structures, the elevation of the highest point above sea level and fields on building type.



Detroit - like New York, not strictly a buildings outline file but instead a property lot level dataset. Very impressive dataset produced by Data Driven Detroit's Motor City Mapping project. I've used this data a lot in talks and teaching as it's a really good example of its type.



Now some links to further datasets which I know of but haven't used that much...

Washington DC - link is to the download page, but direct link to zip is here (559MB unzipped)



Baltimore - the top link on this page

Philadelphia - via OpenDataPhilly

Massachusetts - buildings for a wide range of towns and cities in the state

Boulder - this is from Boulder County, Colorado. Available in a number of different file formats.

Bloomington, Indiana - one of many smaller cities with excellent geodata

New Orleans - an excellent dataset, not just because of the unusual shape of the city!



Toronto - don't be confused by the '3D Massing' terminology here. Scroll down to the 'Data download' section

Vancouver - doesn't cover the whole city and they were digitised in 1999 but still a useful dataset.

Waterloo - this is from the Region of Waterloo and was up to date as of January 2014.

Hobart, Tasmania - a nice example of building data from Hobart in Australia. Contains a 'year constructed' variable.



Wellington, NZ - can't overlook New Zealand! I think you need to register to download this but it's Creative Commons 3 so still open. 


The list wouldn't be complete without mentioning OS OpenData for Great Britain, provided by Ordnance Survey. A new dataset with detailed buildings became available in March (the OS Open Map - Local dataset). The building data is a very small part of this collection but one I find very interesting. I've patched together a few cities here to get the ball rolling but you can download your own. There's also a 'tile finder' to help you identify which OS tile you need to cover your area of interest.


This could save you some time 
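If you already have British National Grid coordinates for your area of interest, the tile lookup can also be done in a few lines of code. This is a minimal sketch, assuming the tiles you need are named after the two-letter 100 km National Grid squares (e.g. TQ, SK); the function name and the sample coordinates are my own illustration, not part of any OS tool.

```python
def grid_square(easting: float, northing: float) -> str:
    """Return the two-letter 100 km National Grid square for a point in metres."""
    if not (0 <= easting < 700_000 and 0 <= northing < 1_300_000):
        raise ValueError("point lies outside the National Grid extent")
    e100 = int(easting // 100_000)   # 100 km column index
    n100 = int(northing // 100_000)  # 100 km row index
    # Index of each letter within the 5 x 5 lettering scheme
    first = (19 - n100) - (19 - n100) % 5 + (e100 + 10) // 5
    second = (19 - n100) * 5 % 25 + e100 % 5
    # The letter 'I' is not used on the grid, so skip over it
    if first > 7:
        first += 1
    if second > 7:
        second += 1
    return chr(ord("A") + first) + chr(ord("A") + second)


# Approximate, illustrative coordinates (easting, northing) in metres
for place, (e, n) in {"central London": (530_000, 180_000),
                      "Sheffield": (435_000, 387_000)}.items():
    print(place, grid_square(e, n))
```

The lettering arithmetic is the standard National Grid scheme, but check the result against the tile finder before downloading anything large.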


I think this just about covers it. Get in touch if you have any other great data sources for building outlines.


Sunday, 12 July 2015

Mapping the Polycentric Metropolis: journeys to work in the Bay Area

I’ve recently been writing and thinking about polycentric urban regions, partly because I’m interested in how places connect (or not) for one of my research projects, and partly because I’ve been experimenting with ways to map the connections between places in polycentric urban regions. There was quite a lot of the latter in Peter Hall and Kathy Pain’s ‘The Polycentric Metropolis’ from 2006 but given that the technology has moved on a little since then I thought I’d explore the topic in more detail. Mind you, I’ve also been looking back on Volumes 1 to 3 of the Chicago Area Transportation Study of 1959 as a reminder that technology hasn’t moved on as much as we think – their ‘Cartographatron’ was capable of mapping over 10 million commuting flows even then (though it was the size of a small house and required a team of technicians to operate it – see bottom of post for a photo).

Are you part of the big blue blob?

Anyway, to the point… What’s the best way of mapping polycentricity in an urban region? For this, I decided to look at the San Francisco Bay Area since it has been the subject of a few studies by one of my favourite scholars, Prof Robert Cervero of UC Berkeley. Also, a paper by Melanie Rapino and Alison Fields of the US Census Bureau identified the Bay Area as the region with the highest percentage of ‘mega commuting’ in the United States (traveling 90 or more minutes and 50 or more miles to work). Therefore, I decided to look at commuting flows between census tracts in the 9 counties of the Bay Area, from Sonoma County in the north to Santa Clara County in the south. I’ve used a cut-off of 30 miles here instead of the more generous 50 mile cut-off used by Rapino and Fields. I also mapped the whole of the United States in this way, but that’s for another day.
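A distance cut-off like this can be sketched in a few lines. The example below assumes straight-line (great-circle) distance between tract centroids, which may not be how the distances behind these maps were measured; the record layout, the worker counts and the city-centre coordinates are all made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def long_commutes(flows, min_miles=30.0):
    """Keep flows whose origin-destination distance meets the cut-off."""
    return [f for f in flows
            if haversine_miles(*f["origin"], *f["dest"]) >= min_miles]


# Made-up flows between approximate city-centre points (lat, lon)
SAN_FRANCISCO = (37.7749, -122.4194)
MOUNTAIN_VIEW = (37.3861, -122.0839)
OAKLAND = (37.8044, -122.2712)
sample = [
    {"origin": SAN_FRANCISCO, "dest": MOUNTAIN_VIEW, "workers": 120},
    {"origin": SAN_FRANCISCO, "dest": OAKLAND, "workers": 300},
]

for flow in long_commutes(sample):
    miles = haversine_miles(*flow["origin"], *flow["dest"])
    print(flow["workers"], "workers,", round(miles, 1), "miles")
```

Straight-line distances are shorter than road distances, so a cut-off applied this way is stricter than the same figure measured along the road network.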

The series of maps below illustrate both patterns of commuting in the Bay Area and the different approaches I’ve taken in an attempt to capture the essence of polycentrism in the area. I don’t attempt to capture the misery of some of these commutes, since for that I’d need a different kind of technology. But, I do think the animations in particular capture the polycentric nature of commuter flows. If you’re represented by one of the dots in the images below, thanks a lot for taking part!

Let’s start with a simple representation of commutes of over 30 miles from San Francisco County (which is coterminous with the City of San Francisco). The animated gif is shown below and you can click the links to view the sharper video file (mp4) in your browser (so long as you're on a modern browser). The most noticeable thing here is the big blue blob© making its way down from San Francisco to Palo Alto, Mountain View and Cupertino in Santa Clara County. In total, the blue dots represent just over 15,000 commuters going to 803 different destination census tracts. I’m going to take a wild guess and suggest that some of these commutes are by people who work at Stanford, Google and Apple. But it probably also includes people working at NASA Ames Research Center, Santa Clara University and locations in San Jose. 

View video file in browser - or click image to enlarge gif


These patterns aren’t particularly surprising, since there has been a lot of press coverage about San Francisco’s bus wars and commutes of this kind. However, there is a fairly significant dispersal of San Francisco commuters north and east, even if the numbers don’t match those of the big blue blob. By the way, from San Francisco it's about 33 miles to Palo Alto, 39 miles to Mountain View, 42 to Cupertino and 48 to San Jose. 

The first example above doesn’t reveal anything like the whole story, though. There are actually quite a lot of commuters who travel in the opposite direction, from Santa Clara County to San Francisco, but more widely the commuting patterns in the Bay Area – a metro area of around 7.5 million people – resemble a nexus of mega-commuting. This is what I’ve attempted to show below, for all tract-to-tract connections of 10 people or more, and no distance cut-off. The point is not to attempt to display all individual lines, though you can see some. I’m attempting to convey the general nature of connectivity (with the lines) and the intensity of commuting in some areas (the orange and yellow glowing areas). Even when you look at tract-to-tract connections of 50 or more, the nexus looks similar.

Click image to view larger version

Stronger connections - click image to view larger version


If we zoom in on a particular location, using a kind of ‘spider diagram’ of commuting interactions, we can see the relationships between one commuter destination and its range of origins. In the example below I’ve taken the census tract where the Googleplex is located and looked at all Bay Area Commutes which terminate there, regardless of distance. In the language of the seminal Chicago Area Transportation Study I mentioned above, these are ‘desire lines’ since this represents ‘the shortest line between origin and destination, and expresses the way a person would like to go, if such a way were available’ (CATS, 1959, p. 39) instead of, for example, sitting in traffic on US Route 101 for 90 minutes. According to the data, this example includes just over 23,000 commuters from 585 different locations across the Bay Area. I've also done an animated line version and a point version, just for comparison.

Commuting connections for the Googleplex census tract

Animated spider diagram of flows to Mountain View

Just some Googlers going to work (probably) mp4
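For anyone wanting to build a spider diagram like this, a desire line is simply a straight segment from each origin centroid to the destination centroid. Here is a minimal sketch that writes them as WKT, which QGIS can load as delimited text; the tract IDs, centroids and worker counts are hypothetical and the function name is my own.

```python
def desire_lines(flows, centroids, dest_id):
    """Return (origin_id, workers, WKT line) for every flow ending at dest_id."""
    dx, dy = centroids[dest_id]
    lines = []
    for origin_id, flow_dest, workers in flows:
        # Skip flows to other destinations and origins with no known centroid
        if flow_dest != dest_id or origin_id not in centroids:
            continue
        ox, oy = centroids[origin_id]
        lines.append((origin_id, workers,
                      f"LINESTRING ({ox} {oy}, {dx} {dy})"))
    return lines


# Hypothetical tract centroids as (longitude, latitude)
centroids = {"A": (-122.42, 37.77), "B": (-121.89, 37.34), "G": (-122.08, 37.42)}
# Hypothetical flows as (origin tract, destination tract, workers)
flows = [("A", "G", 40), ("B", "G", 25), ("A", "B", 10)]

spider = desire_lines(flows, centroids, "G")
print(len(spider), "origins,", sum(w for _, w, _ in spider), "commuters")
for origin_id, workers, wkt in spider:
    print(origin_id, workers, wkt)
```

Counting the rows and summing the workers gives the same kind of headline figures quoted above (number of origin locations and total commuters) for whichever destination tract is chosen.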


Looking further afield now, to different parts of the Bay Area, I also produced animated dot maps of commutes of 30 miles or more for the other three most populous counties – Alameda, Contra Costa and Santa Clara. I think these examples do a good job of demonstrating the polycentric nature of commuting in this area since the points disperse far and wide to multiple centres. Note that I decided to make the dots return to their point of origin – after a slight delay – in order to highlight the fact that commuting is a two way process. The Alameda County animation represents over 12,000 commuters, going to 751 destinations, Contra Costa 25,000 and 1,351, and Santa Clara nearly 28,000 commuters and 1,561 destinations. The totals for the Bay Area as a whole are about 3.3 million commuters and 110,000 origin-destination links.

Alameda County commutes of 30+ miles mp4


Contra Costa County commutes of 30+ miles mp4


Santa Clara County commutes of 30+ miles mp4
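The out-and-back movement of the dots can be sketched as linear interpolation along each desire line, with a pause at each end. This is only an illustration of the idea and not how MMQGIS works internally; the function name, frame counts and coordinates are assumptions of mine.

```python
def dot_position(origin, dest, frame, travel_frames=20, pause_frames=5):
    """Position of a commuter dot at a given frame of an out-and-back loop."""
    cycle = 2 * (travel_frames + pause_frames)
    f = frame % cycle
    if f < travel_frames:                        # heading to work
        t = f / travel_frames
    elif f < travel_frames + pause_frames:       # waiting at the destination
        t = 1.0
    elif f < 2 * travel_frames + pause_frames:   # heading home again
        t = 1.0 - (f - travel_frames - pause_frames) / travel_frames
    else:                                        # waiting back at the origin
        t = 0.0
    (x0, y0), (x1, y1) = origin, dest
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# One dot travelling between two made-up points over a full 50-frame loop
for frame in (0, 10, 20, 35, 45):
    print(frame, dot_position((0, 0), (10, 20), frame))
```

Rendering one image per frame for every flow and stitching the images together gives the animation; the pause at the destination is what produces the "slight delay" before the dots head home.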


Finally, I’ve attempted something which is a bit much for one map, but here it is anyway; an animated dot map of all tract-to-tract flows of 30 or more miles in the Bay Area, with dots coloured by the county of origin. Although this gets pretty crazy half way through I think the mixing of the colours does actually tell its own story of polycentric urbanism. For this final animation I’ve added a little audio into the video file as well, just for fun.

A still from the final animation - view here

What am I trying to convey with the final animation? Like I said, it's too much for a single map animation but it's kind of a metaphor for the messy chaos of Bay Area commuting (yes, let's go with that). You can make more sense of it if you watch it over a few times and use the controls to pause it. It starts well and ends well, but the bits in the middle are pretty ugly - just like the Bay Area commute, like I said.

My attempts to understand the functional nature of polycentric urbanism continue, and I attempt to borrow from pioneers like Waldo Tobler and the authors of the Chicago Area Transportation Study. This is just a little map-based experimentation in an attempt to bring the polycentric metropolis to life, for a region plagued by gruesome commutes. It’s little wonder, therefore, that a recent poll suggested Bay Area commuters were in favour of improving public transit. If you're interested in understanding more about the Bay Area's housing and transit problems, I suggest watching this Google Talk from Egon Terplan (54:44).


Notes: the data I used for this are the 2006-2010 5-year ACS tract-to-tract commuting file, published in 2013. Patterns may have changed a little since then, but I suspect they are very similar today, possibly with more congestion. There are severe data warnings associated with individual tract-to-tract flows from the ACS data but at the aggregate level they provide a good overview of local connectivity. I used QGIS to map the flows. I actually mapped the entire United States this way, but that’s going into an academic journal (I hope). I used Michael Minn’s MMQGIS extension in QGIS to produce the animation frames and then I patched them together in GIMP (gifs) and Camtasia (for the mp4s), with IrfanView doing a little bit as well (batch renaming for reversing file order). Not quite a 100% open source workflow but that’s because I just had Camtasia handy. The images are low res and only really good for screen. If you’re looking for higher resolution images, get in touch. It was Ebru Sener who gave me the idea to make the dots go back to their original location. I think this makes more sense for commuting data.

The Cartographatron: Information and images on the 'Cartographatron' used in the Chicago Area Transportation Study (1959) are shown below.


From p.39 of CATS, 1959, Vol 1


From p.98 of CATS, 1959, Vol 1



Monday, 22 June 2015

Where is all London's new housing?

Some of my recent work on housing markets, mortgage lending and housing search has led me to consider the question of where, exactly, London's new housing is located. On a recent visit to King's Cross I was amazed by the sheer scale of development, particularly all the new flats. Because I've been working with the data for another project - and recently re-examined it for a project proposal which explicitly didn't focus on London - I thought it would be interesting to see whether my perception of the flats boom is based in reality. Of course it is! 

The maps below are based on all new build homes sold in London from 1 June 2010 to the end of April 2015 (the most recent data). During this time, according to HM Land Registry 'price paid' data, there were 42,938 transactions on 42,813 properties. This indicates that quite a few properties are not being picked up in this dataset - e.g. compare it to the completions data from the London Datastore. Nevertheless, the patterns and distribution of property types are revealing. 88.8% of transactions were for flats, 7.1% for terraced houses, 2.4% for semi-detached properties and 1.7% for detached houses.

All property types

Flats

Terraced houses

Semi-detached houses

Detached houses
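The property type percentages quoted above are simple shares of the new build transaction count. A minimal sketch of that calculation follows, assuming each record carries the price paid property type code (D, S, T, F, with anything else treated as other) and a Y/N new build flag; the dictionary keys and the sample records are made up.

```python
from collections import Counter

# Property type codes as used in the price paid data (assumed here)
PROPERTY_TYPES = {"D": "Detached", "S": "Semi-detached",
                  "T": "Terraced", "F": "Flat"}


def type_shares(transactions):
    """Percentage of new build transactions by property type."""
    counts = Counter(
        PROPERTY_TYPES.get(t["property_type"], "Other")
        for t in transactions
        if t["new_build"] == "Y"          # keep new build sales only
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up sample: 8 new flats, 1 new terrace, 1 new detached, 1 older flat
sample = ([{"property_type": "F", "new_build": "Y"}] * 8
          + [{"property_type": "T", "new_build": "Y"},
             {"property_type": "D", "new_build": "Y"},
             {"property_type": "F", "new_build": "N"}])

print(type_shares(sample))
```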


Clearly, the mix of new housing - and its relatively low volume - is something that many people have commented on before, but I've not seen many people look at the geographical distribution in this way. The important question arising from these maps - as ever - is why things are the way they are. That's something the maps can't tell us but they do provide an interesting starting point for debate. The discrepancy between the Land Registry data and data on completions is also not surprising owing to the way new build housing is sometimes sold, but it would be interesting to explore this more in future. If you do happen to have a few million quid to spare, good luck finding a new detached London house to live in!

A note on the maps... I've geocoded the price paid data using Ordnance Survey's Code-Point Open dataset, which can match sales to postcode units, rather than street addresses. The transparent bubble map is of course far from perfect but I've used it here to convey the scale and location of new housing, rather than to offer a precise fix. So long as it gives the impression of there being a massive splodge of newbuild flats in Central London that'll do for now. I am aiming to highlight the general scale and geography of development as a fairly quick experiment to see what might be done with the data. No plans to make it interactive (update: the best laid plans of... see map point datadump below).
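Matching sales to postcode coordinates is essentially a lookup join, and stray spaces in postcodes are a classic reason for rows failing to match. The sketch below strips all whitespace from both sides before joining; it is an illustration rather than the exact workflow used for these maps, and the postcodes, prices and coordinates are invented.

```python
def normalise_postcode(postcode: str) -> str:
    """Upper-case a postcode and remove every whitespace character."""
    return "".join(postcode.split()).upper()


def geocode(sales, codepoint):
    """Attach easting/northing to each sale by postcode; report the misses."""
    lookup = {normalise_postcode(pc): xy for pc, xy in codepoint.items()}
    matched, unmatched = [], []
    for sale in sales:
        xy = lookup.get(normalise_postcode(sale["postcode"]))
        if xy is None:
            unmatched.append(sale)
        else:
            matched.append({**sale, "easting": xy[0], "northing": xy[1]})
    return matched, unmatched


# Invented postcodes and coordinates, including a rogue double space
codepoint = {"ZZ1  1AA": (530_100, 183_500), "ZZ2 2BB": (525_600, 174_600)}
sales = [
    {"postcode": "ZZ1 1AA", "price": 450_000},
    {"postcode": "zz2 2bb ", "price": 390_000},
    {"postcode": "ZZ9 9ZZ", "price": 610_000},
]

matched, unmatched = geocode(sales, codepoint)
print(len(matched), "matched,", len(unmatched), "unmatched")
```

Keeping the unmatched list is the useful part: a suspiciously empty area on the map usually shows up there first.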



Update (1715, 22 June 2015): I fixed the glitches, which were caused by a rogue space here and there in the codepoint open file. Moral of the story? Build more houses (I think). Always a giveaway when there aren't many dots in Wandsworth. The numbers were correct all along though. Finally, I've added in some information from HM Land Registry on properties not included in the price paid data.


Data excluded from Price Paid dataset - link




Friday, 19 June 2015

Creating an English green belt atlas

UPDATE: I've fixed the glitches in version 1 and compiled a spreadsheet with the data. See new download at the bottom of the post.

I've blogged before about green belt, and also written about the underlying data in the press. Now that the data are open, I've finally got round to finishing a little project I meant to complete ages ago. I was prompted to do this during a recent visit to my department by Prof Bob Barr, a legend in the data and GIS worlds. Bob said it would be good to know what percentage of the land area in each local authority in England was covered by green belt. I agree, so here are the results of my analysis (using 2014 green belt data) from Version 1 of my English green belt 'atlas' (actually lots of individual images to keep the file size down). Here's a snapshot of one of the maps...

Green belt land in Cheshire East

And another, this time from Birmingham. You can see that I've dimmed the background so that you can get a sense of other green belt land in the areas I've mapped.

Birmingham green belt land

Finally, a few more from around the country...















There are some glitches in the data but my initial overview suggests the numbers are pretty accurate (see exceptions below). I hope that people might find these maps useful. If you want to use any of them, be my guest.

Download all the files here (154MB): Green Belt Atlas 2014 (version 2) (186 individual map files, plus spreadsheet)

Download just the spreadsheet: percent green belt figures for each of the 186 local authorities:

Contents of the spreadsheet (download above)


Warnings: A couple of issues with version 1... 1. The West Lancashire green belt area extends into the sea on the green belt shapefile available from DCLG, so the figures here are incorrect (working on a fix). 2. The figure for Ashfield is clearly wrong - not sure why, so I will fix that too. 3. Some areas have extremely low values and may not actually be in the green belt - it may instead be down to the accuracy of underlying data. 4. Mole Valley currently missing, am looking into why. UPDATE: I looked again at the original Green Belt shapefile from DCLG and found that Mole Valley had the same code as Ashfield, so I fixed that and there's now a map for Mole Valley. New Forest was also assigned two different codes, so I've fixed that too. Also, in the percent figure, I've excluded the part of the West Lancashire green belt that is not on land, so this gives an accurate figure now. You can see from the image below that part of the green belt goes into the Ribble Estuary.
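Code problems like the Mole Valley/Ashfield and New Forest ones can be caught with a quick consistency check on the attribute table before any mapping. A minimal sketch, assuming the table can be read as (code, name) pairs; the codes below are placeholders, not the real DCLG values.

```python
from collections import defaultdict


def code_conflicts(records):
    """Find codes shared by several names and names holding several codes."""
    names_by_code = defaultdict(set)
    codes_by_name = defaultdict(set)
    for code, name in records:
        names_by_code[code].add(name)
        codes_by_name[name].add(code)
    shared = {c: sorted(n) for c, n in names_by_code.items() if len(n) > 1}
    split = {n: sorted(c) for n, c in codes_by_name.items() if len(c) > 1}
    return shared, split


# Placeholder codes, for illustration only
records = [
    ("X01", "Ashfield"), ("X01", "Mole Valley"),      # one code, two places
    ("X02", "New Forest"), ("X03", "New Forest"),     # one place, two codes
    ("X04", "Cheshire East"),
]

shared, split = code_conflicts(records)
print("Codes shared by several authorities:", shared)
print("Authorities with several codes:", split)
```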



Technical stuff: I did this in QGIS 2.8 (open source GIS software) using the Atlas tool and a very heavy laptop, plus a bit of trickery I picked up here and there. I blogged about this before, with a little tutorial. Perhaps I should actually be using the term 'green belts', as Richard Blyth pointed out, but forgive me for this.
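For reference, the percentage figure for each authority is just green belt area divided by total authority area. The sketch below computes polygon areas with the shoelace formula, assuming coordinates in a projected system in metres and a green belt polygon already clipped to the authority boundary (the clipping itself needs GIS software); the shapes are invented rectangles.

```python
def polygon_area(ring):
    """Area of a simple polygon from its vertices, via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def percent_green_belt(authority_ring, green_belt_ring):
    """Share of the authority's area covered by green belt, as a percentage."""
    authority_area = polygon_area(authority_ring)
    if authority_area == 0:
        raise ValueError("authority polygon has zero area")
    return round(100 * polygon_area(green_belt_ring) / authority_area, 1)


# Invented shapes in metres: a 10 km square authority with a 10 km x 4 km belt
authority = [(0, 0), (10_000, 0), (10_000, 10_000), (0, 10_000)]
green_belt = [(0, 0), (10_000, 0), (10_000, 4_000), (0, 4_000)]

print(percent_green_belt(authority, green_belt), "% green belt")
```

Any green belt geometry that falls outside the land boundary, as with the part of West Lancashire in the estuary, has to be removed before the division or the percentage will be inflated.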

Wednesday, 3 June 2015

The beating heart of the City of London

I've had a rush of blood to the head so here I am with a second blog in two days. I'm getting some slides ready for tomorrow's Modelling World 2015 talk in London, which is all about visualising mobility (see below) so I wanted to add in a couple of new visuals on commuting in and out of London. Visualisation can often be just a lot of fancy graphics. This can be useful in itself for a number of reasons (e.g. capturing attention on an important issue, drawing attention to unusual patterns in a dataset) but since I've been working with commuting data in England and Wales I wanted to focus on flows into and out of the City of London. 



This interests me for a number of reasons, including i) commuting can play a significant role in wealth creation and it also needs to be understood in relation to how we measure GVA; ii) commuting is often very stressful and damaging to the individual - particularly long commutes - so I'm interested in the kinds of distances involved and this can be seen easily on a map; iii) commuting can often be environmentally damaging - though this isn't what I'm mapping here; iv) commuting in and around London is often about green belt hopping so I was curious to see how much commuting comes from beyond the metropolitan green belt; and v) commuting is a two-way process and affects places at both ends and in between due to travel. 

So, here's what I did. I took the MSOA-level commuting data for England and Wales (table WUEW01 here), used a bit of QGIS, extracted frames from QGIS using the MMQGIS plugin, then patched it all together in GIMP to create an animated gif. One for inflows, one for outflows and one for in and outflows (thanks to Ebru Sener for the idea). It might run a little slowly in the blog post in a browser but see below for the images. Just to clarify, I've only shown flows of 25 or more into the City of London. Those not familiar with the data should be aware that the 'City of London' refers to the small area in the centre of London and not the entirety of Greater London! An obvious point but one worth repeating in case anyone is confused. A Greater London map would have many more data points, covering most of England.
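The filtering step can be sketched as follows. It assumes the flow table has been read into (origin MSOA, destination MSOA, count) tuples, that the City of London is the single MSOA coded E02000001, and that the same threshold of 25 is applied to outflows as to inflows; all three are assumptions on my part, and the other area codes and counts are made up.

```python
CITY_MSOA = "E02000001"  # assumed MSOA code for the City of London


def city_flows(flows, city=CITY_MSOA, min_flow=25):
    """Split flows into those entering and those leaving the City."""
    inflows = [(o, d, n) for o, d, n in flows
               if d == city and o != city and n >= min_flow]
    outflows = [(o, d, n) for o, d, n in flows
                if o == city and d != city and n >= min_flow]
    return inflows, outflows


# Made-up flows: (origin, destination, number of commuters)
flows = [
    ("MSOA_A", CITY_MSOA, 140),   # kept as an inflow
    ("MSOA_B", CITY_MSOA, 12),    # below the threshold
    (CITY_MSOA, "MSOA_C", 30),    # kept as an outflow
    (CITY_MSOA, CITY_MSOA, 400),  # lives and works in the City, ignored here
    ("MSOA_A", "MSOA_C", 90),     # does not touch the City
]

inflows, outflows = city_flows(flows)
print(len(inflows), "inflow lines,", len(outflows), "outflow lines")
```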

Commuting flows (>=25) into the City of London



Same as above, but going back the way


The 'pulse' of the City of London

You should be able to get a better view of the images by clicking on them individually and if you want them to work more quickly try saving them to your own machine.




Tuesday, 2 June 2015

The Polycentric South East

Tweets yesterday from Michael Edwards and Joseph Kilroy reminded me of a map I produced last year in which I showed commuting patterns in South East England, minus London. I did this to get a sense of the polycentric nature of travel to work in the South East as this has been a topic of many previous studies - including the famous Hall and Pain book - but none (to my knowledge) using the 2011 Census data I mapped. The other reason for me blogging about this today is that I'm speaking on a similar topic at Modelling World 2015 this Thursday in London. Enough words, time for some maps, which I've refreshed for this week.

The first map below shows all commuting in the South East of England in 2011, without place names. As you can see, I've removed London from the equation, both in relation to travel to work flows and from the underlying map canvas. This gives a slightly different perspective than the one we're used to. The second map is the same as the first but I've added the names of local authorities in order to help identify places. Click any of the images to enlarge.


Commuting in South East England, 2011


Same as above, but with labels

Now, here's what it looks like when you add London back in... Kind of brings to mind astronomical metaphors, as hinted at in a previous study by the RTPI. I should add that the definition of a supernova is 'a star that suddenly increases greatly in brightness because of a catastrophic explosion that ejects most of its mass' so this might be stretching things slightly... Then again, if what people are saying about the displacement of the poor from London is true, this might actually be spot on.

The 'London Supernova'


Finally, I've produced a zoomed-in version closer to London where you can see some of the flows which go through/over the capital. I don't fancy that commute!