Tuesday 26 November 2013

A World of Open Access

I've recently been appointed as one of the Editors-in-Chief (with Alex Singleton) of a new Taylor and Francis/Regional Studies Association open access journal. For the two years prior to the launch of Regional Studies, Regional Science I was also heavily involved in the research and development of the journal. As we're about to start publishing papers, I thought I'd blog on the topic of open access more generally and include some interesting data from the Directory of Open Access Journals, the most authoritative source of open access information on the web. Those with an interest in open access will of course know all about DOAJ and the fact that there are now nearly 10,000 open access journals in the world. 9,990 to be exact (as of 26 November 2013).

However, few people have looked closely at the data on open access; probably because most people are still in debate about the merits and pitfalls of open access itself. The simple fact is that open access publishing is having a major impact on academia and the biggest journal in the world (by volume of papers) is now PLoS ONE, an open access title, as documented by several commentators including Heather Morrison at the University of Ottawa. Now for some data and charts...

The United States has over 1,200 open access journals and seven countries account for more than 50% of all open access journals. Journals come from a total of 124 nations (click charts to enlarge):

by country - full size chart


Open access publishing is currently in a major phase of expansion, but it's not new - for example, 22 new titles were launched in 1985. The peak year to date has been 2011 with 1,099 titles being launched:

by year started - full size chart


About two thirds of open access titles (66%) do not charge a publication fee, but a substantial minority (26%) do, including many of the best-known titles:

by publication fee - full size chart

Your knowledge of, and exposure to, open access will vary greatly by discipline. There are over 500 open access titles in Medicine and more than 160 in Political Science but only 3 in Geology (according to DOAJ):

by discipline - full size chart

The majority of open access titles publish only in English (5,538 or 55%), with the next closest language of publication being Spanish (621 or 6%):

by language - full size chart

The data these charts are based on can be downloaded directly in CSV format from the DOAJ FAQ page. Just scroll down and look for the section entitled "How can I get journal metadata from DOAJ?".
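For anyone who wants to rebuild the country chart from that CSV, here is a minimal Python sketch using only the standard library. The column header `Country` and any file name you pass in are assumptions on my part, so check them against the header row of the file you actually download.

```python
import csv
from collections import Counter


def count_by_column(rows, column):
    """Count how many journals share each non-empty value in `column`."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


def countries_covering(counts, share=0.5):
    """Return the top countries whose combined total first exceeds `share`."""
    total = sum(counts.values())
    running = 0
    top = []
    for country, n in counts.most_common():
        top.append(country)
        running += n
        if running > share * total:
            break
    return top


def load_counts(path, column="Country"):
    """Read a DOAJ-style CSV; `column` is a hypothetical header name."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_by_column(csv.DictReader(f), column)
```

Calling `countries_covering(load_counts("doaj.csv"))` on the downloaded file (the file name is hypothetical) should give the short list of countries that together account for more than half of all titles.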

There's a lot of activity in the field of open access but it is highly unequal in terms of its geographic, disciplinary and linguistic distribution. In the subject fields of Regional Studies and Regional Science (the subject areas for our new title) the open access landscape is considerably less crowded - particularly in relation to titles supported by major international learned societies and multinational publishing houses. Given this situation, we expect that Regional Studies, Regional Science (already known more commonly as RSRS) will play an important role in helping improve access to knowledge in regional research across a wide range of disciplines, with a focus on geography, planning and economics.

Look out for our first articles in December, by Andrew Beer (Adelaide, Australia) and Terry Clower (North Texas, United States), Sarah Ayres (Bristol, UK), John Gibney (Birmingham, UK) and Markku Sotarauta (Tampere, Finland).


Tuesday 10 September 2013

The Age of Buildings in the City of Chicago

Following my last post, on the geography of New York City, I've been exploring other building-level datasets to see what they can tell us about the fabric of the cities we live in. This time, I've focused on Chicago's 'Building Footprints' dataset. It's not nearly as detailed as New York's PLUTO data but it does include variables on (e.g.) number of floors and year built. As with the NYC data, it is not perfect but we can still make good use of it to understand the development of the city and its structure. I've mapped the city using number of floors as a proxy for height and shaded it by building age to produce the following overview (blues = older buildings, reds = newer).


Besides looking relatively interesting, the above graphic also reveals something about the phased construction of the City of Chicago and - possibly - something more about the data itself. As with the New York City PLUTO data, I produced a chart of the 'year built' column just to give me some idea of its distribution. It looks better than the New York City chart but I'm still not convinced it is 100% accurate (the year built data run from 1852 to 2010). 


Were there really nearly 15,000 buildings constructed in 2006 and only 78 in 2000? Possibly, but it would be good to know more about the accuracy of the data. In total there are 820,154 building footprints in the dataset and there is a range of different columns - which you can read more about in the metadata file. Once again, it's pretty cumbersome to work with in a normal desktop GIS setting but my machine can just about handle it.
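One quick way to sanity-check a 'year built' column like this is to count buildings per year and flag any year whose count jumps or drops sharply compared with the previous recorded year. The Python sketch below does that with the standard library only. The factor of 10 is an arbitrary threshold I've chosen for illustration, and you would need to pull the year values out of the attribute table yourself.

```python
from collections import Counter


def year_counts(values, first=1852, last=2010):
    """Count buildings per year, skipping blanks and out-of-range values."""
    counts = Counter()
    for v in values:
        try:
            year = int(float(v))
        except (TypeError, ValueError, OverflowError):
            continue  # blank, text, NaN or infinite entries
        if first <= year <= last:
            counts[year] += 1
    return counts


def suspicious_years(counts, factor=10):
    """Flag years whose count differs from the previous recorded year
    by at least `factor` (in either direction)."""
    flagged = []
    years = sorted(counts)
    for prev, cur in zip(years, years[1:]):
        a, b = counts[prev], counts[cur]
        if max(a, b) >= factor * min(a, b):
            flagged.append(cur)
    return flagged
```

A flagged year is not necessarily wrong, of course, but it is a good place to start asking questions about how the data were compiled.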


Thursday 5 September 2013

The Geography of New York City

In June 2013, the city of New York released as open data one of the most detailed, fascinating and user-friendly datasets ever. The Primary Land Use Tax Lot Output (PLUTO) dataset is essentially a record of every parcel of land in the city, what is on it and who owns it - but this is only part of it. See the full PLUTO data dictionary for more on this. Wired said the mapping elite were 'drooling' over it and there have been a few impressive visualisations already but I was keen to look at the data in more detail and then map land use patterns and get to grips with the dataset more generally. So, as an initial experiment, I mapped all 11 land use categories for the whole city in 3D (PLUTO has a field for number of floors so the maps below are extruded on this basis). Click on an image to enlarge and then flick through the images to compare land uses.












I've also put these images in a PowerPoint file in case anyone finds it useful... These visualisations in many ways tell us what many New Yorkers already know but the PLUTO data (n.b. I've used the ready-made MapPLUTO shapefile) offers everyone for the first time the opportunity to explore this open data and examine the geography of New York City as a whole in much more detail. 

Some further information about the dataset. There are 857,879 rows in the complete dataset and the MapPLUTO version has 85 fields so if you want to work with it then you'd better have a good computer. When you go to the download page you'll notice that the PLUTO dataset is available as one csv file while the MapPLUTO data is split into the five boroughs of New York City.

This is an amazing resource but it is not perfect - as the Department of City Planning recognise when they say 'PLUTO is being provided ... for informational purposes only'. The data are only as good as the sources, and sometimes when you look closely things seem a little strange. For example, here's what you get when you chart the YearBuilt column for all buildings constructed since 1800 (click to enlarge). It's hard to tell but I reckon that from about 1980 onwards the YearBuilt column is pretty accurate but before that it is something of a best estimate - though I'd be happy to be proven wrong on this!


I'll probably come back and explore this again soon but that's all for now...


Footnote: 0.4% of tax lots and 1.0% of land remain unclassified. I produced the 3D maps in ArcScene and then annotated them in GIMP. I've just done these to explore at a basic level the characteristics of the dataset and the geography of land use in New York City.

Wednesday 28 August 2013

Natural Earth for GIS data

People often ask me where they can get GIS data to use for projects, analysis and general mapping. In the UK we now have OS OpenData, which is very nice and very detailed. There's also a new GIS portal from the Office for National Statistics - built using the Geoportal Server. Other GIS datasets are available, from organisations like Natural England, but in this post I thought I'd highlight the excellent - and totally free - Natural Earth site which is very well known in the geodata community but perhaps not more widely. It really is absolutely fantastic.


A little bit of technical information....

  • Natural Earth Vector comes in ESRI shapefile format, the de facto standard for vector geodata. Character encoding is Windows-1252.
  • Natural Earth Raster comes in TIFF format with a TFW world file.
  • All Natural Earth data use the Geographic coordinate system (projection), WGS84 datum: +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs

Whether you're looking for data to make a general world map or a more detailed local area map you will find what you are looking for here. It's not as detailed as OS data for Great Britain but then many GIS users don't need that level of accuracy. Another screenshot below showing the various download options...
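If the string of `+key=value` items in the technical notes above looks cryptic, it is just a list of parameters describing the coordinate system. Here is a small Python sketch that splits such a string into a dictionary so you can read it more easily. The function name is my own and this is not how any GIS library parses these strings internally; it is only meant to show the structure.

```python
def parse_proj_string(text):
    """Split a '+key=value +flag' style string into a dictionary.

    Items without '=' (such as '+no_defs') are stored as True.
    """
    params = {}
    for token in text.split():
        if not token.startswith("+"):
            continue  # ignore anything that is not a '+' parameter
        key, sep, value = token[1:].partition("=")
        params[key] = value if sep else True
    return params


natural_earth = parse_proj_string(
    "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
)
```

Here `natural_earth["proj"]` is `"longlat"`, which tells you the data are stored as plain longitude and latitude on the WGS84 datum rather than in a projected system.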


Finally, taking my inspiration from the worst website in the world, and Ken Field's blog - I've made an absolutely awful map using some of this data*. Can anyone do a worse one?


*I actually did this as part of my work in developing some new GIS modules. I tried to pack in as much bad practice as possible just as an extreme example of what not to do.





Friday 16 August 2013

Mapping flows in ArcGIS

This short post is about the process of flow mapping in ArcGIS and not really about the end results - though the maps are quite interesting. I've done quite a bit of flow mapping in the past and am now getting ready to work on the next set of Census flow data in the UK (which should be out in November) so I've been experimenting with some tools. I've written about this in the past in papers in Computers, Environment and Urban Systems and also Environment and Planning B but those papers are a bit long-winded! Other people have produced beautiful flight path maps so I thought I'd experiment with the same data using the relatively new ArcGIS XY to Line tool in version 10.0 (it can be found in ArcToolbox - Data Management Tools - Features - XY to Line at the bottom of the list of tools). For more on other methods and previous iterations of this kind of thing take a look at the work of Nathan Yau, Michael Markieta or James Cheshire.


For anyone wanting to map flows in ArcGIS, Michael Markieta's tutorial is probably a good place to start but be prepared for things to go awry in ArcGIS... When I mapped the 59,000 or so flight paths in the map above using XY to Line and one single dbf file (or csv, etc. - it makes no difference) the resulting shapefile only contained 16,066 rows. This happened every time I tried it and a couple of times my shapefile had only 73 rows. Another time it had ~14,000. That's why Markieta recommends splitting the file up - although I just cut it up into chunks of 16,000. Interestingly, I ran into exactly the same problem with my CEUS paper a few years back using Glennon's flow data model tool - though the limit was about 32,000 before it cut off. 
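For what it's worth, the chunking step is easy to script rather than do by hand. The Python sketch below splits a CSV into numbered part files of 16,000 rows each, repeating the header row in every part. The output naming pattern (`_part1.csv`, etc.) is just my own convention and the sketch assumes the input has a single header row.

```python
import csv
import os
from itertools import islice


def chunked(rows, size=16000):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_csv(path, size=16000):
    """Write `path` out as numbered part files and return their names."""
    base, _ = os.path.splitext(path)
    written = []
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return written  # empty file, nothing to split
        for i, chunk in enumerate(chunked(reader, size), start=1):
            out_path = f"{base}_part{i}.csv"
            with open(out_path, "w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # repeat header in each part
                writer.writerows(chunk)
            written.append(out_path)
    return written
```

With around 59,000 rows this gives three full chunks and one smaller one, each of which can then be run through XY to Line separately and merged afterwards.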

Another very annoying feature of XY to Line (for me at least) is that when you choose the Great Circle option under 'Line Type' it takes much longer to compute and the resultant shapefile is enormous. The shapefile for flight paths in the above map is about 10MB whereas the great circle version was over 450MB for one 16,000 chunk alone. Not sure if anyone else has run into this but it doesn't seem like a very efficient way of doing things! [Edit - as @baeing has reminded me, it's because shapefiles don't support curves - though geodatabases do.]
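To see why the great circle version balloons in size, it helps to remember that a curve stored without true curve support has to be approximated by lots of short straight segments. The Python sketch below generates intermediate points along a great circle using the standard spherical interpolation formula. The default of 50 segments per line is purely illustrative - I don't know how many vertices ArcGIS actually writes.

```python
import math


def great_circle_points(lon1, lat1, lon2, lat2, n=50):
    """Return n + 1 (lon, lat) pairs, in degrees, along the great circle.

    Assumes a spherical earth and n >= 1; the result is not meaningful
    for exactly antipodal endpoints, where the route is ambiguous.
    """
    phi1, lam1 = math.radians(lat1), math.radians(lon1)
    phi2, lam2 = math.radians(lat2), math.radians(lon2)

    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    d = 2 * math.asin(math.sqrt(min(1.0, h)))
    if d == 0:
        return [(lon1, lat1)] * (n + 1)

    points = []
    for i in range(n + 1):
        f = i / n  # fraction of the way along the route
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        x = a * math.cos(phi1) * math.cos(lam1) + b * math.cos(phi2) * math.cos(lam2)
        y = a * math.cos(phi1) * math.sin(lam1) + b * math.cos(phi2) * math.sin(lam2)
        z = a * math.sin(phi1) + b * math.sin(phi2)
        lon = math.degrees(math.atan2(y, x))
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        points.append((lon, lat))
    return points
```

A straight line needs only 2 vertices, so 16,000 flows means 32,000 coordinate pairs. At 51 vertices per line the same 16,000 flows need 816,000, and a finer approximation would need many more.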

Once I had my complete shapefile I moved to QGIS, added in a world layer from Natural Earth and then experimented a little with symbology. I also experimented with different styles and projections to produce some of the maps below. That's all for now - I just hope ESRI are able to improve upon the current version of XY to Line because when it does work it is really fast (on my machine at least) and straightforward.

Very similar to above, minus text

Short haul, different projection

Short haul, different projection, borders

Slightly different symbology

And, yes, I know that flights from Australia or New Zealand typically go over the Pacific rather than the long way round! I'm just showing the XY to Line outputs as they are in this post.

Thursday 8 August 2013

Employee Growth in London, 2001 to 2012

The Office for National Statistics has released a new dataset on the number of employees across London's 983 MSOAs. The data are sourced from the Inter-Departmental Business Register (IDBR) and they reveal some interesting trends. Naturally, I had to do a 3D map of this, so take a look at the image below for the obvious growth points...

Massive absolute growth in the City of London and Canary Wharf - and some other central MSOAs in Camden, Southwark and Westminster, but also massive growth in employment in Uxbridge.

Click on the image to enlarge

The number of employees in the City of London increased by 36%, compared to 267% in Canary Wharf and 170% in an Uxbridge MSOA. By contrast, one part of Islington had 78,600 employees in 2001 but only 57,000 in 2012 - a drop of 27%. This area of Islington is immediately north of the City of London and includes Clerkenwell and Finsbury.
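For anyone checking the sums, the percentages above are simple percentage changes between the 2001 and 2012 counts. A tiny Python sketch (function names are my own):

```python
def percent_change(start, end):
    """Percentage change from `start` to `end`."""
    return (end - start) / start * 100


def growth_multiplier(pct):
    """How many times larger the end value is for a given % increase."""
    return 1 + pct / 100


# The Islington MSOA figures quoted above.
islington = percent_change(78600, 57000)
```

The Islington figure works out at about -27.5%. It is also worth remembering that a 267% increase means the 2012 figure is roughly 3.7 times the 2001 figure, not 2.7 times.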

If you're interested in this kind of thing it's definitely worth looking at the original dataset.


Thursday 25 July 2013

Stratospheric London

I've recently been looking at the Land Registry's price paid data for England and Wales. This is now open data: the Land Registry recently published all data going back to 2009 and has promised to release all price paid data going back to 1995 by November 2013. I've been looking at the period from January to May this year to see - at a very basic level - what is happening in terms of the total volume and value of sales across England and Wales. As expected, this simply tells us that the London property market is 'stratospheric', as the Evening Standard recently reported. A few summary stats and charts before I go on holiday...

  • Total value of sales in England and Wales between January and May 2013: £51 billion.
  • Total value of sales in Greater London: £14 billion (28% of the total).
  • Total value of sales in Westminster: £1.6 billion.







Explore the data and charts for yourself in the spreadsheet I put together here. The original data come from the Land Registry's price paid data. I've used the original source files to produce some summary stats by Local Authority, City/Town and County. If you want to know more about the data, make sure you read the Land Registry's FAQs.

Data produced by Land Registry © Crown copyright 2013.

This data covers the transactions received at Land Registry in the period 1 January 2013 to 31 May 2013. © Crown copyright 2013.

Monday 10 June 2013

The London Problem Revisited

I've been thinking about the matter of London and the kind of 'two-speed Britain' stories which have been in the press recently. The idea is that London's continued growth and prosperity - in contrast to the rest of the UK - is a problem. The BBC covered this via Stephanie Flanders in March and the Guardian did a piece on it in May this year. The Economist even did a special supplement on London last year, entitled 'On a High' in which they covered a range of issues, not just economic growth - e.g. a map and some of my work on deprivation on p. 7 of the pdf. These kinds of stories always generate a lot of debate (see the comments on the Guardian piece for an example) of the 'London vs. the rest' variety but I suppose the thing that sticks in my mind is that none of this is really new. The precise nature of London's recent resurgence is perhaps unique but it's hardly a new problem. I was reminded of this last week at the Centre for Urban Policy Studies 30th Anniversary conference in Manchester when Sir Brian Briscoe quoted from the Barlow Report of 1940 (photo below - yes, I actually have a real copy to hand!). 


This in turn reminded me of the 1985 edition of the Planner magazine that I've had on my desk for some time. Why do I have this? Because it's a special issue to commemorate the life of former Chief Planner and University of Sheffield Professor JR James, who featured on the front cover of the April 1985 edition, five years after his untimely death in September 1980. In this edition, leading planners of the time gave a series of lectures on seminal reports - Peter Hall on The Barlow Report, Gerald Wibberley on The Scott Report and HR Parker on The Uthwatt Report.





Peter Hall's reading of Barlow is fascinating for many reasons but I found it interesting that he chose to pick out the exact same quote that Sir Brian Briscoe did last week: "the continued drift of population to London and the Home Counties constitutes a social, economic and strategical problem which demands immediate attention" (p. 202 - see below for copy of the original, in paragraph 5).


These words were written in very different times (e.g. it was published shortly after the outbreak of WW2 and in fact some of the content had to be changed to reflect this) but the over-arching message of a dominant London and a lagging rest of Britain resonates today. The policy solutions proposed in 1940, however, are rather different to those suggested today. For example, there was a strong emphasis on regionalism in the Barlow Report - unlike today. Further, there was a real concern about what could be done to address this nationally problematic uneven development (well, at least in theory) - e.g. Part IV from p. 185 is entitled 'Remedies ... And to Report what Remedial Measures, if any, should be taken in the National Interest'. So, this is not just an historical curiosity. On p. 197 we have more on 'methods of decentralisation or dispersal', which many today would like to see.

As for the man himself, Sir Anderson Montague-Barlow (image at bottom of page), I'm not sure what he'd make of the current rhetoric of a 'two-speed' UK but I'm sure he'd recognise that it's nothing new. The question that remains in my mind is whether this London boom will continue indefinitely and what the consequences of that might be. Some final thoughts for now...

1. Lost in much of this two-speed UK talk is the fact that inequality and poverty in London are quite extreme. That's what I was getting at in my very small contribution to the Economist supplement last year and in another Guardian piece. This is something that Ben Hennig has been looking at in more detail in recent times - see here for more.

2. Much less has been said about why this London/rest issue might be a problem and - more importantly - what might be done about it. Certainly, I don't think we'll see a Barlow-style Royal Commission about this any time soon.

3. What about the contribution of the rest of the UK to London's growth - e.g. in sending graduates, commuters, funding for infrastructure, and public transport - to name just four factors? The question here might be to what extent is London's growth down to the fact that it has a Barlow-esque gravity? Easy to answer but harder to quantify.

4. Are we witnessing a 'London supernova'? Will this bright shining star gradually fade? Fancy rhetoric, but worth thinking about, given the cyclical nature of boom and bust. See p. 52-54 of this RTPI report for more.

5. And, finally, we ought to look at old books and magazines more! 


Friday 3 May 2013

HS2 geodata - for download

Yesterday I wrote a short post on the Guardian's Datablog about my difficulties getting hold of the route data for the proposed routes for the new high speed rail lines in England. Coincidentally (or maybe not) HS2 responded to my request at almost exactly the same time as the piece appeared online. Anyway, sometimes it does take time for public bodies to respond to requests so my real question was why the shapefiles were not available for download, given that they are available under the Open Government Licence. I have a few ideas about why this might be but it would be good to have some information on this from HS2, though maybe they're too busy with other things! Clarity on this issue might, however, reduce the likelihood of data conspiracy theories and enhance transparency.


Anyway, enough about that. So that other people don't have the same wait as me to get hold of the GIS data I've made them available here via the link below. A few important points to bear in mind...

1. The Phase 1 (London to West Midlands) route is the 'post-consultation' route from January 2012.
2. The Phase 2 (Leeds and Manchester) routes are the 'initial preferred routes' from January 2013.
3. There is an interactive map of the Phase 1 route on the HS2 web pages, which is quite useful.
4. Users of the data need to remember to acknowledge the source.
5. It's not my data - I'm just making it available.
6. You can also get these files from Barry Cornelius, but not - as yet - from data.gov.uk
7. The route data available for download here doesn't necessarily reflect the precise location of where the train lines will be built - particularly for Phase 2. 


HS2 shapefiles, as of 2 May 2013


Wednesday 1 May 2013

Population 'explosion' in English city centres

A paper I wrote about English urban policy and the 'return' to the city is now out in Cities, so I thought I'd blog about it. It's just something I wrote after the first release of small area data from the 2011 Census and the results are not entirely surprising to those with a knowledge of these things but the scale of change over the decade from 2001 to 2011 was pretty big, particularly in the case of Manchester, which I've written about previously. I also reflect upon wider international issues associated with 'reurbanization' (with a z because it's a US journal!) so I think that although the focus is on England it should have wider resonance.


I discuss this growth in the context of New Labour's urban policies, and particularly those which emerged from the Urban White Paper of 2000. Although some of the aims were achieved (e.g. making city centres look nicer, with better design), I conclude that the changes have been mostly superficial and that perpetually high levels of inner city deprivation in cities which were the main foci for urban policy during this period do not represent a very positive legacy, despite the 'success' of getting people to move back to the city. The fact that this was picked up immediately by @urbandata suggests that these themes are also relevant in the United States, and beyond. It's a short paper but I hope to follow up on it at some point* with more detailed data on tenure, etc. to dig a little deeper.


*This is normally code for 'it will never happen'!

Monday 8 April 2013

Deprivation in your area - search and zoom

After making a new website when the Scottish Index of Multiple Deprivation 2012 was released, I've been looking at some of my old English IMD web pages and tools and so have decided - in anticipation of the next update to the IMD - to add a 'search and zoom' tool so that anyone who is interested can find out what deprivation is like in their area - or any other area of England you are interested in. Like my previous versions of this kind of thing it's based on Google Maps but in the version seen in the screenshot below you can simply enter a place, neighbourhood, postcode or address (or even an organisation, like Centre for Cities) into the search box and it will go straight there after clicking 'search'. Click the image or the link below to use the tool.


As usual with these kinds of maps, if you click on an area the pop-up will give you more information. The next update of the IMD was to have been in 2013 but I'm not sure if that is still going to happen. However, I've updated this tool just in case so that I can add in the next update whenever it becomes available. The Scottish version of this tool is on this page, but I have not yet done the same for Wales or Northern Ireland.

Thursday 4 April 2013

Child poverty in the United Kingdom - for small areas

As I mentioned in my last post, I'm currently doing some work on child poverty and social mobility. I did this for English parliamentary constituency areas in 2011 but the problem I had then was that I didn't have a dataset that covered the entire UK at the small area level. Since then, however, I've been looking more into the HMRC's child poverty statistics and their revised local child poverty measure (linked to the Child Poverty Act 2010). To cut a long story short... I've mapped child poverty at the small area level for the whole UK - for those areas where child poverty is at 33.3% or higher (the UK average is 20.6%). The map is zoomed to London but you can zoom around and use the full screen version. Click on an area to find out more.




The highest value in the UK is in part of Springburn in Glasgow - with a value of 83.3% of children in poverty according to the HMRC definition. The highest value for anywhere outside Scotland is 75.0% in part of east central Manchester. 

Some technical information... The small areas used are Super Output Areas in England, Wales and Northern Ireland. In Scotland, Data Zones are used. These have around half the population of Super Output Areas (around 800, compared to 1600). Using these smaller areas means that Scottish areas dominate the list of the highest child poverty neighbourhoods in the UK. In the HMRC definition, 'children' are all dependent children aged 0 to 19. They also provide data on child poverty rate for children under 16 and this figure is usually about a half to one percentage point higher (e.g. 21.1% for under 16 compared to 20.6% for the whole UK). The data are from the most recent release - 2010.

Thursday 28 March 2013

Toddlers, teens and yoofs - where are they?

I've been doing some work on child poverty and social mobility recently, and part of this has involved looking at simple stats like the percentage of people in each area who are under 18. This has given me a better insight into which areas might be most affected by children's issues and policies. I've been looking across Great Britain but focusing on England and Wales and so I thought I'd just post a couple of maps showing the % of children across London Boroughs and the core cities - see below.


The results are quite striking - and particularly interesting I think are the differences between and within the core cities. What are also very interesting are the spatial divides in terms of which areas have a high concentration of children and which don't. These data overlap with many other datasets - which is partly what I'm looking at in my research right now - but this post is simply about the two maps.




One academic-related thought on this, though. I've been looking at the literature on neighbourhood effects over the past few years and following with interest the debate on where people 'choose' to live. This got me thinking about who might not get to choose, and children is one obvious group. As you can see from the maps there are large areas where 25% or more of the residents are children, and some areas where it is nearly 50%. Also interesting for me is the extent to which city centres in (e.g.) Liverpool, Birmingham, Manchester and Leeds are largely child-free - possibly linked to the concentration of students and new city centre living. Anyway, that's enough for now.

Technical note: this is done at the lower super output area level and for the local authority boundaries of the English core cities, plus all of Greater London. The data are from 2011 Census table KS102EW.

Friday 22 March 2013

The Highlands and London: lots of space vs. lots of people

I've been doing some work recently looking at local authority data from across the UK. One thing that always fascinates me about this is the differences between areas at a very basic level, such as the different sizes and populations of local authority and other sub-national areas. The two examples which always stand out in my mind are the Highland council area (because that's where I'm from) and Greater London (because it's so big and dynamic). The other reason the Highland council area stands out is because it is by far the biggest local authority in the UK. It's bigger than both Wales and Northern Ireland, yet it only has 232,000 people. London, on the other hand, would fit into the Highland council area about 20 times over yet it has 8.2 million people - as you can see from the map (click to enlarge - and see full screen here).


The map above shows Greater London and the Highland Council area mapped to the same scale, with London moved to the middle of the Highlands (not expecting this to become official Scottish Government policy any time soon, but you never know). Greater London is about 40 miles across but it is totally dwarfed by the Highlands. What is the relevance of all this? Well, it is interesting to think about this in the context of local government and the challenges they currently face.

Within Greater London you have 32 Boroughs plus the City of London all performing a variety of functions and in the whole of Scotland there are 32 Council Areas (which esteemed blogger Peter Matthews can list alphabetically off the top of his head). In the context of the ongoing fiscal crisis, there has been some talk of merging all local authorities in Scotland. I doubt it is going to happen but you never know. A Scotland on Sunday article on it from late 2012 seems to suggest there is some truth to the idea.


Thursday 31 January 2013

Which world map projection is correct?

A recent Google map puzzle got me thinking once again about map projections and the ways in which they can be used for web maps. Google maps uses a variation on the Mercator projection which, as we all know, dates from 1569. Most people who are into maps know that this particular projection makes areas close to the poles look bigger than they actually are. This distortion is an unavoidable problem when trying to project the surface of a big round object like the earth on to a flat surface. So, which world map projection is correct? The answer is of course that none are 'correct' or possibly that there can be no 'correct'. Every projection compromises something. To demonstrate this I've mapped the world nine different times in the image below to show the impact of using different projections.
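To put some numbers on the Mercator distortion, here is a short Python sketch using the standard spherical Mercator formulas. It assumes a perfectly spherical earth of unit radius, which is a simplification, and the function names are my own.

```python
import math


def mercator_y(lat_deg, radius=1.0):
    """Northing of a latitude on a spherical Mercator map."""
    phi = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + phi / 2))


def mercator_area_inflation(lat_deg):
    """Factor by which Mercator exaggerates area at a given latitude,
    relative to the equator. Linear scale is 1 / cos(latitude), so
    area is inflated by the square of that."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2


# Area exaggeration at a few latitudes (the poles themselves are infinite).
inflation = {lat: mercator_area_inflation(lat) for lat in (0, 30, 60, 80)}
```

At 60 degrees north (roughly the latitude of Shetland or Oslo) areas are drawn about four times too big relative to the equator, and the factor keeps growing towards the poles.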


I've pasted the individual images below and you should be able to flick through them for comparison by clicking on one and then clicking to see each successive image. Two that I think are particularly effective are the Robinson and the Winkel Tripel. These have both been used by National Geographic. The former was their default projection until 1998 when they switched to Winkel Tripel, which was originally developed by Oswald Winkel in 1921. Equal area projections are of course used widely (e.g. Gall-Peters) but some suffer from extreme flattening at the poles, like the Behrmann one below. No matter what, none are really 'correct'.










Now I should get back to marking reports!