Train times v. house prices: the commuter belt, on a graph

Posted by on Oct 13, 2011 in life-hacking, transport, visualisation | 11 Comments

We’re house-hunting. And for me, like most coders, house-hunting involves lots and lots and lots of screen-scraping.

As well as crawling Rightmove listings, I’ve been looking at transport and house-price data. Specifically, I’ve scraped travel times to London by train versus house prices, to examine the theory that houses get much cheaper once you escape the commuter belt.

To test this, I gathered mean journey times to London from TrainTimes for every railway station in Great Britain, and mean asking prices for 3-bed houses near each station from Nestoria. Here’s the graph of all stations, with a moving-average line added:

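[Graph: mean asking price for a 3-bed house near each station, plotted against mean train journey time to London, with a moving-average line]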


Thoughts on the graph

  • The sharp initial drop, up to about 30 minutes, must show just how much extra you pay to live in zone 2 rather than zone 6 of London itself. Yikes.
  • Prices do start dropping more steeply about 70 minutes from London, which probably marks the edge of the commuter belt.
  • Once you get to about 150 minutes, prices flatten. Except…
  • …There’s a distinct “Edinburgh bump” at about 270 minutes from London, which I wasn’t expecting at all.
  • There are a few high outliers, presumably where a mansion has skewed the average price. (It’s difficult to tell from the Nestoria data.)
  • But there’s a striking baseline below which house prices near a station never fall. Actually, pretty much the closest thing to an outlier on the downside is poor old Corby.

About the data

For clarity, the graph excludes London stations, and the long tail of stations that are 400-900 mins from the capital, mostly in the Scottish Highlands.

This is roughly what I did:

  • Find and geocode the 2500+ stations in England, Scotland and Wales, from this Guardian version of Office of Rail Regulation station usage data.
  • For each station, find the mean travel time for the first 5 journeys to London after 8am on a weekday, scraped from TrainTimes, Matthew Somerville’s accessible version of National Rail Enquiries.
  • For each station, find the mean asking price for a 3-bed house within 2km in the past 6 months, from the Nestoria API. (Nestoria shows listing prices, rather than transaction prices like Zoopla, so it may contain duplicates and is probably less accurate – but Zoopla isn’t granular enough to search just for 3-bed houses.)
  • Plot the moving average of price, with a window of 100 data points.
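The steps above can be sketched roughly as follows. This is not the original scraper code; it is a minimal Python sketch under stated assumptions. The `Journey` and `Listing` record types and every function name are hypothetical. Listings are assumed to have already been limited to a 2 km radius by the search itself, "the past 6 months" is approximated as 183 days, and "a frame of 100 datapoints" is read as a trailing window that only reports full windows. The plotting step also applies the exclusions described earlier (London stations, and stations 400 minutes or more from London).

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from statistics import mean


@dataclass
class Journey:
    """One scraped train journey (hypothetical record type)."""
    departs: datetime  # departure time from the local station
    arrives: datetime  # arrival time at a London terminus


@dataclass
class Listing:
    """One scraped property listing (hypothetical record type)."""
    price: int       # asking price in pounds
    bedrooms: int
    listed_on: date  # date the listing was published


def mean_journey_minutes(journeys, n=5, earliest=time(8, 0)):
    """Mean duration of the first n journeys departing at or after `earliest`.

    Assumes all journeys are on the same weekday."""
    first_n = sorted(
        (j for j in journeys if j.departs.time() >= earliest),
        key=lambda j: j.departs,
    )[:n]
    if not first_n:
        return None
    return mean((j.arrives - j.departs).total_seconds() / 60 for j in first_n)


def mean_asking_price(listings, today, bedrooms=3, days=183):
    """Mean asking price of `bedrooms`-bed listings from the last `days` days.

    Assumes listings were already limited to 2 km of the station by the search;
    183 days stands in for "the past 6 months"."""
    cutoff = today - timedelta(days=days)
    prices = [listing.price for listing in listings
              if listing.bedrooms == bedrooms and listing.listed_on >= cutoff]
    return mean(prices) if prices else None


def moving_average(values, window=100):
    """Trailing moving average; emits one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages, total = [], 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            averages.append(total / window)
    return averages


def plot_series(stations, max_minutes=400, window=100):
    """Sort stations by journey time; return (xs, ys, moving average of ys).

    `stations` holds (name, minutes, price, in_london) tuples. London stations,
    stations with missing data, and those `max_minutes` or more out are dropped."""
    points = sorted(
        (minutes, price)
        for _name, minutes, price, in_london in stations
        if not in_london and minutes is not None and price is not None
        and minutes < max_minutes
    )
    xs = [minutes for minutes, _ in points]
    ys = [price for _, price in points]
    return xs, ys, moving_average(ys, window)
```

Sorting by journey time before averaging is what lets the moving average trace a curve along the x-axis; a centred window would be an equally reasonable reading of the original description.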

This is the code I used (on GitHub), and the resulting raw data (in Fusion Tables). The next logical step would be to plot distances against house prices, I guess. If I’ve missed anything, let me know.
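For the distance-versus-price idea, the straight-line (great-circle) distance from each geocoded station to London could be computed with the haversine formula. This is a minimal sketch, not part of the original analysis: the reference point for "London" (roughly central London, near Trafalgar Square, at 51.5074° N, 0.1278° W) and the function names are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed reference point for "London": roughly central London (lat, lon).
LONDON = (51.5074, -0.1278)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def distance_to_london_km(lat, lon):
    """Straight-line distance from a geocoded station to the London reference point."""
    return haversine_km(lat, lon, *LONDON)
```

Each station’s distance could then take the place of journey time on the x-axis. Straight-line distance ignores how fast or direct the line is, which is exactly the difference such a plot would expose.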

And with that, back to the screen-scrapers, the mortgage brokers and – God help us – the estate agents.





11 Comments

  1. Duncan Stott
    October 14, 2011

    Very interesting graph.

    Are you able to calculate the Spearman rank correlation coefficient on this data?

  2. Tim M
    October 14, 2011

    Nice move. You could also scrape London Underground info using the Travel Planner on the TfL website. For example, you have Harrow and Chorleywood, but the Met Line covers the same area, serving many more towns (the Met Line also publishes a timetable, but other Tube lines don’t). And you’d get bus journeys too!

  3. Adam Trickett
    October 14, 2011

    First I’d like to say that this is very cool.

    Second, what it shows is interesting. Take three stops on the same line: Hook, Basingstoke and Overton. Of the three, I think locals would say that Overton is the most desirable; it has fast trains to London, but it’s the furthest from London. Hook is the closest to London and has some nice outlying houses, but it isn’t as desirable as Overton and it’s on the slower stopping service to London. Basingstoke is the town in the middle: it has more houses, more trains and lots of fast trains, BUT it’s Basingstoke (locally known as Boringstoke or Basingrad), and your data does show that house prices there are noticeably lower than in the two villages either side…

  4. Nick Barnes
    October 14, 2011

    See also: Mapumental. http://demo.mapumental.com/

  5. Clare
    October 16, 2011

    That is very useful.

    Stoke-on-Trent comes out quite well down there with Wales and the suburbs of Birmingham, but since you can get 3 bedrooms from £35,000, and Virgin Trains are faster at 93 minutes than the slower but very cheap London Midland, I hope we’ll see you in a lovely spacious terrace in bohemian Burslem or pretty Penkhull soon :)

  6. Jk
    October 16, 2011

    Nice, but I don’t think people in Edinburgh commute to London. It has its own economy.

  7. Paul Bradshaw
    January 7, 2012

    I particularly liked the 88k house at Duddeston, for a 109-minute journey to the capital. Would love to see the marketing blurb to frustrated metropoles…

  8. Rachel Pearce
    January 28, 2012

    Love it! Very interesting. A further modification (not sure how…) might reflect fares. For example, I live between Chesterfield and Matlock. Trains from Matlock are much slower and house prices a bit higher, but train fares are also cheaper from there, and parking at the station is also cheaper in Matlock.

  9. Richard Wallis
    February 3, 2012

    A very impressive demonstration of the value gained from mixing several raw data sources to deliver a valuable end result, and also a demonstration of how house hunting is morphing into a science ;-)

    I wonder if the Edinburgh bump is an indication that perhaps the distance calculation should be from the nearest large conurbation with a major rail terminal, or will that just give you a London bump at the other end of the graph?

  10. House-hunting with data « kasabi
    February 7, 2012

    [...] for a while now (mainly because of her work on the Open Domesday Project), but I somehow missed her fantastic life-hack linking house prices with distance from London. On one hand, it shows what coders do to solve [...]

  11. Gareth
    February 22, 2012

    This is very cool. My love of stats knows no bounds, and these graphs are great.

    But I didn’t learn much from this. I have issues with your main idea – London isn’t the center. Only a tiny number of people commute from Scotland to London often. And the housing regulations are different there too – a house often goes for much more than the asking price, unlike here in England.

    I suspect the standard deviation here is big and shows only a fleeting likeness of distance to house price.

    Gareth

