Tuesday, January 24, 2012

Geocode your data using R, JSON and Google Maps' Geocoding API

First and foremost, I absolutely love the topic of Location Analytics (Geo-Spatial Analysis) and see tremendous business potential for it in the not-so-distant future. I would go out on a limb and predict that Location Analytics will soon go viral in the enterprise space, because it has the capability to WOW us. Look no further than your iPhone or Android phone and count how many location-aware apps you have. We all have at least one: Google Maps. Mobile is one of the strongest catalysts for enterprise adoption of location-aware apps. All right, enough business talk; let's get our hands dirty with the code.


Over the last year and a half, I have faced numerous challenges geocoding the data I have used to showcase my passion for location analytics. In 2012, I decided to take things into my own hands and turned to R. Here, I am sharing a simple R script that I wrote to geocode my data whenever I need it, even BIG Data (within Google's usage limits).


To geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON. Register with the Google Maps API and get a key: Google now requires one for Geocoding requests, and its usage limits matter if you have a large amount of data or geocode repeatedly.
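A Geocoding request is just a URL with the address and your key as query parameters. Here is a minimal sketch of building one by hand, assuming the current HTTPS endpoint; build_geocode_url() is a hypothetical helper of mine and "YOUR_API_KEY" is a placeholder, not a real key:

```r
# Hypothetical helper (not part of any package): build a Geocoding API
# request URL for one address, assuming the current HTTPS endpoint.
build_geocode_url <- function(address, key) {
  # Percent-encode spaces and reserved characters such as '&' and '?'
  # so the address cannot break the query string; as.character()
  # also handles factor columns
  encoded <- utils::URLencode(as.character(address), reserved = TRUE)
  paste0("https://maps.googleapis.com/maps/api/geocode/json",
         "?address=", encoded,
         "&key=", utils::URLencode(key, reserved = TRUE))
}

# "YOUR_API_KEY" is a placeholder for the key from your Google account
build_geocode_url("Palo Alto,California", key = "YOUR_API_KEY")
```

Encoding the whole address with URLencode() is more robust than replacing only spaces, since place names like "Lord's,England" contain other special characters.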

Here is a function that can be called repeatedly by other functions:

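library("RJSONIO") # Load the library once; it provides fromJSON()

getGeoCode <- function(gcStr, key = NULL)
{
  # Percent-encode the address: spaces plus reserved characters such as '&'
  # (as.character() also lets us pass factor columns)
  address <- URLencode(as.character(gcStr), reserved = TRUE)
  # Build the request URL; Google now expects an API key on each request
  connectStr <- paste0('https://maps.googleapis.com/maps/api/geocode/json?address=',
                       address, if (!is.null(key)) paste0('&key=', key) else '')
  # Open the connection, read the JSON response and close the connection
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con, warn = FALSE), collapse = ""))
  close(con)
  # Flatten the received JSON; if there are several results, the name
  # lookups below return the first one
  data.json <- unlist(data.json)
  if (!identical(unname(data.json["status"]), "OK"))
    warning("Geocoding failed for '", gcStr, "': ", data.json["status"])
  lat <- data.json["results.geometry.location.lat"]
  lng <- data.json["results.geometry.location.lng"]
  gcodes <- c(lat, lng)
  names(gcodes) <- c("Lat", "Lng")
  return (gcodes)
}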
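The function above flattens the whole response with unlist() and looks values up by name. If you prefer to work with the nested list directly, here is a minimal sketch; extract_first_location() is a hypothetical helper, and it assumes the response has already been parsed into a list, for example by RJSONIO::fromJSON() or jsonlite::fromJSON(..., simplifyVector = FALSE):

```r
# Hypothetical helper: take an already-parsed Geocoding response (a nested
# list) and return the first result's coordinates as numbers
extract_first_location <- function(response) {
  # The API reports the outcome (OK, ZERO_RESULTS, OVER_QUERY_LIMIT,
  # REQUEST_DENIED, ...) in the top-level "status" field
  if (!identical(response[["status"]], "OK") ||
      length(response[["results"]]) == 0) {
    return(c(Lat = NA_real_, Lng = NA_real_))
  }
  location <- response[["results"]][[1]][["geometry"]][["location"]]
  # [[ ]] works whether location is a named list or a named numeric vector
  c(Lat = as.numeric(location[["lat"]]), Lng = as.numeric(location[["lng"]]))
}

# A hand-built list with the same shape as a real response
sample_response <- list(
  status = "OK",
  results = list(list(geometry = list(
    location = list(lat = 37.4418834, lng = -122.1430195)))))
extract_first_location(sample_response)
```

Returning NA instead of failing keeps a long batch running when a few addresses cannot be resolved, and the coordinates come back as numbers rather than strings.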

Let's put getGeoCode() to the test:
geoCodes <- getGeoCode("Palo Alto,California")

> geoCodes
           Lat            Lng 
  "37.4418834" "-122.1430195" 


You can run getGeoCode() on an entire column of a data frame or a data table:

Here is my sample data frame with three columns: Opposition, Ground.Country and Toss. Two of the columns, you guessed it right, contain place names that could be geocoded; here I geocode Ground.Country.

> head(shortDS,10)
     Opposition              Ground.Country Toss
1      Pakistan            Karachi,Pakistan  won
2      Pakistan         Faisalabad,Pakistan lost
3      Pakistan             Lahore,Pakistan  won
4      Pakistan            Sialkot,Pakistan lost
5   New Zealand    Christchurch,New Zealand lost
6   New Zealand          Napier,New Zealand  won
7   New Zealand        Auckland,New Zealand  won
8       England              Lord's,England  won
9       England          Manchester,England lost
10      England            The Oval,England  won
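If the same place appears in many rows (a ground usually hosts many matches), calling getGeoCode() once per row repeats identical web requests. Here is a minimal sketch of geocoding each distinct value only once and pausing between requests; geocode_column() and its pause argument are hypothetical names of mine, and the 0.2-second default is an arbitrary assumption, not a documented Google limit:

```r
# Hypothetical helper: geocode a column, calling `geocoder` (for example
# getGeoCode) only once per distinct place and pausing between calls
geocode_column <- function(places, geocoder, pause = 0.2) {
  places <- as.character(places)   # factor columns become character
  distinct <- unique(places)
  coords <- matrix(NA_real_, nrow = length(distinct), ncol = 2,
                   dimnames = list(NULL, c("Lat", "Lng")))
  for (i in seq_along(distinct)) {
    # the geocoder is assumed to return c(Lat, Lng), possibly as strings
    coords[i, ] <- as.numeric(geocoder(distinct[i]))
    if (pause > 0) Sys.sleep(pause)  # be gentle with the web service
  }
  # map every row back to the coordinates of its place
  idx <- match(places, distinct)
  data.frame(Lat = coords[idx, "Lat"], Lng = coords[idx, "Lng"])
}
```

For example, `cbind(shortDS, Ground = geocode_column(shortDS$Ground.Country, getGeoCode))` should append numeric Ground.Lat and Ground.Lng columns while sending one request per distinct ground.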

To geocode shortDS, here is a simple one-liner I execute:

library(plyr) # provides laply()
shortDS <- with(shortDS, data.frame(Opposition, Ground.Country, Toss,
                  Ground = laply(Ground.Country, function(val){getGeoCode(val)})))



> head(shortDS, 10)
    Opposition           Ground.Country Toss  Ground.Lat  Ground.Lng
1     Pakistan         Karachi,Pakistan  won   24.893379   67.028061
2     Pakistan      Faisalabad,Pakistan lost   31.408951   73.083458
3     Pakistan          Lahore,Pakistan  won    31.54505   74.340683
4     Pakistan         Sialkot,Pakistan lost  32.4972222  74.5361111
5  New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254
6  New Zealand       Napier,New Zealand  won -39.4928444 176.9120178
7  New Zealand     Auckland,New Zealand  won -36.8484597 174.7633315
8      England           Lord's,England  won     51.5294     -0.1727
9      England       Manchester,England lost   53.479251   -2.247926
10     England         The Oval,England  won   51.369037   -2.378269
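It is worth sanity-checking the results. Lord's and The Oval are both London grounds, yet "The Oval,England" came back at about 51.37, -2.38, roughly 150 km west of London, so that geocode is wrong. Here is a minimal sketch of such a check using the haversine great-circle distance; haversine_km() and flag_far_points() are hypothetical helpers, 6371 km is the usual mean-Earth-radius approximation, and the reference point and 50 km threshold are assumptions you would pick per place:

```r
# Great-circle (haversine) distance in km between two lat/lng points;
# vectorised, so it works on whole columns
haversine_km <- function(lat1, lng1, lat2, lng2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Hypothetical check: TRUE for rows whose geocode lies farther than
# `max_km` from a reference point you supply
flag_far_points <- function(lat, lng, ref_lat, ref_lng, max_km = 50) {
  haversine_km(lat, lng, ref_lat, ref_lng) > max_km
}

# Distance between the "Lord's,England" and "The Oval,England" results above
haversine_km(51.5294, -0.1727, 51.369037, -2.378269)  # about 154 km
```

Two grounds in the same city should be only a few kilometres apart, so a gap of this size is a strong hint that the address string was ambiguous and needs a more specific form.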



Happy Demoing and Coding!
