Over the last year and a half, I have faced numerous challenges geocoding the data that I use to showcase my passion for location analytics. In 2012, I decided to take things into my own hands and turned to R. Here, I am sharing a simple R script that I wrote to geocode my data whenever I need it, even big data.
To geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON. I recommend that you register with the Google Maps API and get a key if you have a large amount of data and will be geocoding repeatedly.
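As a rough sketch of what a keyed request could look like, the snippet below builds a request URL with base R. It assumes the key-based endpoint `https://maps.googleapis.com/maps/api/geocode/json` with `address` and `key` parameters; `buildGeocodeUrl` and the key value are placeholders for illustration and are not part of the script in this post.

```r
# Hypothetical helper: build a Geocoding API request URL.
# The endpoint and the `key` parameter are assumptions about the key-based API;
# "YOUR_API_KEY" is a placeholder, not a real key.
buildGeocodeUrl <- function(address, key = "YOUR_API_KEY") {
  # reserved = TRUE percent-encodes spaces, commas and other reserved characters
  encoded <- utils::URLencode(address, reserved = TRUE)
  paste0("https://maps.googleapis.com/maps/api/geocode/json",
         "?address=", encoded,
         "&key=", key)
}

buildGeocodeUrl("Palo Alto,California")
```

Encoding the whole address this way covers more than spaces, which matters for place names that contain commas, apostrophes or ampersands.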
Here is the geocoding function, which can be called repeatedly by other functions:
getGeoCode <- function(gcStr)
{
  library("RJSONIO") # Load the JSON parser (provides fromJSON)
  gcStr <- gsub(' ', '%20', gcStr) # Encode spaces for use in the URL
  # Build the request URL and open a connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=', gcStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  # Flatten the received JSON into a named character vector
  data.json <- unlist(data.json)
  lat <- data.json["results.geometry.location.lat"]
  lng <- data.json["results.geometry.location.lng"]
  gcodes <- c(lat, lng)
  names(gcodes) <- c("Lat", "Lng")
  return (gcodes)
}
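To see why the lookup by "results.geometry.location.lat" works, here is a small offline sketch in base R. It assumes the parsed response is a nested list shaped like the hand-built one below (a stand-in for what `fromJSON` returns, using the Palo Alto coordinates from this post), and shows that `unlist` joins the nested names with dots and coerces every value to character.

```r
# Hand-built stand-in for a parsed response; the shape is an assumption
# and only the fields needed for this illustration are included.
parsed <- list(
  results = list(
    list(geometry = list(location = list(lat = 37.4418834, lng = -122.1430195)))
  ),
  status = "OK"
)

flat <- unlist(parsed)   # nested names are joined with "."
names(flat)

# unlist() coerces numbers and strings to a single type (character),
# so convert explicitly when numeric coordinates are needed.
coords <- as.numeric(flat[c("results.geometry.location.lat",
                            "results.geometry.location.lng")])
names(coords) <- c("Lat", "Lng")
coords
```

This is also why the function returns its coordinates as quoted strings: wrapping the result in `as.numeric` is one way to get numbers back.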
Let's put this function to the test:
> geoCodes <- getGeoCode("Palo Alto,California")
> geoCodes
           Lat            Lng
  "37.4418834" "-122.1430195"
Here is my sample data frame with three columns: Opposition, Ground.Country and Toss. Two of the columns, you guessed it, need geocoding.
> head(shortDS, 10)
    Opposition           Ground.Country Toss
1     Pakistan         Karachi,Pakistan  won
2     Pakistan      Faisalabad,Pakistan lost
3     Pakistan          Lahore,Pakistan  won
4     Pakistan         Sialkot,Pakistan lost
5  New Zealand Christchurch,New Zealand lost
6  New Zealand       Napier,New Zealand  won
7  New Zealand     Auckland,New Zealand  won
8      England           Lord's,England  won
9      England       Manchester,England lost
10     England         The Oval,England  won
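In a larger data set, a column like Ground.Country is likely to repeat the same place across many rows, so one option is to look up each distinct value once and map the results back. The sketch below does that with base R. `fakeGeoCode` is a hypothetical stand-in for `getGeoCode` so that the example runs without network access, and the toy `grounds` vector is made up for illustration.

```r
# Hypothetical offline stand-in for getGeoCode(); it only knows two places.
fakeGeoCode <- function(place) {
  known <- list(
    "Karachi,Pakistan" = c(Lat = 24.893379, Lng = 67.028061),
    "Lahore,Pakistan"  = c(Lat = 31.54505,  Lng = 74.340683)
  )
  known[[place]]
}

# Toy column with a repeated place (made up for illustration)
grounds <- c("Karachi,Pakistan", "Lahore,Pakistan", "Karachi,Pakistan")

# Geocode each distinct value once; rows of `lookup` are named by place
uniqueGrounds <- unique(grounds)
lookup <- t(sapply(uniqueGrounds, fakeGeoCode))

# Map the results back to every row by indexing on the place name
coords <- lookup[grounds, , drop = FALSE]
result <- data.frame(Ground.Country = grounds,
                     Ground.Lat = unname(coords[, "Lat"]),
                     Ground.Lng = unname(coords[, "Lng"]),
                     row.names = NULL)
result
```

With a real geocoder in place of the stand-in, this keeps the number of requests equal to the number of distinct places rather than the number of rows.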
To geocode this, here is a simple one-liner I execute (laply comes from the plyr package):
library(plyr) # provides laply
# Naming the argument "Ground" prefixes the new columns: Ground.Lat, Ground.Lng
shortDS <- with(shortDS, data.frame(Opposition, Ground.Country, Toss,
    Ground = laply(Ground.Country, function(val){getGeoCode(val)} )))
> head(shortDS, 10)
    Opposition           Ground.Country Toss  Ground.Lat  Ground.Lng
1     Pakistan         Karachi,Pakistan  won   24.893379   67.028061
2     Pakistan      Faisalabad,Pakistan lost   31.408951   73.083458
3     Pakistan          Lahore,Pakistan  won    31.54505   74.340683
4     Pakistan         Sialkot,Pakistan lost  32.4972222  74.5361111
5  New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254
6  New Zealand       Napier,New Zealand  won -39.4928444 176.9120178
7  New Zealand     Auckland,New Zealand  won -36.8484597 174.7633315
8      England           Lord's,England  won     51.5294     -0.1727
9      England       Manchester,England lost   53.479251   -2.247926
10     England         The Oval,England  won   51.369037   -2.378269
Happy Demoing and Coding!