Thursday, March 22, 2012

Tracking SFO Airport's Performance Using R, HANA and D3

Visualize Big Data Using R, HANA, D3, JSON and HTML5/JavaScript

This is my first introduction to D3 and I am simply blown away.  Mike Bostock (@mbostock), you are genius and thanks for creating D3!  With HANA, R, D3, HTML5 and iPad, and you got yourself a KILLER combo!

I have been burning my midnight oil on piecing together my big data story using HANA, R, JSON and HTML5.  If you recall, I did a technical session on R and SAP HANA at DKOM, SAP's Development Kickoff Event last week where I showcased the supreme powers of R and HANA when analyzing 124 million records in real time.  R and SAP HANA: A Highly Potent Combo for Real Time Analytics on Big Data

Since last week, I have been looking for other creative ways to analyze and then visualize this airlines data. I am very fortunate to come across D3.  After spending couple of hours with D3, I decided to build the calendar view for the airlines data I have.  The calendar view is the first example Mike shows on his D3 page. Amazingly awesome!

I created this calendar view capturing the percent of delayed flight from SFO airports that departed daily between 2005-2008.  For this analysis, I used HANA to get the data out for SFO (out of 250 plus airports) over this 4 years period in seconds and then did all the aggregation in R including creating a JSON and .CSV file in seconds again.  Later, I moved to HTML5 and D3 to generate this beautiful calendar view showing SFO's performance.  Graphics is presented below:


As expected, December and January are two notorious months for flights delay.  Have fun with the live example hosted in the Amazon cloud..

Once again, my R code is very simple:


## Depature Delay for SF Airport
ba.hp.sfo <- ba.hp[Origin=="SFO",]


ba.hp.sfo.daily.flights <- ba.hp.sfo[,list(DailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
ba.hp.sfo.daily.flights.delayed <- ba.hp.sfo[DepDelay>15,list(DelayedDailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
setkey(ba.hp.sfo.daily.flights.delayed, Year, Month, DayofMonth)
response <- ba.hp.sfo.daily.flights.delayed[ba.hp.sfo.daily.flights]
response <- response[,list(Date=as.Date(paste(Year, Month, DayofMonth, sep="-"),"%Y-%m-%d"), 
                           #DailyFlights,DelayedDailyFlights,
                           PercentDelayedFlights=round((DelayedDailyFlights/DailyFlights), digits=2))]
objs <- apply(response, 1, toJSON)
res <- paste('{"dailyFlightStats": [', paste(objs, collapse=', '), ']}')
writeLines(res, "dailyFlightStatsForSFO.json")                 
write.csv(response, "dailyFlightStatsForSFO.csv", row.names=FALSE)


For D3 and HTML code, please take a look at this example from D3 website. 

Happy Analyzing and Keep That Mid Night Oil Burning!



3 comments:

  1. This example was updated with a better tooltip to enable viewing from iPad.
    http://ata.blog.s3-website-us-east-1.amazonaws.com/d3SFO.html

    ReplyDelete
  2. This post has a lot of importance to the people…I hope you can continue to inspire and post more of this…Thanks

    ReplyDelete
  3. Interesting post! Makes me think of things people can do in case there is a flight delay! On a related note read a blog recently on how to enhance passenger experience by making airports more engaging and entertaining! http://www.graymatter.co.in/BlogPost/boost-non-aero-revenues-through-smart-advertising

    ReplyDelete