Visualize Big Data Using R, HANA, D3, JSON and HTML5/JavaScript
This is my first introduction to D3 and I am simply blown away. Mike Bostock (@mbostock), you are genius and thanks for creating D3! With HANA, R, D3, HTML5 and iPad, and you got yourself a KILLER combo!
I have been burning my midnight oil on piecing together my big data story using HANA, R, JSON and HTML5. If you recall, I did a technical session on R and SAP HANA at DKOM, SAP's Development Kickoff Event last week where I showcased the supreme powers of R and HANA when analyzing 124 million records in real time. R and SAP HANA: A Highly Potent Combo for Real Time Analytics on Big Data
Since last week, I have been looking for other creative ways to analyze and then visualize this airlines data. I am very fortunate to come across D3. After spending couple of hours with D3, I decided to build the calendar view for the airlines data I have. The calendar view is the first example Mike shows on his D3 page. Amazingly awesome!
I created this calendar view capturing the percent of delayed flight from SFO airports that departed daily between 2005-2008. For this analysis, I used HANA to get the data out for SFO (out of 250 plus airports) over this 4 years period in seconds and then did all the aggregation in R including creating a JSON and .CSV file in seconds again. Later, I moved to HTML5 and D3 to generate this beautiful calendar view showing SFO's performance. Graphics is presented below:
As expected, December and January are two notorious months for flights delay. Have fun with the live example hosted in the Amazon cloud..
Once again, my R code is very simple:
## Depature Delay for SF Airport
ba.hp.sfo <- ba.hp[Origin=="SFO",]
ba.hp.sfo.daily.flights <- ba.hp.sfo[,list(DailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
ba.hp.sfo.daily.flights.delayed <- ba.hp.sfo[DepDelay>15,list(DelayedDailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
setkey(ba.hp.sfo.daily.flights.delayed, Year, Month, DayofMonth)
response <- ba.hp.sfo.daily.flights.delayed[ba.hp.sfo.daily.flights]
response <- response[,list(Date=as.Date(paste(Year, Month, DayofMonth, sep="-"),"%Y-%m-%d"),
#DailyFlights,DelayedDailyFlights,
PercentDelayedFlights=round((DelayedDailyFlights/DailyFlights), digits=2))]
objs <- apply(response, 1, toJSON)
res <- paste('{"dailyFlightStats": [', paste(objs, collapse=', '), ']}')
writeLines(res, "dailyFlightStatsForSFO.json")
write.csv(response, "dailyFlightStatsForSFO.csv", row.names=FALSE)
For D3 and HTML code, please take a look at this example from D3 website.
Happy Analyzing and Keep That Mid Night Oil Burning!
This example was updated with a better tooltip to enable viewing from iPad.
ReplyDeletehttp://ata.blog.s3-website-us-east-1.amazonaws.com/d3SFO.html
Interesting post! Makes me think of things people can do in case there is a flight delay! On a related note read a blog recently on how to enhance passenger experience by making airports more engaging and entertaining! http://www.graymatter.co.in/BlogPost/boost-non-aero-revenues-through-smart-advertising
ReplyDelete