Advanced Analytics with R and SAP HANA
R has become the open source language of choice for statistical data
analysis and data mining, advanced algorithms, credit risk scoring, and
other forms of predictive analytics. R is already an in-memory scripting
language, capable of handling big data: tens of gigabytes and hundreds of
millions of rows. Combined with HANA, SAP's in-memory platform technology,
R has the potential to take in-memory analytics to a whole new level.
Imagine performing advanced statistical analysis such as decision trees,
game theory, and linear and multiple regression inside SAP HANA on millions
of rows, and turning around critical business insights at the speed of
thought.
This is possible now with R and HANA. The combination has the potential to
completely revolutionize the game of analytics in your enterprise. And that
is not all. Imagine taking the output from R and using the HTML5-based
advanced visualization techniques available in the Business Intelligence 4.0
suite to create stunning visualizations for today's business users.
Just to tease you, here is a one-liner in R (written in data.table syntax) that processed 120 million records and brought back aggregated data in under 20 seconds:
averageDelay <- dt[, list(AvgArrDelay = round(mean(ArrDelay, na.rm=TRUE), digits=2),
                          AvgDepDelay = round(mean(DepDelay, na.rm=TRUE), digits=2),
                          DistanceTravelled = sum(Distance, na.rm=TRUE),
                          FlightCount = length(Month)),
                   by=list(UniqueCarrier, Year)][order(Year, -AvgArrDelay)][AvgArrDelay > 10 | AvgDepDelay > 10]
The machine I used for this analysis had 24 cores and 96GB of memory! More to follow over the next few days.
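If you want to see what the one-liner does without 120 million rows at hand, here is a minimal sketch that runs the same expression on a tiny table. The six rows below are made up purely for illustration (they are not taken from the real flight data); only the column names are carried over from the one-liner, and the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data standing in for the flight records. Column names
# match the one-liner above; the values are invented for illustration.
dt <- data.table(
  Year          = c(2007, 2007, 2007, 2007, 2008, 2008),
  Month         = c(1, 2, 1, 2, 1, 2),
  UniqueCarrier = c("AA", "AA", "UA", "UA", "AA", "AA"),
  ArrDelay      = c(15, 25, 2, NA, 30, 10),   # NA is skipped via na.rm=TRUE
  DepDelay      = c(5, 15, 1, 3, 20, 12),
  Distance      = c(500, 700, 300, 400, 600, 800)
)

# Step 1: aggregate per carrier and year.
# Step 2: sort by year, then by descending average arrival delay.
# Step 3: keep only groups whose average delay exceeds 10 minutes.
averageDelay <- dt[, list(AvgArrDelay = round(mean(ArrDelay, na.rm=TRUE), digits=2),
                          AvgDepDelay = round(mean(DepDelay, na.rm=TRUE), digits=2),
                          DistanceTravelled = sum(Distance, na.rm=TRUE),
                          FlightCount = length(Month)),
                   by=list(UniqueCarrier, Year)][order(Year, -AvgArrDelay)][AvgArrDelay > 10 | AvgDepDelay > 10]

print(averageDelay)
```

With these six rows, the UA group averages well under 10 minutes on both measures, so only the two AA groups (2007 and 2008) survive the final filter. Each pair of square brackets works on the result of the previous one, which is how the aggregation, the sort, and the filter fit into a single expression.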
Join my colleagues and me for this session at DKOM 2012 San Jose (March 14th at 11 AM at the San Jose Convention Center).