Friday, March 2, 2012

Advanced Analytics with R and HANA at DKOM 2012 San Jose


Advanced Analytics with R and SAP HANA

R has become the open source language of choice for statistical data analysis, data mining, advanced algorithms, credit risk scoring and other forms of predictive analytics. R already works in memory and can handle big data: tens of gigabytes and hundreds of millions of rows. Combined with HANA, SAP's in-memory platform technology, R has the potential to take in-memory analytics to a whole new level. Imagine performing advanced statistical analysis such as decision trees, game theory, linear and multiple regression and much more inside SAP HANA on millions of rows, and delivering critical business insights at the speed of thought.

This is possible now with R and HANA. The combination has the potential to completely revolutionize and advance the game of analytics in your enterprise. And that is not all. Imagine taking the output from R and using the advanced visualization techniques available in the Business Intelligence 4.0 suite, based on HTML5, to create stunning visualizations for today’s business users.

Just to tease you, here is a one-liner in R that processed 120 million records and brought back the aggregated data in under 20 seconds:

averageDelay <- dt[,list(AvgArrDelay=round(mean(ArrDelay, na.rm=TRUE), digits=2),
                  AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),
                  DistanceTravelled=sum(Distance, na.rm=TRUE),
                  FlightCount=length(Month)), 
                  by=list(UniqueCarrier, Year)][order(Year, -AvgArrDelay)][AvgArrDelay > 10 | AvgDepDelay > 10]

The machine I used for this analysis had 24 cores and 96 GB of memory! More to follow over the next few days.
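
For readers who want to see what each link in that chain does, here is a minimal sketch that splits the one-liner into three steps and runs it on a six-row toy table. It assumes the data.table package is installed. The column names match the one-liner, but the rows are invented for illustration and are not taken from the 120-million-record dataset; the intermediate names byCarrierYear and sorted are also just illustrative.

```r
library(data.table)

# Toy stand-in for the flight data; every value here is made up.
dt <- data.table(
  Year          = c(2007, 2007, 2007, 2008, 2008, 2008),
  Month         = c(1, 2, 3, 1, 2, 3),
  UniqueCarrier = c("AA", "AA", "UA", "AA", "UA", "UA"),
  ArrDelay      = c(12, NA, 3, 4, 25, 5),
  DepDelay      = c(8, 14, 2, 4, 30, 10),
  Distance      = c(500, 700, 300, 400, 900, 1100)
)

# Step 1: aggregate per carrier and year (missing delays are ignored).
byCarrierYear <- dt[, list(
  AvgArrDelay       = round(mean(ArrDelay, na.rm = TRUE), digits = 2),
  AvgDepDelay       = round(mean(DepDelay, na.rm = TRUE), digits = 2),
  DistanceTravelled = sum(Distance, na.rm = TRUE),
  FlightCount       = length(Month)
), by = list(UniqueCarrier, Year)]

# Step 2: sort by year ascending, then by average arrival delay descending.
sorted <- byCarrierYear[order(Year, -AvgArrDelay)]

# Step 3: keep only groups whose average arrival or departure delay exceeds 10.
averageDelay <- sorted[AvgArrDelay > 10 | AvgDepDelay > 10]
print(averageDelay)
```

Chaining the three bracket expressions, as the one-liner does, gives the same result without naming the intermediate tables.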

Join my colleagues and me for this session at DKOM 2012 San Jose (March 14th at 11 AM at the San Jose Convention Center).
