Showing posts with label Predictive. Show all posts

Friday, April 19, 2013

Democratization of Business Analytics Dashboards

I am super impressed with the following visual dashboard from the IPL T20 tournament - IPL 2013 in Numbers.  For those of you not so familiar with cricket or the IPL: the IPL is the biggest, most extravagant and most lucrative cricket tournament in the world.  I like the way the IPL is bringing sports analytics to the common masses.


What is impressive is that each metric (runs, wickets, or tweets) is live, so these numbers get updated automatically - pretty cool for IPL and cricket fans.  Also, each metric is clickable, so one can drill down to their heart's content.  This is a common roll-up analysis, but the visualization and the real-time updates make this dashboard pretty appealing.  IPL team, thanks for not putting any dials on this dashboard (LOL).

For many years I have been influencing, and am now building, the analytics products that power these sports dashboards and various other dashboards/reports.  The most fascinating thing is that these dashboards (or let's call it analytics in general) are reaching the masses like never before.  Everyone has heard terms like democratization of data and humanization of analytics.  This is it!  The data revolution is underway.

Now, there are many new frontiers to go after, and the existing ones need to be reinvented.  Yes, the analytics market is ready for massive disruption.  This is what keeps me excited about the Business Analytics space.

Happy Analyzing and Happy Friday!

Wednesday, May 2, 2012

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part III

Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays

For this exercise, I combined the following four separate blogs that I did on Big Data, R and SAP HANA.  Historical airlines and weather data were used for the underlying analysis.  The aggregated output of this analysis was written out as JSON and visualized with HTML5, D3 and Google Maps.  The previous blogs in this series are:
  1. Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part II
  2. Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps
  3. Getting Historical Weather Data in R and SAP HANA 
  4. Tracking SFO Airport's Performance Using R, HANA and D3
In this blog, I wanted to mash up disparate data sources in R and HANA by combining airlines data with weather data to understand the reasons behind airport/airline delays.  Why weather?  Because weather is one of the most commonly cited reasons in the airline industry for flight delays.  Fortunately, the airlines data breaks up the delay by weather, security, late aircraft, etc., so weather-related delays can be isolated and the actual weather data can then be mashed up to validate the airlines' claims.  However, I will not be doing that validation here; I will just be displaying the mashed-up data.

I have intentionally focused on the three Bay Area airports and used the last four years of historical data (2005-2008) to visualize each airport's performance in an HTML5 calendar built from scratch with D3.js.  One could extend this example to all 20 years of data and to all airports.  I had downloaded historical weather data for the same 2005-2008 period for the SFO and SJC airports, as shown in my previous blog (for some strange reason, there is no weather data for OAK, huh?).  Here is how the final result looks in HTML5:



Click here to interact with the live example.  Hover over any cell in the live example and a tooltip will show a comprehensive breakdown of the performance delay for the selected cell, including the weather data and the matching icons* - the result of the mash-up.  Choose a different airport from the drop-down to change the performance calendar.
* Weather icons are properties of Weather Underground.

As anticipated, SFO airport had more red on the calendar than SJC and OAK.  SJC is definitely the best-performing airport in the Bay Area.  Contrary to my expectation, weather didn't cause as much havoc at SFO as one would expect - strange?

Creating a mash-up in R for these two data sets was super easy, and a CSV output was produced to work with HTML5/D3.  Here is the R code, and in case it isn't clear from all my previous blogs: I just love the data.table package.


###########################################################################################
# Percent of delayed flights from the three bay-area airports, a break-up of the flight
# delays by various reasons, mashed up with weather data
###########################################################################################
library(data.table)  # baa.hp (airlines) and baa.weather are data.tables built in the earlier posts

baa.hp.daily.flights <- baa.hp[, list(TotalFlights=length(DepDelay),
                                      CancelledFlights=sum(Cancelled, na.rm=TRUE)),
                               by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights, Year, Month, DayofMonth, Origin)

baa.hp.daily.flights.delayed <- baa.hp[DepDelay>15,
                                     list(DelayedFlights=length(DepDelay), 
                                      WeatherDelayed=length(WeatherDelay[WeatherDelay>0]),
                                      AvgDelayMins=round(sum(DepDelay, na.rm=TRUE)/length(DepDelay), digits=2),
                                      CarrierCaused=round(sum(CarrierDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      WeatherCaused=round(sum(WeatherDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      NASCaused=round(sum(NASDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      SecurityCaused=round(sum(SecurityDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      LateAircraftCaused=round(sum(LateAircraftDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2)), by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights.delayed, Year, Month, DayofMonth, Origin)

# Merge two data-tables
baa.hp.daily.flights.summary <- baa.hp.daily.flights.delayed[baa.hp.daily.flights,list(Airport=Origin,
                           TotalFlights, CancelledFlights, DelayedFlights, WeatherDelayed, 
                           PercentDelayedFlights=round(DelayedFlights/(TotalFlights-CancelledFlights), digits=2),
                           AvgDelayMins, CarrierCaused, WeatherCaused, NASCaused, SecurityCaused, LateAircraftCaused)]
setkey(baa.hp.daily.flights.summary, Year, Month, DayofMonth, Airport)

# Merge with weather data
baa.hp.daily.flights.summary.weather <- baa.weather[baa.hp.daily.flights.summary]
baa.hp.daily.flights.summary.weather$Date <- as.Date(paste(baa.hp.daily.flights.summary.weather$Year, 
                                                           baa.hp.daily.flights.summary.weather$Month, 
                                                           baa.hp.daily.flights.summary.weather$DayofMonth, 
                                                           sep="-"),"%Y-%m-%d")
# drop columns that are now redundant (Date replaces Year/Month/DayofMonth; Airport replaces Origin)
baa.hp.daily.flights.summary.weather <- baa.hp.daily.flights.summary.weather[, 
            which(!(colnames(baa.hp.daily.flights.summary.weather) %in% c("Year", "Month", "DayofMonth", "Origin"))), with=FALSE]

# Write the output in both JSON and CSV file formats
# getRowWiseJson() converts each row of the data.table to a JSON object (helper from an earlier post)
objs <- baa.hp.daily.flights.summary.weather[, getRowWiseJson(.SD), by=list(Airport)]
# objs now holds (Airport, JSONString) pairs; stitch them together into one JSON array
row.json <- apply(objs, 1, function(x) paste('{"AirportCode":"', x[1], '","Data":', x[2], '}', sep=""))
json.st <- paste('[', paste(row.json, collapse=', '), ']')
writeLines(json.st, "baa-2005-2008.summary.json")                 
write.csv(baa.hp.daily.flights.summary.weather, "baa-2005-2008.summary.csv", row.names=FALSE)
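The two merges above use data.table's keyed X[Y] join idiom rather than merge().  A minimal standalone illustration (toy data, not the actual flight tables):

```r
library(data.table)

# two small keyed tables standing in for the flight and weather data.tables
flights <- data.table(Date=c("2008-01-01", "2008-01-02"), Delayed=c(12, 30))
weather <- data.table(Date=c("2008-01-01", "2008-01-02"), Rain=c(0.0, 1.2))
setkey(flights, Date)
setkey(weather, Date)

# weather[flights] looks up each row of flights in weather on the shared key,
# just like baa.weather[baa.hp.daily.flights.summary] above
weather[flights]
```

The result carries the key column plus the non-key columns from both tables (Date, Rain, Delayed), one row per row of the inner table.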


Happy Coding!

Thursday, July 28, 2011

R and Future of Predictive!

I am super excited about R. I have been writing quantitative scripts (quantmod) using R (MATLAB prior to that) since 2008, and I am discovering new possibilities every day. This is my night job!

So once R is packaged behind RESTful APIs, it becomes available to the masses. That is my goal. I know the Revolution Analytics guys have a lead (or at least they say they do), but I have a different approach - take R to the masses, not just to the enterprise but to everyone. I am not far from my goal.

Thank you RApache!
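To give a flavor of what that looks like, here is a minimal, illustrative sketch of an rApache handler script - the endpoint name, the `values` query parameter and the toy analytic are all hypothetical, while setContentType() and the GET list are part of rApache's documented handler environment:

```r
# score.R -- a hypothetical script served by rApache (mod_R)
# rApache injects the parsed query string as the GET list and
# provides setContentType() for the HTTP response
setContentType("application/json")

# hypothetical input: ?values=1.2,3.4,5.6 in the query string
x <- as.numeric(strsplit(GET$values, ",")[[1]])

# a toy "analytic": the mean with a 95% normal confidence interval
m  <- mean(x)
se <- sd(x) / sqrt(length(x))
cat(sprintf('{"mean": %.4f, "ci": [%.4f, %.4f]}',
            m, m - 1.96 * se, m + 1.96 * se))
```

A GET to that script would return a small JSON payload, which is exactly the shape of API that puts R behind a RESTful interface.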

Thursday, May 26, 2011

Predictive Analytics to the Rescue and Beyond! Predictive Analytics to go pervasive!

This segment of analytics will explode soon. I have never made a prediction, but if I were to make one - this would be it. Predictive analytics has been the domain of the super smart - masters of finance, PhDs in statistics, or mathematicians in an organization - not generally IT. This will soon change, just as everything else around us has changed with the consumerization of IT. Companies like SAS, SPSS (now IBM) and a bunch of other smaller niche companies offer predictive solutions that help companies make future decisions by analyzing patterns in their data (all of the statistics: means, variances, confidence intervals, distributions, Monte Carlo simulation, seasonality, decision trees, etc.).
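To make one of those techniques concrete, here is a tiny illustrative R sketch (not tied to any vendor's product; the sales figures are made up) that uses a Monte Carlo simulation to put a confidence band around a next-quarter forecast:

```r
set.seed(42)

# hypothetical historicals: observed quarterly sales growth rates
growth <- c(0.04, 0.06, 0.03, 0.05, 0.07, 0.02)

# simulate 10,000 possible next-quarter outcomes by drawing growth
# rates from a normal distribution fitted to the observed mean/sd
base <- 100  # current sales index
sims <- base * (1 + rnorm(10000, mean(growth), sd(growth)))

# a 95% interval for next quarter's sales index
quantile(sims, c(0.025, 0.975))
```

The same pattern - fit a distribution to history, simulate, read off the quantiles - underlies many of the predictive use-cases described below.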

Here are some anecdotal use-cases from the real world on how predictive is helping companies become smarter and more profitable:

 "Some of the most famous examples of analytics in action come from the world of professional sports, where 'quants' increasingly make the decisions about what players are really worth. Consider these examples from the business world:


--Best Buy was able to determine through analysis of member data that 7% of its customers were responsible for 43% of its sales. The company then segmented its customers into several archetypes and redesigned stores and the in-store experience to reflect the buying habits of particular customer groups.


--Olive Garden uses data to forecast staffing needs and food preparation requirements down to individual menu items and ingredients. The restaurant chain has been able to manage its staff much more efficiently and has cut food waste significantly.


--The U.K.'s Royal Shakespeare Co. used analytics to look at its audience members' names, addresses, performances attended and prices paid for tickets over a period of seven years. The theater company then developed a marketing program that increased regular attendees by more than 70% and its membership by 40%."


Source: Forbes, "Why Predictive Analytics Is A Game-Changer"

As always, more on this later... My approach will be to introduce each topic in analytics and then go deeper as the opportunity arises. I want to bring more use-cases into future blogs...

Wednesday, May 25, 2011

Does your company have a BI implementation plan? Consider these statistics:

  • According to market research firm IDC, annual data generation will reach 35 zettabytes or about 35 million petabytes by 2020. 
  • That is enough data to fill a stack of DVDs reaching halfway to Mars, or 17.5 million times the entire collections of all the academic libraries in the United States. 
  • As a result, business intelligence has become an eight billion dollar industry and continues to increase each year. Global Industry Analysts released a report in July projecting the BI software market will reach $12.4 billion by 2015.
  • In May, Forrester came out with a report on the state of the BI industry, finding 49 percent of companies are planning a BI project in 2010 or soon after.
Based on my own experience, analysis, research into trends and continuous work with BI think-tanks, I think the BI market will be even bigger than these projections suggest. There are lots of other trends, like location analytics and predictive, that will make BI even more pervasive in the next 2-3 years.

Now, this ties pretty neatly into my earlier blogs on Big Data and the huge amount of innovation happening in this space. Keep blogging... (See my blog entry from February 2010.)