Wednesday, May 23, 2012

If You Are an R Developer, Then You Must Try SAP HANA for Free.


This is a guest blog from Alvaro Tejada Galindo, my colleague and fellow R and SAP HANA enthusiast.  I am thankful to Alvaro for contributing this post to "AllThingsBusinessAnalytics".

Are you an R developer? Have you ever heard of SAP HANA? Would you like to test SAP HANA for free?

SAP HANA is an in-memory database technology that allows developers to analyze big data in real time.

Processes that took hours now take seconds, thanks to SAP HANA's ability to keep everything in RAM.

As announced at the SAP Sapphire Now event in Orlando, Florida, SAP HANA is free for developers. You just need to download and install both the SAP HANA Client and the SAP HANA Studio, and create an SAP HANA server on Amazon Web Services as described in the following document:
Get your own SAP HANA DB server on Amazon Web Services - http://scn.sap.com/docs/DOC-28294

Why should this interest you? Easy...SAP HANA is an agent of change, pushing speed to its limits, and it can also be integrated with R.

Want to know more about SAP HANA? Read everything you need here: http://developers.sap.com

You're convinced but don't want to pay for Amazon Web Services? No problem. Just leave a comment including your name, company and email. We will reach out and send you an Amazon gift card so you can get started. Of course, your feedback would be greatly appreciated. We only have a limited number of gift cards, so be quick or be out.

Author Alvaro Tejada Galindo, better known as "Blag", is a Development Expert working for the Technology Innovation and Developer Experience team in SAP Labs.  He can be contacted at a.tejada.galindo@sap.com.

Alvaro's background in his own words: I was an ABAP consultant for 11 years. I worked on implementations in Peru and Canada. I'm also a die-hard developer using R, Python, Ruby, PHP, Flex and many more languages. Now I work for SAP Labs, and my main role is to evangelize SAP technologies by writing blogs and articles, helping people on the forums, attending SAP events, and many other "developer engagement" activities.
I maintain a blog called “Blag’s bag of rants” at blagrants.blogspot.com

Wednesday, May 2, 2012

Why Is Delta's Foray into the Crude Refining Business a BAD Move?

When my mentor/guide and company president Sanjay Poonen threw this open challenge on Twitter:

For all u MBAs, what do u think of Delta buying an oil refinery for $150M (formerly $1B) for top-grade jet fuel. Would Michael Porter frown?

how could I pass on this challenge? Plus, I have lately been descending deep into my technology roots (most of my blogs are technical, with lots of code snippets, for all intents and purposes - AllThingsR). So I decided to spend some time sleuthing and analyzing hard facts before replying to @spoonen and (maybe) countering @gkm1's (George Mathew's) arguments.  This way I get back to analyzing business topics for a while.  (After all, the A in MBA stands for analysis, right? Masters in Business Analytics?)

The original WSJ story covering Delta's decision to buy a refinery from ConocoPhillips is here.

I spent quite some time researching to educate myself on this deal.  I started with a prior belief that this is a BAD deal.  After all, crude refining is a boom-and-bust business with razor-thin margins and notorious competition.  Here is a quote from Bloomberg supporting my argument: "Refiners in the northeastern U.S. are struggling to turn a profit because of the narrow margin between the cost of imported crude and fuel prices." (Source: Bloomberg)

Moreover, not a single new refinery has sprung up in the US in at least 35 years (Source) because no one wants to invest in this business.  In addition, ConocoPhillips had idled this refinery for a few months, and Sunoco, another refiner in that area, is in the process of shutting down two more refineries in the region. (Source: Bloomberg)  "Sunoco...said its refining businesses has been losing $1 million dollars a day for three years running." (Source)

So why is Delta buying this refinery? Vertical integration, fuel hedging, cost savings, politics, EPS improvement?  Actually, all of the above.

Delta's planes burned 3.9B gallons of jet fuel last year.  At an average 2011 price of $2.86 per gallon, Delta spent $11.8B, which is 40% of its operating expenses. (Source: NYTimes)  If jet fuel were 40% of your company's operating expenses, you too would be thinking about such dramatic decisions, though you might not execute on one outside your core business. Delta did.
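A quick back-of-the-envelope check on those figures (all numbers are from the sources cited above; the small gap versus the reported $11.8B presumably comes from averaging):

```r
# Sanity check on Delta's 2011 fuel bill, using the figures quoted in the post
gallons   <- 3.9e9   # gallons of jet fuel burned in 2011
avg.price <- 2.86    # average 2011 price per gallon, USD
spend <- gallons * avg.price
round(spend / 1e9, 2)   # ~11.15 ($B), in the same ballpark as the reported $11.8B

# Implied total operating expenses if fuel is 40% of opex
opex <- 11.8e9 / 0.40
round(opex / 1e9, 1)    # ~29.5 ($B)
```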

Delta will pay $150M in cash (it has $3B of cash on its balance sheet, so there is no liquidity issue) and will invest another $100M in retooling this refinery.  Note also that the PA government is chipping in an additional $30M (thank you, taxpayers!).  The retooling is required for reasons self-evident in this table (mainly to crank up jet-fuel production):


Now, looking at this table, why would anyone believe that Delta can earn $300M every year from this? Also remember, Delta is not bringing its fuel cost down from ~$12B by a whole lot; it is merely trying to save a few cents on the dollar. A little shift in the numbers above and Delta will be in the red.
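To put "a few cents on the dollar" in perspective, the claimed $300M annual benefit against an ~$11.8B fuel bill (both figures from above) works out to:

```r
# Claimed annual benefit relative to Delta's fuel bill (figures from the post)
benefit   <- 300e6
fuel.bill <- 11.8e9
round(benefit / fuel.bill * 100, 1)  # ~2.5 (%) - literally a few cents per dollar
```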

Delta said that this is a good deal for investors. Really?  Valero, a pure-play refining company, had margins of less than 3% in its last quarter. Can Delta beat Valero on margins?  I have serious doubts.  This could be a gain, but only for Delta's management as it attempts to boost EPS in the near term.

Also, can you really believe that Delta can retool the refinery and produce more jet fuel than is possible? The chemistry doesn't support it.  From one barrel of crude, only about 19.5 gallons of gasoline and 4.1 gallons of jet fuel can be produced.  How is Delta going to produce more jet fuel per barrel of crude?
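Assuming the standard 42-gallon barrel (my assumption; the gasoline and jet-fuel yields are from the post), the fractions look like this:

```r
# Typical yields per 42-gallon barrel of crude (yield figures from the post)
barrel   <- 42     # gallons in a standard barrel of crude
gasoline <- 19.5
jet.fuel <- 4.1
round(gasoline / barrel * 100, 1)  # ~46.4 (%)
round(jet.fuel / barrel * 100, 1)  # ~9.8 (%) - the ceiling Delta is up against
```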

Also, FYI: this refinery can only process light sweet crude (low sulfur), not the heavy, high-sulfur Saudi oil that is gaining prominence due to global oil issues. (Source: ConocoPhillips)

Net net, this is a bad move; Delta will burn itself and get out in a year or two.  And when it sells, it will be a fire sale, since many other refineries in the area are already struggling to make a profit, as I mentioned above. Delta's belief that the future of refining is bright is quite puzzling to me.

Happy Analyzing!

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part III

Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays

For this exercise, I combined the following four separate blogs that I have written on Big Data, R and SAP HANA.  Historical airline and weather data were used for the underlying analysis. The aggregated output of this analysis was written out as JSON, which was then visualized in HTML5, D3 and Google Maps.  The previous blogs in this series are:
  1. Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part II
  2. Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps
  3. Getting Historical Weather Data in R and SAP HANA 
  4. Tracking SFO Airport's Performance Using R, HANA and D3
In this blog, I wanted to mash up disparate data sources in R and HANA by combining airline data with weather data to understand the reasons behind airport/airline delays.  Why weather? Because weather is one of the most commonly cited reasons for flight delays in the airline industry.  Fortunately, the airline data breaks the delay down by weather, security, late aircraft, etc., so weather-related delays can be isolated and then the actual weather data can be mashed up to validate the airlines' claims.  However, I will not be doing that here; I will just be displaying the mashed-up data.

I have intentionally focused on the three Bay Area airports and have used the last 4 years of historical data to visualize each airport's performance using an HTML5 calendar built from scratch with D3.js.  One can extend this example to all 20 years of data and all airports.  I downloaded historical weather data for the same 2005-2008 period for the SFO and SJC airports as shown in my previous blog (for some strange reason, there is no weather data for OAK, huh?).  Here is how the final result looks in HTML5:



Click here to interact with the live example.  Hover over any cell in the live example and a tooltip with comprehensive analytics will show the breakdown of the performance delay for the selected cell, including weather data and the correct icons* - the result of a mash-up.  Choose a different airport from the drop-down to change the performance calendar. 
* Weather icons are properties of Weather Underground.

As anticipated, SFO had more red on the calendar than SJC and OAK.  SJC is definitely the best-performing airport in the Bay Area.  Contrary to my expectation, weather didn't cause as much havoc at SFO as one would expect. Strange?

Creating a mash-up of these two data sets in R was super easy, and a CSV output was produced to work with HTML5/D3.  Here is the R code; if it isn't clear from all my previous blogs: I just love the data.table package.


###########################################################################################  

# Percent delayed flights from three bay area airports, a break up of the flights delay by various reasons, mash-up with weather data

###########################################################################################  

baa.hp.daily.flights <- baa.hp[, list(TotalFlights=length(DepDelay),
                                      CancelledFlights=sum(Cancelled, na.rm=TRUE)),
                               by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights,Year, Month, DayofMonth, Origin)

baa.hp.daily.flights.delayed <- baa.hp[DepDelay>15,
                                     list(DelayedFlights=length(DepDelay), 
                                      WeatherDelayed=length(WeatherDelay[WeatherDelay>0]),
                                      AvgDelayMins=round(sum(DepDelay, na.rm=TRUE)/length(DepDelay), digits=2),
                                      CarrierCaused=round(sum(CarrierDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      WeatherCaused=round(sum(WeatherDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      NASCaused=round(sum(NASDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      SecurityCaused=round(sum(SecurityDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      LateAircraftCaused=round(sum(LateAircraftDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2)), by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights.delayed, Year, Month, DayofMonth, Origin)

# Merge two data-tables
baa.hp.daily.flights.summary <- baa.hp.daily.flights.delayed[baa.hp.daily.flights,list(Airport=Origin,
                           TotalFlights, CancelledFlights, DelayedFlights, WeatherDelayed, 
                           PercentDelayedFlights=round(DelayedFlights/(TotalFlights-CancelledFlights), digits=2),
                           AvgDelayMins, CarrierCaused, WeatherCaused, NASCaused, SecurityCaused, LateAircraftCaused)]
setkey(baa.hp.daily.flights.summary, Year, Month, DayofMonth, Airport)

# Merge with weather data
baa.hp.daily.flights.summary.weather <- baa.weather[baa.hp.daily.flights.summary]
baa.hp.daily.flights.summary.weather$Date <- as.Date(paste(baa.hp.daily.flights.summary.weather$Year, 
                                                           baa.hp.daily.flights.summary.weather$Month, 
                                                           baa.hp.daily.flights.summary.weather$DayofMonth, 
                                                           sep="-"),"%Y-%m-%d")
# Remove a few columns that are no longer needed
baa.hp.daily.flights.summary.weather <- baa.hp.daily.flights.summary.weather[, 
            which(!(colnames(baa.hp.daily.flights.summary.weather) %in% c("Year", "Month", "DayofMonth", "Origin"))), with=FALSE]

#Write the output in both JSON and CSV file formats
objs <- baa.hp.daily.flights.summary.weather[, getRowWiseJson(.SD), by=list(Airport)]
# You now have (AirportCode, JSONString) pairs; stitch them together into one JSON array
row.json <- apply(objs, 1, function(x) paste('{"AirportCode":"', x[1], '","Data":', x[2], '}', sep=""))
json.st <- paste('[', paste(row.json, collapse=', '), ']')
writeLines(json.st, "baa-2005-2008.summary.json")                 
write.csv(baa.hp.daily.flights.summary.weather, "baa-2005-2008.summary.csv", row.names=FALSE)
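The code above calls getRowWiseJson(), which was defined in an earlier blog in this series. For completeness, here is a minimal sketch of what such a helper could look like; this is my own reconstruction, not necessarily the original, and it assumes the rjson package:

```r
# Hypothetical reconstruction of getRowWiseJson: serialize each row of a
# data.frame/data.table as a JSON object and collapse them into one JSON array.
library(rjson)

getRowWiseJson <- function(dt) {
  # apply() coerces each row to a named character vector; toJSON() turns
  # the resulting list into a {"col":"value", ...} object
  rows <- apply(dt, 1, function(r) toJSON(as.list(r)))
  paste("[", paste(rows, collapse = ","), "]", sep = "")
}
```

With data.table's by=, this function runs once per airport, yielding one JSON array per group, which the apply/paste step above then wraps with the airport code.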


Happy Coding!