All Things Analytics: Cloud


Wednesday, September 4, 2013

The Future of Big Data is Cognitive Big Data Apps

Volume, Velocity, Variety and Veracity of your data, the 4V challenge, has become untamable. Wait, yet another big data blog? No, not really. In this blog, I would like to propose a cognitive app approach that can transform your big-data problems into big opportunities at a fraction of the cost.

Everyone is talking about big data problems, but few are helping us understand big data opportunities. Let's define a big data opportunity in the context of customers, because growing the customer base, customer satisfaction and customer loyalty are everyone’s business:

you have a large, diverse and growing customer base

your customers are more mobile and social than ever before

you have engaged with your customers wherever they are: web, mobile, social, local

you believe that "more data beats better algorithms" and that big data is all data

you wish to collect all data - call center records, web logs, social media, customer transactions and more so that

you can understand your customers better and how they speak of and rank you in their social networks

you can group (segment) your customers to understand their likes and dislikes

you can offer (recommend) them the right products at the right time and at the right price

you can preempt customer backlash and prevent customers from leaving (churn) for competitors and taking their social network with them (negative network effects)

all this effort will allow you to forecast sales accurately, run targeted marketing campaigns and cut cost to improve revenues and profitability

you wish to do all of this without hiring an army of data analysts, consultants and data scientists

and without buying a half-dozen or more tools, getting access to several public / social data sets and integrating it all into your architecture

and above all, you wish to do it fast and drive changes in real time

And most importantly, you wish to rinse and repeat this approach for the foreseeable future

There are hardly any enterprise solutions in the market that can address the challenges listed above. You have no choice but to build a custom solution by hiring several consultants and striking separate license agreements with public and social data vendors to get a combined lens on public and private data. This approach will be cost-prohibitive for most enterprise customers and, "as 90% of IT projects go", will be mired in delays, cost overruns and a truckload of heartache.

Advances in technologies such as in-memory databases and graph structures, together with the democratization of data science concepts, can help address the challenges listed above in a meaningful and cost-effective way. Intelligent big data apps are the need of the hour. These apps need to be designed and built from scratch with the challenges above and technologies such as cognitive computing[1] in mind. They will leave 1990s technology paradigms such as "data needs to be gathered and modeled (caged) before an app is built" in the dumpster, and will achieve the flexibility required of all modern apps to adapt as the underlying data structures and data sources change. These apps can be deployed right off the shelf with minimal customization and consulting because the app logic will not be anchored to the underlying data schema and will evolve with changing data and behavior.

Enterprise customers will soon be asking for a suite of such cognitive big data apps for all domain functions so that they can put big data opportunities to work and run their businesses better than their competitors. Without a dynamic cognitive approach in apps, addressing the 4V challenge will be a nightmare and big data will fail to deliver on its promise.

Stay tuned for future blogs on this topic including discussions on a pioneering technology approach.

[1] Cognitive computing is the ability to analyze oceans of data in context with related information and expertise. Cognitive systems learn from how they’re used and adjust their rules and results dynamically. Google search engine and knowledge graph technology is predicated upon this approach.

This blog has benefited from the infinite wisdom and hard work of my former colleagues Ryan Leask and Harish Butani and that of my current colleagues Sethu M., Jens Doerpmund and Vijay Vijayasankar.

Image courtesy of MemeGenerator

Sunday, August 25, 2013

Data Science: Definition and Opportunities

Image courtesy of BBC

My thoughts on what data science is, what skills data scientists have, and where the current Business Intelligence pipeline falls short. I also cover how machine learning can automate part of the BI chain, why and how data science should be democratized and made available to everyone including decision makers (business users), how business analysts should build complex data models, how data scientists should be freed from the mundane rinse-and-repeat ETL tasks that precede building models for decision making, and how companies can build a business practice around data science.

Key Premise: big data is all data and the big data apps offer the ability to combine all data (public + private) and expand the horizon to discover more meaningful insights.

Data Science is:

An art of mining large quantities of data
An art of combining disparate data sources and blending public data with corporate data
Forming hypothesis to solve hard problems
Building models to solve current problems and provide forecast
Anticipate future events (based on historical data) and provide correcting actions (finance, banking, travel, operational runtime)
Automating the processes to reduce time to solve future problems

A data scientist has the following minimum set of core skills:

Problem-solver
Creative and can form a hypothesis
Is able to program with large quantities of data
Can identify the appropriate data sources and bring in and blend their data
Stats/math/analytics background to build models and write algorithms
Can quickly develop domain knowledge to understand the key factors that influence the performance of a business problem

Roles Data Scientists play:

Problem description
Hypothesis formation
Data assembly, ETL and data integration role
Model development (pattern recognition or any other model to provide answers) and training
Data visualization
AB Testing
Propose solutions and/or new business idea

The balance between human vs. machines:

Current: humans play a significant role in the process – ETL, joins, models, visualization, machine-learning and repeating and recycling this process as the problem changes
Tomorrow: A big portion of the food-chain can be automated via machine learning so machines can take over and scientists can free up to build more algorithms/models
The process can be automated so repeating/recycling can be cheaper and less time consuming

The Data Science pipeline currently looks like:

From Data to Insights – this entire process requires mundane skills (IT), specialized skills (data scientists) and elements of human psychology to present the right information at the right time
The data needs to be discovered, assembled, semantically enriched and anchored to business logic – this task can be automated through machine learning (a set of harmonized tools with AI) to free up scarce resources
Specialized skills today are served by open source technologies such as R and by expensive solutions like Matlab and SPSS
Very few software solutions design their human interface carefully enough to make the application consumable without customer training

This pipeline needs complete rethinking:

Automate mundane tasks that IT gets tagged with
Discover data automatically
Detach business logic from data models
Make blending public data with corporate data a second nature
Free up scientists so that they can build analytics micro-apps for a domain or a sub-domain
Data Science need not be a niche (specialized category); it should appeal to the masses (democratization of data and bringing insights to everyone without needing specialized skills)

Opportunities in Data Science:

Understand the value chain (IT + Business Analyst + Data Scientists + Business Users)
Provide something for everyone: a single integrated platform (ETL + Data Integration + Predictive modeling + in-memory computing + storage) for data scientists so that they can build standard analytical apps, move away from proprietary models and standardize (which helps IT)
Analytical apps on this platform (think of them as Rapid Deployment Solutions) for business users
Help business analysts write basic models (churn, segmentation, correlation etc.) without needing advanced skills
Work with consulting companies (Mu-Sigma and Opera Solutions) so that they can consult and build apps for companies that do not have data scientists on their payroll
Partner with public data providers (to help clients), consulting companies (Rapid Deployment Solutions) and R/Python/ML communities (mind-share and thought leadership)
Donate your predictive models to open-source communities

Thursday, October 11, 2012

Preparing for Workday (WDAY)’s IPO: Betting on the Future

My esteemed colleague Ryan Leask (LinkedIn Profile) and I have co-authored this three-part blog offering our insights on Workday's IPO.

This is part 3 in our blog on Workday’s IPO. Part 1 looked at Workday vs. Salesforce. Part 2 looked at Workday vs. SFSF, TLEO, NOW and CRM.

To try and summarize all of the analysis from the first and second blog posts:

Workday’s revenue numbers and growth are fantastic.
But their costs are extremely high. While we understand the focus on “growth now, profits later”, the costs are still pretty extreme.
In comparison to a few other companies, we did find that SuccessFactors operations had a somewhat similar cost structure to Workday, so they aren’t alone in their high costs.
In general, Workday’s costs and profit margins are heading in the right direction as they grow, so if we give them the benefit of the doubt that they continue in this direction in the future, they should be able to get more in line with other SaaS companies.
We would not expect profitability in the next several years (Workday states as much in the S-1).

So the purpose of this blog is not to reiterate Workday’s numbers, but is instead to offer our own conclusions from staring at this data for a while and working in this industry.

Prior to going into the S-1 details, we thought Workday would be a slam dunk, and our only concern was the overall macro environment they are IPO’ing into. However, looking at their surprisingly high costs, we think it is going to take quite some time before Workday becomes profitable, and we think these costs indicate Workday might be betting the house on moving itself outside of the HCM domain. Workday did mention this as a risk in their S-1, in that they don’t have proven success outside of the HCM domain yet, and only 10% of their customers (i.e. around 30 customers) have adopted their financial module. Looking at these numbers, we think this point may have been understated in the S-1.

In this sense, Workday’s IPO feels a lot more like a late-stage VC round than an IPO to us. It seems they are almost looking for money to try and find product-market fit for their new Finance product line. If they had chosen to sit back and ride their HCM business harder instead of investing into the finance area, the figures we’re seeing would be a lot more attractive.

But it looks to us like they are betting big on longer-term investments (which we are big fans of). A prime example is the dual-class structure of their common stock. We love this tactical move, which allows Workday to build a great company without getting dragged into a bitter takeover fight (cough, Oracle-PeopleSoft, cough).

Going into its IPO, Salesforce had multiples of 11.6 and 6.3 on TTM and FTM revenues respectively. It went on to return more than 900% over the next 8 years! Workday has TTM and FTM multiples of 31 and 14 going into IPO, so it is hard to believe that it will yield a return similar to Salesforce, and we’re anticipating returns more like those SuccessFactors and Taleo produced (at least over the next several years). In hindsight, Salesforce’s IPO was a steal.
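To put those figures on a common footing, here is a rough R sketch that annualizes the Salesforce return and compares the two sets of multiples. It assumes the 900% is a simple total return over exactly 8 years, which the numbers above only state approximately; the helper name `cagr` is ours.

```r
# Convert a total return over n years into a compound annual growth rate.
# Inputs are the approximate figures quoted above, not exact market data.
cagr <- function(total_return, years) {
  (1 + total_return)^(1 / years) - 1
}

# 900% total return over roughly 8 years (assumed exact for this sketch)
crm_cagr <- cagr(total_return = 9.00, years = 8)
print(round(100 * crm_cagr, 1))  # annualized return in percent

# How much richer Workday's revenue multiples are than Salesforce's at IPO
multiple_ratio <- c(TTM = 31 / 11.6, FTM = 14 / 6.3)
print(round(multiple_ratio, 1))
```

Under these assumptions the Salesforce return works out to roughly 33% a year, while Workday is priced at more than twice Salesforce's multiples on both a TTM and an FTM basis.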

What this IPO comes down to for us is that you have to decide for yourself whether Workday is going to nail the Finance market like they did HCM… or not. If yes, it’s a great buy. If not, they are going to continue plowing through the cash the HCM business generates for a lot longer. It certainly feels like Workday is carrying more risk than we would have first thought, but it is a great company with great historical growth and even better prospects for future growth. Perhaps the timing of this IPO is more a reflection of the uncertainty at the macro level, as this might be their last chance for a while if the fiscal cliff kicks in?

Disclaimer: All numbers are approximate. We are not offering any investment advice and all the analysis we have performed to support our blogs is preliminary.

Preparing for Workday (WDAY)’s IPO: Workday vs. SuccessFactors, Taleo, ServiceNow and Salesforce.com

My esteemed colleague Ryan Leask (LinkedIn Profile) and I have co-authored this three-part blog offering our insights on Workday's IPO.

This is part 2 in our blog on Workday’s IPO. Part 1 looked at Workday vs. Salesforce. And the third and final part provides an overall summary.

To quickly recap the first blog in our three-part series, we discovered WDAY’s cost structure was significantly higher than we anticipated when we looked at their S-1. In a quest to understand this, we looked at Workday (Ticker: WDAY) against SalesForce.com (Ticker: CRM), and discovered some major differences in their business models. As a result, in this blog, we wanted to take some other sample companies to compare to WDAY, in the hopes of finding a company that might show some more similarities.

As a point of note, we will continue to leave the CRM figures in the information presented here for comparison purposes. Our new sample companies are SuccessFactors (Ticker: SFSF) and Taleo (Ticker: TLEO), both HCM SaaS companies (perhaps our best candidates for comparison), as well as ServiceNow (Ticker: NOW). Although ServiceNow is in a different space, it is a cloud-based enterprise software company that also IPO’d in 2012, so we thought it could be an interesting comparison point. And as a disclaimer, note that these comparisons are not precise. For example, WDAY is going to IPO about 9 months after the last full year of data is available, so their IPO price may be based more on this year’s results than on last year's. With that said, we are just looking for generalities and trends, so an imprecise comparison is still OK. So let’s jump right in.

We assembled the following table by pulling the data from each company’s S-1 to provide a perspective on WDAY’s valuation:

WDAY is attempting to raise more capital than anyone else did, and they are asking a 31x multiple on TTM revenues. This is by far the highest in this set, but on an FTM basis their 14x multiple is a bit closer to what we would expect to see (albeit still rather high). So it does not appear WDAY is a bargain buy like CRM was (at a 6x FTM multiple).

Here also is the same figure from the first blog, extended for our new comparison companies, showing some key metrics of the last fiscal year of information before the IPO:

Some quick eyeballing of the numbers tells us that WDAY:

has a lot more employees than any other company had before its IPO;
does way more consulting services business than the comparisons;
has far fewer customers than the comparisons, except for Taleo, which has very similar numbers;
generates MUCH more revenue per customer than their peer group. Even if we exclude the services revenue from WDAY, they are around $272k per customer, which is remarkably higher than any of the comparison group; and
spends massively more on R&D cost (as a percent of its revenue) than the others.

So let’s look at a few of these metrics in more detail (note: “Year 3” in these charts represents the last full year of earnings before the IPO, “Year 2” represents the year before that, etc.).

First up is revenue growth. After all, investors love growth companies and the “multiples” game hinges on future growth. The charts below show growth rates of our companies, with revenues in $m on the primary Y-axis, and YoY growth rates on the secondary Y-axis.

Nothing significant jumps out here, as all the companies in our universe had strong growth before their IPOs except TLEO (which also had the second-lowest revenue of the group, so this is not an issue of growth rates on large numbers). WDAY enjoys some of the strongest growth rates, which is all the more impressive given they also have the largest revenue numbers of the comparison group. Net net, WDAY is looking very strong in terms of revenue growth.

Second up is the cost of revenues (the cost to earn a dollar of revenue): Investors over the years have come to accept that the cloud business is a different beast where it takes years to become profitable, but it’s still important to keep the cost of sales in-check.

Most companies generally show signs of economies of scale as they grow. The notable points in this chart are how much more efficient CRM is compared to the rest, as well as the fact that WDAY has yet to reach an efficient model. So while WDAY’s cost of revenues is high, we can give them the benefit of the doubt that it will come down over the next few years, as it is at least trending in the right direction.

Next up is the operating expense and margins: Investors would like to see a stable cost-structure expanding in sync with growth in revenues. Anything out of whack will raise concerns.

As expected, the SaaS companies here, and perhaps more generally any startup focusing on growth, have operating expenses greater than revenues. Interestingly, both SFSF and WDAY seem to have extremely high cost structures. We were glad to find some company for WDAY on this, as we were really beginning to wonder where these guys are spending so much money. In fact, at least WDAY has consistently been getting the ratio headed in the right direction, unlike SFSF whose Year 3 figures actually started increasing again relative to Year 2. Again, WDAY is not at the point of having reached economies of scale, so we have to give them the benefit of the doubt that they will get there as things are heading in the right direction.

All right, last up are net profit margins: We are expecting to see losses from startups in their growth phase as they put every dollar earned back into the company, focusing on building a great company for a long haul. But we want to look for the size of the losses and overall directionality too.

This is a very similar pattern to operating expenses. WDAY is suffering the heaviest losses of the group; they are shrinking relative to revenues but still increasing in absolute terms. We would have liked to see losses shrinking in absolute terms too. WDAY’s profits should eventually head in the right direction once their recurring subscription revenues are a little larger and economies of scale kick in as they add customers. Again, WDAY has found a friend in SFSF, showing that the scale of their losses is not unprecedented.

These comparisons don’t really paint the best picture for WDAY. Not only are they asking the highest multiple of revenue, but their cost structure is one of the highest among the comparison companies, and a very large chunk of their revenue comes from services (which have much lower margins) rather than license revenue. However, in terms of directionality, everything does look promising for WDAY in the future. We will try and summarize our overall conclusions in our third and final blog post of the series.

Disclaimer: All numbers are approximate. We are not offering any investment advice and all the analysis we have performed to support our blogs is preliminary.

Preparing for Workday (WDAY)’s IPO: Workday vs. Salesforce.com

My esteemed colleague Ryan Leask (LinkedIn Profile) and I have co-authored this three-part blog offering our insights on Workday's IPO.

This is part 1 in our blog on Workday’s IPO. Part 2 looks at how Workday compares to SFSF, TLEO, NOW and CRM. And the third and final part provides an overall summary.

Workday (Ticker: WDAY), a cloud-based provider of HCM and other enterprise software, is going to IPO tomorrow. In typical Silicon Valley fashion, not many people are discussing it because it’s not a consumer software company. But for us in the enterprise software world, this is absolutely one to watch!

We’ve known Workday has been on a tear for a while, so as we looked through their S-1, their growth didn’t come as a big surprise to us. That’s not to belittle their accomplishments. It was an amazing feat by all accounts, and they achieved it all right through the heart of the Great Recession. Spectacular performance! However, the thing that caught us a little off-guard was their expenses. We wanted to take a deeper look at their numbers, and compare it to other cloud enterprise companies to see how their figures stacked up.

Of course, our analysis began with comparing Workday to Salesforce.com (Ticker: CRM). If you invested in CRM on opening day and held it all the way till date, you would be sitting pretty on a 900% ROI over ~9 years. Not too shabby. So how does Workday compare?

The figure below highlights a few key metrics. The WDAY figures are for their year ending Jan 31, 2012 from their S-1. The CRM column represents the data in Salesforce.com’s S-1 document, however, since CRM was only ~5 years old when it IPO’d in 2004, and WDAY is already 7 years old, we added an extra set of figures for CRM at their 7 year mark too (CRM@7), and use this as the comparison point for this blog.

You can see by all accounts, WDAY is significantly trailing CRM@7 years. WDAY’s revenues are 43% of CRM@7’s revenues (134m vs. 310m), yet Workday’s costs are 73% of CRM@7’s (213m vs. 290m). That’s a big discrepancy. Where are these costs coming from?

Well, Workday had 1096 employees to CRM@7’s 1304 (i.e. Workday had 84% of CRM@7’s headcount to produce 43% of its revenue, while still incurring 73% of its costs). That means WDAY saw $122k in revenue per employee vs. CRM@7’s $238k, so nearly a 2x advantage for CRM@7.
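For readers who want to replay the arithmetic, here is a small R sketch using only the approximate figures quoted above (revenue and costs in $m). The data frame and column names are ours, chosen for illustration.

```r
# Figures quoted in the post (revenue and costs in $m; all approximate).
metrics <- data.frame(
  company   = c("WDAY", "CRM@7"),
  revenue   = c(134, 310),
  costs     = c(213, 290),
  employees = c(1096, 1304)
)

# Revenue per employee, in $k
metrics$rev_per_employee_k <- round(1000 * metrics$revenue / metrics$employees)

# WDAY expressed as a percentage of CRM@7 on each measure
wday <- metrics[metrics$company == "WDAY", ]
crm7 <- metrics[metrics$company == "CRM@7", ]
shares <- round(100 * c(
  revenue   = wday$revenue / crm7$revenue,
  costs     = wday$costs / crm7$costs,
  employees = wday$employees / crm7$employees
))

print(metrics)
print(shares)  # percent of CRM@7
```

The gap is easiest to see in the `shares` vector: headcount and costs sit far closer to CRM@7's levels than revenue does.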

So it’s clear, WDAY is operating with a different model to CRM. This led us to take two follow-up steps:

Compare WDAY to some other companies, to see if we could find any other similarities. This will be the second part of our blog.
Analyze “why” WDAY’s figures are so different to CRM’s. Yes, there is the HCM vs. CRM difference, but prior to going through the S-1, we would not have expected to see big differences between the companies.

The rest of this blog post will address our theory on the second question of “why” the two companies’ figures are so different. Here we offer some of our thoughts on this:

WDAY is Selling to Large Enterprises

Workday has only 326 customers after 7 years. CRM@7 by contrast had over 20,000 customers around the same time. So yes CRM@7 had 2.3x WDAY’s revenue, but they also had 63x the customers.
WDAY’s Rev/Customers amounted to $412k. CRM@7’s Rev/Customers was $15k. Clearly, WDAY is selling much more to larger companies than CRM did.
WDAY does a lot more services business as well, but even if you exclude it (34% of rev), it would still give you a figure of $272k/customer… so way higher license rev per customer than CRM.
WDAY over the years had made news of big account wins (Flextronics & Chiquita come to mind), so we knew they were successful in LE’s. However, we assumed they were also getting a lot more traction in the SME space too, which appears not to be the case.
As per WDAY’s S-1, the figure of 326 customers does exclude SME’s which were bought in from a reseller. But given we didn’t see any explanation of the figures in any more detail, we would assume that the number of SME’s & the revenue they bring in is not material.
Selling basically exclusively to large companies also explains why WDAY’s services figures are so high, at 34% of revs. This is higher than we would have expected/liked to have seen from a SaaS company.
WDAY mentions customization as a risk: Workday’s customers often want customization (but they don’t support adding custom fields or functions), and big companies always want customization (in our experience). However, one point that doesn’t add up about this: what are all the services for if Workday doesn’t allow customization? It would be very interesting to know what the average implementation project time is for Workday customers – we’re guessing it might be a lot higher than other SaaS products.
Another consequence of selling to the big guys is that you will definitely end up with longer sales cycles. Yes, Sales & Marketing costs are still 52% of Rev’s, but this is in line with other SaaS companies. Given that they kept this in-line despite the longer sales cycles, this makes the S&M figure seem more impressive.
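The per-customer figures in this section can be reproduced with a short R sketch. It uses the rounded inputs quoted above, so results can differ from the quoted $412k and $272k by about $1k. The exact CRM@7 customer count is not given in the post ("over 20,000"), so the sketch backs out the count implied by the quoted $15k revenue per customer; that implied count is our derivation, not a reported number.

```r
# Approximate figures from the bullets above.
wday_revenue_m <- 134    # $m, year ending Jan 31, 2012
wday_customers <- 326
services_share <- 0.34   # services as a share of WDAY revenue

# Revenue per customer in $k, with and without services revenue
rev_per_customer_k <- 1000 * wday_revenue_m / wday_customers
ex_services_per_customer_k <- rev_per_customer_k * (1 - services_share)
print(round(c(total = rev_per_customer_k, ex_services = ex_services_per_customer_k)))

# Customer count implied by CRM@7's revenue ($310m) and ~$15k per customer
crm7_revenue_m <- 310
implied_crm7_customers <- 1000 * crm7_revenue_m / 15
customer_ratio <- implied_crm7_customers / wday_customers
print(round(implied_crm7_customers))
print(round(customer_ratio))  # how many times more customers CRM@7 had
```

The implied customer ratio comes out consistent with the 63x quoted above.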

Investing For the Future

Workday did state in their S-1 that they are trying to expand out from HCM into Finance. This is definitely going to require a serious commitment in R&D. Clearly it’s early days for them, with only 10% of their customer base (roughly 30 customers) having adopted their finance component so far.
The R&D costs for WDAY were $62m vs. $23m for CRM at their 7 year mark. That means CRM produced 2.3x WDAY’s Rev, while spending only about 0.37x of the R&D cost.
However, we aren’t convinced that just one module (Finance) would be sufficient to account for this R&D. Our best guess is that there is something else in the works too, and Workday is trying to get to a full ERM/ERP suite sooner rather than later. We could be wrong of course, and maybe it’s the extra effort of trying to support analytics, mobility, etc that CRM didn’t have to deal with when it was seven years old… but still, R&D is an extremely high number. We are going to anticipate a positive surprise in the near future because of the higher R&D expenses.
A secondary aspect that we suspect might account for the extra costs is Workday’s focus on international expansion. Both HCM and Finance are going to require a lot more regional changes (different country laws, etc.) than, say, a CRM module. Workday already supports 21 languages, whereas by our count CRM supports only 16 today, so they are clearly taking international markets seriously.
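One way to see how unusual the R&D number is: express it as a share of revenue. The R sketch below uses the approximate $m figures quoted in this section; the variable names are ours.

```r
# R&D spend and revenue in $m, as quoted above (approximate).
rd      <- c(WDAY = 62,  "CRM@7" = 23)
revenue <- c(WDAY = 134, "CRM@7" = 310)

# R&D as a percentage of revenue for each company
rd_pct_of_revenue <- round(100 * rd / revenue)
print(rd_pct_of_revenue)

# CRM@7's R&D spend relative to WDAY's
rd_ratio <- rd[["CRM@7"]] / rd[["WDAY"]]
print(round(rd_ratio, 2))
```

On these inputs WDAY puts roughly 46% of revenue into R&D against about 7% for CRM@7.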

We won’t draw any more conclusions in this blog. Instead we will put WDAY against other similar SaaS companies, and then summarize our overall perspective in the third and final blog post.

Disclaimer: All numbers are approximate. We are not offering any investment advice and all the analysis we have performed to support our blogs is preliminary.

Wednesday, October 10, 2012

Besides Facebook's Botched IPO, IPO Market Returns 20% in 2012

Facebook (Ticker: FB) is down ~47% since its IPO in May. Even so, it is not the most botched IPO ever: that infamous record belongs to BATS Exchange (Ticker: BATS), which operates an alternative stock exchange to NYSE and NASDAQ. (Read the Business Insider story here: 8 Unforgettable IPO Disasters)

Nor is FB the worst-performing IPO. Groupon (Ticker: GRPN) and Zynga (Ticker: ZNGA, proudly led by Mark Pincus) are down 77% and 74% respectively since their IPOs. In comparison, FB has done OK. It could be worse, but a rapid strategy shift by FB, including the emphasis on mobile and a decision to allow e-commerce transactions (Facebook Gifts) on Facebook, has provided some kind of floor under its stock. Here is a chart comparing the three (not-so) darlings of Web 2.0.

Anyhow, below is a table of the best IPOs for this year. Guidewire (Ticker: GWRE) and Demandware (Ticker: DWRE) are the two cloud technology companies in the list, and both have done very well, returning 137% and 108% to date.

IPO Top Performers (YTD)

Company	Offer Date	Underwriter	Industry	Deal Size (mm)	Offer Price	First Day Close	Closing Price	First Day Return	Total Return
Supernus Pharmac	4/30/12	Citi	Health Care	$50	$5.00	$5.37	$12.77	7.4 %	155.4 %
Nationstar Mortg	3/7/12	Merrill	Financial	$233	$14.00	$14.20	$33.29	1.4 %	137.8 %
Guidewire Softwa	1/24/12	JPM	Technology	$115	$13.00	$17.12	$30.84	31.7 %	137.2 %
Annies	3/27/12	CS	Consumer	$95	$19.00	$35.92	$44.87	89.1 %	136.2 %
Demandware	3/14/12	GS	Technology	$88	$16.00	$23.59	$33.31	47.4 %	108.2 %
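The two return columns follow directly from the three price columns. As a quick check, this R sketch recomputes them from the prices in the table above (company names are kept as they appear there). The `aftermarket_return` column, measuring the move from first-day close to the latest close, is our addition and not part of the source table.

```r
# Prices copied from the top-performers table above.
ipo <- data.frame(
  company     = c("Supernus Pharmac", "Nationstar Mortg", "Guidewire Softwa",
                  "Annies", "Demandware"),
  offer       = c(5.00, 14.00, 13.00, 19.00, 16.00),
  first_close = c(5.37, 14.20, 17.12, 35.92, 23.59),
  close       = c(12.77, 33.29, 30.84, 44.87, 33.31)
)

# First-day pop and total return, both measured against the offer price (%)
ipo$first_day_return <- round(100 * (ipo$first_close / ipo$offer - 1), 1)
ipo$total_return     <- round(100 * (ipo$close / ipo$offer - 1), 1)

# Return for an investor who bought at the first-day close instead (%)
ipo$aftermarket_return <- round(100 * (ipo$close / ipo$first_close - 1), 1)

print(ipo)
```

Separating the first-day pop from the aftermarket move matters because most retail investors can only buy after the first day of trading.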

Palo Alto Networks (Ticker: PANW) is up 16% since its first-day close, for a return of 48% over its offer price of $42. Splunk (Ticker: SPLK) is down about 10% since its first-day close but still shows a return of 90% over its offer price of $17. Neither company made the cut for the table above.

Here is a list of the worst-performing IPOs to date. If one were to change the time period from YTD to 12 months, Zynga would show up in the list; no surprise there. Social gaming is a fast-changing environment and ZNGA faces a crisis of confidence with so many departures.

IPO Worst Performers (YTD)

Company	Offer Date	Underwriter	Industry	Deal Size (mm)	Offer Price	First Day Close	Closing Price	First Day Return	Total Return
Envivio	4/24/12	GS	Technology	$70	$9.00	$8.49	$2.15	-5.7 %	-76.1 %
Audience	5/9/12	JPM	Technology	$90	$17.00	$19.10	$5.65	12.4 %	-66.8 %
CafePress	3/28/12	JPM	Technology	$86	$19.00	$19.03	$8.07	0.2 %	-57.5 %
Ceres	2/21/12	GS	Materials	$65	$13.00	$14.80	$5.77	13.8 %	-55.6 %
Renewable	1/18/12	UBS	Energy	$72	$10.00	$10.10	$5.16	1.0 %	-48.4 %

Take a closer look: FB is barely staying off this infamous list. On a similar note, LinkedIn (Ticker: LNKD) is up approximately 80% to date. What a contrasting tale of two social network companies!

So far in 2012, IPOs have returned 20%, which is better than the -11% the IPO market yielded in 2011. With about 2.5 months to go before the curtain drops on 2012, the 2012 IPO return might yet beat the 25% that 2010 produced.

One very encouraging sign for IPO investors this year has been the 13% average first-day pop, which is in line with what the IPO market saw before the Great Recession (~13%). And to all the naysayers out there who claim that tech stocks are in a bubble: take a look at the average opening-day pop in 1999 (72%) and 2000 (56%) and compare it to 2012. You will hold your peace for a few more years at least!

Workday (Ticker: WDAY) is on deck for this week. Do your due diligence before investing.

Happy IPO Investing!
Jitender

Source: Renaissance Capital, Greenwich, CT (www.renaissancecapital.com).

Wednesday, May 2, 2012

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part III

Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays

For this exercise, I combined the following four separate blogs that I did on Big Data, R and SAP HANA. Historical airlines and weather data were used for the underlying analysis. The aggregated output of this analysis was written out as JSON and visualized with HTML5, D3 and Google Maps. The previous blogs in this series are:

In this blog, I wanted to mash up disparate data sources in R and HANA by combining airlines data with weather data to understand the reasons behind airport/airline delays. Why weather? Because weather is one of the most commonly cited reasons in the airline industry for flight delays. Fortunately, the airlines data breaks up the delay by weather, security, late aircraft etc., so weather-related delays can be isolated and the actual weather data can then be mashed up to validate the airlines' claims. However, I will not be doing that validation here; I will just be displaying the mashed-up data.
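Conceptually, the mash-up is a join of daily delay aggregates with daily weather observations on date and airport. Here is a minimal sketch with made-up data. It uses base R's `merge` so that it runs without extra packages, rather than the data.table code used in the actual analysis; the column names (`WeatherDelayed`, `Events`, `PrecipIn`) and all values are hypothetical.

```r
# Hypothetical daily delay aggregates per airport (values are invented).
delays <- data.frame(
  Date           = as.Date(c("2008-01-04", "2008-01-04", "2008-01-05")),
  Origin         = c("SFO", "OAK", "SFO"),
  TotalFlights   = c(400, 150, 380),
  WeatherDelayed = c(60, 3, 4)
)

# Hypothetical daily weather observations; none exist for OAK.
weather <- data.frame(
  Date     = as.Date(c("2008-01-04", "2008-01-05")),
  Origin   = c("SFO", "SFO"),
  Events   = c("Rain", "Clear"),
  PrecipIn = c(1.9, 0)
)

# Left join: keep every airport-day, with NA where weather data is missing.
mashup <- merge(delays, weather, by = c("Date", "Origin"), all.x = TRUE)

# Share of flights delayed for weather reasons on each airport-day (%)
mashup$PctWeatherDelayed <- round(100 * mashup$WeatherDelayed / mashup$TotalFlights, 1)

print(mashup)
```

A left join is the natural choice here because an airport-day with no weather record should still appear on the calendar, just without weather details.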

I have intentionally focused on the three Bay Area airports and have used the last 4 years of historical data to visualize each airport's performance using an HTML5 calendar built from scratch with D3.js. One can extend this example to all 20 years of data and all the airports. I downloaded historical weather data for the same 2005-2008 period for the SFO and SJC airports, as shown in my previous blog (for some strange reason, there is no weather data for OAK, huh?). Here is what the final result looks like in HTML5:

Click here to interact with the live example. Hover over any cell in the live example and a tooltip with comprehensive analytics will show the breakdown of the performance delay for the selected cell, including weather data and the correct icons* - the result of a mash-up. Choose a different airport from the drop-down to change the performance calendar.

* Weather icons are properties of Weather Underground.

As anticipated, SFO had more red on the calendar than SJC and OAK. SJC is definitely the best performing airport in the Bay Area. Contrary to my expectation, weather didn't cause as much havoc at SFO as one would expect. Strange?
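One way to put a number on an observation like this is to roll the daily weather share up to one figure per airport. The sketch below is only an illustration on made-up rows: the column names mirror the daily summary built in the R code of this post, but the values are invented, and weighting each day by its total delay minutes is my own assumption about a sensible aggregation.

```r
library(data.table)

# Toy rows shaped like the daily summary (values are made up for illustration)
summary.dt <- data.table(
  Airport        = c("SFO", "SFO", "SJC", "SJC"),
  DelayedFlights = c(100, 50, 20, 10),
  AvgDelayMins   = c(60, 40, 30, 30),
  WeatherCaused  = c(0.10, 0.00, 0.05, 0.00)
)

# Total delay minutes per day, used as the weight for that day's weather share
summary.dt[, DelayMins := DelayedFlights * AvgDelayMins]

# Weighted share of delay minutes attributed to weather, per airport
weather.share <- summary.dt[,
  list(WeatherShare = round(sum(WeatherCaused * DelayMins) / sum(DelayMins), 3)),
  by = Airport]
print(weather.share)
```

Weighting by delay minutes keeps a quiet day with a handful of delayed flights from counting as much as a day with hundreds of them.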

Creating a mash-up in R for these two data sets was super easy, and a CSV output was produced to work with HTML5/D3. Here is the R code, and if it is not clear from all my previous blogs: I just love the data.table package.

###########################################################################################
# Percent delayed flights from three bay area airports, a break up of the flights delay by various reasons, mash-up with weather data
###########################################################################################

# Daily totals per airport: all flights and cancelled flights
baa.hp.daily.flights <- baa.hp[, list(TotalFlights=length(DepDelay), CancelledFlights=sum(Cancelled, na.rm=TRUE)),
                               by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights, Year, Month, DayofMonth, Origin)

# Flights that departed more than 15 minutes late; each *Caused column is that cause's share of total departure-delay minutes
baa.hp.daily.flights.delayed <- baa.hp[DepDelay>15,
    list(DelayedFlights=length(DepDelay),
         # sum(..., na.rm=TRUE) skips NA entries, which length(x[x>0]) would have counted
         WeatherDelayed=sum(WeatherDelay>0, na.rm=TRUE),
         AvgDelayMins=round(sum(DepDelay, na.rm=TRUE)/length(DepDelay), digits=2),
         CarrierCaused=round(sum(CarrierDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
         WeatherCaused=round(sum(WeatherDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
         NASCaused=round(sum(NASDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
         SecurityCaused=round(sum(SecurityDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
         LateAircraftCaused=round(sum(LateAircraftDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2)),
    by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights.delayed, Year, Month, DayofMonth, Origin)

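# Merge the two data tables (keyed join on Year, Month, DayofMonth, Origin)
# Note: the setkey() below expects the join result to carry the key columns along with the
# listed ones; newer data.table versions may need them listed explicitly in j.
baa.hp.daily.flights.summary <- baa.hp.daily.flights.delayed[baa.hp.daily.flights, list(Airport=Origin,
    TotalFlights, CancelledFlights, DelayedFlights, WeatherDelayed,
    PercentDelayedFlights=round(DelayedFlights/(TotalFlights-CancelledFlights), digits=2),
    AvgDelayMins, CarrierCaused, WeatherCaused, NASCaused, SecurityCaused, LateAircraftCaused)]
setkey(baa.hp.daily.flights.summary, Year, Month, DayofMonth, Airport)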

# Merge with weather data
baa.hp.daily.flights.summary.weather <- baa.weather[baa.hp.daily.flights.summary]

# Build a proper Date column from the Year, Month and DayofMonth parts
baa.hp.daily.flights.summary.weather$Date <- as.Date(paste(baa.hp.daily.flights.summary.weather$Year,
                                                           baa.hp.daily.flights.summary.weather$Month,
                                                           baa.hp.daily.flights.summary.weather$DayofMonth,
                                                           sep="-"), "%Y-%m-%d")

# Remove a few columns that are no longer needed
baa.hp.daily.flights.summary.weather <- baa.hp.daily.flights.summary.weather[,
    which(!(colnames(baa.hp.daily.flights.summary.weather) %in% c("Year", "Month", "DayofMonth", "Origin"))), with=FALSE]

# Write the output in both JSON and CSV file formats
objs <- baa.hp.daily.flights.summary.weather[, getRowWiseJson(.SD), by=list(Airport)]

# You now have (AirportCode, JSONString) pairs. Once again, you need to attach them together.
row.json <- apply(objs, 1, function(x) paste('{\"AirportCode\":"', x[1], '","Data\":', x[2], '}', sep=""))
json.st <- paste('[', paste(row.json, collapse=', '), ']')
writeLines(json.st, "baa-2005-2008.summary.json")
write.csv(baa.hp.daily.flights.summary.weather, "baa-2005-2008.summary.csv", row.names=FALSE)
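The listing above calls `getRowWiseJson()` without showing its body. Purely as an assumption about what such a helper might do, here is a minimal base-R stand-in that turns each row of a table into a JSON object and wraps the rows in an array. It is a sketch, not the original function: it does not escape quotes inside strings, and the toy table and its values are made up.

```r
library(data.table)

# Hypothetical stand-in: serialize a table as a JSON array of row objects
getRowWiseJson <- function(dt) {
  df <- as.data.frame(dt, stringsAsFactors = FALSE)
  cols <- lapply(names(df), function(nm) {
    v <- df[[nm]]
    # Numbers go out bare; everything else (strings, dates) is quoted
    val <- if (is.numeric(v)) as.character(v) else paste0('"', as.character(v), '"')
    val[is.na(v)] <- "null"          # missing values become JSON null
    paste0('"', nm, '":', val)       # one "name":value fragment per row
  })
  rows <- do.call(paste, c(cols, sep = ","))
  paste0("[", paste(paste0("{", rows, "}"), collapse = ","), "]")
}

# Toy table (made-up values) grouped the same way as in the listing above
toy <- data.table(
  Airport = c("SFO", "SFO", "SJC"),
  Date = as.Date(c("2008-01-01", "2008-01-02", "2008-01-01")),
  PercentDelayedFlights = c(0.31, 0.18, NA)
)
objs <- toy[, list(Data = getRowWiseJson(.SD)), by = list(Airport)]

# Attach the airport code to each JSON string and wrap everything in one array
row.json <- paste0('{"AirportCode":"', objs$Airport, '","Data":', objs$Data, '}')
json.st <- paste0("[", paste(row.json, collapse = ","), "]")
cat(json.st, "\n")
```

The result is one object per airport, each holding an array of daily records, which is a convenient shape for a per-airport calendar in D3.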

Happy Coding!