
Wednesday, September 4, 2013

The Future of Big Data is Cognitive Big Data Apps



Volume, Velocity, Variety and Veracity of your data, the 4V challenge, has become untamable.  Wait, yet another big data blog?  No, not really.  In this blog, I propose a cognitive app approach that can transform your big data problems into big opportunities at a fraction of the cost.

Everyone is talking about big data problems, but few are helping us understand big data opportunities.  Let's define a big data opportunity in the context of customers, because growing the customer base, customer satisfaction and customer loyalty is everyone's business:

  • you have a large, diverse and growing customer base
  • your customers are more mobile and social than ever before
  • you have engaged with your customers wherever they are: web, mobile, social, local
  • you believe that "more data beats better algorithms" and that big data is all data
  • you wish to collect all data (call center records, web logs, social media, customer transactions and more) so that
  • you can understand your customers better and how they speak of and rank you in their social networks
  • you can group (segment) your customers to understand their likes and dislikes
  • you can offer (recommend) them the right products at the right time and at the right price
  • you can preempt customer backlash and prevent them from leaving (churn) to competitors and taking their social networks with them (negative network effects)
  • all this effort will allow you to forecast sales accurately, run targeted marketing campaigns and cut costs to improve revenues and profitability
  • you wish to do all of this without hiring an army of data analysts, consultants and data scientists
  • and without buying a half-dozen or more tools, licensing several public/social data sets and integrating it all into your architecture
  • and above all, you wish to do it fast and drive changes in real time
  • and most importantly, you wish to rinse and repeat this approach for the foreseeable future
There are hardly any enterprise solutions on the market that address the challenges listed above.  You have no choice but to build a custom solution, hiring several consultants and striking separate license agreements with public and social data vendors to get a combined lens on public and private data.  This approach will be cost prohibitive for most enterprise customers and, as 90% of IT projects go, will be mired in delays, cost overruns and a truckload of heartache. 

Advances in technologies like in-memory databases and graph structures, as well as the democratization of data science concepts, can help address these challenges in a meaningful and cost-effective way.  Intelligent big data apps are the need of the hour.  These apps need to be designed and built from scratch with these challenges and technologies such as cognitive computing[1] in mind.  They will leave 1990s technology paradigms like "data needs to be gathered and modeled (caged) before an app is built" in the dumpster, and will achieve the flexibility required of all modern apps: adapting as the underlying data structures and data sources change.  They can be deployed right off the shelf with minimal customization and consulting because the app logic will not be anchored to the underlying data schema and will evolve with changing data and behavior.

Enterprise customers will soon be asking for a suite of such cognitive big data apps for all domain functions so that they can put big data opportunities to work and run their businesses better than their competitors.  Without a dynamic cognitive approach in apps, addressing the 4V challenge will be a nightmare and big data will fail to deliver on its promise.

Stay tuned for future blogs on this topic including discussions on a pioneering technology approach.

[1] Cognitive computing is the ability to analyze oceans of data in context with related information and expertise.  Cognitive systems learn from how they're used and adjust their rules and results dynamically.  Google's search engine and knowledge graph technology are predicated upon this approach.

 This blog has benefited from the infinite wisdom and hard work of my former colleagues Ryan Leask and Harish Butani and that of my current colleagues Sethu M., Jens Doerpmund and Vijay Vijayasankar.

Image courtesy of  MemeGenerator

Sunday, October 28, 2012

Apple, SAP & Hewlett-Packard: Not Just Numbers, Company's Vision, Strategy and Goals Also Matter For Investors (Part II)

Part I of this two-part blog offered empirical evidence suggesting that a few consistently outperforming technology companies (such as AAPL and GOOG) get valuation treatments that defy conventional wisdom.  Part I ended by introducing an investment approach based on three simple rules and suggested that management's effectiveness in articulating its corporate vision and goals, and its trustworthiness, also play a critical role in winning investors' sentiment.  Let's put each company through this test in Part II and discuss the outcome.

First up, AAPL:  For AAPL, I can safely conclude that the first two rules are securely in the bag.  Investors understand the company and its hugely popular products.  It has successfully sailed with the wind for the past decade, and I might even argue that it put fresh wind in the sails of the tablet and smartphone segments.  But when it comes to applying the third and final rule, i.e. investing in AAPL for the mid-to-long term and thus paying a reasonable multiple, investors are clearly hesitating to act.  From investors' point of view, the investment decision boils down to the following two points:
  • AAPL gets a fresh lease on life every year when it upgrades its iLine of products (iPads, iPhones, iPods and Macs);
  • but other than this routine, AAPL management is highly secretive about its vision for the future of AAPL.
For AAPL investors, it is challenging to see beyond a one-year horizon.  Investors are asking larger questions of AAPL, including:  a) what does AAPL want to be in 5 years, and where will it be?  b) Will AAPL still dominate some market segments as it does today, and if so, what is that longer-term strategy?  Until AAPL addresses these questions and clearly articulates its strategy and the goals tied to it, it is hard to see why investors would apply SP500-like or higher multiples to AAPL.

Next up, ORCL: I would start by arguing that ORCL is suffering from a credibility problem with investors.  Investors get ORCL's enterprise software and hardware business, which is attractive and growing at a secular rate.  ORCL has accepted that an inorganic growth model (via acquisitions) is the way to move its business forward and stay current on the technology innovation front.  Beyond all this, its earlier position on cloud technologies (calling them a fad) and its subsequent conversion into a true cloud believer (with the acquisitions of RightNow and Taleo and its own investments) have sent mixed messages to investors. 

ORCL has done a poor job of laying out its long-term vision for investors, and investors are unhappy because they cannot see a clear path forward.  Does ORCL want to be like its big brother IBM and package hardware, software, services and cloud infrastructure together for its customers?  What does ORCL want to be?  What are some of its growth plays?  ORCL has attempted to articulate its vision to investors and analysts, but reduced trustworthiness and past delays in strategic investments have kept investors skeptical at best.  ORCL needs to win back credibility from its investors and stop sending mixed messages to investors and its own customers.

Next up, HPQ:  I don't know where to begin with this company.  Let's start at the very top, the board.  HPQ's board has had major credibility issues for years now because of scandals and terrible decisions that have resulted in billions of dollars of losses for investors.  Epic stumbles such as the launch of Palm-based tablets/smartphones (and the immediate pull-out), the public flip-flopping on spinning off the Personal Computer unit, and three CEOs at the helm in less than three years have not pleased investors. 

In addition, HPQ's core businesses continue to suffer, resulting in heavy losses, because it has been slow to respond to the shift in technology spending toward cloud and mobile technologies.  HPQ's market cap has dropped by more than 80% since peaking at approximately $120B in 2010.  Investors have little to no confidence in HPQ and are pricing in rapid erosion of its customer base and sales (reflected in a low price/sales ratio of 0.23).

Next up, GOOG:  GOOG has a wide range of interests, resulting in a large array of investments, including the investment in driverless cars.  Not all the projects GOOG has undertaken in recent years have been positive-NPV projects, and as a result GOOG's stock carries the same P/E multiples as the SP500.  GOOG wants to be a technology company, and all the investments GOOG makes share this common origin.  This is a fact, but why are investors not comfortable with it?  Is GOOG not effective at convincing investors that this approach is right and will bear fruit?

Is GOOG going to be a media company, a mobile company, a hardware company, an Internet bandwidth company, a search company, an enterprise software company, or all of the above (i.e. a tech conglomerate)?  Apparently, GOOG's vision and roadmap are not very clear to investors, which is why GOOG had a lackluster performance for the first six months of 2012, prior to Q2's earnings announcement.  I believe investors have adopted a wait-and-watch approach to GOOG, which is a mature company now but surrounds itself with a high degree of uncertainty.

Next up, IBM:  IBM is securely in the bag using the rules I laid out in Part I.  By 2015, IBM will generate $20 in non-GAAP EPS; this is IBM's corporate goal for 2015.  I believe investors should love the simplicity of IBM's singular goal.  IBM has done a nice job of articulating its corporate goal for 2015, including the key growth plays that will drive IBM toward it: emerging markets, Analytics, Cloud and Smart Planet initiatives.  IBM has also articulated that it will pursue higher-margin opportunities (i.e. software) and use share repurchase programs to boost EPS.  IBM's EPS in 2011 was $13.40, which would have to rise by about 50% in 4 years for IBM to reach its goal of $20 EPS by 2015.
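That arithmetic is easy to verify in a couple of lines of R (a quick sketch using the EPS figures quoted above):

```r
# IBM's 2011 EPS and its stated 2015 non-GAAP EPS goal
eps.2011 <- 13.40
eps.2015.goal <- 20

# Total growth required over the 4-year window
total.growth <- eps.2015.goal / eps.2011 - 1        # ~0.49, i.e. about 50%

# Equivalent compound annual growth rate (CAGR)
cagr <- (eps.2015.goal / eps.2011)^(1/4) - 1        # ~10.5% per year

round(c(TotalGrowth=total.growth, CAGR=cagr), digits=3)
```

So the headline "50% in 4 years" translates into roughly 10.5% compound EPS growth per year.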

Both AAPL and IBM are iconic and trusted brands.  IBM has provided a clear vision and a path forward but AAPL has not, so I am not surprised that both IBM and AAPL receive similar trailing and forward P/E multiples despite the fact that AAPL's earnings growth is nothing short of spectacular.

This brings me to the last company I will discuss here, SAP:  Just like IBM, SAP is also securely in the bag.  SAP is a global brand and plans to reach 1 billion people in an attempt to become a household name.  I have found SAP to be a goals-driven company, and it is taking all the necessary steps (both organic and inorganic growth opportunities) to track toward these goals.  This is similar to IBM's approach, but more clearly spelled out.  Here are the goals that SAP has laid out for 2015 on its corporate website:

Source: SAP's Corporate Website

Additionally, just like IBM, SAP has clearly articulated its growth strategy and the five market categories it plans to expand into: applications, analytics, mobile, database & technology, and the cloud.  SAP's management has not sent mixed messages to the market (unlike ORCL) since sharing its vision and goals for the future, and it is gearing up to ride both the mobile trend (with the Sybase and Syclo acquisitions) and the cloud trend (with the SuccessFactors and Ariba acquisitions).

SAP is not the only company growing its revenues at a double-digit rate, but it has logged that performance consistently for more than 10 quarters and is tracking toward its 2015 corporate goals.  Investors are cheering this steady performance and have bid up the stock by more than 35% YTD in 2012, higher than every other stock in the group except AAPL (see the graphics below):
  
Source: Google Finance

The majority of public companies, if not all, develop a vision, lay out a clear strategy and announce goals to realize that vision.  But some do a better job than others of articulating and sharing this on a regular basis with their investors.  Companies that clearly articulate their vision and strategy to all their constituents, including customers, employees, partners and investors, earn respect almost instantaneously.  And when these companies publicly track progress against their vision, they benefit tremendously by winning trust and credibility from every constituent (including investors), allowing them to attract the top talent, new customers, new partners and new markets that help them grow their business. 

This blog has benefited from discussions on this topic with my friends and colleagues Jens Doerpmund, Ryan Leask and Rajani Aswani.

Disclaimer:  All numbers are approximate and the underlying analysis is preliminary.  This blog is not intended for offering any investment advice.  SAP is my employer but all the views and opinions expressed here are solely mine.

Apple, SAP & Hewlett-Packard: Not Just Numbers, Company's Vision, Strategy and Goals Also Matter For Investors (Part I)

In this two-part blog, I will share my viewpoints on why investors price certain stocks at multiples that defy conventional wisdom, which suggests that higher (lower) growth stocks should fetch higher (lower) multiples than the market.

In Part I, I offer empirical evidence suggesting that a few consistently outperforming technology companies get valuation treatments that defy conventional wisdom: faster-growing companies get lower multiples while slow and steadily growing companies get higher multiples.

In Part II, I will conclude by suggesting that a clearly articulated long-term strategy, along with measurable corporate goals, plays an equally important role, together with the company's financial track record and market-beating performance, in winning investors' hearts (and getting higher multiples).

To help illustrate my viewpoints, I assembled a small group of traditional tech companies: Apple (Ticker: AAPL), Oracle (Ticker: ORCL), Hewlett-Packard (Ticker: HPQ), Microsoft (Ticker: MSFT), IBM (Ticker: IBM), SAP (Ticker: SAP) and Google (Ticker: GOOG).  I selected the S&P 500 (SP500) as the market.

As of Oct 19, SP500's trailing P/E and forward P/E estimate were 17 and 13.8 respectively (see the side table).  AAPL's 52-week return of 50.5% has markedly outpaced the 9.95% return of the SP500 over the same period.  In addition, AAPL's earnings growth has substantively outpaced that of the SP500 for five straight years (see the side chart).  So I started to wonder why AAPL's trailing and forward P/E multiples of 14.3 and 11.4 trail those of the S&P 500 (see the table below).  Is there a crisis looming for AAPL that could be bigger in magnitude and impact than those faced by the financial markets, including the never-ending debt crisis in Europe, a worsening slowdown in China and an already unraveling fiscal cliff in the US?  So why are investors not pricing AAPL at the multiples of the SP500, at a minimum?

Consider SAP:  In a peer group comprising four enterprise software tech companies (SAP, IBM, ORCL and MSFT), SAP has the highest trailing P/E and the highest price/sales ratio (see the table below).  SAP's forward P/E of 19.7 is 9 points higher than ORCL's, 11 points higher than MSFT's and 8 points higher than IBM's.  In addition, SAP's forward and trailing P/E multiples are also higher than those of the SP500!  So why are investors willing to price SAP at higher multiples than the others in its peer group, including the SP500?  Interestingly, SAP's multiples are also higher than AAPL's.


Finally, let's drop HPQ into the mix:  HPQ's TTM revenue was $61.9 per share (see the table above).  Its stock is trading at a meager P/S multiple of 0.2x and has a forward P/E multiple of just 4.  This is not hard to explain: there is no love left between HPQ and its investors, who have suffered heavy losses in a stock that has dropped almost 50% this year alone.  At such low multiples, investors are definitely pricing in a catastrophic scenario.

Source: Morning Star and Yahoo Finance
For all these companies, I assembled the last 5 years of income statements and reviewed their revenue growth rates (see the side table).  Nothing jumped out to suggest why AAPL should have lower multiples than the SP500, GOOG or SAP.  Clearly, something else is at play that the traditional valuation approach does not explain.
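To make the anomaly concrete, the multiples quoted in this post can be collected into a small R data frame (the numbers are the approximate ones cited above; this is just a sketch, with NA where a figure wasn't quoted):

```r
# Approximate multiples quoted in this post (as of Oct 2012)
multiples <- data.frame(
  Name       = c("SP500", "AAPL", "SAP", "HPQ"),
  TrailingPE = c(17,      14.3,   NA,    NA),
  ForwardPE  = c(13.8,    11.4,   19.7,  4),
  stringsAsFactors = FALSE
)

# Which names trade below the market's forward multiple?
below.market <- multiples$Name[
  multiples$ForwardPE < multiples$ForwardPE[multiples$Name == "SP500"]]
below.market
```

The output flags AAPL and HPQ as trading below the SP500's forward multiple, despite their very different growth profiles, which is exactly the puzzle this post is about.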


Investors' actions in HPQ, AAPL, SAP and their peers can be justified by applying the following three simple rules of investment: 
  1. invest in companies you know, understand and believe in;
  2. invest in companies that are going to persist, pursue positive-NPV projects and successfully sail with the wind (market trends); and
  3. invest in companies for the mid-to-long term.
Investors closely scrutinize, more than one would desire, the management's effectiveness in articulating the company's future and corporate goals, and its trustworthiness.  This is where the "believe" part of the first rule comes in.  In Part II of this blog, I will apply this set of principles to a few companies in this group and discuss the outcome.

This blog has benefited from discussions on this topic with my friends and colleagues Jens Doerpmund, Ryan Leask and Rajani Aswani.

Disclaimer:  All numbers are approximate and the underlying analysis is preliminary.  This blog is not intended for offering any investment advice.  SAP is my employer but all the views and opinions expressed here are solely mine.

Wednesday, May 2, 2012

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part III

Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays

For this exercise, I combined the following four separate blogs that I wrote on Big Data, R and SAP HANA.  Historical airlines and weather data were used for the underlying analysis.  The aggregated output of this analysis was written out as JSON and visualized in HTML5 with D3 and Google Maps.  The previous blogs in this series are:
  1. Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part II
  2. Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps
  3. Getting Historical Weather Data in R and SAP HANA 
  4. Tracking SFO Airport's Performance Using R, HANA and D3
In this blog, I wanted to mash up disparate data sources in R and HANA by combining airlines data with weather data to understand the reasons behind airport/airline delays.  Why weather?  Because weather is one of the most commonly cited reasons in the airline industry for flight delays.  Fortunately, the airlines data breaks up each delay by weather, security, late aircraft, etc., so weather-related delays can be isolated and the actual weather data can then be mashed up to validate the airlines' claims.  However, I will not be doing that validation here; I will just be displaying the mashed-up data.

I have intentionally focused on the three Bay Area airports and have used the last 4 years of historical data to visualize each airport's performance using an HTML5 calendar built from scratch with D3.js.  One could use all 20 years of data, and all airports, to extend this example.  I had downloaded historical weather data for the same 2005-2008 period for the SFO and SJC airports, as shown in my previous blog (for some strange reason, there is no weather data for OAK, huh?).  Here is how the final result looks in HTML5:



Click here to interact with the live example.  Hover over any cell and a tooltip with comprehensive analytics will show the breakdown of the performance delay for the selected cell, including weather data and the correct icons* (the result of a mash-up).  Choose a different airport from the drop-down to change the performance calendar. 
* Weather icons are properties of Weather Underground.

As anticipated, the SFO airport had more red on the calendar than SJC and OAK.  SJC is definitely the best-performing airport in the Bay Area.  Contrary to my expectation, weather didn't wreak as much havoc on SFO as one would expect.  Strange?

Creating a mash-up of these two data sets in R was super easy, and a CSV output was produced to work with HTML5/D3.  Here is the R code, and if it is not clear from all my previous blogs: I just love the data.table package.


###########################################################################################  

# Percent of delayed flights from the three Bay Area airports, a breakup of flight delays by cause, mashed up with weather data
# (assumes library(data.table) is loaded and the baa.hp / baa.weather data.tables exist from the previous blogs)

###########################################################################################  

baa.hp.daily.flights <- baa.hp[, list(TotalFlights=length(DepDelay),
                                      CancelledFlights=sum(Cancelled, na.rm=TRUE)),
                               by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights,Year, Month, DayofMonth, Origin)

baa.hp.daily.flights.delayed <- baa.hp[DepDelay>15,
                                     list(DelayedFlights=length(DepDelay), 
                                      WeatherDelayed=length(WeatherDelay[WeatherDelay>0]),
                                      AvgDelayMins=round(sum(DepDelay, na.rm=TRUE)/length(DepDelay), digits=2),
                                      CarrierCaused=round(sum(CarrierDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      WeatherCaused=round(sum(WeatherDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      NASCaused=round(sum(NASDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      SecurityCaused=round(sum(SecurityDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      LateAircraftCaused=round(sum(LateAircraftDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2)), by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights.delayed, Year, Month, DayofMonth, Origin)

# Merge two data-tables
baa.hp.daily.flights.summary <- baa.hp.daily.flights.delayed[baa.hp.daily.flights,list(Airport=Origin,
                           TotalFlights, CancelledFlights, DelayedFlights, WeatherDelayed, 
                           PercentDelayedFlights=round(DelayedFlights/(TotalFlights-CancelledFlights), digits=2),
                           AvgDelayMins, CarrierCaused, WeatherCaused, NASCaused, SecurityCaused, LateAircraftCaused)]
setkey(baa.hp.daily.flights.summary, Year, Month, DayofMonth, Airport)

# Merge with weather data
baa.hp.daily.flights.summary.weather <-baa.weather[baa.hp.daily.flights.summary]
baa.hp.daily.flights.summary.weather$Date <- as.Date(paste(baa.hp.daily.flights.summary.weather$Year, 
                                                           baa.hp.daily.flights.summary.weather$Month, 
                                                           baa.hp.daily.flights.summary.weather$DayofMonth, 
                                                           sep="-"),"%Y-%m-%d")
# remove few columns
baa.hp.daily.flights.summary.weather <- baa.hp.daily.flights.summary.weather[, 
            which(!(colnames(baa.hp.daily.flights.summary.weather) %in% c("Year", "Month", "DayofMonth", "Origin"))), with=FALSE]

#Write the output in both JSON and CSV file formats
objs <- baa.hp.daily.flights.summary.weather[, getRowWiseJson(.SD), by=list(Airport)]
# You now have (AirportCode, JSONString) pairs; stitch them together into one JSON array.
row.json <- apply(objs, 1, function(x) paste('{\"AirportCode\":"', x[1], '","Data\":', x[2], '}', sep=""))
json.st <- paste('[', paste(row.json, collapse=', '), ']')
writeLines(json.st, "baa-2005-2008.summary.json")                 
write.csv(baa.hp.daily.flights.summary.weather, "baa-2005-2008.summary.csv", row.names=FALSE)


Happy Coding!

Wednesday, March 28, 2012

Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps

Technologies: SAP HANA, R, HTML5, D3, Google Maps, JQuery and JSON

For this fun exercise, I analyzed more than 200 million data points using SAP HANA and R and then brought the aggregated results into HTML5 using D3, JSON and the Google Maps APIs.  The 2008 airlines data is from the data expo, and I have been using this entire data set (123 million rows and 29 columns) for quite some time; see my other blogs.

The results look beautiful:



Each airport icon is clickable and when clicked displays an info-window describing the key stats for the selected airport:


I then used D3 to display the aggregated result set in a modal window (lightbox):



Unfortunately, I can't provide the live example due to the restrictions of the Google Maps APIs; I am approaching my free API limits.

Fun fact:  The Atlanta airport was the largest airport in 2008 on many dimensions: total flights departed, total miles flown, total destinations.  It also experienced a lower average departure delay in 2008 than Chicago O'Hare.  I always thought Chicago O'Hare was the largest US airport.

As always, I needed just a few lines of R code, including two lines to write the data out to JSON and CSV files:

################################################################################
airports.2008.hp.summary <- airports.2008.hp[major.airports,     
    list(AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),
    TotalMiles=prettyNum(sum(Distance, na.rm=TRUE), big.mark=","),
    TotalFlights=length(Month),
    TotalDestinations=length(unique(Dest)),
    URL=paste("http://www.fly", Origin, ".com",sep="")), 
                    by=list(Origin)][order(-TotalFlights)]
setkey(airports.2008.hp.summary, Origin)
#merge the two data tables
airports.2008.hp.summary <- major.airports[airports.2008.hp.summary, 
                                                     list(Airport=airport, 
                                                          AvgDepDelay, TotalMiles, TotalFlights, TotalDestinations, 
                                                          Address=paste(airport, city, state, sep=", "), 
                                                          Lat=lat, Lng=long, URL)][order(-TotalFlights)]


airports.2008.hp.summary.json <- getRowWiseJson(airports.2008.hp.summary)
writeLines(airports.2008.hp.summary.json, "airports.2008.hp.summary.json")                 
write.csv(airports.2008.hp.summary, "airports.2008.hp.summary.csv", row.names=FALSE)
##############################################################################
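For readers without the 123-million-row table at hand, the per-airport aggregation above can be sketched on a tiny made-up data.table (the column names mirror the airlines data set; the three rows are invented):

```r
library(data.table)

# A tiny made-up sample using the airlines data set's column names
flights <- data.table(Origin   = c("ATL", "ATL", "ORD"),
                      Dest     = c("SFO", "ORD", "ATL"),
                      DepDelay = c(10, NA, 25),
                      Distance = c(2139, 606, 606))

# Per-origin aggregates, in the same style as the summary above
summary.dt <- flights[, list(TotalFlights = .N,
                             TotalMiles   = sum(Distance, na.rm=TRUE),
                             AvgDepDelay  = round(mean(DepDelay, na.rm=TRUE), digits=2)),
                      by=Origin]
summary.dt
```

The same `by=list(...)` grouping scales unchanged from this three-row toy to the full data set.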

Happy Coding and remember the possibilities are endless!

Saturday, March 17, 2012

Geocode and reverse geocode your data using R, JSON and Google Maps' Geocoding API


(Reposting the previous blog with an additional module on reverse geocoding added.)

First and foremost, I absolutely love the topic of Location Analytics (geo-spatial analysis) and see tremendous business potential in the not-so-distant future.  I would go out on a limb and predict that Location Analytics will soon go viral in the enterprise space because it has the capability to wow us.  Look no further than your iPhone or Android phone and count how many location-aware apps you have.  We all have at least one: Google Maps.  Mobile is one of the strongest catalysts for enterprise adoption of location-aware apps.  All right, enough business talk; let's get dirty with the code.

Over the last year and a half, I have faced numerous challenges with geocoding and reverse geocoding the data I have used to showcase my passion for location analytics.  In 2012, I decided to take things into my own hands and turned to R.  Here, I am sharing a simple R script that I wrote to geocode my data whenever I needed it, even Big Data.

To geocode and reverse geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON.  I recommend registering with the Google Maps API and getting a key if you have a large amount of data and will be doing repeated geocoding.

Geocode:

getGeoCode <- function(gcStr) {
  library("RJSONIO") #Load Library
  gcStr <- gsub(' ', '%20', gcStr) #Encode URL Parameters
  #Open Connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=', gcStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  if(data.json["status"]=="OK") {
    lat <- data.json["results.geometry.location.lat"]
    lng <- data.json["results.geometry.location.lng"]
    gcodes <- c(lat, lng)
    names(gcodes) <- c("Lat", "Lng")
    return(gcodes)
  }
}
geoCodes <- getGeoCode("Palo Alto,California")


> geoCodes
           Lat            Lng 
  "37.4418834" "-122.1430195" 

Reverse Geocode:
reverseGeoCode <- function(latlng) {
  library("RJSONIO") #Load Library
  latlngStr <- gsub(' ', '%20', paste(latlng, collapse=",")) #Collapse and Encode URL Parameters
  #Open Connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&latlng=', latlngStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  address <- NA
  if(data.json["status"]=="OK")
    address <- data.json["results.formatted_address"]
  return(address)
}
address <- reverseGeoCode(c(37.4418834, -122.1430195))

> address
                    results.formatted_address 
"668 Coleridge Ave, Palo Alto, CA 94301, USA" 

Happy Coding!

Wednesday, March 14, 2012

R and SAP HANA: A Highly Potent Combo for Real Time Analytics on Big Data


Let's Talk Code

SAP DKOM 2012 kicks off in San Jose today, and I couldn't be more excited.  For the past three months, Jens Doerpmund, Chief Development Architect of Analytics at SAP, and I have been working on this topic of R and SAP HANA, and all our hard work (upwards of 400 hours) is about to pay off (fingers crossed).

It has been a stunning journey and an incredible learning experience.  Both R and HANA are fascinating technologies, and bringing them together is analogous to bringing Google and Apple together.  We are gearing up for our session, and in the true spirit of DKOM, we will be talking only code, yes, code and lots of it.  We just wrapped up our slides, with lots of code snippets to share with fellow DKOMers.  Here is a quick sneak preview of what we are going to cover today:

Big Data Analytics (Really Big)
  • Airlines sector in the travel industry
  • 22 years (1987-2008) of on-time performance data for US airlines
  • 123 million records
  • Extract-Transform-Load (ETL) work to combine this data with airport and carrier data, setting up for Big Data analysis in R and HANA
  • A D20 instance with 96GB of RAM and 24 cores
  • Massive amounts of data crunching using R and HANA

 We will be covering lots and lots of topics; here is a short list:
  • Sentiment Analysis on #DKOM and a WordCloud
  • Cluster Analysis using K-Means
  • Geocode Your Data – Google Maps API
  • SP100 - XML Parsing and Historical Stock Data
  • R and HANA integration
  • Moving big data from one HANA instance to another (replication)
  • Server-side JavaScript
  • and an HTML5 app built with R, HANA and server-side JavaScript
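As a small taste of the cluster-analysis item in the list above, here is a minimal K-Means sketch in plain R, using the built-in iris data rather than the session's airline data:

```r
# K-Means on the four numeric columns of the built-in iris data set
set.seed(42)                        # make the clustering reproducible
km <- kmeans(iris[, 1:4], centers=3, nstart=25)

table(km$cluster)                   # cluster sizes
km$centers                          # the three cluster centroids
```

The same call works on any numeric matrix, so it scales naturally to aggregated airline features (average delay, distance, flight counts) pulled out of HANA.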



Here is a word cloud straight from R on #DKOM.  There will be a lot more to discuss today.  Looking forward to meeting you all, DKOMers.


Let's Talk Code, Everyone, and Happy Coding!

 Jitender Aswani
 Jens Doerpmund

Learn more on this session topic in my previous blog:  Advance Analytics with R and HANA at DKOM 2012 San Jose

Wednesday, February 1, 2012

Big Four and the Battle of Sentiments - Oracle, IBM, Microsoft and SAP

In this battle of sentiments or opinions for the four software giants - Oracle, IBM, Microsoft and SAP, SAP is generating a lot of positive buzz with its message of "innovation without disruption" and leading the pack with a 95% sentiment score.



Tag          Tweets Fetched   +ve Tweets   -ve Tweets   Avg. Score   Tweets   Sentiment
@IBM                    198           49           45        0.081       94         52%
@Microsoft              893          307           78        0.484      385         80%
@Oracle                 297           90           17        0.313      107         84%
@SAP                     98           55            3        0.673       58         95%


A few days ago, I published the blog "Updated Sentiment Analysis and a Word Cloud for Netflix" along with the underlying R code.  I used the same R program to compare the sentiments for the four software giants.  Now, technically speaking, IBM and Oracle are not pure software companies anymore, since they both package hardware (server and storage hardware) along with software, but the rivalry between these four companies persuaded me to put a comparative analysis here.  I originally included HP in this analysis but then dropped it, as I didn't consider HP in the same league as these four in the software category.
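The scoring idea behind that R program can be sketched in a few lines: count positive and negative word matches per tweet and take the difference.  The word lists below are tiny illustrative stand-ins, not the opinion lexicon the original analysis used:

```r
# Minimal bag-of-words sentiment score: +1 per positive word, -1 per negative word.
# The two word lists are illustrative stand-ins, not a full opinion lexicon.
pos.words <- c("great", "love", "innovation", "good")
neg.words <- c("bad", "disruption", "hate", "slow")

score.sentiment <- function(tweets) {
  sapply(tweets, function(tweet) {
    words <- unlist(strsplit(tolower(tweet), "[^a-z]+"))
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, USE.NAMES=FALSE)
}

scores <- score.sentiment(c("I love this innovation", "bad and slow release"))
scores   # 2, -2
```

Averaging these per-tweet scores, and counting how many are positive versus negative, gives numbers of the same shape as the table above.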

What surprised me the most was the low score IBM received, lower than Oracle's!  What went wrong here?  I was also surprised to see Oracle occupying the second spot with an 84% sentiment score.  So, despite all the negative publicity Oracle attracts, the sentiment is overwhelmingly positive.

The one improvement I would like to make to this analysis is to get more tweets.  The Twitter API restricts the number of tweets one can fetch and doesn't allow fetching older tweets.  I would love to run this analysis over a year's worth of tweets and also show a time series of the sentiment score.  That would be fantastic!

Here are the four histograms, one for each candidate, showing the distribution of opinion scores:

[Histograms of opinion scores: SAP, IBM, Microsoft, Oracle]
Happy Analyzing!


The underlying data can be downloaded here.