Showing posts with label BI. Show all posts

Wednesday, September 4, 2013

The Future of Big Data is Cognitive Big Data Apps



The Volume, Velocity, Variety and Veracity of your data (the 4V challenge) have become untamable.  Wait, yet another big data blog?  No, not really.  In this blog, I would like to propose a cognitive app approach that can transform your big data problems into big opportunities at a fraction of the cost.

Everyone is talking about big data problems, but few are helping us understand big data opportunities.  Let's define a big data opportunity in the context of customers, because growing the customer base, customer satisfaction and customer loyalty is everyone’s business:

  • you have a large, diverse and growing customer base
  • your customers are more mobile and social than ever before
  • you have engaged with your customers wherever they are: web, mobile, social, local
  • you believe that "more data beats better algorithms" and that big data is all data
  • you wish to collect all data - call center records, web logs, social media, customer transactions and more - so that
  • you can understand your customers better and how they speak of and rank you in their social networks
  • you can group (segment) your customers to understand their likes and dislikes
  • you can offer (recommend) them the right products at the right time and at the right price
  • you can preempt customer backlash and prevent customers from leaving (churn) to competitors and taking their social networks with them (negative network effects)
  • all this effort will allow you to forecast sales accurately, run targeted marketing campaigns and cut costs to improve revenues and profitability
  • you wish to do all of this without hiring an army of data analysts, consultants and data scientists
  • and without buying a half-dozen or more tools, getting access to several public / social data sets and integrating it all into your architecture
  • above all, you wish to do it fast and drive changes in real time
  • and most importantly, you wish to rinse and repeat this approach for the foreseeable future
There are hardly any enterprise solutions in the market that can address the challenges listed above.  You have no choice but to build a custom solution, hiring several consultants and striking separate license agreements with public and social data vendors to get a combined lens on public and private data.  This approach will be cost-prohibitive for most enterprise customers and, as "90% of IT projects go," will be mired in delays, cost overruns and a truckload of heartache.

Advances in technologies like in-memory databases and graph structures, as well as the democratization of data science concepts, can help address the challenges listed above in a meaningful and cost-effective way.  Intelligent big data apps are the need of the hour.  These apps need to be designed and built from scratch with these challenges and technologies such as cognitive computing[1] in mind.  They will leave 1990s technology paradigms like "data needs to be gathered and modeled (caged) before an app is built" in the dumpster, and will achieve the flexibility required of all modern apps: adapting as the underlying data structures and data sources change.  These apps can be deployed right off the shelf with minimal customization and consulting, because the app logic will not be anchored to the underlying data schema and will evolve with changing data and behavior.

Enterprise customers will soon be asking for a suite of such cognitive big data apps across all domain functions so that they can put big data opportunities to work and run their businesses better than their competitors.  Without a dynamic cognitive approach in apps, addressing the 4V challenge will be a nightmare, and big data will fail to deliver on its promise.

Stay tuned for future blogs on this topic including discussions on a pioneering technology approach.

[1] Cognitive computing is the ability to analyze oceans of data in context with related information and expertise.  Cognitive systems learn from how they’re used and adjust their rules and results dynamically.  Google's search engine and Knowledge Graph technology are predicated on this approach.

 This blog has benefited from the infinite wisdom and hard work of my former colleagues Ryan Leask and Harish Butani and that of my current colleagues Sethu M., Jens Doerpmund and Vijay Vijayasankar.

Image courtesy of  MemeGenerator

Saturday, March 17, 2012

Geocode and reverse geocode your data using R, JSON and Google Maps' Geocoding API


(Reposting the previous blog with additional module on reverse geocoding added here.)

First and foremost, I absolutely love the topic of Location Analytics (Geo-Spatial Analysis) and see tremendous business potential in it in the not-so-distant future.  I would go out on a limb and predict that Location Analytics will soon go viral in the enterprise space because it has the capability to WOW us. Look no further than your iPhone or Android phone and count how many location-aware apps you have. We all have at least one - Google Maps.  Mobile is one of the strongest catalysts for enterprise adoption of location-aware apps. All right, enough business talk, let's get dirty with the code.

Over the last year and a half, I have faced numerous challenges with geocoding and reverse geocoding the data that I have used to showcase my passion for location analytics.  In 2012, I decided to take things into my own hands and turned to R.  Here, I am sharing a simple R script that I wrote to geocode my data whenever I needed it, even BIG data.

To geocode and reverse geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON. I recommend that you register with the Google Maps API and get a key if you have a large amount of data and will be doing repeated geocoding.

Geocode:

getGeoCode <- function(gcStr) {
  library("RJSONIO") #Load Library
  gcStr <- gsub(' ', '%20', gcStr) #Encode URL Parameters
  #Open Connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=', gcStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  if(data.json["status"]=="OK") {
    lat <- data.json["results.geometry.location.lat"]
    lng <- data.json["results.geometry.location.lng"]
    gcodes <- c(lat, lng)
    names(gcodes) <- c("Lat", "Lng")
    return(gcodes)
  }
}
geoCodes <- getGeoCode("Palo Alto,California")


> geoCodes
           Lat            Lng 
  "37.4418834" "-122.1430195" 
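Incidentally, the unlist() flattening step is what makes the simple name-based lookups above possible. Here is a small offline illustration using a hand-built list that mimics the shape of the Geocoding API response (the nested structure and values here are made up for illustration, no network call involved):

```r
# Offline illustration of how unlist() flattens a nested API-style response
sample.response <- list(
  status  = "OK",
  results = list(list(
    geometry = list(location = list(lat = 37.4418834, lng = -122.1430195))
  ))
)
# unlist() collapses nesting into a named character vector,
# joining the names of each level with "."
flat <- unlist(sample.response)
flat["results.geometry.location.lat"]
# "37.4418834"
```

This is why the functions above can pull coordinates out with a single bracket lookup instead of walking the nested list.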

Reverse Geocode:
reverseGeoCode <- function(latlng) {
  library("RJSONIO") #Load Library
  latlngStr <- gsub(' ', '%20', paste(latlng, collapse=",")) #Collapse and Encode URL Parameters
  #Open Connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&latlng=', latlngStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  address <- NA
  if(data.json["status"]=="OK")
    address <- data.json["results.formatted_address"]
  return(address)
}
address <- reverseGeoCode(c(37.4418834, -122.1430195))

> address
                    results.formatted_address 
"668 Coleridge Ave, Palo Alto, CA 94301, USA" 

Happy Coding!

Wednesday, March 14, 2012

R and SAP HANA: A Highly Potent Combo for Real Time Analytics on Big Data


Let's Talk Code

SAP DKOM 2012 kicks off in San Jose today and I couldn't be more excited.  For the past three months, Jens Doerpmund, Chief Development Architect of Analytics at SAP, and I have been working on this topic of R and SAP HANA, and all our hard work (upwards of 400 hours) is about to pay off (fingers crossed).

It has been a stunning journey and an incredible learning experience. Both R and HANA are fascinating technologies, and bringing them together is analogous to bringing Google and Apple together. We are gearing up for our session and, in the true spirit of DKOM, we will be talking only code, yes, code and lots of it.  We just wrapped up our slides, with lots of code snippets to share with fellow DKOMers.  Here is a quick sneak preview of what we are going to cover today:

Big Data Analytics (Really Big)
  • Airlines sector in the travel industry
  • 22 years (1987-2008) of on-time performance data for US airlines
  • 123 million records
  • Extract-Transform-Load (ETL) work to combine this data with data on airports and carriers, setting it up for big data analysis in R and HANA
  • A D20 server with 96GB of RAM and 24 cores
  • Massive amounts of data crunching using R and HANA

We will be covering lots and lots of topics; here is a short list:
  • Sentiment Analysis on #DKOM and a WordCloud
  • Cluster Analysis using k-means
  • Geo Code Your Data – Google Maps API
  • SP100 - XML Parsing and Historical Stock Data
  • R and HANA integration
  • Moving big-data from one HANA to another HANA (Replication)
  • Server-side JavaScript
  • and an HTML5 app built with R, HANA and server-side JavaScript



Here is a word cloud straight from R on #DKOM. There will be lots more to discuss today. Looking forward to meeting all you DKOMers.


Let's Talk Code, Everyone, and Happy Coding!

 Jitender Aswani
 Jens Doerpmund

Learn more on this session topic in my previous blog:  Advance Analytics with R and HANA at DKOM 2012 San Jose

Monday, January 30, 2012

Updated Sentiment Analysis and a Word Cloud for Netflix - The R Way!

Netflix investors must be happy and cheerful, as the stock is up more than 78% since the beginning of the year (YES, 78%; source: Yahoo Finance!).  I am not going to talk about what turned the stock around after the much talked/hyped-about Netflix debacle of late 2011, which earned Reed Hastings quite a few UNWANTED titles and had everyone demanding his resignation from the top post.  Not so fast, Mr. Bear!  Reed Hastings must be smiling!  After a stellar performance this year, including carefully released stats on viewership and streaming hours as well as solid Q4'11 earnings, Netflix is back, and most importantly, viewers are back!

Well, it is not coincidental that the sentiment for Netflix is also improving: 68% of the tweets now have positive sentiment.  See the table below:


Total Tweets Fetched   Positive Tweets   Negative Tweets   Average Score   Total Tweets   Sentiment
        499                  171               80              0.281           251           68%



*Make sure you understand and interpret this analysis correctly. This analysis is not based on NLP. 

I updated the sentiment analysis that I did last year, http://goo.gl/fkfPy (I was then just beginning to play with the Twitter and text mining packages in R), and used advanced packages like "tm" and "wordcloud".  The new analysis is based on more than 6,800 words that are most commonly used in various sentiment analysis blogs/books. (Check out Hu and Liu: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)

I came across this excellent blog by Jeffrey Bean, @JeffreyBean (http://goo.gl/RPkFX), and his tutorial. Thank you, Mr. Bean!  Please follow the instructions in Bean's slides and the R code listed there, as well as the R code here:

Here are the updated R code snippets -
#Populate the list of sentiment words from Hu and Liu (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)

huliu.pwords <- scan('opinion-lexicon/positive-words.txt', what='character', comment.char=';')
huliu.nwords <- scan('opinion-lexicon/negative-words.txt', what='character', comment.char=';')

# Add some words
huliu.nwords <- c(huliu.nwords,'wtf','wait','waiting','epicfail', 'crash', 'bug', 'bugy', 'bugs', 'slow', 'lie')
#Remove some words
huliu.nwords <- huliu.nwords[!huliu.nwords=='sap']
huliu.nwords <- huliu.nwords[!huliu.nwords=='cloud']
#which('sap' %in% huliu.nwords)

library(twitteR) #searchTwitter
library(plyr)    #laply and ddply

twitterTag <- "@Netflix"
# Get 1500 tweets - an individual is only allowed to get 1500 tweets
tweets <- searchTwitter(twitterTag, n=1500)
tweets.text <- laply(tweets, function(t) t$getText())
sentimentScoreDF <- getSentimentScore(tweets.text)
sentimentScoreDF$TwitterTag <- twitterTag




# Get rid of tweets that have zero score and separate +ve from -ve tweets
sentimentScoreDF$posTweets <- as.numeric(sentimentScoreDF$SentimentScore >=1)
sentimentScoreDF$negTweets <- as.numeric(sentimentScoreDF$SentimentScore <=-1)

#Summarize findings
summaryDF <- ddply(sentimentScoreDF,"TwitterTag", summarise, 
                 TotalTweetsFetched=length(SentimentScore),
                 PositiveTweets=sum(posTweets), NegativeTweets=sum(negTweets), 
                 AverageScore=round(mean(SentimentScore),3))

summaryDF$TotalTweets <- summaryDF$PositiveTweets + summaryDF$NegativeTweets

#Get Sentiment Score
summaryDF$Sentiment  <- round(summaryDF$PositiveTweets/summaryDF$TotalTweets, 2)
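The getSentimentScore() helper used above comes from Bean's tutorial (linked earlier); for readers who don't want to dig through the slides, here is a minimal base-R sketch of the word-count approach it implements. The function signature and column names are my own assumptions, not Bean's exact code:

```r
# Minimal word-count sentiment scorer: score = (#positive words) - (#negative words)
getSentimentScore <- function(tweets, pwords, nwords) {
  scores <- vapply(tweets, function(tweet) {
    # Lower-case the tweet and split it into words on non-letter characters
    words <- unlist(strsplit(tolower(tweet), "[^a-z']+"))
    sum(words %in% pwords) - sum(words %in% nwords)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(Text = tweets, SentimentScore = scores)
}

# Tiny example with hand-picked word lists
getSentimentScore(c("great service, love it", "slow app and buggy"),
                  pwords = c("great", "love"),
                  nwords = c("slow", "buggy"))
# SentimentScore: 2 and -2
```

In the real analysis, pwords and nwords would be the huliu.pwords and huliu.nwords vectors loaded above.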




Saving the best for last, here is a word cloud (also called a tag cloud) for Netflix built in R -

I will put the R code for building the word cloud up here after scrubbing it.

Happy Analyzing!

Wednesday, December 14, 2011

Closing the loop on Pervasive Location Analytics - an enlightening personal journey for sure!

When I started working on the Google Maps deal at SAP in February of this year, I had no clue where it would end or what would come next once the deal was done. I fell in love with the Location Analytics / Geo Data Visualization topic, and turned it into an opportunity to discuss it and generate excitement in various camps along the way.
Five sessions spread across three continents, 200+ attendees, 1000 views and numerous downloads later, this topic became more than just a personal interest. I met great people along the way and co-presented with very smart and driven people, the likes of Ryan from Centigon Solutions, Nimish from FreshDirect and Brendan from ThinkSmart Technologies. (See links to slides and session evaluations below)
A proud moment arrived this morning when an alert from SlideShare popped up indicating that this topic was hot on Facebook and, as a result, was being put on the SlideShare home page. Wow!


Pervasive Location Analytics: The Next Frontier to Fall in The Enterprise Software?

Session Evaluations Results

Thank you - my next two blogs will present my thoughts on Mobile Analytics and Agile BI, two topics on which I have spent a significant amount of time from strategy, market, customer, competition and product points of view.

Thursday, December 8, 2011

SuccessFactors - An amazing tech story through its financials and a solid grab by SAP!

I will take a slight detour from analytics and talk about SAP's acquisition of the #1 cloud company, SuccessFactors (SFSF). Announcement

The combination of SAP and SFSF will produce a powerhouse in the cloud segment of the enterprise software market, a segment that is just starting to take off…

Strong business rationale:
  • Gartner: HCM will be a $10B market by 2015; Talent Management alone will be $4.5B, with 75% of it coming from cloud-based apps
  • SFSF is:
      • the #1 HCM solution in the cloud
      • 15M users from companies of all sizes (CRM has only 3M users) across 60 diverse industries around the globe (example: Siemens has 450K seats)
      • 60% recurring revenues from existing customers
      • 90% of its growth is organic, as opposed to Salesforce
      • just 14% overlap with SAP customers - a tremendous upside for both companies (with a total addressable market of 500M employees across all SAP customers)
  • For SAP, SFSF will be a top-line acquisition with less emphasis on cost synergies…
  • The deal will be slightly dilutive on EPS in 2012 but accretive in 2013, with significant upside to SAP's revenues in 2013

Financials:
  • SAP paid $3.4B to acquire SFSF, which is not yet profitable.
  • SAP is paying ~10x 2011 revenues, a multiple comparable to what HP paid for Autonomy.
  • For SFSF, the street expects $332M in 2011 revenues; SFSF had $230M in revenues for the first nine months, with $91M coming in Q3'11.
  • As of September 2011, SAP had $5.2B in cash. The SFSF deal is all cash, with $2B coming from SAP's own war chest and ~$1.4B in debt.
For comparison: Taleo, with 2011 expected revenues of $324M, is barely profitable. Workday is on track for $320 million in billings in 2011 and is nearing profitability; Workday is preparing for an IPO.


Now let us talk about SFSF’s amazing growth over the past 9 years:

SFSF – a company that has delivered PERFECT hockey-stick growth since 2002:



A revenue growth story that is enviable:

Operating structure has shown substantive improvement over the past 5 years:


Net net for SAP: a solid acquisition, and the timing couldn’t have been better. The ride has just begun…

Source: Company Financials and Analyst Calls

Thursday, July 28, 2011

SAP and Google Maps - Putting "Where" in the "What-When-Where" equation!

Organizations are looking for that x-factor to gain competitive advantage. Could geo/location-enabled solutions deliver that x-factor?  I think so.

Let's use a couple of examples to understand this deal: where should Chipotle open its next franchise, or where should BP drill its next well?  Geo-enabled solutions could help answer those questions. Yes, this is already happening, and some of these companies have very sophisticated software to do this. But these software solutions should be available to the masses -

Here is my business explanation for this deal- 

  • A large part of the world’s enterprise data resides in SAP systems;
  • according to some estimates, more than 80% of that enterprise data has a spatial dimension to it;
  • a growing number of organizations demand geo-spatial lenses to engage with the spatial dimension of their data;
  • hence this collaboration between the #1 enterprise software company and the #1 consumer Internet company to bring location-aware solutions to the market.


Go SAP-Google Maps!

Thursday, June 23, 2011

Eight Big Data, BI, Cloud Related Trends from Accenture 2011 Technology Vision

  • Data takes its rightful place as a platform.
  • Analytics is driving a discontinuous evolution from business intelligence. (This is the most controversial...)
  • Cloud computing will create more value higher up the stack.
  • Architecture will shift from server-centric to service-centric.
  • IT security will respond rapidly, progressively—and in proportion.
  • Data privacy will adopt a risk-based approach.
  • Social platforms will emerge as a new source of business intelligence.
  • User experience is what matters.

Wednesday, May 25, 2011

Does your company have a BI implementation plan? Consider these statistics:

  • According to market research firm IDC, annual data generation will reach 35 zettabytes or about 35 million petabytes by 2020. 
  • That is enough data to fill a stack of DVDs reaching halfway to Mars, or 17.5 million times the entire collections of all the academic libraries in the United States. 
  • As a result, business intelligence has become an eight billion dollar industry and continues to increase each year. Global Industry Analysts released a report in July projecting the BI software market will reach $12.4 billion by 2015.
  • In May, Forrester came out with a report on the state of the BI industry, finding that 49 percent of companies are planning a BI project in 2010 or soon after.
I think the BI market will be bigger than what IDC is projecting, based on my own experience, analysis, research into trends and continuous work with BI think tanks. There are a lot of other trends, like location analytics and predictive analytics, which will make BI even more pervasive in the next 2-3 years.

Now, this ties pretty neatly into my earlier blogs on Big Data and the huge amount of innovation happening in this space. Keep blogging... (See my blog entry from February 2010)

A crowded Mobile Analytics (Mobile BI) Competitive Landscape - Is the opportunity really that big?

Quite a few challengers are in the market. The following list is just a first stab at the number of companies looking to capture a piece of the action. The Mobile Analytics market is going to be a big opportunity, which also ties nicely into the Big Data story and the Enterprise Mobility trend. More on the size of the opportunity and mobile analytics trends later. For now, enjoy this graphic I built using the Dresner study -


Also see Gartner's magic quadrant on BI. Some overlap between the companies on the two graphics indicates that there are new challengers in the market, like LogiXML and Bitam. See my earlier blog on HTML5 and LogiXML -


Monday, May 16, 2011

Business Analytics @ SAP Delivers a Chain of Innovations in Last Six Months!


The 10.0 release of EPM complements these tools by helping organizations ensure that a central corporate strategy and an understanding of risk guide all decisions and actions. This enables companies to achieve goals with greater speed and fewer resources.