Sunday, June 30, 2013

Big Data Quote of the day: 06/30/2013

A cautionary note on Big Data in Big Data News: Mad Men Taking Over.


"Today’s meme is big data and we are sold an image of omnipotent data and algorithms, where numbers are more important than meaning, a mix of truth and fiction conjured to advance the story-tellers’ interests. The mix is so powerful, the data so intoxicating, that the data sellers can’t see the implications of their bragging and the potential consequences to society."






Thursday, June 27, 2013

Big Data Quote of the day: 06/27/2013

Krebbers (VP of architecture at Shell) said the best benefits of big data were realised by using a mixture of technologies (quoted in Data quality more important than fixating over big data)

"A mixture of more expensive in-memory, for speed; SQL, for a more economical option; and Hadoop, as a means of storing data you don’t know what to do with"

                                                   **************************

Even large, well-oiled organizations such as Shell are in a data-grab mode, storing "all" data in Hadoop without a well-thought-out, business-driven data-capture and decisioning approach.

Wednesday, June 26, 2013

Big Data Quote of the Day: 06/26/2013

A quote from Splunk, which just launched Hunk - a new tool in the Hadoop ecosystem.  This quote captures the essence of the "Big Divide" in Big Data:


“Hadoop is becoming the OS for where you store data,” said Clint Sharp, Splunk senior product manager for big data. “I talk to a lot of customers who are not struggling to put data into Hadoop, but they are struggling to get value out of it.”

                                                 *********************

Also, Hunk claims a voodoo feature: there is no requirement to “understand” data upfront. Simply point Hunk at the Hadoop cluster and start exploring data immediately.

If some of the so-called innovators in Big Data highlight such features, no wonder numerous Hadoop deployments fail and Big Data gets a bad rap. It is critical to remember that big data are just data, despite all the hype.  Data are critically valuable to business decision-making, but we shouldn't ignore the work of understanding what they are, where they come from, their inherent uncertainty and biases, how they could influence and impact decisions, and how they could augment our brains' abilities.

Tuesday, June 25, 2013

Big Data Quote of the day: 06/25/2013

A nice metaphor for Big Data in an InformationWeek article NoSQL Vs. Hadoop: Big Data Spotlight At E2

"In contrast to NoSQL, Hadoop seems to be getting all the credit it deserves and then some. By many accounts, it's the be-all and end-all of big data, despite the fact that the lion's share of deployments today are little more than digital landfills."

                                             *************************

It is high time we generate some value from digital landfills, starting with landfill gas.

Monday, June 24, 2013

Big Data Quote of the day: 06/24/2013



“When the information is not really used, we call it dark data because it’s not being used for predictive analysis or making business decisions,” Krishnan said.

********************

Why do we need to expand the data-related vocabulary when there is so much confusion even about what big data really means and can do?

Saturday, June 22, 2013

Big Data Quote of the day: 06/22/2013

A simple practical quote on generating value from data in Big data brings new vigor to health research


"Having somebody who knows how to do data and run a query is great, but until it gets to the person who understands what's going on and what to do with that information, it doesn't really matter," Collins said.


                                                     *******************

Realizing business value by acting on insights trumps our ability to generate the insights in the first place.


Comical aspect of the quote:  What does "do data" really mean?

Friday, June 21, 2013

Big Data Quote of the day: 06/21/2013

Machines/sensors and Big Data

Quote in Big Data Will Drive the Industrial Internet

"A six hours flight from New York to LA on a twin-engine aircraft produces a 240-terabyte mountain of data," writes Paul Mathai, applied research lead, Manufacturing & Hi-Tech, at Wipro. "The data can be analyzed to reveal every aspect of the engine's performance and health."


                                        ***********************

These data add up very quickly if you think about all the flights.   Are all these mountains of data valuable?
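
To put the quoted figure in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the 240 TB and six hours from the quote at face value and uses decimal units; the count of 1,000 flights is a purely hypothetical number chosen for illustration, not a figure from the article.

```python
TB = 10**12  # decimal terabyte, in bytes


def data_rate_gb_per_s(total_bytes: float, hours: float) -> float:
    """Average data rate in GB/s for total_bytes produced over `hours`."""
    return total_bytes / (hours * 3600) / 1e9


def fleet_volume_pb(per_flight_bytes: float, flights: int) -> float:
    """Total volume in petabytes if every flight produces per_flight_bytes."""
    return per_flight_bytes * flights / 1e15


per_flight = 240 * TB  # figure taken from the quote

print(f"{data_rate_gb_per_s(per_flight, 6):.1f} GB/s sustained during one flight")

# flights=1000 is a hypothetical count, used only to show how fast this adds up
print(f"{fleet_volume_pb(per_flight, 1000):.0f} PB for 1,000 such flights")
```

Taken literally, the quote implies roughly 11 GB every second for the whole flight, which is one more reason to ask how much of it is worth keeping.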

Thursday, June 20, 2013

Big Data Quote of the day: 06/20/2013

SpaceCurve CEO Slitz has an interesting perspective on public cloud:

SpaceCurve’s platform is designed to run on computer clusters with at least 1,000 nodes, and not in a public cloud because it takes too long to move datasets this large. “It’s like draining Lake Union into the Pacific with a garden hose,” Slitz says.

Comically simple bandwidth bottlenecks continue to be a hassle in big data analytics. 

Tuesday, June 18, 2013

Big Data Quote of the day: 06/18/2013

Just read a nice quote in a press release from Beyond Analysis, a consulting firm:

“Analytical insights should be presented in a simple and striking fashion that will be meaningful to people across the organisation, enabling the kind of informed strategic decisions that will help retailers to thrive. To harness its true power, Big Data needs to be delivered in small bites and simple steps in order to produce truly powerful results.”


                             **********************************
Definitely concur: key business decision-makers, aka humans, can only consume, digest, and internalize insights in small portions as they develop or refine business strategies, while machines consume big data and can support real-time, granular decision-making in operations.

  

Monday, June 17, 2013

Big Data Quote of the day: 06/17/2013

A hilarious quote from today's Boston Globe article Big-data crunching hits the fast lane in Holyoke:

“The fastest way to get a large puddle of data from New York to LA is called the sneakernet,” Brown said. “You get a graduate student, you buy him a bus ticket, and you send it that way.”


All this talk of changing the world with big data and analytics, and we still transfer 1 TB of data (forget petabytes and zettabytes of big data) using stone-age methods.
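
A quick sketch of why the sneakernet is still competitive: the time needed to push 1 TB through a network link. The link speeds below are hypothetical sustained rates picked for illustration; real-world throughput is usually lower than the nominal line rate.

```python
TB = 10**12  # decimal terabyte, in bytes


def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to move size_bytes over a link at a sustained bit rate."""
    return size_bytes * 8 / link_bits_per_second / 3600


# Hypothetical sustained link speeds, not measurements
links = {"100 Mbit/s": 100e6, "1 Gbit/s": 1e9, "10 Gbit/s": 10e9}

for name, rate in links.items():
    print(f"1 TB over {name}: {transfer_hours(TB, rate):.1f} hours")
```

At a sustained 100 Mbit/s, 1 TB takes about 22 hours, and the time scales linearly with data size, so a petabyte over the same link would take years while a box of disks still arrives in days.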

Next Waves of Big Data: Version 2.0 and Beyond



What we have seen thus far in the world of big data is version 1.0.   The key components are:

  • Distributed data storage and computational ecosystem:  We have the infrastructure and tools for processing almost unlimited amounts of data: Hadoop/MapReduce and variations thereof such as Dryad, plus NoSQL databases (HBase, Cassandra, AsterixDB, MongoDB, and Couchbase, to name a few - do we really need so many open source NoSQL variants?).
  • Integration of structured, semi-structured and unstructured data:   We see the beginning of investments in integrating “internal” data such as operations and supply chain, finance, sales, marketing, and customer service across the organization.
  • Cloud computing and easier deployment/monitoring of platforms:  Cloud-based X-as-a-service where X includes platform, infrastructure, database, and analytics (still nascent). Reducing IT costs is a powerful driver for firms, but the value from cost reductions is short-lived. 
  • Offline/batch processing:  Although there are a few real-time applications, such as real-time bidding (RTB) in advertising, fraud detection, internet security, personalization, and recommendation systems, big data and analytics are primarily linked to trend analyses and “slow” decision-making environments (daily, weekly, monthly, quarterly).
  • Data Scientists and Engineers:  The users of the frameworks are primarily tech-savvy software engineers, computer scientists, mathematicians, and statisticians.

In version 2.0, over the next 1-2 years, we will see many companies in the ecosystem help businesses explore big data (primarily analyzed offline) and start the broader use of real-time analytics and optimization.

  • Data visualization and search tools for data discovery:  Companies such as Panopticon, Datameer, KISSmetrics, Domo, Cloudera and many others provide simple visualization and data discovery tools to ask questions through SQL-like queries and search paradigms.
  • Data integration facilitators:   Startups such as DataTamer, Data Gravity, Cloudant, and many others will ease the pain of continuously integrating data from disparate sources - both internal and external. 
  • Development of a commercial ecosystem around real-time analytics platforms:  New real-time computational frameworks such as Spark, Storm, GraphLab, and others form the next wave of real-time, distributed, in-memory analytics on multicore and multi-GPU clusters.
  • Data-driven business analyst:   Business analysts start using data visualization and search tools to develop hypotheses about the business and markets and to ask pointed questions of the data.

In the next 3 to 5 years, we expect version 3.0.  Key changes:

  • Commoditization of the platform and the cloud providers:  Consolidation of many players in the Hadoop-like ecosystem into a few “end-to-end” platform providers:  Hadoop and another framework for batch analytics, and a couple of players for real-time analytics.
  • Shift in value to big data applications:  The platform market becomes a commodity and the value shifts to innovative business applications such as drug discovery, operational efficiency improvements, marketing and sales automation, etc.   This shift in value is similar to the substantial shift in value from wireless operators (with substantial capex to build out the network and improve reliability) to smartphone makers and application developers in the mobile sector.  Real winners in big data will come from creative application-focused firms where businesses realize measurable benefits from big data investments - leveraged through identification of patterns and strategic anticipation, optimization of business operations, predictive analytics, and what-if simulations.

Where do you think the big data ecosystem is heading?

Friday, June 14, 2013

Buzz of Social Media but Value from Email Marketing


Despite all the buzz around social media, big data analytics, and social network analysis tools, a study sponsored by Lyris (a digital marketing solutions provider) and conducted by the Economist Intelligence Unit identifies the following top three critical skills needed for a marketer’s success:
  1. Ability to use data analysis to extract predictive findings from ‘Big Data’
  2. Understanding of best practices of email delivery
  3. Ability to generate insights about drivers of consumer behavior from multiple data sources

It is surprising that email marketing – a tool in the marketer’s toolkit for 15-20 years – is in the top 3 critical skill gaps.  Why?

  • Email marketing works, and small percentage improvements have a substantial impact.  Revenue per thousand (RPM) for email marketing is 10X to 1000X the RPM for digital ads.  At the upper end of that range, a small percentage improvement in email performance beats an order-of-magnitude improvement in the performance of digital ads.   Recent investments in start-ups such as MovableInk (supporting “live” or real-time personalized emails) substantiate its effectiveness and the incremental opportunity hidden in improving email marketing.
  • Prior relationship matters:  Having a conversation with consumers who have expressed interest in what you have to offer or bought from you in the past is more valuable and effective than looking for a prospect.  Even Facebook with the custom audience ad product permits marketers to leverage house email lists.  You upload your house file of email addresses and Facebook "matches" the email addresses to their internal list of email addresses, and voila you have a custom audience segment on Facebook. 
  • Email marketing is cheap to execute:  Compared to retargeting ads, email marketing is an order of magnitude cheaper.  Why look for a consumer with some prior relationship with you somewhere around the web when you can contact them directly via email?
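
The RPM argument in the first bullet can be checked with a few lines of arithmetic. This is a minimal sketch under two assumptions that are not from the study: email sends and ad impressions are compared at equal volume (per thousand), and the 1% email lift is a hypothetical example value.

```python
def equivalent_ad_lift(email_lift: float, rpm_multiple: float) -> float:
    """Fractional lift in ad RPM needed to match the extra revenue per
    thousand generated by `email_lift`, when email RPM is `rpm_multiple`
    times ad RPM. Assumes equal volumes on both channels."""
    return email_lift * rpm_multiple


# 10x to 1000x is the RPM range cited above; the 1% lift is hypothetical
for multiple in (10, 100, 1000):
    lift = equivalent_ad_lift(0.01, multiple)
    print(f"email RPM = {multiple}x ad RPM: a 1% email lift matches a {lift:.0%} ad lift")
```

At the low end of the range a 1% email lift is worth only a 10% ad lift, so the order-of-magnitude comparison holds at the high end, where a 1% email lift matches a 1000% lift in ad RPM.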





Sunday, June 9, 2013

Getting Value from Big Data


In the last decade, new concepts such as big data, the expertise of so-called data scientists, and distributed computational frameworks such as Hadoop/MapReduce have captured a lot of mindshare. Further, substantial investments have been made in leveraging large quantities of data with the hope (and prayer) of improving short-term and long-term business performance. But the returns on such investments are either slow to materialize or non-existent. Why? Here are some plausible reasons and suggestions:

  1. There is no substitute for creativity and intuition: The hypothesis that collecting and analyzing large amounts of data moving at warp speed uncovers new insights is overhyped. Patterns will emerge from such analyses, but the patterns need to be valuable to the business, and timely action needs to be taken to change the status quo and realize value. Human intuition, creativity, and interpretation are critical to convert big data analyses into recognizable patterns and data intelligence.

  2. Executing is more important than just knowing: Identifying a pattern, such as that positive customer experiences (and quick resolution when there is a negative experience) improve the loyalty of your most valuable customers, is one thing. How the company changes its core customer service processes and incentive structures, and leverages real-time data to manage and influence customer dialogues, is another matter altogether. It requires ad-hoc and real-time decisioning tools and intelligence on top of big data, plus changes in human behavior, to follow through on the identified pattern and realize value. It is no different from the limited value realized when insights garnered from small data (market research, customer satisfaction surveys, and the like) in years past gather dust on the shelves.

  3. Don't ignore the art of story-telling: Ultimately, the success of a business depends on sound strategies, anticipating the future, and impeccable execution. Decision-makers, managers, and employees need simple, easily understandable stories that emanate from big data and answer “what-ifs” in order to change behavior and prioritize future investments and actions.

  4. Small data plus big data: Big data and patterns thereof need to be synthesized with small data for making strategic business decisions. Big data with supporting offline and real-time decision engines have natural strengths in tactical, granular, and operational aspects of running a business, such as recommendations, real-time bids for advertising, etc. But they need to be synthesized into coarser-grained patterns to support business strategy refinements and identification of new opportunities where small data typically have excelled.

  5. Focus on data that (potentially) matter to your business: You don't need to gather and store every bit for eternity and then expect patterns to emerge for business decisions. One needs to prioritize types of bits based on potential value, and invest technology and analytical resources in proportion to that potential value and in line with the business strategy. For example, for managing customers in CRM systems, we don't need a 360-degree view of a customer but just the right view to help make smarter customer-level investment decisions.