Monday, June 17, 2013

Next Waves of Big Data: Version 2.0 and Beyond



What we have seen thus far in the world of big data is version 1.0.   The key components are:

  • Distributed data storage and computational ecosystem:  We have the infrastructure and tools for processing almost unlimited volumes of data: Hadoop/MapReduce and variations thereof such as Dryad, and NoSQL databases (HBase, Cassandra, AsterixDB, MongoDB, CouchBase, to name a few - do we really need so many open source NoSQL variants?).
  • Integration of structured, semi-structured and unstructured data:   We see the beginning of investments in integrating “internal” data such as operations and supply chain, finance, sales, marketing, and customer service across the organization.
  • Cloud computing and easier deployment/monitoring of platforms:  Cloud-based X-as-a-service where X includes platform, infrastructure, database, and analytics (still nascent). Reducing IT costs is a powerful driver for firms, but the value from cost reductions is short-lived. 
  • Offline/batch processing:  Although there are a few real-time applications, such as real-time bidding (RTB) in advertising, fraud detection, internet security, personalization, and recommendation systems, big data and analytics are primarily linked to trend analyses and “slow” decision-making environments (daily, weekly, monthly, quarterly).
  • Data Scientists and Engineers:  The users of the frameworks are primarily tech-savvy software engineers, computer scientists, mathematicians, and statisticians.

In version 2.0, over the next 1-2 years, we will see many companies in the ecosystem help businesses explore big data (primarily analyzed offline) and start the broader use of real-time analytics and optimization.

  • Data visualization and search tools for data discovery:  Companies such as Panopticon, Datameer, KISSmetrics, Domo, Cloudera and many others provide simple visualization and data discovery tools to ask questions through SQL-like queries and search paradigms.
  • Data integration facilitators:   Startups such as DataTamer, Data Gravity, Cloudant, and many others will ease the pain of continuously integrating data from disparate sources - both internal and external. 
  • Development of a commercial ecosystem around real-time analytics platforms:  New real-time computational frameworks such as Spark, Storm, Graphlab, and others form the next wave of real-time, distributed, in-memory analytics in multicore and multi-GPU clusters. 
  • Data-driven business analyst:   Business analysts start using data visualization and search tools to develop hypotheses about the business and its markets and to ask pointed questions of the data.

In the next 3 to 5 years, we expect version 3.0.  Key changes:

  • Commoditization of the platform and the cloud providers:  Consolidation of many players in the Hadoop-like ecosystem into a few “end-to-end” platform providers:  Hadoop and another framework for batch analytics, and a couple of players for real-time analytics.
  • Shift in value to big data applications:  The platform market becomes a commodity and the value shifts to innovative business applications such as drug discovery, operational efficiency improvements, marketing and sales automation, etc.  This shift in value is similar to the substantial shift in value from wireless operators (with substantial capex to build out the network and improve reliability) to smartphone makers and application developers in the mobile sector.  The real winners in big data will be creative, application-focused firms whose business customers realize measurable benefits from big data investments - leveraged through identification of patterns and strategic anticipation, optimization of business operations, predictive analytics, and what-if simulations.

Where do you think the big data ecosystem is heading?

3 comments:

  1. check out infochimps, CEO jim kaskade is blazing some 2.0 trails

  2. check out
    http://www.iknowtion.com/blog/Where_Marketing_Meets_Big_Data
