The Big Data Bottleneck In The Consumer Web

Posted Nov 21, 2011 by Semil Shah (@semil), Columnist

Editor’s note: TechCrunch contributor Semil Shah is an entrepreneur interested in digital media, consumer Internet, and social networks. Shah is based in Palo Alto, and you can follow him on Twitter @semil.

Earlier in the year, I wrote an opinion column on TechCrunch arguing that big data “needs to think bigger.” At the time, I kept hearing the term “big data” over and over, and wondered how much of the emerging insights and techniques would be applied to the Internet versus the larger problems society faces, such as detecting fraud in financial markets, finding new deposits of natural resources, or helping discover the next big pharma drug.

Yet in monitoring the space since then, I’ve come to the conclusion, for now, that my March 2011 column meant well but that reality is much further behind than we’d like to think. One would assume, for instance, that big drug companies would be aggressive in adopting new, external, cutting-edge techniques to analyze their own data for new insights, especially with a dangerous patent cliff looming in 2012. As it turns out, drug companies are frequently unwilling to share data with third parties, which is often necessary to take advantage of big data infrastructure. While I believe that eventually the best data science will emerge to help these industries grow in new ways, for now at least, the best opportunities lie in the one area I wanted to gloss over last time: the consumer and mobile web.

Investors see the wave coming. Over the past few months, the top-tier funds have begun to make their moves. Benchmark Capital brought in Craig Weissman from Salesforce as an EIR and invested in Josh James’ new company, Domo; Accel Partners recently announced the creation of a “Big Data Fund” by reallocating monies from existing funds, which will improve data dealflow; and of course, there’s Greylock Partners, which was one of the earliest investors in this space through numerous companies and, most recently, by recruiting DJ Patil to be their “Data Scientist in Residence.”

Since March, I’ve continued to hear the term “big data” uttered by so many, yet so few seemed to grasp what it means for us and the web (yours truly, included). We all know that the major social networks (like Facebook), broadcast engines (like Twitter), self-expression tools (like Tumblr and Pinterest), and services (like Dropbox) generate ridiculous amounts of data. Add to this the growing Quantified Self movement, where connected devices from companies like Fitbit, Runkeeper, and Jawbone let us track our offline movements and analyze them online.

What happens, then, when the companies holding these big buckets of data go to cash them in?

In the earlier stages of consumer web companies, data can be used to create new products in the hope of increasing engagement metrics. Then, as a company begins to mature, services that ideally generate revenue can be built on that data. In these companies today, data-driven engagement products are oftentimes baked into the earliest versions of the products, such as recommendation engines for whom to follow, where to go, or what to watch.

We should not take data as a given, however. To start with, the FTC has been warning technology executives to collect only the data core to their business. One might be shocked at just how many well-funded, recognizable startups haven’t been collecting good, structured data, and in some cases, they don’t collect any. Those that do get a handle on their data oftentimes do not possess the in-house talent to make sense of it, because the skills required to do so are rare.

The consumer web companies that do interesting things with data are the ones you’d expect: Google, Facebook, Amazon, LinkedIn, and Zynga, among a small group of others. Most web startups don’t have access to people with the mathematical and statistical backgrounds needed to extract value from the data. Some data scientists I’ve talked to will go so far as to say that consumer startups that start to grow fast need a data scientist as part of the core engineering team as soon as possible, because most engineers working in the consumer space don’t have the skills in statistics and/or machine learning required to make sense of the data. (A data scientist is someone sufficiently trained to ask the proper questions of the data in order to tease out insights that serve as the basis for building new products and that, in turn, generate income for the company.)

And herein lies the rub.

What I’m writing isn’t news. Everyone who watches the space knows it. The reality is that this talent is in short supply. To put it in terms we can understand, for every 100 great iPhone engineers, there may be one or two people who can, on their own, dig into consumer web data and discover and build new and engaging services from it.

It’s been my experience that the majority of those who do, in fact, possess these statistical, mathematical, and machine learning skills are currently busy, diligently applying their rare skills in other industries such as finance, life sciences, and the physical sciences. They oftentimes haven’t applied their techniques to data sets culled from the consumer web, nor are they interested in doing so. As a result, there are very, very, very few people like DJ Patil, Pete Skomoroch (of LinkedIn), or Jeff Hammerbacher (of Cloudera) who truly understand these techniques as they relate to the world wide web. Since we can’t clone them, the alternative has been to build data teams that pair data specialists with people who have extensive consumer web data experience.

So, the next time you hear someone talk about “big data” in the context of the consumer web, realize that, yes, valuable data, whether big or small, is being collected with every click we make. The big companies with resources are keenly aware of the opportunity, but most web startups don’t have data scientists as part of their early teams, and even if they wanted them, those folks are hard to find. Therefore, it’s my opinion that “big data” is a term we’ll hear for a very long time to come. Data generated by the web will produce some of the largest data sets ever known, if it hasn’t already, and somewhere within all those billions and billions of likes, retweets, upvotes, reblogs, and repins may reside truths that, yet again, change the way we live. But more data scientists will be needed to unlock them.

Photo Credit / Creative Commons by An&



