LinkedIn's 'answer' to big data problems: Pinot

Summary: Pinot is already being rolled out to internal product management teams, and LinkedIn plans to eventually open source it to the public.

By Rachel King for Between the Lines | September 29, 2014 -- 18:00 GMT (11:00 PDT)

Big data presents a huge opportunity in the tech community and is routinely touted as such. But fewer are keen to admit the pitfalls and problems of harnessing that power.

LinkedIn is opening up about its own big data challenges through the unveiling of its new analytics engine.

Dubbed Pinot, the web-scale, real-time analytics engine was designed to monitor, manage, and make use of the massive quantities of data generated across LinkedIn's budding empire of professional networking and digital publishing products.

The roots of Pinot started to sprout roughly two years ago, as LinkedIn found itself running up against a wall of data-driven roadblocks. Once work on Pinot began, it took the platform's builders roughly eight months to get it ready for internal product use.

Before cultivating Pinot in-house, LinkedIn's engineering team said, it relied on a cocktail of generic storage systems, ranging from Oracle to the distributed key-value storage system Project Voldemort.

LinkedIn engineer Praveen Neppalli Naga explained in a blog post that these systems couldn't keep up with the rapidly growing flood of data produced by a social network of more than 300 million members worldwide, and counting.

Naga declared, "Pinot was born as an answer to our problems."

"LinkedIn data has a lot of depth and each dimension requires special treatment. We needed to build custom compression techniques to fit every dimension, in order to get optimal scan speed tradeoff vs. memory consumed. For example, each one of our members can have hundreds of skills and representing them per event is difficult. Similarly, groups that members belong to and companies they follow are some of the dimensions difficult to represent per event. We built Pinot with this difficult-to-index data in mind, but will save the details of the compression techniques for future posts."
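The post leaves the compression details for later, so the following is only a rough illustration of why a multi-valued dimension like a member's skills is awkward to store per event, and one generic way to handle it: dictionary-encode each distinct value to a small integer ID, keep each event's IDs in a flat array sliced by an offsets array, and bit-pack the IDs at the minimum width. This is an assumed, textbook technique, not Pinot's actual scheme; every function name and the sample skills are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch of dictionary encoding for a multi-valued dimension.
# This is a generic technique, NOT Pinot's documented compression scheme.


def encode_multi_valued(rows: List[List[str]]) -> Tuple[List[str], List[int], List[int]]:
    """Encode rows of strings into (dictionary, offsets, values).

    Each distinct string is stored once in a sorted dictionary; rows keep only
    integer IDs. Row i's IDs are values[offsets[i]:offsets[i + 1]].
    """
    dictionary = sorted({v for row in rows for v in row})
    ids: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    offsets = [0]
    values: List[int] = []
    for row in rows:
        values.extend(ids[v] for v in row)
        offsets.append(len(values))
    return dictionary, offsets, values


def bits_per_id(dictionary_size: int) -> int:
    """Minimum bit width able to hold any ID in [0, dictionary_size)."""
    return max(1, (dictionary_size - 1).bit_length())


def pack(values: List[int], width: int) -> bytes:
    """Bit-pack IDs at a fixed width into a little-endian byte string."""
    packed = 0
    for k, v in enumerate(values):
        packed |= v << (k * width)
    nbytes = (len(values) * width + 7) // 8
    return packed.to_bytes(nbytes, "little")


def unpack(data: bytes, width: int, count: int) -> List[int]:
    """Recover `count` fixed-width IDs from a packed byte string."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (k * width)) & mask for k in range(count)]


def decode_row(dictionary: List[str], offsets: List[int], values: List[int], i: int) -> List[str]:
    """Turn row i's IDs back into the original strings."""
    return [dictionary[j] for j in values[offsets[i]:offsets[i + 1]]]


if __name__ == "__main__":
    # Hypothetical events, each carrying a variable number of member skills.
    rows = [["java", "hadoop"], ["python"], [], ["hadoop", "kafka", "java"]]
    dictionary, offsets, values = encode_multi_valued(rows)
    width = bits_per_id(len(dictionary))
    packed = pack(values, width)
    print(dictionary, offsets, values, width, len(packed))
    print(decode_row(dictionary, offsets, unpack(packed, width, len(values)), 3))
```

With four distinct skills, each ID fits in 2 bits, so the six skill references above pack into 2 bytes instead of six repeated strings. A real engine would tune choices like this per dimension, which is the tradeoff between scan speed and memory the quote describes.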

Pinot now stands as the flagship data infrastructure for products such as "Who's Viewed Your Profile" and others that demand frequent, instant answers to complex queries.

Pinot is currently available to internal product management teams for crunching analytics on ads reporting and paid premium products such as company profile follows. LinkedIn also plans to eventually open source it to the public.

Hints of open source can already be found in Pinot. Naga highlighted that Pinot supports the Hadoop pipeline for bootstrapping and reconciliation, as well as real-time data indexing from Kafka and Hadoop.

Image via LinkedIn

Topics: Big Data, Apps, Data Management, Social Enterprise, Web development