LinkedIn open-sources its WhereHows data discovery and lineage portal

Posted Mar 3, 2016 by Frederic Lardinois (@fredericl)

LinkedIn today open-sourced WhereHows, a metadata-centric tool the company has long used internally. It makes it easier for employees to discover the data the company generates and to track the lineage of its datasets as they move through its various internal tools and services.

Now that almost every modern business creates massive amounts of data, simply managing how all this information flows across an organization becomes virtually impossible. Sure, you can store it in a data warehouse, but at the end of the day you end up with a large number of very similar datasets, different versions of an original dataset, or information that has been transformed so it can be used by different tools. The exact same data also often ends up in multiple systems, just under different names or version numbers. So how do you know which dataset to work with when you are building a new product (or maybe just an executive report)?

This, LinkedIn’s Shirshanka Das and Eric Sun told me, was the problem the company was facing. So the team developed WhereHows, which functions as a central repository and web-based portal for keeping track of what happens to data in a large company like LinkedIn, or even a smaller one that has to deal with lots of heterogeneous data. At LinkedIn, WhereHows currently stores data about the status of 50,000 datasets, 14,000 comments and 35 million job executions. According to the company, the datasets it tracks add up to a footprint of about 15 petabytes.

LinkedIn is a big Hadoop user, but the tool can also track data from other systems (think Oracle databases, Informatica, etc.).

WhereHows offers both an API for developers and a web interface that lets employees visualize the lineage of a dataset, annotate it and more.
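To make "lineage" a bit more concrete: at its core, a lineage store records which jobs read which datasets and which datasets they wrote, so you can walk backwards from any dataset to everything it was derived from. The sketch below is a generic illustration of that idea, not WhereHows's actual API or schema; the `LineageGraph` class, its methods and the dataset names are all hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage store (not the WhereHows data model).

    Maps each derived dataset to the set of datasets it was built from.
    """

    def __init__(self) -> None:
        # derived dataset -> set of direct input datasets
        self._parents: dict[str, set[str]] = defaultdict(set)

    def record_job(self, inputs: list[str], output: str) -> None:
        """Record that one job execution read `inputs` and wrote `output`."""
        self._parents[output].update(inputs)

    def upstream(self, dataset: str) -> set[str]:
        """Return every dataset that `dataset` transitively depends on (BFS)."""
        seen: set[str] = set()
        queue = deque([dataset])
        while queue:
            current = queue.popleft()
            # .get() avoids inserting empty entries into the defaultdict
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


if __name__ == "__main__":
    g = LineageGraph()
    g.record_job(["oracle.members", "hdfs.page_views"], "hdfs.member_activity")
    g.record_job(["hdfs.member_activity"], "hdfs.exec_report_v2")
    print(sorted(g.upstream("hdfs.exec_report_v2")))
    # ['hdfs.member_activity', 'hdfs.page_views', 'oracle.members']
```

Answering "where did this executive report come from?" then becomes a graph traversal; a real system would also persist the graph and attach metadata such as owners, schemas and job run status to each node.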

As Das and Sun noted, LinkedIn has a long history of open-sourcing products that aren’t part of its core competency. The idea is to start a conversation: as the broader big-data ecosystem adopts this and similar tools, LinkedIn eventually benefits as well. Like a lot of other companies I talk to, LinkedIn also notes that open source helps it elevate its engineering brand, which in turn makes recruiting easier.

Featured Image: Nan Palmero/Flickr under a CC BY 2.0 license