Yahoo’s Open Source Omid Project Brings Scalable Transaction Processing To HBase

Posted Oct 1, 2015 by Frederic Lardinois (@fredericl)

A while back, Yahoo quietly published the code for Omid, an open source transaction processing system for the Apache HBase Hadoop big data store, on GitHub. This is the same software the company uses internally to help it power thousands of search transactions per second.

Until now, Yahoo remained rather subdued about this project, but with the latest update, launching today, it feels the service is now robust enough for wider deployment and has proven its ability to scale. It’s also 10 times faster than the first version the company released to the public.

Yahoo’s director of engineering Ralph Rabbat and senior director of product management Sumeet Singh told me earlier this week that the company hopes that other platforms in the Hadoop and HBase ecosystem will adopt Omid.

Indeed, Yahoo hopes Omid will follow a trajectory similar to Hadoop. Hadoop, after all, began at Yahoo, and the company is one of its largest users and has remained very active in the open source efforts around it. Rabbat and Singh hope that Omid will eventually become an official Apache project, just like HBase. In an effort to reach out to the open-source community, the company plans to publish a series of blog posts about deploying and using Omid over the next few weeks.

By default, HBase does not conform to the ACID (Atomicity, Consistency, Isolation, Durability) principles of database design. Omid aims to ensure that applications can perform read and write operations on HBase with ACID properties by extending the HBase key-value API with transaction semantics.
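To make "a key-value API with transaction semantics" more concrete, here is a minimal sketch in Python. It is a toy in-memory store, not Omid's code or API: the class names (`ToyTransactionalStore`, `Transaction`, `ConflictError`) and the `demo` function are hypothetical, and the choice of snapshot isolation with write-write conflict detection at commit time is an assumption made for illustration. The idea is that each transaction reads from a consistent snapshot, buffers its writes, and either commits all of them or none.

```python
import itertools


class ConflictError(Exception):
    """Raised when a transaction loses a write-write conflict at commit."""


class ToyTransactionalStore:
    """In-memory multi-version key-value store (hypothetical, not Omid)."""

    def __init__(self):
        # A single counter stands in for a central timestamp service.
        self._clock = itertools.count(1)
        # key -> list of (commit_ts, value), in ascending commit order
        self._versions = {}

    def begin(self):
        return Transaction(self, next(self._clock))

    def _read(self, key, start_ts):
        # Return the newest version committed before the snapshot was taken.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts < start_ts:
                return value
        return None

    def _commit(self, txn):
        # Abort if any written key was committed by someone else
        # after this transaction started.
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.start_ts:
                raise ConflictError(key)
        commit_ts = next(self._clock)
        # Apply every buffered write under one commit timestamp (all or nothing).
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts


class Transaction:
    def __init__(self, store, start_ts):
        self._store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def get(self, key):
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        return self._store._read(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self._store._commit(self)


def demo():
    """Two overlapping transactions touch the same key; the second aborts."""
    store = ToyTransactionalStore()
    setup = store.begin()
    setup.put("a", 100)
    setup.put("b", 0)
    setup.commit()

    t1 = store.begin()
    t2 = store.begin()
    t1.put("a", 50)
    t1.put("b", 50)
    t1.commit()

    seen_by_t2 = t2.get("a")  # t2 still reads its own snapshot
    t2.put("a", 0)
    try:
        t2.commit()
        t2_committed = True
    except ConflictError:
        t2_committed = False

    final_a = store.begin().get("a")
    return seen_by_t2, t2_committed, final_a
```

In `demo`, the second transaction keeps reading the value from its own snapshot even after the first one commits, and its conflicting write is rejected at commit time rather than silently overwriting the first. A real system layered on HBase has to provide the same guarantees across many servers and survive failures, which this sketch does not attempt.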

As Rabbat told me, the company looked at the gap between traditional relational databases (which don’t scale all that well) and NoSQL databases (which typically don’t have transaction support). What Yahoo was missing were transactions that let HBase process small updates individually. Google solved this with Percolator, but that remains a proprietary system. Omid, then, is in a way an open-source implementation of Google Percolator.

Internally, Yahoo uses Omid on top of its Sieve content management system, which drives — among other things — its Search platform. That’s essentially a multi-petabyte HBase store that stores billions of documents. There, Omid helps power tens of thousands of transactions per second.

Rabbat and Singh believe Omid could be useful in other applications, too. Apache Phoenix, which essentially implements SQL on top of HBase, could use it as its transaction management component, for example. More generally, any HBase system that needs to support incremental real-time processing could use Omid. As Singh also noted, those don’t have to be web-scale implementations. Omid can work just as well at a smaller scale.

For Yahoo, the main benefit of open sourcing a project like Omid is that many of the community’s improvements will directly help it improve its own service. That’s something that held true for Hadoop, and the company hopes to replicate this success with projects like Omid.

“The value we got out of Hadoop has only increased since we open sourced it,” Singh said. Rabbat added that open source is also an increasingly important recruiting tool for the company. And as Yahoo has stepped up its acquisition efforts since Marissa Mayer took the reins, integrating the technologies of other companies has often been relatively easy because those companies already used Hadoop, too.

The Omid code is now available on GitHub.

Featured Image: Scott Schiller/Flickr UNDER A CC BY 2.0 LICENSE