LinkedIn announces open source tool to keep Kafka clusters running

Posted Aug 28, 2017 by Ron Miller (@ron_miller)
Today at the Kafka Summit in San Francisco, LinkedIn announced a new load balancing tool called Cruise Control, developed to help keep Kafka clusters up and running.

The company developed Kafka, an open source message streaming tool, to make it easier to move massive amounts of data around a network from application to application. It has become so essential that LinkedIn now dedicates 1,800 servers to Kafka, moving over 2 trillion transactions per day, Jiangjie Qin, lead software engineer on the Cruise Control project, told TechCrunch.

With that kind of volume, keeping the Kafka clusters running has become mission-critical, so earlier this year the team decided to create a tool that would recognize when a cluster was going to break. Then, based on a set of predefined rules, it would automatically configure the cluster to use the correct number of resources, fix itself and keep running. The tool became Cruise Control.

Prior to creating Cruise Control, engineers had to manually reconfigure a cluster each time one went down, and Qin says this was a tricky proposition because a misconfiguration could have a cascading impact across clusters. Putting the machine in charge of cluster management, with some human oversight, greatly simplified the process and allowed the team to scale cluster repair to meet the needs of its growing network in a way that just wasn’t possible when engineers had to do all of the work manually.

At its core, Qin explained, this was a load balancing problem: Did the cluster have the right number of resources to stay running without having a negative impact on other clusters in the network? He said this was a matter of identifying some common configurations and applying a set of goals to each one. The machine can very quickly assess the needs of the cluster and check them against the common configurations and the goals to choose the correct one.
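
To make the idea of checking configurations against goals concrete, here is a minimal Python sketch. It is not Cruise Control's actual code or API: the broker names, the load numbers, the two goals and the helper `choose_config` are all hypothetical, chosen only to illustrate picking the first candidate configuration that satisfies every goal.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: a candidate configuration maps a broker name to the
# fraction of its capacity in use (0.0 to 1.0).
Config = Dict[str, float]
Goal = Callable[[Config], bool]


def capacity_goal(limit: float) -> Goal:
    """Goal: no broker runs above `limit` utilization."""
    return lambda cfg: all(load <= limit for load in cfg.values())


def balance_goal(max_spread: float) -> Goal:
    """Goal: busiest and idlest brokers differ by at most `max_spread`."""
    return lambda cfg: max(cfg.values()) - min(cfg.values()) <= max_spread


def choose_config(candidates: List[Config], goals: List[Goal]) -> Optional[Config]:
    """Return the first candidate that meets every goal, or None."""
    for cfg in candidates:
        if all(goal(cfg) for goal in goals):
            return cfg
    return None


# Illustrative data only (assumed, not from LinkedIn).
candidates = [
    {"broker-1": 0.95, "broker-2": 0.20, "broker-3": 0.35},  # overloaded broker
    {"broker-1": 0.60, "broker-2": 0.50, "broker-3": 0.40},  # meets both goals
    {"broker-1": 0.50, "broker-2": 0.50, "broker-3": 0.50},
]
goals = [capacity_goal(0.80), balance_goal(0.25)]
chosen = choose_config(candidates, goals)
```

With these assumed numbers, the first candidate fails the capacity goal and the second is selected. A real system would presumably rank goals by priority and generate candidates automatically rather than take the first match from a fixed list.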

To make sure it’s on track, it’s possible to put a human check in the workflow, where Cruise Control asks an engineer to review the optimization plan before continuing.
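
A human check of this kind can be sketched as an approval gate in front of plan execution. The Python below is an illustration under stated assumptions, not Cruise Control's interface: `execute_plan`, the `approve` callback and the plan steps are hypothetical names and data.

```python
from typing import Callable, List


def execute_plan(
    plan: List[str],
    approve: Callable[[List[str]], bool],
    apply_step: Callable[[str], None],
) -> bool:
    """Apply every step of `plan` only if the reviewer approves it.

    Returns True when the plan was applied, False when it was rejected.
    """
    if not approve(plan):
        return False  # nothing is changed without sign-off
    for step in plan:
        apply_step(step)
    return True


# Hypothetical optimization plan and a stand-in for the engineer's review.
plan = [
    "move partition 3 from broker-1 to broker-2",
    "move partition 7 from broker-1 to broker-3",
]
applied: List[str] = []
ok = execute_plan(plan, approve=lambda p: len(p) <= 5, apply_step=applied.append)
```

Here the `approve` callback stands in for an engineer's decision; in practice it might block on a prompt or a ticket, and the gate could be switched off once the team trusts the automated plans.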

If this seems like a tool that would have been nice to have earlier, Qin acknowledges that it is, but it took the scalability issues to drive the company to apply the engineering resources needed to solve the problem.

It took about half a year of tinkering to find the right solution where the machine could process the changes more efficiently than humans could. The company plans to release the tool to the open source community with the goal of not only improving the way it keeps Kafka clusters in balance, but also applying the same load balancing principles to other distributed systems, which should come in handy for a number of use cases, Qin says.

Featured Image: Filograph/Getty Images