Introduction LinkedIn’s stack consists of thousands of different microservices and the associated complex dependencies among them. When a production outage...
Dec 22, 2021
While site outages are inevitable, it’s our job to minimize both the duration of outages and the likelihood of an outage occurring. One of our preemptive m...
Mar 16, 2021
Co-authors: Akbar KM and Kalyanasundaram Somasundaram Site up and secure is a fundamental element of how we operate, and site...
Feb 3, 2021
As companies grow, adapt, morph, and mature, one item remains the same: the need for reinvention. Technical infrastructure is no...
Oct 16, 2020
Espresso is LinkedIn's de facto NoSQL database solution. It is an online, distributed, fault-tolerant database that powers most of LinkedIn’s applications i...
Jun 23, 2020
Co-authors: Viranch Mehta, Jon Sorenson, Samir Jafferali As LinkedIn has grown to more than 690 million members, we’ve expanded our edge platform to 19 Poi...
Jun 16, 2020
By Hank Jacobs, Senior Site Reliability Engineer on CORE
Editor's Note: LinkedIn Engineering is dedicated to solving complex problems at scale to create economic opportunity for every member...
Feb 7, 2020
As the year draws to a close, we’re taking a look back at ten of our most popular 2019 articles on the LinkedIn Engineering Blog....
Introduction In 2013, when LinkedIn moved to multiple data centers across the globe, we needed a way to redirect traffic from one data...
Dec 6, 2019
Co-authors: Todd Palino, Samir Jafferali, Kurt Andersen, and Carolyn Blood LinkedIn hosted its 4th annual SRE[in]con conference in late October that brough...
Nov 14, 2019
At LinkedIn, our on-call incidents are managed using Iris and Oncall, two tools that we released as open source to the community about two years ago. Oncal...
May 9, 2019
Learn more about the work that our dedicated reliability engineering team is doing behind the scenes to keep our systems up and running.
Editor’s Note: This article originally appeared as a guest post on VentureBeat titled “What I learned by bringing down LinkedIn.com.” Reprinted here in ful...
Nov 16, 2018
LinkedIn has made significant investments in resilience engineering over the past few years. As Site Reliability Engineers (SREs), we've consistently witne...
May 24, 2018
At LinkedIn, we ship hundreds of command-line utilities to every machine in our data centers and to all of our employees’ workstations...
May 10, 2018
Author's note: My colleague, Michael Kehoe, wrote a blog post on the Couchbase Ecosystem at LinkedIn. I encourage you to read it if you haven’t already! Th...
May 1, 2018
In part one of this series, we discussed some of the key principles to consider when developing geographically distributed (GD) SRE...