Co-authors: Steven Chuang, Qinyu Yue, Aravind Rao, and Srihari Duddukuru. Introduction: Having recently transitioned LinkedIn’s...
Blog posts about Hadoop
Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao. At LinkedIn, we use Hadoop as our backbone for big data analytics and...
Sep 8, 2021
Co-authors: Keqiu Hu, Jonathan Hung, and Junfan Zhang. Today, TonY is joining the LF AI & Data Foundation, an umbrella foundation of the Linux Foundation su...
Jul 15, 2021
Co-authors: Konstantin V. Shvachko, Chen Liang, and Simbarashe Dzinamarira. LinkedIn runs its big data analytics on Hadoop. During the last five years, the ...
May 27, 2021
Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das. Data analytics and AI power many business-critical use cases at...
Co-authors: Sandhya Ramu and Vasanth Rajamani. For companies and organizations, failure tends to be far more illuminating than success and the lingering eff...
Aug 6, 2020
Co-authors: Cong Gu, Abin Shahab, Chen Qiang, and Keqiu Hu. Editor's note: This blog has been updated. LinkedIn AI has been...
Jun 10, 2020
One of the most common ways to store results from a Spark job is by writing the results to a Hive table stored on HDFS. While in theory…
Mar 3, 2020
Co-authors: Christian Mathiesen and Jie Zhang. Your LinkedIn profile is intended to be a representative picture of your professional...
Sep 17, 2019
On January 30, Hadoop developers gathered at LinkedIn’s offices in Mountain View to share their latest work, with presentations by...
Feb 21, 2019
Co-authors: Jonathan Hung, Keqiu Hu, and Anthony Hsu. LinkedIn heavily relies on artificial intelligence to deliver content and create...
Sep 12, 2018
Co-authors: Erik Krogen and Min Shen. In March 2015, LinkedIn’s Big Data Platform team experienced a crisis. As the team was preparing to head home for the ...
Feb 8, 2018
Co-authors: Carl Steinbach and Vasanth Rajamani. Big challenges in the big data ecosystem: At LinkedIn, we have a number of challenges managing data in our c...
Next Tuesday marks the start of the Spark Summit Conference in San Francisco. This year, LinkedIn engineers and data scientists are...
This post has been updated to note the release of Pepperdata's Application Profiler, a commercial project based on Dr. Elephant. Last April, we announced t...
Mar 6, 2017
About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced by people working on ...
We shared Gobblin with the open source community a year ago. Since then, we’ve seen increasing interest and adoption among engineers, researchers and analy...
Apr 13, 2016
We are proud to announce today that we are open sourcing Dr. Elephant, a powerful tool that helps users of Hadoop and Spark understand, analyze, and improv...
Apr 8, 2016