Evolving LinkedIn’s analytics tech stack

Co-authors: Steven Chuang, Qinyu Yue, Aravind Rao, and Srihari Duddukuru. Introduction: Having recently transitioned Li...

analytics

hadoop

data

December 7, 2021

Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao. At LinkedIn, we use Hadoop as our backbone for big da...

hadoop

data

open source

September 8, 2021

TonY joins LF AI & Data Foundation

Co-authors: Keqiu Hu, Jonathan Hung, and Junfan Zhang. Today, TonY is joining the LF AI & Data Foundation, an umbrella...

ai/ml

hadoop

open source

July 15, 2021

The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System

Co-authors: Konstantin V. Shvachko, Chen Liang, and Simbarashe Dzinamarira. LinkedIn runs its big data analytics on Ha...

May 27, 2021

FastIngest: Low-latency Gobblin with Apache Iceberg and...

Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das. Data analytics and AI power many business-crit...

January 6, 2021

Theory vs. Practice: Learnings from a recent Hadoop incident

Co-authors: Sandhya Ramu and Vasanth Rajamani. For companies and organizations, failure tends to be far more illuminat...

August 6, 2020

Open sourcing Kube2Hadoop: Secure access to HDFS from...

Co-authors: Cong Gu, Abin Shahab, Chen Qiang, and Keqiu Hu. Editor's note: This blog has been updated. LinkedIn AI has...

June 10, 2020

On Spark, Hive, and Small Files: An In-Depth Look at Spark Partitioning Strategies

One of the most common ways to store results from a Spark job is by writing the results to a Hive table stored on HDF...

March 3, 2020

The building blocks of LinkedIn Skill Assessments

Co-authors: Christian Mathiesen and Jie Zhang. Your LinkedIn profile is intended to be a representative picture of you...

September 17, 2019

Partly Cloudy: Architecture

Recently, Twitter Engineering embarked on an effort to migrate elements of the Twitter stack to the cloud. This is an...

infrastructure

hadoop

May 9, 2019

Twitter

Partly Cloudy: The start of a journey into the cloud

Recently, Twitter Engineering embarked on an effort to migrate elements of the Twitter stack to the cloud. This artic...

infrastructure

hadoop

April 8, 2019

The Present and Future of Apache Hadoop: A Community Meetup...

On January 30, Hadoop developers gathered at LinkedIn’s offices in Mountain View to share their latest work, with pre...

February 21, 2019

Open Sourcing TonY: Native Support of TensorFlow on Hadoop

Co-authors: Jonathan Hung, Keqiu Hu, and Anthony Hsu. LinkedIn heavily relies on artificial intelligence to deliver co...

September 12, 2018

Data Analysis with Spark

Apache’s lightning fast engine for data analysis and machine learning ...

March 1, 2018

Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity

Co-authors: Erik Krogen and Min Shen. In March 2015, LinkedIn’s Big Data Platform team experienced a crisis. As the te...

February 8, 2018

Crushing AVRO Small Files with Spark

Solving the many small files problem for AVRO ...

February 6, 2018

Dali Views: Functions as a Service for Big Data

Co-authors: Carl Steinbach and Vasanth Rajamani. Big challenges in the big data ecosystem: At LinkedIn, we have a numbe...

hadoop

dali

November 9, 2017

Spark Summit 2017: Research, Open Source, and Community

Next Tuesday marks the start of the Spark Summit Conference in San Francisco. This year, LinkedIn engineers and data ...

June 2, 2017

Blog posts about Hadoop