Co-authors: Venkata Krishnan Sowrirajan and Min Shen We are excited to announce that push-based shuffle (codenamed Pr...
October 20, 2021
Co-authors: Andy Li and Hongbin Wu Indexing plays a key role in modern search engines for fast and accurate informa...
September 27, 2021
Co-authors: Preetam Nandy, Yunsong Meng, Cyrus DiCiccio, Heloise Logan, Amir Sepehri, Divya Venugopalan, Kinjal Basu,...
February 8, 2021
Co-authors: Min Shen, Chandni Singh, Ye Zhou, and Sunitha Beeram At LinkedIn, we rely heavily on offline data analyti...
October 21, 2020
Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu At LinkedIn, our imperative is to create economic oppor...
Co-authors: Jun Shi, Mingzhou Zhou Introduction In the machine learning community, Apache Spark is widely used for da...
May 4, 2020
Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Adwait Tumbde, Ratandeep Ratti Introduction Over the years, the popula...
March 25, 2020
One of the most common ways to store results from a Spark job is by writing the results to a Hive table stored on HDF...
March 3, 2020
Jeremy Smith, Jonathan Indig, Faisal Siddiqi
There is often a hidden performance cost tied to the complexity of data pipelines — Overhead. In this post we will ex...
September 24, 2019
Co-authors: Xuhong Zhang, Chenya Zhang, and Yiming Ma Today, we are announcing a new open source project called Avro2...
April 4, 2019
How we scaled Spark streaming with a novel balanced Kafka reader for ingesting massive amounts of logging events from ...
November 20, 2018
Next Tuesday marks the start of the Spark Summit Conference in San Francisco. This year, LinkedIn engineers and data ...
This post has been updated to note the release of Pepperdata's Application Profiler, a commercial project based on Dr...
March 6, 2017
Machine learning is a key component of LinkedIn’s relevance-driven products. We use machine learning to train the ran...
We are proud to announce today that we are open sourcing Dr. Elephant, a powerful tool that helps users of Hadoop and...
April 8, 2016