
Lessons Learned from Building LinkedIn’s AI Data Platform

In a recent presentation, Félix GV of LinkedIn shared lessons from building Venice, LinkedIn’s AI data platform.

Table of Contents

  1. Introduction
  2. AI at LinkedIn
  3. AI Ecosystem
  4. Venice: LinkedIn’s AI Data Platform
  5. Data Infrastructure Components
  6. Conclusion
  7. Appendix: Tools and Frameworks

Introduction

Félix GV discussed the challenges of building LinkedIn’s AI infrastructure and the solutions the team arrived at. He highlighted how complex it is to integrate machine learning systems into real-world applications, and emphasized that the infrastructure surrounding the models must be robust.

Speaker: Félix GV, Principal Staff Engineer

AI at LinkedIn

LinkedIn uses AI in many products, including People You May Know (PYMK) and the main feed. These applications operate on very large datasets and rely on recommendation systems that score and rank candidate entities so that users see the most relevant content.
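To make the "score and rank" step concrete, here is a minimal sketch in Python. It is not LinkedIn's implementation: the feature names (`shared_connections`, `same_company`), the weights, and the linear scoring function are all assumptions chosen for illustration, and a production recommender would use a trained model in place of the hand-written weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    """An entity to be ranked, with its (hypothetical) feature values."""
    entity_id: str
    features: Dict[str, float]


def linear_score(weights: Dict[str, float]) -> Callable[[Candidate], float]:
    """Build a scorer that computes a weighted sum of a candidate's features."""
    def score(candidate: Candidate) -> float:
        return sum(
            weight * candidate.features.get(name, 0.0)
            for name, weight in weights.items()
        )
    return score


def rank(
    candidates: List[Candidate],
    score: Callable[[Candidate], float],
    top_k: int,
) -> List[Candidate]:
    """Return the top_k candidates, highest score first."""
    return sorted(candidates, key=score, reverse=True)[:top_k]


# Illustrative data only; the feature names and weights are made up.
candidates = [
    Candidate("member:1", {"shared_connections": 12.0, "same_company": 1.0}),
    Candidate("member:2", {"shared_connections": 3.0, "same_company": 0.0}),
    Candidate("member:3", {"shared_connections": 7.0, "same_company": 1.0}),
]
score = linear_score({"shared_connections": 0.1, "same_company": 2.0})
top = rank(candidates, score, top_k=2)
print([c.entity_id for c in top])
```

The scorer is passed in as a function so that the ranking step stays the same when the scoring model changes.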

AI Ecosystem

Initially, LinkedIn’s AI tools were fragmented, which caused inefficiencies. To address this, the company developed an integrated AI platform that serves both AI researchers and engineers. The platform covers feature management, model creation, deployment, serving, and maintenance, so the whole AI workflow is handled in one place.

Venice: LinkedIn’s AI Data Platform

Key Components of the AI Platform

  • Frame: A virtual feature store abstracting over multiple storage types.
  • King Kong: Kubernetes-based deep learning training infrastructure.
  • FedEx: Feature productionization pipeline.
  • Model Cloud: Inference platform for serving models efficiently.
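The idea of a virtual feature store that abstracts over multiple storage types can be sketched as a thin routing layer. The sketch below is a hypothetical illustration, not Frame's actual API: the class names (`FeatureBackend`, `InMemoryBackend`, `VirtualFeatureStore`), the method names, and the feature names are all assumptions. Callers ask for features by name, and the store decides which backend holds each one.

```python
from typing import Dict, List, Optional, Protocol


class FeatureBackend(Protocol):
    """Anything that can look up one feature value for one entity."""
    def get(self, entity_id: str, feature: str) -> Optional[float]: ...


class InMemoryBackend:
    """Stand-in for a real storage system (offline table, online store, ...)."""
    def __init__(self, rows: Dict[str, Dict[str, float]]) -> None:
        self._rows = rows

    def get(self, entity_id: str, feature: str) -> Optional[float]:
        return self._rows.get(entity_id, {}).get(feature)


class VirtualFeatureStore:
    """Routes each feature name to the backend registered for it."""
    def __init__(self) -> None:
        self._routes: Dict[str, FeatureBackend] = {}

    def register(self, feature: str, backend: FeatureBackend) -> None:
        self._routes[feature] = backend

    def get_features(
        self, entity_id: str, features: List[str]
    ) -> Dict[str, Optional[float]]:
        result: Dict[str, Optional[float]] = {}
        for feature in features:
            backend = self._routes.get(feature)
            if backend is None:
                raise KeyError(f"no backend registered for feature {feature!r}")
            result[feature] = backend.get(entity_id, feature)
        return result


# Two hypothetical backends holding different features for the same member.
batch_backend = InMemoryBackend({"member:1": {"profile_views_30d": 42.0}})
online_backend = InMemoryBackend({"member:1": {"sessions_today": 3.0}})

store = VirtualFeatureStore()
store.register("profile_views_30d", batch_backend)
store.register("sessions_today", online_backend)

row = store.get_features("member:1", ["profile_views_30d", "sessions_today"])
print(row)
```

The caller never names a storage system, so a feature can move between backends by changing one `register` call.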

Venice’s Role

Venice is designed for derived data: it supports high-throughput ingestion from both batch and streaming sources and serves low-latency reads, which AI applications depend on. Because Venice is self-service, AI engineers can use it directly and focus on their business needs.
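The combination of batch ingestion, streaming ingestion, and fast key lookups can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not Venice's API or architecture: the class `DerivedDataStore`, its method names, and the choice to build each batch to the side and swap it in as a new version are all hypothetical.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple


class DerivedDataStore:
    """Toy key-value store fed by batch pushes and streaming writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._data: Dict[str, bytes] = {}

    def push_batch(self, records: Iterable[Tuple[str, bytes]]) -> int:
        """Load a full dataset, then swap it in as the new current version."""
        new_data = dict(records)  # built off to the side, outside the lock
        with self._lock:
            # Simplification: the swap discards earlier streaming writes.
            self._data = new_data
            self._version += 1
            return self._version

    def write_stream(self, key: str, value: bytes) -> None:
        """Apply a single streaming update to the current version."""
        with self._lock:
            self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Point lookup against whichever version is current."""
        return self._data.get(key)


store = DerivedDataStore()

# A batch job (for example, a nightly recomputation) pushes a full dataset.
version = store.push_batch(
    [("member:1", b"embedding-v1"), ("member:2", b"embedding-v1")]
)

# A streaming job then overwrites one key with a fresher value.
store.write_stream("member:2", b"embedding-fresh")

print(version, store.get("member:1"), store.get("member:2"))
```

Reads are plain dictionary lookups, so they stay cheap while ingestion happens; a real system would also need persistence, partitioning, and replication, none of which is modeled here.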

Venice Architecture

Data Infrastructure Components

LinkedIn’s data infrastructure includes various open-source tools:

Conclusion

LinkedIn’s AI data platform, Venice, exemplifies the importance of a well-integrated infrastructure in deploying large-scale AI applications. By leveraging open-source tools and developing specialized components, LinkedIn ensures efficient, scalable, and reliable AI operations.

Performance is the Best Feature!

For more detailed insights, you can watch the full presentation on InfoQ: Lessons Learned from Building LinkedIn’s AI Data Platform.

Appendix: Tools and Frameworks

Tools and Frameworks Mentioned