Unpacking How Ad Ranking Works at Pinterest

Aayush Mudgal, a senior machine learning engineer at Pinterest, gave an in-depth talk on how Pinterest uses AI to rank ads effectively. This post summarizes the key points and strategies from that talk, including the main infrastructure and architecture diagrams.

Table of Contents

Introduction
Scalability Challenges
Data Management
Deep Learning Models and User Experience
System Monitoring and Model Performance Improvements
Conclusion

Introduction

At Pinterest, the ad-ranking system is designed to maximize value for users, advertisers, and the platform itself. Machine learning models trained on large volumes of data select ads that are relevant and engaging, which drives both user engagement and advertiser satisfaction.

Scalability Challenges

Pinterest operates in a dual marketplace where users engage with content, and advertisers pay to connect with these users. The challenge lies in balancing the needs of both parties while maintaining low latency and high relevance.

Infrastructure Overview

When a user interacts with Pinterest, their request passes through a load balancer to an app server and then to an ad server, which fetches relevant ads to display. This process involves:

Feature Retrieval: Fetching user features from a key-value store.
Candidate Retrieval: Selecting potential ads from billions of content items.
Ranking Service: Using heavyweight models to rank ads against multiple objectives such as clicks, saves, and repins.
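
The three stages above can be sketched as a small funnel. This is only an illustration under stated assumptions: the `Ad` fields, the in-memory `USER_FEATURES` and `AD_INDEX` stand-ins, the topic-matching filter, and the bid-only score are all hypothetical and are not Pinterest's implementation. The point is the shape of the pipeline: a cheap filter narrows a large index before a more expensive scorer runs on what is left.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    topic: str
    bid: float


# Hypothetical in-memory stand-ins for the key-value store and the ad index.
USER_FEATURES = {"u1": {"interests": {"cooking", "travel"}}}
AD_INDEX = [
    Ad("a1", "cooking", 1.0),
    Ad("a2", "cars", 3.0),
    Ad("a3", "travel", 2.0),
]


def fetch_user_features(user_id):
    """Feature retrieval: look up user features by key."""
    return USER_FEATURES.get(user_id, {"interests": set()})


def retrieve_candidates(features, index):
    """Candidate retrieval: a cheap filter that narrows the index."""
    return [ad for ad in index if ad.topic in features["interests"]]


def rank(candidates, score_fn):
    """Ranking: apply a heavier scoring function to the small candidate set."""
    return sorted(candidates, key=score_fn, reverse=True)


def serve_ads(user_id, k=2):
    features = fetch_user_features(user_id)
    candidates = retrieve_candidates(features, AD_INDEX)
    # Toy score: the bid alone. A real ranker would use model predictions.
    return rank(candidates, score_fn=lambda ad: ad.bid)[:k]


print([ad.ad_id for ad in serve_ads("u1")])  # ['a3', 'a1']
```

Because the expensive scorer only sees the filtered candidates, its cost depends on the candidate count rather than on the size of the full index.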

Data Storage and Formats

Pinterest relies on several storage and query technologies, each suited to a different need:

Hive: For data warehousing and efficient querying.
Cassandra: A scalable NoSQL database for handling large volumes of data.
Fuzzy Search: Enhances search capabilities by allowing approximate matching.
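
As a rough illustration of approximate matching, the sketch below uses `difflib.get_close_matches` from the Python standard library. The keyword list and the misspelled query are made up for the example, and the talk does not describe how fuzzy search is actually implemented at Pinterest.

```python
import difflib

# Hypothetical vocabulary of known search keywords.
KNOWN_KEYWORDS = ["recipes", "receipts", "travel", "wedding ideas"]


def fuzzy_lookup(query, vocabulary, cutoff=0.6):
    """Return vocabulary entries similar to the query, best match first."""
    return difflib.get_close_matches(query, vocabulary, n=3, cutoff=cutoff)


matches = fuzzy_lookup("recipies", KNOWN_KEYWORDS)
print(matches[0])  # recipes
```

Raising `cutoff` makes matching stricter. Lowering it returns more candidates, at the cost of more false positives.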

The following diagram illustrates the key components of Pinterest’s ad-serving infrastructure:

[Figure: Pinterest Ad-Serving Architecture]

Deep Learning Models and User Experience

Embeddings

Pinterest uses embeddings to represent both users and advertisements in the same space. This allows for efficient matching of relevant ads to users by comparing the closeness of embeddings.
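
A minimal sketch of matching by closeness in a shared embedding space follows. It assumes cosine similarity as the closeness measure and uses tiny hand-written three-dimensional vectors. Both are illustrative choices: the source does not specify the distance metric, and production systems typically use learned, higher-dimensional embeddings with approximate nearest-neighbor search rather than the exhaustive scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_ads(user_vec, ad_vecs, k=2):
    """Return the ids of the k ads whose embeddings are closest to the user."""
    scored = [(cosine_similarity(user_vec, vec), ad_id)
              for ad_id, vec in ad_vecs.items()]
    scored.sort(reverse=True)
    return [ad_id for _, ad_id in scored[:k]]


# Hypothetical embeddings; real ones would come from a trained model.
user = [0.9, 0.1, 0.0]
ads = {
    "hiking_boots": [1.0, 0.0, 0.0],
    "sofa": [0.0, 1.0, 0.0],
    "blender": [0.0, 0.0, 1.0],
}

print(nearest_ads(user, ads))  # ['hiking_boots', 'sofa']
```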

Training Workflow

The training process involves:

Joiner Workflow: Combining events and features to provide comprehensive feature statistics and validation.
Training Workflow: Using the combined data to train machine learning models.
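
The joiner step can be pictured as a keyed join between logged engagement events and the features recorded at serving time, followed by simple per-feature statistics. The sketch below assumes a `request_id` join key, a `clicked` label, and two made-up feature names. None of these come from the talk.

```python
def join_events_with_features(events, features_by_request):
    """Attach logged features to each event, keyed by request id."""
    joined, dropped = [], 0
    for event in events:
        features = features_by_request.get(event["request_id"])
        if features is None:
            dropped += 1  # no features were logged for this request
            continue
        joined.append({**features, "label": event["clicked"]})
    return joined, dropped


def feature_coverage(rows, feature_names):
    """Fraction of joined rows in which each feature is present and non-null."""
    total = len(rows)
    if total == 0:
        return {name: 0.0 for name in feature_names}
    return {
        name: sum(1 for row in rows if row.get(name) is not None) / total
        for name in feature_names
    }


# Hypothetical logs.
events = [
    {"request_id": "r1", "clicked": 1},
    {"request_id": "r2", "clicked": 0},
    {"request_id": "r3", "clicked": 0},
]
features_by_request = {
    "r1": {"user_age_bucket": 2, "ad_ctr_7d": 0.03},
    "r2": {"user_age_bucket": None, "ad_ctr_7d": 0.01},
}

rows, dropped = join_events_with_features(events, features_by_request)
print(len(rows), dropped)  # 2 1
print(feature_coverage(rows, ["user_age_bucket", "ad_ctr_7d"]))
```

A sudden drop in coverage for a feature, or a rise in dropped events, is the kind of signal a validation step could flag before the data reaches training.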

Models and Predictions

Pinterest employs multitask deep models to make multiple predictions using a single model. This approach includes:

Attention Sequence: Considers the user’s viewing history to improve future interaction predictions.
Low Latency Performance: Ensures that the entire process from ad retrieval to display is optimized for minimal delay.
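
To make the multitask idea concrete, here is a toy forward pass in which one shared hidden layer feeds a separate output head per objective. The layer sizes, the weights, and the `click` and `save` task names are hypothetical. The sketch also omits the attention over user history and any training logic.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


# Hypothetical hand-picked parameters; a real model would learn them.
SHARED_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
SHARED_B = [0.0, 0.1]
HEADS = {
    "click": ([1.2, -0.7], -1.0),
    "save": ([0.4, 0.9], -2.0),
}


def predict(features):
    """Run the shared layer once, then one small head per objective."""
    hidden = relu(dense(features, SHARED_W, SHARED_B))
    return {
        task: sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias)
        for task, (weights, bias) in HEADS.items()
    }


predictions = predict([1.0, 0.5, 0.2])
print(sorted(predictions))  # ['click', 'save']
```

The shared layer is evaluated once per request regardless of how many heads there are, which is one reason a single multitask model can be cheaper to serve than several independent models.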

[Figure: GBDT + Logistic Regression Ensemble Models]

AutoML Architecture

Pinterest uses an AutoML architecture to automate the process of selecting, training, and deploying machine learning models. This helps in handling the scale and complexity of the ad-serving system.

[Figure: Pinterest’s AutoML Architecture]

System Monitoring and Model Performance Improvements

To maintain optimal performance, Pinterest integrates:

Testing and Validation: Including offline validation and model failure alerts.
GPU Serving and Quantization: For efficient handling of large models.
Incremental Training: Allows daily model updates and deployments, using MLflow for version control and reproducibility.
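
As an illustration of the quantization item above, the sketch below applies symmetric per-tensor int8 quantization to a list of weights and measures the reconstruction error. This is one common scheme, chosen here as an assumption. The talk summary does not say which quantization method or precision Pinterest uses.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the quantized integers."""
    return [q * scale for q in q_weights]


# Hypothetical weights from one layer of a model.
weights = [0.5, -1.27, 0.01, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q_weights)
print(max_error <= scale / 2 + 1e-12)  # True
```

Each weight then needs one byte instead of four, at the cost of a rounding error of at most half the scale per weight.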

Conclusion

By adopting these advanced strategies and leveraging powerful open-source tools like Apache Hadoop, Apache Spark, and Docker, Pinterest has built a robust and scalable ad-ranking infrastructure. These efforts ensure that users receive relevant ads while advertisers achieve their campaign goals effectively.

For more detailed insights, you can read the full article on InfoQ: How Pinterest Scaled up Its Ad-Serving Architecture.
