ML Ops Platform at Cloudflare

ML Ops Platform at Cloudflare: https://blog.cloudflare.com/mlops/

Interesting notes

  • their products
    • A constantly-evolving ML model drives the WAF attack score that helps protect our customers from malicious payloads.
    • Another evolving model powers our bot management product to catch and prevent bot attacks on our customers.
  • To make notebooks scalable and open to collaboration, we deploy JupyterHub on Kubernetes
  • ML Ops
  • GitOps
  • Orchestration
    • Apache Airflow - the standard DAG composer; runs any data or machine learning workflow
    • Argo Workflows - Kubernetes-native, with YAML-based workflow definitions
    • Kubeflow Pipelines - a workflow platform tailored for orchestrating machine learning workflows
    • Temporal - a stateful workflow enabler; manages complex, stateful workflows with durable, fault-tolerant orchestration
  • Adoption
    • issue: each team started building its own machine learning solution separately
    • goal: help streamline and standardize the ML processes
    • what was provided: shared components such as notebooks, orchestration, data versioning (DVC), feature engineering (Feast), and model versioning (MLflow), which let teams collaborate directly
  • Looking forward
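
Side note on the Orchestration bullet: every tool listed there runs a workflow as a graph of dependent steps. The sketch below shows that idea in plain Python using the standard-library graphlib module. It is not the API of Airflow, Argo, Kubeflow, or Temporal, and the task names (load_data, build_features, train_model, evaluate) are hypothetical placeholders, not steps from the Cloudflare post.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run every task after its upstream tasks have finished.

    tasks: maps a task name to a callable taking a dict of upstream results.
    deps:  maps a task name to the list of task names it depends on.
    Returns the execution order and each task's result.
    """
    # Include every task in the graph, even those with no dependencies.
    graph = {name: deps.get(name, ()) for name in tasks}
    order, results = [], {}
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](upstream)
        order.append(name)
    return order, results


# Hypothetical four-step ML workflow (toy stand-ins for real work).
tasks = {
    "load_data": lambda up: [1.0, 2.0, 3.0, 4.0],
    "build_features": lambda up: [x * 2 for x in up["load_data"]],
    # The "model" here is just the mean of the features.
    "train_model": lambda up: sum(up["build_features"]) / len(up["build_features"]),
    "evaluate": lambda up: abs(up["train_model"] - 5.0) < 1e-9,
}
deps = {
    "build_features": ["load_data"],
    "train_model": ["build_features"],
    "evaluate": ["train_model"],
}

order, results = run_pipeline(tasks, deps)
print(order)
```

A real orchestrator adds what this sketch leaves out: scheduling, retries, persisted state, and running steps on separate workers. A cycle in deps makes TopologicalSorter raise graphlib.CycleError, which is the "acyclic" part of DAG.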