ML Ops Platform at Cloudflare
Source: https://blog.cloudflare.com/mlops/
Interesting notes
- Cloudflare's ML-driven products
- A constantly evolving ML model drives the WAF attack score, which helps protect Cloudflare's customers from malicious payloads.
- Another evolving model powers the bot management product, which catches and prevents bot attacks on Cloudflare's customers.
- To make notebooks scalable and open to collaboration, Cloudflare deploys JupyterHub on Kubernetes
- JupyterHub https://jupyter.org/hub
- Zero to JupyterHub with Kubernetes: a guide and Helm chart for deploying JupyterHub on Kubernetes https://z2jh.jupyter.org/en/stable/
- ML Ops
- nbdev - a Python package to improve the notebook experience https://nbdev.fast.ai/
- Kubeflow - the Kubernetes-native CNCF project for machine learning https://www.kubeflow.org/docs/components/notebooks/overview/
- deployKF - ML platforms on any Kubernetes cluster, with centralized configs, in-place upgrades, and support for leading ML and data tools like Kubeflow, Airflow, and MLflow https://www.deploykf.org/
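nbdev's core idea is that notebook cells marked with the `#| export` directive become an importable Python module (the real tool does this with its `nbdev_export` command). A minimal stdlib sketch of that idea, assuming plain .ipynb JSON and ignoring nbdev's other directives such as `#| default_exp`; `export_cells` and `export_notebook` are hypothetical helpers, not nbdev's API:

```python
import json
from pathlib import Path

# nbdev marks cells for export with this directive; this sketch only handles
# the exact "#| export" form on the first line of a cell (an assumption).
EXPORT_DIRECTIVE = "#| export"


def export_cells(notebook: dict) -> str:
    """Hypothetical helper: join the source of exported code cells into one module."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # .ipynb files store cell source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        lines = source.splitlines()
        if lines and lines[0].strip() == EXPORT_DIRECTIVE:
            # drop the directive line and keep the rest of the cell
            chunks.append("\n".join(lines[1:]).strip())
    return "\n\n\n".join(chunks) + "\n" if chunks else ""


def export_notebook(nb_path: Path, module_path: Path) -> None:
    """Hypothetical helper: read an .ipynb file and write its exported cells to a .py file."""
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    module_path.write_text(export_cells(notebook), encoding="utf-8")


if __name__ == "__main__":
    demo = {
        "cells": [
            {"cell_type": "markdown", "source": "# Scoring helpers"},
            {"cell_type": "code", "source": ["#| export\n", "def add(a, b):\n", "    return a + b\n"]},
            {"cell_type": "code", "source": "add(1, 2)  # exploration only, not exported"},
        ]
    }
    print(export_cells(demo))
```

Exploration cells stay in the notebook while tagged cells become library code, which is what lets notebooks double as versioned, reviewable source.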
- GitOps
- ArgoCD and model templates
- Training Template
- Batch Inference Template
- Stream Inference Template
- Each orchestration can be configured to use Airflow or Argo Workflows
- Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes https://argo-cd.readthedocs.io/en/stable/
- Helm charts and Kustomize are both supported
- ArgoCD intro video https://www.youtube.com/watch?t=1m4s&v=aWDIQMbp1cc&feature=youtu.be
- Argo CD architecture https://argo-cd.readthedocs.io/en/stable/operator-manual/architecture/
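One way to picture the model templates is as Argo CD `Application` resources that point at a template directory in Git, with Argo CD syncing the cluster to whatever that directory declares. A sketch assuming each template is a Helm chart under a hypothetical `templates/<name>` path in a hypothetical GitOps repo, and that the chart reads a hypothetical `model.name` value; the manifest fields (`source`, `destination`, `syncPolicy.automated`) are standard Argo CD `Application` fields, but how Cloudflare actually wires its templates is not described in these notes:

```python
import json

# Hypothetical layout: one Helm chart per template type in a GitOps repository.
TEMPLATE_PATHS = {
    "training": "templates/training",
    "batch-inference": "templates/batch-inference",
    "stream-inference": "templates/stream-inference",
}


def make_application(model: str, template: str, repo_url: str, revision: str = "main") -> dict:
    """Build an Argo CD Application manifest (as a dict) for one model and template."""
    if template not in TEMPLATE_PATHS:
        raise ValueError(f"unknown template {template!r}; expected one of {sorted(TEMPLATE_PATHS)}")
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{model}-{template}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,
                "path": TEMPLATE_PATHS[template],
                # "model.name" is a hypothetical value the chart would read
                "helm": {"parameters": [{"name": "model.name", "value": model}]},
            },
            "destination": {
                "server": "https://kubernetes.default.svc",  # the cluster Argo CD runs in
                "namespace": f"ml-{model}",  # hypothetical per-model namespace
            },
            # automated sync: prune resources removed from Git and revert manual drift
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    app = make_application("bot-score", "training", "https://git.example.com/ml/gitops.git")
    print(json.dumps(app, indent=2))
```

Since JSON is valid YAML, the printed manifest is something `kubectl apply -f` accepts; in a GitOps setup the file would be committed to Git and picked up by Argo CD rather than applied by hand.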
- Orchestration
- Apache Airflow - The Standard DAG Composer, run any data or machine learning workflow
- Argo Workflows - Kubernetes-native Brilliance, YAML-based workflow definition
- Kubeflow Pipelines - A Platform for Workflows, tailored for orchestrating machine learning workflows
- Temporal - The Stateful Workflow Enabler, ability to manage complex, stateful workflows, providing a durable and fault-tolerant orchestration solution
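Airflow, Argo Workflows, and Kubeflow Pipelines all describe a pipeline as a DAG of tasks and start each task once its upstream tasks finish. A stdlib sketch of that scheduling loop using `graphlib.TopologicalSorter` (Python 3.9+); the task names are hypothetical and this is not any orchestrator's API:

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Set


def make_task(name: str, log: List[str]) -> Callable[[], None]:
    """Hypothetical task: a real one would extract data, train a model, etc."""
    def run() -> None:
        log.append(name)
    return run


def run_dag(tasks: Dict[str, Callable[[], None]], deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Run tasks in dependency order; deps maps a task to the tasks it waits on.

    Returns the "waves" of tasks that became ready together.
    """
    referenced = set(deps) | {d for ds in deps.values() for d in ds}
    unknown = referenced - set(tasks)
    if unknown:
        raise ValueError(f"dependencies on undefined tasks: {sorted(unknown)}")
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    try:
        sorter.prepare()
    except CycleError as err:
        # args[1] holds the detected cycle
        raise ValueError(f"pipeline is not a DAG: {err.args[1]}") from err
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every upstream task has finished
        for name in ready:
            tasks[name]()  # an orchestrator would dispatch these in parallel
            sorter.done(name)
        waves.append(ready)
    return waves


if __name__ == "__main__":
    log: List[str] = []
    names = ["extract", "validate", "featurize", "train", "evaluate", "register"]
    tasks = {n: make_task(n, log) for n in names}
    deps = {
        "validate": {"extract"},
        "featurize": {"extract"},
        "train": {"validate", "featurize"},
        "evaluate": {"train"},
        "register": {"evaluate"},
    }
    print(run_dag(tasks, deps))
    # → [['extract'], ['featurize', 'validate'], ['train'], ['evaluate'], ['register']]
```

Each inner list is a group of tasks with no pending dependencies, which is where a real orchestrator fans work out across workers; here they simply run one after another.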
- Adoption
- issue: each of the teams started their own machine learning solutions separately
- goal: help streamline and standardize the ML processes
- what was provided: shared components such as notebooks, orchestration, data versioning (DVC), feature engineering (Feast), and model versioning (MLflow), which allow teams to collaborate directly
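DVC versions data by content hash: the data file goes into a cache keyed by its MD5, and only a small pointer file containing that hash is committed to Git. A stdlib sketch of the idea; `snapshot`, `restore`, and the two-character cache layout are assumptions loosely modeled on DVC, not its actual API or on-disk format:

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets are never fully loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, version: str) -> Path:
    # Hypothetical layout loosely modeled on DVC: first two hex chars as a subdirectory
    return cache_dir / version[:2] / version[2:]


def snapshot(data_file: Path, cache_dir: Path) -> str:
    """Copy data_file into a content-addressed cache and return its version id."""
    version = file_md5(data_file)
    target = cache_path(cache_dir, version)
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, target)
    return version


def restore(version: str, cache_dir: Path, dest: Path) -> None:
    """Write the exact bytes of an earlier dataset version to dest."""
    shutil.copy2(cache_path(cache_dir, version), dest)
```

Storing the returned version id next to the training code (for example in a small pointer file in Git) ties each model run to the exact bytes it was trained on, which is what lets teams reproduce each other's results.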
- Looking forward
- vision: future collaborations with other companies such as Meta, for example making Llama 2 globally available on Cloudflare's platform
- press release https://www.cloudflare.com/en-gb/press-releases/2023/cloudflare-and-meta-collaborate-to-make-llama-2-available-globally/