Private MLOps Platform

Цей контент ще не доступний вашою мовою.

Private MLOps Platform

Operating machine learning infrastructure on bare metal requires replacing managed cloud services with self-hosted, scalable equivalents. A production-grade MLOps platform on Kubernetes is not a single monolithic application; it is a loosely coupled ecosystem of stateful data stores, stateless API servers, and workflow orchestrators.

Why This Module Matters

In early 2024, a major retail analytics firm named Zephyr Analytics suffered a catastrophic financial loss exceeding $5.2 million over a single holiday weekend. Their cloud-managed ML pipelines, relying on a fully managed feature store, silently failed due to an unannounced and unpinned API deprecation pushed by their cloud vendor. The models began serving predictions using stale, fallback data, leading to massive mispricing of inventory across their entire European market. The incident was not caught by standard Application Performance Monitoring tools because the managed service endpoints returned HTTP 200 OK statuses, completely masking the underlying data drift and schema resolution failures.

Following this disaster, the engineering organization made a strategic pivot: they migrated their entire predictive infrastructure to a bare-metal Kubernetes v1.35 environment. By utilizing open-source tools like MLflow, Feast, and KServe, they regained absolute control over their data plane and model execution environments. This transition required significant upfront engineering to replace managed components, but it eliminated vendor lock-in, reduced their monthly compute bill by over sixty percent, and most importantly, provided deterministic control over every layer of their MLOps stack.

When you build a private MLOps platform, you are taking responsibility for the statefulness, the networking, and the lifecycle of the models. You are no longer renting an abstraction; you are operating the engine. This module equips you with the knowledge to architect, deploy, secure, and troubleshoot that engine from the ground up, ensuring you never fall victim to black-box vendor failures.

Learning Outcomes

Architect a modular MLOps stack on bare-metal Kubernetes utilizing self-hosted components.
Deploy and configure MLflow with a highly available PostgreSQL backend and MinIO artifact store.
Implement Feast as a bare-metal feature store utilizing Redis for online serving and PostgreSQL for offline batching.
Design model serving pipelines using KServe for A/B testing and canary rollouts.
Diagnose common stateful ML component failures, including artifact sync errors and inference memory exhaustion.
Evaluate orchestration tools like Kubeflow Pipelines and Argo Workflows for complex ML lifecycle management.

Architecting the Private MLOps Stack

A functional MLOps platform handles four distinct lifecycles: Data, Experimentation, Orchestration, and Serving. On bare metal, you must provision the underlying storage (block, file, and object) and routing infrastructure that managed services typically abstract away. Without a solid foundation, the higher-level machine learning tools will suffer from extreme latency and data corruption.

Cloud to Bare Metal Mapping

When moving away from managed environments, engineers must map cloud primitives to their open-source equivalents. Teams migrating from comprehensive platforms like AWS SageMaker or Google Cloud’s BigQuery must fundamentally rethink their data pipelines to utilize open-source bare-metal equivalents.

Managed Cloud Service (AWS/GCP)	Bare Metal Kubernetes Equivalent	Primary Storage Backend
S3 / GCS	MinIO / Ceph Object Gateway	Physical NVMe / HDD
SageMaker Experiments / Vertex ML Metadata	MLflow Tracking Server	PostgreSQL + MinIO
Vertex Feature Store / SageMaker Feature Store	Feast	Redis (Online) + Postgres (Offline)
SageMaker Pipelines / Vertex Pipelines	Kubeflow Pipelines (KFP) / Argo	MinIO (Artifacts) + MySQL
SageMaker Endpoints / Vertex Prediction	KServe / Seldon Core	Knative / Istio

To visualize this transition logically, consider the flow mapping from proprietary to open ecosystems:

flowchart LR
    subgraph Proprietary Cloud Services
        S3[S3 / GCS]
        Sagemaker[SageMaker Experiments]
        VertexFS[Vertex Feature Store]
        SagemakerPipelines[SageMaker Pipelines]
        SagemakerEndpoints[SageMaker Endpoints]
    end
    subgraph Bare Metal Equivalents
        MinIO[MinIO / Ceph]
        MLflow[MLflow Tracking Server]
        Feast[Feast]
        KFP[Kubeflow Pipelines / Argo]
        KServe[KServe / Seldon Core]
    end
    S3 -.->|Maps to| MinIO
    Sagemaker -.->|Maps to| MLflow
    VertexFS -.->|Maps to| Feast
    SagemakerPipelines -.->|Maps to| KFP
    SagemakerEndpoints -.->|Maps to| KServe

System Architecture

The following diagram illustrates how these open-source components interact within a Kubernetes cluster to form a cohesive, production-ready platform.

flowchart TD
    subgraph Data Layer
        DS[Object Storage / MinIO]
        DB[(PostgreSQL)]
        Cache[(Redis)]
    end

    subgraph Feature Engineering
        Feast[Feast Registry]
        Feast -->|Materialize| Cache
        Feast -->|Historical| DB
    end

    subgraph Experimentation
        MLflow[MLflow Tracking Server]
        MLflow -->|Metadata| DB
        MLflow -->|Models/Artifacts| DS
    end

    subgraph Orchestration
        Argo[Argo Workflows / KFP]
        Argo -->|Fetch Data| Feast
        Argo -->|Log Metrics| MLflow
    end

    subgraph Serving
        KServe[KServe Inference]
        Istio[Istio Ingress]
        Istio --> KServe
        KServe -->|Pull Model| DS
    end

Pause and predict: Looking at the architecture diagram, what happens to the Serving layer if the MLflow Tracking Server goes offline? Predict the impact on real-time inference. Answer: Nothing. The Serving layer (KServe) pulls models directly from Object Storage (MinIO). MLflow is used for experimentation and metadata tracking, not for serving live traffic. This decoupling is a critical design principle in robust MLOps platforms.

Data Versioning on Bare Metal

Machine learning models are functions of the code and the data they are trained on. Versioning data on bare metal requires an object storage backend and a tracking layer. For bare metal, MinIO is the standard choice. The MinIO server is licensed under GNU AGPLv3, while its commercial enterprise offering is called AIStor.

DVC (Data Version Control)

DVC operates directly on top of Git. It tracks large datasets by storing metadata pointers (.dvc files) in Git while pushing the actual payload to MinIO. The latest stable version of DVC is 3.67.1, released under the Apache 2.0 license.

Pros: Requires zero additional infrastructure beyond your Git server and an S3-compatible endpoint. It leverages the developer’s existing Git workflow seamlessly.
Cons: Client-side heavy. Engineers must configure their local environment with S3 credentials, which can become an operational burden as the team scales across many projects.

LakeFS

LakeFS provides Git-like operations (branch, commit, merge, revert) directly over object storage via an API proxy.

Pros: Server-side implementation. Zero-copy branching (branching a ten-terabyte dataset takes milliseconds and consumes no extra storage).
Cons: Requires running a dedicated LakeFS PostgreSQL database and API server on the cluster. Applications must use the LakeFS S3 gateway endpoint instead of the direct MinIO endpoint.

For massive relational data warehousing on-prem, teams often rely on Greenplum or distributed PostgreSQL extensions, treating them as equivalent to cloud-native BigQuery.

Feature Stores: Feast on Kubernetes

A feature store solves a fundamental problem: bridging the gap between historical data (used for batch training) and real-time data (used for millisecond-latency inference serving). It ensures that the data features used for training exactly match the features used for serving, preventing training-serving skew.

Feast is a prominent open-source choice. Currently at v0.62.0, Feast is licensed under Apache 2.0. Notably, Feast is not a CNCF project, operating independently as an open standard for feature engineering.

Feast relies on two storage tiers:

Offline Store: Used for batch training. On bare metal, this is typically Apache Parquet files stored in MinIO or tables in a centralized PostgreSQL instance. It stores the historical data required to generate training datasets.
Online Store: Used for low-latency inference lookups. On bare metal, this is exclusively Redis. It serves only the latest real-time data points needed by live models.

Feast Configuration (`feature_store.yaml`)

To run Feast on bare metal, you must abandon cloud-native integrations and point the registry and stores to internal cluster endpoints via your feature_store.yaml file.

project: on_prem_mlops
registry:
  registry_type: sql
  path: postgresql://feast-user:secret@feast-postgres.mlops.svc.cluster.local:5432/feast_registry
provider: local
offline_store:
  type: file
online_store:
  type: redis
  connection_string: feast-redis-master.mlops.svc.cluster.local:6379

When deploying Feast materialization jobs, use Kubernetes CronJobs that execute feast materialize rather than relying on external workflow orchestrators. This keeps the data movement logic tightly bound to the feature definitions.

Experiment Tracking: MLflow Architecture

MLflow is hosted under the LF AI & Data Foundation (it is not a CNCF project) and is Apache 2.0 licensed. The current stable release is in major version 3 (v3.11.1), moving well past the legacy v2 architecture. MLflow officially supports Kubernetes as a backend for running MLflow Projects, allowing it to build Docker images and submit Kubernetes Jobs seamlessly without external orchestrators.

MLflow requires a carefully architected deployment to prevent data loss and ensure high availability. By default, running mlflow server writes data to the local container filesystem, which is immediately lost upon pod termination.

A production MLflow deployment requires three distinct components:

Tracking Server: A stateless Python API server handling HTTP requests from clients.
Backend Store: An external SQL database storing parameters, metrics, and run metadata.
Artifact Store: An S3-compatible bucket storing massive serialized models and deep learning checkpoints.

Critical Environment Variables

When running MLflow on Kubernetes with MinIO, the tracking server and client pods must be configured to communicate with the S3 API directly. You must inject these variables securely.

env:
  - name: MLFLOW_S3_ENDPOINT_URL
    value: "http://minio.storage.svc.cluster.local:9000"
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: minio-credentials
        key: access-key
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: minio-credentials
        key: secret-key

Stop and think: Why do we inject AWS credentials when communicating with a local MinIO instance? MinIO implements the AWS S3 API protocol exactly, so standard AWS SDKs (like boto3) require these standard environment variables to authenticate, even if the endpoint is a local Kubernetes service running down the hall.

Orchestration & Pipelines

Orchestrating machine learning workflows requires handling Directed Acyclic Graphs (DAGs) of tasks securely and efficiently. Data pipelines are complex and need dedicated orchestrators.

Kubeflow & KFP

Kubeflow is a CNCF Incubating project (accepted July 2023, not yet Graduated), with its latest stable release at v1.10.0. Kubeflow Pipelines (KFP) SDK v1 is frozen at v1.8.22; SDK v2 (v2.16.0) is the only actively developed version. Crucially, the KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML format, moving away from the Argo Workflow YAML dependency of v1.

For model training, Kubeflow Trainer v2.2 supports PyTorch, JAX, XGBoost, MPI, and Flux distributed training under a single unified TrainJob CRD. Hyperparameter optimization is handled by Katib (v0.19.0), which supports algorithms including grid search, random search, Bayesian optimization, Hyperband, TPE, multivariate-TPE, CMA-ES, Sobol, and Population Based Training (PBT).

Argo & Tekton

Argo Workflows maintains both a v4.x branch (v4.0.4) and a v3.x LTS branch (v3.7.13) simultaneously, and the Argo project as a whole is a CNCF Graduated project. Alternatively, Tekton Pipelines (latest v1.11.0) is a CNCF Incubating project as of March 2026, having moved from the Continuous Delivery Foundation.

KubeRay

For heavy distributed computing, KubeRay (v1.6.0) is utilized. KubeRay is not a CNCF project; it is maintained under the Ray project ecosystem originating at Anyscale.

Model Serving: KServe & Triton

KServe provides a Kubernetes Custom Resource Definition (CRD) for serving ML models. It handles autoscaling, networking, health checking, and server configuration across multiple frameworks. It is a CNCF Incubating project (accepted September 2025). The latest stable release is v0.17.0.

While historically widely referred to as KFServing in community lore, canonical documentation currently refers to it exclusively as KServe. The KServe InferenceService API version is serving.kserve.io/v1beta1; it has not yet graduated to a v1 stable release. You will often see modelserving/v1beta1 in legacy deployment descriptors.

Knative Serving (latest v1.21.2) is optional for KServe; it is only required for the serverless (scale-to-zero) deployment mode. Standard deployments can run without Knative if autoscaling to zero is not required or desired.

Supported Runtimes

KServe built-in runtimes include TensorFlow Serving, NVIDIA Triton, Hugging Face Server, LightGBM, XGBoost, SKLearn, MLflow, and OpenVINO Model Server. TorchServe is not a built-in runtime; PyTorch models are served via Triton. NVIDIA Triton Inference Server (v2.67.0, NGC container release 26.03) is particularly powerful, supporting TensorRT, PyTorch (TorchScript), TensorFlow, ONNX Runtime, OpenVINO, Python, RAPIDS FIL, and vLLM backends.

As an alternative to KServe, Seldon Core v2 uses the Business Source License (BSL), not Apache 2.0, which may impact your deployment compliance. BentoML (v1.4.38) is another alternative, commonly understood to be Apache 2.0 licensed, though you must always verify repository licenses in enterprise contexts.

A/B Testing and Canary Rollouts

KServe supports native traffic splitting using Knative’s routing capabilities. To route traffic securely from the edge, map your Istio VirtualService to the Knative local gateway, ensuring the Host header matches the KServe InferenceService URL.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detection
  namespace: mlops
spec:
  predictor:
    canaryTrafficPercent: 20
    model:
      modelFormat:
        name: xgboost
      storageUri: s3://models/fraud-detection/v2
---
# The previous version remains defined or defaults to the rest of the traffic

Monitoring & Governance

Monitoring ML platforms requires robust policy engines. OPA (Open Policy Agent) is a CNCF Graduated project, and its OPA Gatekeeper (v3.22.0) is heavily used to enforce resource limits on ML workloads. Prometheus, another CNCF Graduated project, scrapes metrics from all components.

For specialized ML monitoring, tools like Evidently AI (v0.7.21, Apache 2.0) and ZenML (0.94.2, Apache 2.0) offer drift detection and pipeline management. If you prefer managed platforms, Weights & Biases (wandb) provides an MIT-licensed Python SDK, but the W&B platform itself is a commercial SaaS with no open-source self-hosted server edition.

Did You Know?

Argo Workflows graduated from the CNCF on December 6, 2022, cementing its status in cloud-native orchestration.
MLflow v3.11.1 represents a major milestone in experiment tracking, officially supporting Kubernetes as a native backend for MLflow Projects.
Kubeflow was accepted into the CNCF Incubator on July 25, 2023, transitioning away from standard Google governance.
NVIDIA Triton v2.67.0 (NGC container release 26.03) integrates directly with the vLLM backend, offering massive throughput improvements for LLM serving.

Practitioner Gotchas and Common Mistakes

Mistake	Why It Happens	How to Fix It
MinIO Signature Version V4 Mismatch	Older machine learning libraries or older versions of the `boto3` SDK default to S3 Signature Version 2, but MinIO requires Version 4.	Explicitly set `S3_SIGNATURE_VERSION=s3v4` in the container environment variables.
Feast Redis OOM Kills	Batch materialization from PostgreSQL pushes data faster than Redis can handle, causing memory exhaustion and kubelet termination.	Set a hard `memory.limit` in the Redis StatefulSet and configure `maxmemory-policy allkeys-lru`.
KServe Cold Start Latencies	Knative scales model pods to zero. Loading a multi-gigabyte neural network into GPU memory causes severe HTTP timeouts.	Add the annotation `serving.knative.dev/minScale: "1"` to the InferenceService.
MLflow DB Connection Exhaustion	Hundreds of concurrent tuning workers attempt to log metrics to PostgreSQL without a connection pooler.	Deploy PgBouncer in front of PostgreSQL and route the `backend-store-uri` through it.
Missing S3 Endpoints	Client pods assume public AWS because `MLFLOW_S3_ENDPOINT_URL` is completely absent from their environment definition.	Inject the MinIO endpoint URL into every training pod’s environment variables.
Mixing KFP SDKs	Engineers attempt to compile KFP v2 Python code directly into Argo Workflow YAML manifests.	Use the KFP v2 compiler to generate IR YAML, as Argo YAML generation is deprecated.
Boto3 Silent Hangs	Pods without correct MinIO routing attempt to reach public AWS and hang silently due to high default timeout limits.	Set `AWS_METADATA_SERVICE_TIMEOUT=1` to force early failures and surface the error.

Hands-on Exercise: Deploy MLflow locally

In this exercise, you will deploy a production-ready MLflow stack backed by PostgreSQL and MinIO, and log a test model. This simulates establishing the experimentation layer of a private platform.

Prerequisites

A running Kubernetes v1.35 cluster (kind or k3s).
kubectl and helm installed locally.
A default StorageClass configured.

Task 1: Deploy MinIO (Artifact Store)

Use Helm to deploy a single-node object store.

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

helm install minio bitnami/minio \
  --namespace mlops --create-namespace \
  --set auth.rootUser=admin \
  --set auth.rootPassword=supersecret \
  --set defaultBuckets=mlflow-artifacts

Wait for the MinIO pod to become ready:

kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=minio -n mlops --timeout=90s

Task 2: Deploy PostgreSQL (Backend Store)

Deploy the relational backend for tracking metrics.

helm install mlflow-db bitnami/postgresql \
  --namespace mlops \
  --set global.postgresql.auth.postgresPassword=postgres \
  --set global.postgresql.auth.database=mlflow

Wait for the PostgreSQL pod to become ready:

kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=postgresql -n mlops --timeout=90s

Task 3: Deploy the MLflow Tracking Server

Create a file named mlflow-deployment.yaml. This manifest builds the stateless tracking server, connects it to the DB, and configures the S3 endpoint.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-server
  namespace: mlops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      containers:
      - name: mlflow
        image: bitnami/mlflow:3.11.1
        command:
        - sh
        - -c
        - |
          mlflow server \
            --host 0.0.0.0 \
            --port 5000 \
            --backend-store-uri postgresql://postgres:postgres@mlflow-db-postgresql.mlops.svc.cluster.local:5432/mlflow \
            --default-artifact-root s3://mlflow-artifacts/
        ports:
        - containerPort: 5000
        env:
        - name: MLFLOW_S3_ENDPOINT_URL
          value: "http://minio.mlops.svc.cluster.local:9000"
        - name: AWS_ACCESS_KEY_ID
          value: "admin"
        - name: AWS_SECRET_ACCESS_KEY
          value: "supersecret"
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-server
  namespace: mlops
spec:
  selector:
    app: mlflow
  ports:
  - port: 5000
    targetPort: 5000

Apply the deployment:

kubectl apply -f mlflow-deployment.yaml
kubectl wait --for=condition=ready pod -l app=mlflow -n mlops --timeout=90s

Task 4: Verify Logging

Launch a temporary Python pod to act as a training job.

kubectl run mlflow-test -n mlops -i --tty --image=python:3.10-slim --rm \
  --env="MLFLOW_TRACKING_URI=http://mlflow-server.mlops.svc.cluster.local:5000" \
  --env="MLFLOW_S3_ENDPOINT_URL=http://minio.mlops.svc.cluster.local:9000" \
  --env="AWS_ACCESS_KEY_ID=admin" \
  --env="AWS_SECRET_ACCESS_KEY=supersecret" \
  -- sh

Once inside the pod, install dependencies and run a test:

# Inside the pod
pip install mlflow boto3 psycopg2-binary

python -c "
import mlflow
import os

with mlflow.start_run():
    mlflow.log_param('learning_rate', 0.01)
    mlflow.log_metric('accuracy', 0.95)

    # Create a dummy artifact
    with open('model.txt', 'w') as f:
        f.write('dummy model weights')

    mlflow.log_artifact('model.txt')
    print('Run logged successfully!')
"

Solution & Expected Output

If your environment variables are correctly injected, the Python script will successfully authenticate against the local MinIO bucket and write the metric to PostgreSQL.

Expected Output:

Run logged successfully!

Success Checklist:

MinIO bucket is accessible locally.
PostgreSQL connection resolves without OperationalError.
Artifacts sync successfully without InvalidSignature errors.

Task 5: Verify Model Artifacts and Cleanup

Exit the Python shell to terminate the temporary client pod. Then, utilize a port-forward to verify the MLflow tracking server UI in your local browser, confirming the model artifact synced correctly to MinIO.

kubectl port-forward svc/mlflow-server -n mlops 5000:5000

Navigate to http://localhost:5000. You should see your recent run logged. Click into the run to verify that model.txt is visible in the Artifacts pane.

Solution & Expected Output

The MLflow UI should load correctly, demonstrating that the frontend API server successfully queries the PostgreSQL database for metadata and retrieves the artifact byte stream directly from MinIO.

Success Checklist:

MLflow UI is accessible via localhost.
Run parameter learning_rate displays 0.01.
The model.txt artifact is visible and downloadable.

Troubleshooting the Lab

Error: psycopg2.OperationalError: could not translate host name
- Cause: The backend-store-uri string in the MLflow deployment contains a typo or PostgreSQL has not finished initializing. Verify the Service name of your PostgreSQL deployment.
Error: botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL
- Cause: MLFLOW_S3_ENDPOINT_URL is missing or the MinIO service name is incorrect. Ensure the client pod (the mlflow-test pod) has the environment variable set explicitly; it does not inherit it from the server.

Quiz

1. Your data scientists report that their training jobs successfully log parameters (like learning rate) to the MLflow UI, but fail with a connection timeout when attempting to save the final model weights. What is the most likely architectural misconfiguration?

A) The MLflow tracking server does not have write access to the PostgreSQL database.
B) The training pod lacks the MLFLOW_S3_ENDPOINT_URL environment variable, defaulting to public AWS endpoints.
C) The MLflow tracking server pod lacks the AWS_ACCESS_KEY_ID environment variable.
D) The Kubernetes cluster default StorageClass is exhausted.

Answer

**Correct Answer: B.** The client pod making the API call requires the S3 endpoint URL to push the artifact directly to MinIO. Parameters are sent via HTTP to the tracking server, which succeeds, but the artifact upload fails because the pod defaults to public AWS endpoints when `MLFLOW_S3_ENDPOINT_URL` is omitted.

2. A data engineering team proposes using a single MinIO bucket to act as both the Feast offline store for batch training and the Feast online store for real-time model serving. Why will this architectural design fail in production?

A) MinIO does not support the Apache Parquet file format required by Feast.
B) Object storage like MinIO cannot provide the millisecond latency required for the online serving of real-time features.
C) Feast requires a relational database like PostgreSQL for its online store on bare metal.
D) MinIO requires AWS credentials, which cannot be securely injected into Feast materialization jobs.

Answer

**Correct Answer: B.** Feast relies on two distinct storage tiers. While MinIO (Object Storage) is excellent for the batch offline store, the online store requires exclusively Redis on bare metal to achieve the low-latency lookups needed by live models during inference.

3. You deploy a 4GB deep learning model via KServe. The service works during the day, but the first request sent at 3:00 AM times out with an HTTP 504 Gateway Timeout. Subsequent requests succeed. What is the standard practitioner fix for this?

A) Increase the Istio Gateway timeout duration to 15 minutes.
B) Set the annotation serving.knative.dev/minScale: "1" on the InferenceService to prevent scale-to-zero.
C) Switch the underlying object storage from MinIO to Ceph.
D) Add a Redis cache sidecar to the KServe pod to keep the model in memory.

Answer

**Correct Answer: B.** Knative scales to zero by default to save resources. Loading massive deep learning models from storage into GPU memory incurs massive cold-start penalties. You must set a minimum scale of 1 (`minScale: "1"`) to ensure instant responses.

4. Two data science teams are concurrently training models on the same 100TB image dataset stored in MinIO. Team A needs to aggressively filter out blurry images, while Team B needs the unfiltered original dataset. How does LakeFS solve this conflict without requiring an additional 100TB of physical storage?

A) By utilizing Git hooks to track .dvc metadata files on the client side.
B) By providing server-side, zero-copy branching via an API gateway.
C) By converting the dataset into Parquet files and querying it via Feast.
D) By utilizing Knative Serving to scale the dataset access dynamically.

Answer

**Correct Answer: B.** LakeFS provides server-side, zero-copy branching via an API gateway. This is highly efficient for massive datasets because branching takes milliseconds and requires no additional physical storage space to maintain multiple divergent states of the data concurrently.

5. You need to perform an A/B test routing exactly 10% of real-time inference traffic to a new model version (v2) while the rest goes to v1. How is this natively achieved in a KServe deployment?

A) Deploying two separate InferenceServices and using a custom NGINX ingress controller script to split traffic based on headers.
B) Using the canaryTrafficPercent: 10 spec in the InferenceService resource.
C) Configuring MLflow to intercept requests and round-robin the traffic.
D) Scaling the v2 deployment to 1 replica and the v1 deployment to 9 replicas manually.

Answer

**Correct Answer: B.** KServe integrates deeply with Knative's native routing capabilities. You achieve this by configuring the `canaryTrafficPercent: 10` field within the InferenceService specification, offloading the traffic split to the underlying ingress gateway automatically.

6. Your platform team is deploying Kubeflow Pipelines (KFP) v2, but the security team explicitly forbids installing Argo Workflows in the cluster. How does the KFP v2 architecture allow you to proceed?

A) KFP v2 bypasses orchestrators and submits raw Pods directly to the Kubernetes API.
B) KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML, allowing alternative orchestrators to execute them.
C) KFP v2 relies exclusively on Tekton Pipelines as its default execution engine.
D) You cannot proceed; KFP v2 still has a hard dependency on Argo Workflows.

Answer

**Correct Answer: B.** The KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML format, moving away from the strict Argo Workflow YAML dependency of v1. This abstraction allows execution engines other than Argo to run the pipeline.

7. A data science team complains that their Feast materialization jobs keep failing halfway through the process. Upon inspection, you notice the Kubernetes node hosting the Feast online store is experiencing heavy swapping and intermittent kubelet unresponsiveness. What architectural flaw is causing this?

A) The Feast offline store in PostgreSQL is rejecting concurrent reads due to missing indexes.
B) The Redis pod lacks memory limits, allowing Feast batch materialization to consume all node memory and trigger an Out Of Memory (OOM) kill.
C) Knative Serving is autoscaling the Feast pods to zero during materialization.
D) The MLflow Tracking Server is rejecting the logged materialization metrics.

Answer

**Correct Answer: B.** During batch materialization, data is moved from PostgreSQL to Redis. Redis can ingest data faster than it handles eviction if unbounded. Setting a hard `memory.limit` and configuring the `maxmemory-policy allkeys-lru` prevents node-level degradation.

Next Module

Ready to dive deeper into inference optimization? Proceed to Module 9.5: Advanced GPU Scheduling and Inference Optimization, where we explore NVIDIA Multi-Process Service (MPS), Multi-Instance GPU (MIG) hardware configurations, and TensorRT engine compilation for massive throughput gains.

Private MLOps Platform

Private MLOps Platform

Why This Module Matters

Learning Outcomes

Architecting the Private MLOps Stack

Cloud to Bare Metal Mapping

System Architecture

Data Versioning on Bare Metal

DVC (Data Version Control)

LakeFS

Feature Stores: Feast on Kubernetes

Feast Configuration (feature_store.yaml)

Experiment Tracking: MLflow Architecture

Critical Environment Variables

Orchestration & Pipelines

Kubeflow & KFP

Argo & Tekton

KubeRay

Model Serving: KServe & Triton

Supported Runtimes

A/B Testing and Canary Rollouts

Monitoring & Governance

Did You Know?

Practitioner Gotchas and Common Mistakes

Hands-on Exercise: Deploy MLflow locally

Prerequisites

Task 1: Deploy MinIO (Artifact Store)

Task 2: Deploy PostgreSQL (Backend Store)

Task 3: Deploy the MLflow Tracking Server

Task 4: Verify Logging

Task 5: Verify Model Artifacts and Cleanup

Troubleshooting the Lab

Quiz

Further Reading

Next Module

Feast Configuration (`feature_store.yaml`)