Private MLOps Platform
Цей контент ще не доступний вашою мовою.
Private MLOps Platform
Section titled “Private MLOps Platform”Operating machine learning infrastructure on bare metal requires replacing managed cloud services with self-hosted, scalable equivalents. A production-grade MLOps platform on Kubernetes is not a single monolithic application; it is a loosely coupled ecosystem of stateful data stores, stateless API servers, and workflow orchestrators.
Why This Module Matters
Section titled “Why This Module Matters”In early 2024, a major retail analytics firm named Zephyr Analytics suffered a catastrophic financial loss exceeding $5.2 million over a single holiday weekend. Their cloud-managed ML pipelines, relying on a fully managed feature store, silently failed due to an unannounced and unpinned API deprecation pushed by their cloud vendor. The models began serving predictions using stale, fallback data, leading to massive mispricing of inventory across their entire European market. The incident was not caught by standard Application Performance Monitoring tools because the managed service endpoints returned HTTP 200 OK statuses, completely masking the underlying data drift and schema resolution failures.
Following this disaster, the engineering organization made a strategic pivot: they migrated their entire predictive infrastructure to a bare-metal Kubernetes v1.35 environment. By utilizing open-source tools like MLflow, Feast, and KServe, they regained absolute control over their data plane and model execution environments. This transition required significant upfront engineering to replace managed components, but it eliminated vendor lock-in, reduced their monthly compute bill by over sixty percent, and most importantly, provided deterministic control over every layer of their MLOps stack.
When you build a private MLOps platform, you are taking responsibility for the statefulness, the networking, and the lifecycle of the models. You are no longer renting an abstraction; you are operating the engine. This module equips you with the knowledge to architect, deploy, secure, and troubleshoot that engine from the ground up, ensuring you never fall victim to black-box vendor failures.
Learning Outcomes
Section titled “Learning Outcomes”- Architect a modular MLOps stack on bare-metal Kubernetes utilizing self-hosted components.
- Deploy and configure MLflow with a highly available PostgreSQL backend and MinIO artifact store.
- Implement Feast as a bare-metal feature store utilizing Redis for online serving and PostgreSQL for offline batching.
- Design model serving pipelines using KServe for A/B testing and canary rollouts.
- Diagnose common stateful ML component failures, including artifact sync errors and inference memory exhaustion.
- Evaluate orchestration tools like Kubeflow Pipelines and Argo Workflows for complex ML lifecycle management.
Architecting the Private MLOps Stack
Section titled “Architecting the Private MLOps Stack”A functional MLOps platform handles four distinct lifecycles: Data, Experimentation, Orchestration, and Serving. On bare metal, you must provision the underlying storage (block, file, and object) and routing infrastructure that managed services typically abstract away. Without a solid foundation, the higher-level machine learning tools will suffer from extreme latency and data corruption.
Cloud to Bare Metal Mapping
Section titled “Cloud to Bare Metal Mapping”When moving away from managed environments, engineers must map cloud primitives to their open-source equivalents. Teams migrating from comprehensive platforms like AWS SageMaker or Google Cloud’s BigQuery must fundamentally rethink their data pipelines to utilize open-source bare-metal equivalents.
| Managed Cloud Service (AWS/GCP) | Bare Metal Kubernetes Equivalent | Primary Storage Backend |
|---|---|---|
| S3 / GCS | MinIO / Ceph Object Gateway | Physical NVMe / HDD |
| SageMaker Experiments / Vertex ML Metadata | MLflow Tracking Server | PostgreSQL + MinIO |
| Vertex Feature Store / SageMaker Feature Store | Feast | Redis (Online) + Postgres (Offline) |
| SageMaker Pipelines / Vertex Pipelines | Kubeflow Pipelines (KFP) / Argo | MinIO (Artifacts) + MySQL |
| SageMaker Endpoints / Vertex Prediction | KServe / Seldon Core | Knative / Istio |
To visualize this transition logically, consider the flow mapping from proprietary to open ecosystems:
flowchart LR subgraph Proprietary Cloud Services S3[S3 / GCS] Sagemaker[SageMaker Experiments] VertexFS[Vertex Feature Store] SagemakerPipelines[SageMaker Pipelines] SagemakerEndpoints[SageMaker Endpoints] end subgraph Bare Metal Equivalents MinIO[MinIO / Ceph] MLflow[MLflow Tracking Server] Feast[Feast] KFP[Kubeflow Pipelines / Argo] KServe[KServe / Seldon Core] end S3 -.->|Maps to| MinIO Sagemaker -.->|Maps to| MLflow VertexFS -.->|Maps to| Feast SagemakerPipelines -.->|Maps to| KFP SagemakerEndpoints -.->|Maps to| KServeSystem Architecture
Section titled “System Architecture”The following diagram illustrates how these open-source components interact within a Kubernetes cluster to form a cohesive, production-ready platform.
flowchart TD subgraph Data Layer DS[Object Storage / MinIO] DB[(PostgreSQL)] Cache[(Redis)] end
subgraph Feature Engineering Feast[Feast Registry] Feast -->|Materialize| Cache Feast -->|Historical| DB end
subgraph Experimentation MLflow[MLflow Tracking Server] MLflow -->|Metadata| DB MLflow -->|Models/Artifacts| DS end
subgraph Orchestration Argo[Argo Workflows / KFP] Argo -->|Fetch Data| Feast Argo -->|Log Metrics| MLflow end
subgraph Serving KServe[KServe Inference] Istio[Istio Ingress] Istio --> KServe KServe -->|Pull Model| DS endPause and predict: Looking at the architecture diagram, what happens to the Serving layer if the MLflow Tracking Server goes offline? Predict the impact on real-time inference. Answer: Nothing. The Serving layer (KServe) pulls models directly from Object Storage (MinIO). MLflow is used for experimentation and metadata tracking, not for serving live traffic. This decoupling is a critical design principle in robust MLOps platforms.
Data Versioning on Bare Metal
Section titled “Data Versioning on Bare Metal”Machine learning models are functions of the code and the data they are trained on. Versioning data on bare metal requires an object storage backend and a tracking layer. For bare metal, MinIO is the standard choice. The MinIO server is licensed under GNU AGPLv3, while its commercial enterprise offering is called AIStor.
DVC (Data Version Control)
Section titled “DVC (Data Version Control)”DVC operates directly on top of Git. It tracks large datasets by storing metadata pointers (.dvc files) in Git while pushing the actual payload to MinIO. The latest stable version of DVC is 3.67.1, released under the Apache 2.0 license.
- Pros: Requires zero additional infrastructure beyond your Git server and an S3-compatible endpoint. It leverages the developer’s existing Git workflow seamlessly.
- Cons: Client-side heavy. Engineers must configure their local environment with S3 credentials, which can become an operational burden as the team scales across many projects.
LakeFS
Section titled “LakeFS”LakeFS provides Git-like operations (branch, commit, merge, revert) directly over object storage via an API proxy.
- Pros: Server-side implementation. Zero-copy branching (branching a ten-terabyte dataset takes milliseconds and consumes no extra storage).
- Cons: Requires running a dedicated LakeFS PostgreSQL database and API server on the cluster. Applications must use the LakeFS S3 gateway endpoint instead of the direct MinIO endpoint.
For massive relational data warehousing on-prem, teams often rely on Greenplum or distributed PostgreSQL extensions, treating them as equivalent to cloud-native BigQuery.
Feature Stores: Feast on Kubernetes
Section titled “Feature Stores: Feast on Kubernetes”A feature store solves a fundamental problem: bridging the gap between historical data (used for batch training) and real-time data (used for millisecond-latency inference serving). It ensures that the data features used for training exactly match the features used for serving, preventing training-serving skew.
Feast is a prominent open-source choice. Currently at v0.62.0, Feast is licensed under Apache 2.0. Notably, Feast is not a CNCF project, operating independently as an open standard for feature engineering.
Feast relies on two storage tiers:
- Offline Store: Used for batch training. On bare metal, this is typically Apache Parquet files stored in MinIO or tables in a centralized PostgreSQL instance. It stores the historical data required to generate training datasets.
- Online Store: Used for low-latency inference lookups. On bare metal, this is exclusively Redis. It serves only the latest real-time data points needed by live models.
Feast Configuration (feature_store.yaml)
Section titled “Feast Configuration (feature_store.yaml)”To run Feast on bare metal, you must abandon cloud-native integrations and point the registry and stores to internal cluster endpoints via your feature_store.yaml file.
project: on_prem_mlopsregistry: registry_type: sql path: postgresql://feast-user:secret@feast-postgres.mlops.svc.cluster.local:5432/feast_registryprovider: localoffline_store: type: fileonline_store: type: redis connection_string: feast-redis-master.mlops.svc.cluster.local:6379When deploying Feast materialization jobs, use Kubernetes CronJobs that execute feast materialize rather than relying on external workflow orchestrators. This keeps the data movement logic tightly bound to the feature definitions.
Experiment Tracking: MLflow Architecture
Section titled “Experiment Tracking: MLflow Architecture”MLflow is hosted under the LF AI & Data Foundation (it is not a CNCF project) and is Apache 2.0 licensed. The current stable release is in major version 3 (v3.11.1), moving well past the legacy v2 architecture. MLflow officially supports Kubernetes as a backend for running MLflow Projects, allowing it to build Docker images and submit Kubernetes Jobs seamlessly without external orchestrators.
MLflow requires a carefully architected deployment to prevent data loss and ensure high availability. By default, running mlflow server writes data to the local container filesystem, which is immediately lost upon pod termination.
A production MLflow deployment requires three distinct components:
- Tracking Server: A stateless Python API server handling HTTP requests from clients.
- Backend Store: An external SQL database storing parameters, metrics, and run metadata.
- Artifact Store: An S3-compatible bucket storing massive serialized models and deep learning checkpoints.
Critical Environment Variables
Section titled “Critical Environment Variables”When running MLflow on Kubernetes with MinIO, the tracking server and client pods must be configured to communicate with the S3 API directly. You must inject these variables securely.
env: - name: MLFLOW_S3_ENDPOINT_URL value: "http://minio.storage.svc.cluster.local:9000" - name: AWS_ACCESS_KEY_ID valueFrom: secretKeyRef: name: minio-credentials key: access-key - name: AWS_SECRET_ACCESS_KEY valueFrom: secretKeyRef: name: minio-credentials key: secret-keyStop and think: Why do we inject AWS credentials when communicating with a local MinIO instance? MinIO implements the AWS S3 API protocol exactly, so standard AWS SDKs (like
boto3) require these standard environment variables to authenticate, even if the endpoint is a local Kubernetes service running down the hall.
Orchestration & Pipelines
Section titled “Orchestration & Pipelines”Orchestrating machine learning workflows requires handling Directed Acyclic Graphs (DAGs) of tasks securely and efficiently. Data pipelines are complex and need dedicated orchestrators.
Kubeflow & KFP
Section titled “Kubeflow & KFP”Kubeflow is a CNCF Incubating project (accepted July 2023, not yet Graduated), with its latest stable release at v1.10.0. Kubeflow Pipelines (KFP) SDK v1 is frozen at v1.8.22; SDK v2 (v2.16.0) is the only actively developed version. Crucially, the KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML format, moving away from the Argo Workflow YAML dependency of v1.
For model training, Kubeflow Trainer v2.2 supports PyTorch, JAX, XGBoost, MPI, and Flux distributed training under a single unified TrainJob CRD. Hyperparameter optimization is handled by Katib (v0.19.0), which supports algorithms including grid search, random search, Bayesian optimization, Hyperband, TPE, multivariate-TPE, CMA-ES, Sobol, and Population Based Training (PBT).
Argo & Tekton
Section titled “Argo & Tekton”Argo Workflows maintains both a v4.x branch (v4.0.4) and a v3.x LTS branch (v3.7.13) simultaneously, and the Argo project as a whole is a CNCF Graduated project. Alternatively, Tekton Pipelines (latest v1.11.0) is a CNCF Incubating project as of March 2026, having moved from the Continuous Delivery Foundation.
KubeRay
Section titled “KubeRay”For heavy distributed computing, KubeRay (v1.6.0) is utilized. KubeRay is not a CNCF project; it is maintained under the Ray project ecosystem originating at Anyscale.
Model Serving: KServe & Triton
Section titled “Model Serving: KServe & Triton”KServe provides a Kubernetes Custom Resource Definition (CRD) for serving ML models. It handles autoscaling, networking, health checking, and server configuration across multiple frameworks. It is a CNCF Incubating project (accepted September 2025). The latest stable release is v0.17.0.
While historically widely referred to as KFServing in community lore, canonical documentation currently refers to it exclusively as KServe. The KServe InferenceService API version is serving.kserve.io/v1beta1; it has not yet graduated to a v1 stable release. You will often see modelserving/v1beta1 in legacy deployment descriptors.
Knative Serving (latest v1.21.2) is optional for KServe; it is only required for the serverless (scale-to-zero) deployment mode. Standard deployments can run without Knative if autoscaling to zero is not required or desired.
Supported Runtimes
Section titled “Supported Runtimes”KServe built-in runtimes include TensorFlow Serving, NVIDIA Triton, Hugging Face Server, LightGBM, XGBoost, SKLearn, MLflow, and OpenVINO Model Server. TorchServe is not a built-in runtime; PyTorch models are served via Triton. NVIDIA Triton Inference Server (v2.67.0, NGC container release 26.03) is particularly powerful, supporting TensorRT, PyTorch (TorchScript), TensorFlow, ONNX Runtime, OpenVINO, Python, RAPIDS FIL, and vLLM backends.
As an alternative to KServe, Seldon Core v2 uses the Business Source License (BSL), not Apache 2.0, which may impact your deployment compliance. BentoML (v1.4.38) is another alternative, commonly understood to be Apache 2.0 licensed, though you must always verify repository licenses in enterprise contexts.
A/B Testing and Canary Rollouts
Section titled “A/B Testing and Canary Rollouts”KServe supports native traffic splitting using Knative’s routing capabilities. To route traffic securely from the edge, map your Istio VirtualService to the Knative local gateway, ensuring the Host header matches the KServe InferenceService URL.
apiVersion: serving.kserve.io/v1beta1kind: InferenceServicemetadata: name: fraud-detection namespace: mlopsspec: predictor: canaryTrafficPercent: 20 model: modelFormat: name: xgboost storageUri: s3://models/fraud-detection/v2---# The previous version remains defined or defaults to the rest of the trafficMonitoring & Governance
Section titled “Monitoring & Governance”Monitoring ML platforms requires robust policy engines. OPA (Open Policy Agent) is a CNCF Graduated project, and its OPA Gatekeeper (v3.22.0) is heavily used to enforce resource limits on ML workloads. Prometheus, another CNCF Graduated project, scrapes metrics from all components.
For specialized ML monitoring, tools like Evidently AI (v0.7.21, Apache 2.0) and ZenML (0.94.2, Apache 2.0) offer drift detection and pipeline management. If you prefer managed platforms, Weights & Biases (wandb) provides an MIT-licensed Python SDK, but the W&B platform itself is a commercial SaaS with no open-source self-hosted server edition.
Did You Know?
Section titled “Did You Know?”- Argo Workflows graduated from the CNCF on December 6, 2022, cementing its status in cloud-native orchestration.
- MLflow v3.11.1 represents a major milestone in experiment tracking, officially supporting Kubernetes as a native backend for MLflow Projects.
- Kubeflow was accepted into the CNCF Incubator on July 25, 2023, transitioning away from standard Google governance.
- NVIDIA Triton v2.67.0 (NGC container release 26.03) integrates directly with the vLLM backend, offering massive throughput improvements for LLM serving.
Practitioner Gotchas and Common Mistakes
Section titled “Practitioner Gotchas and Common Mistakes”| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| MinIO Signature Version V4 Mismatch | Older machine learning libraries or older versions of the boto3 SDK default to S3 Signature Version 2, but MinIO requires Version 4. | Explicitly set S3_SIGNATURE_VERSION=s3v4 in the container environment variables. |
| Feast Redis OOM Kills | Batch materialization from PostgreSQL pushes data faster than Redis can handle, causing memory exhaustion and kubelet termination. | Set a hard memory.limit in the Redis StatefulSet and configure maxmemory-policy allkeys-lru. |
| KServe Cold Start Latencies | Knative scales model pods to zero. Loading a multi-gigabyte neural network into GPU memory causes severe HTTP timeouts. | Add the annotation serving.knative.dev/minScale: "1" to the InferenceService. |
| MLflow DB Connection Exhaustion | Hundreds of concurrent tuning workers attempt to log metrics to PostgreSQL without a connection pooler. | Deploy PgBouncer in front of PostgreSQL and route the backend-store-uri through it. |
| Missing S3 Endpoints | Client pods assume public AWS because MLFLOW_S3_ENDPOINT_URL is completely absent from their environment definition. | Inject the MinIO endpoint URL into every training pod’s environment variables. |
| Mixing KFP SDKs | Engineers attempt to compile KFP v2 Python code directly into Argo Workflow YAML manifests. | Use the KFP v2 compiler to generate IR YAML, as Argo YAML generation is deprecated. |
| Boto3 Silent Hangs | Pods without correct MinIO routing attempt to reach public AWS and hang silently due to high default timeout limits. | Set AWS_METADATA_SERVICE_TIMEOUT=1 to force early failures and surface the error. |
Hands-on Exercise: Deploy MLflow locally
Section titled “Hands-on Exercise: Deploy MLflow locally”In this exercise, you will deploy a production-ready MLflow stack backed by PostgreSQL and MinIO, and log a test model. This simulates establishing the experimentation layer of a private platform.
Prerequisites
Section titled “Prerequisites”- A running Kubernetes v1.35 cluster (
kindork3s). kubectlandhelminstalled locally.- A default StorageClass configured.
Task 1: Deploy MinIO (Artifact Store)
Section titled “Task 1: Deploy MinIO (Artifact Store)”Use Helm to deploy a single-node object store.
helm repo add bitnami https://charts.bitnami.com/bitnamihelm repo update
helm install minio bitnami/minio \ --namespace mlops --create-namespace \ --set auth.rootUser=admin \ --set auth.rootPassword=supersecret \ --set defaultBuckets=mlflow-artifactsWait for the MinIO pod to become ready:
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=minio -n mlops --timeout=90sTask 2: Deploy PostgreSQL (Backend Store)
Section titled “Task 2: Deploy PostgreSQL (Backend Store)”Deploy the relational backend for tracking metrics.
helm install mlflow-db bitnami/postgresql \ --namespace mlops \ --set global.postgresql.auth.postgresPassword=postgres \ --set global.postgresql.auth.database=mlflowWait for the PostgreSQL pod to become ready:
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=postgresql -n mlops --timeout=90sTask 3: Deploy the MLflow Tracking Server
Section titled “Task 3: Deploy the MLflow Tracking Server”Create a file named mlflow-deployment.yaml. This manifest builds the stateless tracking server, connects it to the DB, and configures the S3 endpoint.
apiVersion: apps/v1kind: Deploymentmetadata: name: mlflow-server namespace: mlopsspec: replicas: 1 selector: matchLabels: app: mlflow template: metadata: labels: app: mlflow spec: containers: - name: mlflow image: bitnami/mlflow:3.11.1 command: - sh - -c - | mlflow server \ --host 0.0.0.0 \ --port 5000 \ --backend-store-uri postgresql://postgres:postgres@mlflow-db-postgresql.mlops.svc.cluster.local:5432/mlflow \ --default-artifact-root s3://mlflow-artifacts/ ports: - containerPort: 5000 env: - name: MLFLOW_S3_ENDPOINT_URL value: "http://minio.mlops.svc.cluster.local:9000" - name: AWS_ACCESS_KEY_ID value: "admin" - name: AWS_SECRET_ACCESS_KEY value: "supersecret"---apiVersion: v1kind: Servicemetadata: name: mlflow-server namespace: mlopsspec: selector: app: mlflow ports: - port: 5000 targetPort: 5000Apply the deployment:
kubectl apply -f mlflow-deployment.yamlkubectl wait --for=condition=ready pod -l app=mlflow -n mlops --timeout=90sTask 4: Verify Logging
Section titled “Task 4: Verify Logging”Launch a temporary Python pod to act as a training job.
kubectl run mlflow-test -n mlops -i --tty --image=python:3.10-slim --rm \ --env="MLFLOW_TRACKING_URI=http://mlflow-server.mlops.svc.cluster.local:5000" \ --env="MLFLOW_S3_ENDPOINT_URL=http://minio.mlops.svc.cluster.local:9000" \ --env="AWS_ACCESS_KEY_ID=admin" \ --env="AWS_SECRET_ACCESS_KEY=supersecret" \ -- shOnce inside the pod, install dependencies and run a test:
# Inside the podpip install mlflow boto3 psycopg2-binary
python -c "import mlflowimport os
with mlflow.start_run(): mlflow.log_param('learning_rate', 0.01) mlflow.log_metric('accuracy', 0.95)
# Create a dummy artifact with open('model.txt', 'w') as f: f.write('dummy model weights')
mlflow.log_artifact('model.txt') print('Run logged successfully!')"Solution & Expected Output
If your environment variables are correctly injected, the Python script will successfully authenticate against the local MinIO bucket and write the metric to PostgreSQL.
Expected Output:
Run logged successfully!Success Checklist:
- MinIO bucket is accessible locally.
- PostgreSQL connection resolves without
OperationalError. - Artifacts sync successfully without
InvalidSignatureerrors.
Task 5: Verify Model Artifacts and Cleanup
Section titled “Task 5: Verify Model Artifacts and Cleanup”Exit the Python shell to terminate the temporary client pod. Then, utilize a port-forward to verify the MLflow tracking server UI in your local browser, confirming the model artifact synced correctly to MinIO.
kubectl port-forward svc/mlflow-server -n mlops 5000:5000Navigate to http://localhost:5000. You should see your recent run logged. Click into the run to verify that model.txt is visible in the Artifacts pane.
Solution & Expected Output
The MLflow UI should load correctly, demonstrating that the frontend API server successfully queries the PostgreSQL database for metadata and retrieves the artifact byte stream directly from MinIO.Success Checklist:
- MLflow UI is accessible via localhost.
- Run parameter
learning_ratedisplays0.01. - The
model.txtartifact is visible and downloadable.
Troubleshooting the Lab
Section titled “Troubleshooting the Lab”- Error:
psycopg2.OperationalError: could not translate host name- Cause: The
backend-store-uristring in the MLflow deployment contains a typo or PostgreSQL has not finished initializing. Verify the Service name of your PostgreSQL deployment.
- Cause: The
- Error:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL- Cause:
MLFLOW_S3_ENDPOINT_URLis missing or the MinIO service name is incorrect. Ensure the client pod (themlflow-testpod) has the environment variable set explicitly; it does not inherit it from the server.
- Cause:
1. Your data scientists report that their training jobs successfully log parameters (like learning rate) to the MLflow UI, but fail with a connection timeout when attempting to save the final model weights. What is the most likely architectural misconfiguration?
- A) The MLflow tracking server does not have write access to the PostgreSQL database.
- B) The training pod lacks the
MLFLOW_S3_ENDPOINT_URLenvironment variable, defaulting to public AWS endpoints. - C) The MLflow tracking server pod lacks the
AWS_ACCESS_KEY_IDenvironment variable. - D) The Kubernetes cluster default StorageClass is exhausted.
Answer
**Correct Answer: B.** The client pod making the API call requires the S3 endpoint URL to push the artifact directly to MinIO. Parameters are sent via HTTP to the tracking server, which succeeds, but the artifact upload fails because the pod defaults to public AWS endpoints when `MLFLOW_S3_ENDPOINT_URL` is omitted.2. A data engineering team proposes using a single MinIO bucket to act as both the Feast offline store for batch training and the Feast online store for real-time model serving. Why will this architectural design fail in production?
- A) MinIO does not support the Apache Parquet file format required by Feast.
- B) Object storage like MinIO cannot provide the millisecond latency required for the online serving of real-time features.
- C) Feast requires a relational database like PostgreSQL for its online store on bare metal.
- D) MinIO requires AWS credentials, which cannot be securely injected into Feast materialization jobs.
Answer
**Correct Answer: B.** Feast relies on two distinct storage tiers. While MinIO (Object Storage) is excellent for the batch offline store, the online store requires exclusively Redis on bare metal to achieve the low-latency lookups needed by live models during inference.3. You deploy a 4GB deep learning model via KServe. The service works during the day, but the first request sent at 3:00 AM times out with an HTTP 504 Gateway Timeout. Subsequent requests succeed. What is the standard practitioner fix for this?
- A) Increase the Istio Gateway timeout duration to 15 minutes.
- B) Set the annotation
serving.knative.dev/minScale: "1"on the InferenceService to prevent scale-to-zero. - C) Switch the underlying object storage from MinIO to Ceph.
- D) Add a Redis cache sidecar to the KServe pod to keep the model in memory.
Answer
**Correct Answer: B.** Knative scales to zero by default to save resources. Loading massive deep learning models from storage into GPU memory incurs massive cold-start penalties. You must set a minimum scale of 1 (`minScale: "1"`) to ensure instant responses.4. Two data science teams are concurrently training models on the same 100TB image dataset stored in MinIO. Team A needs to aggressively filter out blurry images, while Team B needs the unfiltered original dataset. How does LakeFS solve this conflict without requiring an additional 100TB of physical storage?
- A) By utilizing Git hooks to track
.dvcmetadata files on the client side. - B) By providing server-side, zero-copy branching via an API gateway.
- C) By converting the dataset into Parquet files and querying it via Feast.
- D) By utilizing Knative Serving to scale the dataset access dynamically.
Answer
**Correct Answer: B.** LakeFS provides server-side, zero-copy branching via an API gateway. This is highly efficient for massive datasets because branching takes milliseconds and requires no additional physical storage space to maintain multiple divergent states of the data concurrently.5. You need to perform an A/B test routing exactly 10% of real-time inference traffic to a new model version (v2) while the rest goes to v1. How is this natively achieved in a KServe deployment?
- A) Deploying two separate InferenceServices and using a custom NGINX ingress controller script to split traffic based on headers.
- B) Using the
canaryTrafficPercent: 10spec in the InferenceService resource. - C) Configuring MLflow to intercept requests and round-robin the traffic.
- D) Scaling the v2 deployment to 1 replica and the v1 deployment to 9 replicas manually.
Answer
**Correct Answer: B.** KServe integrates deeply with Knative's native routing capabilities. You achieve this by configuring the `canaryTrafficPercent: 10` field within the InferenceService specification, offloading the traffic split to the underlying ingress gateway automatically.6. Your platform team is deploying Kubeflow Pipelines (KFP) v2, but the security team explicitly forbids installing Argo Workflows in the cluster. How does the KFP v2 architecture allow you to proceed?
- A) KFP v2 bypasses orchestrators and submits raw Pods directly to the Kubernetes API.
- B) KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML, allowing alternative orchestrators to execute them.
- C) KFP v2 relies exclusively on Tekton Pipelines as its default execution engine.
- D) You cannot proceed; KFP v2 still has a hard dependency on Argo Workflows.
Answer
**Correct Answer: B.** The KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML format, moving away from the strict Argo Workflow YAML dependency of v1. This abstraction allows execution engines other than Argo to run the pipeline.7. A data science team complains that their Feast materialization jobs keep failing halfway through the process. Upon inspection, you notice the Kubernetes node hosting the Feast online store is experiencing heavy swapping and intermittent kubelet unresponsiveness. What architectural flaw is causing this?
- A) The Feast offline store in PostgreSQL is rejecting concurrent reads due to missing indexes.
- B) The Redis pod lacks memory limits, allowing Feast batch materialization to consume all node memory and trigger an Out Of Memory (OOM) kill.
- C) Knative Serving is autoscaling the Feast pods to zero during materialization.
- D) The MLflow Tracking Server is rejecting the logged materialization metrics.
Answer
**Correct Answer: B.** During batch materialization, data is moved from PostgreSQL to Redis. Redis can ingest data faster than it handles eviction if unbounded. Setting a hard `memory.limit` and configuring the `maxmemory-policy allkeys-lru` prevents node-level degradation.Further Reading
Section titled “Further Reading”- MLflow Tracking on Kubernetes
- Feast Architecture and Concepts
- KServe InferenceService Architecture
- LakeFS Object Storage Integration
Next Module
Section titled “Next Module”Ready to dive deeper into inference optimization? Proceed to Module 9.5: Advanced GPU Scheduling and Inference Optimization, where we explore NVIDIA Multi-Process Service (MPS), Multi-Instance GPU (MIG) hardware configurations, and TensorRT engine compilation for massive throughput gains.