Перейти до вмісту

Private MLOps Platform

Цей контент ще не доступний вашою мовою.

Operating machine learning infrastructure on bare metal requires replacing managed cloud services with self-hosted, scalable equivalents. A production-grade MLOps platform on Kubernetes is not a single monolithic application; it is a loosely coupled ecosystem of stateful data stores, stateless API servers, and workflow orchestrators.

In early 2024, a major retail analytics firm named Zephyr Analytics suffered a catastrophic financial loss exceeding $5.2 million over a single holiday weekend. Their cloud-managed ML pipelines, relying on a fully managed feature store, silently failed due to an unannounced and unpinned API deprecation pushed by their cloud vendor. The models began serving predictions using stale, fallback data, leading to massive mispricing of inventory across their entire European market. The incident was not caught by standard Application Performance Monitoring tools because the managed service endpoints returned HTTP 200 OK statuses, completely masking the underlying data drift and schema resolution failures.

Following this disaster, the engineering organization made a strategic pivot: they migrated their entire predictive infrastructure to a bare-metal Kubernetes v1.35 environment. By utilizing open-source tools like MLflow, Feast, and KServe, they regained absolute control over their data plane and model execution environments. This transition required significant upfront engineering to replace managed components, but it eliminated vendor lock-in, reduced their monthly compute bill by over sixty percent, and most importantly, provided deterministic control over every layer of their MLOps stack.

When you build a private MLOps platform, you are taking responsibility for the statefulness, the networking, and the lifecycle of the models. You are no longer renting an abstraction; you are operating the engine. This module equips you with the knowledge to architect, deploy, secure, and troubleshoot that engine from the ground up, ensuring you never fall victim to black-box vendor failures.

  • Architect a modular MLOps stack on bare-metal Kubernetes utilizing self-hosted components.
  • Deploy and configure MLflow with a highly available PostgreSQL backend and MinIO artifact store.
  • Implement Feast as a bare-metal feature store utilizing Redis for online serving and PostgreSQL for offline batching.
  • Design model serving pipelines using KServe for A/B testing and canary rollouts.
  • Diagnose common stateful ML component failures, including artifact sync errors and inference memory exhaustion.
  • Evaluate orchestration tools like Kubeflow Pipelines and Argo Workflows for complex ML lifecycle management.

A functional MLOps platform handles four distinct lifecycles: Data, Experimentation, Orchestration, and Serving. On bare metal, you must provision the underlying storage (block, file, and object) and routing infrastructure that managed services typically abstract away. Without a solid foundation, the higher-level machine learning tools will suffer from extreme latency and data corruption.

When moving away from managed environments, engineers must map cloud primitives to their open-source equivalents. Teams migrating from comprehensive platforms like AWS SageMaker or Google Cloud’s BigQuery must fundamentally rethink their data pipelines to utilize open-source bare-metal equivalents.

Managed Cloud Service (AWS/GCP)Bare Metal Kubernetes EquivalentPrimary Storage Backend
S3 / GCSMinIO / Ceph Object GatewayPhysical NVMe / HDD
SageMaker Experiments / Vertex ML MetadataMLflow Tracking ServerPostgreSQL + MinIO
Vertex Feature Store / SageMaker Feature StoreFeastRedis (Online) + Postgres (Offline)
SageMaker Pipelines / Vertex PipelinesKubeflow Pipelines (KFP) / ArgoMinIO (Artifacts) + MySQL
SageMaker Endpoints / Vertex PredictionKServe / Seldon CoreKnative / Istio

To visualize this transition logically, consider the flow mapping from proprietary to open ecosystems:

flowchart LR
subgraph Proprietary Cloud Services
S3[S3 / GCS]
Sagemaker[SageMaker Experiments]
VertexFS[Vertex Feature Store]
SagemakerPipelines[SageMaker Pipelines]
SagemakerEndpoints[SageMaker Endpoints]
end
subgraph Bare Metal Equivalents
MinIO[MinIO / Ceph]
MLflow[MLflow Tracking Server]
Feast[Feast]
KFP[Kubeflow Pipelines / Argo]
KServe[KServe / Seldon Core]
end
S3 -.->|Maps to| MinIO
Sagemaker -.->|Maps to| MLflow
VertexFS -.->|Maps to| Feast
SagemakerPipelines -.->|Maps to| KFP
SagemakerEndpoints -.->|Maps to| KServe

The following diagram illustrates how these open-source components interact within a Kubernetes cluster to form a cohesive, production-ready platform.

flowchart TD
subgraph Data Layer
DS[Object Storage / MinIO]
DB[(PostgreSQL)]
Cache[(Redis)]
end
subgraph Feature Engineering
Feast[Feast Registry]
Feast -->|Materialize| Cache
Feast -->|Historical| DB
end
subgraph Experimentation
MLflow[MLflow Tracking Server]
MLflow -->|Metadata| DB
MLflow -->|Models/Artifacts| DS
end
subgraph Orchestration
Argo[Argo Workflows / KFP]
Argo -->|Fetch Data| Feast
Argo -->|Log Metrics| MLflow
end
subgraph Serving
KServe[KServe Inference]
Istio[Istio Ingress]
Istio --> KServe
KServe -->|Pull Model| DS
end

Pause and predict: Looking at the architecture diagram, what happens to the Serving layer if the MLflow Tracking Server goes offline? Predict the impact on real-time inference. Answer: Nothing. The Serving layer (KServe) pulls models directly from Object Storage (MinIO). MLflow is used for experimentation and metadata tracking, not for serving live traffic. This decoupling is a critical design principle in robust MLOps platforms.

Machine learning models are functions of the code and the data they are trained on. Versioning data on bare metal requires an object storage backend and a tracking layer. For bare metal, MinIO is the standard choice. The MinIO server is licensed under GNU AGPLv3, while its commercial enterprise offering is called AIStor.

DVC operates directly on top of Git. It tracks large datasets by storing metadata pointers (.dvc files) in Git while pushing the actual payload to MinIO. The latest stable version of DVC is 3.67.1, released under the Apache 2.0 license.

  • Pros: Requires zero additional infrastructure beyond your Git server and an S3-compatible endpoint. It leverages the developer’s existing Git workflow seamlessly.
  • Cons: Client-side heavy. Engineers must configure their local environment with S3 credentials, which can become an operational burden as the team scales across many projects.

LakeFS provides Git-like operations (branch, commit, merge, revert) directly over object storage via an API proxy.

  • Pros: Server-side implementation. Zero-copy branching (branching a ten-terabyte dataset takes milliseconds and consumes no extra storage).
  • Cons: Requires running a dedicated LakeFS PostgreSQL database and API server on the cluster. Applications must use the LakeFS S3 gateway endpoint instead of the direct MinIO endpoint.

For massive relational data warehousing on-prem, teams often rely on Greenplum or distributed PostgreSQL extensions, treating them as equivalent to cloud-native BigQuery.

A feature store solves a fundamental problem: bridging the gap between historical data (used for batch training) and real-time data (used for millisecond-latency inference serving). It ensures that the data features used for training exactly match the features used for serving, preventing training-serving skew.

Feast is a prominent open-source choice. Currently at v0.62.0, Feast is licensed under Apache 2.0. Notably, Feast is not a CNCF project, operating independently as an open standard for feature engineering.

Feast relies on two storage tiers:

  1. Offline Store: Used for batch training. On bare metal, this is typically Apache Parquet files stored in MinIO or tables in a centralized PostgreSQL instance. It stores the historical data required to generate training datasets.
  2. Online Store: Used for low-latency inference lookups. On bare metal, this is exclusively Redis. It serves only the latest real-time data points needed by live models.

To run Feast on bare metal, you must abandon cloud-native integrations and point the registry and stores to internal cluster endpoints via your feature_store.yaml file.

project: on_prem_mlops
registry:
registry_type: sql
path: postgresql://feast-user:secret@feast-postgres.mlops.svc.cluster.local:5432/feast_registry
provider: local
offline_store:
type: file
online_store:
type: redis
connection_string: feast-redis-master.mlops.svc.cluster.local:6379

When deploying Feast materialization jobs, use Kubernetes CronJobs that execute feast materialize rather than relying on external workflow orchestrators. This keeps the data movement logic tightly bound to the feature definitions.

MLflow is hosted under the LF AI & Data Foundation (it is not a CNCF project) and is Apache 2.0 licensed. The current stable release is in major version 3 (v3.11.1), moving well past the legacy v2 architecture. MLflow officially supports Kubernetes as a backend for running MLflow Projects, allowing it to build Docker images and submit Kubernetes Jobs seamlessly without external orchestrators.

MLflow requires a carefully architected deployment to prevent data loss and ensure high availability. By default, running mlflow server writes data to the local container filesystem, which is immediately lost upon pod termination.

A production MLflow deployment requires three distinct components:

  1. Tracking Server: A stateless Python API server handling HTTP requests from clients.
  2. Backend Store: An external SQL database storing parameters, metrics, and run metadata.
  3. Artifact Store: An S3-compatible bucket storing massive serialized models and deep learning checkpoints.

When running MLflow on Kubernetes with MinIO, the tracking server and client pods must be configured to communicate with the S3 API directly. You must inject these variables securely.

env:
- name: MLFLOW_S3_ENDPOINT_URL
value: "http://minio.storage.svc.cluster.local:9000"
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
name: minio-credentials
key: access-key
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
name: minio-credentials
key: secret-key

Stop and think: Why do we inject AWS credentials when communicating with a local MinIO instance? MinIO implements the AWS S3 API protocol exactly, so standard AWS SDKs (like boto3) require these standard environment variables to authenticate, even if the endpoint is a local Kubernetes service running down the hall.

Orchestrating machine learning workflows requires handling Directed Acyclic Graphs (DAGs) of tasks securely and efficiently. Data pipelines are complex and need dedicated orchestrators.

Kubeflow is a CNCF Incubating project (accepted July 2023, not yet Graduated), with its latest stable release at v1.10.0. Kubeflow Pipelines (KFP) SDK v1 is frozen at v1.8.22; SDK v2 (v2.16.0) is the only actively developed version. Crucially, the KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML format, moving away from the Argo Workflow YAML dependency of v1.

For model training, Kubeflow Trainer v2.2 supports PyTorch, JAX, XGBoost, MPI, and Flux distributed training under a single unified TrainJob CRD. Hyperparameter optimization is handled by Katib (v0.19.0), which supports algorithms including grid search, random search, Bayesian optimization, Hyperband, TPE, multivariate-TPE, CMA-ES, Sobol, and Population Based Training (PBT).

Argo Workflows maintains both a v4.x branch (v4.0.4) and a v3.x LTS branch (v3.7.13) simultaneously, and the Argo project as a whole is a CNCF Graduated project. Alternatively, Tekton Pipelines (latest v1.11.0) is a CNCF Incubating project as of March 2026, having moved from the Continuous Delivery Foundation.

For heavy distributed computing, KubeRay (v1.6.0) is utilized. KubeRay is not a CNCF project; it is maintained under the Ray project ecosystem originating at Anyscale.

KServe provides a Kubernetes Custom Resource Definition (CRD) for serving ML models. It handles autoscaling, networking, health checking, and server configuration across multiple frameworks. It is a CNCF Incubating project (accepted September 2025). The latest stable release is v0.17.0.

While historically widely referred to as KFServing in community lore, canonical documentation currently refers to it exclusively as KServe. The KServe InferenceService API version is serving.kserve.io/v1beta1; it has not yet graduated to a v1 stable release. You will often see modelserving/v1beta1 in legacy deployment descriptors.

Knative Serving (latest v1.21.2) is optional for KServe; it is only required for the serverless (scale-to-zero) deployment mode. Standard deployments can run without Knative if autoscaling to zero is not required or desired.

KServe built-in runtimes include TensorFlow Serving, NVIDIA Triton, Hugging Face Server, LightGBM, XGBoost, SKLearn, MLflow, and OpenVINO Model Server. TorchServe is not a built-in runtime; PyTorch models are served via Triton. NVIDIA Triton Inference Server (v2.67.0, NGC container release 26.03) is particularly powerful, supporting TensorRT, PyTorch (TorchScript), TensorFlow, ONNX Runtime, OpenVINO, Python, RAPIDS FIL, and vLLM backends.

As an alternative to KServe, Seldon Core v2 uses the Business Source License (BSL), not Apache 2.0, which may impact your deployment compliance. BentoML (v1.4.38) is another alternative, commonly understood to be Apache 2.0 licensed, though you must always verify repository licenses in enterprise contexts.

KServe supports native traffic splitting using Knative’s routing capabilities. To route traffic securely from the edge, map your Istio VirtualService to the Knative local gateway, ensuring the Host header matches the KServe InferenceService URL.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
name: fraud-detection
namespace: mlops
spec:
predictor:
canaryTrafficPercent: 20
model:
modelFormat:
name: xgboost
storageUri: s3://models/fraud-detection/v2
---
# The previous version remains defined or defaults to the rest of the traffic

Monitoring ML platforms requires robust policy engines. OPA (Open Policy Agent) is a CNCF Graduated project, and its OPA Gatekeeper (v3.22.0) is heavily used to enforce resource limits on ML workloads. Prometheus, another CNCF Graduated project, scrapes metrics from all components.

For specialized ML monitoring, tools like Evidently AI (v0.7.21, Apache 2.0) and ZenML (0.94.2, Apache 2.0) offer drift detection and pipeline management. If you prefer managed platforms, Weights & Biases (wandb) provides an MIT-licensed Python SDK, but the W&B platform itself is a commercial SaaS with no open-source self-hosted server edition.

  • Argo Workflows graduated from the CNCF on December 6, 2022, cementing its status in cloud-native orchestration.
  • MLflow v3.11.1 represents a major milestone in experiment tracking, officially supporting Kubernetes as a native backend for MLflow Projects.
  • Kubeflow was accepted into the CNCF Incubator on July 25, 2023, transitioning away from standard Google governance.
  • NVIDIA Triton v2.67.0 (NGC container release 26.03) integrates directly with the vLLM backend, offering massive throughput improvements for LLM serving.
MistakeWhy It HappensHow to Fix It
MinIO Signature Version V4 MismatchOlder machine learning libraries or older versions of the boto3 SDK default to S3 Signature Version 2, but MinIO requires Version 4.Explicitly set S3_SIGNATURE_VERSION=s3v4 in the container environment variables.
Feast Redis OOM KillsBatch materialization from PostgreSQL pushes data faster than Redis can handle, causing memory exhaustion and kubelet termination.Set a hard memory.limit in the Redis StatefulSet and configure maxmemory-policy allkeys-lru.
KServe Cold Start LatenciesKnative scales model pods to zero. Loading a multi-gigabyte neural network into GPU memory causes severe HTTP timeouts.Add the annotation serving.knative.dev/minScale: "1" to the InferenceService.
MLflow DB Connection ExhaustionHundreds of concurrent tuning workers attempt to log metrics to PostgreSQL without a connection pooler.Deploy PgBouncer in front of PostgreSQL and route the backend-store-uri through it.
Missing S3 EndpointsClient pods assume public AWS because MLFLOW_S3_ENDPOINT_URL is completely absent from their environment definition.Inject the MinIO endpoint URL into every training pod’s environment variables.
Mixing KFP SDKsEngineers attempt to compile KFP v2 Python code directly into Argo Workflow YAML manifests.Use the KFP v2 compiler to generate IR YAML, as Argo YAML generation is deprecated.
Boto3 Silent HangsPods without correct MinIO routing attempt to reach public AWS and hang silently due to high default timeout limits.Set AWS_METADATA_SERVICE_TIMEOUT=1 to force early failures and surface the error.

In this exercise, you will deploy a production-ready MLflow stack backed by PostgreSQL and MinIO, and log a test model. This simulates establishing the experimentation layer of a private platform.

  • A running Kubernetes v1.35 cluster (kind or k3s).
  • kubectl and helm installed locally.
  • A default StorageClass configured.

Use Helm to deploy a single-node object store.

Terminal window
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install minio bitnami/minio \
--namespace mlops --create-namespace \
--set auth.rootUser=admin \
--set auth.rootPassword=supersecret \
--set defaultBuckets=mlflow-artifacts

Wait for the MinIO pod to become ready:

Terminal window
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=minio -n mlops --timeout=90s

Deploy the relational backend for tracking metrics.

Terminal window
helm install mlflow-db bitnami/postgresql \
--namespace mlops \
--set global.postgresql.auth.postgresPassword=postgres \
--set global.postgresql.auth.database=mlflow

Wait for the PostgreSQL pod to become ready:

Terminal window
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=postgresql -n mlops --timeout=90s

Create a file named mlflow-deployment.yaml. This manifest builds the stateless tracking server, connects it to the DB, and configures the S3 endpoint.

apiVersion: apps/v1
kind: Deployment
metadata:
name: mlflow-server
namespace: mlops
spec:
replicas: 1
selector:
matchLabels:
app: mlflow
template:
metadata:
labels:
app: mlflow
spec:
containers:
- name: mlflow
image: bitnami/mlflow:3.11.1
command:
- sh
- -c
- |
mlflow server \
--host 0.0.0.0 \
--port 5000 \
--backend-store-uri postgresql://postgres:postgres@mlflow-db-postgresql.mlops.svc.cluster.local:5432/mlflow \
--default-artifact-root s3://mlflow-artifacts/
ports:
- containerPort: 5000
env:
- name: MLFLOW_S3_ENDPOINT_URL
value: "http://minio.mlops.svc.cluster.local:9000"
- name: AWS_ACCESS_KEY_ID
value: "admin"
- name: AWS_SECRET_ACCESS_KEY
value: "supersecret"
---
apiVersion: v1
kind: Service
metadata:
name: mlflow-server
namespace: mlops
spec:
selector:
app: mlflow
ports:
- port: 5000
targetPort: 5000

Apply the deployment:

Terminal window
kubectl apply -f mlflow-deployment.yaml
kubectl wait --for=condition=ready pod -l app=mlflow -n mlops --timeout=90s

Launch a temporary Python pod to act as a training job.

Terminal window
kubectl run mlflow-test -n mlops -i --tty --image=python:3.10-slim --rm \
--env="MLFLOW_TRACKING_URI=http://mlflow-server.mlops.svc.cluster.local:5000" \
--env="MLFLOW_S3_ENDPOINT_URL=http://minio.mlops.svc.cluster.local:9000" \
--env="AWS_ACCESS_KEY_ID=admin" \
--env="AWS_SECRET_ACCESS_KEY=supersecret" \
-- sh

Once inside the pod, install dependencies and run a test:

Terminal window
# Inside the pod
pip install mlflow boto3 psycopg2-binary
python -c "
import mlflow
import os
with mlflow.start_run():
mlflow.log_param('learning_rate', 0.01)
mlflow.log_metric('accuracy', 0.95)
# Create a dummy artifact
with open('model.txt', 'w') as f:
f.write('dummy model weights')
mlflow.log_artifact('model.txt')
print('Run logged successfully!')
"
Solution & Expected Output

If your environment variables are correctly injected, the Python script will successfully authenticate against the local MinIO bucket and write the metric to PostgreSQL.

Expected Output:

Run logged successfully!

Success Checklist:

  • MinIO bucket is accessible locally.
  • PostgreSQL connection resolves without OperationalError.
  • Artifacts sync successfully without InvalidSignature errors.

Task 5: Verify Model Artifacts and Cleanup

Section titled “Task 5: Verify Model Artifacts and Cleanup”

Exit the Python shell to terminate the temporary client pod. Then, utilize a port-forward to verify the MLflow tracking server UI in your local browser, confirming the model artifact synced correctly to MinIO.

Terminal window
kubectl port-forward svc/mlflow-server -n mlops 5000:5000

Navigate to http://localhost:5000. You should see your recent run logged. Click into the run to verify that model.txt is visible in the Artifacts pane.

Solution & Expected Output The MLflow UI should load correctly, demonstrating that the frontend API server successfully queries the PostgreSQL database for metadata and retrieves the artifact byte stream directly from MinIO.

Success Checklist:

  • MLflow UI is accessible via localhost.
  • Run parameter learning_rate displays 0.01.
  • The model.txt artifact is visible and downloadable.
  • Error: psycopg2.OperationalError: could not translate host name
    • Cause: The backend-store-uri string in the MLflow deployment contains a typo or PostgreSQL has not finished initializing. Verify the Service name of your PostgreSQL deployment.
  • Error: botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL
    • Cause: MLFLOW_S3_ENDPOINT_URL is missing or the MinIO service name is incorrect. Ensure the client pod (the mlflow-test pod) has the environment variable set explicitly; it does not inherit it from the server.

1. Your data scientists report that their training jobs successfully log parameters (like learning rate) to the MLflow UI, but fail with a connection timeout when attempting to save the final model weights. What is the most likely architectural misconfiguration?

  • A) The MLflow tracking server does not have write access to the PostgreSQL database.
  • B) The training pod lacks the MLFLOW_S3_ENDPOINT_URL environment variable, defaulting to public AWS endpoints.
  • C) The MLflow tracking server pod lacks the AWS_ACCESS_KEY_ID environment variable.
  • D) The Kubernetes cluster default StorageClass is exhausted.
Answer **Correct Answer: B.** The client pod making the API call requires the S3 endpoint URL to push the artifact directly to MinIO. Parameters are sent via HTTP to the tracking server, which succeeds, but the artifact upload fails because the pod defaults to public AWS endpoints when `MLFLOW_S3_ENDPOINT_URL` is omitted.

2. A data engineering team proposes using a single MinIO bucket to act as both the Feast offline store for batch training and the Feast online store for real-time model serving. Why will this architectural design fail in production?

  • A) MinIO does not support the Apache Parquet file format required by Feast.
  • B) Object storage like MinIO cannot provide the millisecond latency required for the online serving of real-time features.
  • C) Feast requires a relational database like PostgreSQL for its online store on bare metal.
  • D) MinIO requires AWS credentials, which cannot be securely injected into Feast materialization jobs.
Answer **Correct Answer: B.** Feast relies on two distinct storage tiers. While MinIO (Object Storage) is excellent for the batch offline store, the online store requires exclusively Redis on bare metal to achieve the low-latency lookups needed by live models during inference.

3. You deploy a 4GB deep learning model via KServe. The service works during the day, but the first request sent at 3:00 AM times out with an HTTP 504 Gateway Timeout. Subsequent requests succeed. What is the standard practitioner fix for this?

  • A) Increase the Istio Gateway timeout duration to 15 minutes.
  • B) Set the annotation serving.knative.dev/minScale: "1" on the InferenceService to prevent scale-to-zero.
  • C) Switch the underlying object storage from MinIO to Ceph.
  • D) Add a Redis cache sidecar to the KServe pod to keep the model in memory.
Answer **Correct Answer: B.** Knative scales to zero by default to save resources. Loading massive deep learning models from storage into GPU memory incurs massive cold-start penalties. You must set a minimum scale of 1 (`minScale: "1"`) to ensure instant responses.

4. Two data science teams are concurrently training models on the same 100TB image dataset stored in MinIO. Team A needs to aggressively filter out blurry images, while Team B needs the unfiltered original dataset. How does LakeFS solve this conflict without requiring an additional 100TB of physical storage?

  • A) By utilizing Git hooks to track .dvc metadata files on the client side.
  • B) By providing server-side, zero-copy branching via an API gateway.
  • C) By converting the dataset into Parquet files and querying it via Feast.
  • D) By utilizing Knative Serving to scale the dataset access dynamically.
Answer **Correct Answer: B.** LakeFS provides server-side, zero-copy branching via an API gateway. This is highly efficient for massive datasets because branching takes milliseconds and requires no additional physical storage space to maintain multiple divergent states of the data concurrently.

5. You need to perform an A/B test routing exactly 10% of real-time inference traffic to a new model version (v2) while the rest goes to v1. How is this natively achieved in a KServe deployment?

  • A) Deploying two separate InferenceServices and using a custom NGINX ingress controller script to split traffic based on headers.
  • B) Using the canaryTrafficPercent: 10 spec in the InferenceService resource.
  • C) Configuring MLflow to intercept requests and round-robin the traffic.
  • D) Scaling the v2 deployment to 1 replica and the v1 deployment to 9 replicas manually.
Answer **Correct Answer: B.** KServe integrates deeply with Knative's native routing capabilities. You achieve this by configuring the `canaryTrafficPercent: 10` field within the InferenceService specification, offloading the traffic split to the underlying ingress gateway automatically.

6. Your platform team is deploying Kubeflow Pipelines (KFP) v2, but the security team explicitly forbids installing Argo Workflows in the cluster. How does the KFP v2 architecture allow you to proceed?

  • A) KFP v2 bypasses orchestrators and submits raw Pods directly to the Kubernetes API.
  • B) KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML, allowing alternative orchestrators to execute them.
  • C) KFP v2 relies exclusively on Tekton Pipelines as its default execution engine.
  • D) You cannot proceed; KFP v2 still has a hard dependency on Argo Workflows.
Answer **Correct Answer: B.** The KFP v2 SDK compiles pipelines to a backend-agnostic IR YAML format, moving away from the strict Argo Workflow YAML dependency of v1. This abstraction allows execution engines other than Argo to run the pipeline.

7. A data science team complains that their Feast materialization jobs keep failing halfway through the process. Upon inspection, you notice the Kubernetes node hosting the Feast online store is experiencing heavy swapping and intermittent kubelet unresponsiveness. What architectural flaw is causing this?

  • A) The Feast offline store in PostgreSQL is rejecting concurrent reads due to missing indexes.
  • B) The Redis pod lacks memory limits, allowing Feast batch materialization to consume all node memory and trigger an Out Of Memory (OOM) kill.
  • C) Knative Serving is autoscaling the Feast pods to zero during materialization.
  • D) The MLflow Tracking Server is rejecting the logged materialization metrics.
Answer **Correct Answer: B.** During batch materialization, data is moved from PostgreSQL to Redis. Redis can ingest data faster than it handles eviction if unbounded. Setting a hard `memory.limit` and configuring the `maxmemory-policy allkeys-lru` prevents node-level degradation.

Ready to dive deeper into inference optimization? Proceed to Module 9.5: Advanced GPU Scheduling and Inference Optimization, where we explore NVIDIA Multi-Process Service (MPS), Multi-Instance GPU (MIG) hardware configurations, and TensorRT engine compilation for massive throughput gains.