Module 5.2: Feature Engineering & Stores
Discipline Track | Complexity: [COMPLEX] | Time: 40-45 min
Prerequisites
Before starting this module:
- Module 5.1: MLOps Fundamentals
- Basic understanding of data transformations
- Familiarity with pandas DataFrames
- Understanding of training vs. inference
What You’ll Be Able to Do
After completing this module, you will be able to:
- Design a feature store architecture that serves both batch training and real-time inference workloads
- Implement feature pipelines using Feast or Tecton for consistent feature computation and serving
- Build feature discovery workflows that enable ML engineers to find and reuse existing features
- Evaluate feature store solutions against requirements for latency, freshness, and data consistency
Why This Module Matters
One of the most common causes of ML production failures is not a bad model; it is training/serving skew. Your model trains on features computed one way, then serves predictions using features computed differently. Same feature name, different values, wrong predictions.
Feature stores solve this by providing a single source of truth for features. Compute once, use everywhere. Netflix, Uber, and Airbnb all built feature stores after learning this lesson the hard way.
If you’re doing ML at scale without a feature store, you’re building technical debt.
Did You Know?
- Uber built Michelangelo (their ML platform) primarily to solve the feature consistency problem; they found 30% of ML debugging time was spent on feature issues
- Feature computation often takes 80% of ML pipeline time—yet gets 20% of the attention. Feature stores flip this ratio by making feature engineering reusable
- The term “feature store” was coined by Uber in 2017, but the concept existed earlier as “feature engineering platforms” at Google and Facebook
- Point-in-time correctness (avoiding data leakage) is the hardest feature store problem to solve—get it wrong and your backtesting lies to you
What is a Feature Store?
A feature store is a centralized repository for storing, sharing, and serving ML features. Think of it as a “data warehouse for ML features.”

    flowchart TB
      subgraph Without["WITHOUT FEATURE STORE"]
        direction LR
        subgraph Training["TRAINING PIPELINE"]
          direction TB
          A1[SQL Query A<br/>batch, complex] --> B1[Python Transform<br/>pandas]
          B1 --> C1[Training Data<br/>features: X]
        end
        subgraph Serving["SERVING PIPELINE"]
          direction TB
          A2[SQL Query B<br/>realtime, fast] --> B2[Java Transform<br/>custom code]
          B2 --> C2[Serving Data<br/>features: X']
        end
        A1 -.->|"Different!"| A2
        B1 -.->|"Different!"| B2
        C1 -.->|"SKEW!"| C2
      end

    flowchart TB
      subgraph With["WITH FEATURE STORE"]
        direction TB
        FS[FEATURE STORE<br/>Feature Definition<br/>Single source of truth]
        FS --> Offline[OFFLINE STORE<br/>Training<br/>Data Lake<br/>Batch queries]
        FS --> Online[ONLINE STORE<br/>Serving<br/>Redis / DynamoDB<br/>Low latency]
        Offline --> TrainData[Training Data<br/>features: X]
        Online --> ServData[Serving Data<br/>features: X]
        TrainData -.->|"SAME!"| ServData
      end

The Training/Serving Skew Problem

    # TRAINING: pandas on the full dataset
    df['avg_purchase_30d'] = df.groupby('user_id')['amount'].transform(
        lambda x: x.rolling(30).mean()
    )

    -- SERVING: custom SQL for a single user
    SELECT AVG(amount)
    FROM purchases
    WHERE user_id = ?
      AND date > NOW() - INTERVAL 30 DAY  -- Bug: different window! 30 calendar days here, 30 rows in pandas

Small differences cause big problems:
- Different date ranges
- NULL handling differences
- Timezone mismatches
- Rounding errors
Stop and think: How would you ensure that a feature calculated as a 30-day rolling average in batch (using pandas) matches the exact same logic when calculated per-user in real-time (using custom SQL or Java)? Without a unified feature store framework, you are relying entirely on manual code translation, leaving you highly vulnerable to these small discrepancies.
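One way to reduce this risk, even before adopting a feature store, is to keep a single function as the only definition of the feature and call it from both the batch path and the serving path. The sketch below is a minimal illustration using only the standard library, not how Feast or any particular feature store works internally; `Purchase`, `avg_purchase_30d`, and the sample data are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Purchase:
    user_id: int
    ts: datetime
    amount: float


def avg_purchase_30d(purchases: Iterable[Purchase], user_id: int,
                     as_of: datetime) -> Optional[float]:
    """Mean purchase amount for one user in the 30 days ending at as_of.

    Returns None when the user has no purchases in the window, so both
    callers handle the missing-value case the same way.
    """
    window_start = as_of - timedelta(days=30)
    amounts = [
        p.amount
        for p in purchases
        if p.user_id == user_id and window_start < p.ts <= as_of
    ]
    return sum(amounts) / len(amounts) if amounts else None


# Hypothetical purchase history shared by both paths.
history = [
    Purchase(1, datetime(2024, 1, 1), 50.0),
    Purchase(1, datetime(2024, 1, 5), 30.0),
    Purchase(1, datetime(2024, 1, 10), 100.0),
    Purchase(2, datetime(2023, 11, 1), 20.0),
]

# Batch path: build one training row per (user, timestamp) pair.
training_rows = [
    (uid, ts, avg_purchase_30d(history, uid, ts))
    for uid, ts in [(1, datetime(2024, 1, 15)), (2, datetime(2024, 1, 15))]
]

# Serving path: the same function answers a single-user request.
serving_value = avg_purchase_30d(history, 1, datetime(2024, 1, 15))

print(training_rows)
print(serving_value)
```

Because the window boundaries and the empty-window behavior live in one place, the date-range and NULL-handling discrepancies listed above cannot diverge between the two callers.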
War Story: The $10M Feature Bug
A financial services company deployed a credit risk model. The training pipeline computed “average balance over 90 days” correctly. The serving pipeline had a bug: it computed a 30-day average instead.
The model underestimated risk. They approved loans they shouldn’t have. Six months later: $10M in defaults traced to one feature computation bug.
A single feature definition shared by training and serving, which is what a feature store provides, would have prevented this mismatch.
Feature Store Architecture
Core Components

    flowchart TB
      subgraph Registry["FEATURE REGISTRY"]
        direction LR
        subgraph U["user_features"]
          direction TB
          U1[age<br/>tenure<br/>avg_spend]
        end
        subgraph P["product_features"]
          direction TB
          P1[price<br/>category<br/>popularity]
        end
        subgraph T["transaction_features"]
          direction TB
          T1[amount<br/>is_fraud<br/>hour_of_day]
        end
      end

      Registry --> Offline
      Registry --> Online

      subgraph Offline["OFFLINE STORE"]
        direction TB
        DL[Data Lake / Parquet<br/>• Historical data<br/>• Point-in-time<br/>• Training datasets]
      end

      subgraph Online["ONLINE STORE"]
        direction TB
        KV[Redis / DynamoDB<br/>• Latest values only<br/>• Millisecond latency<br/>• Online inference]
      end

Offline vs. Online Stores
| Aspect | Offline Store | Online Store |
|---|---|---|
| Purpose | Training data | Real-time inference |
| Latency | Seconds to minutes | Milliseconds |
| Data | Full history | Latest values |
| Storage | Data lake (S3, GCS) | Key-value (Redis, DynamoDB) |
| Query | Batch, point-in-time | Key lookup |
| Cost | Storage optimized | Compute optimized |
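The split in the table can be sketched with plain Python data structures. This is a toy model, not any product's storage layout: the offline store is an append-only history, and a hypothetical `materialize` helper copies the most recent row per entity into a dictionary that stands in for a key-value online store.

```python
from datetime import datetime

# Offline store: full history, one row per (entity, timestamp).
offline_rows = [
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 1), "total_purchases": 3},
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 8), "total_purchases": 5},
    {"user_id": 2, "event_timestamp": datetime(2024, 1, 3), "total_purchases": 1},
]


def materialize(rows, start, end):
    """Return {user_id: latest row within [start, end]} as a stand-in online store."""
    online = {}
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        if start <= row["event_timestamp"] <= end:
            online[row["user_id"]] = row  # later rows overwrite earlier ones
    return online


online_store = materialize(offline_rows, datetime(2024, 1, 1), datetime(2024, 1, 31))

# Training reads history in bulk; serving does a single key lookup.
history_for_user_1 = [r for r in offline_rows if r["user_id"] == 1]
latest_for_user_1 = online_store[1]

print(len(history_for_user_1), latest_for_user_1["total_purchases"])
```

The online side deliberately forgets everything but the newest value per key, which is what keeps lookups cheap and also why it cannot answer point-in-time training queries.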
Feature Engineering Best Practices
Feature Types

    mindmap
      root((FEATURE<br/>CATEGORIES))
        IDENTITY FEATURES
          user_id, product_id
          Static or slowly changing
          Usually joined, not computed
        NUMERICAL FEATURES
          Raw: age, price
          Transformed: log
          Normalized: z-score
        CATEGORICAL FEATURES
          One-hot
          Ordinal
          Embeddings
        TEMPORAL FEATURES
          Extracted: hour, day
          Cyclical: sin, cos
          Lagged: yesterday
        AGGREGATE FEATURES
          Rolling: avg_7d
          Cumulative: lifetime
          Relative: vs_avg

Transformation Code

    # Good feature engineering patterns
    import pandas as pd
    import numpy as np

    def create_user_features(df: pd.DataFrame) -> pd.DataFrame:
        """Create user-level features."""
        features = pd.DataFrame()
        features['user_id'] = df['user_id']

        # Numerical: log transform for skewed data
        features['log_total_spend'] = np.log1p(df['total_spend'])

        # Temporal: cyclical encoding for hour
        features['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
        features['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)

        # Aggregate: rolling window over the last 7 rows per user
        # (equals 7 days only if there is exactly one row per user per day)
        features['avg_purchase_7d'] = df.groupby('user_id')['amount'].transform(
            lambda x: x.rolling(7, min_periods=1).mean()
        )

        # Ratio features (often powerful); treat zero days_active as missing
        features['purchase_frequency'] = (
            df['num_purchases'] / df['days_active'].replace(0, np.nan)
        )

        return features

Point-in-Time Correctness
The most critical feature store capability is point-in-time correctness: ensuring you only use data that was available at prediction time.
Pause and predict: If you inadvertently use future data to train your model (e.g., calculating a user’s total spend up to today for a purchase that happened last month), what will happen to your model’s evaluation metrics during offline testing versus live production?

    timeline
      title Point-in-Time Join (Correct)
      Jan 1 : Purchase $50
      Jan 5 : Purchase $30
      Jan 10 : Purchase $100
      Jan 15 : PREDICT : Known: 3 purchases, $180 total, $60 avg
      Jan 20 : Purchase $80 : FUTURE - Do not include!

If you compute features using ALL data without enforcing a point-in-time boundary:
- avg_purchase = $65 (includes the Jan 20 transaction!)
- This introduces FUTURE INFORMATION into your training data.
- The model learns from data it won’t actually possess in production.
- Your offline backtests will look amazing, but the model will fail entirely when deployed.
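The timeline's arithmetic can be reproduced in a few lines. The sketch below uses only the standard library; `avg_purchase_as_of` is a hypothetical helper, and the year 2024 is an assumption because the timeline gives only month and day.

```python
from datetime import datetime
from typing import Optional

# Purchases from the timeline (year assumed for illustration).
purchases = [
    (datetime(2024, 1, 1), 50.0),
    (datetime(2024, 1, 5), 30.0),
    (datetime(2024, 1, 10), 100.0),
    (datetime(2024, 1, 20), 80.0),  # happens after the prediction time
]
prediction_time = datetime(2024, 1, 15)


def avg_purchase_as_of(rows, cutoff: Optional[datetime] = None) -> float:
    """Average amount over rows at or before cutoff; None means no boundary."""
    amounts = [amt for ts, amt in rows if cutoff is None or ts <= cutoff]
    return sum(amounts) / len(amounts)


correct = avg_purchase_as_of(purchases, prediction_time)  # 3 purchases, $180 total
leaky = avg_purchase_as_of(purchases)  # also counts the Jan 20 purchase

print(f"point-in-time avg: ${correct:.2f}")
print(f"leaky avg:         ${leaky:.2f}")
```

The gap between $60 and $65 is small here, but the leaky value encodes a purchase the model could not have known about on Jan 15.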
Implementing Point-in-Time Joins

    # Feast handles this automatically
    from datetime import datetime

    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=".")

    # Entity DataFrame with timestamps
    entity_df = pd.DataFrame({
        "user_id": [1, 2, 3],
        "event_timestamp": [
            datetime(2024, 1, 15),  # Use features available on Jan 15
            datetime(2024, 1, 16),
            datetime(2024, 1, 17),
        ]
    })

    # Get features as of each timestamp
    training_df = store.get_historical_features(
        entity_df=entity_df,
        features=[
            "user_features:avg_purchase_7d",
            "user_features:total_purchases",
        ],
    ).to_df()

Feature Store Tools
Feast (Open Source)
“Feature Store for Machine Learning”
Pros:
- Open source, free
- Cloud agnostic
- Kubernetes native
- Point-in-time joins
- Growing ecosystem
Cons:
- Less polished UI
- Smaller community
- Limited streaming capabilities
- Manual schema management
Best For: Teams wanting control, K8s environments
Feature Store Comparison
| Feature Store | Type | Strengths | Best For |
|---|---|---|---|
| Feast | Open source | Flexible, K8s native | Self-hosted, multi-cloud |
| Tecton | Commercial | Streaming, enterprise | Real-time ML at scale |
| Hopsworks | Open core | ML platform integration | End-to-end ML |
| Databricks | Commercial | Spark integration | Databricks users |
| SageMaker | AWS | AWS integration | AWS-native teams |
| Vertex AI | GCP | GCP integration | GCP-native teams |
Feast Deep Dive
Project Structure

    feast-project/
    ├── feature_repo/
    │   ├── feature_store.yaml    # Configuration
    │   ├── entities.py           # Entity definitions
    │   ├── features.py           # Feature views
    │   └── data_sources.py       # Data source definitions
    ├── data/
    │   └── user_features.parquet
    └── requirements.txt

Configuration

    project: my_project
    registry: data/registry.db
    provider: local
    online_store:
      type: sqlite
      path: data/online_store.db
    offline_store:
      type: file
    entity_key_serialization_version: 2

Defining Features

    # entities.py
    from feast import Entity

    user = Entity(
        name="user_id",
        description="Unique user identifier",
    )

    product = Entity(
        name="product_id",
        description="Unique product identifier",
    )

    # data_sources.py
    from feast import FileSource

    user_stats_source = FileSource(
        name="user_stats",
        path="data/user_stats.parquet",
        timestamp_field="event_timestamp",
    )

    # features.py
    from datetime import timedelta

    from feast import FeatureView, Field
    from feast.types import Float32, Int64

    from entities import user
    from data_sources import user_stats_source

    user_features = FeatureView(
        name="user_features",
        entities=[user],
        ttl=timedelta(days=1),
        schema=[
            Field(name="total_purchases", dtype=Int64),
            Field(name="avg_purchase_amount", dtype=Float32),
            Field(name="days_since_last_purchase", dtype=Int64),
        ],
        source=user_stats_source,
    )

Using Feast

    from feast import FeatureStore
    import pandas as pd
    from datetime import datetime

    # Initialize
    store = FeatureStore(repo_path="feature_repo/")

    # Apply feature definitions
    # Run: feast apply

    # Materialize features to online store
    # Run: feast materialize 2024-01-01 2024-01-31

    # Get training data (offline)
    entity_df = pd.DataFrame({
        "user_id": [1, 2, 3],
        "event_timestamp": [datetime.now()] * 3,
    })

    training_df = store.get_historical_features(
        entity_df=entity_df,
        features=["user_features:total_purchases", "user_features:avg_purchase_amount"],
    ).to_df()

    # Get online features (serving)
    online_features = store.get_online_features(
        features=["user_features:total_purchases", "user_features:avg_purchase_amount"],
        entity_rows=[{"user_id": 1}],
    ).to_dict()

    print(online_features)
    # Example output:
    # {'user_id': [1], 'total_purchases': [42], 'avg_purchase_amount': [29.99]}

Feature Engineering Patterns
Pattern 1: Lag Features

    # For time series: what happened N periods ago
    def create_lag_features(df, column, lags=(1, 7, 30)):
        # shift() moves by rows, so this assumes one row per user per day,
        # sorted by date within each user
        for lag in lags:
            df[f'{column}_lag_{lag}d'] = df.groupby('user_id')[column].shift(lag)
        return df

    # Result: value_lag_1d, value_lag_7d, value_lag_30d

Pattern 2: Rolling Aggregates

    # Windowed statistics
    def create_rolling_features(df, column, windows=(7, 30, 90)):
        # rolling(window) counts rows, so windows are in days only when
        # each user has exactly one row per day
        for window in windows:
            df[f'{column}_mean_{window}d'] = df.groupby('user_id')[column].transform(
                lambda x: x.rolling(window, min_periods=1).mean()
            )
            df[f'{column}_std_{window}d'] = df.groupby('user_id')[column].transform(
                lambda x: x.rolling(window, min_periods=1).std()
            )
        return df

Pattern 3: Ratio Features

    # Comparative features
    def create_ratio_features(df):
        # User vs. average user
        global_avg = df['purchase_amount'].mean()
        df['purchase_vs_avg'] = df['purchase_amount'] / global_avg

        # Recent vs. historical
        df['recent_vs_historical'] = df['avg_7d'] / df['avg_90d']

        return df

Pattern 4: Interaction Features

    # Combine features for non-linear relationships
    def create_interaction_features(df):
        df['price_x_quantity'] = df['price'] * df['quantity']
        df['age_x_tenure'] = df['user_age'] * df['account_tenure']
        return df

Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| No point-in-time joins | Data leakage, false confidence | Use feature store with timestamps |
| Feature computed twice | Training/serving skew | Single definition, feature store |
| Missing feature versioning | Can’t reproduce models | Version features with models |
| Too many features | Overfitting, slow inference | Feature selection, importance analysis |
| No feature documentation | Team can’t understand/reuse | Document every feature |
| Ignoring feature freshness | Stale predictions | TTL and monitoring |
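For the last row, a freshness check can be as simple as comparing each feature's last update time with its TTL. The sketch below is one possible shape for such a monitor, not a Feast API; `find_stale_features`, the feature names, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_stale_features(last_updated, ttl, now):
    """Return names of features whose age exceeds their TTL.

    last_updated: {feature_name: datetime of the most recent write}
    ttl:          {feature_name: timedelta the value stays valid}
    """
    return sorted(
        name
        for name, updated_at in last_updated.items()
        if now - updated_at > ttl[name]
    )


# Hypothetical snapshot of when each feature was last written.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "avg_purchase_7d": datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc),
    "total_purchases": datetime(2024, 1, 13, 6, 0, tzinfo=timezone.utc),
}
ttl = {
    "avg_purchase_7d": timedelta(days=1),
    "total_purchases": timedelta(days=1),
}

stale = find_stale_features(last_updated, ttl, now)
print(stale)
```

Using timezone-aware timestamps throughout avoids the timezone mismatches that were listed earlier as a source of skew.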
Test your understanding:
1. Your data science team built a fraud detection model that achieves 95% accuracy in offline testing using a massive Parquet dataset. When deployed to production using a real-time Redis cache and a Java-based serving API, the model's accuracy drops to 60%. What is the most likely architectural cause of this massive performance drop?
Answer: This is a classic symptom of training/serving skew, which occurs when feature computation logic diverges between the offline training environment and the online serving environment. In this scenario, the batch transformations applied to the Parquet dataset (e.g., aggregating 30-day transaction volumes) likely do not mathematically match the real-time Java code extracting data from the Redis cache. Even minor discrepancies—such as different timezone handling, NULL value imputation, or trailing window boundaries—will result in the model receiving inputs it has never seen before. A feature store resolves this by ensuring a single, centralized definition generates both the historical training data and the real-time serving vectors.
2. You are tasked with designing a system that must supply 10 years of historical user behavior to train a new recommendation model, while simultaneously supplying the current user's last 5 clicks to the live website with under 10 milliseconds of latency. Why would attempting to use a single database (like PostgreSQL or Snowflake) for both of these workloads fail?
Answer: Attempting to use a single database will fail because the workload requirements are fundamentally opposed, which is exactly why feature stores separate the offline and online stores. An offline store (typically a data lake or warehouse like Snowflake) is optimized for high-throughput batch queries across massive historical datasets, which is necessary for point-in-time correct training data but far too slow for real-time inference. Conversely, an online store (like Redis) is optimized for ultra-low latency key-value lookups for individual entities, but would be prohibitively expensive and inefficient for storing and joining years of historical data. By splitting the architecture, a feature store can independently optimize both workloads while maintaining a single logical definition of the features.
3. Your ML engineer trained a model to predict customer churn. They calculated a feature called "total_support_tickets" by querying the entire database for each customer's ticket history up to today, and joined it to churn events from six months ago. The model looks fantastic in backtesting. What critical mistake was made, and what will happen when this model is deployed?
Answer: The engineer failed to enforce point-in-time correctness, meaning they introduced severe data leakage into the training dataset. By including support tickets from the last six months in the feature calculation for a churn event that happened six months ago, the model was trained using future information it would never have in a real-time scenario. When deployed, the model will catastrophically underperform because the production system will only have access to strictly past data, rendering the learned patterns useless. A feature store prevents this by performing automated point-in-time joins, “time-traveling” to calculate the exact feature values as they existed at the specific moment of the historical event.
4. Your startup is building its first machine learning feature—a simple daily batch job that predicts which users might upgrade their subscription based on three static demographic features. The CTO suggests implementing Feast and Redis to ensure "enterprise readiness." Why is this likely a bad architectural decision?
Answer: Implementing a feature store in this scenario introduces massive unnecessary complexity and operational overhead for a use case that does not actually require it. Feature stores are designed to solve problems of scale, specifically training/serving skew, feature reuse across multiple models, and low-latency real-time inference. Since your model runs as a simple daily batch job using only a few static features, there is no online serving component, no strict latency requirement, and no complex feature sharing needed. Adopting a feature store too early will slow down development and waste engineering resources; you should wait until you experience the pain of feature duplication or require real-time serving before introducing this infrastructure.
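The skew described in question 1 can often be caught before deployment with a parity check: compute the same feature for the same entities through both paths and compare the results. The following is a minimal sketch under the assumption that both paths can be invoked for a sample of entities; `find_skew` and the sample values are hypothetical.

```python
import math


def find_skew(offline, online, rel_tol=1e-6):
    """Compare feature values keyed by (entity_id, feature_name).

    Returns the keys whose values differ beyond rel_tol, plus keys that
    are missing from one side.
    """
    mismatches = []
    for key in sorted(set(offline) | set(online)):
        a, b = offline.get(key), online.get(key)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append(key)
    return mismatches


# Hypothetical values from the training pipeline and from the serving path.
offline_values = {(1, "avg_balance"): 1200.0, (2, "avg_balance"): 310.5}
online_values = {(1, "avg_balance"): 1200.0, (2, "avg_balance"): 295.0}

skewed = find_skew(offline_values, online_values)
print(skewed)
```

A relative tolerance is used rather than exact equality so that harmless floating-point differences between runtimes are not reported as skew.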
Hands-On Exercise: Build a Feature Store
Let’s build a complete feature store with Feast:

    # Create project directory
    mkdir feast-demo && cd feast-demo

    # Create and activate virtual environment
    python -m venv venv
    source venv/bin/activate

    # Install Feast
    pip install feast pandas pyarrow

Step 1: Initialize Feast Project

    feast init feature_repo
    cd feature_repo
    # Note: some Feast releases generate a nested feature_repo/feature_repo/
    # directory that holds feature_store.yaml; if so, cd into that one.

Step 2: Create Sample Data

    # create_data.py
    import pandas as pd
    import numpy as np
    from datetime import datetime, timedelta

    # Generate user feature data
    np.random.seed(42)
    n_users = 100
    n_days = 30

    data = []
    for user_id in range(1, n_users + 1):
        for day in range(n_days):
            timestamp = datetime(2024, 1, 1) + timedelta(days=day)
            data.append({
                "user_id": user_id,
                "event_timestamp": timestamp,
                "total_purchases": np.random.randint(0, 100),
                "avg_purchase_amount": round(np.random.uniform(10, 200), 2),
                "days_since_last_purchase": np.random.randint(0, 30),
            })

    df = pd.DataFrame(data)
    df.to_parquet("data/user_features.parquet")
    print(f"Created {len(df)} records")
    print(df.head())

Then create the data directory and run the script:

    mkdir -p data
    python create_data.py

Step 3: Define Features

    from datetime import timedelta
    from feast import Entity, FeatureView, Field, FileSource
    from feast.types import Float32, Int64

    # Entity
    user = Entity(
        name="user_id",
        join_keys=["user_id"],
        description="User identifier",
    )

    # Data source
    user_features_source = FileSource(
        name="user_features_source",
        path="data/user_features.parquet",
        timestamp_field="event_timestamp",
    )

    # Feature view
    user_features = FeatureView(
        name="user_features",
        entities=[user],
        ttl=timedelta(days=1),
        schema=[
            Field(name="total_purchases", dtype=Int64),
            Field(name="avg_purchase_amount", dtype=Float32),
            Field(name="days_since_last_purchase", dtype=Int64),
        ],
        source=user_features_source,
        online=True,
    )

Step 4: Apply and Materialize

    # Apply feature definitions
    feast apply

    # Materialize to online store
    feast materialize 2024-01-01 2024-02-01

Step 5: Use Features

    from feast import FeatureStore
    import pandas as pd
    from datetime import datetime

    store = FeatureStore(repo_path=".")

    # Training: Get historical features
    entity_df = pd.DataFrame({
        "user_id": [1, 2, 3, 4, 5],
        "event_timestamp": [datetime(2024, 1, 15)] * 5,  # Point-in-time
    })

    training_df = store.get_historical_features(
        entity_df=entity_df,
        features=[
            "user_features:total_purchases",
            "user_features:avg_purchase_amount",
            "user_features:days_since_last_purchase",
        ],
    ).to_df()

    print("Training data (point-in-time as of Jan 15):")
    print(training_df)

    # Serving: Get online features
    online_features = store.get_online_features(
        features=[
            "user_features:total_purchases",
            "user_features:avg_purchase_amount",
        ],
        entity_rows=[
            {"user_id": 1},
            {"user_id": 2},
        ],
    ).to_dict()

    print("\nOnline features (latest):")
    for key, values in online_features.items():
        print(f"  {key}: {values}")

Success Criteria
You’ve completed this exercise when you can:
- Create sample feature data
- Define entities and feature views in Feast
- Apply feature definitions
- Materialize features to online store
- Retrieve historical features for training (point-in-time)
- Retrieve online features for serving (latest values)
Key Takeaways
- Feature stores solve training/serving skew: Single source of truth for features
- Offline and online stores serve different needs: Training vs. real-time inference
- Point-in-time correctness prevents data leakage: Only use data available at prediction time
- Feature engineering is reusable: Compute once, use across models
- Start simple: Feast provides core functionality without vendor lock-in
Further Reading
- Feast Documentation: Open source feature store
- Feature Store for ML: Community resources
- Uber Michelangelo: Uber’s ML platform
Summary
Feature stores are the backbone of production ML. They ensure consistency between training and serving, prevent data leakage through point-in-time correctness, and enable feature reuse across teams. While they add complexity, the alternative (debugging training/serving skew in production) is far more expensive.
Next Module
Continue to Module 5.3: Model Training & Experimentation to learn how to build reproducible training pipelines with experiment tracking.