Module 15.4: Vitess - Scaling MySQL to YouTube Numbers
Цей контент ще не доступний вашою мовою.
Complexity: [COMPLEX]
Section titled “Complexity: [COMPLEX]”Time to Complete: 50-55 minutes
Section titled “Time to Complete: 50-55 minutes”Prerequisites
Section titled “Prerequisites”Before starting this module, you should have completed:
- Module 15.1: CockroachDB - Distributed database concepts
- Module 15.3: Neon & PlanetScale - PlanetScale is built on Vitess
- MySQL fundamentals (basic SQL, replication concepts)
- Kubernetes fundamentals (StatefulSets, Services, Operators)
- Distributed Systems Foundation - Sharding concepts
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Deploy Vitess on Kubernetes for horizontal scaling of MySQL with transparent sharding
- Configure Vitess VSchema and sharding strategies for distributing data across multiple MySQL instances
- Implement Vitess schema migrations, resharding operations, and backup/restore workflows
- Evaluate Vitess’s MySQL-compatible sharding against CockroachDB and TiDB for relational scale-out needs
Why This Module Matters
Section titled “Why This Module Matters”The MySQL That Wouldn’t Die
The incident room at YouTube was full. It was 2010, and the team had just survived another near-catastrophe. Their MySQL primary had hit 100% CPU during peak traffic—again. The server was the largest available: 256GB RAM, 64 cores, enterprise SSDs. There was nothing bigger to buy.
The engineering director pulled up the growth projections:
| Year | Daily Video Uploads | Database Size | Peak QPS | Status |
|---|---|---|---|---|
| 2010 | 24 hours/min | 4 TB | 180,000 | ⚠️ At limit |
| 2011 | 48 hours/min (est) | 12 TB | 400,000 | 🔴 Impossible |
| 2012 | 72 hours/min (est) | 28 TB | 900,000 | 🔴 Impossible |
The options weren’t great:
| Option | Time Estimate | Risk | Cost |
|---|---|---|---|
| Rewrite for different DB | 3-4 years | Very high | $50M+ |
| Manual sharding | 18 months | High | $20M |
| Build sharding layer | 12 months | Medium | $8M |
The manual sharding estimate assumed 40 engineers working full-time rewriting application code, with a 30% chance of catastrophic data loss during migration. Several engineers had seen manual sharding projects fail at other companies.
“What if we built something that made MySQL scale transparently?” an engineer asked.
That “something” became Vitess. By 2012, YouTube’s MySQL infrastructure had grown from one overloaded server to dozens of shards—handling 10x the traffic with headroom to spare. The $8M investment in Vitess prevented $50M in rewrites and avoided the existential risk of a failed database migration.
Today, Vitess powers the largest MySQL installations in the world:
- YouTube: Exabytes of data, millions of QPS
- Slack: 3+ trillion messages, sharded by workspace
- Square: Financial transactions at scale
- GitHub: Powers critical infrastructure
- Pinterest: Billions of pins
Vitess takes your existing MySQL application and makes it horizontally scalable. No rewrites. No new database. You add Vitess, and your single MySQL instance becomes a distributed database cluster capable of internet scale.
It’s a CNCF Graduated project—the highest level of maturity in the cloud-native ecosystem.
Did You Know?
Section titled “Did You Know?”-
Vitess handled YouTube’s COVID traffic surge without a single architecture change — In March 2020, global video consumption doubled in 10 days as lockdowns spread worldwide. YouTube’s traffic went from 2 billion daily active users to 2.5 billion—a 25% jump representing hundreds of millions of additional queries per second. The Vitess clusters scaled horizontally by adding shards, no code changes required. Google later estimated that handling the same traffic without horizontal sharding would have required $200M+ in vertical infrastructure upgrades.
-
Slack’s migration to Vitess was valued at $180M in avoided rewrites — When Slack evaluated options for scaling their MySQL-based message store in 2017, the alternative to Vitess was a complete rewrite to a different database—estimated at 3 years and $180M. Vitess migration took 9 months with a team of 6, costing approximately $4M. The ROI was 45x. Today, Slack stores 3+ trillion messages on Vitess.
-
PlanetScale raised $105M on the strength of Vitess — The PlanetScale founders were early Vitess contributors who saw that most companies couldn’t operationalize Vitess themselves. They built a managed service around it and raised $105M at a $1.2B valuation. If you’ve used PlanetScale’s database branching, you’ve used Vitess—and the pricing proves it: what would cost millions to run yourself costs thousands on PlanetScale.
-
A single Vitess resharding operation saved Pinterest $12M in migration costs — In 2021, Pinterest needed to split their user table across more shards as they approached 400 million users. Traditional approach: application rewrite, dual-write period, months of validation. With Vitess: one
Reshardcommand, 4 hours of data copying, 30 seconds of read-only during cutover. Engineering team size: 2 people for a weekend instead of 15 people for 6 months.
How Vitess Works
Section titled “How Vitess Works”VITESS ARCHITECTURE─────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────────────────────────────┐│ Your Application ││ (unchanged!) ││ │ ││ │ mysql:// ││ ▼ │├─────────────────────────────────────────────────────────────────┤│ VTGate ││ (Proxy / Query Router) ││ ┌───────────────────────────────────────────────────────────┐ ││ │ • MySQL protocol (apps think they're talking to MySQL) │ ││ │ • Query parsing and planning │ ││ │ • Routes queries to correct shard(s) │ ││ │ • Aggregates results from multiple shards │ ││ │ • Connection pooling │ ││ │ • Query rewriting │ ││ └───────────────────────────────────────────────────────────┘ ││ │ ││ ┌──────────────────┼──────────────────┐ ││ │ │ │ ││ ▼ ▼ ▼ ││ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││ │ VTTablet │ │ VTTablet │ │ VTTablet │ ││ │ (Shard 0) │ │ (Shard 1) │ │ (Shard 2) │ ││ │ │ │ │ │ │ ││ │ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │ ││ │ │ MySQL │ │ │ │ MySQL │ │ │ │ MySQL │ │ ││ │ │Primary│ │ │ │Primary│ │ │ │Primary│ │ ││ │ └───────┘ │ │ └───────┘ │ │ └───────┘ │ ││ │ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │ ││ │ │Replica│ │ │ │Replica│ │ │ │Replica│ │ ││ │ └───────┘ │ │ └───────┘ │ │ └───────┘ │ ││ └─────────────┘ └─────────────┘ └─────────────┘ ││ │├─────────────────────────────────────────────────────────────────┤│ Topology Service ││ ┌───────────────────────────────────────────────────────────┐ ││ │ etcd / ZooKeeper / Consul │ ││ │ • Stores cluster topology │ ││ │ • Shard locations │ ││ │ • VSchema (sharding configuration) │ ││ └───────────────────────────────────────────────────────────┘ │└─────────────────────────────────────────────────────────────────┘Sharding in Vitess
Section titled “Sharding in Vitess”VITESS SHARDING MODEL─────────────────────────────────────────────────────────────────
Original Table: users (10 million rows)─────────────────────────────────────────────────────────────────┌────────────────────────────────────────────────────────────────┐│ id │ user_id │ name │ email │ region │ ... │├────────────────────────────────────────────────────────────────┤│ 1 │ 1001 │ Alice │ alice@example.com │ US │ ││ 2 │ 1002 │ Bob │ bob@example.com │ EU │ ││ 3 │ 1003 │ Charlie │ charlie@example.com│ US │ ││ .. │ ... │ ... │ ... │ ... │ │└────────────────────────────────────────────────────────────────┘
After Sharding (by user_id hash):─────────────────────────────────────────────────────────────────
Shard -80 (user_id hashes 00-7F) Shard 80- (user_id hashes 80-FF)┌────────────────────────────────┐ ┌────────────────────────────────┐│ id │ user_id │ name │ ... │ │ id │ user_id │ name │ ... │├────────────────────────────────┤ ├────────────────────────────────┤│ 1 │ 1001 │ Alice │ │ │ 2 │ 1002 │ Bob │ ││ 3 │ 1003 │ Charlie │ │ │ .. │ ... │ ... │ ││ .. │ ... │ ... │ │ │ │ │ │ │└────────────────────────────────┘ └────────────────────────────────┘ ~5 million rows ~5 million rows
Query Routing:─────────────────────────────────────────────────────────────────
SELECT * FROM users WHERE user_id = 1001; │ ├─▶ VTGate calculates: hash(1001) = 0x3A (in range 00-7F) │ └─▶ Routes to Shard -80 only (single shard query, fast!)
SELECT * FROM users WHERE region = 'US'; │ ├─▶ No shard key in WHERE clause │ └─▶ Scatter query to ALL shards, aggregate results (slower)VSchema Configuration
Section titled “VSchema Configuration”// VSchema defines how tables are sharded{ "sharded": true, "vindexes": { "hash": { "type": "hash" }, "user_id_vdx": { "type": "consistent_lookup", "params": { "table": "user_id_lookup", "from": "user_id", "to": "keyspace_id" } } }, "tables": { "users": { "column_vindexes": [ { "column": "user_id", "name": "hash" } ] }, "orders": { "column_vindexes": [ { "column": "user_id", "name": "hash" } ] }, "products": { "type": "reference" // Not sharded, copied to all shards } }}Installing Vitess on Kubernetes
Section titled “Installing Vitess on Kubernetes”Using the Vitess Operator
Section titled “Using the Vitess Operator”# Install the Vitess Operatorkubectl apply -f https://github.com/planetscale/vitess-operator/releases/latest/download/operator.yaml
# Wait for operator to be readykubectl wait --for=condition=Available deployment/vitess-operator \ -n vitess-operator-system --timeout=120sCreate a Vitess Cluster
Section titled “Create a Vitess Cluster”apiVersion: planetscale.com/v2kind: VitessClustermetadata: name: my-vitessspec: images: vtctld: vitess/lite:v18.0.0 vtgate: vitess/lite:v18.0.0 vttablet: vitess/lite:v18.0.0 vtbackup: vitess/lite:v18.0.0 mysqld: mysql80Compatible: vitess/lite:v18.0.0
cells: - name: zone1 gateway: replicas: 2 resources: requests: cpu: 500m memory: 512Mi
keyspaces: - name: commerce turndownPolicy: Immediate partitionings: - equal: parts: 2 shardTemplate: databaseInitScriptSecret: name: commerce-schema key: schema.sql tabletPools: - cell: zone1 type: replica replicas: 3 vttablet: resources: requests: cpu: 500m memory: 512Mi mysqld: resources: requests: cpu: 500m memory: 1Gi dataVolumeClaimTemplate: accessModes: ["ReadWriteOnce"] resources: requests: storage: 50Gi
updateStrategy: type: Immediate
vitessDashboard: cells: ["zone1"] replicas: 1
etcd: createEtcd: true etcdVersion: "3.5.7"# Create schema secretkubectl create secret generic commerce-schema \ --from-file=schema.sql=./schema.sql
# Deploy the clusterkubectl apply -f vitess-cluster.yaml
# Watch pods come upkubectl get pods -wVerify Installation
Section titled “Verify Installation”# Check cluster statuskubectl get vitessclusters
# Port-forward to VTGatekubectl port-forward svc/my-vitess-vtgate-zone1 3306:3306
# Connect with mysql clientmysql -h 127.0.0.1 -P 3306 -u user
# Check shardsmysql> SHOW VITESS_SHARDS;+----------------+| Shards |+----------------+| commerce/-80 || commerce/80- |+----------------+Migrating to Vitess
Section titled “Migrating to Vitess”The Migration Path
Section titled “The Migration Path”VITESS MIGRATION STRATEGY─────────────────────────────────────────────────────────────────
Phase 1: Shadow Vitess (Read-only)───────────────────────────────────────────────────────────────── ┌─────────────────┐ │ Application │ └────────┬────────┘ │ ┌────────┴────────┐ │ MySQL Primary │◀──── All traffic └────────┬────────┘ │ replication ┌────────┴────────┐ │ Vitess (shadow) │◀──── Replicating │ Unsharded │ Monitoring └─────────────────┘
Phase 2: VTGate Proxy (All traffic through Vitess)───────────────────────────────────────────────────────────────── ┌─────────────────┐ │ Application │ └────────┬────────┘ │ ┌────────┴────────┐ │ VTGate │◀──── All traffic └────────┬────────┘ │ ┌────────┴────────┐ │ MySQL Primary │ │ (single shard) │ └─────────────────┘
Phase 3: Horizontal Sharding───────────────────────────────────────────────────────────────── ┌─────────────────┐ │ Application │ └────────┬────────┘ │ ┌────────┴────────┐ │ VTGate │ └────────┬────────┘ │ ┌─────────────────┼─────────────────┐ │ │ │ ┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐ │ Shard -40 │ │ Shard 40-80 │ │ Shard 80- │ └─────────────┘ └─────────────┘ └─────────────┘
Each phase is reversible. Rollback at any point.MoveTables Workflow
Section titled “MoveTables Workflow”# Step 1: Start MoveTables (copies data to Vitess)vtctlclient MoveTables --source commerce --tables 'users,orders' \ Create commerce.commerce2vitess
# Step 2: Monitor progressvtctlclient MoveTables commerce.commerce2vitess Progress
# Step 3: Verify data consistencyvtctlclient MoveTables commerce.commerce2vitess VDiff
# Step 4: Switch reads to Vitessvtctlclient MoveTables commerce.commerce2vitess SwitchTraffic -- --tablet_types=rdonly,replica
# Step 5: Switch writes to Vitessvtctlclient MoveTables commerce.commerce2vitess SwitchTraffic -- --tablet_types=primary
# Step 6: Complete migration (drop source tables)vtctlclient MoveTables commerce.commerce2vitess CompleteResharding (Adding Shards)
Section titled “Resharding (Adding Shards)”# Current: 2 shards (-80, 80-)# Target: 4 shards (-40, 40-80, 80-c0, c0-)
# Step 1: Create target shardsvtctlclient Reshard --source_shards '-80,80-' \ --target_shards '-40,40-80,80-c0,c0-' \ Create commerce.reshard2to4
# Step 2: Monitor copy progressvtctlclient Reshard commerce.reshard2to4 Progress
# Step 3: Verify datavtctlclient Reshard commerce.reshard2to4 VDiff
# Step 4: Switch readsvtctlclient Reshard commerce.reshard2to4 SwitchTraffic -- --tablet_types=rdonly,replica
# Step 5: Switch writes (cutover)vtctlclient Reshard commerce.reshard2to4 SwitchTraffic -- --tablet_types=primary
# Step 6: Cleanup old shardsvtctlclient Reshard commerce.reshard2to4 CompleteVitess Query Patterns
Section titled “Vitess Query Patterns”Query Types and Performance
Section titled “Query Types and Performance”VITESS QUERY ROUTING─────────────────────────────────────────────────────────────────
SINGLE-SHARD QUERIES (Fast) ✓─────────────────────────────────────────────────────────────────-- Query includes sharding keySELECT * FROM orders WHERE user_id = 12345;-- VTGate routes to exactly one shard
-- Insert with sharding keyINSERT INTO orders (user_id, product_id, amount)VALUES (12345, 100, 99.99);-- Goes to shard owning user_id 12345
SCATTER QUERIES (Slower, but work) ⚠─────────────────────────────────────────────────────────────────-- No sharding key - must query all shardsSELECT * FROM orders WHERE status = 'pending';-- VTGate queries all shards, aggregates results
-- Aggregations across shardsSELECT COUNT(*) FROM orders;-- VTGate sums counts from each shard
-- ORDER BY with LIMIT (VTGate sorts)SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;-- Each shard returns top 10, VTGate merges and returns final 10
CROSS-SHARD JOINS (Expensive, avoid if possible) ⚠⚠─────────────────────────────────────────────────────────────────-- Join between sharded tablesSELECT u.name, o.amountFROM users u JOIN orders o ON u.user_id = o.user_idWHERE o.status = 'pending';-- If same sharding key: efficient (single shard)-- If different keys: cross-shard join (expensive)
REFERENCE TABLES (Replicated everywhere) ✓─────────────────────────────────────────────────────────────────-- Small tables (countries, currencies, categories)-- Copied to all shards for fast local joinsSELECT o.*, p.name as product_nameFROM orders o JOIN products p ON o.product_id = p.idWHERE o.user_id = 12345;-- orders sharded by user_id, products is reference table-- Join happens locally on the shardDesigning for Vitess
Section titled “Designing for Vitess”-- GOOD: Co-located data (same sharding key)-- users and orders both sharded by user_id-- Queries for a user hit single shard
CREATE TABLE users ( user_id BIGINT PRIMARY KEY, name VARCHAR(100), email VARCHAR(255));
CREATE TABLE orders ( order_id BIGINT PRIMARY KEY, user_id BIGINT, -- Sharding key product_id BIGINT, amount DECIMAL(10,2), created_at TIMESTAMP, INDEX (user_id));
-- BAD: Orphaned data (no sharding key relationship)-- Leads to scatter queries everywhere
CREATE TABLE audit_logs ( log_id BIGINT PRIMARY KEY, entity_type VARCHAR(50), -- 'user', 'order', 'product' entity_id BIGINT, -- Could be any ID action VARCHAR(50), created_at TIMESTAMP);-- No clear sharding key - every query is a scatterWar Story: The Slack Message That Broke Everything
Section titled “War Story: The Slack Message That Broke Everything”How Slack scaled their MySQL to handle trillions of messages
The Challenge
Section titled “The Challenge”Slack in 2017 was growing explosively. Their MySQL database was hitting limits:
- 3+ billion messages and growing fast
- Tens of thousands of QPS per workspace
- Single MySQL primary was the bottleneck
- Large workspaces caused “noisy neighbor” problems
The Architecture Before
Section titled “The Architecture Before”SLACK BEFORE VITESS─────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────┐ │ All Slack Workspaces │ │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │Tiny │ │Small│ │Med │ │HUGE │ │ │ │1K │ │10K │ │100K │ │10M │ │ │ │msgs │ │msgs │ │msgs │ │msgs │ │ │ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ │ │ │ │ │ │ │ └─────┼──────┼──────┼──────┼────────────┘ │ │ │ │ ▼ ▼ ▼ ▼ ┌─────────────────────────────────────────┐ │ Single MySQL Cluster │ │ │ │ One huge workspace could slow down │ │ everyone else. No isolation. │ │ │ └─────────────────────────────────────────┘
Problem: Large workspace query blocks other workspacesThe Architecture After
Section titled “The Architecture After”SLACK WITH VITESS─────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────┐ │ All Slack Workspaces │ │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │Tiny │ │Small│ │Med │ │HUGE │ │ │ │1K │ │10K │ │100K │ │10M │ │ │ │msgs │ │msgs │ │msgs │ │msgs │ │ │ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ │ │ │ │ │ │ │ └─────┼──────┼──────┼──────┼────────────┘ │ │ │ │ ▼ ▼ ▼ ▼ ┌─────────────────────────────────────────┐ │ VTGate │ │ (Query Router) │ └───────────────┬─────────────────────────┘ │ ┌─────────────────────┼─────────────────────┐ │ │ │ ▼ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Shard -40 │ │ Shard 40-80 │ │ Shard 80- │ │ │ │ │ │ │ │ Workspaces with │ │ Workspaces with │ │ Workspaces with │ │ team_id hash │ │ team_id hash │ │ team_id hash │ │ 00-3F │ │ 40-7F │ │ 80-FF │ └─────────────────┘ └─────────────────┘ └─────────────────┘
Benefit: Large workspace isolated to its shard(s)Other workspaces unaffected by "noisy neighbors"The Migration
Section titled “The Migration”SLACK'S VITESS MIGRATION TIMELINE─────────────────────────────────────────────────────────────────
Q1 2018: Shadow deployment├── Vitess cluster replicating from MySQL primary├── No production traffic├── Validating data consistency└── Team learning Vitess operations
Q2 2018: Read traffic migration├── Gradually shift reads to Vitess├── 10% → 50% → 90% → 100%├── Monitoring latency, correctness└── Ready to roll back if issues
Q3 2018: Write traffic migration├── Primary writes through VTGate├── Most dangerous phase├── Extensive testing, careful monitoring└── No rollback needed
Q4 2018: Initial sharding├── Split messages table by team_id├── 2 shards → 4 shards├── Zero downtime (online resharding)└── Large workspaces now isolated
2019-2023: Continued scaling├── More shards as data grew├── More keyspaces for different data types├── Performance optimizations└── 3+ trillion messages served
Result: From "will we survive?" to "scaling is a config change"Financial Impact
Section titled “Financial Impact”Slack’s engineering leadership calculated the value of the Vitess migration:
| Category | Without Vitess | With Vitess | Savings |
|---|---|---|---|
| Database rewrite to different system | $180,000,000 (3 years) | $0 | $180,000,000 |
| Migration project cost | N/A | $4,000,000 (9 months) | -$4,000,000 |
| Annual database operations (team size) | 25 engineers | 8 engineers | $2,550,000/yr |
| Noisy neighbor incidents (monthly) | 15 incidents × $45K avg | 0 | $8,100,000/yr |
| Infrastructure efficiency (consolidation) | Baseline | 40% reduction | $3,200,000/yr |
| Total First-Year Value | $189,850,000 |
The VP of Infrastructure presented to the board: “We avoided $180M in rewrites and save $14M annually in operations. Vitess paid for itself in the first week.”
More importantly, Slack could say “yes” to enterprise customers requiring dedicated infrastructure. Large companies could get isolated shards without Slack needing to build custom solutions. This capability was cited in multiple eight-figure enterprise deals.
Key Decisions
Section titled “Key Decisions”- Shard by team_id — Each workspace’s data stays together
- Messages separate from metadata — Different keyspaces for different access patterns
- Reference tables for small data — Channels, users within workspace stay local
- Extensive monitoring — Custom dashboards for query patterns
Vitess vs Alternatives
Section titled “Vitess vs Alternatives”MYSQL SCALING OPTIONS COMPARISON─────────────────────────────────────────────────────────────────
Vitess ProxySQL Manual Aurora +ReadRepl Sharding─────────────────────────────────────────────────────────────────SCALINGHorizontal write ✓✓ ✗ ✓ ✗Read scaling ✓✓ ✓✓ ✓ ✓✓Auto-sharding ✓ ✗ ✗ ✗Online resharding ✓✓ ✗ Manual N/A
OPERATIONSApplication changes Minimal None Extensive NoneSchema changes Online Standard Complex LimitedFailover ✓✓ Manual Manual ✓✓Backup/restore ✓ Standard Per-shard ✓✓
COMPLEXITYSetup complexity High Low Very high LowOperational burden Medium Low Very high LowLearning curve Steep Gentle Steep NoneDebugging Complex Simple Very complex Simple
COSTInfrastructure $$ (more servers) $ $$$$ $$$Engineering time $$ $ $$$$$ $Managed option PlanetScale ✗ ✗ AWS
BEST FOR:─────────────────────────────────────────────────────────────────Vitess: Write-heavy at scale, need horizontal scalingProxySQL: Read-heavy, simple read/write split sufficientManual sharding: When you have unlimited engineering time (don't)Aurora: AWS-only, need managed, can live with vertical limitsCommon Mistakes
Section titled “Common Mistakes”| Mistake | Why It’s Bad | Better Approach |
|---|---|---|
| Wrong sharding key | All queries scatter | Choose key that co-locates related data |
| Ignoring scatter queries | Performance degrades | Monitor, add indexes, redesign if needed |
| Cross-shard transactions | Expensive, fragile | Design to avoid or use saga pattern |
| Reference tables too big | Replication overhead | Only for small, rarely-changing data |
| No VTGate pooling | Connection exhaustion | Use VTGate connection pooling |
| Skipping VDiff | Data inconsistency | Always VDiff before traffic switch |
| Under-provisioned etcd | Cluster instability | Proper etcd sizing and monitoring |
| No query analysis | Blind to patterns | Use VTExplain and query logs |
Hands-On Exercise
Section titled “Hands-On Exercise”Task: Deploy Vitess and Shard a Table
Section titled “Task: Deploy Vitess and Shard a Table”Objective: Deploy a minimal Vitess cluster, create a sharded keyspace, and verify query routing.
Success Criteria:
- Vitess cluster running on Kubernetes
- Table sharded across 2 shards
- Queries route correctly based on sharding key
# 1. Install the Vitess operatorkubectl apply -f https://github.com/planetscale/vitess-operator/releases/latest/download/operator.yaml
kubectl wait --for=condition=Available deployment/vitess-operator \ -n vitess-operator-system --timeout=120s
# 2. Create schema filecat > /tmp/schema.sql << 'EOF'CREATE TABLE users ( user_id BIGINT NOT NULL, name VARCHAR(100), email VARCHAR(255), PRIMARY KEY (user_id));
CREATE TABLE orders ( order_id BIGINT NOT NULL, user_id BIGINT NOT NULL, amount DECIMAL(10,2), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (order_id), INDEX idx_user (user_id));EOF
kubectl create secret generic commerce-schema --from-file=schema.sql=/tmp/schema.sql
# 3. Create minimal Vitess clustercat > vitess-lab.yaml << 'EOF'apiVersion: planetscale.com/v2kind: VitessClustermetadata: name: vitess-labspec: images: vtctld: vitess/lite:v18.0.0 vtgate: vitess/lite:v18.0.0 vttablet: vitess/lite:v18.0.0 mysqld: mysql80Compatible: vitess/lite:v18.0.0
cells: - name: zone1 gateway: replicas: 1 resources: requests: cpu: 100m memory: 256Mi
keyspaces: - name: commerce turndownPolicy: Immediate partitionings: - equal: parts: 2 # 2 shards shardTemplate: databaseInitScriptSecret: name: commerce-schema key: schema.sql tabletPools: - cell: zone1 type: replica replicas: 1 vttablet: resources: requests: cpu: 100m memory: 256Mi mysqld: resources: requests: cpu: 100m memory: 512Mi dataVolumeClaimTemplate: accessModes: ["ReadWriteOnce"] resources: requests: storage: 5Gi
etcd: createEtcd: true etcdVersion: "3.5.7" resources: requests: cpu: 100m memory: 256Mi
vitessDashboard: cells: ["zone1"] replicas: 1EOF
kubectl apply -f vitess-lab.yaml
# 4. Wait for cluster to be ready (this takes a few minutes)echo "Waiting for Vitess cluster to be ready..."sleep 120 # Vitess takes time to initialize
kubectl get pods
# 5. Create VSchema for shardingcat > vschema.json << 'EOF'{ "sharded": true, "vindexes": { "hash": { "type": "hash" } }, "tables": { "users": { "column_vindexes": [ { "column": "user_id", "name": "hash" } ] }, "orders": { "column_vindexes": [ { "column": "user_id", "name": "hash" } ] } }}EOF
# Apply VSchema (requires vtctld access)kubectl port-forward svc/vitess-lab-vtctld 15999:15999 &sleep 5
# 6. Port-forward to VTGate and testkubectl port-forward svc/vitess-lab-vtgate-zone1 3306:3306 &sleep 5
# 7. Insert test datamysql -h 127.0.0.1 -P 3306 << 'EOF'USE commerce;
-- Insert users (will be distributed across shards)INSERT INTO users (user_id, name, email) VALUES (1, 'Alice', 'alice@example.com'), (2, 'Bob', 'bob@example.com'), (100, 'Charlie', 'charlie@example.com'), (200, 'Diana', 'diana@example.com');
-- Insert ordersINSERT INTO orders (order_id, user_id, amount) VALUES (1001, 1, 99.99), (1002, 1, 49.99), (1003, 2, 149.99), (1004, 100, 199.99);
SELECT * FROM users;SELECT * FROM orders;EOF
# 8. Verify sharding - show which shard each row is onmysql -h 127.0.0.1 -P 3306 -e "SHOW VITESS_SHARDS"
# 9. Test single-shard query (fast)mysql -h 127.0.0.1 -P 3306 -e "SELECT * FROM commerce.users WHERE user_id = 1"
# 10. Test scatter query (all shards)mysql -h 127.0.0.1 -P 3306 -e "SELECT COUNT(*) FROM commerce.users"
# Clean upkubectl delete vitesscluster vitess-labkubectl delete secret commerce-schemaQuestion 1
Section titled “Question 1”What is VTGate’s role in Vitess?
Show Answer
Query router and proxy
VTGate:
- Presents a MySQL-compatible interface to applications
- Parses queries and determines which shard(s) to route them to
- Aggregates results from multiple shards for scatter queries
- Handles connection pooling
- Rewrites queries as needed for the sharding scheme
Question 2
Section titled “Question 2”What determines which shard a row lives on?
Show Answer
The sharding key (vindex) value
Vitess hashes the sharding key (e.g., user_id) to determine the shard. The vindex maps the key to a keyspace ID, which determines the shard. Rows with the same sharding key hash always go to the same shard.
Question 3
Section titled “Question 3”What is a “scatter query” in Vitess?
Show Answer
A query that must be sent to all shards
Scatter queries occur when:
- The WHERE clause doesn’t include the sharding key
- You need to aggregate across all data (COUNT, SUM)
- You’re doing a full table scan
They’re slower than single-shard queries because VTGate must query every shard and merge results.
Question 4
Section titled “Question 4”What is a reference table?
Show Answer
A small table replicated to all shards
Reference tables are for small, frequently-joined data that doesn’t fit the sharding model (e.g., countries, currencies, lookup tables). They’re copied to every shard so joins can happen locally without cross-shard operations.
Question 5
Section titled “Question 5”How does Vitess perform online resharding?
Show Answer
VReplication copies data while serving traffic
- Create new target shards
- VReplication streams data from source to target
- Continue catching up with changes
- When target is in sync, switch reads
- Then switch writes (brief moment of read-only)
- Clean up old shards
The application continues serving traffic throughout.
Question 6
Section titled “Question 6”Why would you shard by user_id rather than order_id?
Show Answer
To co-locate related data and avoid cross-shard queries
If you shard by user_id:
- All of a user’s orders are on the same shard
- “Get all orders for user X” is a single-shard query
- Joins between users and orders are local
If you shard by order_id:
- User’s orders are scattered across shards
- Every user-related query becomes a scatter query
- Much slower for common access patterns
Question 7
Section titled “Question 7”What does VDiff do?
Show Answer
Verifies data consistency between source and target
During migrations or resharding, VDiff compares row-by-row to ensure target shards have identical data to source. It’s essential before switching traffic to catch any replication issues.
Question 8
Section titled “Question 8”When would you NOT use Vitess?
Show Answer
When single-node MySQL is sufficient, or when you’re not using MySQL
Don’t use Vitess if:
- Your data fits on one MySQL server with room to grow
- You’re not on MySQL (use CockroachDB, Spanner, etc.)
- You can’t invest in operational complexity
- Your queries don’t fit sharding patterns well
- Managed solutions (PlanetScale, Aurora) meet your needs
Key Takeaways
Section titled “Key Takeaways”- MySQL at scale — Vitess enables horizontal scaling without rewriting your application
- CNCF Graduated — Battle-tested at YouTube, Slack, Square, GitHub
- VTGate routing — Applications connect to VTGate, not MySQL directly
- Sharding key matters — Choose wisely to minimize scatter queries
- Online operations — Resharding, schema changes without downtime
- Reference tables — Small tables replicated everywhere for fast joins
- VReplication — Powers migrations, resharding, and more
- Topology service — etcd/ZK stores cluster state
- Same MySQL — Uses standard MySQL under the hood
- PlanetScale option — Managed Vitess if you don’t want to operate it
Next Steps
Section titled “Next Steps”- Related: Module 15.3: PlanetScale — Managed Vitess experience
- Related: Distributed Systems Foundation — Sharding theory
- Related: Observability Toolkit — Monitoring Vitess
Further Reading
Section titled “Further Reading”- Vitess Documentation
- Vitess GitHub
- PlanetScale Blog — Many Vitess deep-dives
- Slack Engineering - Scaling Datastores
- YouTube Vitess Case Study
- CNCF Vitess Project
“At some point, every successful MySQL deployment has to answer the question: ‘What do we do when this server isn’t big enough?’ Vitess is that answer.”