System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. It involves making high-level decisions about how different parts of a system will work together.
Why is it important?
Typical Structure:
Key Framework (RESHADE):
Vertical Scaling (Scale Up)
Horizontal Scaling (Scale Out)
When to use what?
Latency
Throughput
Trade-off: Sometimes optimizing for low latency can reduce throughput and vice versa.
CAP Theorem states that a distributed system can only guarantee 2 out of 3:
Consistency (C)
Availability (A)
Partition Tolerance (P)
Real-World Choices:
CP Systems (Consistency + Partition Tolerance)
AP Systems (Availability + Partition Tolerance)
Strong Consistency
Eventual Consistency
Read-after-Write Consistency
What is Load Balancing?
Distributing incoming traffic across multiple servers to ensure no single server is overwhelmed.
Load Balancing Algorithms:
Round Robin
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
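The rotation above can be sketched in a few lines of Python (server names are placeholders):

```python
from itertools import cycle

# Round-robin sketch: requests are handed to servers in a fixed cycle.
servers = ["Server A", "Server B", "Server C"]
rotation = cycle(servers)

def route_request():
    return next(rotation)  # next server in the cycle

# Four requests wrap back around to Server A
assignments = [route_request() for _ in range(4)]
```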
Weighted Round Robin
Least Connections
Least Response Time
IP Hash
Random
Layer 4 (Transport Layer) vs Layer 7 (Application Layer) Load Balancing:
Layer 4 Load Balancer
Layer 7 Load Balancer
What is Sharding?
Splitting a large database into smaller, faster, more manageable pieces called shards.
Why Shard?
Sharding Strategies:
Horizontal Sharding (Row-based)
Vertical Sharding (Column-based)
Hash-based Sharding
shard_number = hash(user_id) % number_of_shards
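A minimal sketch of the formula above. It uses a stable digest rather than Python's salted built-in `hash()` so routing stays consistent across processes; the shard count of 4 is an assumption for illustration.

```python
import hashlib

NUMBER_OF_SHARDS = 4  # assumed cluster size for illustration

def shard_number(user_id):
    # Stable hash -> the same user always maps to the same shard
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUMBER_OF_SHARDS
```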
Range-based Sharding
Geographic Sharding
Challenges:
What is Replication?
Maintaining copies of data across multiple servers for reliability and performance.
Types of Replication:
Master-Slave (Primary-Replica)
Client → Write → Master → Replicate → Slave 1, Slave 2
Client → Read → Slave 1 or Slave 2
Master-Master (Multi-Master)
Synchronous Replication
Asynchronous Replication
Partitioning
Sharding
Replication
What it does:
Translates domain names (www.example.com) to IP addresses (192.168.1.1)
How it works:
DNS Record Types:
TTL (Time To Live):
What it does:
Distributes static content (images, CSS, JS) across geographically distributed servers.
How it works:
Benefits:
What to cache:
Popular CDNs:
What it does:
Sits in front of web servers and forwards client requests.
Functions:
Reverse Proxy vs Forward Proxy:
Popular Tools:
What they do:
Run your business logic and handle API requests.
Stateless vs Stateful:
Stateless Servers
Stateful Servers
Best Practice: Keep application servers stateless for easier scaling.
Relational Databases (SQL)
NoSQL Databases
Document Stores
Key-Value Stores
Column-Family Stores
Graph Databases
Database Selection Criteria:
Vertical Scaling Deep Dive:
Before: 4 cores, 8GB RAM, 500GB SSD
After: 16 cores, 64GB RAM, 2TB SSD
Horizontal Scaling Deep Dive:
Before: 1 server handling 1000 req/sec
After: 10 servers each handling 100 req/sec
Read Replicas
Master (Writes) → Replicate → Replica 1 (Reads)
→ Replica 2 (Reads)
→ Replica 3 (Reads)
Write Sharding
CQRS (Command Query Responsibility Segregation)
Why Async?
Patterns:
Fire and Forget
User uploads image → Return 200 OK immediately
→ Background job processes image (resize, compress)
Callback/Webhook
User initiates payment → Return processing status
→ Payment gateway calls webhook when done
Polling
User requests report → Get job ID
→ Poll status endpoint until complete
WebSockets/Server-Sent Events
Real-time updates pushed to client
Example: Live stock prices, chat messages
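The polling pattern above can be sketched client-side; `get_status` here is a stand-in for the call to the status endpoint.

```python
import time

def poll_until_complete(job_id, get_status, interval=0.5, max_attempts=60):
    # Repeatedly check job status until it reaches a terminal state
    for _ in range(max_attempts):
        status = get_status(job_id)
        if status in ("complete", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish in time")
```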
Use Case: Process large volumes of data periodically
Example: Calculating daily portfolio returns
Technologies: Apache Spark, Hadoop, AWS Batch
Use Case: Process data in real-time as it arrives
Example: Fraud detection in payment transactions
Technologies: Apache Kafka, AWS Kinesis, Apache Flink
What is Normalization?
Organizing data to reduce redundancy and improve integrity.
Normal Forms:
1NF (First Normal Form)
Before (Not 1NF):
| UserID | Name | Phone Numbers |
|---|---|---|
| 1 | Jash | 123, 456, 789 |
After (1NF):
| UserID | Name | PhoneNumber |
|---|---|---|
| 1 | Jash | 123 |
| 1 | Jash | 456 |
| 1 | Jash | 789 |
2NF (Second Normal Form)
Before (Not 2NF):
| OrderID | ProductID | ProductName | Quantity |
|---|---|---|---|
| 1 | 101 | iPhone | 2 |
Problem: ProductName depends only on ProductID, not the full key (OrderID, ProductID)
After (2NF):
Orders:
| OrderID | ProductID | Quantity |
|---|---|---|
| 1 | 101 | 2 |
Products:
| ProductID | ProductName |
|---|---|
| 101 | iPhone |
3NF (Third Normal Form)
Before (Not 3NF):
| EmployeeID | DepartmentID | DepartmentName |
|---|---|---|
| 1 | 10 | Engineering |
Problem: DepartmentName depends on DepartmentID (transitive dependency)
After (3NF):
Employees:
| EmployeeID | DepartmentID |
|---|---|
| 1 | 10 |
Departments:
| DepartmentID | DepartmentName |
|---|---|
| 10 | Engineering |
What is Denormalization?
Intentionally adding redundancy to improve read performance.
When to Denormalize:
Example for a financial platform:
Normalized:
Portfolio: portfolio_id, user_id
Holdings: holding_id, portfolio_id, stock_id, quantity
Stocks: stock_id, symbol, name, current_price
Denormalized for faster reads:
Portfolio_Snapshot: portfolio_id, user_id, total_value, holdings_json
{
  "holdings": [
    {"symbol": "RELIANCE", "quantity": 10, "price": 2500, "value": 25000}
  ]
}
What is an Index?
Data structure that improves query speed at the cost of write performance and storage.
Types of Indexes:
1. Primary Index
CREATE TABLE users (
  id SERIAL PRIMARY KEY, -- Primary index
  email VARCHAR(255)
);
2. Secondary Index
CREATE INDEX idx_email ON users(email);
SELECT * FROM users WHERE email = 'jash@example.com';
3. Composite Index
CREATE INDEX idx_name_age ON users(last_name, first_name);
-- Uses the index:
WHERE last_name = 'Rashne' AND first_name = 'Jash'
WHERE last_name = 'Rashne' (leftmost prefix)
-- Does NOT use the index:
WHERE first_name = 'Jash' (order matters!)
4. Unique Index
CREATE UNIQUE INDEX idx_email ON users(email);
5. Full-Text Index
CREATE FULLTEXT INDEX idx_description ON products(description);
Index Trade-offs:
When to Index:
When NOT to Index:
ACID Properties:
Atomicity
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- Both happen or neither
Consistency
-- If account balance < 0 constraint exists, transaction will fail
Isolation
Durability
Isolation Levels:
Read Uncommitted
Read Committed (PostgreSQL default)
Repeatable Read
Serializable
Problem:
Creating new database connection for each request is expensive (TCP handshake, authentication).
Solution:
Maintain a pool of reusable connections.
How it works:
Benefits:
Configuration:
# Example with PostgreSQL (psycopg2)
import psycopg2.pool

pool = psycopg2.pool.SimpleConnectionPool(
    minconn=5,   # Minimum connections
    maxconn=20,  # Maximum connections
    host='localhost',
    database='mydb'
)
Best Practices:
Benefits:
What to Cache:
1. Browser Cache
2. CDN Cache
3. Application Cache
4. Distributed Cache
5. Database Cache
1. Cache-Aside (Lazy Loading)
def get_user(user_id):
    # 1. Check cache
    user = cache.get(f"user:{user_id}")
    if user is not None:
        return user  # Cache hit
    # 2. Cache miss - Query database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    # 3. Update cache
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
Pros:
Cons:
2. Read-Through
# Cache library handles DB interaction
user = cache.get(f"user:{user_id}") # If not in cache, cache fetches from DB
Pros:
Cons:
3. Write-Through
def update_user(user_id, data):
    # 1. Update cache
    cache.set(f"user:{user_id}", data)
    # 2. Cache updates database
    #    (Handled by cache layer)
Pros:
Cons:
4. Write-Behind (Write-Back)
def update_user(user_id, data):
    # 1. Update cache only
    cache.set(f"user:{user_id}", data)
    # 2. Asynchronously write to DB later
    #    (Handled by cache)
Pros:
Cons:
5. Refresh-Ahead
# Cache proactively refreshes data before expiration
# Based on usage patterns
Pros:
Cons:
Problem: How to ensure cache has fresh data?
Strategies:
1. TTL (Time-To-Live)
cache.set("user:123", user_data, ttl=3600) # Expires in 1 hour
2. Explicit Invalidation
def update_user(user_id, data):
    db.update(user_id, data)
    cache.delete(f"user:{user_id}")  # Remove from cache
3. Event-Based Invalidation
# When user updates profile
event_bus.publish("user.updated", user_id)
# Cache listener invalidates
event_bus.subscribe("user.updated", lambda user_id: cache.delete(f"user:{user_id}"))
When cache is full, which item to remove?
1. LRU (Least Recently Used)
2. LFU (Least Frequently Used)
3. FIFO (First In First Out)
4. Random
5. TTL-based
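A minimal sketch of policy 1 (LRU) using `OrderedDict`; the capacity and API are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the LRU entry
```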
Problem:
Cache expires → 1000 concurrent requests → All hit database → DB overload
Solutions:
1. Probabilistic Early Expiration
from random import random

def get_with_early_expiration(key, ttl):
    value, expiry = cache.get_with_expiry(key)
    # Refresh cache randomly before actual expiration:
    # the closer to expiry, the more likely a refresh
    time_remaining = expiry - now()
    if random() < (1.0 / time_remaining):
        refresh_cache_async(key)
    return value
2. Locking
def get_user(user_id):
    user = cache.get(f"user:{user_id}")
    if user:
        return user
    # Acquire lock
    if cache.set_if_not_exists(f"lock:user:{user_id}", "1", ttl=10):
        try:
            user = db.query("SELECT * FROM users WHERE id = ?", user_id)
            cache.set(f"user:{user_id}", user)
            return user
        finally:
            cache.delete(f"lock:user:{user_id}")
    else:
        # Another request is fetching, wait and retry
        sleep(0.1)
        return get_user(user_id)
3. Serve Stale Data
# Serve slightly stale data while refreshing in background
1. Consistency
2. Hot Keys
3. Cache Penetration
4. Cache Breakdown
Definition: A message queue is a form of asynchronous communication where messages are stored until they're processed.
Components:
Benefits:
Email Notifications
User signs up → Enqueue welcome email → Email service processes queue
Image Processing
User uploads photo → Enqueue processing job → Worker generates thumbnails
Payment Processing
User initiates payment → Enqueue → Payment service processes → Notify user
Data Analytics
User action → Enqueue event → Analytics service processes → Update dashboards
Order Processing (example)
User places order to buy stocks → Enqueue → Order processing service → Execute trade
Queue (Point-to-Point)
Producer → [M1, M2, M3] → Consumer A (gets M1)
→ Consumer B (gets M2)
→ Consumer C (gets M3)
Pub/Sub (Publish/Subscribe)
Publisher → [Event] → Subscriber A (gets Event)
→ Subscriber B (gets Event)
→ Subscriber C (gets Event)
1. RabbitMQ
2. Apache Kafka
3. AWS SQS
4. Redis (Pub/Sub + Streams)
1. Work Queue Pattern
Producer → Queue → Worker 1
→ Worker 2
→ Worker 3
2. Pub/Sub Pattern
Publisher → Topic → Subscriber 1 (Analytics)
→ Subscriber 2 (Email)
→ Subscriber 3 (Logging)
3. Request/Reply Pattern
Client → Request Queue → Server
Client ← Reply Queue ← Server
4. Priority Queue
High Priority: [M1, M2]
Low Priority: [M3, M4, M5]
5. Dead Letter Queue
Main Queue → Failed Message → DLQ
At-Most-Once
At-Least-Once
Exactly-Once
What is Idempotency?
Performing an operation multiple times has the same effect as performing it once.
Why important for queues?
Messages can be delivered multiple times (at-least-once delivery).
Idempotent Operations:
# Idempotent - can safely retry
user.email = "new@email.com"
SET balance = 100
# NOT Idempotent - multiple executions cause issues
balance += 100 # Retry adds 100 multiple times!
Making Operations Idempotent:
1. Idempotency Key
def process_payment(order_id, amount, idempotency_key):
    # Check if already processed
    if db.exists(f"payment:{idempotency_key}"):
        return "Already processed"
    # Process payment
    execute_payment(order_id, amount)
    # Mark as processed
    db.set(f"payment:{idempotency_key}", "processed")
2. Natural Idempotency
# Use absolute values instead of increments
UPDATE portfolio SET shares = 100 WHERE id = 1
# Instead of
UPDATE portfolio SET shares = shares + 10 WHERE id = 1
Retry Strategies:
1. Immediate Retry
for attempt in range(3):
    try:
        process_message(msg)
        break
    except Exception:
        if attempt == 2:
            raise
        continue
2. Exponential Backoff
delay = 1
for attempt in range(5):
    try:
        process_message(msg)
        break
    except Exception:
        sleep(delay)
        delay *= 2  # 1s, 2s, 4s, 8s, 16s
3. Dead Letter Queue
try:
    process_message(msg)
except Exception:
    if msg.retry_count > 3:
        dlq.send(msg)  # Send to DLQ
    else:
        msg.retry_count += 1
        queue.send(msg, delay=60)  # Retry after 60s
Monolithic Architecture
Single Application
├── User Service
├── Payment Service
├── Notification Service
└── Shared Database
Pros:
Cons:
Microservices Architecture
User Service → User DB
Payment Service → Payment DB
Notification Service → Notification DB
Pros:
Cons:
Use Microservices when:
Stick with Monolith when:
Rule of thumb: Start with monolith, break into microservices when needed.
1. Single Responsibility
2. Autonomy
3. Domain-Driven Design
4. Decentralization
5. Fault Tolerance
Synchronous Communication
1. REST APIs
User Service → HTTP GET → Product Service
← JSON Response ←
Pros:
Cons:
2. gRPC
Service A → gRPC call → Service B
← Binary Response ←
Pros:
Cons:
Asynchronous Communication
Message Queues
Service A → Message Queue → Service B
Pros:
Cons:
Event-Driven
Order Service → Event: "Order Placed" → Inventory Service (updates stock)
→ Email Service (sends confirmation)
→ Analytics Service (tracks metrics)
Pros:
Cons:
Problem: Service A needs to call Service B, but Service B's location changes (new instances, scaling).
Solution: Service Discovery
Client-Side Discovery
Service A → Query Registry → Get Service B's address → Call Service B
Server-Side Discovery
Service A → Load Balancer → Routes to Service B instances
Service Registry:
What is API Gateway?
Single entry point for all client requests.
Client → API Gateway → User Service
→ Order Service
→ Product Service
Responsibilities:
Example:
GET /api/user/123/orders
API Gateway:
1. Authenticates request
2. Calls User Service to verify user
3. Calls Order Service to get orders
4. Aggregates responses
5. Returns combined result
Popular Tools:
Database per Service Pattern
User Service → Users DB
Order Service → Orders DB
Product Service → Products DB
Pros:
Cons:
Handling Distributed Transactions:
1. Saga Pattern
Choreography-based Saga:
Order Service → Create Order → Publishes "OrderCreated" event
Payment Service → Listens → Process Payment → Publishes "PaymentProcessed"
Inventory Service → Listens → Reserve Items → Publishes "ItemsReserved"
If Payment fails:
Payment Service → Publishes "PaymentFailed"
Order Service → Listens → Cancel Order
Orchestration-based Saga:
Saga Orchestrator:
1. Call Order Service → Create Order
2. Call Payment Service → Process Payment
3. If payment succeeds:
Call Inventory Service → Reserve Items
Call Notification Service → Send Email
4. If any step fails:
Execute compensating transactions (undo previous steps)
2. Event Sourcing
3. CQRS (Command Query Responsibility Segregation)
1. Network Latency
2. Partial Failures
3. Circuit Breaker Pattern
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN
        self.timeout = timeout
        self.opened_at = None  # when the circuit last opened

    def call(self, func):
        if self.state == "OPEN":
            if time.time() - self.opened_at > self.timeout:
                self.state = "HALF_OPEN"
            else:
                raise ServiceUnavailableError("Circuit is OPEN")
        try:
            result = func()
            if self.state == "HALF_OPEN":
                self.state = "CLOSED"
                self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = time.time()
            raise

# Usage
breaker = CircuitBreaker()
breaker.call(lambda: call_payment_service())
States:
4. Data Consistency
5. Monitoring & Debugging
Distributed Tracing Example:
Request ID: abc123
User Service (20ms) → Order Service (50ms) → Payment Service (100ms)
→ Inventory Service (30ms)
Total: 200ms, Bottleneck: Payment Service
6. Testing
REST Principles:
Resource-Based
Good: GET /users/123
Bad: GET /getUser?id=123
HTTP Methods
Stateless
Uniform Interface
RESTful API Examples:
# Users
GET /users # List all users
GET /users/123 # Get specific user
POST /users # Create new user
PUT /users/123 # Update entire user
PATCH /users/123 # Partially update user
DELETE /users/123 # Delete user
# Nested Resources
GET /users/123/orders # Get orders for user 123
POST /users/123/orders # Create order for user 123
GET /users/123/orders/456 # Get specific order
# Filtering, Sorting, Pagination
GET /users?status=active&sort=created_at&page=2&limit=20
HTTP Status Codes:
2xx Success
3xx Redirection
4xx Client Errors
5xx Server Errors
Response Format:
{
  "success": true,
  "data": {
    "id": 123,
    "name": "Jash Rashne",
    "email": "jash@example.com"
  },
  "meta": {
    "page": 1,
    "total": 100
  }
}
Error Response:
{
  "success": false,
  "error": {
    "code": "INVALID_EMAIL",
    "message": "Email format is invalid",
    "details": {
      "field": "email",
      "value": "invalid-email"
    }
  }
}
Why Version?
Versioning Strategies:
1. URI Versioning
/v1/users
/v2/users
2. Header Versioning
GET /users
Accept: application/vnd.myapi.v1+json
3. Query Parameter
/users?version=1
Best Practice: URI versioning for major versions, deprecate old versions gracefully.
Why Paginate?
Offset-based Pagination:
GET /users?page=2&limit=20
Response:
{
  "data": [...],
  "meta": {
    "page": 2,
    "limit": 20,
    "total": 1000,
    "total_pages": 50
  }
}
Pros: Simple
Cons: Inconsistent results if data changes (items added/removed)
Cursor-based Pagination:
GET /users?cursor=eyJpZCI6MTAwfQ&limit=20
Response:
{
  "data": [...],
  "meta": {
    "next_cursor": "eyJpZCI6MTIwfQ",
    "has_more": true
  }
}
Pros: Consistent, efficient
Cons: Can't jump to specific page
Best Practice: Use cursor-based for real-time feeds, offset-based for tables.
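The opaque cursors in the example above (`eyJpZCI6...`) are typically just base64-encoded JSON carrying the last seen id; a round-trip sketch:

```python
import base64
import json

def encode_cursor(last_id):
    # Cursor carries the last seen id as base64-encoded JSON
    payload = json.dumps({"id": last_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]
```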
Why Rate Limit?
Algorithms:
1. Token Bucket
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.time()

    def allow_request(self):
        self.refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        tokens_to_add = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now
2. Leaky Bucket
3. Fixed Window
Time: 10:00:00 - 10:00:59
Requests: 100 allowed per minute
Reset: At 10:01:00
Problem: Burst at window boundaries
4. Sliding Window
Rate Limit Headers:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1609459200
Retry-After: 60
1. Authentication
JWT (JSON Web Token)
Header.Payload.Signature
Example:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.
eyJ1c2VyX2lkIjoxMjMsImV4cCI6MTYwOTQ1OTIwMH0.
signature_here
Flow:
Authorization: Bearer <token>
Advantages:
2. API Keys
GET /api/users
X-API-Key: your_api_key_here
3. OAuth 2.0
4. Input Validation
def create_user(request):
    email = request.data.get('email')
    # Validate
    if not email or '@' not in email:
        return Response({"error": "Invalid email"}, status=400)
    # Sanitize (prevent SQL injection, XSS)
    email = sanitize(email)
    # Process
    user = User.create(email=email)
    return Response(user.to_dict(), status=201)
5. HTTPS
6. CORS (Cross-Origin Resource Sharing)
# Allow requests from specific origin
Access-Control-Allow-Origin: https://mycompany.com
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: Content-Type, Authorization
Idempotent HTTP Methods:
Making POST Idempotent:
POST /orders
Idempotency-Key: uuid-1234-5678
Server:
1. Check if request with this key already processed
2. If yes, return same response (don't create duplicate)
3. If no, process and store result with key
Example:
def create_order(request):
    idempotency_key = request.headers.get('Idempotency-Key')
    # Check cache
    cached_response = cache.get(f"order:{idempotency_key}")
    if cached_response:
        return cached_response
    # Process order
    order = Order.create(request.data)
    response = {"order_id": order.id}
    # Cache response
    cache.set(f"order:{idempotency_key}", response, ttl=86400)
    return response
OpenAPI/Swagger Specification
openapi: 3.0.0
info:
  title: My App API
  version: 1.0.0
paths:
  /users/{userId}:
    get:
      summary: Get user by ID
      parameters:
        - name: userId
          in: path
          required: true
          schema:
            type: integer
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: integer
                  name:
                    type: string
                  email:
                    type: string
Tools:
Requirements:
Functional Requirements:
POST /shorten - Create short URL
GET /{short_code} - Redirect to original
Non-Functional Requirements:
Estimations:
Write QPS = 100M / (30 days * 24 hrs * 3600 sec) = ~40 writes/sec
Read QPS = 40 * 100 = 4000 reads/sec
Storage (5 years):
100M * 12 months * 5 years = 6B URLs
Average URL size = 500 bytes
Total = 6B * 500 bytes = 3TB
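The arithmetic above can be sanity-checked directly:

```python
# Back-of-the-envelope check of the estimates above
seconds_per_month = 30 * 24 * 3600           # 2,592,000
write_qps = 100_000_000 / seconds_per_month  # ~39 writes/sec, rounded to 40
read_qps = write_qps * 100                   # ~4,000 reads/sec

urls_in_5_years = 100_000_000 * 12 * 5       # 6 billion
storage_bytes = urls_in_5_years * 500        # at ~500 bytes per URL row
storage_tb = storage_bytes / 10**12          # 3 TB
```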
Database Schema:
CREATE TABLE urls (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  short_code VARCHAR(10) UNIQUE NOT NULL,
  original_url TEXT NOT NULL,
  created_at TIMESTAMP,
  expires_at TIMESTAMP,
  INDEX(short_code)
);
URL Encoding:
Approach 1: Hash-based
import hashlib
def shorten_url(url):
    hash_value = hashlib.md5(url.encode()).hexdigest()
    short_code = hash_value[:7]  # Take first 7 characters
    return short_code
Problem: Collision (two URLs get same hash)
Solution: Check if exists, if yes, append counter and rehash
Approach 2: Base62 Encoding
def base62_encode(num):
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if num == 0:
        return chars[0]
    result = []
    while num:
        result.append(chars[num % 62])
        num //= 62
    return ''.join(reversed(result))

# Auto-increment ID = 12345
short_code = base62_encode(12345)  # "3d7"
62^7 = 3.5 trillion possible codes (enough!)
High-Level Design:
Client
↓
Load Balancer
↓
Application Servers (stateless)
↓
├→ Write → Master DB
└→ Read → Read Replicas (3 slaves)
Cache (Redis)
- Store popular URLs (80/20 rule)
API Design:
POST /api/v1/shorten
Request:
{
  "url": "https://www.example.com/very/long/url",
  "custom_alias": "mylink" (optional),
  "expires_in": 86400 (optional)
}
Response:
{
  "short_url": "https://short.ly/abc123",
  "created_at": "2026-02-12T10:00:00Z"
}
GET /{short_code}
Response: 301 Redirect to original URL
Workflow:
Create Short URL:
Redirect:
Optimizations:
Advanced Features:
Requirements:
Similar to URL shortener with differences:
Database Schema:
CREATE TABLE pastes (
  id BIGINT PRIMARY KEY,
  short_code VARCHAR(10) UNIQUE,
  content TEXT, -- or S3 key if large
  language VARCHAR(50),
  expires_at TIMESTAMP,
  created_at TIMESTAMP
);
Storage Strategy:
Requirements:
Approaches:
1. In-Memory (Single Server)
from collections import defaultdict
import time
class RateLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)

    def allow_request(self, user_id):
        now = time.time()
        window_start = now - self.window_seconds
        # Remove old requests
        self.requests[user_id] = [
            req_time for req_time in self.requests[user_id]
            if req_time > window_start
        ]
        # Check limit
        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(now)
            return True
        return False
2. Distributed (Redis)
def rate_limit(user_id, max_requests, window_seconds):
    key = f"rate_limit:{user_id}"
    now = int(time.time())
    window_start = now - window_seconds
    # Redis pipeline for atomicity
    pipe = redis.pipeline()
    # Remove old entries
    pipe.zremrangebyscore(key, 0, window_start)
    # Count current requests
    pipe.zcard(key)
    # Add current request
    pipe.zadd(key, {now: now})
    # Set expiration
    pipe.expire(key, window_seconds)
    results = pipe.execute()
    request_count = results[1]
    return request_count < max_requests
Token Bucket (Recommended):
def token_bucket_rate_limit(user_id, capacity, refill_rate):
    key = f"rate_limit:{user_id}"
    # Get current tokens and last refill time
    data = redis.get(key)
    if data:
        tokens, last_refill = json.loads(data)
    else:
        tokens = capacity
        last_refill = time.time()
    # Refill tokens
    now = time.time()
    elapsed = now - last_refill
    tokens_to_add = elapsed * refill_rate
    tokens = min(capacity, tokens + tokens_to_add)
    # Check if request allowed
    if tokens >= 1:
        tokens -= 1
        redis.set(key, json.dumps([tokens, now]), ex=3600)
        return True
    redis.set(key, json.dumps([tokens, last_refill]), ex=3600)
    return False
High-Level Design:
Client Request
↓
API Gateway / Load Balancer
↓
Rate Limiter (checks Redis)
↓
If allowed → Application Server
If blocked → Return 429 Too Many Requests
Requirements:
Components:
Flow:
Application → POST /notifications → API Service
↓
Message Queue
↓
Worker 1 ← Worker 2 ← Worker 3
↓ ↓ ↓
SendGrid Twilio FCM
Database Schema:
CREATE TABLE notifications (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  type ENUM('email', 'sms', 'push'),
  message TEXT,
  status ENUM('pending', 'sent', 'failed'),
  retry_count INT DEFAULT 0,
  created_at TIMESTAMP,
  sent_at TIMESTAMP
);
API:
POST /api/notifications
{
  "user_id": 123,
  "type": "email",
  "template": "welcome_email",
  "data": {
    "name": "Jash"
  }
}
Worker Logic:
def process_notification(notification):
    try:
        if notification.type == 'email':
            sendgrid.send(notification.email, notification.message)
        elif notification.type == 'sms':
            twilio.send(notification.phone, notification.message)
        elif notification.type == 'push':
            fcm.send(notification.device_token, notification.message)
        db.update(notification.id, status='sent', sent_at=now())
    except Exception as e:
        if notification.retry_count < 3:
            # Retry with exponential backoff (2**retries minutes)
            queue.send_delayed(notification, delay=2 ** notification.retry_count * 60)
            db.update(notification.id, retry_count=notification.retry_count + 1)
        else:
            db.update(notification.id, status='failed')
Preventing Duplicates:
User Preferences:
CREATE TABLE notification_preferences (
  user_id BIGINT,
  type ENUM('email', 'sms', 'push'),
  enabled BOOLEAN,
  PRIMARY KEY(user_id, type)
);
Requirements:
Approaches:
1. Pull Model (Read-Heavy)
User requests feed
→ Fetch list of followed users
→ Fetch recent posts from each
→ Merge and sort by timestamp
→ Return feed
Pros: Simple, saves storage
Cons: Slow for users following many people
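The merge step of the pull model can be sketched as a k-way merge, assuming each followed user's posts are already sorted newest-first; the `(timestamp, post_id)` tuples are placeholders.

```python
import heapq
import itertools

def build_feed(posts_by_user, limit=20):
    # K-way merge of per-user post lists, newest first.
    # Each list must already be sorted in descending timestamp order.
    merged = heapq.merge(*posts_by_user.values(), reverse=True)
    return list(itertools.islice(merged, limit))
```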
2. Push Model (Write-Heavy - Fanout)
User posts update
→ Fetch list of followers
→ Write post to each follower's feed
→ Followers read their pre-computed feed (fast)
Pros: Fast reads
Cons: Expensive writes (celebrity with millions of followers)
3. Hybrid Model
Normal users: Push model
Celebrities: Pull model
Database Schema:
-- Posts
CREATE TABLE posts (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  content TEXT,
  created_at TIMESTAMP,
  INDEX(user_id, created_at)
);

-- Followers
CREATE TABLE followers (
  follower_id BIGINT,
  followee_id BIGINT,
  PRIMARY KEY(follower_id, followee_id)
);

-- Feed (for push model)
CREATE TABLE feed (
  user_id BIGINT,
  post_id BIGINT,
  created_at TIMESTAMP,
  PRIMARY KEY(user_id, created_at)
);
Feed Generation (Push Model):
def create_post(user_id, content):
    # 1. Create post
    post = db.create_post(user_id, content)
    # 2. Get followers
    followers = db.get_followers(user_id)
    # 3. Fanout to followers (async via queue)
    for follower_id in followers:
        queue.send({
            'action': 'add_to_feed',
            'user_id': follower_id,
            'post_id': post.id
        })
    return post

def add_to_feed_worker(message):
    db.insert_feed(message.user_id, message.post_id)
Feed Retrieval:
def get_feed(user_id, page=1, limit=20):
    # Simple read from feed table
    return db.query("""
        SELECT posts.* FROM feed
        JOIN posts ON feed.post_id = posts.id
        WHERE feed.user_id = ?
        ORDER BY feed.created_at DESC
        LIMIT ? OFFSET ?
    """, user_id, limit, (page - 1) * limit)
Optimizations:
Requirements:
Components:
Upload Flow:
User uploads video (large file)
↓
Upload Service (chunks upload)
↓
Store raw video in S3
↓
Trigger transcoding (message queue)
↓
Transcoding Service
- Generate 360p, 480p, 720p, 1080p
- Generate thumbnail
↓
Store transcoded videos in S3
↓
Update metadata in database
Streaming Flow:
User requests video
↓
CDN (nearest edge server)
↓
If not in CDN, fetch from S3
↓
Stream to user (HLS, DASH protocol)
Database Schema:
CREATE TABLE videos (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  title VARCHAR(255),
  description TEXT,
  s3_key VARCHAR(255), -- Raw video
  thumbnail_url VARCHAR(255),
  duration INT,
  status ENUM('processing', 'ready', 'failed'),
  created_at TIMESTAMP,
  INDEX(user_id),
  INDEX(created_at)
);

CREATE TABLE video_files (
  id BIGINT PRIMARY KEY,
  video_id BIGINT,
  resolution VARCHAR(10), -- 360p, 720p, etc.
  s3_key VARCHAR(255),
  format VARCHAR(10), -- mp4, webm
  FOREIGN KEY(video_id) REFERENCES videos(id)
);
Adaptive Bitrate Streaming:
master.m3u8 (playlist)
├── 360p.m3u8
├── 720p.m3u8
└── 1080p.m3u8
Challenges:
Requirements:
Functional Requirements:
Database Schema:
CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  username VARCHAR(50) UNIQUE,
  email VARCHAR(255) UNIQUE,
  created_at TIMESTAMP
);

CREATE TABLE photos (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  image_url VARCHAR(500),
  caption TEXT,
  created_at TIMESTAMP,
  FOREIGN KEY(user_id) REFERENCES users(id),
  INDEX(user_id, created_at)
);

CREATE TABLE follows (
  follower_id BIGINT,
  followee_id BIGINT,
  created_at TIMESTAMP,
  PRIMARY KEY(follower_id, followee_id)
);

CREATE TABLE likes (
  user_id BIGINT,
  photo_id BIGINT,
  created_at TIMESTAMP,
  PRIMARY KEY(user_id, photo_id)
);

CREATE TABLE comments (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  photo_id BIGINT,
  content TEXT,
  created_at TIMESTAMP,
  FOREIGN KEY(user_id) REFERENCES users(id),
  FOREIGN KEY(photo_id) REFERENCES photos(id)
);
Photo Upload Flow:
User uploads photo
↓
API Server
↓
Upload to S3 (original)
↓
Trigger processing (queue)
↓
Image Processing Service:
- Resize to multiple sizes (thumbnail, medium, large)
- Apply filters
- Compress
↓
Store processed images in S3
↓
Store metadata in database
↓
Fanout to followers' feeds (like Twitter feed)
Feed Generation:
Challenges:
Approach:
Horizontal Scaling
Caching
Database Optimization
Asynchronous Processing
CDN
Microservices
Auto-scaling
Approaches:
Strong Consistency
Eventual Consistency
Use Saga Pattern
Event Sourcing
Strategies:
Redundancy
Load Balancing
Database Replication
Monitoring & Alerts
Graceful Degradation
Circuit Breaker
Multi-region Deployment
Calculation:
99.99% uptime = 52.56 minutes downtime per year
Strategies:
Blue-Green Deployment
Backward Compatible Changes
Online Schema Change Tools
Database Migrations
Example:
-- Step 1: Add new column (nullable)
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- Step 2: Deploy code that writes to both name and full_name
-- Step 3: Backfill data
UPDATE users SET full_name = CONCAT(first_name, ' ', last_name);
-- Step 4: Make column NOT NULL
ALTER TABLE users MODIFY full_name VARCHAR(255) NOT NULL;
-- Step 5: Deploy code that only uses full_name
-- Step 6: Drop old columns
ALTER TABLE users DROP COLUMN first_name, DROP COLUMN last_name;
Steps:
Check Logs
Application Performance Monitoring (APM)
Database Query Analysis
Check External Dependencies
Profiling
Load Testing
Caching
Common Culprits:
Requirements:
Database Schema:
CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255) UNIQUE
);

CREATE TABLE portfolios (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  name VARCHAR(255),
  created_at TIMESTAMP,
  FOREIGN KEY(user_id) REFERENCES users(id)
);

CREATE TABLE holdings (
  id BIGINT PRIMARY KEY,
  portfolio_id BIGINT,
  stock_symbol VARCHAR(10),
  quantity DECIMAL(15, 2),
  avg_buy_price DECIMAL(15, 2),
  last_updated TIMESTAMP,
  FOREIGN KEY(portfolio_id) REFERENCES portfolios(id)
);

CREATE TABLE transactions (
  id BIGINT PRIMARY KEY,
  portfolio_id BIGINT,
  stock_symbol VARCHAR(10),
  type ENUM('BUY', 'SELL'),
  quantity DECIMAL(15, 2),
  price DECIMAL(15, 2),
  executed_at TIMESTAMP,
  FOREIGN KEY(portfolio_id) REFERENCES portfolios(id)
);
CREATE TABLE stock_prices (
  symbol VARCHAR(10),
  price DECIMAL(15, 2),
  recorded_at TIMESTAMP, -- column name assumed; original omitted it
  PRIMARY KEY(symbol, recorded_at)
);
Real-time Portfolio Value:
Approach 1: Calculate on Demand
def get_portfolio_value(portfolio_id):
    holdings = db.get_holdings(portfolio_id)
    total_value = 0
    for holding in holdings:
        current_price = get_stock_price(holding.symbol)  # External API or cache
        value = holding.quantity * current_price
        total_value += value
    return total_value
Problem: Slow if portfolio has many stocks
Approach 2: Pre-compute and Cache
# Batch job every minute
def update_portfolio_values():
    portfolios = db.get_all_portfolios()
    stock_prices = get_all_stock_prices()  # Batch fetch
    for portfolio in portfolios:
        holdings = db.get_holdings(portfolio.id)
        total_value = 0
        for holding in holdings:
            current_price = stock_prices[holding.symbol]
            value = holding.quantity * current_price
            total_value += value
        # Cache portfolio value
        cache.set(f"portfolio_value:{portfolio.id}", total_value, ttl=60)
Get portfolio value:
def get_portfolio_value(portfolio_id):
    # Try cache first (compare against None -- a cached value of 0 is falsy)
    value = cache.get(f"portfolio_value:{portfolio_id}")
    if value is not None:
        return value
    # Calculate and cache
    value = calculate_portfolio_value(portfolio_id)
    cache.set(f"portfolio_value:{portfolio_id}", value, ttl=60)
    return value
Real-time Stock Prices:
Option 1: Poll External API
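Option 1 can be sketched as a loop that batch-fetches quotes on an interval and writes them to a cache. `fetch_quotes` and the dict-style cache below are stand-ins, not a real market-data API:

```python
import time

def poll_prices(fetch_quotes, cache, symbols, interval_sec=1, iterations=None):
    """Periodically fetch quotes and store them in the cache.

    fetch_quotes(symbols) -> {symbol: price}; dependencies are injected
    so the loop can be tested without a real provider.
    """
    n = 0
    while iterations is None or n < iterations:
        quotes = fetch_quotes(symbols)  # one batched call per tick
        for symbol, price in quotes.items():
            cache[f"price:{symbol}"] = price
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_sec)

# Demo with a stubbed quote source and a plain dict as the cache
cache = {}
poll_prices(lambda syms: {s: 100.0 for s in syms}, cache,
            ["RELIANCE", "TCS"], interval_sec=0, iterations=1)
```

Polling is simple but trades freshness for provider rate limits, which is why the WebSocket option below is preferred for truly real-time prices.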
Option 2: WebSocket from Exchange
Stock Exchange (WebSocket) → Our Server → Redis Pub/Sub → Client WebSocket
Historical Performance:
CREATE TABLE portfolio_snapshots (
    portfolio_id BIGINT,
    value DECIMAL(15, 2),
    date DATE,
    PRIMARY KEY(portfolio_id, date)
);
Daily job:
# Run at market close
def create_daily_snapshots():
portfolios = db.get_all_portfolios()
stock_prices = get_closing_prices()
for portfolio in portfolios:
value = calculate_portfolio_value(portfolio.id, stock_prices)
db.insert_snapshot(portfolio.id, value, today())
Charts:
def get_performance(portfolio_id, period='1M'):
    snapshots = db.query("""
        SELECT date, value FROM portfolio_snapshots
        WHERE portfolio_id = ? AND date >= ?
        ORDER BY date
    """, portfolio_id, start_date(period))
    return snapshots
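Given those (date, value) snapshots, the chart's headline number is a first-versus-last comparison. `period_return` below is an illustrative helper, not part of the schema above:

```python
def period_return(snapshots):
    """Percentage return between the first and last snapshot values.

    snapshots: list of (date, value) pairs, oldest first.
    """
    if len(snapshots) < 2 or snapshots[0][1] == 0:
        return 0.0
    first, last = snapshots[0][1], snapshots[-1][1]
    return (last - first) / first * 100

# A portfolio that grew from 100,000 to 110,000 over the period
print(period_return([("2024-01-01", 100000), ("2024-02-01", 110000)]))  # 10.0
```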
Requirements:
Order States:
PENDING → SUBMITTED → EXECUTED → SETTLED
↓
FAILED/CANCELLED
Database Schema:
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    user_id BIGINT,
    portfolio_id BIGINT,
    stock_symbol VARCHAR(10),
    type ENUM('BUY', 'SELL'),
    quantity DECIMAL(15, 2),
    price DECIMAL(15, 2),
    status ENUM('PENDING', 'SUBMITTED', 'EXECUTED', 'SETTLED', 'FAILED', 'CANCELLED'),
    broker_order_id VARCHAR(255),
    created_at TIMESTAMP,
    executed_at TIMESTAMP,
    INDEX(user_id, created_at),
    INDEX(status)
);
Order Flow:
1. User places order
↓
2. Validate (sufficient balance, valid quantity)
↓
3. Create order record (status=PENDING)
↓
4. Enqueue order processing
↓
5. Worker picks order
↓
6. Call broker API (with idempotency key)
↓
7. Update status (SUBMITTED)
↓
8. Poll broker for execution status
↓
9. Update status (EXECUTED)
↓
10. Update holdings
↓
11. Update status (SETTLED)
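The flow above is effectively a state machine, so it helps to reject illegal jumps (say, SETTLED back to PENDING) before writing to the orders table. A minimal sketch, where the transition table is an assumption derived from the states listed above:

```python
# Allowed next states for each state in the order lifecycle
ALLOWED_TRANSITIONS = {
    "PENDING":   {"SUBMITTED", "FAILED", "CANCELLED"},
    "SUBMITTED": {"EXECUTED", "FAILED", "CANCELLED"},
    "EXECUTED":  {"SETTLED"},
    "SETTLED":   set(),   # terminal
    "FAILED":    set(),   # terminal
    "CANCELLED": set(),   # terminal
}

def transition(current, new):
    """Validate a status change before persisting it."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Calling `transition` in the same place that issues the status UPDATE keeps a retried or duplicated worker from corrupting an order's lifecycle.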
Idempotency:
def submit_order_to_broker(order):
    idempotency_key = f"order:{order.id}"
    # Check if already submitted
    broker_order = cache.get(idempotency_key)
    if broker_order:
        return broker_order
    # Submit to broker
    broker_order = broker_api.place_order(
        symbol=order.stock_symbol,
        quantity=order.quantity,
        price=order.price,
        idempotency_key=idempotency_key
    )
    # Cache result
    cache.set(idempotency_key, broker_order, ttl=86400)
    # Update order
    db.update_order(order.id, {
        'status': 'SUBMITTED',
        'broker_order_id': broker_order.id
    })
    return broker_order
Handling Failures:
def process_order(order_id):
    order = db.get_order(order_id)
    try:
        # Submit to broker
        broker_order = submit_order_to_broker(order)
        # Poll for execution
        for _ in range(10):  # Max 10 attempts
            status = broker_api.get_order_status(broker_order.id)
            if status == 'EXECUTED':
                db.update_order(order.id, {'status': 'EXECUTED'})
                update_holdings(order)
                db.update_order(order.id, {'status': 'SETTLED'})
                break
            elif status == 'FAILED':
                db.update_order(order.id, {'status': 'FAILED'})
                break
            time.sleep(5)  # Poll every 5 seconds
    except BrokerAPIException as e:
        db.update_order(order.id, {'status': 'FAILED', 'error': str(e)})
        # Retry with exponential backoff: 1, 2, 4 minutes
        # (note: '**' is exponentiation in Python; '^' is XOR)
        if order.retry_count < 3:
            queue.send_delayed(order, delay=(2 ** order.retry_count) * 60)
            db.update_order(order.id, {'retry_count': order.retry_count + 1})
Scalability:
Requirements:
Approaches:
1. Server-Sent Events (SSE)
Server → Client (one-way stream)
# Server
@app.route('/stream/<symbol>')
def stream_price(symbol):
    def generate():
        pubsub = redis.pubsub()
        pubsub.subscribe(f'stock:{symbol}')
        for message in pubsub.listen():
            if message['type'] == 'message':
                yield f"data: {message['data']}\n\n"
    return Response(generate(), mimetype='text/event-stream')
// Client (JavaScript)
const eventSource = new EventSource('/stream/RELIANCE');
eventSource.onmessage = (event) => {
    const price = JSON.parse(event.data);
    updateUI(price);
};
2. WebSockets
Server ↔ Client (two-way)
# Server
@socketio.on('subscribe')
def handle_subscribe(symbol):
    join_room(f'stock:{symbol}')

# When price updates
def broadcast_price(symbol, price):
    socketio.emit('price_update', {'symbol': symbol, 'price': price},
                  room=f'stock:{symbol}')
Architecture:
Stock Exchange API (WebSocket)
↓
Price Aggregator Service
↓
Redis Pub/Sub
↓
WebSocket Servers (multiple instances)
↓
Clients
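The key property of this architecture is the fan-out: the aggregator publishes each tick once, and every WebSocket server instance that subscribed to that symbol forwards it to its own clients. The sketch below shows that pattern with a tiny in-process pub/sub; in production Redis Pub/Sub plays this role, and all names here are illustrative:

```python
import json
from collections import defaultdict

class PubSub:
    """In-process stand-in for Redis Pub/Sub, just to show the fan-out."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Every subscriber on the channel gets its own copy of the message
        for callback in self.subscribers[channel]:
            callback(message)

bus = PubSub()
received = []

# A WebSocket server instance subscribes to the symbols its clients want
bus.subscribe("stock:RELIANCE", lambda msg: received.append(json.loads(msg)))

# The aggregator publishes every tick it receives from the exchange feed
bus.publish("stock:RELIANCE", json.dumps({"symbol": "RELIANCE", "price": 2895.5}))
```

Because subscription is per channel (per symbol here), adding more WebSocket server instances scales client connections without the aggregator knowing about them.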
Optimizations:
Step 1: Clarify Requirements (5 mins)
Example Questions:
Step 2: High-Level Design (10 mins)
Step 3: Deep Dive (15 mins)
Step 4: Address Bottlenecks (5 mins)
Communication
Problem Solving
Technical Knowledge
Pragmatism
Jumping into solution without clarifying requirements
Over-engineering
Ignoring scale
Not considering trade-offs
Poor communication
Foundational:
Intermediate:
Advanced:
Databases:
Caching:
Message Queues:
Load Balancers:
Monitoring:
Cloud: