What is System Design?
System design is the process of defining the architecture, components, and data flow of a large-scale software system to meet functional and non-functional requirements.
Why It Matters
- Core topic in senior/staff engineer interviews (L5+)
- Teaches you to think about trade-offs, not just code
- Required to build systems that handle millions of users
Interview Framework
- Clarify requirements — functional (what it does) and non-functional (scale, latency)
- Estimate scale — QPS, storage, bandwidth
- High-level design — components and their connections
- Detailed design — deep dive into 2-3 key components
- Identify bottlenecks and how to address them
Scalability
Vertical Scaling (Scale Up)
Add more CPU/RAM to a single server. Simple but has an upper limit and single point of failure.
Horizontal Scaling (Scale Out)
Add more servers. Needs a load balancer. No theoretical limit. Standard approach for web apps.
Stateless Services
Keep no session state on the server — store it in Redis or a DB. Required for horizontal scaling.
Database Sharding
Split data across multiple DB servers by a shard key (e.g. user_id % N). Enables massive scale.
CAP Theorem
In a distributed system, you can only guarantee two out of three:
Consistency (C)
Every read gets the most recent write. All nodes see the same data at the same time.
Availability (A)
Every request gets a response (not guaranteed to be the latest). The system stays up.
Partition Tolerance (P)
The system continues operating even if some network messages are lost between nodes.
In practice, network partitions happen — so you must choose between CP (consistent but may be unavailable during a partition) or AP (available but may serve stale data).
- CP — ZooKeeper, HBase, traditional SQL
- AP — Cassandra, DynamoDB, CouchDB
Load Balancers
A load balancer distributes incoming traffic across multiple servers to prevent any one server from being overwhelmed.
Algorithms
- Round Robin — requests distributed in rotation
- Least Connections — route to the server with fewest active connections
- IP Hash — same client always goes to same server (sticky sessions)
- Weighted Round Robin — more powerful servers get more traffic
Layer 4 vs Layer 7
- L4 (Transport) — routes by IP/TCP without looking at content; faster
- L7 (Application) — can route by URL, headers, cookies; enables path-based routing
Caching
Caching stores frequently accessed data in fast memory (RAM) to reduce DB load and response latency.
Cache Strategies
- Cache-Aside (Lazy Loading) — app checks cache first; on miss, loads from DB and writes to cache. Most common.
- Write-Through — write to cache and DB simultaneously. Consistent but slower writes.
- Write-Back — write to cache only; flush to DB asynchronously. Fast writes, risk of data loss.
Eviction Policies
- LRU — Least Recently Used (most common)
- LFU — Least Frequently Used
- TTL — expire after a time period
Tools
- Redis — in-memory data store; supports strings, hashes, lists, sets. Use for session, leaderboards, pub/sub.
- Memcached — simple key-value cache; faster for pure caching use cases.
Databases — SQL vs NoSQL
SQL (Relational)
Structured schema, ACID transactions, complex joins. MySQL, PostgreSQL. Good for financial, inventory, user data.
NoSQL (Document)
Flexible schema, horizontal scaling. MongoDB, DynamoDB. Good for catalogues, user profiles, content.
NoSQL (Wide-Column)
Cassandra, HBase. Excellent write throughput. Good for time-series, IoT, activity logs.
NoSQL (Graph)
Neo4j. Models relationships natively. Good for social graphs, recommendation engines.
Database Replication
- Master-Slave — writes go to master, reads from replicas. Most common.
- Multi-Master — writes to any node; complex conflict resolution.
Message Queues
Message queues decouple producers from consumers, enabling async processing and absorbing traffic spikes.
Use Cases
- Sending emails/notifications asynchronously after a user action
- Processing uploads (video encoding, image resizing) in the background
- Distributing work across multiple worker instances
- Event-driven microservices communication
Tools
- Kafka — high-throughput event streaming; replay messages; used at Netflix, LinkedIn
- RabbitMQ — traditional message broker; flexible routing; easier to set up than Kafka
- AWS SQS — fully managed queue; great for AWS-native apps
CDN (Content Delivery Network)
A CDN caches static assets (images, CSS, JS, videos) on servers geographically close to users, reducing latency and origin server load.
How It Works
- User requests
cdn.example.com/logo.png - CDN routes to the nearest edge server (PoP)
- If cached → serve immediately (cache hit)
- If not → fetch from origin, cache, then serve (cache miss)
Tools
- CloudFront — AWS CDN, integrates with S3 and EC2
- Cloudflare — CDN + DDoS protection + DNS
- Fastly — programmable CDN, used by GitHub, Stripe
Microservices
Microservices break a monolith into small, independently deployable services, each owning its domain and data.
Monolith vs Microservices
Monolith
Simpler to build, test, and deploy initially. Single deployment unit. Scales poorly at high load.
Microservices
Independent scale, deploy, and tech stack per service. Complex ops (service discovery, distributed tracing).
When to Use Microservices
- Different parts of the system have very different scaling needs
- Multiple teams working independently
- Different reliability requirements per domain
API Gateway
An API Gateway is the single entry point for all client requests in a microservices architecture. It handles cross-cutting concerns so services don't have to.
Responsibilities
- Request routing to the correct microservice
- Authentication & authorisation (validate JWT before forwarding)
- Rate limiting & throttling
- SSL termination
- Response aggregation (combine data from multiple services)
- Logging & monitoring
Tools
- AWS API Gateway
- Kong
- Nginx (simple reverse proxy)
- Spring Cloud Gateway
Rate Limiting
Rate limiting controls how many requests a client can make in a time window, protecting APIs from abuse and DDoS attacks.
Algorithms
- Token Bucket — tokens refill at a fixed rate; burst allowed up to bucket capacity. Most common.
- Leaky Bucket — requests processed at a constant rate regardless of burst; smooths traffic.
- Fixed Window Counter — count requests per time window; simple but vulnerable to edge spikes.
- Sliding Window Log — accurate but memory-intensive.
Implementation
- Store counters in Redis (fast, shared across server instances)
- Return
429 Too Many Requestswith aRetry-Afterheader - Use Nginx
limit_reqmodule for simple cases
Case Study: URL Shortener
Design a system like bit.ly that converts long URLs to short codes and redirects users.
Requirements
- Shorten a URL → return a short code (e.g.
7ds.in/aB3kR) - Redirect short URL to original URL in < 10ms
- 100M new URLs/day; 10B redirects/day (100:1 read/write ratio)
Design Decisions
- Short code generation — Base62 encode a unique ID (integers → a-zA-Z0-9, 6-7 chars)
- Storage — MySQL:
(id, short_code, long_url, created_at); index onshort_code - Cache — Redis for hot URLs; 80% of reads likely hit 20% of URLs (Pareto principle)
- Redirect —
301 Permanentfor SEO (browser caches);302 Temporaryif analytics needed - Scaling — stateless app servers behind a load balancer; DB read replicas for redirect queries
→ Check Redis cache → hit → 302 redirect to long URL
→ miss → query MySQL → write to cache → redirect
Case Study: News Feed
Design the Facebook/Twitter home feed — show a user a ranked list of posts from people they follow.
Two Approaches
Fan-Out on Write (Push)
When user A posts, precompute and push the post to all followers' feed caches.
- Reads are fast — feed is pre-built in Redis
- Writes are expensive for celebrities (millions of followers)
Fan-Out on Read (Pull)
When user B loads their feed, query posts from everyone they follow and merge.
- Writes are cheap
- Reads are slow at scale
Hybrid Approach (used by Instagram/Twitter)
- Push for normal users (small follower counts)
- Pull for celebrities — fetch on read, merge with pre-built feed
Key Components
- Post storage — Cassandra (time-series writes)
- Feed cache — Redis sorted set (score = timestamp)
- Media — S3 + CDN
- Feed generation — async Kafka consumer per post publish event