What is System Design?

🏗️ System Design · Foundations · 5 min read

System design is the process of defining the architecture, components, and data flow of a large-scale software system to meet functional and non-functional requirements.

Why It Matters

Core topic in senior/staff engineer interviews (L5+)
Teaches you to think about trade-offs, not just code
Required to build systems that handle millions of users

Interview Framework

Clarify requirements — functional (what it does) and non-functional (scale, latency)
Estimate scale — QPS, storage, bandwidth
High-level design — components and their connections
Detailed design — deep dive into 2-3 key components
Identify bottlenecks and how to address them

Scalability

🏗️ System Design · Foundations · 6 min read

Vertical Scaling (Scale Up)

Add more CPU/RAM to a single server. Simple, but capped by the largest machine available, and the server remains a single point of failure.

Horizontal Scaling (Scale Out)

Add more servers behind a load balancer. No hard ceiling on capacity in principle, though shared components such as the database can become the next bottleneck. Standard approach for web apps.

Stateless Services

Keep no session state on the server; store it in Redis or a DB so that any server can handle any request. In practice this is a prerequisite for horizontal scaling.

Database Sharding

Split data across multiple DB servers by a shard key (e.g. user_id % N). Each server then holds only a fraction of the data and traffic, at the cost of harder cross-shard queries and resharding.
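
A minimal sketch of modulo-based shard routing, assuming integer user IDs and a fixed shard count. The names NUM_SHARDS, shard_for, and moved_keys are hypothetical and do not come from any particular database.

```python
NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Return the index of the shard that owns this user's rows."""
    return user_id % num_shards


def moved_keys(user_ids: range, old_n: int, new_n: int) -> int:
    """Count how many IDs land on a different shard after N changes."""
    return sum(1 for uid in user_ids if uid % old_n != uid % new_n)
```

With these definitions, shard_for(10) returns 2, and moved_keys(range(1000), 4, 5) returns 800: going from 4 to 5 shards remaps 80% of the keys, which is why plain modulo sharding makes resharding expensive.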

CAP Theorem

🏗️ System Design · Foundations · 5 min read

In a distributed system, you can only guarantee two out of three:

Consistency (C)

Every read gets the most recent write. All nodes see the same data at the same time.

Availability (A)

Every request gets a response (not guaranteed to be the latest). The system stays up.

Partition Tolerance (P)

The system continues operating even if some network messages are lost between nodes.

In practice, network partitions happen — so you must choose between CP (consistent but may be unavailable during a partition) or AP (available but may serve stale data).

CP — ZooKeeper, HBase, SQL databases with synchronous replication
AP — Cassandra, DynamoDB (reads are eventually consistent by default), CouchDB

Load Balancers

🏗️ System Design · Core Components · 5 min read

A load balancer distributes incoming traffic across multiple servers to prevent any one server from being overwhelmed.

Algorithms

Round Robin — requests distributed in rotation
Least Connections — route to the server with fewest active connections
IP Hash — same client always goes to same server (sticky sessions)
Weighted Round Robin — more powerful servers get more traffic
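
Two of the algorithms above, sketched in a few lines. This is an illustration under simplifying assumptions (in-process, no health checks, server names as plain strings), not how any specific load balancer is implemented. The class names are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes.
        self.active[server] -= 1
```

Round Robin needs no feedback from the servers, while Least Connections has to be told when a connection ends, which is the extra bookkeeping it trades for better balance under uneven request durations.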

Layer 4 vs Layer 7

L4 (Transport) — routes by IP/TCP without looking at content; faster
L7 (Application) — can route by URL, headers, cookies; enables path-based routing

💡 AWS ALB (Application Load Balancer) is L7. AWS NLB (Network Load Balancer) is L4. Use ALB for most web apps.

Caching

🏗️ System Design · Core Components · 6 min read

Caching stores frequently accessed data in fast memory (RAM) to reduce DB load and response latency.

Cache Strategies

Cache-Aside (Lazy Loading) — app checks cache first; on miss, loads from DB and writes to cache. Most common.
Write-Through — every write goes to the cache and the DB as one synchronous operation. The cache stays consistent with the DB, but writes are slower.
Write-Back — write to cache only; flush to DB asynchronously. Fast writes, risk of data loss.
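
A minimal sketch of Cache-Aside, assuming plain dicts stand in for both the cache (Redis or Memcached in practice) and the database. The class name and the db_reads counter are hypothetical and exist only to make the behaviour visible.

```python
class CacheAside:
    def __init__(self, db):
        self.db = db        # stand-in for the database
        self.cache = {}     # stand-in for Redis or Memcached
        self.db_reads = 0   # counts how often we fall through to the DB

    def get(self, key):
        if key in self.cache:           # cache hit
            return self.cache[key]
        self.db_reads += 1              # cache miss: load from the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value     # populate the cache for next time
        return value

    def update(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)       # invalidate so the next read reloads
```

Invalidating on update, rather than writing the new value into the cache, is one common choice here; it keeps the write path simple at the cost of one extra miss.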

Eviction Policies

LRU — Least Recently Used (most common)
LFU — Least Frequently Used
TTL — expire after a time period
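
LRU eviction can be sketched with the standard library's OrderedDict, which remembers insertion order and can move a key to the end. The LRUCache class below is a hypothetical illustration, not the internals of any cache product.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

For memoising function calls, Python's built-in functools.lru_cache applies the same policy without any custom code.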

Tools

Redis — in-memory data store; supports strings, hashes, lists, sets. Use for sessions, leaderboards, pub/sub.
Memcached — simple, multithreaded key-value cache; a lightweight fit for pure caching use cases.

Databases — SQL vs NoSQL

🏗️ System Design · Core Components · 7 min read

SQL (Relational)

Structured schema, ACID transactions, complex joins. MySQL, PostgreSQL. Good for financial, inventory, user data.

NoSQL (Document)

Flexible schema, horizontal scaling. MongoDB, DynamoDB. Good for catalogues, user profiles, content.

NoSQL (Wide-Column)

Cassandra, HBase. Excellent write throughput. Good for time-series, IoT, activity logs.

NoSQL (Graph)

Neo4j. Models relationships natively. Good for social graphs, recommendation engines.

Database Replication

Master-Slave (also called primary-replica) — writes go to the master, reads are served from replicas. Most common.
Multi-Master — writes to any node; complex conflict resolution.

Message Queues

🏗️ System Design · Core Components · 5 min read

Message queues decouple producers from consumers, enabling async processing and absorbing traffic spikes.

Use Cases

Sending emails/notifications asynchronously after a user action
Processing uploads (video encoding, image resizing) in the background
Distributing work across multiple worker instances
Event-driven microservices communication
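
The producer/consumer decoupling can be sketched in-process with Python's queue.Queue and a few worker threads. This is only an analogy for a real broker (no persistence, no network), and run_workers is a hypothetical helper name.

```python
import queue
import threading


def run_workers(jobs, num_workers=2):
    """Enqueue jobs and let worker threads drain the queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:          # sentinel: no more work for this worker
                q.task_done()
                return
            with lock:
                results.append(f"processed {job}")
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:                 # producer side: enqueue and move on
        q.put(job)
    for _ in threads:                # one sentinel per worker
        q.put(None)
    q.join()                         # wait until every item is handled
    for t in threads:
        t.join()
    return results
```

The producer never waits for a job to finish, and adding workers increases throughput without changing the producer, which is the property the use cases above rely on. Result order is not guaranteed.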

Tools

Kafka — high-throughput event streaming; consumers can replay retained messages; used at Netflix, LinkedIn
RabbitMQ — traditional message broker; flexible routing; easier to set up than Kafka
AWS SQS — fully managed queue; great for AWS-native apps

CDN (Content Delivery Network)

🏗️ System Design · Core Components · 4 min read

A CDN caches static assets (images, CSS, JS, videos) on servers geographically close to users, reducing latency and origin server load.

How It Works

User requests cdn.example.com/logo.png
CDN routes to the nearest edge server (PoP)
If cached → serve immediately (cache hit)
If not → fetch from origin, cache, then serve (cache miss)
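
The hit/miss flow above, sketched for a single edge server. The TTL and the injectable clock are assumptions added for illustration; EdgeCache and fetch_origin are hypothetical names, and real CDNs decide freshness from HTTP caching headers rather than a fixed constant.

```python
import time


class EdgeCache:
    def __init__(self, fetch_origin, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch_origin = fetch_origin  # callable: path -> response body
        self.ttl = ttl_seconds            # assumed freshness lifetime
        self.clock = clock
        self._store = {}                  # path -> (expires_at, body)

    def get(self, path):
        now = self.clock()
        entry = self._store.get(path)
        if entry and entry[0] > now:      # cached and still fresh
            return "HIT", entry[1]
        body = self.fetch_origin(path)    # go back to the origin server
        self._store[path] = (now + self.ttl, body)
        return "MISS", body
```

Only the first request for a path (or the first after expiry) reaches the origin; every other request is served from the edge.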

Tools

CloudFront — AWS CDN, integrates with S3 and EC2
Cloudflare — CDN + DDoS protection + DNS
Fastly — programmable CDN, used by GitHub, Stripe

Microservices

🏗️ System Design · Patterns · 6 min read

Microservices break a monolith into small, independently deployable services, each owning its domain and data.

Monolith vs Microservices

Monolith

Simpler to build, test, and deploy initially. Single deployment unit. Must be scaled as a whole, even when only one part is under load.

Microservices

Each service can be scaled, deployed, and built on its own tech stack independently. Operations are more complex (service discovery, distributed tracing).

When to Use Microservices

Different parts of the system have very different scaling needs
Multiple teams working independently
Different reliability requirements per domain

💡 Start with a monolith. Extract microservices only when you have clear pain points — premature decomposition is a common mistake.

API Gateway

🏗️ System Design · Patterns · 5 min read

An API Gateway is the single entry point for all client requests in a microservices architecture. It handles cross-cutting concerns so services don't have to.

Responsibilities

Request routing to the correct microservice
Authentication & authorisation (validate JWT before forwarding)
Rate limiting & throttling
SSL termination
Response aggregation (combine data from multiple services)
Logging & monitoring
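
A rough sketch of the first two responsibilities, routing and authentication. It assumes backend services are plain callables and that token validation is delegated to a caller-supplied function; Gateway and is_valid_token are hypothetical names, and no real JWT verification is performed here.

```python
class Gateway:
    def __init__(self, routes, is_valid_token):
        # routes: path prefix -> handler callable (stand-in for a service)
        self.routes = routes
        self.is_valid_token = is_valid_token

    def handle(self, path, token):
        # Reject unauthenticated requests before they reach any service.
        if not self.is_valid_token(token):
            return 401, "unauthorized"
        # Longest matching prefix wins.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return 200, self.routes[prefix](path)
        return 404, "no route"
```

Because the check happens once at the edge, the individual services behind the gateway do not each need their own copy of it.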

Tools

AWS API Gateway
Kong
Nginx (simple reverse proxy)
Spring Cloud Gateway

Rate Limiting

🏗️ System Design · Patterns · 5 min read

Rate limiting controls how many requests a client can make in a time window, protecting APIs from abuse and helping to mitigate DDoS attacks.

Algorithms

Token Bucket — tokens refill at a fixed rate; burst allowed up to bucket capacity. Most common.
Leaky Bucket — requests processed at a constant rate regardless of burst; smooths traffic.
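Fixed Window Counter — count requests per time window; simple, but a burst that straddles a window boundary can let through up to twice the limit.
Sliding Window Log — store a timestamp per request and count those inside the window; accurate but memory-intensive.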
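
A minimal single-process sketch of Token Bucket. The injectable clock is an assumption that makes the example deterministic; TokenBucket and its parameter names are hypothetical.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)     # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With rate=1 and capacity=2, two back-to-back requests pass, the third is rejected, and one more is allowed after a second has elapsed.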

Implementation

Store counters in Redis (fast, shared across server instances)
Return 429 Too Many Requests with a Retry-After header
Use Nginx limit_req module for simple cases
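
A sketch of the counter-plus-429 approach using a fixed window. A dict stands in for Redis here; with Redis the same idea is usually an INCR on a per-window key plus an expiry. FixedWindowLimiter and the key layout are hypothetical.

```python
import math
import time


class FixedWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # (client_id, window index) -> count; Redis stand-in

    def check(self, client_id):
        now = self.clock()
        window_index = int(now // self.window)
        key = (client_id, window_index)
        self.counters[key] = self.counters.get(key, 0) + 1
        if self.counters[key] > self.limit:
            # Seconds until the next window opens, at least 1.
            wait = max(1, math.ceil((window_index + 1) * self.window - now))
            return 429, {"Retry-After": str(wait)}
        return 200, {}
```

Old window keys are never cleaned up in this sketch; a shared store with key expiry would handle that.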

Case Study: URL Shortener

🏗️ System Design · Case Studies · 8 min read

Design a system like bit.ly that converts long URLs to short codes and redirects users.

Requirements

Shorten a URL → return a short code (e.g. 7ds.in/aB3kR)
Redirect short URL to original URL in < 10ms
100M new URLs/day; 10B redirects/day (100:1 read/write ratio)
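
A back-of-envelope estimate from these numbers. The 500 bytes per record is an assumption for illustration, not a figure from the requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

new_urls_per_day = 100_000_000
redirects_per_day = 10_000_000_000

write_qps = new_urls_per_day / SECONDS_PER_DAY    # about 1,157 writes/s
read_qps = redirects_per_day / SECONDS_PER_DAY    # about 115,741 reads/s

BYTES_PER_RECORD = 500  # assumed average row size (URL plus metadata)
storage_per_year_tb = new_urls_per_day * 365 * BYTES_PER_RECORD / 1e12  # 18.25 TB
```

These are averages; peak traffic is typically higher, so the read path is the part that needs caching and replicas.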

Design Decisions

Short code generation — Base62 encode a unique ID (integers → a-zA-Z0-9, 6-7 chars)
Storage — MySQL: (id, short_code, long_url, created_at); index on short_code
Cache — Redis for hot URLs; 80% of reads likely hit 20% of URLs (Pareto principle)
Redirect — 301 Permanent if SEO matters (browsers cache it, so repeat visits may skip your servers); 302 Temporary if every click must reach the server for analytics
Scaling — stateless app servers behind a load balancer; DB read replicas for redirect queries
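
Base62 encoding of a numeric ID can be sketched as follows. The alphabet order (digits, then lowercase, then uppercase) is an arbitrary assumption; any fixed ordering of the 62 characters works as long as encode and decode agree.

```python
import string

# 62 characters: 0-9, a-z, A-Z (ordering is an assumption)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Convert a Base62 short code back into its integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Six characters cover 62^6, roughly 56.8 billion IDs, which at 100M new URLs/day lasts about 568 days; seven characters cover about 3.5 trillion, or roughly 96 years at the same rate.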

GET /aB3kR
→ check Redis cache
→ hit: 302 redirect to long URL
→ miss: query MySQL → write to cache → 302 redirect
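
The redirect flow can be sketched as a single function, assuming dicts stand in for Redis and MySQL. The resolve name and the 404 for unknown codes are assumptions not stated in the flow itself.

```python
def resolve(short_code, cache, db):
    """Return (status, location) for GET /<short_code>."""
    long_url = cache.get(short_code)
    if long_url is None:                 # cache miss
        long_url = db.get(short_code)    # fall back to the database
        if long_url is None:
            return 404, None             # unknown short code (assumed behaviour)
        cache[short_code] = long_url     # warm the cache for later requests
    return 302, long_url
```

After the first lookup for a code, later requests are answered from the cache without touching the database.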

Case Study: News Feed

🏗️ System Design · Case Studies · 8 min read

Design the Facebook/Twitter home feed — show a user a ranked list of posts from people they follow.

Two Approaches

Fan-Out on Write (Push)

When user A posts, precompute and push the post to all followers' feed caches.

Reads are fast — feed is pre-built in Redis
Writes are expensive for celebrities (millions of followers)

Fan-Out on Read (Pull)

When user B loads their feed, query posts from everyone they follow and merge.

Writes are cheap
Reads are slow at scale

Hybrid Approach (used by Instagram/Twitter)

Push for normal users (small follower counts)
Pull for celebrities — fetch on read, merge with pre-built feed
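
A toy in-memory sketch of the hybrid approach. The follower threshold, the FeedService class, and the tuple layout (timestamp, author, text) are all hypothetical; ordering is by timestamp only, with no ranking model.

```python
import heapq

CELEBRITY_THRESHOLD = 10_000  # hypothetical cutoff between push and pull


class FeedService:
    def __init__(self, followers, threshold=CELEBRITY_THRESHOLD):
        self.followers = followers   # author -> set of follower ids
        self.threshold = threshold
        self.feeds = {}              # user -> pre-built list of posts
        self.celebrity_posts = {}    # author -> posts fetched at read time

    def publish(self, author, ts, text):
        post = (ts, author, text)
        fans = self.followers.get(author, set())
        if len(fans) >= self.threshold:
            # Celebrity: store once, followers pull on read.
            self.celebrity_posts.setdefault(author, []).append(post)
        else:
            # Normal user: fan-out on write to each follower's feed.
            for fan in fans:
                self.feeds.setdefault(fan, []).append(post)

    def read_feed(self, user, following, limit=10):
        posts = list(self.feeds.get(user, []))
        for author in following:     # merge in celebrity posts
            posts.extend(self.celebrity_posts.get(author, []))
        return heapq.nlargest(limit, posts)  # newest first
```

A celebrity post costs one write regardless of follower count, while a read pays for a small merge only for the celebrities that user follows.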

Key Components

Post storage — Cassandra (time-series writes)
Feed cache — Redis sorted set (score = timestamp)
Media — S3 + CDN
Feed generation — async Kafka consumers triggered by each post-publish event