500 million tweets per day. 350 million monthly active users. A timeline that loads in under 2 seconds for everyone, everywhere, all at once.
Most system design guides will tell you "use a cache" and move on. That's not good enough.
As a senior AI PM, you need to understand why Twitter's architecture looks the way it does, what product decisions live inside each engineering choice, and what trade-offs your team is navigating every sprint.
Twitter/X is one of the most instructive system designs in tech — it combines real-time ingestion, graph traversal, ML feed ranking, and global distribution at a scale that breaks every naive assumption.
This is Part 2 of my Core System Design & AI System Design series — built specifically for AI product managers and data professionals who want to go beyond surface-level architecture.
If you haven't read Part 1 (YouTube System Design), start there first.
This is the breakdown that actually covers it all. 👇
📌 TL;DR
Twitter handles 500M+ tweets/day, 350M MAU, and timelines that must load in <2s globally
The hardest problem in Twitter's architecture isn't storing tweets — it's delivering them to the right people, in the right order, instantly
Five critical subsystems: Tweet ingestion pipeline, Timeline fanout service, ML feed ranking, Real-time search, and Notification system
The "fanout on write vs. fanout on read" trade-off is the most important architectural decision in social feed design — and it's a product decision, not a technical one
Celebrities with 100M followers get special architectural treatment — their tweets are not fanned out at write time. That's a deliberate product exception baked into the system

Twitter/X system design: complete architecture diagram
📊 The Numbers That Define Every Decision
Scale isn't a detail — it's the constraint that shapes every architectural choice. Before you design Twitter, internalize these:
Metric | Scale |
|---|---|
Daily active users | 250+ million |
Monthly active users | 350+ million |
Tweets per day | 500+ million |
Tweets per second (peak) | 150,000+ |
Timeline requests per second | 300,000+ |
Follows (graph edges) | 200+ billion |
Average followers per user | ~200 |
Celebrity accounts (10M+ followers) | Thousands |
Search queries per day | 2.1+ billion |
Notifications delivered per day | Billions |
These numbers immediately tell you what the system must do:
A single database cannot hold 200 billion follow relationships — the graph must be sharded
A naive "query all tweets from people you follow" approach would time out at 300,000 timeline requests/second
Storing tweets is the easy part — delivering them is the hard part
Celebrity accounts break every normal assumption — 100M followers means 100M writes on every tweet
Search must be real-time: a tweet posted 30 seconds ago must appear in search results
✅ Functional Requirements: What Twitter/X Must Do
As a senior AI PM, you don't just list features — you prioritize them.
Feature | Description | Priority |
|---|---|---|
Post a tweet | Text (280 chars), images, video, polls, threads | P0 |
Home timeline | Personalized feed of followed accounts + ML-ranked content | P0 |
Follow / unfollow | Build and maintain social graph | P0 |
Real-time search | Search tweets, users, hashtags in near real-time | P0 |
Notifications | Likes, replies, retweets, mentions, follows | P0 |
Direct messages | Encrypted 1:1 and group messaging | P1 |
Trending topics | Real-time hashtag and topic detection globally | P1 |
Lists | Curated feeds from specific accounts | P1 |
Spaces (audio) | Live audio rooms with speaker/listener model | P1 |
Ads / Promoted tweets | Paid content inserted into timelines | P1 |
Bookmarks | Save tweets privately | P2 |
Analytics (creator) | Impressions, engagements, profile visits | P2 |
💡 My Take: In a system design interview, the moment you write "post a tweet" as your only P0, you signal junior thinking. The hardest P0 is the home timeline — it's the feature that determines whether the product feels alive. Everything else is easier. A senior PM defines scope by what's architecturally hardest to get right, not just what users interact with most.
⚙️ Non-Functional Requirements: Where Architecture Gets Designed
NFRs are where every architectural box-and-arrow decision flows from.
Requirement | Target | Justification |
|---|---|---|
Timeline load time | <2s p99 globally | Drop-off spikes above 3s — engagement falls off a cliff |
Tweet posting latency | <200ms confirmation | Creator experience — don't make people wait to publish |
Search freshness | <30s for new tweets | Real-time value proposition — news breaks on Twitter |
Availability | 99.99% | Revenue and reputation — outages trend on their own platform |
Notification delivery | <5s for push | Engagement loop — delayed likes kill the dopamine cycle |
Storage durability | 11 nines | Tweets are permanent records |
Consistency (like counts) | Eventual | Strong consistency at 500M tweets/day creates global locks |
Fan-out latency | <5s for non-celebrity | Timeline must feel real-time |
DM delivery | <1s | Messaging expectation is instant |
Throughput | 150,000+ tweets/sec | Peak event volumes (elections, sports, breaking news) |
💡 My Take: The most important NFR on this list isn't availability — it's search freshness. Twitter's entire value proposition during breaking news events is that it's faster than TV. If a tweet takes 5 minutes to appear in search, the product fails its core use case. This NFR is where the real-time search architecture comes from.
🗂️ High-Level Architecture: The Five Major Subsystems
Twitter's architecture breaks into 5 independently scalable subsystems. Each has its own scaling profile, failure mode, and data access pattern.
Client (Web / iOS / Android)
↓
API Gateway / Load Balancer
↓
┌──────────────────────────────────────────────────────┐
│ 1. Tweet Ingestion Pipeline │
│ 2. Timeline Fanout Service │
│ 3. ML Feed Ranking Engine │
│ 4. Real-Time Search │
│ 5. Notification System │
└──────────────────────────────────────────────────────┘
↓
Storage Layer (Graph DB + Cache + Object Storage + Search Index)

Coupling these subsystems would mean one bottleneck takes down the entire product. They must be independently deployable, independently scalable, and independently failure-tolerant.
1️⃣ Tweet Ingestion Pipeline
This is the entry point — and the most underestimated subsystem. Every downstream service depends on this pipeline being fast, durable, and correct.

Key Design Decisions
🆔 Snowflake ID generation Every tweet gets a 64-bit globally unique ID that encodes: timestamp (41 bits) + datacenter ID (5 bits) + machine ID (5 bits) + sequence number (12 bits). This means tweet IDs are sortable by time without a database lookup — a massive performance gain for timeline construction. It also means Twitter can generate 4,096 unique IDs per millisecond per machine without coordination.
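To make that bit layout concrete, here's a minimal sketch of a Snowflake-style generator in Python. It's an illustration of the scheme described above, not Twitter's implementation; the epoch constant and the threading model are my assumptions.

```python
import time
import threading

# Illustrative bit layout: 41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence
EPOCH_MS = 1288834974657  # commonly cited Snowflake epoch (Nov 2010); any fixed epoch works

class SnowflakeGenerator:
    def __init__(self, datacenter_id: int, machine_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                # Same millisecond: bump the 12-bit sequence (4,096 IDs/ms/machine)
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond; wait for the clock to tick
                    while now_ms <= self.last_ms:
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            return ((now_ms - EPOCH_MS) << 22) | (self.datacenter_id << 17) \
                   | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(datacenter_id=1, machine_id=7)
print(gen.next_id())  # sorting by ID is effectively sorting by creation time
```

Because the timestamp sits in the high bits, comparing two IDs is effectively comparing creation times, which is what makes cursor-based timeline pagination cheap.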
📬 Kafka as the central nervous system The tweet creation event goes to Kafka, and every downstream system — fanout, search, notifications, trends — consumes from Kafka independently. This means:
Posting a tweet is fast (just write to DB + publish event)
Downstream systems can fail and replay from Kafka without data loss
New downstream consumers can be added without touching the tweet ingestion path
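Here's a minimal sketch of that decoupling using the kafka-python client; the topic name, event shape, and consumer group names are illustrative assumptions, not Twitter's actual schema.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Ingestion path: persist the tweet, then publish exactly one event and return to the user.
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("tweet-events", {
    "tweet_id": 1763201234567890944,   # Snowflake ID
    "author_id": 42,
    "created_at_ms": 1718000000000,
})
producer.flush()

# Each downstream system reads the same topic through its own consumer group,
# so fanout, search, and trends can lag, fail, and replay independently.
fanout_consumer = KafkaConsumer(
    "tweet-events", group_id="fanout-service",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v),
)
search_consumer = KafkaConsumer(
    "tweet-events", group_id="search-indexer",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v),
)
```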
🖼️ Media pipeline Images and videos are uploaded to blob storage asynchronously. The tweet is confirmed to the user before media processing completes. A background pipeline handles compression, resizing (multiple dimensions), and CDN distribution. Video gets transcoded to multiple bitrates — the same adaptive streaming approach as YouTube.
🚦 Rate limiting at ingestion Twitter enforces rate limits at the API gateway: 300 tweets per 3 hours per account for standard users. This isn't just abuse prevention — it's also a system protection mechanism. Without rate limits, a single viral bot account could overwhelm the fanout pipeline.
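As a rough illustration of how a limit like that could be enforced at the gateway, here's a fixed-window counter in Redis. The key naming and window shape are assumptions; real gateways typically use sliding windows or token buckets.

```python
import redis

r = redis.Redis()
LIMIT = 300                 # tweets allowed per window (standard-tier example)
WINDOW_SECONDS = 3 * 3600   # 3-hour window

def allow_tweet(user_id: int) -> bool:
    key = f"ratelimit:tweets:{user_id}"
    count = r.incr(key)                # atomic increment per attempted tweet
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # window starts at the first tweet
    return count <= LIMIT

if allow_tweet(user_id=42):
    print("accept tweet")
else:
    print("reject with 429 Too Many Requests")
```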
💡 My Take: The Snowflake ID system is one of Twitter's most elegant architectural decisions — and most PMs have no idea it exists. Time-ordered IDs mean you can paginate a timeline with just an ID cursor instead of a timestamp query. That's the difference between a timeline that loads in 2ms and one that times out under load. Architecture serving product experience.
2️⃣ Timeline Fanout Service
This is the hardest problem in Twitter's architecture. And the answer is a trade-off that every senior PM should be able to articulate.

The Core Problem
When you load your Twitter home timeline, you expect to see tweets from everyone you follow, roughly in order, in under 2 seconds.
The naive approach: query all tweets from all accounts you follow, sort by time, return the top N.
Why this fails at scale:
Average user follows 200 accounts
300,000 timeline requests/second
Each request triggers 200 DB queries
= 60 million queries/second
Every database dies
Fanout on Write vs. Fanout on Read
Approach | How it works | Pros | Cons |
|---|---|---|---|
Fanout on write (push) | When a tweet is posted, immediately write it to all followers' timeline caches | Timeline reads are O(1) — just read the cache | Writing one tweet from a 10M-follower account = 10M cache writes |
Fanout on read (pull) | When a user opens the app, query all followed accounts' recent tweets | No write amplification | Read is O(followers) — too slow at scale |
Hybrid | Fanout on write for normal users, fanout on read for celebrities | Best of both worlds | Complex to implement and maintain |
Twitter uses the hybrid approach — and understanding why is the mark of a senior PM.
The Fanout Pipeline (for normal users)
Tweet event arrives from Kafka
↓
Fanout Service reads follower list from Social Graph DB
↓
For each follower (up to ~10,000):
└── Write tweet ID to follower's Timeline Cache (Redis)
Timeline Cache: sorted set per user
Key: user_id
Value: sorted set of tweet IDs (by time)
Max size: 800 tweet IDs per user
TTL: active users refreshed; inactive users pruned

The Celebrity Exception
Accounts above a follower threshold (roughly 10,000–50,000) are flagged as "celebrities" in the system. Their tweets are not fanned out at write time.
Instead, when a user opens their timeline:
Read their precomputed timeline cache (tweets from non-celebrity follows)
Separately query celebrity accounts they follow (fanout on read — but only for a small list of celebrities)
Merge and rank the results
Why this matters: Lady Gaga has 85M followers. Fanning out one of her tweets would require 85 million Redis writes in seconds — overwhelming the fanout infrastructure for every other user on the platform simultaneously.
💡 My Take: The celebrity exception isn't a hack — it's a deliberate product decision that acknowledges not all users are architecturally equal. Most PMs treat the social graph as a uniform structure. The engineers at Twitter had to say: "Some nodes in this graph have properties that break our entire write path. We need a different product behavior for them." That's product-architecture thinking. You can't design this system without understanding both.
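A minimal sketch of how that hybrid write path might look, assuming a follower-count threshold, Redis sorted sets for timeline caches, and the key names used later in this section. The threshold and cap values are illustrative.

```python
import time
import redis

r = redis.Redis()
CELEBRITY_THRESHOLD = 10_000   # above this, skip write-time fanout entirely
MAX_TIMELINE_ENTRIES = 800     # cap each user's timeline cache

def handle_tweet_event(author_id: int, tweet_id: int, follower_ids: list[int]) -> None:
    if len(follower_ids) > CELEBRITY_THRESHOLD:
        # Celebrity path: no fanout here; followers pull these tweets at read time.
        return
    now = time.time()
    pipe = r.pipeline()
    for follower_id in follower_ids:
        key = f"timeline:{follower_id}"
        pipe.zadd(key, {str(tweet_id): now})                        # push into follower's cache
        pipe.zremrangebyrank(key, 0, -(MAX_TIMELINE_ENTRIES + 1))   # trim to the newest 800
    pipe.execute()
```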
Timeline Cache Structure
Redis Sorted Set per user:
Key: "timeline:{user_id}"
Score: tweet timestamp (Unix epoch)
Value: tweet ID (Snowflake)
Read path:
ZREVRANGE "timeline:{user_id}" 0 99
→ Returns 100 most recent tweet IDs
→ Batch fetch tweet content from Tweet Cache
→ Merge with celebrity tweets
→ Return ranked timeline to client
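Roughly how that read path might look with redis-py. The celebrity tweet source, tweet hydration, and ranking calls are stubbed here because they live in other services; key names follow the structure above.

```python
import redis

r = redis.Redis()

def fetch_tweets_from_cache(tweet_ids):   # stub: batch-hydrate content from the tweet cache
    return [{"id": t} for t in tweet_ids]

def rank_timeline(user_id, tweets):       # stub: hand off to the ML ranking engine
    return tweets

def load_home_timeline(user_id: int, celebrity_follow_ids: list[int], limit: int = 100):
    # 1. Precomputed part: most recent tweet IDs from non-celebrity follows
    cached_ids = r.zrevrange(f"timeline:{user_id}", 0, limit - 1)

    # 2. Pull-on-read part: recent tweets from the short list of celebrities followed
    celebrity_ids = []
    for celeb_id in celebrity_follow_ids:
        celebrity_ids.extend(r.zrevrange(f"user_tweets:{celeb_id}", 0, 19))

    # 3. Merge, hydrate, and rank
    all_ids = [int(t) for t in cached_ids + celebrity_ids]
    return rank_timeline(user_id, fetch_tweets_from_cache(all_ids))
```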
3️⃣ ML Feed Ranking Engine 🤖
The home timeline used to be reverse-chronological. Then Twitter introduced algorithmic ranking — and the product changed fundamentally.
Hot take: The decision to algorithmically rank the Twitter feed is the equivalent of YouTube's 2012 watch-time decision. It's not an ML decision. It's a product decision about what Twitter optimizes for — and it has real consequences for what content and whose voice gets amplified.

Features Used in Ranking
Feature Category | Examples |
|---|---|
Engagement signals | Likes, retweets, replies, quote tweets per impression |
Author affinity | How often you interact with this account |
Content relevance | Semantic similarity between tweet and your interests |
Network signals | Whether accounts you follow engaged with this tweet |
Recency | Time since tweet was posted |
Media presence | Tweets with images/video generally ranked higher |
Real-time trending | Boost for content surfing a trending topic |
Follower relationship | Direct follow vs. second-degree follow |
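To show how features like these might combine into a single score, here's a toy linear scorer. It's purely illustrative: the weights are invented, and the production system uses a learned model rather than hand-tuned coefficients. The point is that whoever sets these weights (or the training objective behind them) is making a product decision.

```python
import math

# Invented weights for illustration only; in production these come from a trained model,
# and choosing what they optimize for is the product decision.
WEIGHTS = {
    "p_like": 1.0,           # predicted probability the viewer likes the tweet
    "p_retweet": 2.0,        # retweets amplify reach, so weighted higher here
    "p_reply": 4.0,          # replies signal conversation
    "author_affinity": 1.5,  # how often the viewer engages with this author
    "is_following": 0.5,     # in-network vs. out-of-network content
}
RECENCY_HALF_LIFE_HOURS = 6.0

def score_tweet(features: dict, age_hours: float) -> float:
    base = sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    decay = math.exp(-math.log(2) * age_hours / RECENCY_HALF_LIFE_HOURS)
    return base * decay

candidates = [
    {"id": 1, "features": {"p_like": 0.3, "p_reply": 0.05, "is_following": 1.0}, "age_hours": 1},
    {"id": 2, "features": {"p_like": 0.6, "p_retweet": 0.2}, "age_hours": 12},
]
ranked = sorted(candidates, key=lambda c: score_tweet(c["features"], c["age_hours"]), reverse=True)
```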
Real-Time vs. Batch Signals
Signal | Pipeline | Update frequency |
|---|---|---|
User interest embeddings | Batch offline | Daily |
Author engagement history | Batch offline | Daily |
Real-time engagement velocity | Near real-time Kafka | Minutes |
Trending topic membership | Real-time | Seconds |
Session context (what you just liked) | Real-time | Immediate |
A/B test assignment | Real-time | Milliseconds |
What the ranking model optimizes, and how real-time signals are weighted against batch ones, is the central product tension at every social platform. The loss function is the product strategy.
💡 My Take: Twitter has publicly struggled with this more than any other platform because its architecture made the objective function more visible. When Elon Musk open-sourced parts of the ranking algorithm in 2023, it revealed how engagement signals were weighted — including a significant boost for accounts the algorithm classified as "power users." The ranking model is not neutral. It encodes product values. Every PM building an AI-ranked feed owns those values whether they acknowledge it or not.
4️⃣ Real-Time Search Architecture
Twitter's search is architecturally distinct from Google-style search in one critical way: freshness beats relevance.
A tweet posted 20 seconds ago must appear in search results. No traditional search index can do this — they batch-index on crawl cycles measured in hours or days.

The Earlybird Index
Twitter's custom search index, called Earlybird, is designed around one constraint: index a tweet in under 10 seconds.
Traditional inverted index assumptions that Earlybird breaks:
Documents are immutable (tweets can be deleted, liked, retweeted — engagement signals change constantly)
Relevance is static (a tweet's ranking changes as engagement accumulates)
Index size is bounded (500M tweets/day means the index grows continuously)
Earlybird solves this by:
Keeping only recent tweets in the real-time index (last 7–30 days)
Updating engagement signals in-memory without full re-indexing
Routing older queries to a separate archive index
Sharding by time bucket, not by content — so all new tweets land in the same shard for fast sequential writes
Trending Topics Detection
Tweet stream → Kafka
↓
Sliding window counter (1-hour, 6-hour windows)
Count term frequency by region
↓
Anomaly detection
Compare current frequency vs. historical baseline
Flag: "this term is appearing 10× faster than normal"
↓
Trend candidate
↓
Human/ML filter
Remove: spam, manipulation, policy violations
↓
Published to Trending Topics (by country, by interest)

💡 My Take: Trending topics is a product feature that appears simple and is architecturally profound. The hard part isn't detecting frequency spikes — it's defining what "trending" means. Is a term trending if it's always popular? (No — baseline matters.) Is it trending if it's being artificially amplified by bots? (No — manipulation detection required.) Is it trending if it's a slur that happens to spike? (No — policy filter required.) Every one of those is a product decision embedded in a data pipeline. The PM who owns trends owns all of them.
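To make the sliding-window plus baseline comparison concrete, here's a minimal sketch of the spike-detection step. Window size, the baseline source, and the threshold are assumptions, and everything after this step (spam, manipulation, and policy filtering) is deliberately left out.

```python
from collections import Counter, deque

WINDOW_MINUTES = 60
SPIKE_MULTIPLIER = 10.0   # "appearing 10x faster than normal"

class TrendDetector:
    def __init__(self, baseline: dict[str, float]):
        self.baseline = baseline     # term -> historical mentions per hour for this region
        self.window = deque()        # (minute_bucket, Counter of term counts)

    def observe(self, minute: int, terms: list[str]) -> None:
        if not self.window or self.window[-1][0] != minute:
            self.window.append((minute, Counter()))
        self.window[-1][1].update(terms)
        # Drop buckets that have fallen out of the sliding window
        while self.window and self.window[0][0] <= minute - WINDOW_MINUTES:
            self.window.popleft()

    def trend_candidates(self) -> list[str]:
        totals = Counter()
        for _, counts in self.window:
            totals.update(counts)
        candidates = []
        for term, count in totals.items():
            expected = self.baseline.get(term, 1.0)   # unseen terms get a small baseline
            if count / expected >= SPIKE_MULTIPLIER:
                candidates.append(term)
        return candidates   # still needs spam, manipulation, and policy filters downstream
```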
5️⃣ Notification System
Notifications are the engagement loop that brings users back. They are also one of the highest-complexity distributed systems Twitter runs.
Notification Pipeline

The Batching Decision
Why does Twitter sometimes show "X and 47 others liked your tweet" instead of 48 separate notifications?
The naive approach: Send a push notification for every like.
The problem: A viral tweet can get 10,000 likes in an hour. Sending 10,000 push notifications to one person's phone would:
Drain their battery
Saturate APNs/FCM rate limits for Twitter's sender ID
Result in the user disabling notifications (permanent engagement loss)
The solution: Batch aggregation within a time window. Group all likes on the same tweet within a 5-minute window into a single notification. This is a product decision — what's the right window? Too short and you spam. Too long and the notification feels stale.
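A rough sketch of that time-window aggregation, assuming a 5-minute window and an in-process batcher; in practice this runs as a distributed aggregation keyed by recipient and tweet.

```python
from collections import defaultdict

BATCH_WINDOW_SECONDS = 300   # the 5-minute window is the product decision in question

class LikeNotificationBatcher:
    def __init__(self):
        # (recipient_id, tweet_id) -> names of likers accumulated in the current window
        self.pending = defaultdict(list)

    def record_like(self, recipient_id: int, tweet_id: int, liker_name: str) -> None:
        self.pending[(recipient_id, tweet_id)].append(liker_name)

    def flush(self) -> list[tuple[int, str]]:
        """Called when the window closes: one notification per (recipient, tweet)."""
        notifications = []
        for (recipient_id, _tweet_id), likers in self.pending.items():
            if len(likers) == 1:
                message = f"{likers[0]} liked your tweet"
            else:
                message = f"{likers[0]} and {len(likers) - 1} others liked your tweet"
            notifications.append((recipient_id, message))
        self.pending.clear()
        return notifications
```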
Push Delivery at Scale
Channel | Protocol | Latency target | Volume |
|---|---|---|---|
iOS push | APNs | <5s | Billions/day |
Android push | FCM | <5s | Billions/day |
Web push | Web Push API | <10s | Millions/day |
In-app (WebSocket) | Persistent connection | <1s | Real-time |
Email | SMTP | Minutes | Inactive users only |
💡 My Take: Notification design is one of the most consequential PM decisions in consumer apps — and one of the least rigorously thought through. Every notification is a bet: "This will bring the user back, not push them to disable notifications entirely." Twitter's notification system has frequency capping, priority scoring, and deduplication because someone had to quantify that trade-off. The PM who owns notifications owns the engagement loop — and the churn risk. These are not UX decisions. They are product architecture decisions.
💾 Storage Layer: The Database Decisions
Different data has fundamentally different access patterns. One database for all of it would fail at every workload.
Data Type | Storage System | Justification |
|---|---|---|
Tweet content | Manhattan (Twitter's distributed KV store) / MySQL sharded | High write throughput; time-ordered access |
Social graph (follows) | FlockDB / distributed graph store | 200B+ edges; follow/unfollow is write-heavy |
Timeline cache | Redis sorted sets | O(1) reads; in-memory for speed |
User profiles | Manhattan / MySQL | Strong consistency for auth; moderate read volume |
Media (images, video) | Blob store (S3-compatible) + CDN | Immutable files; high read throughput globally |
Search index | Earlybird (custom inverted index) | Real-time ingest + recency-biased ranking |
DMs | Encrypted Cassandra | Append-only message history; no global read requirement |
Notifications | Cassandra | Write-heavy; time-ordered; eventual consistency fine |
Like/retweet counts | Redis → async flush to Manhattan | Counter aggregation; same pattern as YouTube view counts |
ML feature store | Manhattan + HDFS | Fast serving reads; batch training data |
Trending counters | Redis with sliding windows | In-memory frequency counting; short TTL |
Why Like Counts Are Eventually Consistent
Same principle as YouTube view counts — and just as important to articulate.
The naive approach: Every like increments the count in MySQL with a row lock.
The problem: A viral tweet gets 100,000 likes in 10 minutes = ~167 like increments per second on a single row. Row-level locking creates a serial bottleneck. Every like waits for every previous like to commit.
The right approach:
Like event
↓
Kafka stream (append-only, no locking)
↓
Batch aggregation (every 30–60 seconds)
↓
Single atomic increment to Redis counter
↓
Periodic async flush to persistent store
The like count shown might be anywhere from a few seconds to a minute behind reality. No user notices. The system scales horizontally.
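A sketch of that aggregate-then-flush pattern. The batch interval, key names, and the persistence call are assumptions; the point is one atomic increment per tweet per batch instead of one locked write per like.

```python
from collections import Counter
import redis

r = redis.Redis()
batch = Counter()   # in-process aggregation of like events consumed from Kafka

def on_like_event(tweet_id: int) -> None:
    batch[tweet_id] += 1   # no locks, no per-like database writes

def flush_batch() -> None:
    """Run every 30-60 seconds: one atomic increment per tweet, not per like."""
    for tweet_id, delta in batch.items():
        r.incrby(f"likes:{tweet_id}", delta)   # counters are served to clients from Redis
    batch.clear()

def persist_like_count(tweet_id: int, count: int) -> None:
    pass   # stub: write to the durable store (Manhattan / MySQL)

def flush_to_persistent_store() -> None:
    """Run less frequently: copy Redis counters into the durable store."""
    for key in r.scan_iter("likes:*"):
        tweet_id = int(key.decode().split(":")[1])
        persist_like_count(tweet_id, int(r.get(key)))
```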
💡 My Take: Every time a PM asks for "real-time accurate counters" on a viral piece of content, they are asking for a global distributed lock. The right response is: "What decision does accuracy serve? If it's display, eventual consistency is imperceptible to users. If it's billing, you need exact counts — use a different pipeline." Always tie the consistency requirement to the business decision it serves.
Twitter/X System Design Interview Questions
Q: What's the hardest design decision in Twitter's architecture?
The answer that wins interviews:
"The hardest design decision in Twitter is not where to store tweets — it's how to deliver them. Fanout on write is O(1) to read but O(followers) to write. Fanout on read is the opposite. The hybrid approach — write fanout for normal users, read fanout for celebrities — solves both, but requires defining 'celebrity' as a system concept, maintaining a flag in the social graph, and merging two different data sources at read time. The complexity is worth it because it's the only approach that meets both our write throughput and read latency SLAs simultaneously."
Q: What's the biggest architectural difference between Twitter and Facebook's feed?
Facebook has a denser social graph (average 300+ friends vs. Twitter's ~200 follows) but fewer power-law extremes — Facebook's most connected users have hundreds of thousands of connections, not hundreds of millions. Twitter's celebrity problem is more extreme, which is why Twitter's hybrid fanout is more complex. Facebook uses a similar hybrid but with different thresholds.
Q: How does Twitter handle tweet deletion?
Tweet deletion is a tombstone operation: a "deleted" flag is set in the tweet store, the tweet ID is published to Kafka as a delete event, and downstream caches (timeline cache, search index) consume the event and remove the tweet. Timeline caches are eventually consistent — a deleted tweet may appear for seconds to minutes in already-fetched timelines. Search indexes remove on the delete event within the freshness SLA.
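As an illustration, the cache-invalidation consumer for delete events might look like this; the topic name, event shape, and the follower list carried in the event are assumptions.

```python
import json
from kafka import KafkaConsumer
import redis

r = redis.Redis()
consumer = KafkaConsumer(
    "tweet-delete-events", group_id="timeline-cache-invalidation",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v),
)

for event in consumer:
    tweet_id = str(event.value["tweet_id"])
    # Remove the tombstoned tweet from each follower's timeline cache (eventually consistent)
    for follower_id in event.value["follower_ids"]:
        r.zrem(f"timeline:{follower_id}", tweet_id)
    # A separate consumer group does the same for the search index within its freshness SLA
```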
Q: How does real-time trending work at country level?
Trend detection runs per region by routing tweets through geo-tagged Kafka partitions. Each regional pipeline maintains its own sliding window frequency counters. A term can be trending in Brazil without trending globally. The anomaly detection baseline is also regional — "World Cup" may trend in Brazil at a lower frequency spike than in countries where soccer is less dominant.
Q: What happens architecturally when a major event causes a tweet spike?
The system has auto-scaling on the Kafka consumer groups (fanout workers) and pre-warming logic for predicted high-traffic events (Super Bowl, elections). The Snowflake ID system doesn't require coordination so it scales horizontally with no bottleneck. The timeline cache has no single point of failure — Redis clusters are sharded by user ID. The weak point historically has been the fanout service — when everyone tweets simultaneously about the same event, follower graph reads spike. Twitter mitigates this with a read-through cache on the social graph.
Q: How does the "For You" tab differ architecturally from "Following"?
"Following" is the classic fanout cache — chronological tweets from accounts you explicitly follow. "For You" is the ML-ranked feed that includes content from accounts you don't follow, based on interest modeling, network signals, and engagement velocity. They share the same ranking infrastructure but have different candidate pools: Following uses the fanout cache; For You uses a broader candidate retrieval step similar to YouTube's two-tower model.
Q: How should an AI PM talk about Twitter's ranking system in an interview?
Start with the objective function: "Twitter's ranking model optimizes for engagement, but the definition of 'engagement' encodes product values — whether you weight replies, retweets, or time spent changes what content gets amplified." Then describe the two-stage pipeline (candidate scoring → deep ranking → policy filters). Name the real-time vs. batch signal split. Close with what you'd measure beyond engagement: conversation health, content diversity, creator distribution. Interviewers want to know you understand that the model is a product decision, not just a technical component.
💡 The Honest Take
Twitter's architecture is one of the most instructive in tech — not because it's the most elegant, but because the trade-offs are so visible.
Every architectural decision is a product decision in disguise:
Fanout on write vs. read = how do you balance creator and consumer experience?
The celebrity exception = are all users equal in your system?
Eventually consistent like counts = what level of precision does the product actually need?
The ranking objective function = what does your platform want to optimize for in the world?
Understanding the architecture without understanding the trade-offs is just memorizing boxes and arrows.
Your edge as a senior AI PM isn't that you can draw the fanout diagram. It's that you can explain why it's structured that way, what the alternative was, and what product goal it serves.
That's the difference between a PM who can talk about system design and one who thinks in system design. 🚀
📬 Found this useful? AI PM Insider publishes every week for AI PMs and leaders building at the frontier. This is Part 2 of the Core System Design & AI System Design series. Join subscribers at aiskillshub.io
Written by Ashima Malik · LinkedIn
