500 million tweets per day. 350 million monthly active users. A timeline that loads in under 2 seconds for everyone, everywhere, all at once.

Most system design guides will tell you "use a cache" and move on. That's not good enough.

As a senior AI PM, you need to understand why Twitter's architecture looks the way it does, what product decisions live inside each engineering choice, and what trade-offs your team is navigating every sprint.

Twitter/X is one of the most instructive system designs in tech — it combines real-time ingestion, graph traversal, ML feed ranking, and global distribution at a scale that breaks every naive assumption.

This is Part 2 of my Core System Design & AI System Design series — built specifically for AI product managers and data professionals who want to go beyond surface-level architecture.

If you haven't read Part 1 (YouTube System Design), start there first.

This is the breakdown that actually covers it all. 👇

📌 TL;DR

  • Twitter handles 500M+ tweets/day, 350M MAU, and timelines that must load in <2s globally

  • The hardest problem in Twitter's architecture isn't storing tweets — it's delivering them to the right people, in the right order, instantly

  • Five critical subsystems: Tweet ingestion pipeline, Timeline fanout service, ML feed ranking, Real-time search, and Notification system

  • The "fanout on write vs. fanout on read" trade-off is the most important architectural decision in social feed design — and it's a product decision, not a technical one

  • Celebrities with 100M followers get special architectural treatment — their tweets are not fanned out at write time. That's a deliberate product exception baked into the system

Twitter/X System Design complete Architecture Diagram

📊 The Numbers That Define Every Decision

Scale isn't a detail — it's the constraint that shapes every architectural choice. Before you design Twitter, internalize these:

| Metric | Scale |
| --- | --- |
| Daily active users | 250+ million |
| Monthly active users | 350+ million |
| Tweets per day | 500+ million |
| Tweets per second (peak) | 150,000+ |
| Timeline requests per second | 300,000+ |
| Follows (graph edges) | 200+ billion |
| Average followers per user | ~200 |
| Celebrity accounts (10M+ followers) | Thousands |
| Search queries per day | 2.1+ billion |
| Notifications delivered per day | Billions |

These numbers immediately tell you what the system must do:

  • A single database cannot hold 200 billion follow relationships — the graph must be sharded

  • A naive "query all tweets from people you follow" approach would time out at 300,000 timeline requests/second

  • Storing tweets is the easy part — delivering them is the hard part

  • Celebrity accounts break every normal assumption — 100M followers means 100M writes on every tweet

  • Search must be real-time: a tweet posted 30 seconds ago must appear in search results

Functional Requirements: What Twitter/X Must Do

As a senior AI PM, you don't just list features — you prioritize them.

| Feature | Description | Priority |
| --- | --- | --- |
| Post a tweet | Text (280 chars), images, video, polls, threads | P0 |
| Home timeline | Personalized feed of followed accounts + ML-ranked content | P0 |
| Follow / unfollow | Build and maintain social graph | P0 |
| Real-time search | Search tweets, users, hashtags in near real-time | P0 |
| Notifications | Likes, replies, retweets, mentions, follows | P0 |
| Direct messages | Encrypted 1:1 and group messaging | P1 |
| Trending topics | Real-time hashtag and topic detection globally | P1 |
| Lists | Curated feeds from specific accounts | P1 |
| Spaces (audio) | Live audio rooms with speaker/listener model | P1 |
| Ads / Promoted tweets | Paid content inserted into timelines | P1 |
| Bookmarks | Save tweets privately | P2 |
| Analytics (creator) | Impressions, engagements, profile visits | P2 |

💡 My Take: In a system design interview, the moment you write "post a tweet" as your only P0, you signal junior thinking. The hardest P0 is the home timeline — it's the feature that determines whether the product feels alive. Everything else is easier. A senior PM defines scope by what's architecturally hardest to get right, not just what users interact with most.

⚙️ Non-Functional Requirements: Where Architecture Gets Designed

NFRs are where every architectural box-and-arrow decision flows from.

| Requirement | Target | Justification |
| --- | --- | --- |
| Timeline load time | <2s p99 globally | Drop-off spikes above 3s — engagement falls off a cliff |
| Tweet posting latency | <200ms confirmation | Creator experience — don't make people wait to publish |
| Search freshness | <30s for new tweets | Real-time value proposition — news breaks on Twitter |
| Availability | 99.99% | Revenue and reputation — outages trend on their own platform |
| Notification delivery | <5s for push | Engagement loop — delayed likes kill the dopamine cycle |
| Storage durability | 11 nines | Tweets are permanent records |
| Consistency (like counts) | Eventual | Strong consistency at 500M tweets/day creates global locks |
| Fan-out latency | <5s for non-celebrity | Timeline must feel real-time |
| DM delivery | <1s | Messaging expectation is instant |
| Throughput | 150,000+ tweets/sec | Peak event volumes (elections, sports, breaking news) |

💡 My Take: The most important NFR on this list isn't availability — it's search freshness. Twitter's entire value proposition during breaking news events is that it's faster than TV. If a tweet takes 5 minutes to appear in search, the product fails its core use case. This NFR is where the real-time search architecture comes from.

🗂️ High-Level Architecture: The Five Major Subsystems

Twitter's architecture breaks into 5 independently scalable subsystems. Each has its own scaling profile, failure mode, and data access pattern.

Client (Web / iOS / Android)
         ↓
API Gateway / Load Balancer
         ↓
┌────────────────────────────────────────┐
│  1. Tweet Ingestion Pipeline           │
│  2. Timeline Fanout Service            │
│  3. ML Feed Ranking Engine             │
│  4. Real-Time Search                   │
│  5. Notification System                │
└────────────────────────────────────────┘
         ↓
Storage Layer (Graph DB + Cache + Object Storage + Search Index)

Coupling these subsystems would mean one bottleneck takes down the entire product. They must be independently deployable, independently scalable, and independently failure-tolerant.

1️⃣ Tweet Ingestion Pipeline

This is the entry point — and the most underestimated subsystem. Every downstream service depends on this pipeline being fast, durable, and correct.

Key Design Decisions

🆔 Snowflake ID generation Every tweet gets a 64-bit globally unique ID that encodes: timestamp (41 bits) + datacenter ID (5 bits) + machine ID (5 bits) + sequence number (12 bits). This means tweet IDs are sortable by time without a database lookup — a massive performance gain for timeline construction. It also means Twitter can generate 4,096 unique IDs per millisecond per machine without coordination.
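
To make the bit layout concrete, here's a minimal Python sketch of packing and unpacking a Snowflake-style ID. The bit widths come from the description above; the epoch constant and function names are my illustration, not Twitter's actual code.

import time

TWITTER_EPOCH_MS = 1288834974657  # Twitter's published Snowflake epoch (Nov 2010); treat as an assumption here

def make_snowflake(ts_ms, datacenter_id, machine_id, sequence):
    # 41 bits of timestamp, 5 bits datacenter, 5 bits machine, 12 bits sequence
    assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32 and 0 <= sequence < 4096
    return ((ts_ms - TWITTER_EPOCH_MS) << 22) | (datacenter_id << 17) | (machine_id << 12) | sequence

def decode_snowflake(snowflake_id):
    return {
        "timestamp_ms": (snowflake_id >> 22) + TWITTER_EPOCH_MS,
        "datacenter_id": (snowflake_id >> 17) & 0x1F,
        "machine_id": (snowflake_id >> 12) & 0x1F,
        "sequence": snowflake_id & 0xFFF,
    }

# Because the timestamp sits in the high bits, sorting IDs numerically sorts tweets by time.
tweet_id = make_snowflake(int(time.time() * 1000), datacenter_id=3, machine_id=7, sequence=0)
print(decode_snowflake(tweet_id))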

📬 Kafka as the central nervous system The tweet creation event goes to Kafka, and every downstream system — fanout, search, notifications, trends — consumes from Kafka independently. This means:

  • Posting a tweet is fast (just write to DB + publish event)

  • Downstream systems can fail and replay from Kafka without data loss

  • New downstream consumers can be added without touching the tweet ingestion path

🖼️ Media pipeline Images and videos are uploaded to blob storage asynchronously. The tweet is confirmed to the user before media processing completes. A background pipeline handles compression, resizing (multiple dimensions), and CDN distribution. Video gets transcoded to multiple bitrates — the same adaptive streaming approach as YouTube.

🚦 Rate limiting at ingestion Twitter enforces rate limits at the API gateway: 300 tweets per 3 hours per account for standard users. This isn't just abuse prevention — it's also a system protection mechanism. Without rate limits, a single viral bot account could overwhelm the fanout pipeline.
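
Here's a minimal sketch of what a per-account limit check at the gateway could look like, assuming a Redis-backed fixed window. The 300-tweets-per-3-hours figure is from above; the key scheme and function name are illustrative.

import redis

r = redis.Redis()
LIMIT = 300
WINDOW_SECONDS = 3 * 60 * 60  # the 3-hour window

def allow_tweet(user_id):
    key = f"ratelimit:tweet:{user_id}"
    count = r.incr(key)                # atomic per-account counter
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # window starts at the first tweet
    return count <= LIMIT

if not allow_tweet("user_123"):
    print("429 Too Many Requests")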

💡 My Take: The Snowflake ID system is one of Twitter's most elegant architectural decisions — and most PMs have no idea it exists. Time-ordered IDs mean you can paginate a timeline with just an ID cursor instead of a timestamp query. That's the difference between a timeline that loads in 2ms and one that times out under load. Architecture serving product experience.

2️⃣ Timeline Fanout Service

This is the hardest problem in Twitter's architecture. And the answer is a trade-off that every senior PM should be able to articulate.

The Core Problem

When you load your Twitter home timeline, you expect to see tweets from everyone you follow, roughly in order, in under 2 seconds.

The naive approach: query all tweets from all accounts you follow, sort by time, return the top N.

Why this fails at scale:

  • Average user follows 200 accounts

  • 300,000 timeline requests/second

  • Each request triggers 200 DB queries

  • = 60 million queries/second

  • Every database dies

Fanout on Write vs. Fanout on Read

| Approach | How it works | Pros | Cons |
| --- | --- | --- | --- |
| Fanout on write (push) | When a tweet is posted, immediately write it to all followers' timeline caches | Timeline reads are O(1) — just read the cache | Writing one tweet from a 10M-follower account = 10M cache writes |
| Fanout on read (pull) | When a user opens the app, query all followed accounts' recent tweets | No write amplification | Read cost scales with the number of accounts followed — too slow at scale |
| Hybrid | Fanout on write for normal users, fanout on read for celebrities | Best of both worlds | Complex to implement and maintain |

Twitter uses the hybrid approach — and understanding why is the mark of a senior PM.

The Fanout Pipeline (for normal users)

Tweet event arrives from Kafka
  ↓
Fanout Service reads follower list from Social Graph DB
  ↓
For each follower (up to ~10,000):
  └── Write tweet ID to follower's Timeline Cache (Redis)
  
Timeline Cache: sorted set per user
  Key: user_id
  Value: sorted set of tweet IDs (by time)
  Max size: 800 tweet IDs per user
  TTL: active users refreshed; inactive users pruned
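
A toy version of that fanout worker, assuming redis-py and a placeholder get_followers() lookup (the real system reads a social graph service). The 800-entry cap comes from the cache spec above; everything else is illustrative.

import time
import redis

r = redis.Redis()
TIMELINE_MAX = 800  # cap from the cache spec above

def get_followers(author_id):
    # Placeholder for the social-graph lookup (a FlockDB-style service in the real system).
    return ["follower_1", "follower_2"]

def fan_out(author_id, tweet_id, tweeted_at):
    for follower_id in get_followers(author_id):
        key = f"timeline:{follower_id}"
        r.zadd(key, {str(tweet_id): tweeted_at})        # score = timestamp, member = tweet ID
        r.zremrangebyrank(key, 0, -(TIMELINE_MAX + 1))  # drop everything older than the newest 800

fan_out("author_42", tweet_id=1234567890123456789, tweeted_at=time.time())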

The Celebrity Exception

Accounts with more than ~10,000-50,000 followers are flagged as "celebrities" in the system. Their tweets are not fanned out at write time.

Instead, when a user opens their timeline:

  1. Read their precomputed timeline cache (tweets from non-celebrity follows)

  2. Separately query celebrity accounts they follow (fanout on read — but only for a small list of celebrities)

  3. Merge and rank the results

Why this matters: Lady Gaga has 85M followers. Fanning out one of her tweets would require 85 million Redis writes in seconds — overwhelming the fanout infrastructure for every other user on the platform simultaneously.

💡 My Take: The celebrity exception isn't a hack — it's a deliberate product decision that acknowledges not all users are architecturally equal. Most PMs treat the social graph as a uniform structure. The engineers at Twitter had to say: "Some nodes in this graph have properties that break our entire write path. We need a different product behavior for them." That's product-architecture thinking. You can't design this system without understanding both.

Timeline Cache Structure

Redis Sorted Set per user:
Key: "timeline:{user_id}"
Score: tweet timestamp (Unix epoch)
Value: tweet ID (Snowflake)

Read path:
ZREVRANGE "timeline:{user_id}" 0 99
→ Returns 100 most recent tweet IDs
→ Batch fetch tweet content from Tweet Cache
→ Merge with celebrity tweets
→ Return ranked timeline to client
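
And a sketch of the read path it feeds, merging the precomputed cache with a read-time pull for followed celebrities. The helper functions are stand-ins; because Snowflake IDs are time-ordered, sorting by ID doubles as sorting by recency before the ML ranking stage takes over.

import redis

r = redis.Redis()

def get_followed_celebrities(user_id):
    return ["celebrity_a"]   # placeholder graph lookup

def get_recent_tweets(author_id, limit=50):
    return []                # placeholder tweet-store query

def home_timeline(user_id, page_size=100):
    # 1. Precomputed portion: newest IDs from the Redis sorted set (the ZREVRANGE above).
    cached_ids = [int(t) for t in r.zrevrange(f"timeline:{user_id}", 0, page_size - 1)]
    # 2. Celebrity portion: pulled at read time, only for the few big accounts this user follows.
    celebrity_ids = [t for c in get_followed_celebrities(user_id) for t in get_recent_tweets(c)]
    # 3. Merge. Snowflake IDs are time-ordered, so sorting by ID is sorting by recency;
    #    the ML ranker reorders this candidate set afterwards.
    return sorted(set(cached_ids + celebrity_ids), reverse=True)[:page_size]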

3️⃣ ML Feed Ranking Engine 🤖

The home timeline used to be reverse-chronological. Then Twitter introduced algorithmic ranking — and the product changed fundamentally.

Hot take: The decision to algorithmically rank the Twitter feed is the equivalent of YouTube's 2012 watch-time decision. It's not an ML decision. It's a product decision about what Twitter optimizes for — and it has real consequences for what content and whose voice gets amplified.

Features Used in Ranking

| Feature Category | Examples |
| --- | --- |
| Engagement signals | Likes, retweets, replies, quote tweets per impression |
| Author affinity | How often you interact with this account |
| Content relevance | Semantic similarity between tweet and your interests |
| Network signals | Whether accounts you follow engaged with this tweet |
| Recency | Time since tweet was posted |
| Media presence | Tweets with images/video generally ranked higher |
| Real-time trending | Boost for content surfing a trending topic |
| Follower relationship | Direct follow vs. second-degree follow |
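
To make "features in, score out" concrete, here's an intentionally simplified linear scorer over the feature families in the table above. The real ranker is a learned model; these weights and the decay constant are invented for illustration.

import math

def score_tweet(features, now):
    age_hours = (now - features["posted_at"]) / 3600
    recency = math.exp(-age_hours / 6)                # decay constant is an assumption
    return (
        2.0 * features["engagement_rate"]             # likes/retweets/replies per impression
        + 1.5 * features["author_affinity"]           # how often this user interacts with the author
        + 1.0 * features["content_relevance"]         # interest-embedding similarity
        + 0.8 * features["network_engagement"]        # followed accounts engaged with it
        + 0.7 * features["trending_boost"]            # riding a trending topic
        + 0.5 * features["has_media"]
        + 1.2 * recency
    )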

Real-Time vs. Batch Signals

| Signal | Pipeline | Update frequency |
| --- | --- | --- |
| User interest embeddings | Batch offline | Daily |
| Author engagement history | Batch offline | Daily |
| Real-time engagement velocity | Near real-time Kafka | Minutes |
| Trending topic membership | Real-time | Seconds |
| Session context (what you just liked) | Real-time | Immediate |
| A/B test assignment | Real-time | Milliseconds |

What the ranking model is told to optimize is the central product tension at every social platform. The loss function is the product strategy.

💡 My Take: Twitter has publicly struggled with this more than any other platform because its architecture made the objective function more visible. When Elon Musk open-sourced parts of the ranking algorithm in 2023, it revealed how engagement signals were weighted — including a significant boost for accounts the algorithm classified as "power users." The ranking model is not neutral. It encodes product values. Every PM building an AI-ranked feed owns those values whether they acknowledge it or not.

4️⃣ Real-Time Search Architecture

Twitter's search is architecturally distinct from Google-style search in one critical way: freshness beats relevance.

A tweet posted 20 seconds ago must appear in search results. Traditional search indexes can't do this — they batch-index on crawl cycles measured in hours or days.

The Earlybird Index

Twitter's custom search index, called Earlybird, is designed around one constraint: index a tweet in under 10 seconds.

Traditional inverted index assumptions that Earlybird breaks:

  • Documents are immutable (tweets can be deleted, liked, retweeted — engagement signals change constantly)

  • Relevance is static (a tweet's ranking changes as engagement accumulates)

  • Index size is bounded (500M tweets/day means the index grows continuously)

Earlybird solves this by:

  • Keeping only recent tweets in the real-time index (last 7–30 days)

  • Updating engagement signals in-memory without full re-indexing

  • Routing older queries to a separate archive index

  • Sharding by time bucket, not by content — so all new tweets land in the same shard for fast sequential writes
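
A toy illustration of that last point — time-bucketed sharding, where every new tweet lands in the newest segment and queries only touch recent buckets. Bucket width, data structures, and names are assumptions, not Earlybird internals.

from collections import defaultdict
from datetime import datetime, timezone

BUCKET_HOURS = 12  # assumed bucket width

# bucket key -> {term -> set of tweet IDs}: a toy stand-in for an index segment
segments = defaultdict(lambda: defaultdict(set))

def bucket_for(ts):
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{dt:%Y-%m-%d}-{(dt.hour // BUCKET_HOURS) * BUCKET_HOURS:02d}"

def index_tweet(tweet_id, text, ts):
    segment = segments[bucket_for(ts)]     # every new tweet lands in the newest segment
    for term in text.lower().split():
        segment[term].add(tweet_id)

def search_recent(term, buckets):
    # Query only the newest buckets; older content lives in a separate archive index.
    return set().union(*(segments[b].get(term, set()) for b in buckets))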

Trending Topic Detection

Trend detection consumes the same tweet stream as search indexing:

Tweet stream → Kafka
  ↓
Sliding window counter (1-hour, 6-hour windows)
  Count term frequency by region
  ↓
Anomaly detection
  Compare current frequency vs. historical baseline
  Flag: "this term is appearing 10× faster than normal"
  ↓
Trend candidate
  ↓
Human/ML filter
  Remove: spam, manipulation, policy violations
  ↓
Published to Trending Topics (by country, by interest)

💡 My Take: Trending topics is a product feature that appears simple and is architecturally profound. The hard part isn't detecting frequency spikes — it's defining what "trending" means. Is a term trending if it's always popular? (No — baseline matters.) Is it trending if it's being artificially amplified by bots? (No — manipulation detection required.) Is it trending if it's a slur that happens to spike? (No — policy filter required.) Every one of those is a product decision embedded in a data pipeline. The PM who owns trends owns all of them.

5️⃣ Notification System

Notifications are the engagement loop that brings users back. They are also one of the highest-complexity distributed systems Twitter runs.

Notification Pipeline

The Batching Decision

Why does Twitter sometimes show "X and 47 others liked your tweet" instead of 48 separate notifications?

The naive approach: Send a push notification for every like.

The problem: A viral tweet can get 10,000 likes in an hour. Sending 10,000 push notifications to one person's phone would:

  • Drain their battery

  • Saturate APNs/FCM rate limits for Twitter's sender ID

  • Result in the user disabling notifications (permanent engagement loss)

The solution: Batch aggregation within a time window. Group all likes on the same tweet within a 5-minute window into a single notification. This is a product decision — what's the right window? Too short and you spam. Too long and the notification feels stale.
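
A minimal sketch of that window-based aggregation: buffer like events per (recipient, tweet) pair and flush one notification per window. The 5-minute window is from above; the in-memory buffer and send_push() are illustrative stand-ins for the real pipeline.

from collections import defaultdict

WINDOW_SECONDS = 5 * 60   # the 5-minute window discussed above

# (recipient_id, tweet_id) -> actors who liked it during the current window
pending = defaultdict(list)

def on_like_event(recipient_id, tweet_id, actor):
    pending[(recipient_id, tweet_id)].append(actor)

def flush_window():
    # In this sketch, a timer would call this every WINDOW_SECONDS.
    for (recipient_id, tweet_id), actors in pending.items():
        if len(actors) == 1:
            send_push(recipient_id, f"{actors[0]} liked your tweet")
        else:
            send_push(recipient_id, f"{actors[0]} and {len(actors) - 1} others liked your tweet")
    pending.clear()

def send_push(recipient_id, message):
    print(f"push to {recipient_id}: {message}")  # stand-in for APNs/FCM delivery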

Push Delivery at Scale

Channel

Protocol

Latency target

Volume

iOS push

APNs

<5s

Billions/day

Android push

FCM

<5s

Billions/day

Web push

Web Push API

<10s

Millions/day

In-app (WebSocket)

Persistent connection

<1s

Real-time

Email

SMTP

Minutes

Inactive users only

💡 My Take: Notification design is one of the most consequential PM decisions in consumer apps — and one of the least rigorously thought through. Every notification is a bet: "This will bring the user back, not push them to disable notifications entirely." Twitter's notification system has frequency capping, priority scoring, and deduplication because someone had to quantify that trade-off. The PM who owns notifications owns the engagement loop — and the churn risk. These are not UX decisions. They are product architecture decisions.

💾 Storage Layer: The Database Decisions

Different data has fundamentally different access patterns. One database for all of it would fail at every workload.

| Data Type | Storage System | Justification |
| --- | --- | --- |
| Tweet content | Manhattan (Twitter's distributed KV store) / MySQL sharded | High write throughput; time-ordered access |
| Social graph (follows) | FlockDB / distributed graph store | 200B+ edges; follow/unfollow is write-heavy |
| Timeline cache | Redis sorted sets | O(1) reads; in-memory for speed |
| User profiles | Manhattan / MySQL | Strong consistency for auth; moderate read volume |
| Media (images, video) | Blob store (S3-compatible) + CDN | Immutable files; high read throughput globally |
| Search index | Earlybird (custom inverted index) | Real-time ingest + recency-biased ranking |
| DMs | Encrypted Cassandra | Append-only message history; no global read requirement |
| Notifications | Cassandra | Write-heavy; time-ordered; eventual consistency fine |
| Like/retweet counts | Redis → async flush to Manhattan | Counter aggregation; same pattern as YouTube view counts |
| ML feature store | Manhattan + HDFS | Fast serving reads; batch training data |
| Trending counters | Redis with sliding windows | In-memory frequency counting; short TTL |

Why Like Counts Are Eventually Consistent

Same principle as YouTube view counts — and just as important to articulate.

The naive approach: Every like increments the count in MySQL with a row lock.

The problem: A viral tweet gets 100,000 likes in 10 minutes = ~167 like increments per second on a single row. Row-level locking creates a serial bottleneck. Every like waits for every previous like to commit.

The right approach:

Like event
  ↓
Kafka stream (append-only, no locking)
  ↓
Batch aggregation (every 30–60 seconds)
  ↓
Single atomic increment to Redis counter
  ↓
Periodic async flush to persistent store

The like count shown might be 5–10 seconds behind reality. No user notices. The system scales horizontally.
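
In code, the pattern looks roughly like this, assuming redis-py; the key scheme and persist_count() are illustrative, and the durable store would be Manhattan or MySQL in practice.

import redis

r = redis.Redis()

def apply_like_batch(tweet_id, likes_in_batch):
    # One atomic increment per aggregated batch instead of one row lock per like.
    r.incrby(f"likes:{tweet_id}", likes_in_batch)

def flush_counts(tweet_ids):
    # Periodic job: copy the in-memory counters into the durable store.
    for tweet_id in tweet_ids:
        count = int(r.get(f"likes:{tweet_id}") or 0)
        persist_count(tweet_id, count)

def persist_count(tweet_id, count):
    print(f"flush tweet {tweet_id} -> {count} likes")  # stand-in for the persistent-store write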

💡 My Take: Every time a PM asks for "real-time accurate counters" on a viral piece of content, they are asking for a global distributed lock. The right response is: "What decision does accuracy serve? If it's display, eventual consistency is imperceptible to users. If it's billing, you need exact counts — use a different pipeline." Always tie the consistency requirement to the business decision it serves.

The answer that wins interviews:

"The hardest design decision in Twitter is not where to store tweets — it's how to deliver them. Fanout on write is O(1) to read but O(followers) to write. Fanout on read is the opposite. The hybrid approach — write fanout for normal users, read fanout for celebrities — solves both, but requires defining 'celebrity' as a system concept, maintaining a flag in the social graph, and merging two different data sources at read time. The complexity is worth it because it's the only approach that meets both our write throughput and read latency SLAs simultaneously."

Twitter/X System Design Interview Questions

Q: What's the biggest architectural difference between Twitter and Facebook's feed?

Facebook has a denser social graph (average 300+ friends vs. Twitter's ~200 follows) but fewer power-law extremes — Facebook's most connected users have hundreds of thousands of connections, not hundreds of millions. Twitter's celebrity problem is more extreme, which is why Twitter's hybrid fanout is more complex. Facebook uses a similar hybrid but with different thresholds.

Q: How does Twitter handle tweet deletion?

Tweet deletion is a tombstone operation: a "deleted" flag is set in the tweet store, the tweet ID is published to Kafka as a delete event, and downstream caches (timeline cache, search index) consume the event and remove the tweet. Timeline caches are eventually consistent — a deleted tweet may appear for seconds to minutes in already-fetched timelines. Search indexes remove on the delete event within the freshness SLA.
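
Sketched as a downstream consumer (assuming kafka-python, a Redis timeline cache, and an illustrative event shape), the tombstone flow looks roughly like this:

import json
import redis
from kafka import KafkaConsumer   # kafka-python, assumed client library

r = redis.Redis()
consumer = KafkaConsumer("tweet-events", value_deserializer=lambda v: json.loads(v.decode()))

def remove_from_search_index(tweet_id):
    print(f"remove {tweet_id} from the real-time index")  # stand-in

for event in consumer:
    msg = event.value
    if msg.get("type") == "tweet_deleted":
        tweet_id = msg["tweet_id"]
        # Purge cached timelines; already-rendered timelines may still show the tweet briefly.
        for follower_id in msg.get("follower_ids", []):
            r.zrem(f"timeline:{follower_id}", str(tweet_id))
        remove_from_search_index(tweet_id)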

Q: How does real-time trending work at country level?

Trend detection runs per region by routing tweets through geo-tagged Kafka partitions. Each regional pipeline maintains its own sliding window frequency counters. A term can be trending in Brazil without trending globally. The anomaly detection baseline is also regional — "World Cup" may trend in Brazil at a lower frequency spike than in countries where soccer is less dominant.
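
A toy version of that regional baseline comparison; the spike ratio, minimum count, and baseline source are invented for illustration.

def is_trending(term, region, current_count, baselines, spike_ratio=10.0, min_count=500):
    # baselines: historical per-window frequency for (term, region)
    baseline = baselines.get((term, region), 1.0)
    return current_count >= min_count and current_count / baseline >= spike_ratio

baselines = {("world cup", "BR"): 2000.0, ("world cup", "US"): 150.0}
print(is_trending("world cup", "BR", 12000, baselines))  # high regional baseline: only a 6x spike
print(is_trending("world cup", "US", 3000, baselines))   # low regional baseline: a 20x spike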

Q: What happens architecturally when a major event causes a tweet spike?

The system has auto-scaling on the Kafka consumer groups (fanout workers) and pre-warming logic for predicted high-traffic events (Super Bowl, elections). The Snowflake ID system doesn't require coordination so it scales horizontally with no bottleneck. The timeline cache has no single point of failure — Redis clusters are sharded by user ID. The weak point historically has been the fanout service — when everyone tweets simultaneously about the same event, follower graph reads spike. Twitter mitigates this with a read-through cache on the social graph.

Q: How does the "For You" tab differ architecturally from "Following"?

"Following" is the classic fanout cache — chronological tweets from accounts you explicitly follow. "For You" is the ML-ranked feed that includes content from accounts you don't follow, based on interest modeling, network signals, and engagement velocity. They share the same ranking infrastructure but have different candidate pools: Following uses the fanout cache; For You uses a broader candidate retrieval step similar to YouTube's two-tower model.

Q: How should an AI PM talk about Twitter's ranking system in an interview?

Start with the objective function: "Twitter's ranking model optimizes for engagement, but the definition of 'engagement' encodes product values — whether you weight replies, retweets, or time spent changes what content gets amplified." Then describe the multi-stage pipeline (candidate scoring → deep ranking → policy filters). Name the real-time vs. batch signal split. Close with what you'd measure beyond engagement: conversation health, content diversity, creator distribution. Interviewers want to know you understand that the model is a product decision, not just a technical component.

💡 The Honest Take

Twitter's architecture is one of the most instructive in tech — not because it's the most elegant, but because the trade-offs are so visible.

Every architectural decision is a product decision in disguise:

  • Fanout on write vs. read = how do you balance creator and consumer experience?

  • The celebrity exception = are all users equal in your system?

  • Eventually consistent like counts = what level of precision does the product actually need?

  • The ranking objective function = what does your platform want to optimize for in the world?

Understanding the architecture without understanding the trade-offs is just memorizing boxes and arrows.

Your edge as a senior AI PM isn't that you can draw the fanout diagram. It's that you can explain why it's structured that way, what the alternative was, and what product goal it serves.

That's the difference between a PM who can talk about system design and one who thinks in system design. 🚀

📬 Found this useful? AI PM Insider publishes every week for AI PMs and leaders building at the frontier. This is Part 2 of the Core System Design & AI System Design series. Join subscribers at aiskillshub.io

Written by Ashima Malik · LinkedIn
