System Design
Cheat Sheet
Tips
- Lead the discussion
- Take notes
- Create TODO list
- Back or envelope calculation
- API
- Error handling
- Discuss trade offs!
- Define success metrics
- Cleanup: Eviction, workers to clean things up
Powers of 10
10^x | Bytes |
---|---|
10^0 | 1 Byte |
10^2 | 128 Bytes |
- | 256 Bytes |
10^3 | KB (Kilo) |
10^6 | MB (Mega) |
10^9 | GB (Giga) |
10^12 | TB (Tera) |
10^15 | PB (Peta) |
10^18 | EB (Exa) |
10^21 | ZB (Zetta) |
10^24 | YB (Yotta) |
1 byte = 8 bits
Timing 35m
Remember! Discuss Tradeoffs
5m Discovery / Estimations
- Functional - How the user interacts with the system
- Non Functional - Consistency, Availability, Latency, Fault Tolerant, Write vs Read
- Estimations QPS - Back of envelope, read, write, data storage
- Determine use caes
- API calls per day DAU
- Divide by 100k (10^5)
- Storage
- Incoming/Outgoing Data
- GB / Day / 86,400 = GB per sec
- Worst Case - What is the worst case?
2-5m on API
5m on High Level (diagram) - breadth first!
- Stay high level, avoid going into details here
- Microservices
2-5m on Data Model
- Tables logical structure and relations
- Total DB Storage
15m dive deeper into thing
- Walk through data flow
- Handle Edge conditions
- How does this get stored
- What can go wrong
- Prove your design is correct
2m on Scaling
- How many servers to handle the QPS
Data Stores
Type | Description | Pros | Cons |
---|---|---|---|
SQL | Collection of data items organized in tables | - ACID - Transactions - Relational - Strict Schema - Consitent - Durable |
- Scaling Writes |
NoSQL | NoSQL is a collection of data items which favors availability over consistency | - Available - Flexible Schema - Non Relational - Large Storage Capacity TB or PB - High IOPS - Horizontally Scalable |
- Eventually Consistent |
NoSQL Key/Value | High performance for simple data models due to being stored in memory | - O(1) Read/Write - Very Fast |
- Simple Data Model - Limited Storage Capacity |
NoSQL Document | A document store is basically a fancy key-value store. The values are blobs of semi-structured data, such as JSON or XML, and we treat the data store like it’s just a big array of these blobs. The query language of the document store will then allow you to filter or sort based on the content inside of those document blobs. | - Documents - Scalable - Available |
- Not good for data aggregation - Not ACID |
NoSQL Column | A wide column store is somewhere in between a document store and a relational DB. It still uses tables, rows, and columns like a relational DB, but the names and formats of the columns can be different for various rows in the same table. This strategy combines the strict table structure of a relational database with the flexible content of a document store. | - Retrieve Aggregate Data Efficiently - Scalable - Available - Flexible Schema |
- Not ACID |
NoSQL Graph | Each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships. | - Fast traversal of relationships - Relations |
|
Search Engine | Special data store for searching text based documents. Could use a Trie or Inverted Index | - Very good text seraches - Trie (Type Ahead) - Inverted Index (Full Text) |
|
Message Queues | Good for when you need to ship some data between services in a way that is fast, reliable, and distributed. | - Reliable - Fast - Support Distributing Loads |
- Asynchronous |
QuadTrees | Tree structure where every non-leaf node has 4 childres. | - Geolocation + Caching - Hit Detection - Find Nearest Neighbor - Searching Sparse Data - Keep a balance among the precision and validity of results |
Byte Estimations
Name | Bytes | Use |
---|---|---|
Ascii Char | 1 | Backend |
Unicode Char | 2 | User Facing |
Int | 8 | 16 more precision |
Ascii Word | 10 | Text |
Storage Assumptions
Type | Quality | Size | Example |
---|---|---|---|
Images | Low | 20 KB | Thumbnails |
Images | Medium | 200 KB | Most website images |
Images | High | 2 MB | Most smartphone images |
Video | Low | 256 MB | One Min 720p video |
Video | Medium | 403 MB | One Min 1080p video |
Video | High | 2 GB | One min 4K video |
eBook | - | 1 - 5 MB | |
song mp3 | - | 3 - 4 MB | |
comment | - | 100 bytes |
API Tips
- Use Nouns
- Follow CRUD
Methods
Method | Purpose | Success Response Codes | Notes |
---|---|---|---|
GET | Read | 200 | Transfer a current representation of the target resource |
POST | Create | 200 created + result, 201 created | Perform resource-specific processing on the request payload |
PUT | Update | 201 created, 200/204 updated | Idempotent, Replace all current representations of the target resource with the request payload |
DELETE | Delete | 202 likely to succeed, 204 completed, 200 result | Remove all current representations of the target resource |
- POST-PUT Creation Pattern
- Client sends empty POST which creates an empty resource with no side effects
- Client sends a PUT with content for the resource, since PUT is idempotent resending it multiple times will have no ill effects
Status Codes
Code | Name | Description |
---|---|---|
200 | OK | Request succeeded |
201 | Created | Request succeeded, new resource created |
202 | Accepted | Request received but not acted upon |
204 | No Content | Request succeeded, client can stay on current page |
301 | Moved Permanently | The URL has been moved permanently |
302 | Found | The URL has been moved temporarily |
304 | Not Modified | The response has not been modified so the client can use the same cached response |
400 | Bad Request | Invalid request syntax |
401 | Unauthorized | The client must authenticate itself to get the requested response |
403 | Forbidden | The client is authenticated, but it’s not allowed to access a resource |
404 | Not Found | Server cannot find the resource |
405 | Method Not Allowed | The request method is knwon by the server but not supported by the target resource |
409 | Conflict | The request conflicts with the current state of the server |
500 | Internal Server Error | The server encountered a situation it does not know how to handle |
502 | Bad Gateway | The server while working as a gateway got an invalid response from an upstream server |
503 | Service Unavailable | The server is not ready to handle the request, could be overloaded |
Hashing / IDs
Name | Bits / Bytes | Chars | Purpose |
---|---|---|---|
UUIDv4 | 128 / 16 | 36 | Unique Random ID |
MD5 | 128 / 16 | 32 | Simple Hash Vulnerable |
SHA-1 | 160 / 20 | 40 | Better Still Vulnerable |
SHA-2 (256) | 256 / 32 | 64 | NIST Recommended Secure |
SHA-3 (256) | 256 / 32 | 64 | May be slightly better than SHA-2 |
Encodings
Type | Details | Notes |
---|---|---|
Base64 | 10 Digits + 26 Lower Char + 26 Upper Char | Includes +/ and 65th char = for padding |
Base62 | 10 Digits + 26 Lower Char + 26 Upper Char | All alphanumeric, no padding |
Base58 | Base62 without 0 I O l |
|
Base32 | 6 Digits (2-7 ) + 26 Upper Char |
Includes = for padding. Takes 20% more space than Base64 |
z-base-32 | 8 Digits (1-9 exclude 2 ) + 24 Lower Char (excludes l and v ) |
No padding |
Base16 (Hex) | Uses 0–9 and A-F |
Request Types
Type | Details |
---|---|
Polling | Request made from client every x seconds |
Long Polling | Request made from client, if data is not available it hangs until the data is available then responds or times out. Client sends the next request |
Web Sockets | BiDirectional, persistent connection |
Tradeoffs
Databases
MySQL vs NoSQL
Requests
Long Polling Requests Websockets
Powers Table
Power 10 | Bytes | Name |
---|---|---|
10^0 | 2^0 = 1 | Byte |
10^2 | 2^7 = 128 | Bytes |
- | 2^8 = 256 | Bytes |
10^3 | 2^10 = 1k | 1 KB (Kilo) |
10^6 | 2^20 = 1mil | 1 MB (Mega) |
10^9 | 2^30 = 1bil | 1 GB (Giga) |
10^12 | 2^40 = 1tril | 1 TB (Tera) |
10^15 | 2^50 = 1quadr | 1 PB (Peta) |
10^18 | 2^60 = 1quint | 1 EB (Exa) |
10^21 | 2^70 = 1sext | 1 ZB (Zetta) |
10^24 | 2^80 = 1sept | 1 YB (Yotta) |
Shortcuts
- K * K = M
- G / M = K
Bytes Storage
* 1 bit: a binary decision
* 1 byte: a character
* 5 Megabytes: The complete works of Shakespeare
* 2 Gigabytes: 20 meters of shelved books
* 10 Terabytes: The printed collection of the US Library of Congress
* 200 Petabytes: All printed material
* 5 Exabytes: All words ever spoken by human beings
Unique Characters
10 Characters
Base62 Encoded
62^10 = 8.3 * 10^17 over a quadrillion
6 Characters
62^6 = 58B
Latency
Latency Comparison Numbers
--------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 10,000 ns 10 us
Send 1 KB bytes over 1 Gbps network 10,000 ns 10 us
Read 4 KB randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
HDD seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps 10,000,000 ns 10,000 us 10 ms 40x memory, 10X SSD
Read 1 MB sequentially from HDD 30,000,000 ns 30,000 us 30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
Caching
Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher will first lookup if the request has been made before and try to find the previous result to return, in order to save the actual execution.
Databases often benefit from a uniform distribution of reads and writes across its partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.
Disadvantage(s): cache
- Need to maintain consistency between caches and the source of truth such as the database through cache invalidation.
- Cache invalidation is a difficult problem, there is additional complexity associated with when to update the cache.
- Need to make application changes such as adding Redis or memcached.
Types
Client
Caches can be located on the client side (OS or browser), server side, or in a distinct cache layer.
CDN
CDNs are considered a type of cache.
Web Server
Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.
Database
Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.
Application
In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache invalidation algorithms such as least recently used (LRU) can help invalidate 'cold' entries and keep 'hot' data in RAM.
Redis has the following additional features: - Persistence option - Built-in data structures such as sorted sets and lists
There are multiple levels you can cache that fall into two general categories: database queries and objects: - Row level - Query-level - Fully-formed serializable objects - Fully-rendered HTML
Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult
Database Query Level
Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues: - Hard to delete a cached result with complex queries - If one piece of data changes such as a table cell, you need to delete all cached queries that might include the changed cell
Object Level
See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure(s): - Remove the object from cache if its underlying data has changed - Allows for asynchronous processing: workers assemble objects by consuming the latest cached object
Suggestions of what to cache: - User sessions - Fully rendered web pages - Activity streams - User graph data
Updating the Cache
Since you can only store a limited amount of data in cache, you'll need to determine which cache update strategy works best for your use case.
Cache-aside
The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:
- Look for entry in cache, resulting in a cache miss
- Load entry from the database
- Add entry to cache
- Return entry
Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.
Disadvantage(s): cache-aside
- Each cache miss results in three trips, which can cause a noticeable delay.
- Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL) which forces an update of the cache entry, or by using write-through.
- When a node fails, it is replaced by a new, empty node, increasing latency.
Write-through
The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:
- Application adds/updates entry in cache
- Cache synchronously writes entry to data store
- Return
Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.
Disadvantage(s): write through
- When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue.
- Most data written might never be read, which can be minimized with a TTL.
Write-behind
In write-behind, the application does the following:
- Add/update entry in cache
- Asynchronously write entry to the data store, improving write performance
Disadvantage(s): write-behind
- There could be data loss if the cache goes down prior to its contents hitting the data store.
- It is more complex to implement write-behind than it is to implement cache-aside or write-through.
Refresh-ahead
You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.
Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.
Disadvantage(s): refresh-ahead
- Not accurately predicting which items are likely to be needed in the future can result in reduced performance than without refresh-ahead.
Availability vs Consistency
- Consistency - A read is guaranteed to return the most recent write for a given client.
- Availability - A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
- Partition Tolerance - The system will continue to function when network partitions occur.
Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software tradeoff between consistency and availability.
CP - Consistency/Partition Tolerance
Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if
your business needs require atomic reads and writes.
AP - Availability and Partition Tolerance
Responses return the most readily available version of the data available on any node, which might not be the latest.
Writes might take some time to propagate when the partition is resolved.
AP is a good choice if the business needs allow for eventual consistency or when the system needs to continue working despite external errors.
Databases
SQL vs NoSQL
Reasons for SQL
- Structured data
- Strict schema
- Relational data
- Need for complex joins
- Transactions
- Clear patterns for scaling
- More established: developers, community, code, tools, etc
- Lookups by index are very fast
Reasons for NoSQL
- Semi-structured data
- Dynamic or flexible schema
- Non-relational data
- No need for complex joins
- Store many TB (or PB) of data
- Very data intensive workload
- Very high throughput for IOPS
Sample data well-suited for NoSQL:
- Rapid ingest of clickstream and log data
- Leaderboard or scoring data
- Temporary data, such as a shopping cart
- Frequently accessed ('hot') tables
- Metadata/lookup tables
SQL
A relational database like SQL is a collection of data items organized in tables.
ACID is a set of properties of relational database transactions.
- Atomicity - Each transaction is all or nothing
- Consistency - Any transaction will bring the database from one valid state to another
- Isolation - Executing transactions concurrently has the same results as if the transactions were executed serially
- Durability - Once a transaction has been committed, it will remain so
NoSQL
NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.
BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses availability over consistency.
- Basically available - the system guarantees availability.
- Soft state - the state of the system may change over time, even without input.
- Eventual consistency - the system will become consistent over a period of time, given that the system doesn't receive input during that period.
Key - Value Store
DynamoDB, Redis, Memcached
Abstraction: hash table
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph database.
Document Store
MongoDB, DynamoDB
Abstraction: key-value store with documents stored as values
A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.
Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.
Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.
Document stores provide high flexibility and are often used for working with occasionally changing data.
Wide Column Store
Bigtable, HBase, Cassandra
Abstraction: nested map
ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>
A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.
Google introduced Bigtable as the first wide column store, which influenced the open-source HBase often-used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as BigTable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.
Wide column stores offer high availability and high scalability. They are often used for very large data sets.
Graph
Neo4j, FlockDB
Abstraction: graph
In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.
Graphs databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely-used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.
Asynchronism
Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.
Message Brokers
Asynchronous messaging is a messaging scheme where message production by a producer is decoupled from its processing by a consumer. When dealing with messaging systems, we typically identify two main messaging patterns – Message Queuing and Publish/Subscribe.
Message Queueing
Multiple producers can send messages to the same queue; however, when a consumer processes a message, it is locked or removed from the queue and is no longer available. Only a single consumer consumes a specific message.
As a side note, if the consumer fails to process a certain message, the messaging platform typically returns the message to the queue where it is made available for other consumers. Besides temporal decoupling, queues allow us to scale producers and consumers independently as well as providing a degree of fault-tolerance against processing errors.
Publish -> Subscribe
In the Publish/Subscribe (or Pub/Sub) communication pattern, a single message can be received and processed by multiple subscribers concurrently.
This pattern allows a publisher, for example, to notify all subscribers that something has happened in the system. Many queueing platforms often associate pub/sub with the term topics. In RabbitMQ, topics are a specific type of pub/sub implementation (a type of exchange to be exact), but during this post, I refer to topics as a representation of pub/sub as a whole.
Generically speaking, there are two types of subscriptions:
- An ephemeral subscription, where the subscription is only active as long the consumer is up and running. Once the consumer shuts down, its subscription and yet-to-be processed messages are lost.
- A durable subscription, where the subscription is maintained as long as it is not explicitly deleted. When the consumer shuts down, the messaging platform maintains the subscription and message processing can be resumed later.
Advantages
- Provided communication between services that may not be running at the same time. The producer can send messages regardless of whether the consumer is active or not. All it needs is a running message broker. The same applies to the consumer.
- Improved system performance by introducing asynchronous processing. High-consuming tasks can be distributed to separate processes. That will fasten up your application and increase user experience.
- Increased reliability by guaranteeing the transmission of messages. Message brokers offer a redelivering mechanism. In case of consumer failure, it can redeliver the message immediately or after some specified time. It also supports routing of not delivered messages – it’s called a dead-letter mechanism.
Disadvantages
- Eventual Consistency Some components could not have up-to-date data until the messages are propagated and processed.