Real-Time Read-Heavy Database Workloads: Considerations and Tips
Reading and writing are distinctly different beasts. This is true with reading and writing words, reading and writing code, and also about reading and writing data to a database. So, when it comes to optimizing database performance, your read:write ratio really does matter.
I recently wrote about performance considerations that are important for write-heavy database workloads — covering factors like log-structured merge (LSM) tree vs. B-tree engines, payload size, compression, compaction and batching. But read-heavy database workloads bring a different set of challenges, including:
- Scaling a cache: Many teams try to speed up reads by adding a cache in front of their database, but the cost and complexity can become prohibitive as the workload grows.
- Competing workloads: Things might work well initially, but as new use cases are added, a single workload can end up bottlenecking all the others.
- Constant change: As your data set grows or user behaviors shift, hot spots might surface.
Let’s explore high-level considerations and best practices for handling latency-sensitive read-heavy workloads.
What Is a Real-Time Read-Heavy Workload?
When we talk about a “real-time read-heavy” workload, we mean workloads that:
- Involve a large amount of sustained traffic; e.g., over 50K operations per second (OPS).
- Involve more reads than writes.
- Are bound by strict latency service-level agreements (SLAs); e.g., single-digit millisecond P99 latency.
Here are a few examples of how they manifest themselves in the wild:
- Betting: Everyone betting on a given event is constantly checking individual player, team and game stats as the match progresses.
- Social networks: A small subset of people are posting new content, while the vast majority of users are typically just browsing through their feeds and timelines.
- Product catalogs: As with social media, there’s a lot more browsing than actual updating.

Source: ScyllaDB
Performance Considerations
Here are some key considerations that impact read performance in real-time database systems.
The Database’s Read Path
To understand how databases like ScyllaDB process read operations, let’s recap its read path. When you submit a read (a SELECT statement), the database first checks for the requested data in memtables, which are in-memory data structures that temporarily hold your recent writes.

Source: ScyllaDB
Additionally, the database checks whether the data is present in the cache. Why is this extra step necessary? Because the memtable may not always hold the latest data. Sometimes data could be written out of order, especially if applications consume data from unordered sources. As the protocol allows for clients to manipulate record timestamps to prevent correct ordering, checking both the memtable and the cache is necessary to ensure that the latest write gets returned.
Then, the database takes one of two actions:
- If the data is stored on the disk, the database populates the cache to speed up subsequent reads.
- If the data doesn’t exist on disk, the database notes this absence in the cache — avoiding further unnecessary lookups there.
As memtables flush to disk, the data also gets merged with the cache. That way, the cache ends up reflecting the latest on-disk data.
Hot vs. Cold Reads
Reading from cache is always faster than reading from disk. The more data your database can serve directly from cache, the better its performance (since reading data from memory has a practically unlimited fetch ceiling).
But how can you tell whether your reads are going to cache or disk? Monitoring. You can use tools such as the ScyllaDB Monitoring Stack to learn all about your cache hits and misses. The fewer cache misses, the better your read latencies.
ScyllaDB uses a least recently used (LRU) caching strategy, similar to Redis and Memcached. When the cache gets full, the least-accessed data is evicted to make room for new entries. With the LRU approach, you need to be mindful about your reads. You want to avoid situations where a few “expensive” reads end up evicting important items from your cache.
If you don’t optimize cache usage, you might encounter a phenomenon called “cache thrashing.” That’s what happens when you’re continuously evicting and replacing items in your cache, essentially rendering the cache ineffective. For instance, full table scans can create significant cache pressure, particularly when your working set size is larger than your available caching space.
During a scan, if a competing workload relies on reading frequently cached data, its read latency will momentarily increase because those items were evicted. To prevent this situation, expensive reads should specify options like ScyllaDB’s BYPASS_CACHE to prevent its results from evicting important items.
Paging
Paging is another important factor to consider. It’s designed to prevent the database from running out of memory when scanning through large results. Basically, rows get split into pages as defined by your page size, and selecting an appropriate page size is essential for minimizing end-to-end latency.
For example, assume you have a quorum read request in a three-node cluster. Two replicas must respond for the request to be successful. Each replica computes a single page, which then gets reconciled by the coordinator before returning data back to the client. Note that:
- ScyllaDB latencies are reported per page. If your application latencies are high, but they’re low on the database side, it is an indication that your clients may be often paging.
- Smaller page sizes increase the number of client-server round trips. For example, retrieving 1,000 rows with a page size of 10 requires 100 client-server round trips, impacting latency.
Testing various page sizes helps with finding the optimal balance. Most drivers default to 5,000 rows per page, which works well in most cases, but you may want to increase from the defaults when scanning through wide rows or during full scans — at the expense of letting the database work more before receiving a response. Sometimes trial and error is needed to get the page size nicely tuned for your application.
Optimizing Read-Heavy Workloads
While ScyllaDB’s LSM tree storage engine makes it quite well-suited for write-heavy workloads, our engineers have introduced a variety of features that optimize it for ultra-low latency reads as well. These include unified cache, SSTable index caching, workload prioritization, heat-weighted load balancing, prepared statements and high concurrency.
There are many ways you can keep latencies low with read-heavy workloads — even on databases such as ScyllaDB that are also optimized for write-heavy workloads. In fact, ScyllaDB performance is comparable to dedicated caching solutions like Memcached for certain workloads.
If you want to learn more, here are some firsthand perspectives from teams that tackled some interesting read-heavy challenges:
- Discord: With millions of users actively reading and searching chat history, Discord needs ultra-low-latency reads and high throughput to maintain real-time interactions at scale.
- Epic Games: To support Unreal Engine Cloud, Epic Games needed a high-speed, scalable metadata store that could handle rapid cache invalidation and support metadata storage for game assets.
- ZeroFlucs: To power its sports betting application, ZeroFlucs (now Caesars Entertainment) had to process requests in near real-time, constantly, and in a region local to both the customer and the data.
Also, take a look at the following video, which goes into even greater depth on these read-heavy challenges and also walks through what these workloads look like on ScyllaDB.