In this blog I’ll explain how to get 100% cache hit rate by using CDC (Change Data Capture) technology and Redis cache.
There are multiple benefits of having caching layer in front of back-end database system.
By fetching data from the cache instead of back-end we are actually free up valuable database resources for something else (e.g. to better handle peak load during peak load).
In case you are using commercial database, you can achieve savings in license costs, since it usually depends on the number of CPU.
Another benefit may be to have cache closer to your application which may significantly improve latency of the system.
In addition, in case of Redis, you don’t need to perform object-relational mappings, something you usually do when querying database from object oriented language.
The most obvious downside of Redis is lack of SQL interface.
If you are looking for In-memory SQL database, you may check out Apache Ignite on the following link:
Here is the typical architecture of caching layer in-front of database layer.
Application logic is simple: try to fetch data from the Redis cache if they are available, otherwise get data out from the back-end database.
When cache hit rate falls below some previously agreed threshold (e.g. 70%), clear up the cache and start initial load to upload the latest data into cache.
There are two issues with such approach:
Clearing the cache and executing initial loading are invasive, time consuming and resource consuming task with high impact on the back-end database.
After a few days (depending on the frequency of the DML commands being executed) cache hit rate will fall again below threshold value and the whole resource intensive process of refreshing a cache needs to be repeated.
With today tech there is a way to avoid the whole process and to keep 100% cache hit rate.
The following picture shows how to do that.
By creating a data pipeline based on CDC (Change Data Capture), changed data will be propagated immediately to a cache.
Since Redis has its own internal data structure, I need to use Groovy script to do transformation from the SQL stream.
The whole pipeline is not only very fast, but also very efficient.
Pipeline reads data not from database, but from transaction log file instead (e.g. Redo log in case of Oracle or WAL in case of Postgres), and the whole process is fully asynchronous.
Obviously you need to tune data source (Postgres in this case) to propagate changes in a real-time but that is out of scope of this post.