30. 05. 2022.

How to get 100% cache hit rate by using Change Data Capture & Redis

In this blog post I’ll explain how to get a 100% cache hit rate by using CDC (Change Data Capture) technology and a Redis cache. There are multiple benefits to having a caching layer in front of a back-end database system. By fetching data from the cache instead of the back end, we actually free up valuable database resources for […]

25. 02. 2022.

Redis performance tuning – Top 10 mistakes

Redis is the most popular key-value data store and one of the most popular database systems overall. According to the DB-Engines ranking, https://db-engines.com/en/ranking/key-value+store (you can check the picture below), Redis holds the top position among key-value data stores by a large margin. Amazon DynamoDB, in second place, lags behind Redis by more than a factor of two. […]

04. 02. 2022.

StreamSets review – creating real time data pipelines in no time

In this post I’ll try to review StreamSets Data Collector, one of the most popular tools for creating smart data pipelines for streaming, batch and change data capture, which allows you to move data around in near real time. First I’d like to point out that the whole review is my own personal […]

29. 11. 2021.

Complex near real-time transformations in data pipelines

For many years, the daily ETL batch job was the dominant way to perform data transformations before loading data into a data warehouse. These days the requirements are quite different, starting with the most important one: new data has to be available for AI/ML and analysis in near real time. Moreover, classical DWH databases are […]

05. 06. 2021.

Missing columns in PrestoSQL

One of the first issues you run into when starting to use the PrestoSQL distributed query engine is missing columns of certain data types, especially numeric and all variants of date. This issue is usually caused by missing precision at the data source, which is not only one of the most common, but also one of the […]

01. 04. 2021.

Trino (ex. Presto) – troubleshooting distributed transactions among various data sources

In this post I’ll demonstrate one of the many use cases of Presto technology that you might have overlooked: how to troubleshoot distributed transactions, which are very common these days as a result of complex microservices architectures. In the following SELECT statement I’ll combine three different data sources (Oracle, Postgres and Kafka) by using good old […]

17. 03. 2021.

Trino (ex. Presto) – high performance distributed query engine

In this article I’ll share some of my experiences with Trino (ex. Presto), a high-performance distributed query engine. First, some background on the Presto project. A couple of members of the Facebook infrastructure team created Presto to address problems they had with a 300-petabyte Hadoop data warehouse. The main goal of the […]

16. 02. 2021.

Postgres monitoring with Percona PMM

For those who are coming from the Oracle world, the best alternative database is probably Postgres, because of the many similarities between the two database engines (data types, the tablespace concept, etc.). However, one of the first things you want to do is to take full control over what is going on in your database. If […]

06. 12. 2020.

Bashtop – future of the terminal Linux monitoring

Although the idea of the original top utility is followed in many similar utilities for terminal-based Linux monitoring, until now I’ve been using Htop, atop (which can monitor GPU on top of CPU/Mem/Net/Disk) and Nmon to do the job (the latter, called “Topas on steroids”, is ported from AIX to Linux). Quite recently I’ve […]

22. 07. 2019.

Oracle database benchmarking by using CALIBRATE_IO

CALIBRATE_IO is yet another popular database simulation/stress test utility, mainly used to perform I/O benchmarks. The procedure, which is part of the DBMS_RESOURCE_MANAGER package, generates a read-only workload made up of 1 MB random I/Os against the database to determine the maximum number of IOPS and MB per second. You can save the following […]