29. 11. 2021.

Complex near real-time transformations in data pipelines

For many years, the daily ETL batch job was the dominant way to transform data before loading it into a data warehouse. These days the requirements are quite different, starting with the most important one: new data has to be available for AI/ML and analysis in near real time. Moreover, classical DWH databases are […]

21. 01. 2021.

Kafka & TCP Retrans Error rate

Recently I had an interesting case in which I found duplicate messages in the Kafka topics of a data pipeline. Duplicate records in Kafka topics can appear for many different reasons, but most of the explanations you can find cover only those related to the Kafka settings. In this article you […]