Apache Ignite is one of the very few In-memory SQL compliant distributed databases/data grid among open-source projects.
It’s often called “Redis done right” or “Redis on steroid”, because Redis looks primitive and limited when compared with Apache Ignite.
Ignite offers great flexibility and lot of features that can easily fit to many use cases.
Instead to write 1000 pages book/blog to describe Ignite functionalities, I’ll concentrate on the most visible difference between Ignite and similar solutions such as Redis – Ignite as ANSI-99 SQL compliant In-memory distributed database.
Like majority of open-source projects, there is also a commercial version of Apache Ignite which is supported by GridGrain company, that adds many functionalities not present Ignite.
On the following link you can check the feature comparisn between those two.
The second goal of this blog is to show use cases where the enterprise functionalities won’t be needed.
Let’s get started.
Apache Ignite is currently on the 8th position on Db-Engines Ranking list among key-value databases (a month ago when I checked last time, Ignite was at 9th place).
You can check that on the following link:
Among many use cases (for example Apache Ignite as In-memory RDD for Spark cluster), free version of Apache Ignite can be used as caching layer for existing apps, just like Redis with addition of SQL support.
On the following slide you can see Oracle AWR report from the Oracle database, where the first query/procedure takes majority of database time.
If you can execute the same code somewhere else (like Apache Ignite in this case), you can spare a lot of money in terms of number of required licenses with commercial rdbms database you use.
Another excellent use case is to leverage Apache Ignite as cache node for microservices especially if you are using Spring framework.
To start experimenting with Ignite, I suggest to installing it in single node configuration without native persistence and then to gradually add more feature that you will need to fit your case.
You need to keep in mind that free version doesn’t have backup capabilities and GUI tools (Web Administration console is part of the commercial offering) and some other feature you might need, in case you want to stick with free version.
Down below you can find a couple of screenshots from the visor – command line utility, which is the only way (in addition to control shell script) for monitoring and administration of Apache Ignite which is a reason why I suggest to stick with simple configuration.
Let’s finally start with Ignite SQL capabilities.
Apache Ignite is key-value database with SQL API on top.
Similar approach is used by many distributed databases (YugabyteDb – database that I’ve tested recently is an example).
Only atomic transactions over SQL are currently supported (this feature is currently in beta and will be introduced in 3.x version of Apache Ignite).
In the current release you’ll need to use classical (not SQL) API calls to have full ACID transaction support.
Secondary indexes (indexes on data) are supported, but FK (foreign keys) are not.
You can configure data affinity and co-location and many other features, but they are not relevant for single node setup.
When creating a table, you can add Ignite specific keywords on top of SQL CREATE TABLE statement.
Here is how it looks like:
CREATE TABLE emp ( EMPNO int(4) NOT NULL, ENAME VARCHAR(10), JOB VARCHAR(9), MGR decimal(4,0), SAL decimal(7,2), COMM decimal(7,2), DEPTNO int, PRIMARY KEY (empno) ) WITH "template=partitioned,backups=0,atomicity=atomic,write_synchronization_mode=primary_sync, cache_name=jp_emp";
In the previous example I added partitioning template, number of backups, transactions support etc.
Default setup of Ignite includes off-heap in-memory storage which means that you shouldn’t suffer from the GC (garbage collection) pause, which will have high impact with larger Java heap memory sizes, but for smaller heap sizes it is faster.
On the other hand, off-heap memory setup adds overhead due to the serialization/deserialization, which is observable with larger scans.
Ignite uses H2 database which will be replaced by Apache Calcite in the release 3 for SQL parsing and generating execution plans.
When you take a look at execution plan that Ignite produces, you’ll be surprised to see two records/rows with plans instead of one.
This is because Ignite always uses Map/Reduce pattern to process SQL queries under the hood.
explain analyze SELECT * FROM mdm_relation LIMIT 10;
In the first execution plan you can observer SCAN operation on the local node (in our case the only node),
while in the second plan you can see the MERGE operation that would gather data from all nodes (only one node in our case).
Since multi node setup with replication introduces many complexities in execution plan, it is one of the reasons why you should start with just one node.
Let’s check how Ignite stands against rdbms database such as Oracle.
This is by no means real tests, which is in any case impossible to generalize due to so many dependencies that can impact the results.
Main purpose is just to get some feeling of possible improvement you might expect.
Test 1: Range scan on top of indexed column
SELECT * FROM relation WHERE some_id BETWEEN 16315580 AND 16315610; Oracle: 59 ms Ignite: 47 ms
The column some_id is not the primary key, it’s just indexed (regular index in the case of Oracle, secondary index in case of Ignite).
We can see that the speed about the same.
Test 2: Count operation on the whole table
SELECT count(*) FROM relation; Oracle: 2,4 sec Ignite: 0,6 sec
Here we can observer that Ignite is 4x faster.
Test 3: Sorting of a large table (couple dozen millions records)
SELECT * FROM relation ORDER BY entity_id_1 desc LIMIT 10000; Ignite: 37,75
In Oracle 11g LIMIT is not supported. For that reason I need to execute the following:
with tmp as (SELECT rownum , a.* FROM relation a ORDER BY entity_id_1 desc) select * from tmp where rownum <= 10000;¸ Oracle: 94,143 sec
If I execute the same statement from Ignite, I have to interrupt execution after a couple of minutes.
Test 4: Getting the row by using PK in the WHERE clause
SELECT * FROM relation WHERE id = <some_value>; Oracle: 88ms Ignite: 72 ms
Performance tests clearly show that Ignite, although completely in memory (keys, indexes and values – which maps to indexes and table rows in rdbms) executes in about the same speed as Oracle (without Oracle In-memory database option).
That might sound a bit disappointing as you’d expect to get better results as Ignite is in memory.
Reasons for that you can find in Ignite worse execution plans due to the distributed nature and map/reduce 2 steps query execution and key-value engine with SQL interface on top.
The most benefit from Apache Ignite would be to use Ignite as free (no licenses needed) cache which stands between Application servers or Microservices from one side and database on the other side.
In such cases you would mostly use simple queries over PK or indexes (keep in mind that Foreign Keys are not supported in Ignite) and you can expect excellent query speed (same speed as rdbms).
For complex queries or analytical load, I would go with some DWH solution, although it is possible to do that in Ignite.
In case you need to leverage more features (many nodes, Spark, persistency…), commercial offering from Grid Grain is better option.
I hope from this short blog you can get some feedback what you can expect from Ignite.
Next release (version 3) of Ignite we can expect a huge changes in SQL engine (no more H2), and we can expect much better performance along with SQL ACID support across nodes with Raft protocol, Facebook RocksDb storage engine (standard stack used by many distributed databases) and many goodies which will move Ignite few steps forward.