We all know that for bulk collect PL/SQL operation it’s important to keep the memory consumption under control.
For that reason all bulk collect operations should be combined with a LIMIT clause like in the following code fragment:
DECLARE TYPE t_bulk_collect_type IS TABLE OF test_table%ROWTYPE; l_tab_collection t_bulk_collect_type; CURSOR c_cursor IS SELECT * FROM test_table; OPEN c_cursor; LOOP FETCH c_cursor BULK COLLECT INTO l_tab_collection LIMIT 10000; EXIT WHEN l_tab_collection.count = 0; -- Process contents of collection here. DBMS_OUTPUT.put_line(l_tab_collection.count || ' rows'); END LOOP; CLOSE c_cursor;
Setting the limit to process only 10K records in bulk is does not impress me in a time of Big Data and massive parallelism.
But what would happen if I try to insert 50 million records in the collection?
If that would be possible possible, I’ll be able to speed up processing logic which is to complex for SQL, by putting 100 million rows in memory at once, without a purchase of “In Memory” option, which is available from the 12c release of Oracle database.
Here is the code I’ve tried to run:
declare start_time timestamp; end_time timestamp; type blk_test is table of tran_data_history%rowtype index by pls_integer; l_blk_test blk_test; begin start_time := systimestamp; select * bulk collect into l_blk_test from tran_data_history where rownum <=50000000; end_time := systimestamp; DBMS_OUTPUT.PUT_LINE('start_time = ' || start_time); DBMS_OUTPUT.PUT_LINE('end_time = ' || end_time); end;
Although the code is very simple (It just inserts 50 million records into the collection), after about 1 minute of execution I’ve got the error:
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu call ,pmuccst: adt/record)
Quick and dirty method of how to fix that error was to increase the PGA memory, as collections are using PGA, not the SGA memory.
Therefore I’ve increased the PGA memory from 4GB –> 16GB and repeat the test again.
Unfortunately after a 1 minute of running, I’ve got the same error as before.
As Oracle Database is designed to serve a large number of sessions at the same time, it cannot allow to allocate all of the PGA memory to one session.
This is a moment when you need to think about architecture and if your approach is appropriate for this case.
If you are sure you can take the whole PGA memory for your session to do e.g. Machine Learning processing or some other complex processing on the Oracle database (without In Memory and Advanced Analytic options), I’ll show you how to do that.
There are several parameters involved here, but first you should start with max_map_count.
That parameter limits the maximum number of memory map areas a process may have.
Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
For many modern applications (like Elasticsearch) that parameter is a way to low.
In this case we need a lot of memory to hold 100 million records from a wide table.
This is how you can check default value of the parameter:
more /proc/sys/vm/max_map_count 65530
The ORA-04030 error in this case means that I’m running out of map entries from the OS which you can confirm by looking at the trace file.
======================================= TOP 10 MEMORY USES FOR THIS PROCESS --------------------------------------- *** 2019-02-19 00:03:36.423 98% 4011 MB, 257475 chunks: "pmuccst: adt/record " PL/SQL koh-kghu call ds=0x7f55230062f8 dsprt=0xc121f40 2% 64 MB, 4090 chunks: "pmucalm coll " PL/SQL ======================================= PRIVATE MEMORY SUMMARY FOR THIS PROCESS --------------------------------------- *** 2019-02-19 00:03:37.656 ****************************************************** PRIVATE HEAP SUMMARY DUMP 4089 MB total: 4088 MB commented, 194 KB permanent 701 KB free (512 KB in empty extents), 4085 MB, 2 heaps: "koh-kghu call " 3 KB free held *** 2019-02-19 00:03:40.446 ------------------------------------------------------ Summary of subheaps at depth 1 4078 MB total: 4078 MB commented, 94 KB permanent 113 KB free (0 KB in empty extents), 4011 MB, 257475 chunks: "pmuccst: adt/record " *** 2019-02-19 00:03:42.798
To increase a number of the map entries from 4 GB (65530 x 64 / 1024 to get the amount in GB) into the 16 GB, you need to execute the following command as a root user:
sysctl -w vm.max_map_count=262144
If you want to permanently store a new map entry count that will persist after you reboot your machine, you need to edit/insert proper value from the text editor:
vi /etc/sysctl.conf vm.max_map_count=262144
To apply a new setting you need to execute:
Finally you can check that new value is set you can execute:
more /proc/sys/vm/max_map_count 262144
Besides change in OS setting (max_map_count parameter), there are additional two database parameters that I need to change:
For Oracle versions 220.127.116.11 and lower:
- _realfree_heap_pagesize_hint = 262144
- _realfree_heap_pagesize = 262144
In my 11g database _use_realfree_heap parameter was already set on TRUE.
For that reason I only had to change the remaining parameter by executing the following command:
ALTER SYSTEM SET "_realfree_heap_pagesize_hint"=262144 scope=spfile;
After you run code for from the beginning again, you should not see that error again till you reach 50 million test.
Here are the results:
|Volume….||Elapsed Time (sec)|
It is important to monitor your system while performing the test, as you can clearly see how quickly the memory was consumed.
Although test results are fast (comparing with some alternatives), even with my tweaks this solution cannot reach 50 million records, which is in terms of Big Data processing requirement far from enough.
For that reason, BULK COLLECT cannot be used to process some complex logic on huge data set.
Oracle offers In Memory and Advanced Analytic license options where you can take advantage of being close to the data, but keep in mind that even such solution cannot scale enough to more than one physical machine.
However, for old fashion ETL stuff, when set parameters to appropriate value, you will definitely expect performance benefits by performing more processing in memory.