In the second part of the Oracle RAC in the AWS Cloud series, I’ll present test results for two-node Oracle RAC databases running on the AWS public cloud, based on Flashgrid’s SkyCluster deployment.
You can take a look at my previous article, which covers how to deploy and set up the environment for Oracle RAC cloud testing, by clicking on the link:
The main objective of this test is to check cluster stability when running Oracle RAC in the AWS public cloud and to see what kind of performance you can expect.
After performing several smoke tests, I started with 10 users running the standard Swingbench Order Entry (SOE) test on the first node.
10 users SOE stress test
Below you can see that I’m using only one node (the second node is almost idle).
The following two screenshots were taken from SQL Developer while running the test.
You can observe that the first node is at about 70% CPU utilization and that the largest wait event is on I/O.
The following screen is taken from the ASH while running the test.
Below is an excerpt from the AWR report:
You can find the full AWR report at the following link:
You can download the Swingbench report from the link below:
During the 5-minute stress test, almost 4K transactions were completed, at around 12 transactions per second.
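As a quick sanity check on those figures, here is a minimal Python sketch of the throughput arithmetic. The function names are mine, and the inputs are the rounded numbers quoted above, not exact values from the Swingbench report.

```python
def transactions_per_second(total_transactions: float, duration_seconds: float) -> float:
    """Average throughput over a test run."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_transactions / duration_seconds


def expected_transactions(tps: float, duration_seconds: float) -> float:
    """Total transactions implied by a steady average rate."""
    return tps * duration_seconds


RUN_SECONDS = 5 * 60  # the 5-minute stress test

# Rounded inputs taken from the text above (assumptions, not report values).
print(expected_transactions(12, RUN_SECONDS))                # 3600
print(round(transactions_per_second(4000, RUN_SECONDS), 1))  # 13.3
```

So an average of about 12 transactions per second over five minutes works out to roughly 3,600 transactions, which is in line with "almost 4K".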
10 users 2 nodes test
This is a logical extension of the previous test.
In the following picture you can see that I’m using two nodes and that CPU consumption is almost the same as in the previous test (around 70% to 75%, only about 5% more than in the single-node test).
The next two screenshots show SQL Developer’s view of the first node:
The same view from the second node:
We can observe that the load is split almost equally between the two nodes.
The next two screenshots show ASH for nodes 1 and 2.
As there are many results that are beyond the scope of this post, you can download the AWR report for Node 1 from the following link:
and the AWR report for Node 2 from the following link:
Finally, you can download the SOE reports for both nodes from here:
and from here:
The primary goal of the previous tests was to show Oracle RAC stability in a public cloud environment.
The stress test with 10 users per node passed without any issues and with almost linear scalability.
From real production experience I know that Oracle RAC, when put under heavy load, can fall apart when the OS starts to swap.
The reason is node eviction, or more precisely, the several thresholds that dictate cluster behaviour.
You can change the default cluster values (e.g. raise the node eviction threshold from the 3-second interval to a larger value), or you can change swappiness from its default value, process priorities, etc., but that would introduce new risks and other consequences, and you would end up with a non-standard setup that can produce uncommon issues.
With swap removed completely and the cluster relying exclusively on real physical memory (the Flashgrid infrastructure-as-code setup), the cluster remains stable as long as you still have memory at your disposal. It stayed stable during the very intensive 10-users-per-node test, and I didn’t record any sign of the system slowing down.
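If you want to confirm on a node that swap is really absent, a minimal Python sketch like the one below can help. It assumes a Linux host exposing the standard `/proc/meminfo` and `/proc/sys/vm/swappiness` files. The helper names are mine and are not part of any Flashgrid or Oracle tooling.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_is_disabled(meminfo: dict) -> bool:
    """True when the kernel reports no swap space at all."""
    return meminfo.get("SwapTotal", 0) == 0


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    swappiness_path = Path("/proc/sys/vm/swappiness")
    if meminfo_path.exists():
        info = parse_meminfo(meminfo_path.read_text())
        print("swap disabled:", swap_is_disabled(info))
        print("MemAvailable (kB):", info.get("MemAvailable"))
    if swappiness_path.exists():
        print("vm.swappiness:", swappiness_path.read_text().strip())
```

Running it on each RAC node gives a quick view of whether swap is configured and how much memory is still available. It only reads kernel counters and does not change any setting.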
I have to disclose that the database was not in archive log mode and that I didn’t set up Flashback, mainly because the primary goal was to test RAC stability, not performance (although you can find many performance reports here, ready for download).
For a real performance test I would need to allocate many more EBS disks and enable archiving and Flashback.
If you need to set up a production system to deploy in some cloud variant (public, private, hybrid) where significant load is expected, my advice is: don’t try to save on storage.
AWS has published almost a book’s worth of documentation covering all pricing calculations and limitations of EBS (Elastic Block Store), which is just one of the many components needed to run Oracle RAC in the cloud.
So be sure to read the document carefully and make sure you fully understand the complex cloud pricing model, which is not specific to this test but applies to the cloud in general.
During the stress test, the performance of the EBS volumes was about 12% lower than expected (that is, lower than what AWS should deliver according to its documentation), but I had no time to investigate that in detail.
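To make a gap like this easy to track between runs, here is a minimal Python sketch of the shortfall calculation. The IOPS figures are purely hypothetical placeholders chosen to produce 12%. They are not measurements from this test or numbers from the AWS documentation.

```python
def shortfall_percent(expected: float, measured: float) -> float:
    """How far measured throughput falls below the expected value, in percent."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100 * (expected - measured) / expected


# Hypothetical example values (assumptions for illustration only).
expected_iops = 1000  # what the volume should deliver on paper
measured_iops = 880   # what the stress test actually observed

print(shortfall_percent(expected_iops, measured_iops))  # 12.0
```

The same function works for MB/s instead of IOPS, as long as both arguments use the same unit.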
Even with this relatively weak configuration of m5.2xlarge EC2 machines (4 cores), I could have achieved even better performance by selecting a different storage option, as the CPU was still not exhausted.
For that reason, in the next (and final) post in this series on Oracle RAC on AWS deployed using Flashgrid SkyCluster, I’ll push Oracle RAC on AWS to its limits.