There are many benchmarking tools, each focusing on a different aspect of a cluster: computing capability, energy efficiency, data and storage, memory and so on.

In this article I’m going to use Sombrero, a High Performance Computing benchmark that can run on anything from your home PC to a cluster of supercomputers.

Sombrero is particularly useful for testing a mix of computing power, RAM speed and the cluster interconnect.

You can use it on a local cluster, a cloud cluster or an Oracle RAC (Real Application Clusters) cluster, as it’s very flexible and easy to use.

You can find the official installation instructions at the following link:

https://github.com/sa2c/sombrero

However, there is one mistake in those instructions, so here is the procedure that worked for me on Ubuntu (Red Hat clones use slightly different commands, which you can find on the Web).

 

Let’s start with the installation procedure:

sudo apt install git make gcc bc gnuplot openssh-server openmpi-bin openmpi-common

#this step is missing from the official repo
sudo apt install mpich

cd

git clone https://github.com/sa2c/sombrero.git

cd sombrero

make 

Once the build completes, everything is ready for testing.

 

For the first run I suggest using just one CPU core and setting the test size to small.

username@hostname:~/sombrero>./sombrero.sh -n 1 -s small

As a result you’ll get the number of operations per second, like the following:

[RESULT] Case 1 2356.71 Gflops in 172.899673 seconds

[RESULT] Case 1 13.63 Gflops/seconds

 

You can experiment on your PC by running the test with more cores, as in the following example:

username@hostname:~/sombrero>./sombrero.sh -n 2 -s small

 

One important note!

You need to know the number of cores in your server/cluster.

Below is one of many ways to find the real number of cores:

username@hostname:~>cat /proc/cpuinfo | grep processor | wc -l 
8

This command counts logical CPUs (hardware threads), not cores. With hyper-threading enabled, an Intel x64 processor exposes 2 threads per core, so the result has to be divided by 2 (some RISC processors run 4 or even more threads per core).

In this case, the correct number of cores is four.
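If you would rather not rely on the "divide by 2" rule, here is a minimal sketch that counts unique (physical id, core id) pairs instead. The function names are my own, and it assumes your /proc/cpuinfo contains the "physical id" and "core id" fields, which is typical for x86 Linux but may not hold on some virtual machines or other architectures (in that case it prints 0).

```shell
# count_physical_cores [FILE]
# Counts physical cores as the number of unique (physical id, core id)
# pairs in cpuinfo-formatted text. FILE defaults to /proc/cpuinfo.
# Hypothetical helper name; prints 0 if the fields are missing.
count_physical_cores() {
    awk -F': *' '
        /^physical id/ { pkg = $2 }
        /^core id/     { seen[pkg "," $2] = 1 }
        END { n = 0; for (k in seen) n++; print n }
    ' "${1:-/proc/cpuinfo}"
}

# count_logical_cpus [FILE]
# Counts logical CPUs (hardware threads), for comparison.
count_logical_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

Calling count_logical_cpus and count_physical_cores without arguments reads the live /proc/cpuinfo. If the first number is twice the second, hyper-threading is active.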

 

You can also produce formatted output by using the awk utility:

./sombrero.sh -n 2 -s small | awk 'BEGIN {printf("hostname");} /^\[RESULT\] Case.*Gflops.seconds/ {printf("\t" $4);} END {printf("\n");}' >> test_results.dat
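The one-liner above always writes the literal label "hostname". As a sketch, the same filter can be wrapped in a small function that takes the row label as an argument, which makes it easier to collect several runs in one file. The function name format_results and the labels in the usage comment are my own choices, not part of Sombrero.

```shell
# format_results LABEL
# Reads Sombrero output on stdin and prints one tab-separated row:
# LABEL followed by every value from the "Gflops/seconds" result lines.
format_results() {
    awk -v label="$1" '
        BEGIN { printf "%s", label }
        /^\[RESULT\] Case.*Gflops\/seconds/ { printf "\t%s", $4 }
        END { printf "\n" }
    '
}

# Possible usage (assumes you are in the sombrero directory):
#   for n in 1 2 4; do
#       ./sombrero.sh -n "$n" -s small | format_results "${n}-cores" >> test_results.dat
#   done
```

Each run then becomes one row in test_results.dat, with the label in the first column and one result per following column.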

The next step is to plot the results so that you can check them visually:

username@hostname:~/Desktop>gnuplot

	G N U P L O T
	Version 5.0 patchlevel 6    last modified 2017-03-18

	Copyright (C) 1986-1993, 1998, 2004, 2007-2017
	Thomas Williams, Colin Kelley and many others

	gnuplot home:     http://www.gnuplot.info
	faq, bugs, etc:   type "help FAQ"
	immediate help:   type "help"  (plot window: hit 'h')

Terminal type set to 'qt'
gnuplot> set style data histogram
gnuplot> set style histogram cluster gap 1
gnuplot> set style fill solid border rgb "black"
gnuplot> set auto x
gnuplot> set yrange [0:*]
gnuplot> set datafile separator "\t"
gnuplot> set ylabel "GFLOPs/s"

gnuplot> plot for [i=1:6] 'test_results.dat' using (column(i+1)):xtic(1) title 'TEST '.i

And here is the result of my test in graphical form:

[Figure: test results]
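If you plot often, typing the commands interactively gets tedious. Below is a sketch that writes the same gnuplot commands into a script file so the plot can be regenerated in one step. The function name, the file names in the usage note, and the png terminal are my assumptions; the png terminal is only available if your gnuplot build includes it.

```shell
# write_plot_script SCRIPT DATAFILE IMAGE
# Writes a gnuplot script to SCRIPT that plots DATAFILE as a clustered
# histogram and saves it to IMAGE. Hypothetical helper name.
write_plot_script() {
    script=$1 datafile=$2 image=$3
    cat > "$script" <<EOF
set terminal png size 800,600
set output "$image"
set style data histogram
set style histogram cluster gap 1
set style fill solid border rgb "black"
set auto x
set yrange [0:*]
set datafile separator "\t"
set ylabel "GFLOPs/s"
plot for [i=1:6] '$datafile' using (column(i+1)):xtic(1) title 'TEST '.i
EOF
}
```

For example, write_plot_script plot.gp test_results.dat results.png followed by gnuplot plot.gp should produce results.png without opening the interactive prompt.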

You can also experiment with various gcc options.

If you want to tweak your tests, you can change the default configuration in the MkFlags file.

First, open the file in an editor:

username@hostname:~>cd ~/sombrero/Make/

username@hostname:~/sombrero/Make>vi MkFlags

The default content of MkFlags is:

#Compiler
CC = mpicc
CFLAGS = -std=c99 -g -O3
LDFLAGS =

You can change the file to something like this:

#Compiler
CC = mpicc
CFLAGS = -std=c99 -march=native -O3
LDFLAGS =

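By removing the -g option (debug symbols) and allowing gcc to use all the instructions your processor supports (-march=native), I got up to about 12% better performance on individual tests. The gain should come mainly from -march=native, since -g only adds debug information to the binary. Results in Gflops/s: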

 

Default tests 10.06 10.03 10.99 10.89 10.53 10.12 12.15 11.44 13.74 13.64

 

Optimized tests 11.32 10.88 11.39 10.80 10.19 10.13 11.57 11.52 13.95 13.71
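To put a number on the difference between the two rows above, here is a small sketch that compares their means. The helper name is mine, and the per-test comparison at the end assumes that both rows list the same tests in the same order, which the post does not state explicitly.

```shell
# mean_gain_percent "BASELINE VALUES" "NEW VALUES"
# Prints the relative change of the mean, in percent, to one decimal place.
# Hypothetical helper; values are space-separated.
mean_gain_percent() {
    awk -v base="$1" -v opt="$2" 'BEGIN {
        nb = split(base, b, " "); no = split(opt, o, " ")
        for (i = 1; i <= nb; i++) sb += b[i]
        for (i = 1; i <= no; i++) so += o[i]
        printf "%.1f\n", ((so / no) / (sb / nb) - 1) * 100
    }'
}

default_runs="10.06 10.03 10.99 10.89 10.53 10.12 12.15 11.44 13.74 13.64"
optimized_runs="11.32 10.88 11.39 10.80 10.19 10.13 11.57 11.52 13.95 13.71"

# Gain of the mean over all ten values.
mean_gain_percent "$default_runs" "$optimized_runs"   # prints 1.6

# Gain for the first value of each row only.
mean_gain_percent "10.06" "11.32"                     # prints 12.5
```

So the largest single improvement is about 12%, while the average over all ten values moves by less than 2%. With differences this small it is worth repeating each run several times before drawing conclusions.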

 

You can continue to experiment with various gcc parameters (you can even switch the compiler to clang), but that is beyond the scope of this post.

 

The last interesting part is to run the same test on a cluster of machines.

For this demo you first need to create two VMs (virtual machines) that share the same network.

The article “Easy way to create Virtualbox VM’s internal network” describes one easy way to do it.

 

For more details, please check the following link:

https://www.performatune.com/en/easy-way-to-create-virtualbox-vms-internal-network/

 

The next step is to enable passwordless login from each node/machine in the cluster to the others.

First you need to create a private/public key pair (on every node in the cluster)

ssh-keygen

and then, from each node, copy the public key to all other nodes in the cluster, as in the following example:

ssh-copy-id <username>@node1_IP_address

For example, for a two-node cluster with the following host names and IP addresses:

  • node1 172.25.1.1
  • node2 172.25.1.2

from node1 you can execute:

ssh-keygen 

ssh-copy-id <username>@node2

Then do almost the same from node2 (the only difference is @node1 instead of @node2):

ssh-keygen 

ssh-copy-id <username>@node1

Instead of the host name you can also use the IP address.
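With more than two nodes, typing these commands by hand becomes error-prone. As a rough sketch, the helper below only prints the ssh-copy-id commands that a given node needs to run, so that you can review them first. The function name and its argument order are my own invention.

```shell
# print_key_copy_commands USER SELF NODE...
# Prints one ssh-copy-id command for every NODE except SELF.
# It does not run anything; it is a dry-run helper (hypothetical name).
print_key_copy_commands() {
    user=$1 self=$2
    shift 2
    for node in "$@"; do
        if [ "$node" != "$self" ]; then
            printf 'ssh-copy-id %s@%s\n' "$user" "$node"
        fi
    done
}
```

On node1 of the two-node example, print_key_copy_commands username node1 node1 node2 prints the single command ssh-copy-id username@node2. Once the list looks right, you can run the printed commands one by one.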

 

You also need to install all packages (MPI and others) on every other node, just as I did on the first node at the beginning of the article, and then repeat the Sombrero installation there.

 

After that you need to create a hostfile.txt with the same content on each node in the cluster.

 

Here is what you need to do:

username@hostname:~/sombrero>cd ~/sombrero/

username@hostname:~/sombrero> touch hostfile.txt 

 

The content of my hostfile.txt is the following:

vi hostfile.txt 

172.25.1.1
172.25.1.2

As you can see, I’m using IP addresses instead of host names in this example.

 

The last question is how to control how many CPU cores are used on each node in the cluster.

The answer is very simple: if you want to use one CPU core on node1 (IP address 172.25.1.1) and two CPU cores on node2 (172.25.1.2), the content of hostfile.txt will look like this:

172.25.1.1
172.25.1.2
172.25.1.2

By repeating the IP address of node2, I’m using 2 CPU cores on that node.
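For larger clusters, repeating lines by hand is tedious. Here is a minimal sketch that generates such a file from HOST:CORES pairs. The function name and the HOST:CORES notation are my own convention and are not something Sombrero or MPI understands directly.

```shell
# generate_hostfile HOST:CORES...
# Prints each HOST on its own line, repeated CORES times.
# Every argument must contain a colon followed by a whole number.
generate_hostfile() {
    for spec in "$@"; do
        host=${spec%%:*}     # text before the first colon
        slots=${spec##*:}    # text after the last colon
        i=0
        while [ "$i" -lt "$slots" ]; do
            printf '%s\n' "$host"
            i=$((i + 1))
        done
    done
}
```

Running generate_hostfile 172.25.1.1:1 172.25.1.2:2 > hostfile.txt reproduces the three-line file shown above.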

Lastly, I need to execute the following command to start the cluster test:

jp@jp-Ubuntu1804LTS:~/sombrero>./sombrero.sh -H hostfile.txt -s small
[GEOMETRY] Global size is 32x24x24x24
[GEOMETRY] Proc grid is 1x1x1x1
[GEOMETRY] Global size is 32x24x24x24
[GEOMETRY] Proc grid is 1x1x1x1
[GEOMETRY] Local size is 32x24x24x24
[GEOMETRY] Local size is 32x24x24x24
[GEOMETRY] Global size is 32x24x24x24
[GEOMETRY] Global size is 32x24x24x24
[GEOMETRY] Proc grid is 1x1x1x1
[GEOMETRY] Proc grid is 1x1x1x1
[GEOMETRY] Local size is 32x24x24x24
[GEOMETRY] Local size is 32x24x24x24
[GEOMETRY] Gauge field: size 442368 nbuffer 0 
[GEOMETRY] Spinor field (EO): size 442368 nbuffer 0 
[GEOMETRY] Gauge field: size 442368 nbuffer 0 
[GEOMETRY] Spinor field (EO): size 442368 nbuffer 0 
[GEOMETRY] Gauge field: size 442368 nbuffer 0 
[GEOMETRY] Spinor field (EO): size 442368 nbuffer 0 
[GEOMETRY] Gauge field: size 442368 nbuffer 0 
[GEOMETRY] Spinor field (EO): size 442368 nbuffer 0 
[MAIN] Performing 50 conjugate gradient iterations
[MAIN] Case 1: 147.29e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 1: inf operations per byte
[MAIN] Performing 50 conjugate gradient iterations
[MAIN] Case 1: 147.29e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 1: inf operations per byte
[MAIN] Performing 50 conjugate gradient iterations
[MAIN] Case 1: 147.29e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 1: inf operations per byte
[MAIN] Performing 50 conjugate gradient iterations
[MAIN] Case 1: 147.29e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 1: inf operations per byte
[RESULT] Case 1 147.29 Gflops in 21.236862 seconds
[RESULT] Case 1 6.94 Gflops/seconds
[RESULT] Case 1 147.29 Gflops in 21.353416 seconds
[RESULT] Case 1 6.90 Gflops/seconds
[RESULT] Case 1 147.29 Gflops in 21.709023 seconds
[RESULT] Case 1 6.78 Gflops/seconds
[RESULT] Case 1 147.29 Gflops in 21.991657 seconds
[RESULT] Case 1 6.70 Gflops/seconds
[MAIN] Case 2: 224.23e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 2: inf operations per byte
[MAIN] Case 2: 224.23e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 2: inf operations per byte
[MAIN] Case 2: 224.23e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 2: inf operations per byte
[MAIN] Case 2: 224.23e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 2: inf operations per byte
[RESULT] Case 2 224.23 Gflops in 29.793526 seconds
[RESULT] Case 2 7.53 Gflops/seconds
[RESULT] Case 2 224.23 Gflops in 30.681017 seconds
[RESULT] Case 2 7.31 Gflops/seconds
[RESULT] Case 2 224.23 Gflops in 31.857018 seconds
[RESULT] Case 2 7.04 Gflops/seconds
[RESULT] Case 2 224.23 Gflops in 32.133274 seconds
[RESULT] Case 2 6.98 Gflops/seconds
[MAIN] Case 3: 303.73e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 3: inf operations per byte
[MAIN] Case 3: 303.73e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 3: inf operations per byte
[MAIN] Case 3: 303.73e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 3: inf operations per byte
[MAIN] Case 3: 303.73e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 3: inf operations per byte
[RESULT] Case 3 303.73 Gflops in 37.804207 seconds
[RESULT] Case 3 8.03 Gflops/seconds
[RESULT] Case 3 303.73 Gflops in 38.542305 seconds
[RESULT] Case 3 7.88 Gflops/seconds
[RESULT] Case 3 303.73 Gflops in 42.568699 seconds
[RESULT] Case 3 7.13 Gflops/seconds
[RESULT] Case 3 303.73 Gflops in 42.795631 seconds
[RESULT] Case 3 7.10 Gflops/seconds
[MAIN] Case 4: 515.52e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 4: inf operations per byte
[MAIN] Case 4: 515.52e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 4: inf operations per byte
[MAIN] Case 4: 515.52e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 4: inf operations per byte
[MAIN] Case 4: 515.52e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 4: inf operations per byte
[RESULT] Case 4 515.52 Gflops in 54.308731 seconds
[RESULT] Case 4 9.49 Gflops/seconds
[RESULT] Case 4 515.52 Gflops in 54.376144 seconds
[RESULT] Case 4 9.48 Gflops/seconds
[RESULT] Case 4 515.52 Gflops in 58.882633 seconds
[RESULT] Case 4 8.76 Gflops/seconds
[RESULT] Case 4 515.52 Gflops in 60.108974 seconds
[RESULT] Case 4 8.58 Gflops/seconds
[MAIN] Case 5: 1105.36e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 5: inf operations per byte
[MAIN] Case 5: 1105.36e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 5: inf operations per byte
[MAIN] Case 5: 1105.36e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 5: inf operations per byte
[MAIN] Case 5: 1105.36e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 5: inf operations per byte
[RESULT] Case 5 1105.36 Gflops in 109.629005 seconds
[RESULT] Case 5 10.08 Gflops/seconds
[RESULT] Case 5 1105.36 Gflops in 110.747162 seconds
[RESULT] Case 5 9.98 Gflops/seconds
[RESULT] Case 5 1105.36 Gflops in 122.147095 seconds
[RESULT] Case 5 9.05 Gflops/seconds
[RESULT] Case 5 1105.36 Gflops in 126.025604 seconds
[RESULT] Case 5 8.77 Gflops/seconds
[MAIN] Case 6: 2067.93e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 6: inf operations per byte
[MAIN] Case 6: 2067.93e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 6: inf operations per byte
[MAIN] Case 6: 2067.93e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 6: inf operations per byte
[MAIN] Case 6: 2067.93e9 floating point operations and 0.00e6 bytes communicated
[MAIN] Case 6: inf operations per byte
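In the listing above every process prints its own [RESULT] lines, so each case shows up four times. A small sketch like the following can condense such output into one line per case. The function name and the output format are my own, and it only looks at the "Gflops/seconds" lines.

```shell
# summarize_cases
# Reads Sombrero output on stdin and prints, for each case, how many
# "Gflops/seconds" results were found and their mean.
# Hypothetical helper name.
summarize_cases() {
    awk '
        /^\[RESULT\] Case [0-9]+ .* Gflops\/seconds/ {
            sum[$3] += $4
            count[$3]++
            if ($3 > max) max = $3
        }
        END {
            for (c = 1; c <= max; c++)
                if (count[c])
                    printf "Case %d: %d results, mean %.2f Gflops/s\n", c, count[c], sum[c] / count[c]
        }
    '
}

# Possible usage (assumes you are in the sombrero directory):
#   ./sombrero.sh -H hostfile.txt -s small | summarize_cases
```

Whether the mean or the sum across processes is the more meaningful figure depends on how the processes shared the work, so treat the summary as a convenience rather than an official score.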

Here is the load on node 1 while the cluster test is running:

[Figure: node 1 while running a cluster test]

and in the next picture you can see the load on node 2:

[Figure: node 2 while running a cluster test]

 

Summary:

By using various benchmark tests you can not only check the current capabilities and limitations of a server or cluster, but also perform stress tests, which are very useful when testing stability.

You can, for example, overload one node in a cluster and check whether that node gets evicted.

I’ve found Sombrero, along with many similar tools, very helpful in my daily work, which is why I highly recommend it.

 

