Colibri cluster performance

February 2015 - Rocks+ 6.5

Bandwidth

[jmandel@colibri ~]$ mpirun --host compute-0-0,compute-0-1 /usr/mpi/gcc/openmpi-1.7.4/tests/osu-micro-benchmarks-4.0.1/osu_bw
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  compute-0-0

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
# OSU MPI Bandwidth Test v4.0.1
# Size      Bandwidth (MB/s)
1                       1.24
2                       6.29
4                      12.44
8                      24.13
16                     47.23
32                     95.48
64                    182.13
128                   334.92
256                   698.72
512                   595.77
1024                 2403.20
2048                 3004.79
4096                 3042.87
8192                 3355.85
16384                3596.55
32768                3739.60
65536                3821.52
131072               3851.26
262144               3856.92
524288               3868.64
1048576              3872.24
2097152              3854.34
4194304              3856.61
[jmandel@colibri ~]$ mpirun --host compute-0-0,compute-0-1 /usr/mpi/gcc/openmpi-1.7.4/tests/osu-micro-benchmarks-4.0.1/osu_bibw
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  compute-0-0

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
# OSU MPI Bi-Directional Bandwidth Test v4.0.1
# Size    Bi-Bandwidth (MB/s)
1                       2.01
2                       7.98
4                      15.90
8                      31.16
16                     61.65
32                    119.66
64                    219.33
128                   411.86
256                   918.10
512                  1723.26
1024                 2443.14
2048                 4492.57
4096                 5021.86
8192                 5238.95
16384                5977.23
32768                6848.94
65536                7175.58
131072               7356.02
262144               7464.34
524288               7508.10
1048576              7516.34
2097152              7534.14
4194304              7530.98
[jmandel@colibri ~]$ mpirun -np 12  -machinefile machines /usr/mpi/gcc/openmpi-1.7.4/tests/osu-micro-benchmarks-4.0.1/osu_mbw_mr 
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  compute-0-0

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
Warning: Conflicting CPU frequencies detected, using: 2600.000000.
Warning: Conflicting CPU frequencies detected, using: 2600.000000.
Warning: Conflicting CPU frequencies detected, using: 2600.000000.
Warning: Conflicting CPU frequencies detected, using: 2600.000000.
Warning: Conflicting CPU frequencies detected, using: 2600.000000.
Warning: Conflicting CPU frequencies detected, using: 2600.000000.
Warning: Conflicting CPU frequencies detected, using: 2600.000000.
# OSU MPI Multiple Bandwidth / Message Rate Test v4.0.1
# [ pairs: 6 ] [ window size: 64 ]
# Size                  MB/s        Messages/s
0                      19.48       19479768.78
1                      18.81       18805282.46
2                      37.62       18809719.87
4                      75.15       18788283.53
8                     147.29       18411447.05
16                    286.89       17930735.38
32                    570.93       17841666.22
64                   1115.89       17435777.11
128                  2081.86       16264520.75
256                  4306.27       16821356.29
512                  8143.78       15905821.11
1024                14546.93       14205982.70
2048                18181.13        8877506.62
4096                18351.66        4480386.60
8192                20493.43        2501639.34
16384               21823.82        1332020.53
32768               22535.26         687721.59
65536               22919.93         349730.41
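
The machines file passed to -machinefile above is not shown on this page. Presumably it simply lists the two compute nodes, one per line, with a slot count so that six processes land on each node (standard Open MPI hostfile syntax); something along these lines:

compute-0-0 slots=6
compute-0-1 slots=6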

Latency

[jmandel@colibri ~]$ mpirun -np 2 -machinefile machines  /usr/mpi/gcc/openmpi-1.7.4/tests/osu-micro-benchmarks-4.0.1/osu_latency
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  compute-0-0

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
# OSU MPI Latency Test v4.0.1
# Size          Latency (us)
0                       1.52
1                       1.20
2                       1.20
4                       1.20
8                       1.21
16                      1.23
32                      1.28
64                      1.30
128                     1.45
256                     1.79
512                     2.03
1024                    2.60
2048                    3.46
4096                    4.69
8192                    6.53
16384                   8.67
32768                  12.96
65536                  21.46
131072                 38.43
262144                 72.24
524288                141.38
1048576               278.71
2097152               548.96
4194304              1091.15

RAID

Note: front end (colibri) has 32GB memory

Note: compute nodes have 64GB memory

September 2013 - Original HP cluster software

Infiniband node names

Find the Infiniband node names in /etc/hosts and put them into ~/hostfile (a sketch of one way to do this follows the listing below):

[jmandel@node0 ~]$ cat  /home/jmandel/hostfile 
icnode1
icnode2
icnode3
icnode4
icnode5
icnode6
icnode7
icnode8
icnode9
icnode10
icnode11
icnode12
icnode13
icnode14
icnode15
icnode16
icnode17
icnode18
icnode19
icnode20
icnode21
icnode22
icnode23
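
One way to generate this file (a sketch, assuming the Infiniband interfaces appear as icnode* aliases in /etc/hosts and that GNU grep/sort are available) is:

# collect the unique icnode* names from /etc/hosts; sort -V keeps them in numeric order
grep -o 'icnode[0-9]*' /etc/hosts | sort -u -V > ~/hostfile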

Basic testing of OpenMPI

Navigate to the test binaries that ship with OpenMPI and try to run them over Infiniband.

[jmandel@node0 ~]$ which mpif90
/usr/mpi/gcc/openmpi-1.4.3/bin/mpif90
[jmandel@node0 osu_benchmarks-3.1.1]$ pwd
/usr/mpi/gcc/openmpi-1.4.3/tests/osu_benchmarks-3.1.1
[jmandel@node0 osu_benchmarks-3.1.1]$ ls
osu_alltoall  osu_bcast  osu_bibw  osu_bw  osu_latency  osu_mbw_mr  osu_multi_lat
[jmandel@node0 osu_benchmarks-3.1.1]$ mpirun -np 2 -hostfile ~/hostfile ./osu_bw
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1                         2.89
2                         5.78
4                        11.66
8                        23.32
16                       42.63
32                       91.88
64                      181.01
128                     383.14
256                     629.41
512                    1360.78
1024                   2178.80
2048                   2970.05
4096                   3330.94
8192                   3505.54
16384                  3608.95
32768                  3736.58
65536                  3782.25
131072                 3851.60
262144                 3862.69
524288                 3871.47
1048576                3853.88
2097152                3873.32
4194304                3881.26

That is a node-to-node bandwidth of about 3.8 GB/s, i.e. roughly 31 Gb/s. We are running over Infiniband at a good fraction of the nominal Infiniband speed of 40 Gb/s; in fact, since a 40 Gb/s QDR link carries about 32 Gb/s of data after 8b/10b encoding, this is close to the usable line rate.

Compiling the OSU benchmarks from source

wget http://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Benchmarks/July12/osu-micro-benchmarks-3.8-July12.tar
tar xvf osu-micro-benchmarks-3.8-July12.tar
cd osu-micro-benchmarks-3.8-July12
./configure --prefix=/home/jmandel/bin
make
make install
cd /home/jmandel/bin/libexec/osu-micro-benchmarks/mpi/one-sided
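
If configure does not find the MPI headers on its own, it may be necessary to point it at the Open MPI compiler wrapper explicitly (a guess based on the wrapper location shown above, not a step recorded on this page):

./configure CC=/usr/mpi/gcc/openmpi-1.4.3/bin/mpicc --prefix=/home/jmandel/bin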

The performance with the tests compiled from source is closer to what you can expect in an application you build yourself.

Testing bandwidth
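
The command line for this run was not captured; presumably it was analogous to the osu_put_bw run below, i.e. mpirun -np 2 -hostfile /home/jmandel/hostfile osu_get_bw.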

# OSU MPI One Sided MPI_Get Bandwidth Test v3.8
# Size      Bandwidth (MB/s)
1                       0.50
2                       1.10
4                       2.22
8                       4.20
16                      8.52
32                     17.22
64                     33.21
128                    66.17
256                   126.40
512                   243.27
1024                  457.08
2048                  719.16
4096                 1292.48
8192                 1841.02
16384                2440.11
32768                3071.98
65536                3479.19
131072               3760.49
262144               3745.98
524288               3627.40
1048576              3644.69
2097152              3650.69
4194304              3650.75

The bandwidth in this test is a bit lower than with the vendor-supplied osu_bw above.

[jmandel@node0 one-sided]$ mpirun -np 2 -hostfile /home/jmandel/hostfile osu_put_bw 
# OSU One Sided MPI_Put Bandwidth Test v3.8
# Size      Bandwidth (MB/s)
1                       1.00
2                       2.14
4                       4.27
8                       8.45
16                     16.75
32                     32.53
64                     64.44
128                   132.72
256                   260.64
512                   503.96
1024                  944.86
2048                 1550.79
4096                 2586.23
8192                 3269.05
16384                1987.19
32768                3372.39
65536                3658.43
131072               3793.82
262144               3828.42
524288               3676.07
1048576              3671.66
2097152              3685.25
4194304              3677.46

Well, the put test was not much better. Perhaps the vendor-supplied tests were better optimized than the stock version built from source.

Testing latency

[jmandel@node0 one-sided]$ mpirun -np 2 -hostfile /home/jmandel/hostfile osu_get_latency 
# OSU One Sided MPI_Get latency Test v3.8
# Size          Latency (us)
0                       2.22
1                       5.65
2                       5.67
4                       5.67
8                       5.71
16                      5.73
32                      5.80
64                      5.99
128                     6.67
256                     6.89
512                     7.23
1024                    7.79
2048                    8.99
4096                   10.20
8192                   12.37
16384                  15.95
32768                  19.07
65536                  27.45
131072                 44.58
262144                 78.84
524288                146.95
1048576               284.38
2097152               557.04
4194304              1104.83
[jmandel@node0 one-sided]$ mpirun -np 2 -hostfile /home/jmandel/hostfile osu_put_latency 
# OSU One Sided MPI_Put latency Test v3.8
# Size          Latency (us)
0                       2.19
1                       3.48
2                       3.47
4                       3.50
8                       3.53
16                      3.57
32                      3.57
64                      3.50
128                     3.62
256                     3.82
512                     4.03
1024                    4.58
2048                    5.66
4096                    6.67
8192                    8.66
16384                  13.99
32768                  18.13
65536                  26.50
131072                 43.57
262144                 80.62
524288                148.26
1048576               281.41
2097152               554.38
4194304              1105.51

Note that latency is reported in microseconds (us), not milliseconds (ms).

Tests on all nodes

[jmandel@node0 collective]$ pwd
/home/jmandel/bin/libexec/osu-micro-benchmarks/mpi/collective
[jmandel@node0 collective]$ mpirun -np 24 -hostfile /home/jmandel/hostfile osu_alltoallv
# OSU MPI All-to-Allv Personalized Exchange Latency Test v3.8
# Size       Avg Latency(us)
1                      30.96
2                      38.38
4                      40.51
8                      24.38
16                     22.99
32                     33.43
64                     50.75
128                    16.13
256                    18.01
512                   247.67
^Cmpirun: killing job...