Compute Canada

Sample Code for Double Master-Slave Models Combining MPI and OpenMP

The nodes of computer clusters such as our Victoria-Falls cluster are often multi-core, and each core may support multiple threads. This suggests a two-layer approach when programming new codes or adapting existing ones: a combined MPI-OpenMP programming approach exploits the distributed-memory structure of the cluster while at the same time keeping each shared-memory node busy.

Here we present sample code that implements two-layered master-slave models: MPI communicates between a master node and multiple slave nodes, while OpenMP additionally allocates work dynamically within each node. The current version of the code is written in C. In this example, the workload consists of the prime factorization of integers.
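
For reference, a basic job of this kind might look like the following trial-division routine. This is only an illustrative sketch; the function name and output format are ours, not taken from the distributed code, which may organize this differently.

    #include <stdio.h>

    /* Factor n by trial division, printing one line of prime factors.
     * Illustrative sketch only. */
    void factorize(unsigned long n)
    {
        for (unsigned long p = 2; p * p <= n; p++) {
            while (n % p == 0) {        /* divide out each prime completely */
                printf("%lu ", p);
                n /= p;
            }
        }
        if (n > 1)
            printf("%lu", n);           /* whatever remains is itself prime */
        printf("\n");
    }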

In all of these models, the master MPI process M bundles basic jobs into job groups and sends these to the slave processes S for execution. When a slave process runs out of work, it sends a message to M to obtain more. Each slave process further spawns a number of OpenMP threads to help work through the job groups. This is done in a dynamic manner, i.e. slave threads work on one job from the current job group at a time and acquire more when they are done. When they run out of jobs in the group, one of them asks the master for another group through MPI. Six different model variations were implemented:

  1. In the simplest version of the model (Plan 1 in the code), the slaves ask for more work from outside the local parallel OpenMP region once all the current work is completed. The work itself is of course done inside the parallel region. The master runs in serial mode, i.e. it does not involve any OpenMP directives. A minimal sketch of this variation is given after the list.
  2. This (Plan 2 in the code) is a similar approach, but here the master is also multithreaded. Only the main thread of M communicates via MPI with the slaves S. The other master-threads do additional pre-allocated work.
  3. Here (Plan 5 in the code), both master and slaves are multithreaded and work from inside the OpenMP region. The main thread of each slave S communicates with the main thread of M whenever it runs out of work, without waiting for the others. The OpenMP threads on the master M do pre-allocated work.
  4. This (Plan 6 in the code) is the same as the previous one, but the threads on the master may ask the main thread for work dynamically through OpenMP.
  5. This model (Plan 11 in the code) is similar to the previous two, but any of the S threads can ask for new work from the main thread on M when they run out. All others continue to execute jobs from the previous or the new job group. Work for the other threads on the master is pre-allocated.
  6. This (Plan 12 in the code) is the same as the previous one, but the work for the master-threads is dynamically allocated.
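
To make the structure concrete, here is a minimal sketch of the slave side of the first variation (Plan 1). The message tags, the group size, and the use of an empty message to signal termination are assumptions made for this illustration; the distributed code will differ in its details.

    #include <mpi.h>
    #include <omp.h>

    #define TAG_REQUEST 1    /* assumed tag value, for illustration only */
    #define GROUP_SIZE  64   /* assumed maximum number of jobs per group */

    void factorize(unsigned long n);   /* the basic job, as sketched above */

    /* Plan 1 slave: request job groups in serial code, execute them in parallel. */
    void slave_plan1(void)
    {
        unsigned long group[GROUP_SIZE];
        MPI_Status status;

        for (;;) {
            /* Ask the master for a new job group (outside any parallel region). */
            MPI_Send(NULL, 0, MPI_UNSIGNED_LONG, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(group, GROUP_SIZE, MPI_UNSIGNED_LONG, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);

            int count;
            MPI_Get_count(&status, MPI_UNSIGNED_LONG, &count);
            if (count == 0)
                break;                 /* an empty group signals no work left */

            /* Work through the group; each thread grabs one job at a time. */
            #pragma omp parallel for schedule(dynamic, 1)
            for (int i = 0; i < count; i++)
                factorize(group[i]);
        }
    }

The corresponding master would sit in a serial receive loop, answering each request with the next job group, or with an empty message once all work has been handed out.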

Note that the MPI communication in these models has to be protected by a critical region to prevent race conditions on shared communication resources and messages. Any communication event is therefore between one thread on the master and one thread on the slave, albeit a potentially different one each time; MPI communication between different threads of the same process does not take place. A minimal sketch of such a protected exchange is given at the end of this page.

The sample code can be downloaded by clicking here. It includes source code, an input file, and a script for running it on our systems; alterations to the script may be necessary. The code works through all six model variations and reports success or failure. We make this available to our users (and anyone else) under the condition that the use of parts of the code is acknowledged. We hope that this code supplies you with a framework to adapt for your own programs, particularly if you plan to deploy them on a cluster with multi-core, multi-threaded nodes.
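
As promised above, here is a minimal sketch of a protected request for work issued from inside a slave's parallel region, as in Plans 5, 6, 11 and 12. It reuses the illustrative tag and group size from the earlier sketch. Since several threads may call into MPI (one at a time), the MPI library should be initialized with at least MPI_THREAD_SERIALIZED via MPI_Init_thread.

    #include <mpi.h>
    #include <omp.h>

    #define TAG_REQUEST 1    /* assumed tag value, as in the earlier sketch */
    #define GROUP_SIZE  64

    /* Refill the shared job group from the master.  Called from inside the
     * slave's parallel region, so the MPI calls sit in a named critical
     * construct: only one thread at a time touches MPI, and each communication
     * event pairs one slave thread with one thread on the master. */
    int refill_group(unsigned long *group)
    {
        int count = 0;
        #pragma omp critical (mpi_comm)
        {
            MPI_Status status;
            MPI_Send(NULL, 0, MPI_UNSIGNED_LONG, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(group, GROUP_SIZE, MPI_UNSIGNED_LONG, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_UNSIGNED_LONG, &count);
        }
        return count;                  /* 0 means the master has no work left */
    }

In Plans 5 and 6 only the main thread of each slave would call such a routine; in Plans 11 and 12, any slave thread that runs out of jobs may call it while the remaining threads keep working.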