Compute Canada

How do I run MPI programs on HPCVL computers?

Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.

To run MPI programs, a special Runtime Environment is required. This differs from platform to platform. On our Sun machines, which run Solaris, it is called ClusterTools or CT. This includes commands for the control of multi-process jobs. The most important ones are:

  1. mpinfo (only CT6) gives general information about the cluster or machine. The options -N and -P are commonly used to obtain more details about the nodes and partitions of a cluster, respectively. (Nodes are physical machines in a cluster; partitions are logical groups of CPUs.)
  2. mprun (CT6) and mpirun (CT7 and higher) are used to start a multi-process run of a program. They are required to run MPI programs. The most commonly used command line option is -np, which specifies the number of processes to be started on the default partition. Note that CT7 understands the -n option as well, but CT6 does not. For instance, the following lines start the program test_mpi.exe with 9 processes:
    mprun -np 9 test_mpi.exe [ClusterTools 6] 
    mpirun -n 9 test_mpi.exe [ClusterTools 8.1]
  3. mpps (only CT6) is the multi-process equivalent of the Unix command ps, and is used to produce a list of multi-process jobs. Without options, it shows only those jobs of the current user that were started from the current shell and are running in the default partition. (A partition is a logical group of processors, defined by the system.) Available options include -e to show all jobs, -f to give a longer description including all processes, starting times, etc., and -A to include all partitions.
  4. mpkill (only CT6), followed by the JID (job ID, determined through an mpps call), terminates a multi-process job by sending a "kill signal" to all its processes. The option -9 will force the termination (just as it does in the Unix kill command).
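
The difference between the two launchers can be sketched as a small shell helper. This is only an illustration: the function name launch_cmd and its arguments are hypothetical and not part of ClusterTools, and the helper prints the command line instead of running it, so it can be tried on a machine without ClusterTools installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of ClusterTools): print the launch line
# for a given ClusterTools major version without executing it.
# Usage: launch_cmd CT_MAJOR NPROCS PROGRAM
launch_cmd() {
    ct_major=$1
    nprocs=$2
    program=$3
    if [ "$ct_major" -lt 7 ]; then
        # CT6 (and, by assumption, anything older): mprun, which only
        # understands -np
        echo "mprun -np $nprocs $program"
    else
        # CT7 and higher: mpirun, which also accepts -n
        echo "mpirun -n $nprocs $program"
    fi
}

launch_cmd 6 9 test_mpi.exe   # CT6 form
launch_cmd 8 9 test_mpi.exe   # CT7-and-higher form
```

To actually start the job, run the printed line yourself, or replace echo with the command once you know which ClusterTools version is active in your shell.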

The mprun and mpirun commands offer additional options that are sometimes useful or required. Most tend to interfere with the scheduling of jobs in a multi-user environment such as ours and should be used with caution. Please consult the man pages for details.

Finally, the -x sge option (only CT6) indicates that the node list for the processes is supplied by the Gridengine scheduling software. This option is used when MPI programs are executed via Gridengine, and it normally appears inside Gridengine submission scripts. If -x sge is specified, the number of processes no longer has to be given, since it is determined by Gridengine. A command line looks like this:

mprun -x sge program.exe

Note that the usage of Gridengine is mandatory for production jobs on our system. This option is therefore used frequently. For details about Gridengine and job submission on HPCVL machines and clusters, go here.
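
As a rough sketch of how the -x sge line might sit inside a submission script under CT6, the following shell snippet writes a minimal script to a file. The file name mpi_job.sh, the job name mpi_test, the slot count, and in particular the parallel environment name your_pe_name are placeholders chosen for illustration. They are assumptions, not HPCVL settings, so please substitute the values that apply to your account.

```shell
#!/bin/sh
# Write a minimal Gridengine submission script for a CT6 MPI job.
# File name, job name, parallel environment name, and slot count are
# placeholders (assumptions), not actual site settings.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -N mpi_test
#$ -cwd
#$ -V
#$ -pe your_pe_name 9
# -x sge: take the node list and process count from Gridengine (CT6 only)
mprun -x sge program.exe
EOF

echo "wrote mpi_job.sh; submit it with: qsub mpi_job.sh"
```

The lines starting with #$ are read by Gridengine as options to qsub. The number of slots requested with -pe is what determines how many processes mprun starts, which is why no -np appears on the mprun line.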

Note that the -x sge option is not supported if you use ClusterTools 8, because the mpirun command automatically detects that it is running under Grid Engine. In this case, the -n option is not used either, which reduces the command line to simply:

mpirun program.exe
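
If the same script is meant to serve both for a quick interactive test and for a Grid Engine job under ClusterTools 8, the choice between the two forms could be sketched as below. The function name ct8_launch_cmd is hypothetical. The sketch assumes that the JOB_ID environment variable, which Grid Engine sets inside a running job, is a sufficient sign that the script was started by the scheduler. As a precaution it only prints the command line.

```shell
#!/bin/sh
# Hypothetical wrapper for ClusterTools 8: print the mpirun line that fits
# the current context. Assumption: JOB_ID is set only inside a Grid Engine job.
# Usage: ct8_launch_cmd PROGRAM [NPROCS]
ct8_launch_cmd() {
    program=$1
    nprocs=${2:-1}
    if [ -n "${JOB_ID:-}" ]; then
        # Inside a Grid Engine job: mpirun detects the allocation itself
        echo "mpirun $program"
    else
        # Outside Grid Engine: give the process count explicitly
        echo "mpirun -n $nprocs $program"
    fi
}

ct8_launch_cmd program.exe 4
```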