Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.
This is a short introduction to carrying code over from a serial programming environment to the multi-processor systems used at HPCVL. It is meant to give the user a basic idea of what to do to get the code running on several processors. The document is organized in an "FAQ" manner, i.e. a list of "obvious" questions is presented as a guideline. Please feel free to contact us if you want to see more questions included.
No, serial code will not run in parallel as it is. At the very least, you will have to recompile it with "parallel options" and set a few environment variables. For most code, that will not be enough either. Fortunately, in many cases it is not difficult to get the compiler to produce code that shows some performance gain from multi-threading.
Here are 4 steps that should be considered to "parallelize" your code:
The compilers running on HPCVL clusters are discussed in our Compiler FAQ. They have options that cause them to attempt to parallelize loops that have no dependencies by multi-threading them. The compiler flags to get this done are
This will only work if the loops to be parallelized do not have any dependencies. Since the compiler is very conservative, even simple function calls from inside a loop cause it to reject auto-parallelization. This is because function calls could hide access to global variables (COMMON blocks or modules in Fortran) that establish dependencies. The result is that auto-parallelization often is not an option.
The compiler is very conservative about multi-threading loops automatically. If -xautopar is used and there is the slightest possibility of data dependencies, it will refuse to parallelize the loop. Function calls within loops, if statements that depend on variables which change in the loop, and many other features are considered "dangerous" and inhibit parallelization. The reason is that such features can make the result depend on the order in which the loop iterations are carried out, which rules out parallel execution.
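To make the notion of a dependency concrete, here is a minimal sketch in C. The function names scale_add and running_sum are hypothetical and chosen only for illustration. Whether a given compiler actually auto-parallelizes the first loop depends on its analysis and flags. The sketch only shows the difference in principle.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and the old y[i],
   so the iterations can be carried out in any order. The "restrict"
   qualifiers promise that x and y do not overlap. A loop like this is
   a candidate for automatic parallelization. */
void scale_add(size_t n, double alpha,
               const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}

/* Loop-carried dependency: iteration i reads s[i-1], which was written
   by iteration i-1. The result depends on the order of the iterations,
   so this loop cannot simply be split across threads. */
void running_sum(size_t n, const double *x, double *s)
{
    assert(x != NULL && s != NULL);
    if (n == 0) {
        return;
    }
    s[0] = x[0];
    for (size_t i = 1; i < n; i++) {
        s[i] = s[i - 1] + x[i];
    }
}
```

In the first function any thread could handle any subset of the index range. In the second, each iteration has to wait for the previous one.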
However, often you know more than the compiler. You might be certain that a function call does not alter the value of variables that are shared with other loop iterations. If this is the case, there are ways to tell the compiler to parallelize anyhow. This is done via compiler directives that look like comments, but if compiled with the proper flags, will guide the compiler in parallelizing the code. The most common ones are OpenMP compiler directives. Here is an example in Fortran:
!$OMP PARALLEL DO PRIVATE(a)
do i = 1, n
   a(1) = b(i)
   do j = 2, n
      a(j) = a(j-1) + b(j) * c(j)
   end do
   x(i) = f(a)
end do
and in C:
#pragma omp parallel for private(a,j)
for (i=1; i<n+1; i++){
   a[1] = b[i];
   for (j=2; j<n+1; j++){
      a[j] = a[j-1] + b[j] * c[j];
   }
   x[i] = f(a);
}
The initial "!" in the first line of the Fortran segment causes that line to be interpreted as a comment unless the code is compiled with the compiler flag -xopenmp. With that flag, the first line tells the compiler to parallelize the loop directly following it. The private declaration causes a separate copy of the array "a" to be used by each parallel thread, i.e. "a" is treated as a private variable. The "#pragma" line plays the same role in the C version.
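For readers who want to try the C fragment, the following is a self-contained sketch of it. Several things here are assumptions made for illustration: the fixed size N, the zero-based indexing, the wrapper function compute_x, and the body of f, which simply returns the last element of its argument. None of these come from an actual HPCVL code.

```c
#include <assert.h>
#include <stddef.h>

#define N 8   /* hypothetical fixed problem size for this sketch */

/* Hypothetical stand-in for f in the text: returns the last element. */
static double f(const double a[N])
{
    return a[N - 1];
}

/* Fills x[i] for all i. The scratch array a is listed as private, so
   each thread works on its own copy and iterations do not interfere. */
void compute_x(const double b[N], const double c[N], double x[N])
{
    double a[N];
    int i, j;

    assert(b != NULL && c != NULL && x != NULL);

    /* If the code is compiled without an OpenMP flag, this pragma is
       ignored and the loop simply runs serially. */
    #pragma omp parallel for private(a, j)
    for (i = 0; i < N; i++) {
        a[0] = b[i];
        for (j = 1; j < N; j++) {
            a[j] = a[j - 1] + b[j] * c[j];
        }
        x[i] = f(a);
    }
}
```

Every element of the private copy of a is written before it is read within one iteration, which is why it is safe that private copies start out uninitialized.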
Some commonly used compiler flags for this approach are:
Because OpenMP platform-independent compiler directives are the standard, the use of older directives, while supported, is strongly discouraged.
A separate OpenMP FAQ is available that contains more information about this programming technique.
Sometimes it is necessary to re-write the code in a parallel fashion, so that it can be executed on several separate processors, or indeed machines, separately. For this, it is necessary to establish some communication between the processes, and this is usually done by some form of message passing. A platform-independent standard for this is a set of almost 300 routines, available in Fortran, C and C++, that comprise the MPI (Message Passing Interface) standard. Using these routines requires a little rethinking of the code structure, but is reasonably simple and effective in many cases.
MPI is best used if your code has good potential to employ many processors independently with none sitting idle. It is also advantageous if only relatively little communication is necessary between processes. Examples are numerical integration (where independent evaluations of the integrand can be done separately), Monte-Carlo methods, and finite-difference and finite-element methods (if the problem can be divided up into blocks of equal size with minimal communication). MPI requires some serious re-coding in some cases, but great scaling can be achieved with a relatively small number of routines.
A very simple example of how to parallelize code with MPI is given in the monte.f Fortran program.
Only a few MPI commands are necessary to parallelize this Monte-Carlo calculation of pi. The first
call MPI_INIT(ierr)
sets up the MPI system and has to be called in any MPI program. The next two
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)
are used to determine the "rank", i.e. number of the presently running process, and the total number of processes running (size). The identifier MPI_COMM_WORLD is used to label a group of processes assigned to this task, called a "communicator". With
call MPI_REDUCE(pi,pisum,1,MPI_DOUBLE_PRECISION,&
MPI_SUM,0,MPI_COMM_WORLD,ierr)
the partial sums (pi) from the different processes are summed up (reduced) into the total (pisum). This is done simultaneously with the gathering of the results from the processes, and is called "reduction". Finally,
call MPI_FINALIZE(ierr)
closes the MPI system.
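The pattern behind these calls (each process works on its own share as determined by its rank, and the partial results are then summed) can be sketched in plain C without any MPI calls. The sketch below is not the monte.f program. As an assumption made for illustration, it uses a deterministic midpoint-rule integration of 4/(1+x^2) over [0,1], which equals pi. It also loops over the ranks serially where a real MPI program would run them as separate processes. The names partial_pi and reduce_sum are hypothetical.

```c
#include <assert.h>

/* Share of the work done by one process: the process with the given
   rank handles intervals rank, rank+np, rank+2*np, ... out of n. */
double partial_pi(int rank, int np, long n)
{
    assert(np > 0 && rank >= 0 && rank < np && n > 0);

    double h = 1.0 / (double)n;   /* width of one interval */
    double sum = 0.0;

    for (long i = rank; i < n; i += np) {
        double x = h * ((double)i + 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for the reduction step: adds up the partial results
   of all np ranks, which is what MPI_REDUCE with MPI_SUM would deliver
   to the root process in an actual MPI program. */
double reduce_sum(int np, long n)
{
    double total = 0.0;

    for (int rank = 0; rank < np; rank++) {
        total += partial_pi(rank, np, n);
    }
    return total;
}
```

In an MPI version, each process would call partial_pi once with the rank and np obtained from MPI_COMM_RANK and MPI_COMM_SIZE, and the loop in reduce_sum would be replaced by a single reduction call.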
To get an idea of how to use MPI and what the various routines do, check out the MPI workshop at the Maui HPC Centre site. For a list of routines in the MPI standard, and a reference manual of their usage, go to the Sun Documentation Website and search for the Sun MPI Programming and Reference Guide.
We offer a separate MPI FAQ with more information about this system.
Although the MPI standard comprises hundreds of routines, you can write very stable and scalable code with only a dozen or so routines. In fact, often the simpler you keep it the better it will work.
To use MPI on our clusters, you will have to do the following things:
#include <mpi.h>
This is important for the definition of variables and constants that are used by the MPI system.
-I/opt/SUNWhpc/include -L/opt/SUNWhpc/lib -R/opt/SUNWhpc/lib -lmpi
These tell the compiler, linker and runtime environment where to look for include files, static libraries and runtime dynamic libraries. The flag -lmpi links the MPI library.
tmf90, tmcc, or tmCC macros for Fortran, C, and C++, respectively, instead of the standard compilers/linkers. These will automatically apply the right flags, including the -lmpi library flag.
mpirun [options]
where options specify the parameters of the run.
The mpirun command is part of the ClusterTools programming environment, and is necessary to run MPI programs and allocate the separate processes across the multi-processor system. The setup for ClusterTools is part of the default on our cluster. The /opt/SUNWhpc/bin directory must be in your PATH (which it is for the default environment).
mpirun lets you specify the number of processors, e.g.
mpirun -np 4 test_par
runs the MPI program test_par on 4 processors. There is a myriad of other options for this command, many of which are concerned with details of process allocation. On HPCVL clusters these are handled automatically by the system, so the user does not have to deal with them.
For help on ClusterTools, consult Sun's Documentation Site and search for HPC Cluster Tools User's Guide.
All of these things are documented at http://docs.sun.com, but the mass of information on that site makes it a bit difficult to know where to look. Try using the search engine.
If you have questions that you can't resolve by checking documentation, you can contact us. We have several user support people who can help you with code migration to the parallel environment of the HPCVL facilities. If you want to start a larger project that involves making code executable on parallel machines, they might be able to help you. Keep in mind that we support many people at any given time, so we cannot do the coding for you. But we can do our best to help you get your code ready for multi-processor machines.
Of course, some programs are inherently non-parallel, and trying to make them scalable might be too much effort to be worth it. In that case, the best one can do is try to improve the serial performance by adapting the code to modern computer architecture. The performance enhancement that can be achieved is sometimes quite amazing. It seems, however, that most programs have a good potential to be executed in parallel, and a little effort in that direction often goes a long way.