Compute Canada

How do I force multi-thread parallelization? How to use compiler directives?

Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.

The compiler is very conservative about multithreading loops automatically. When -xautopar is used, it will refuse to parallelize a loop if there is the slightest possibility of a data dependency. Function calls within loops, if statements that depend on variables that change in the loop, and many other features are considered "dangerous" and inhibit parallelization. The reason is that such features can make the result depend on the order in which the loop iterations are carried out, which rules out parallel execution.
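
To make the idea of order dependence concrete, here is a minimal sketch in C. The function names (scale, running_sum, running_sum_reversed) are hypothetical and chosen only for illustration. The first loop has independent iterations, the second carries a dependency from one iteration to the next, and the third runs the same loop body in reverse order to show that the result changes.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i], so any execution
   order (including a parallel one) produces the same result. */
void scale(double *a, const double *b, double s, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = s * b[i];
}

/* Loop-carried dependency: a[i] needs the a[i-1] written by the previous
   iteration, so the iterations must run in order. */
void running_sum(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    if (n == 0)
        return;
    a[0] = b[0];
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Same loop body as running_sum, but the iterations are visited in
   reverse order. Each a[i] now reads an a[i-1] that has not been
   updated yet, so the output depends on the initial contents of a. */
void running_sum_reversed(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    if (n == 0)
        return;
    a[0] = b[0];
    for (size_t i = n - 1; i >= 1; i--)
        a[i] = a[i - 1] + b[i];
}
```

With b = {1, 2, 3, 4} and a initialized to zeros, running_sum yields {1, 3, 6, 10}, while running_sum_reversed yields {1, 3, 3, 4}. A loop like scale is safe to parallelize. A loop like running_sum is the kind the compiler must leave serial.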

However, you often know more than the compiler. You might be certain that a function call does not alter the value of variables that are shared with other loop iterations. If this is the case, there are ways to tell the compiler to parallelize anyway. This is done via compiler directives that look like comments but, if the code is compiled with the proper flags, guide the compiler in parallelizing the code. The most common ones are OpenMP compiler directives. Here is an example in Fortran:

!$OMP PARALLEL DO PRIVATE(a)
do i = 1, n
   a(1) = b(i)
   do j = 2, n
      a(j) = a(j-1) + b(j) * c(j)
   end do
   x(i) = f(a)
end do

and in C:

#pragma omp parallel for private(a,j)
for (i=1; i<n+1; i++){
    a[1] = b[i];
    for (j=2; j<n+1; j++){
        a[j] = a[j-1] + b[j] * c[j];
    }
    x[i] = f(a);
}

The initial "!" in the first line of the Fortran segment causes that line to be treated as a comment unless the code is compiled with the compiler flag -xopenmp. With that flag, the first line tells the compiler to parallelize the loop directly following it. The private declaration gives each parallel thread its own separate copy of the array "a", so the threads do not overwrite each other's intermediate values.
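
For reference, the C fragment above can be fleshed out into a compilable sketch along the following lines. The problem size N, the wrapper function compute, and the body of f are assumptions made for illustration only. The original does not say what f does, so a simple stand-in that reads the array and returns its last element is used here. Indices are zero-based, as is usual in C.

```c
#include <assert.h>
#include <stddef.h>

#define N 8   /* hypothetical problem size, chosen for illustration */

/* Hypothetical stand-in for f: it only reads its argument and touches
   no shared data, which is what makes the outer loop safe to run in
   parallel. */
static double f(const double *a)
{
    return a[N - 1];
}

/* x, b and c must each point to at least N elements. */
void compute(double *x, const double *b, const double *c)
{
    double a[N];   /* scratch array, one private copy per thread */
    int i, j;

    assert(x != NULL && b != NULL && c != NULL);

    /* Without an OpenMP flag the pragma is ignored and the loop runs
       serially, with the same result. */
    #pragma omp parallel for private(a, j)
    for (i = 0; i < N; i++) {
        a[0] = b[i];
        for (j = 1; j < N; j++)
            a[j] = a[j - 1] + b[j] * c[j];
        x[i] = f(a);
    }
}
```

The inner loop over j has a loop-carried dependency and stays serial. Only the outer loop over i is distributed across threads. If "a" were shared instead of private, several threads would write to the same scratch array at once and the values in x would be unpredictable.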

Some commonly used compiler flags for this approach are:

  • -xopenmp includes all flags necessary for using OpenMP compiler directives, along with several others (see man pages). This is the most commonly used multi-threading flag for explicit (as opposed to automatic) parallelization. The others are used only occasionally.
  • -vpara gives verbose output about dependencies in the explicitly parallelized loops.
  • -xloopinfo issues messages about the parallelization of loops.
  • -xstackvar allocates private variables on the stack. This option is implied by -xopenmp.
  • -xopenmp=noopt turns off the automatic increase in optimization level (to -xO3) implied by -xopenmp.

Because the platform-independent OpenMP compiler directives are the standard, the use of older directives, while supported, is strongly discouraged.

A separate OpenMP FAQ is available that contains more information about this programming technique.