How can I use multiple threads to get parallel performance out of my serial code?

Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.

The compilers available on HPCVL clusters are discussed in our Compiler FAQ. They offer options that make the compiler attempt to parallelize loops that have no dependencies by multi-threading them. The relevant compiler flags are:

  • -xautopar identifies loops that are obviously non-dependent and creates multithreaded code for them
  • -xreduction recognizes reduction operations, in which a loop accumulates values into a single variable (for example, by summing over them), so that such loops can be parallelized as well
  • -xloopinfo reports which loops were parallelized and which were not (and why)
  • -stackvar allocates local variables on the stack. This flag is required.
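
As a minimal sketch of the kind of loops these flags target, the C functions below contain one loop with fully independent iterations and one summation loop. The function names `scale` and `dot` are hypothetical and do not come from this FAQ, and the compile line in the comment is an assumption about a typical invocation, not a recommended setting.

```c
#include <stddef.h>

/* Assumed compile line (illustrative only, adjust to your compiler setup):
 *   cc -xO3 -xautopar -xreduction -xloopinfo -c loops.c
 */

/* Each iteration reads only x[i] and writes only y[i], so no iteration
 * depends on another, provided x and y do not overlap. This is the kind
 * of loop -xautopar is meant to pick up. */
void scale(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i];
    }
}

/* Every iteration updates the same variable, sum. This is a reduction,
 * the pattern that -xreduction asks the compiler to recognize. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Note that a parallelized floating-point reduction may add the terms in a different order than the serial loop, so results can differ slightly in the last digits.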

This only works if the loops to be parallelized have no dependencies between iterations. Because the compiler is very conservative, even a simple function call inside a loop causes it to reject auto-parallelization: the call could hide accesses to global variables (COMMON blocks or module variables in Fortran) that establish dependencies. As a result, auto-parallelization is often not an option.
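
The two obstacles described above can be illustrated with a small C sketch. All names here (`call_count`, `bump`, `prefix_sum`, `bump_all`, `bump_calls`) are hypothetical. The first loop carries a genuine dependency between iterations; the second has independent array accesses but calls a function that modifies a global variable, which is the kind of hidden access a conservative compiler has to assume for any call.

```c
#include <stddef.h>

/* Global state, playing the role of a COMMON block or module variable. */
static long call_count = 0;

/* Looks harmless at the call site, but updates global state. */
double bump(double v)
{
    call_count++;
    return v + 1.0;
}

/* Loop-carried dependency: iteration i reads a[i - 1], which was
 * written by iteration i - 1, so the iterations cannot run independently. */
void prefix_sum(size_t n, double *a)
{
    for (size_t i = 1; i < n; i++) {
        a[i] += a[i - 1];
    }
}

/* The accesses to a[] are independent, but the call to bump() hides a
 * write to call_count that is shared by all iterations. */
void bump_all(size_t n, double *a)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = bump(a[i]);
    }
}

/* Accessor so callers can inspect the hidden global state. */
long bump_calls(void)
{
    return call_count;
}
```

In a case like `bump_all`, writing the operation directly in the loop body and moving the global update outside the loop may make the independence visible to the compiler; whether a given loop is then accepted can be checked with -xloopinfo.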