How do I "parallelize" my code?

Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.

Here are four steps to consider when "parallelizing" your code:

  1. Optimize the serial version as much as you can. Try to make it as "simple" as possible, avoiding nested loops and loops with dependencies, i.e. loops where the operations in one iteration depend on the results of a previous one. Dependencies may be hidden in function calls or in references to global variables or COMMON blocks. Often, a program spends most of its execution time in a few loops; those are the prime candidates for parallelization. Try to find them (e.g. by running analyzer software, such as the one included in the "sunstudio" development tools, or by explicitly inserting timing routines like etime() into the code), and focus on simplifying those loops.
  2. Use the compiler's auto-parallelization flags (see section 3).
  3. Force multi-threading via OpenMP compiler directives (see section 4).
  4. If the above approaches do not work, or you need to deploy the resulting parallel code on a cluster, use MPI routines to run separate processes that communicate with each other (see section 5). This usually requires a "from-scratch" approach.