How do I optimize for specific machines at HPCVL?

Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.

The Studio compilers provide a variety of optimization options. Some of them are applicable to all computers with a Solaris/Sparc platform. Others are specific to a chip and/or architecture.

The most commonly used general optimization flag is the -xOn option, where n is a number from 1 to 5. Higher levels make more aggressive alterations to the code, but usually also yield larger gains. Levels up to -xO3 are generally rather safe to use. Parallelization options require an optimization level of 3, and enforce that level if it is not explicitly specified.
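As a minimal sketch of choosing a level, the helper below builds the -xOn flag and rejects values outside 1 to 5. The function name xo_flag, the source file test.c and the output name test are placeholders for illustration, not part of our setup, and the command is only printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: turn a level (1..5) into the matching -xOn flag.
xo_flag() {
  case "$1" in
    [1-5]) printf '%s\n' "-xO$1" ;;
    *)     echo "invalid optimization level: $1" >&2; return 1 ;;
  esac
}

# Dry run: show a build at level 3, the highest level described above
# as generally safe. test.c and "test" are placeholder names.
echo "cc $(xo_flag 3) -o test test.c"
```

Running the printed command by hand (or dropping the echo) performs the actual compilation.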

For platform-specific optimization, the most popular option is -fast. This is a "macro" that expands to an array of optimization flags; it is quite safe to use and often improves performance substantially. The optimization flags used include:

-xvector=lib (Fortran)
-dalign (Fortran)
-xmemalign=8s (C/C++)
-ftrap=common (Fortran)
-ftrap=%none (C/C++)
-fround=nearest (Fortran)
-xbuiltin=%all (C/C++)
-fsingle (C)
-xalias_level=basic (C)

Some of these options apply at the compile stage, others are passed to the linker. Many of them are specific to Fortran or C/C++. The macro option -fast should be specified at both the compile and link stage if these are done separately. Details about the effect of the sub-options can be found in the man pages.
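When compiling and linking are done in separate steps, the build might look like the following sketch. The file names main.c and util.c and the program name myprog are placeholders, and the two helper functions are hypothetical; they only print the commands so that the placement of -fast at both stages can be checked.

```shell
#!/bin/sh
# The same macro option is passed to the compile step and the link step.
OPT="-fast"

# Hypothetical helpers that print (not run) each build step.
compile_step() { echo "cc $OPT -c $1"; }
link_step()    { out=$1; shift; echo "cc $OPT -o $out $*"; }

compile_step main.c                  # produces main.o when actually run
compile_step util.c                  # produces util.o when actually run
link_step myprog main.o util.o       # links the objects, again with -fast
```

The point of the sketch is only that -fast appears on every line, including the final link line.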

Because of the -xarch, -xcache, and -xchip options implied in -fast, the latter is specific (via the native setting) to the platform on which the compilation takes place. Usually, you will compile your code on the login node, and the resulting executables will therefore be optimized for the UltraSparc-IV+ architecture of that node.

If you wish to optimize for another type of node, you can override the -xarch, -xcache, and -xchip settings explicitly. Keep in mind that overriding happens from left to right, so if you specify -fast and add an -xarch option to the right of it, this will replace the implied -xarch=native setting. For the three major Solaris/Sparc platforms we are using at HPCVL, the settings are:

  • Sunfire (US-IV+) platform, "SF 25K cluster", production.q:
    -xarch=sparcvis2 -xcache=64/32/4:2048/64/4:32768/64/4 -xchip=ultra4plus
    Note that this platform is shared by the login node and the Sunfire cluster, which means that if you compile on the login node, the explicit specification of these flags is not necessary. If needed, the environment variable SFFLAGS is set to the above options.
  • M9000 (Sparc64-VII+) platform, "M9K cluster", m9k.q:
    -xarch=sparcima -xcache=64/64/2:6144/256/12 -xchip=sparc64vii
    Because our current default cluster is of this architecture, we recommend using these settings to override the ones from -fast if your code is used mostly on the M9000 servers. If needed, the environment variable M9KFLAGS is set to the above options.
  • Niagara-2 (UltraT2+) platform, "Victoria Falls cluster", vf.q:
    -xarch=sparcvis2 -xcache=8/16/4:4096/64/16 -xchip=ultraT2plus
    Because the "Victoria Falls" cluster is used for specific purposes, we recommend using these settings only if the code should be optimized specifically for this cluster. The -xarch option does not have to be specified, as it is the same as on the login node. If needed, the environment variable VFFLAGS is set to the above options.
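The left-to-right overriding rule can be illustrated with a small toy model. The function effective below is a hypothetical helper that does not invoke the compiler; it simply walks a flag list and reports the last value given for an option, which is how the rule is described above. The implied -xarch=native from -fast is written out explicitly for the illustration.

```shell
#!/bin/sh
# Toy model of "later flags override earlier ones" (not the compiler itself).
# Usage: effective OPTION FLAG...
effective() {
  opt=$1; shift
  val=
  for f in "$@"; do
    case "$f" in
      "$opt"=*) val=${f#*=} ;;   # keep the value after the first '='
    esac
  done
  printf '%s\n' "$val"
}

# -fast implies -xarch=native; the M9000 settings follow to its right.
echo "effective -xarch: $(effective -xarch -xarch=native \
  -xarch=sparcima -xcache=64/64/2:6144/256/12 -xchip=sparc64vii)"
```

If the explicit flags were placed to the left of -fast instead, the same model would report native, which is why the order on the command line matters.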

For each of these architectures, we have provided environment variables on our systems, so that specific optimization becomes easier. For instance, to optimize for the Victoria Falls cluster specifically, the VFFLAGS variable can be used:

cc -fast $VFFLAGS test.c
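For build scripts that target more than one cluster, the choice of variable could be wrapped in a small function, as in the sketch below. The function name platform_flags is hypothetical. It uses the SFFLAGS, M9KFLAGS and VFFLAGS variables when they are set and otherwise falls back to the flag values listed above, so the script also works outside our environment. The compile command is only printed.

```shell
#!/bin/sh
# Hypothetical helper: map a queue name to the platform-specific flags.
# The site variable is used if set; otherwise the documented values apply.
platform_flags() {
  case "$1" in
    production.q) printf '%s\n' "${SFFLAGS:--xarch=sparcvis2 -xcache=64/32/4:2048/64/4:32768/64/4 -xchip=ultra4plus}" ;;
    m9k.q)        printf '%s\n' "${M9KFLAGS:--xarch=sparcima -xcache=64/64/2:6144/256/12 -xchip=sparc64vii}" ;;
    vf.q)         printf '%s\n' "${VFFLAGS:--xarch=sparcvis2 -xcache=8/16/4:4096/64/16 -xchip=ultraT2plus}" ;;
    *)            echo "unknown queue: $1" >&2; return 1 ;;
  esac
}

# Dry run: the flags are placed to the right of -fast so that they override it.
echo "cc -fast $(platform_flags m9k.q) test.c"
```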

In our experience, code that is compiled on the US-IV+ login node with -fast and no additional options performs well on all three of our platforms.