How do I use the HWT for timing my code?

The HWT also offers a simple way to add timing to your code, which helps you optimize it for execution speed and, in the case of parallel programs, determine its scaling properties. This is done with calls to two routines from the HWT library: one marks the beginning of a timing region in the code, the other marks its end. The second routine also labels the region to distinguish it from others.

If the code inside a timing region is executed multiple times, the timings are cumulative: each execution adds to the region's total. The HWT also keeps track of how many times the region was executed. A timing experiment is performed as follows:

  1. The user inserts calls to the timing routines into the code, bracketing the regions that are to be timed. These calls are usually placed within pre-processor constructs so that they are compiled only into dedicated timing versions of the program.
  2. All versions that are to be included in the timing experiment are executed. This might include a serial version and multiple runs of a parallel version that differ in the number of processors employed. These runs produce several intermediate files with timing information.
  3. Executing the script call.cputimer.hwt retrieves the timing information and compares it across runs. It prints a report in table format, including CPU times and speedups (usually with respect to a serial or one-processor run). For multiple-processor runs this information is printed separately for each processor.
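
To show what the comparison in step 3 amounts to, the following Python sketch computes speedups from a set of CPU times. It does not reproduce call.cputimer.hwt or its report layout: the function speedup_table is hypothetical, the input numbers are made up for illustration, and the speedup is assumed to be the usual ratio of the reference (serial) CPU time to each processor's CPU time.

```python
def speedup_table(serial_time, parallel_runs):
    """Build rows of (processors, processor_id, cpu_seconds, speedup).

    serial_time   -- CPU seconds of the serial (reference) run
    parallel_runs -- dict mapping processor count to a list of
                     per-processor CPU seconds for that run
    """
    rows = [(1, 0, serial_time, 1.0)]
    for nprocs in sorted(parallel_runs):
        for proc_id, cpu in enumerate(parallel_runs[nprocs]):
            # Speedup relative to the serial run (assumed definition).
            rows.append((nprocs, proc_id, cpu, serial_time / cpu))
    return rows


# Made-up CPU times in seconds, for illustration only.
serial = 120.0
parallel = {2: [60.0, 64.0], 4: [30.0, 32.0, 40.0, 30.0]}

rows = speedup_table(serial, parallel)
print(f"{'procs':>5} {'proc':>4} {'cpu (s)':>8} {'speedup':>8}")
for nprocs, proc_id, cpu, speedup in rows:
    print(f"{nprocs:>5} {proc_id:>4} {cpu:>8.2f} {speedup:>8.2f}")
```

A speedup close to the number of processors indicates good scaling; values that fall well below it on some processors point to load imbalance or parallel overhead in the timed region.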

Details about the usage of the timing routines may be found in the HWT manual.