
How can I check the performance of my serial, multi-threaded, or MPI code?

Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.

The Sun systems are equipped with a powerful program-development interface called Sun Studio. If your shell is set up properly, you can start it by simply typing sunstudio. The program is quite complex, so I can only outline here how to use it for profiling serial and multi-threaded code. An online guide is available at

 file:///opt/SUNWspro/prod/lib/locale/C/html/index.html

on our systems. Other documentation can be found at the Sun Docs Site.

In order to analyze your program with the Sun Studio Tool, you need to compile it with the -g option. After calling

 sunstudio

a GUI will appear. Click on Analyze on the tool bar, choose File and Collect Experiment, then specify the program in the popup menu. After you press Run, data from a program run will be collected. On completion, these data are stored in a file called test.1.er and a (hidden) directory called .test.1.er. Now you are ready to have a look at them. Close the sampling collector window and go back to the main sunstudio tool bar. Click on Analyze -> File -> Open Experiment and load test.1.er. You will get an Analyzer window that shows the total exclusive and inclusive time spent in the various subroutines, the percentage of time used by each, and much more. Try the Metrics and the Callers-Callees windows to get more information.

If you do not like GUIs, there is a

 collect

command that lets you produce test.1.er from the command line. Check out the man pages with man collect. And if you prefer a printed report for analyzing the experiment, there is a utility that does that, called

 er_print

also documented in the man pages: man er_print. These come in handy if you do not have a desktop environment available.
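
As a rough illustration, a command-line session with these two utilities might look like the sketch below. The program name ./myprog is a hypothetical placeholder for your own executable, which is assumed to have been compiled with -g already. The -o option of collect and the functions command of er_print are used as described in their man pages; please check man collect and man er_print for the version installed on the system.

```shell
#!/bin/sh
# Minimal sketch of profiling from the command line.
# ASSUMPTION: ./myprog is a placeholder for your own program, built with -g.
prog=./myprog
expt=test.1.er   # experiment name used in the text above

if command -v collect >/dev/null 2>&1 && [ -x "$prog" ]; then
    # Run the program under the collector and store the experiment in $expt.
    if collect -o "$expt" "$prog"; then
        # Print the function list with its timing metrics as plain text.
        er_print -functions "$expt"
        result=$?
    else
        result=1
    fi
else
    # Either Sun Studio is not on the PATH or the program does not exist.
    echo "collect or $prog not available; nothing was profiled" >&2
    result=127
fi
```

The variable result is 0 if both steps succeeded, so the fragment can be reused in a batch script that checks it before continuing.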

This tool lets you analyze where most of the execution time in your program is spent. It can also handle multiple processes, which it collects into separate experiments.