Can you give me an example of MPI?

Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.

The working principle of MPI is perhaps best illustrated by a programming example. The following program, written in Fortran 90, computes the sum of the square roots of all integers from 0 up to a limit m entered by the user:


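module mpi
  include 'mpif.h'
end module mpi

module cpuids
  integer :: myid, totps, ierr
end module cpuids

program example02
  use mpi
  use cpuids
  implicit none
  call mpiinit
  call demo02
  call mpi_finalize(ierr)
end program example02

subroutine mpiinit
  use mpi
  use cpuids
  implicit none
  ! start MPI; must come before any other MPI call
  call mpi_init(ierr)
  ! rank of this process (0 .. totps-1) and total number of processes
  call mpi_comm_rank(mpi_comm_world, myid, ierr)
  call mpi_comm_size(mpi_comm_world, totps, ierr)
end subroutine mpiinit

subroutine demo02
  use mpi
  use cpuids
  implicit none
  integer :: m, i
  real*8 :: s, mys
  ! only the root process (rank 0) reads the number of terms
  if (myid .eq. 0) then
     write(*,*) 'how many terms?'
     read(*,*) m
  end if
  ! send m from rank 0 to all processes
  call mpi_bcast(m, 1, mpi_integer, 0, mpi_comm_world, ierr)
  ! each process sums every totps-th term, starting at its own rank
  mys = 0.0d0
  do i = myid, m, totps
     mys = mys + sqrt(dble(i))
  end do
  write(*,*) 'rank:', myid, 'mys=', mys, ' m:', m
  ! add up the partial sums into s on rank 0
  call mpi_reduce(mys, s, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr)
  if (myid .eq. 0) then
     write(*,*) 'total sum: ', s
  end if
end subroutine demo02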


Some of the common tasks that need to be performed in every MPI program are done in the subroutine mpiinit. First, we need to call the routine mpi_init to prepare the usage of MPI; this has to be done before any other MPI routine is called. The calls to mpi_comm_size and mpi_comm_rank then determine how many processes are running and the unique ID number (the "rank") of the calling process. Both pieces of information are essential. The results are stored in the variables totps and myid, respectively. Note that these variables are declared in the module cpuids, so that they can be accessed from every routine that "uses" that module.
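The start-up steps can be seen in isolation in the following minimal sketch, in which every process simply reports its rank and the total number of processes. It assumes an MPI installation that provides the standard Fortran module `mpi` (used here instead of the page's own wrapper module around mpif.h); the program name hello_ranks is illustrative.

```fortran
program hello_ranks
  use mpi            ! standard MPI Fortran module (assumed to be provided by the MPI library)
  implicit none
  integer :: myid, totps, ierr

  call mpi_init(ierr)                               ! must precede all other MPI calls
  call mpi_comm_rank(mpi_comm_world, myid, ierr)    ! rank of this process: 0 .. totps-1
  call mpi_comm_size(mpi_comm_world, totps, ierr)   ! number of processes in the job

  write(*,*) 'process', myid, 'of', totps

  call mpi_finalize(ierr)                           ! last MPI call in the program
end program hello_ranks
```

With most MPI installations such a program is compiled with the mpif90 wrapper and started with a command like "mpirun -np 4 ./a.out"; the exact commands depend on the local setup. Each process prints one line, and the lines may appear in any order.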

The main work in the example is done in the subroutine demo02, which also uses the module cpuids. Its first operation is to obtain the upper limit m of the sum by requesting input from the user. The if-clause "if(myid.eq.0) then" restricts this I/O operation to a single process, the so-called "root process", which is usually chosen to be the one with rank (i.e. unique ID number) zero.

At this point communication is necessary, since only the root process has the right value of m. This is done by a call to the MPI collective operation mpi_bcast, which "broadcasts" the integer m from the root process (rank 0, given as the fourth argument) to all processes in the communicator mpi_comm_world. This call must be made by all processes, and after they have done so, all of them know m.
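The effect of the broadcast can be made visible by printing m before and after the call. In this sketch the root process sets m to a fixed value (1000, chosen arbitrarily) instead of reading it, and all other processes start with a placeholder of -1; as before, the standard `mpi` module is assumed to be available and the program name is illustrative.

```fortran
program bcast_demo
  use mpi            ! standard MPI Fortran module (assumed available)
  implicit none
  integer :: myid, totps, ierr, m

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, myid, ierr)
  call mpi_comm_size(mpi_comm_world, totps, ierr)

  m = -1                      ! placeholder value on every process
  if (myid .eq. 0) m = 1000   ! only the root knows the real value (fixed here instead of read)

  write(*,*) 'rank', myid, 'before bcast: m =', m
  ! every process makes the same call; rank 0 is the root that sends
  call mpi_bcast(m, 1, mpi_integer, 0, mpi_comm_world, ierr)
  write(*,*) 'rank', myid, 'after bcast:  m =', m   ! now 1000 on every rank

  call mpi_finalize(ierr)
end program bcast_demo
```

Before the call only rank 0 reports 1000; after it, every rank does.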

The sum over the square roots is then computed on each process in a slightly different manner. Each term is added to a local variable mys, which must start at zero. The do-loop starts at the process's own rank myid and uses a stride of totps (the number of processes), so each process adds a different subset of the terms to its local sum and skips all others. For instance, if there are 10 processes, process 0 will add the square roots of 0, 10, 20, 30, ..., while process 7 will add the square roots of 7, 17, 27, 37, ...
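Which terms end up on which process can be checked without MPI at all. The serial sketch below loops over hypothetical ranks 0 to nprocs-1 and applies the same strided loop header as demo02; nprocs = 10 and m = 37 are arbitrary illustrative values. It also checks that every term from 0 to m is taken by exactly one rank.

```fortran
program stride_check
  implicit none
  integer, parameter :: nprocs = 10  ! hypothetical number of processes (totps)
  integer, parameter :: m = 37       ! hypothetical upper limit of the sum
  integer :: rank, i
  integer :: hits(0:m)               ! how many ranks picked up each term

  hits = 0
  do rank = 0, nprocs - 1
     write(*,'(a,i2,a)', advance='no') 'rank ', rank, ':'
     ! same loop header as in demo02: start at own rank, step by the number of processes
     do i = rank, m, nprocs
        write(*,'(i4)', advance='no') i
        hits(i) = hits(i) + 1
     end do
     write(*,'(a)') ''               ! end the output line
  end do

  ! every term 0..m must be handled by exactly one rank
  if (any(hits /= 1)) stop 1
  write(*,'(a)') 'each term assigned exactly once'
end program stride_check
```

The line for rank 7 lists 7, 17, 27 and 37, matching the example in the text.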

After the loops have completed, further communication is necessary, since each process has computed only a partial, local sum. We need to combine these local sums into one total, and we do so by calling mpi_reduce. This call "reduces" a value held by each process to a single variable that resides on only one process, usually the root process. The reduction can be done in various ways; here we choose to add up the values by specifying mpi_sum in the call. Afterwards, the total sum resides in the variable s on the root process, which prints it out.
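What mpi_reduce with mpi_sum delivers to the root can be emulated serially: compute the partial sum each rank would produce, add them up, and compare with a single-loop sum. This is a sketch with hypothetical values nprocs = 4 and m = 1000; other reduction operations such as mpi_max or mpi_prod would replace the final addition accordingly.

```fortran
program reduce_check
  implicit none
  integer, parameter :: nprocs = 4     ! hypothetical number of processes
  integer, parameter :: m = 1000       ! hypothetical upper limit of the sum
  real*8 :: mys(0:nprocs-1), s, serial
  integer :: rank, i

  ! partial sums as each rank would compute them in demo02
  do rank = 0, nprocs - 1
     mys(rank) = 0.0d0
     do i = rank, m, nprocs
        mys(rank) = mys(rank) + sqrt(dble(i))
     end do
  end do

  ! what a sum reduction leaves in s on the root
  s = sum(mys)

  ! reference: the same sum computed in one loop
  serial = 0.0d0
  do i = 0, m
     serial = serial + sqrt(dble(i))
  end do

  write(*,*) 'partial sums:', mys
  write(*,*) 'reduced sum: ', s, ' serial sum: ', serial
  ! agreement only up to rounding, because the terms are added in a different order
  if (abs(s - serial) > 1.0d-9 * serial) stop 1
end program reduce_check
```

The tolerance in the final check reflects that floating-point addition is not associative, so a parallel sum need not match a serial one bit for bit.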

The last operation in our example is to finalize MPI usage by a call to mpi_finalize, which is necessary for proper program completion.

In this simple example, we have distributed the task of computing many square roots among processes, each of which did only part of the work. We used communication to share information about the work to be performed and to collect the results. This mode of programming is called "task parallel". Often it is also necessary to distribute large amounts of data among processes, leading to "data parallel" programs. Of course, the distinction is not always clear-cut.
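In contrast to the cyclic assignment of loop indices in demo02, data-parallel programs often give each process one contiguous block of a large array. The serial sketch below computes such a block decomposition for hypothetical ranks; the block scheme and the values n = 10 and nprocs = 4 are illustrative assumptions, not something prescribed by this page.

```fortran
program block_split
  implicit none
  integer, parameter :: nprocs = 4   ! hypothetical number of processes
  integer, parameter :: n = 10       ! hypothetical length of a global data array
  integer :: rank, nlocal, first, last, next

  next = 1                           ! first element not yet assigned
  do rank = 0, nprocs - 1
     ! every rank gets n/nprocs elements; the first mod(n,nprocs) ranks get one extra
     nlocal = n / nprocs
     if (rank < mod(n, nprocs)) nlocal = nlocal + 1
     first = rank * (n / nprocs) + min(rank, mod(n, nprocs)) + 1
     last  = first + nlocal - 1
     write(*,'(a,i2,a,i3,a,i3)') 'rank ', rank, ': elements', first, ' to', last
     ! blocks must be contiguous and must not overlap
     if (first /= next) stop 1
     next = last + 1
  end do
  if (next /= n + 1) stop 1          ! all n elements were handed out
end program block_split
```

With these values, ranks 0 and 1 receive elements 1 to 3 and 4 to 6, while ranks 2 and 3 receive elements 7 to 8 and 9 to 10.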