6.7.3 An improved replacement for MPI_Alltoall
We close this section by noting that FFTW’s MPI transpose routines can
be thought of as a generalization of the MPI_Alltoall function
(albeit only for floating-point types), and in some circumstances can
function as an improved replacement.
MPI_Alltoall is defined by the MPI standard as:
int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm);
In particular, for double* arrays in and out, consider the call:
MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany, MPI_DOUBLE, comm);
This is completely equivalent to:
int P;
fftw_plan plan;
MPI_Comm_size(comm, &P);
plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1, in, out, comm, FFTW_ESTIMATE);
fftw_execute(plan);
fftw_destroy_plan(plan);
That is, computing a P × P transpose on P processes,
with a block size of 1, is just a standard all-to-all communication.
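The equivalence above can be checked without MPI by modelling the P processes serially. The sketch below is only an illustration: the function names simulate_alltoall and block_transpose are hypothetical, and it assumes the buffers of all P ranks are stored back to back in one array, so that rank r's buffer starts at offset r * P * howmany.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Serial model of an all-to-all on P ranks (hypothetical helper, not MPI).
 * Rank r's send buffer is send + r*P*howmany; likewise for recv.
 * Block d of rank s's send buffer lands in block s of rank d's recv buffer. */
void simulate_alltoall(const double *send, double *recv,
                       size_t P, size_t howmany)
{
    assert(send != recv); /* this model is out-of-place only */
    for (size_t d = 0; d < P; ++d) {
        double *recv_d = recv + d * P * howmany;
        for (size_t s = 0; s < P; ++s) {
            const double *send_s = send + s * P * howmany;
            memcpy(recv_d + s * howmany, send_s + d * howmany,
                   howmany * sizeof(double));
        }
    }
}

/* Transpose of a P x P matrix whose elements are tuples of howmany doubles,
 * stored in row-major order (row i corresponds to rank i's buffer). */
void block_transpose(const double *a, double *b, size_t P, size_t howmany)
{
    assert(a != b);
    for (size_t i = 0; i < P; ++i)
        for (size_t j = 0; j < P; ++j)
            for (size_t k = 0; k < howmany; ++k)
                b[(j * P + i) * howmany + k] = a[(i * P + j) * howmany + k];
}
```

Calling both functions on the same input array of P * P * howmany doubles should fill the two output arrays identically, which is the sense in which the all-to-all is a P × P transpose of howmany-element tuples.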
However, using the FFTW routine instead of MPI_Alltoall
may have certain advantages. First of all, FFTW’s routine can operate
in-place (in == out), whereas MPI_Alltoall can only
operate out-of-place.
Second, even for out-of-place plans, FFTW’s routine may be faster,
especially if you need to perform the all-to-all communication many
times and can afford to use FFTW_MEASURE. It should certainly be no slower, not including
the time to create the plan, since one of the possible algorithms that
FFTW uses for an out-of-place transpose is simply to call
MPI_Alltoall. However, FFTW also considers several other
possible algorithms that, depending on your MPI implementation and
your hardware, may be faster.
This document was generated on March 17, 2014 using texi2html 5.0.