Opened 2 years ago

Last modified 4 weeks ago

#1769 assigned bug

Datatypes created by MPI_Type_create_f90_real not usable in MPI operations

Reported by: klimach Owned by: gropp
Priority: major Milestone: mpich-3.2
Component: mpich Keywords: Fortran MPI_Type_create_f90_real
Cc:

Description (last modified by balaji)

I installed MPICH2 3.0rc1 using GCC 4.7.2 and found a problem with MPI_Type_create_f90_real.
It is very similar to the problem I reported in OpenMPI under: https://svn.open-mpi.org/trac/ompi/ticket/3432

Running the following program:

program test_mpi
  use mpi
  implicit none

  integer, parameter :: rk_prec = 15
  integer, parameter :: rk = selected_real_kind(rk_prec)

  integer :: rk_mpi
  integer :: iError
  real(kind=rk) :: a_real
  real(kind=rk) :: res

  call MPI_Init(iError)
  ! Request an MPI datatype matching real(kind=rk)
  call mpi_type_create_f90_real(rk_prec, MPI_UNDEFINED, rk_mpi, iError)
  write(*,*) 'MPI_REAL8:', MPI_REAL8
  write(*,*) 'MPI_DOUBLE_PRECISION:', MPI_DOUBLE_PRECISION
  write(*,*) 'type_create_f90_real:', rk_mpi
  a_real = 1.0_rk
  ! This call fails: MPI_MIN is rejected for the datatype returned above
  call MPI_Reduce(a_real, res, 1, rk_mpi, MPI_MIN, 0, MPI_COMM_WORLD, iError)
  call MPI_Finalize(iError)

end program test_mpi

Results in the following output and error:

MPI_REAL8:  1275070505
MPI_DOUBLE_PRECISION:  1275070495
type_create_f90_real: -1946157049
Fatal error in PMPI_Reduce: Invalid MPI_Op, error stack:
PMPI_Reduce(1217)........: MPI_Reduce(sbuf=0x7fff43637108, rbuf=0x7fff436370f8, count=1, dtype=USER<f90_real>, MPI_MIN, root=0, MPI_COMM_WORLD) failed
MPIR_MINF_check_dtype(71): MPI_Op MPI_MIN operation not defined for this datatype

Change History (8)

comment:1 Changed 2 years ago by balaji

  • Milestone changed from mpich-3.0 to mpich-3.0.2

comment:2 Changed 2 years ago by balaji

  • Owner set to gropp
  • Status changed from new to assigned

comment:3 Changed 19 months ago by balaji

  • Description modified (diff)
  • Milestone changed from mpich-3.1 to mpich-3.1.1

comment:4 Changed 16 months ago by gropp

Awaiting instructions on how to submit for review.

comment:5 Changed 13 months ago by gropp

More problems. It turns out that some of the floating-point types available in Fortran have no corresponding C type. For example, on my Mac, long double occupies 16 bytes but only implements the 10-byte (80-bit) extended double, while Fortran lets you define a 16-byte real that uses all 16 bytes for the mantissa and exponent. Fixing this will require alternative ways of providing the operations that do not rely on C code.
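The mismatch between storage size and actual precision can be checked from C with the standard <float.h> macros. The following is only a rough sketch: the function names are made up for illustration and are not part of MPICH, and the check assumes a radix-2 floating-point format.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes of storage a long double occupies (padding included). */
size_t long_double_storage_bytes(void)
{
    return sizeof(long double);
}

/* Significand bits of long double: typically 53 (same as double),
   64 (x87 80-bit extended) or 113 (IEEE binary128). */
int long_double_mantissa_bits(void)
{
    assert(FLT_RADIX == 2);  /* LDBL_MANT_DIG counts base-FLT_RADIX digits */
    return LDBL_MANT_DIG;
}

/* 1 if long double could back a 16-byte, 113-bit Fortran real, else 0. */
int long_double_matches_binary128(void)
{
    return sizeof(long double) == 16 && long_double_mantissa_bits() == 113;
}

/* Print what this platform provides (illustrative helper). */
void report_long_double(void)
{
    printf("long double: %zu bytes of storage, %d significand bits\n",
           long_double_storage_bytes(), long_double_mantissa_bits());
    if (!long_double_matches_binary128())
        printf("long double does not match a 16-byte, 113-bit real\n");
}
```

Calling report_long_double() on a platform like the one described above would be expected to show 16 bytes of storage but fewer than 113 significand bits, which is why the C long double kernels cannot be reused for such a Fortran real.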

comment:6 Changed 13 months ago by gropp

Here are some options for handling the REAL*16 and corresponding datatypes created with MPI_Type_create_f90_real/complex:

1) Attach to the datatype the functions needed for the predefined operations. This can be done with an internal attribute, much as the topology data is attached to communicators. This is a very general solution, and would also apply to new datatypes that might be added in MPI-4.

2) Explicitly add support for just this case to the implementation of the operators, perhaps with something like the following:

case MPI_REAL16:
#if defined(USE_FC_FOR_REAL16)
    MPIR_FC_REAL16_OPxxx( in, inout, n );
#else
    ... inline code using matching C type
#endif
    break;

We also need a way to allow Fortran to access the 10-byte real type that is stored in 16 bytes (it is possible to generate such a type in gfortran, for example).

This is also an opportunity to rethink the decomposition of routine code - it isn't clear whether the current code is easily optimized by the compiler.
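Option 1 above could look roughly like the sketch below. It is not MPICH code: dtype, dtype_ops, apply_op and make_f90_real_p15 are hypothetical names, and a plain struct member stands in for the internal attribute. For a REAL*16 type, the kernels in the table could just as well be Fortran routines.

```c
#include <assert.h>
#include <stddef.h>

/* Reduction kernel: inout[i] = op(in[i], inout[i]) for n elements. */
typedef void (*reduce_fn)(const void *in, void *inout, size_t n);

/* Hypothetical table of kernels attached to a datatype. */
typedef struct {
    reduce_fn min;
    reduce_fn max;
    reduce_fn sum;
} dtype_ops;

/* Hypothetical datatype descriptor; ops plays the role of the attribute. */
typedef struct {
    const char *name;
    size_t size;
    const dtype_ops *ops;  /* NULL: no predefined operations available */
} dtype;

typedef enum { OP_MIN, OP_MAX, OP_SUM } op_kind;

void double_min(const void *in, void *inout, size_t n)
{
    const double *a = in;
    double *b = inout;
    for (size_t i = 0; i < n; i++)
        if (a[i] < b[i]) b[i] = a[i];
}

void double_max(const void *in, void *inout, size_t n)
{
    const double *a = in;
    double *b = inout;
    for (size_t i = 0; i < n; i++)
        if (a[i] > b[i]) b[i] = a[i];
}

void double_sum(const void *in, void *inout, size_t n)
{
    const double *a = in;
    double *b = inout;
    for (size_t i = 0; i < n; i++)
        b[i] += a[i];
}

const dtype_ops double_ops = { double_min, double_max, double_sum };

/* Look the kernel up on the datatype instead of switching on the handle.
   Returns 0 on success, -1 if the operation is not defined for dt. */
int apply_op(op_kind op, const dtype *dt, const void *in, void *inout, size_t n)
{
    reduce_fn fn = NULL;
    assert(dt != NULL);
    if (dt->ops != NULL) {
        switch (op) {
        case OP_MIN: fn = dt->ops->min; break;
        case OP_MAX: fn = dt->ops->max; break;
        case OP_SUM: fn = dt->ops->sum; break;
        }
    }
    if (fn == NULL)
        return -1;
    fn(in, inout, n);
    return 0;
}

/* Stand-in for a datatype created at run time with precision 15:
   it simply receives the double kernels when it is created. */
dtype make_f90_real_p15(void)
{
    dtype dt = { "f90_real(p=15)", sizeof(double), &double_ops };
    return dt;
}
```

With this layout the operator code never needs to know about a new datatype; whoever creates the datatype is responsible for attaching the kernels.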

comment:7 Changed 12 months ago by gropp

For Fortran types returned by MPI_Type_create_f90_xxx, the easiest way to handle them without penalizing all calls to the op routines is to add a block of code in the default clause that checks for these datatypes, extracts the underlying C type, and re-invokes the routine. This assumes that the extra call is cheap; alternatively, a simple jump back to the top would do.

However, because of the problems in handling the long types, it might be better to just always jump to the appropriate Fortran routine in this case.

Finally, another option is to change the way that the operations are handled, and make the operations a property of the datatype.
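The first approach in this comment (checking in the default clause and re-invoking the routine) might be sketched as follows. The enum, the struct and min_op are hypothetical stand-ins, not the actual MPICH handles or op routines; the point is only that the basic types pay no extra cost because the check sits in the default branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype kinds; DT_F90_REAL stands in for a handle
   returned by MPI_Type_create_f90_real. */
typedef enum { DT_FLOAT, DT_DOUBLE, DT_F90_REAL, DT_OTHER } dt_kind;

typedef struct {
    dt_kind kind;
    dt_kind base;  /* for DT_F90_REAL: the underlying basic type */
} datatype;

/* Element-wise minimum: inout[i] = min(in[i], inout[i]).
   Returns 0 on success, -1 if MIN is not defined for the datatype. */
int min_op(const void *in, void *inout, size_t n, datatype dt)
{
    size_t i;
    assert(in != NULL && inout != NULL);
    switch (dt.kind) {
    case DT_FLOAT: {
        const float *a = in;
        float *b = inout;
        for (i = 0; i < n; i++)
            if (a[i] < b[i]) b[i] = a[i];
        return 0;
    }
    case DT_DOUBLE: {
        const double *a = in;
        double *b = inout;
        for (i = 0; i < n; i++)
            if (a[i] < b[i]) b[i] = a[i];
        return 0;
    }
    default:
        /* Only unrecognized handles reach this point, so the fast paths
           above are unaffected. Extract the basic type and call again. */
        if (dt.kind == DT_F90_REAL && dt.base != DT_F90_REAL) {
            datatype basic = { dt.base, dt.base };
            return min_op(in, inout, n, basic);
        }
        return -1;  /* operation not defined for this datatype */
    }
}
```

The re-invocation happens at most once per call, since the extracted type is always a basic one. As noted above, this does not help when the underlying Fortran type has no matching C type.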

comment:8 Changed 4 weeks ago by balaji

  • Milestone changed from mpich-3.1.4 to mpich-3.2

Milestone mpich-3.1.4 deleted
