Opened 3 years ago

Last modified 8 days ago

#1769 assigned bug

Datatypes created by MPI_Type_create_f90_real not usable in MPI operations

Reported by: klimach
Owned by: gropp
Priority: major
Milestone: mpich-3.3
Component: mpich
Keywords: Fortran MPI_Type_create_f90_real
Cc: jhammond

Description (last modified by balaji)

I installed MPICH2 3.0rc1 using GCC 4.7.2 and found a problem with MPI_Type_create_f90_real.
It is very similar to the problem I reported in OpenMPI under:

Running the following program:

program test_mpi
  use mpi
  implicit none

  ! Request a real kind with at least 15 decimal digits of precision.
  integer, parameter :: rk_prec = 15
  integer, parameter :: rk = selected_real_kind(rk_prec)

  integer :: rk_mpi        ! MPI datatype handle matching real(kind=rk)
  integer :: iError
  real(kind=rk) :: a_real
  real(kind=rk) :: res

  call MPI_Init(iError)
  call mpi_type_create_f90_real(rk_prec, MPI_UNDEFINED, rk_mpi, iError)
  write(*,*) 'MPI_REAL8:', MPI_REAL8
  write(*,*) 'type_create_f90_real:', rk_mpi
  a_real = 1.0
  ! The reduction fails here when rk_mpi is used as the datatype.
  call MPI_Reduce(a_real, res, 1, rk_mpi, MPI_MIN, 0, MPI_COMM_WORLD, iError)
  call MPI_Finalize(iError)

end program test_mpi

Results in the following output and error:

MPI_REAL8:  1275070505
type_create_f90_real: -1946157049
Fatal error in PMPI_Reduce: Invalid MPI_Op, error stack:
PMPI_Reduce(1217)........: MPI_Reduce(sbuf=0x7fff43637108, rbuf=0x7fff436370f8, count=1, dtype=USER<f90_real>, MPI_MIN, root=0, MPI_COMM_WORLD) failed
MPIR_MINF_check_dtype(71): MPI_Op MPI_MIN operation not defined for this datatype

Change History (10)

comment:1 Changed 3 years ago by balaji

  • Milestone changed from mpich-3.0 to mpich-3.0.2

comment:2 Changed 3 years ago by balaji

  • Owner set to gropp
  • Status changed from new to assigned

comment:3 Changed 2 years ago by balaji

  • Description modified (diff)
  • Milestone changed from mpich-3.1 to mpich-3.1.1

comment:4 Changed 2 years ago by gropp

Awaiting instructions on how to submit for review.

comment:5 Changed 2 years ago by gropp

More problems. It turns out that some of the floating-point types available in Fortran have no corresponding C type. For example, on my Mac, long double occupies 16 bytes but only implements the 10-byte, 80-bit extended double, while Fortran lets you define a 16-byte real that is a true 16-byte type (using all 16 bytes for the sign, exponent, and mantissa). Fixing this will require alternative ways to provide the operations that do not rely on C code.
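
The mismatch described above can be checked on a given platform by inspecting the C long double. The following is a minimal sketch that uses only the standard <float.h> macros; the function names long_double_is_binary128 and report_long_double are hypothetical and not part of MPICH.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper (not part of MPICH): returns 1 if the C long double
 * has the 113-bit significand of IEEE binary128, and 0 otherwise. */
int long_double_is_binary128(void)
{
    return LDBL_MANT_DIG == 113;
}

/* Prints the storage size and significand width of long double so the two
 * can be compared; padding shows up as storage bytes that the significand
 * and exponent do not account for. */
void report_long_double(void)
{
    printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
    printf("significand bits    = %d\n", LDBL_MANT_DIG);
    printf("IEEE binary128      = %s\n",
           long_double_is_binary128() ? "yes" : "no");
}
```

On x86-64 systems where long double is the x87 extended format, this would typically report 16 bytes of storage but only 64 significand bits, which is the situation described in the comment.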

comment:6 Changed 2 years ago by gropp

Here are some options for handling the REAL*16 and corresponding datatypes created with MPI_Type_create_f90_real/complex:

1) Attach to the datatype the functions needed for the predefined operations. This can be done with an internal attribute, much as the topology data is attached to communicators. This is a very general solution, and would also apply to new datatypes that might be added in MPI-4.

2) Explicitly add support for just this case to the implementation of the operators, perhaps with something like the following:

case MPI_REAL16:
#if defined(USE_FC_FOR_REAL16)
     MPIR_FC_REAL16_OPxxx( in, inout, n );
#else
    ... inline code using matching C type
#endif
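
Option 1 above, attaching the reduction functions to the datatype, might look roughly like the following. This is only a sketch under stated assumptions: every name here (op_kind, reduce_fn, datatype_desc, apply_op, real8_min, real8_sum) is hypothetical and does not correspond to MPICH internals, and a plain struct stands in for the internal attribute.

```c
#include <stddef.h>

/* Hypothetical names throughout; none of these are MPICH internals. */
typedef enum { OP_MIN, OP_MAX, OP_SUM, OP_COUNT } op_kind;

/* Elementwise reduction: inout[i] = op(in[i], inout[i]) for i in [0, n). */
typedef void (*reduce_fn)(const void *in, void *inout, size_t n);

/* Per-datatype table of operations; a NULL entry means the operation is
 * not defined for this datatype. */
typedef struct {
    const char *name;
    reduce_fn ops[OP_COUNT];
} datatype_desc;

/* Example implementations for an 8-byte real mapped to C double. */
void real8_min(const void *in, void *inout, size_t n)
{
    const double *a = in;
    double *b = inout;
    for (size_t i = 0; i < n; i++)
        if (a[i] < b[i])
            b[i] = a[i];
}

void real8_sum(const void *in, void *inout, size_t n)
{
    const double *a = in;
    double *b = inout;
    for (size_t i = 0; i < n; i++)
        b[i] += a[i];
}

/* Looks up the operation on the datatype itself instead of switching on
 * the datatype inside each operation. Returns 0 on success and -1 if the
 * operation is not defined for the datatype. */
int apply_op(const datatype_desc *dt, op_kind op,
             const void *in, void *inout, size_t n)
{
    if ((unsigned)op >= OP_COUNT || dt->ops[op] == NULL)
        return -1;
    dt->ops[op](in, inout, n);
    return 0;
}
```

With this layout, a type such as REAL*16 could register functions implemented in Fortran without any change to the operator code, since the lookup only sees function pointers.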

We also need a way to let Fortran access the 10-byte real stored in 16 bytes (it is possible to generate such a type in gfortran, for example).

This is also an opportunity to rethink the decomposition of routine code - it isn't clear whether the current code is easily optimized by the compiler.

comment:7 Changed 2 years ago by gropp

For Fortran types returned by MPI_Type_create_f90_xxx, the easiest way to handle them without penalizing all calls to the op routines is to add a block of code in the default clause that checks for these datatypes, extracts the underlying C type, and re-invokes the routine (this assumes a low cost for that call; alternatively, a simple jump to the top would do).
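
The default-clause approach described above could be sketched as follows. The type handles and function names (dtype, underlying_basic_type, min_op) are hypothetical stand-ins rather than MPICH code, and the mapping from an f90 handle to a C type is hard-coded here purely for illustration.

```c
#include <stddef.h>

/* Hypothetical datatype handles; not MPICH values. */
typedef enum {
    DT_DOUBLE, DT_FLOAT, DT_F90_REAL_P15, DT_F90_REAL_P6, DT_UNKNOWN
} dtype;

/* Maps a handle of the kind returned by a create_f90_real-style call to
 * its underlying basic type; returns DT_UNKNOWN if there is none. */
dtype underlying_basic_type(dtype t)
{
    switch (t) {
    case DT_F90_REAL_P15: return DT_DOUBLE;
    case DT_F90_REAL_P6:  return DT_FLOAT;
    default:              return DT_UNKNOWN;
    }
}

/* Elementwise minimum. Predefined types are handled directly in their own
 * cases; only unrecognized handles pay for the extra lookup in default.
 * Returns 0 on success and -1 if the datatype is not supported. */
int min_op(const void *in, void *inout, size_t n, dtype t)
{
    switch (t) {
    case DT_DOUBLE: {
        const double *a = in;
        double *b = inout;
        for (size_t i = 0; i < n; i++)
            if (a[i] < b[i])
                b[i] = a[i];
        return 0;
    }
    case DT_FLOAT: {
        const float *a = in;
        float *b = inout;
        for (size_t i = 0; i < n; i++)
            if (a[i] < b[i])
                b[i] = a[i];
        return 0;
    }
    default: {
        /* Slow path: resolve the underlying type and re-invoke once. */
        dtype basic = underlying_basic_type(t);
        if (basic != DT_UNKNOWN)
            return min_op(in, inout, n, basic);
        break;
    }
    }
    return -1;
}
```

Because the lookup sits in the default clause, calls with predefined datatypes take the same path as before, and the re-invocation happens at most once per call.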

However, because of the problems in handling the long types, it might be better to just always jump to the appropriate Fortran routine in this case.

Finally, another option is to change the way that the operations are handled, and make the operations a property of the datatype.

comment:8 Changed 12 months ago by balaji

  • Milestone changed from mpich-3.1.4 to mpich-3.2

Milestone mpich-3.1.4 deleted

comment:9 Changed 2 weeks ago by jhammond

  • Cc jhammond added

At least for Intel, one should be able to use float128 in C to do the Fortran float128 (I can verify this explicitly with folks here if necessary).

In any case, I would like to see this support and can write either the C or Fortran 2008 ISO_C_BINDING code to make it happen. I have already done a bit of playing around with MPICH quadruple precision support, but need some guidance from the MPICH team on how to proceed.

comment:10 Changed 8 days ago by balaji

  • Milestone changed from mpich-3.2.1 to mpich-3.3

Milestone mpich-3.2.1 deleted
