Opened 6 years ago

Last modified 5 years ago

#1564 new bug

The case when sendbuf is MPI_IN_PLACE is not handled correctly in MPI_Alltoall

Reported by: mikhail.brinskiy@… Owned by:
Priority: major Milestone: future
Component: mpich Keywords:
Cc: michael.chuvelev@…, alexander.supalov@…

Description (last modified by balaji)

The call of MPI_Alltoall with sendbuf and sendtype parameters equal to MPI_IN_PLACE and MPI_DATATYPE_NULL correspondingly, produces the following error:

MPI_Alltoall(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, valA, 1, MPI_INT, MPI_COMM_WORLD); 
cause error 
Fatal error in MPI_Alltoall: Invalid datatype, error stack: 
MPI_Alltoall(868): MPI_Alltoall(sbuf=MPI_IN_PLACE, scount=0, MPI_DATATYPE_NULL, rbuf=0x7fff0681f790, rcount=1, MPI_INT, MPI_COMM_WORLD) failed 

The problem is that correctness checks are made for sendcount and sendtype parameters even when sendbuf value is MPI_IN_PLACE. But according to the standard (158/2), sendtype and sendcount parameters should be ignored in this case. The solution would be quite simple:
1)Do not check sendbuf and sendtype parameters for correctness if sendbuf equals to MPI_IN_PLACE.
2)The check that produces an error if sendbuf is MPI_IN_PLACE and sendcount greater than 0 should be removed from MPI_Alltoall.

Also the following macroses make some rather strange cheks. They produce an error only when buffer is MPI_IN_PLACE and the value of count is greater than 0. This makes possible to specify all the buffers passed to the MPI collective operation as MPI_IN_PLACE if the corresponding count is 0:

#define MPIR_ERRTEST_SENDBUF_INPLACE(sendbuf, count, err)
#define MPIR_ERRTEST_RECVBUF_INPLACE(recvbuf, count, err)
#define MPIR_ERRTEST_BUF_INPLACE(buf,count,err)

These macroses are used in many collective operations (for instance MPI_Alltoallv, MPI_Altoallw).

Mikhail Brinskiy, Intel MPI team

Change History (11)

comment:1 Changed 6 years ago by balaji

  • Description modified (diff)
  • Milestone set to mpich2-1.5
  • Owner set to balaji
  • Priority changed from minor to major
  • Status changed from new to accepted

Reformatting trac entry.

comment:2 Changed 6 years ago by balaji

The first part (MPI_IN_PLACE for the sendbuf) is fixed in [dd7e55f8de33cfd8b6eb61b75714e9ad088235df]. I'll need to spend some more time on the second part to make sure I don't remove some necessary check.

comment:3 Changed 6 years ago by thakur

Regarding the second part, I am not sure if there is a problem with the macros. They are just lenient when the count is 0 since nothing needs to be done.

comment:4 Changed 6 years ago by balaji

  • Resolution set to fixed
  • Status changed from accepted to closed

Just chatted with Rajeev about this. The checks are correct. They are checking if MPI_IN_PLACE is passed to sendbuf for an inter-communicator or to the recvbuf in any communicator. These are error cases. The MPI standard only allows this for intra-communicators for the sendbuf.

comment:5 Changed 6 years ago by mikhail.brinskiy@…

Yes I agree that MPI_IN_PLACE is allowed for intra-communicators only, but MPI standard allows MPI_IN_PLACE for the sendbuf as well as for recvbuf (for instance MPI_Scatter, 150/13).

The problem with MPIR_ERRTEST_*_INPLACE checks is that they allow using of MPI_IN_PLACE when count is 0. This makes possible to violate MPI standard by specifying any buffer passed to MPI collective function as MPI_IN_PLACE (with count set to 0). In my opinion, MPI function should return the corresponding error saying that recvbuf can't be MPI_IN_PLACE in the case below:
MPI_Gather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, 0, MPI_COMM_WORLD);

Therefore I suggest the macros should be modified as follows:
if (sendbuf==MPI_IN_PLACE) {\
...

comment:6 Changed 6 years ago by thakur

If the count is 0, the operation is a no-op in any case. So it is kind of optional to complain about the sendbuf. For example, if the sendbuf was an invalid address, should you complain?

If the count is 1, there is good reason to complain.

comment:7 Changed 6 years ago by mikhail.brinskiy@…

MPI_IN_PLACE "interface" assumes that the corresponding count is ignored. Thus it is typical configuration when count is set to 0 with MPI_IN_PLACE. Then what is the correct error to be returned by MPI_Alltoall, if user swaps send and recv parameter sets by mistake (like shown below)?
MPI_Alltoall(&val, n, MPI_INT, MPI_IN_PLACE, 0, MPI_INT, MPI_COMM_WORLD);

In my opinion this is the case when MPIR_ERRTEST_*_INPLACE macros should terminate an operation with the "in_place" error. The current MPICH implementation does not produce any error and returns success code when MPI_Alltoall is called with the parameters specified above.

comment:8 Changed 6 years ago by thakur

OK, I see what you are saying.

comment:9 Changed 6 years ago by balaji

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:10 Changed 5 years ago by balaji

  • Milestone changed from mpich2-1.5 to future

comment:11 Changed 5 years ago by balaji

  • Cc changed from michael.chuvelev@intel.com; alexander.supalov@intel.com to michael.chuvelev@intel.com, alexander.supalov@intel.com
  • Description modified (diff)
  • Owner balaji deleted
  • Status changed from reopened to new
Note: See TracTickets for help on using tickets.