Opened 5 years ago

Last modified 5 years ago

#1804 new bug

MPI_Status alignment issues w/ MPI_Count

Reported by: goodell Owned by: gropp
Priority: major Milestone: future
Component: mpich Keywords:
Cc:

Description

Until just recently, it turns out that --enable-strict was broken with clang/clang++. I have a pending fix (not yet committed as of 2013-03-25) for that issue, but fixing it has revealed a number of other issues that we should have caught much earlier.

This ticket discusses one of these issues, namely alignment issues of MPI_Status/INTEGER status(MPI_STATUS_SIZE) between C and Fortran. The newly-reenabled warnings clearly show the problem:

  CC       src/binding/f77/lib_libfmpich_la-ssendf.lo
src/binding/f77/recvf.c:194:95: warning: cast from 'MPI_Fint *' (aka 'int *') to 'MPI_Status *' (aka 'struct MPI_Status *') increases required alignment from 4 to 8 [-Wcast-align]
    *ierr = MPI_Recv( v1, (int)*v2, (MPI_Datatype)(*v3), (int)*v4, (int)*v5, (MPI_Comm)(*v6), (MPI_Status *)v7 );
                                                                                              ^~~~~~~~~~~~~~~~

This is because the definition of MPI_Status is now:

typedef struct MPI_Status {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    MPI_Count count;
    int cancelled;
    int abi_slush_fund[2];
    @EXTRA_STATUS_DECL@
} MPI_Status;

While a Fortran program will declare a status object as (please forgive some possibly incorrect Fortran syntax):

INTEGER status(MPI_STATUS_SIZE)

If the MPI_Count type is a 64-bit and int types are 32-bit, then it is reasonable for the C compiler to require 8-byte alignment for the MPI_Status structure. Unfortunately, Fortan compilers are not obligated to align an INTEGER array to the same alignment. In practice, they will usually only align to the width of an INTEGER, which we assume to be 4 in most cases.

A good fix is not obvious here. It seems that any MPI_Status structure that is implemented as anything except an array/struct of MPI_Fint (or equivalent) runs the risk of experiencing a similar issue. The count field is the only portion of the structure which currently requires a larger size, and it is opaque to the user (not directly accessed). So we could rewrite all of our count logic to use two regular integer fields instead, though this will be extremely ugly.

Change History (18)

comment:1 Changed 5 years ago by goodell

The OMPI guys solved this in the Fortran bindings:

https://svn.open-mpi.org/trac/ompi/ticket/3218
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/fortran/mpif-h/status-conversion.h

They determine the compiler's alignment for size_t at configure time (in our case it would be MPI_Count, probably). Then they use a runtime check on the pointer value to decide whether the status pointer that was given is properly aligned already. If it is, use that pointer when calling the C routines. If not, make a new temporary array which is properly aligned and copy the real status array in/out at the beginning/end of the binding function. This costs you a copy sometimes on Fortran, but it substantially simplifies life on the C side of things.

comment:2 Changed 5 years ago by balaji

  • Owner changed from goodell to gropp

comment:3 Changed 5 years ago by gropp

I'm testing a solution, using buildiface to insert the correct code everywhere needed.

comment:4 Changed 5 years ago by gropp

Should be fixed; ran the MPICH2 and Intel tests.

comment:5 Changed 5 years ago by gropp

There's another regrettable choice - if MPI_Status were defined as

struct { int MPI_SOURCE; int MPI_TAG; int MPI_ERROR; int cancelled; MPI_Count count; ... }

it would be better, since if MPI_Count has an 8 byte alignment requirement and ints are 4 bytes, this can force padding in the structure, making this structure unnecessarily large.

comment:6 Changed 5 years ago by goodell

You could change that ordering at the next expected ABI change. Surely there will be one at some point in the future.

comment:7 Changed 5 years ago by gropp

Further fixes pushed dealing with the (strange) requirement that the MPI_ERROR field is preserved on input, even though the parameters are defined as OUT only.

comment:8 Changed 5 years ago by goodell

Relevant commits appear to be: [81ad170b] and [3d96864e]

comment:9 Changed 5 years ago by balaji

Since this breaks ABI, I'm planning to revert [8374bd7a] from the 3.0.x release branch. It's OK to break ABI in master, since that now tracks mpich-3.1.x.

comment:10 Changed 5 years ago by balaji

Reverted in [45305a3a].

Is there anything else that needs to be done on this ticket or can it be resolved?

comment:11 Changed 5 years ago by gropp

In what way did this break the ABI? The only real change was to the Type::MPI_Status type, which was just broken, so any application that used it would not have worked. And there was no change to the order of elements in MPI_Status; in fact, the current configure fixes the Type::MPI_Status by discovering the needed padding.

The rest of the changes just ensured that the correctly aligned MPI_Status objects were presented to the C routines in the Fortran wrappers and make no change in the ABI. And it does correct an *error* in the current code which should not be released.

For my edification, where can I find commit numbers? It was clear from svn but not from git.

Last edited 5 years ago by gropp (previous) (diff)

comment:12 Changed 5 years ago by balaji

Whoops, I misread the comments to think that the order was changed. I'll add [81ad170b] and [3d96864e] back to release/3.0.x.

With respect to the ordering change, is that still required?

comment:13 Changed 5 years ago by gropp

The ordering change should be made because having unnecessary padding in MPI_Status is very sloppy and a minor efficiency issue, but it won't eliminate the need for the hideous code that I had to add to the Fortran interface. It will eliminate the need for that extra code in configure to compute the padding (though there still needs to be a check that there is no padding as the type::MPI_Status code will rely on that).

comment:14 Changed 5 years ago by gropp

Once the nightlies pass, this can be resolved (it passed tests on my Mac, including all of the Intel tests, but the Linux compilers seem to be pickier).

comment:15 Changed 5 years ago by balaji

Ok, great. Please ping me when you do make this change and break ABI compatibility, so we can give a heads-up to our vendor partners. I'll resolve this ticket once the tests pass.

comment:16 Changed 5 years ago by gropp

It would be better to ping me when there is another reason to make an ABI change - then we can do them together. This one is easy to make, and it isn't a correctness issue.

comment:17 Changed 5 years ago by balaji

Go ahead and make the change and commit it into master.

master now tracks mpich-3.1.x, so we can break ABI if needed, though we'll likely only do that if there are other such changes too (we can move commits around later as needed). If you are not comfortable with that, you can push the change to a new branch. That way, we can explicitly merge it in when we want to break ABI.

I'll not merge the change into the 3.0.x branch, so the next release (mpich-3.0.5) won't break ABI compatibility.

comment:18 Changed 5 years ago by balaji

  • Milestone changed from mpich-3.1 to future
Note: See TracTickets for help on using tickets.