Opened 9 years ago

Closed 9 years ago

#783 closed bug (worksforme)

allred test fails

Reported by: gropp Owned by: gropp
Priority: major Milestone: future
Component: mpich Keywords:
Cc:

Description

I now see this:

william-gropps-computer-2:coll gropp$ ../../../bin/mpiexec -n 4 ./allred
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773).......: MPI_Allreduce(sbuf=0x400cf0, rbuf=0x400a20, count=10, dtype=0x4c000137, MPI_MAX, MPI_COMM_WORLD) failed
MPIR_MAXF_check_dtype(72): MPI_Op MPI_MAX operation not defined for this datatype 
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773).......: MPI_Allreduce(sbuf=0x400cb0, rbuf=0x400ca0, count=10, dtype=0x4c000137, MPI_MAX, MPI_COMM_WORLD) failed
MPIR_MAXF_check_dtype(72): MPI_Op MPI_MAX operation not defined for this datatype 
[0]0:Return code = 1
[0]1:Return code = 0, signaled with Interrupt
[0]2:Return code = 1
[0]3:Return code = 0, signaled with Interrupt

The failing datatype (strangely, not identified by name) is MPI_INT8_T. This is running on my Mac.

Attachments (1)

forker (2.0 KB) - added by gropp 9 years ago.
Script used to build "production" version


Change History (16)

comment:1 Changed 9 years ago by gropp

(BTW, the original allred was generated with an m4 macro and a master program; this had the advantage that the code was visible in the C file; see /home/MPI/tsuite/coll/*.m4. I found that easier to debug than the current CPP-based version.)

comment:2 Changed 9 years ago by gropp

One more tidbit: this test fails when I build a "production" version of MPICH2, but not when I build a debugging version.

comment:3 Changed 9 years ago by goodell

  • Owner set to goodell
  • Status changed from new to accepted

I probably broke this in [f0d08f8772e7b6972bb53747c7573f451c316df1] when I added support for MPI Forum ticket #18. I'll take a look later today.

-Dave

comment:4 Changed 9 years ago by goodell

I'm unable to reproduce this so far. Can you post the configure options that you are using for your "production" build?

-Dave

comment:5 Changed 9 years ago by goodell

Also, compiling with -g3 sometimes helps when debugging macro-heavy code like this. It adds macro information to the debugging info, so gdb can (usually) step into macros.

Changed 9 years ago by gropp

Script used to build "production" version

comment:6 Changed 9 years ago by gropp

I've just used the attached script, followed by these steps:

% ./forker 2>&1 | tee c.log
% make
% make install
% cd test/mpi/coll
% make clean
% make allred
% ../../../bin/mpiexec -n 4 ./allred

Those steps reproduced the error with the current development MPICH2 on my MacBook Pro.

comment:7 Changed 9 years ago by goodell

Something must be different in our environments. I cannot reproduce this with mpich2 trunk@[1a438a299cf0405596e9ad6bf32fe7d110db9931], even using the script you posted (changing only the source/install paths) and your instructions.

Maybe you are linked against an old dynamic library that doesn't support the operation? You can check with otool -L /path/to/allred and look for the path to libmpich.dylib. Also, consider blowing away your install directory and re-running "make install", in case the library path is correct but the install step isn't overwriting the old library for some reason.
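A minimal sketch of that check, assuming a POSIX shell. On macOS it uses otool -L; elsewhere it falls back to ldd, the usual Linux analog. The function name show_mpich_link is hypothetical and not part of MPICH.

```shell
#!/bin/sh
# Hypothetical helper: print the libmpich entries a binary is linked against,
# so a library from an old install directory stands out.
# Exit status is grep's: 0 if a libmpich entry was found, 1 if not.
show_mpich_link() {
    bin=$1
    if command -v otool >/dev/null 2>&1; then
        # macOS: otool -L lists the install names of the linked dylibs
        otool -L "$bin" | grep libmpich
    else
        # Linux analog (assumption: ldd is installed)
        ldd "$bin" | grep libmpich
    fi
}

# Usage: show_mpich_link ./allred
```

If the printed path is not under the install directory you expect, the test is picking up a stale library.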

comment:8 Changed 9 years ago by goodell

As a further sanity check, I just did a VPATH build the same way (my previous test was in-path) and still got " No Errors" back from the test.

comment:9 Changed 9 years ago by gropp

I'm rebuilding now. However, it used to be that an install was careful about the presence of old dynamic libraries; I'm not sure the current make target preserves that. Something else to check on...

comment:10 Changed 9 years ago by goodell

I don't know if otool will show it in all cases, so it's also worth checking your environment for any DYLD_* environment variables. My environment contains none:

% env | grep DYLD || echo "no DYLD vars are set"
no DYLD vars are set

comment:11 Changed 9 years ago by gropp

  • Resolution set to worksforme
  • Status changed from accepted to closed

Will someone send the trac authors an introductory book on databases? Trac threw away my comments again because they overlapped in time (but not in content) with Dave's.

Short version: I blew away the old install directory and the problem vanished. Our users will probably see the same thing, so we need to look at the install step. Resolving.

comment:12 Changed 9 years ago by thakur

  • Resolution worksforme deleted
  • Status changed from closed to reopened

The make install step looks for .so files in the install directory and complains if it finds any (asking the user to delete them). If shared libraries on the Mac have a .dylib extension, that needs to be checked for as well. Reopening the ticket.
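A minimal sketch of a pre-install check that covers both extensions, assuming a POSIX shell. This is not the actual MPICH install rule: the function name check_stale_libs and the libmpich name patterns are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical pre-install check: report shared libraries left over from an
# earlier install. Returns 1 if any are found, 0 otherwise.
check_stale_libs() {
    libdir=$1
    found=0
    for ext in so dylib; do
        # First glob: names ending in the extension (libmpich.so, libmpich.1.dylib).
        # Second glob: versioned ELF names (libmpich.so.12).
        for f in "$libdir"/libmpich*."$ext" "$libdir"/libmpich*."$ext".*; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -e "$f" ]; then
                echo "stale shared library: $f (please delete it)"
                found=1
            fi
        done
    done
    return "$found"
}

# Usage: check_stale_libs /usr/local/lib || exit 1
```

Globbing on both extensions avoids depending on the configured extension being substituted correctly.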

comment:13 Changed 9 years ago by gropp

My Makefile is looking for .dylib: there is an @SHLIB_EXT@ in Makefile.in, so it appears to get the correct library extension. The mystery deepens.

comment:14 Changed 9 years ago by goodell

  • Milestone set to future
  • Owner changed from goodell to gropp
  • Status changed from reopened to assigned

comment:15 Changed 9 years ago by thakur

  • Resolution set to worksforme
  • Status changed from assigned to closed

Not sure if this is still a problem; resolving for now.
