Opened 9 years ago

Closed 8 years ago

#516 closed bug (worksforme)

Nemesis tests hang with PGI compiler

Reported by: "Rajeev Thakur" <thakur@…>
Owned by: buntinas
Priority: major
Milestone: future
Component: mpich
Keywords:
Cc:

Description (last modified by thakur)

Many Nemesis tests time out with the PGI compiler (7.1.6). This was on
elephant, a quad-core machine. The same thing happened two days ago on triumph.

/home/MPI/testing/mpich2/mpich2/configure
--prefix=/sandbox/thakur/cb/mpi2-inst --enable-romio --enable-cxx
--disable-totalview --with-device=ch3:nemesis --with-pm=mpd
Environment = F90 = pgf90; FC = pgf77; CXX = pgCC; CC = pgcc;



Looking in ./testlist
Processing directory attr
Looking in ./attr/testlist
Unexpected output in attrt: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program attrt exited without No Errors
Unexpected output in attric: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program attric exited without No Errors
Unexpected output in attrend: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program attrend exited without No Errors
Processing directory coll
Looking in ./coll/testlist
Unexpected output in allred: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allred exited without No Errors
Unexpected output in allredmany: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allredmany exited without No Errors
Unexpected output in allred2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allred2 exited without No Errors
Unexpected output in allred3: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allred3 exited without No Errors
Unexpected output in allred4: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allred4 exited without No Errors
Unexpected output in reduce: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program reduce exited without No Errors
Unexpected output in reduce: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program reduce exited without No Errors
Unexpected output in red3: mpiexec_elephant.mcs.anl.gov (handle_sig_occurred
1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program red3 exited without No Errors
Unexpected output in red4: mpiexec_elephant.mcs.anl.gov (handle_sig_occurred
1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program red4 exited without No Errors
Unexpected output in alltoall1: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program alltoall1 exited without No Errors
Unexpected output in alltoallv: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program alltoallv exited without No Errors
Unexpected output in alltoallv0: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program alltoallv0 exited without No Errors
Unexpected output in alltoallw1: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program alltoallw1 exited without No Errors
Unexpected output in alltoallw2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program alltoallw2 exited without No Errors
Unexpected output in allgather2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allgather2 exited without No Errors
Unexpected output in allgather3: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allgather3 exited without No Errors
Unexpected output in allgatherv2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allgatherv2 exited without No Errors
Unexpected output in allgatherv3: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program allgatherv3 exited without No Errors
Unexpected output in allgatherv4: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=600
Program allgatherv4 exited without No Errors
Unexpected output in bcasttest: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program bcasttest exited without No Errors
Unexpected output in bcasttest: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program bcasttest exited without No Errors
Unexpected output in bcast2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program bcast2 exited without No Errors
Unexpected output in bcast2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=420
Program bcast2 exited without No Errors
Unexpected output in bcast3: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=420
Program bcast3 exited without No Errors
Unexpected output in coll2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll2 exited without No Errors
Unexpected output in coll3: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll3 exited without No Errors
Unexpected output in coll4: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll4 exited without No Errors
Unexpected output in coll5: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll5 exited without No Errors
Unexpected output in coll6: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll6 exited without No Errors
Unexpected output in coll7: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll7 exited without No Errors
Unexpected output in coll8: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll8 exited without No Errors
Unexpected output in coll9: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll9 exited without No Errors
Unexpected output in coll10: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll10 exited without No Errors
Unexpected output in coll11: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll11 exited without No Errors
Unexpected output in coll12: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll12 exited without No Errors
Unexpected output in coll13: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program coll13 exited without No Errors
Unexpected output in longuser: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program longuser exited without No Errors
Unexpected output in redscat: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program redscat exited without No Errors
Unexpected output in redscat: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program redscat exited without No Errors
Unexpected output in redscat2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program redscat2 exited without No Errors
Unexpected output in redscat2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program redscat2 exited without No Errors
Unexpected output in redscat2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program redscat2 exited without No Errors
Unexpected output in scantst: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program scantst exited without No Errors
Unexpected output in exscan: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program exscan exited without No Errors
Unexpected output in exscan2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program exscan2 exited without No Errors
Unexpected output in gather: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program gather exited without No Errors
Unexpected output in gather2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program gather2 exited without No Errors
Unexpected output in scattern: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program scattern exited without No Errors
Unexpected output in scatter2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program scatter2 exited without No Errors
Unexpected output in scatter3: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program scatter3 exited without No Errors
Unexpected output in scatterv: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program scatterv exited without No Errors
Unexpected output in icbcast: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icbcast exited without No Errors
Unexpected output in icbcast: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icbcast exited without No Errors
Unexpected output in icallreduce: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icallreduce exited without No Errors
Unexpected output in icreduce: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icreduce exited without No Errors
Unexpected output in icscatter: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icscatter exited without No Errors
Unexpected output in icgather: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icgather exited without No Errors
Unexpected output in icallgather: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icallgather exited without No Errors
Unexpected output in icbarrier: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icbarrier exited without No Errors
Unexpected output in icallgatherv: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icallgatherv exited without No Errors
Unexpected output in icgatherv: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icgatherv exited without No Errors
Unexpected output in icscatterv: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icscatterv exited without No Errors
Unexpected output in icalltoall: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icalltoall exited without No Errors
Unexpected output in icalltoallv: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icalltoallv exited without No Errors
Unexpected output in icalltoallw: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icalltoallw exited without No Errors
Unexpected output in opland: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opland exited without No Errors
Unexpected output in oplor: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program oplor exited without No Errors
Unexpected output in oplxor: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program oplxor exited without No Errors
Unexpected output in oplxor: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program oplxor exited without No Errors
Unexpected output in opband: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opband exited without No Errors
Unexpected output in opbor: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opbor exited without No Errors
Unexpected output in opbxor: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opbxor exited without No Errors
Unexpected output in opbxor: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opbxor exited without No Errors
Unexpected output in opprod: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opprod exited without No Errors
Unexpected output in opprod: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opprod exited without No Errors
Unexpected output in opsum: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opsum exited without No Errors
Unexpected output in opmin: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opmin exited without No Errors
Unexpected output in opminloc: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opminloc exited without No Errors
Unexpected output in opmax: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opmax exited without No Errors
Unexpected output in opmaxloc: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program opmaxloc exited without No Errors
Processing directory comm
Looking in ./comm/testlist
Unexpected output in dup: mpiexec_elephant.mcs.anl.gov (handle_sig_occurred
1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program dup exited without No Errors
Unexpected output in dupic: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program dupic exited without No Errors
Unexpected output in commname: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program commname exited without No Errors
Unexpected output in ic1: mpiexec_elephant.mcs.anl.gov (handle_sig_occurred
1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program ic1 exited without No Errors
Unexpected output in icgroup: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icgroup exited without No Errors
Unexpected output in icm: mpiexec_elephant.mcs.anl.gov (handle_sig_occurred
1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icm exited without No Errors
Unexpected output in icsplit: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program icsplit exited without No Errors
Unexpected output in iccreate: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program iccreate exited without No Errors
Unexpected output in ctxalloc: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=300
Program ctxalloc exited without No Errors
Unexpected output in ctxsplit: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=300
Program ctxsplit exited witelephant:/sandbox/thakur/cb/mpich2%
elephant:/sandbox/thakur/cb/mpich2%
elephant:/sandbox/thakur/cb/mpich2%
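
For triage, a log like the one above can be condensed into a per-directory count of timed-out tests. The following is a minimal sketch in Python and is not part of the MPICH test harness; the names `summarize_timeouts`, `EVENT_RE`, and `SAMPLE` are hypothetical, and the pattern assumes only the message format visible in this log.

```python
import re
from collections import Counter

# One pattern for both event kinds, applied after wrapped lines are joined.
# Assumption: messages look exactly like the ones pasted in this ticket.
EVENT_RE = re.compile(
    r"Processing directory (?P<dir>\S+)"
    r"|Unexpected output in (?P<test>\S+): \S+ \(handle_sig_occurred \d+\): "
    r"job ending due to env var MPIEXEC_TIMEOUT=(?P<limit>\d+)"
)

# Short excerpt copied from the log in this ticket.
SAMPLE = """\
Processing directory attr
Looking in ./attr/testlist
Unexpected output in attrt: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program attrt exited without No Errors
Processing directory coll
Looking in ./coll/testlist
Unexpected output in red3: mpiexec_elephant.mcs.anl.gov (handle_sig_occurred
1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program red3 exited without No Errors
Unexpected output in bcast2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=180
Program bcast2 exited without No Errors
Unexpected output in bcast2: mpiexec_elephant.mcs.anl.gov
(handle_sig_occurred 1145): job ending due to env var MPIEXEC_TIMEOUT=420
Program bcast2 exited without No Errors
"""


def summarize_timeouts(log_text):
    """Map each test directory to a Counter of (test, timeout_seconds)."""
    # The log wraps long messages at different points, so collapse all
    # whitespace before matching.
    flat = " ".join(log_text.split())
    summary = {}
    current_dir = None  # stays None if a report precedes any directory line
    for match in EVENT_RE.finditer(flat):
        if match.group("dir") is not None:
            current_dir = match.group("dir")
            summary.setdefault(current_dir, Counter())
        else:
            key = (match.group("test"), int(match.group("limit")))
            summary.setdefault(current_dir, Counter())[key] += 1
    return summary


if __name__ == "__main__":
    for directory, tests in summarize_timeouts(SAMPLE).items():
        print(directory, sum(tests.values()), "timed-out runs")
```

Keying on the pair of test name and timeout value keeps repeated entries for the same test (for example, runs with different `MPIEXEC_TIMEOUT` limits) distinguishable instead of merging them.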



Change History (9)

comment:1 Changed 9 years ago by Rajeev Thakur

  • id set to 516


comment:2 Changed 9 years ago by thakur

I just rebuilt by hand on octagon and the tests hang there as well. Configured just with --with-pm=gforker. All compilers set to PG compilers.

I am disabling the nightly test with PG compilers until this is fixed, because it takes all day just to go through the first configuration, mpd-ch3:nemesis.

comment:3 Changed 9 years ago by thakur

  • Milestone set to mpich2-1.1rc1
  • Owner set to goodell

comment:4 Changed 9 years ago by goodell

  • Milestone changed from mpich2-1.1rc1 to mpich2-1.1

This is not going to happen by 1.1rc1. Maybe we can do this before the final release.

comment:5 Changed 9 years ago by goodell

  • Owner changed from goodell to buntinas

comment:6 Changed 9 years ago by buntinas

  • Milestone changed from mpich2-1.1 to mpich2-1.1.1

This will need to wait until we find a workaround for the PGI compiler bugs.

comment:7 Changed 9 years ago by buntinas

  • Milestone changed from mpich2-1.1.1 to future

comment:8 Changed 8 years ago by goodell

  • Description modified (diff)

This is fixed with more modern PGI compilers, right?

comment:9 Changed 8 years ago by thakur

  • Description modified (diff)
  • Resolution set to worksforme
  • Status changed from new to closed

I believe so.
