Opened 8 years ago

Last modified 5 years ago

#926 new bug

enable-fast

Reported by: balaji Owned by:
Priority: long-term Milestone: future
Component: mpich Keywords:
Cc:

Description

Bcast appears to hang in builds configured with --enable-fast in the new nightly tests: http://www.mcs.anl.gov/research/projects/mpich2/nightly/new/latest

Change History (12)

comment:1 Changed 8 years ago by thakur

  • Owner set to goodell
  • Status changed from new to assigned

I can reproduce this by hand with a build configured with --enable-fast on thrash. bcast2 runs with up to 8 processes. With 10 processes, which triggers the long-message algorithm, it hangs.

comment:2 Changed 8 years ago by thakur

No, it works. It just takes a while (about 3 minutes 42 seconds in this run):

thrash:/sandbox/thakur/tmp/test/mpi/coll% date; mpiexec -n 10 bcast2; date
Fri Nov  6 16:20:28 CST 2009
 No Errors
Fri Nov  6 16:24:10 CST 2009

comment:3 Changed 8 years ago by thakur

I ran all the tests in the coll directory. They all completed.

comment:4 Changed 8 years ago by goodell

So this is just the usual nemesis over-subscription issue being aggravated by tighter loops from --enable-fast, right? Is there anything that actually needs to be done here in the short term?
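
For readers unfamiliar with the oversubscription problem mentioned here, the following is a minimal sketch of a busy-polling wait loop that yields the CPU only after a number of empty polls. It is an illustration under stated assumptions, not MPICH's actual progress engine: the function name `wait_for_flag` and the parameter `spins_before_yield` are hypothetical. When there are more processes than cores, a process spinning in a loop like this can hold the core that the process it is waiting on needs, and a tighter loop may make that worse.

```c
#include <sched.h>
#include <stdatomic.h>

/*
 * Hypothetical wait loop (not taken from MPICH): poll a shared flag and
 * call sched_yield() once every `spins_before_yield` empty polls.
 * Returns the number of times the caller yielded before the flag was set.
 */
long wait_for_flag(atomic_int *flag, long spins_before_yield)
{
    long yields = 0;
    long spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (++spins >= spins_before_yield) {
            sched_yield();   /* ask the scheduler to run another task */
            yields++;
            spins = 0;
        }
    }
    return yields;
}
```

If the flag is already set when the function is called, the loop body never runs and the function returns 0. How much a yield helps in the oversubscribed case depends on the kernel scheduler, which this ticket does not settle.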

comment:5 Changed 8 years ago by thakur

  • Resolution set to wontfix
  • Status changed from assigned to closed

Probably not. Resolving it for now.

comment:6 Changed 8 years ago by balaji

Is the problem with the sched_yield() call alone, or other CPU yielding calls as well (e.g., usleep(0) or select())? If this is specific to sched_yield, shouldn't we just give a higher priority to pick one of the other routines before trying sched_yield?
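
The three yielding calls named in this comment can be put side by side as in the sketch below. The wrapper names (`yield_sched`, `yield_select`, `yield_usleep`, `cpu_yield`) and the `yield_kind` enum are hypothetical and exist only for illustration; the underlying calls `sched_yield()`, `select()`, and `usleep()` are standard POSIX/glibc functions. The sketch does not show which one behaves best under oversubscription, since the ticket leaves that question open.

```c
#define _DEFAULT_SOURCE      /* exposes usleep() under strict glibc modes */
#include <sched.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical selector for the yielding strategy; not an MPICH type. */
typedef enum { YIELD_SCHED, YIELD_SELECT, YIELD_USLEEP } yield_kind;

/* Option 1: ask the scheduler to run another runnable task. */
static int yield_sched(void)
{
    return sched_yield();
}

/* Option 2: select() on no descriptors with a zero timeout. */
static int yield_select(void)
{
    struct timeval tv = { 0, 0 };
    return select(0, NULL, NULL, NULL, &tv);
}

/* Option 3: a zero-length sleep. */
static int yield_usleep(void)
{
    return usleep(0);
}

/* Dispatch to one strategy; each returns 0 on success, -1 on error. */
int cpu_yield(yield_kind kind)
{
    switch (kind) {
    case YIELD_SELECT: return yield_select();
    case YIELD_USLEEP: return yield_usleep();
    case YIELD_SCHED:
    default:           return yield_sched();
    }
}
```

A caller would pick the strategy at configure or run time, for example `cpu_yield(YIELD_SELECT)`, which makes it easy to time the same test under each option.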

comment:7 Changed 8 years ago by thakur

  • Resolution wontfix deleted
  • Status changed from closed to reopened

With the default build, the coll tests run very quickly. That --enable-fast makes the oversubscribed case so slow is worth looking into, though maybe not for this release. Reopening.

comment:8 Changed 8 years ago by balaji

  • Milestone changed from mpich2-1.2.1 to mpich2-1.3

comment:9 Changed 8 years ago by buntinas

Yes, we should look into select() or sleep() as alternatives.

Hmm. I wonder if --enable-fast disables yield...

comment:10 Changed 7 years ago by thakur

  • Milestone changed from mpich2-1.3 to mpich2-1.3.1
  • Owner changed from goodell to buntinas
  • Status changed from reopened to assigned

comment:11 Changed 7 years ago by buntinas

  • Milestone changed from mpich2-1.3.2 to future
  • Priority changed from major to long-term

comment:12 Changed 5 years ago by balaji

  • Owner buntinas deleted
  • Status changed from assigned to new