Opened 7 years ago

Last modified 2 years ago

#1102 new bug

Progress might never be called even if a process calls send repeatedly.

Reported by: buntinas
Owned by: wbland
Priority: minor
Milestone: mpich-3.3
Component: mpich
Keywords:
Cc:

Description (last modified by balaji)

This was pointed out by Cray. They had a process that repeatedly called send and would never checkpoint, because it never called the progress engine to receive the checkpoint marker message. Note that as long as each message is sent immediately, we don't have to wait on the request, so we never call progress.

The same thing can happen with any "unexpected" control message, like a cancel send message.

A similar thing can happen with CTS or DATA messages. Even though the app is calling send, the CTS or DATA message is not processed until wait is called. This reduces opportunities for communication overlap.

This is probably not a common situation, but we should look for a lightweight solution. For example, call progress if it hasn't been called in the last 1000 send calls.
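
The counter idea could look roughly like the following. This is a minimal standalone sketch in C, not MPICH code: every name in it (progress_state_t, on_send, on_explicit_progress, progress_poke, SENDS_PER_PROGRESS_POKE) is hypothetical, and progress_poke only counts invocations where a real implementation would enter the progress engine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold, taken from the "1000 send calls" suggestion. */
#define SENDS_PER_PROGRESS_POKE 1000u

/* Hypothetical per-process bookkeeping; not an MPICH structure. */
typedef struct {
    unsigned sends_since_progress;  /* sends completed since progress last ran */
    unsigned long progress_calls;   /* how many times progress was entered */
} progress_state_t;

/* Stand-in for entering the progress engine (assumption: a real version
 * would poll the network here for control, CTS, or DATA messages). */
void progress_poke(progress_state_t *st)
{
    assert(st != NULL);
    st->progress_calls++;
    st->sends_since_progress = 0;
}

/* Called on the path where wait/test enters progress anyway, so the
 * send counter starts over. */
void on_explicit_progress(progress_state_t *st)
{
    progress_poke(st);
}

/* Called at the end of every send that completed immediately. The cost on
 * the fast path is one increment and one compare. */
void on_send(progress_state_t *st)
{
    assert(st != NULL);
    if (++st->sends_since_progress >= SENDS_PER_PROGRESS_POKE) {
        progress_poke(st);
    }
}
```

With this in place, a process that does nothing but immediately-completing sends would still enter progress once per 1000 sends, while a process that already calls wait regularly would rarely hit the threshold.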

Change History (7)

comment:1 Changed 7 years ago by buntinas

  • Milestone changed from mpich2-1.3.2 to mpich2-1.4

comment:2 Changed 7 years ago by balaji

  • Milestone changed from mpich2-1.4 to mpich2-1.5

comment:3 Changed 5 years ago by buntinas

  • Milestone changed from mpich2-1.5 to mpich-3.0

comment:4 Changed 5 years ago by balaji

  • Milestone changed from mpich-3.0 to mpich-3.0.1

comment:5 Changed 5 years ago by balaji

  • Description modified (diff)
  • Owner changed from buntinas to wbland

Giving this to Wesley because of the FT connections in this work, though the ticket is not directly related to FT.

comment:6 Changed 5 years ago by balaji

  • Milestone changed from mpich-3.1 to mpich-3.2

comment:7 Changed 2 years ago by balaji

  • Milestone changed from mpich-3.2.1 to mpich-3.3

Milestone mpich-3.2.1 deleted
