Opened 4 years ago

Last modified 3 years ago

#2155 new bug

Revoke not working

Reported by: wbland
Owned by: wbland
Priority: major
Milestone: future
Component: ulfm
Keywords:
Cc:

Description

The revoke test is not passing. It seems to be timing out at the moment.

Change History (7)

comment:1 Changed 4 years ago by huiweilu

  • Owner changed from wbland to huiweilu

comment:2 Changed 4 years ago by huiweilu

With MPIR_CVAR_CH3_NOLOCAL=0, revoke_nofail runs into a deadlock (it gets stuck in the barrier).

For NOLOCAL=0, the communicator context that the barrier uses is context_id+offset=3 (for comparison, when NOLOCAL=1, irecv uses context_id+offset=1). The current code does not handle the offset=3 case, which is why MPIDI_CH3U_Clean_recvq does not find the barrier's posted receive to cancel.

The difference between NOLOCAL=1 and 0 is at src/mpi/coll/barrier.c:146. When NOLOCAL=0, MPIR_Barrier_intra takes the barrier_smp_intra path, while with NOLOCAL=1 it skips this path.

barrier_smp_intra calls MPIR_Barrier_impl(comm_ptr->node_comm, errflag). Note the "comm_ptr->node_comm" here; this node_comm is what causes the trouble.

node_comm is created at src/mpi/comm/commutil.c:497 and uses MPID_CONTEXT_INTRANODE_OFFSET as its offset. So if the original context_id is 496, then node_comm->context_id is 498.

In the barrier, when node_comm is actually used at src/mpid/ch3/src/mpid_irecv.c:43, the receive is posted on "comm->recvcontext_id + context_offset". Here the comm passed in is node_comm and context_offset is 1, which is why the barrier's send/recv produces packets with context=499.
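
To make the arithmetic concrete, here is a small sketch; the concrete numbers are the ones from the example above (parent context_id=496), not values taken from the code:

    /* Sketch of the context-id arithmetic described in this comment. */
    int parent_ctx = 496;            /* context_id of the revoked communicator */
    int node_ctx   = parent_ctx + 2; /* node_comm: parent + MPID_CONTEXT_INTRANODE_OFFSET = 498 */
    int pkt_ctx    = node_ctx + 1;   /* irecv: comm->recvcontext_id + context_offset = 499 */
    /* Relative to the parent communicator this is offset 3, the case that
     * MPIDI_CH3U_Clean_recvq currently does not handle, so the barrier's
     * posted receive on context 499 is never cancelled. */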

comment:3 Changed 4 years ago by wbland

We’re working on debugging the MPIX_Revoke code inside ULFM and why it seems to cause MPI_Finalize to hang. After digging around for a while, we found that the problem is related to the virtual connections. Inside MPI_Finalize, some (or possibly all) of the processes get stuck in MPID_STATE_MPIDI_CH3U_VC_WAITFORCLOSE. That led me to dig through the revoke implementation to see what I was doing that could affect the status of virtual connections, and I found this line (https://trac.mpich.org/projects/mpich/browser/src/mpid/ch3/src/mpid_comm_revoke.c#L53):

MPIDI_Comm_get_vc_set_active(comm_ptr, i, &vc);

This line is supposed to activate any virtual connections that aren’t already active, because revoke is currently implemented as a dumb, linear broadcast, so a connection to every process is required. So I activate the VC and then shove a message in with MPIDI_CH3_iStartMsgv. I stole this code from elsewhere, probably ch3u_rma_sync.c. The reason I can’t use a regular MPID_Send here is that the message needs to be handled by a packet handler rather than by a matching receive (MPIX_Revoke calls don’t have matching calls on the other side).
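
For reference, a minimal sketch of that linear broadcast, with the packet declaration omitted and the iov setup and MPIDI_CH3_iStartMsgv arguments assumed from similar ch3 code; see mpid_comm_revoke.c for the real implementation:

    /* Hypothetical sketch of the revoke broadcast loop, not the actual code. */
    for (i = 0; i < comm_ptr->remote_size; i++) {
        MPIDI_VC_t *vc = NULL;
        MPID_Request *sreq = NULL;
        MPID_IOV iov[1];

        if (i == comm_ptr->rank)
            continue;

        /* Force the VC to active even if these two processes have never
         * communicated before -- this is the line quoted above. */
        MPIDI_Comm_get_vc_set_active(comm_ptr, i, &vc);

        /* The revoke packet is consumed by a packet handler on the receiving
         * side; there is no matching receive, so MPID_Send cannot be used.
         * revoke_pkt is assumed to have been filled in earlier. */
        iov[0].MPID_IOV_BUF = (MPID_IOV_BUF_CAST) &revoke_pkt;
        iov[0].MPID_IOV_LEN = sizeof(revoke_pkt);
        mpi_errno = MPIDI_CH3_iStartMsgv(vc, iov, 1, &sreq);
        if (sreq != NULL)
            MPID_Request_release(sreq);
    }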

So this leads me back to the problem. It appears that any connections that weren’t already set up before the revoke function end up being created incorrectly (?): the revoke message does eventually get through, but the virtual connection doesn’t always get cleaned up correctly afterward. This is especially true if the revoke happens very close to the end of the application, where there is no later call that would force communication between the connecting processes. Looking at the logs, the call to the tcp_connect function may not even happen until inside finalize, when things are already being torn down.

To confirm this theory, I tested what happens if everyone is required to communicate before reaching the revoking part of the code, by adding a quick barrier at the beginning of my test. That made the problem go away: after the barrier, everyone had communicated and set up their VCs correctly (at least for a small number of processes).
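
Reduced to its essentials, the workaround in the test looks roughly like this (the revoke routine is spelled MPIX_Comm_revoke here as an assumption; the ticket refers to it as MPIX_Revoke, and the real test revokes a duplicated communicator rather than MPI_COMM_WORLD directly):

    /* Hypothetical reduction of the test workaround: force processes to
     * communicate so their VCs are set up before anything is revoked. */
    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);        /* wires up the VCs (for small process counts) */
    /* ... rest of the test ... */
    MPIX_Comm_revoke(comm);             /* now only touches already-active VCs */
    MPI_Finalize();                     /* no longer hangs in VC_WAITFORCLOSE */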

comment:4 Changed 4 years ago by wbland

I talked to Ken some more yesterday about what happens to the VCs during init. We concluded that one ugly solution would be to fully set up all of the VCs during initialization if we ask for fault tolerance (we could look for the flag -disable-auto-cleanup, for example). That would solve the problem of the VCs not being set up when we get to the revoke and would hopefully let them clean up correctly during finalize.

The only other solution I can think of to help here would be to have some sort of blocking VC setup function, so that the connection setup can't race with finalization (which is what seems to be happening).
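
A rough sketch of the first option, eagerly wiring up the connections when fault tolerance is requested. This is written at the MPI level for clarity, although the real change would live inside the device at the end of init; the MPIR_PARAM_ENABLE_FT flag name standing in for -disable-auto-cleanup and the tag are hypothetical:

    /* Hypothetical eager wireup: have every process exchange a zero-byte
     * message with every other process so each VC reaches the connected
     * state before the application can call revoke. */
    if (MPIR_PARAM_ENABLE_FT) {        /* assumed flag tied to -disable-auto-cleanup */
        int rank, size, peer;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (peer = 0; peer < size; peer++) {
            if (peer == rank)
                continue;
            MPI_Sendrecv(NULL, 0, MPI_BYTE, peer, 999,
                         NULL, 0, MPI_BYTE, peer, 999,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }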

comment:5 Changed 4 years ago by wbland

We tried doing the VC initialization during setup without good results. Even though we do a big communication wireup at the end of MPI_Init, some of the processes still hang during VC cleanup in finalize. It happens less often than it did before, but there still seem to be some problems with the VCs here. This is going to take some major work to fix at this point, possibly involving taking a look at how the VCs are designed. I'm going to push it off for a few days until we can have a discussion about this locally.

comment:6 Changed 4 years ago by wbland

  • Owner changed from huiweilu to wbland

comment:7 Changed 3 years ago by balaji

  • Milestone changed from mpich-3.2 to future