Opened 5 years ago

Closed 4 years ago

#1849 closed bug (fixed)

bug in the nbc schedule progress code.

Reported by: blocksom Owned by: balaji
Priority: major Milestone: future
Component: mpich Keywords: ibm-integ
Cc:

Description

This commit was (rightfully) not accepted into the mpich/master branch; however, we still need to discuss this error and figure out what is going on. Dave thought this might be a data structure layout mismatch or something similar.

From dc82657b3f47f9af3a6a5bfac2d7dc540697efd8 Mon Sep 17 00:00:00 2001
From: Michael Blocksome <blocksom@us.ibm.com>
Date: Fri, 2 Nov 2012 13:49:59 -0500
Subject: [PATCH] Comment out bug in the nbc schedule progress code.

I'm not sure why, but this line of code causes the MPID_Request 'kind'
field to be set to zero, which is an invalid value and trips an assert in
MPI_Wait().

Need to discuss this with ANL.

Signed-off-by: Charles Archer <archerc@us.ibm.com>
---
 src/mpid/common/sched/mpid_sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mpid/common/sched/mpid_sched.c b/src/mpid/common/sched/mpid_sched.c
index a07e8e1..58b0392 100644
--- a/src/mpid/common/sched/mpid_sched.c
+++ b/src/mpid/common/sched/mpid_sched.c
@@ -867,7 +867,7 @@ static int MPIDU_Sched_progress_state(struct MPIDU_Sched_state *state, int *made
 
             /* TODO refactor into a sched_complete routine? */
             MPID_REQUEST_SET_COMPLETED(s->req);
-            MPID_Request_release(s->req);
+            /* MPID_Request_release(s->req); */
             s->req = NULL;
             MPIU_Free(s->entries);
             MPIU_Free(s);
-- 
1.7.1

Change History (21)

comment:1 Changed 5 years ago by blocksom

I pushed this commit to branch mpich-ibm/ticket-1849.

comment:2 Changed 5 years ago by balaji

  • Milestone set to mpich-3.1
  • Owner set to balaji

comment:3 Changed 5 years ago by balaji

Can you reproduce this with a test program on ch3?

comment:4 Changed 5 years ago by blocksom

I think this might require the fix in ticket #1808 to expose the problem. I'll recreate it on bgq first - I think it was one of the mpich testsuite tests, but I'm not sure which one.

comment:5 Changed 5 years ago by blocksom

I rebased branch mpich-ibm/ticket-1849 to mpich/master.

comment:6 Changed 5 years ago by balaji

  • Milestone changed from mpich-3.1 to future

I'm going to push this to "future" until we know what this is trying to do. We are waiting to hear back on which test fails without this.

comment:7 Changed 5 years ago by blocksom

I rebased this to master, pushed as branch mpich-ibm/ticket-1849-alt, and deleted branch mpich-ibm/ticket-1849.

comment:8 Changed 5 years ago by balaji

Mike: were you able to figure out which test was failing without this patch? If not, should this patch be discarded?

comment:9 Changed 5 years ago by blocksom

No .. I just haven't worked on this. Sorry for the delay.

I'll pull the master branch and run the new mpi3 tests on bgq today, then apply this patch and see if any failures go away.

comment:10 Changed 5 years ago by blocksom

One of the tests that fails on bgq is test/mpi/coll/nonblocking, which then passes if I apply the patch in this ticket.

/bgusr/blocksom/mpich/build/test/mpi/coll$ runjob -n 4 --block R01-M0-N06 --timeout 120 --envs PAMID_MPIR_NBC=1 : ./nonblocking
Abort(1) on node 3 (rank 3 in comm 1140850688): Fatal error in PMPI_Wait: Internal MPI error!, error stack:
PMPI_Wait(180)............: MPI_Wait(request=0x1dbfffba60, status=0x1) failed
MPIR_Wait_impl(77)........: 
MPIR_Request_complete(234): INTERNAL ERROR: unexpected value in case statement (value=0)
Abort(1) on node 0 (rank 0 in comm 1140850688): Fatal error in PMPI_Wait: Internal MPI error!, error stack:
PMPI_Wait(180)............: MPI_Wait(request=0x1dbfffba60, status=0x1) failed
MPIR_Wait_impl(77)........: 
MPIR_Request_complete(234): INTERNAL ERROR: unexpected value in case statement (value=0)
Abort(1) on node 1 (rank 1 in comm 1140850688): Fatal error in PMPI_Wait: Internal MPI error!, error stack:
PMPI_Wait(180)............: MPI_Wait(request=0x1dbfffba60, status=0x1) failed
MPIR_Wait_impl(77)........: 
MPIR_Request_complete(234): INTERNAL ERROR: unexpected value in case statement (value=0)
Abort(1) on node 2 (rank 2 in comm 1140850688): Fatal error in PMPI_Wait: Internal MPI error!, error stack:
PMPI_Wait(180)............: MPI_Wait(request=0x1dbfffba60, status=0x1) failed
MPIR_Wait_impl(77)........: 
MPIR_Request_complete(234): INTERNAL ERROR: unexpected value in case statement (value=0)
2013-08-12 10:36:26.509 (WARN ) [0xfff9b6a8c90] R01-M0-N06:1239830:ibm.runjob.client.Job: normal termination with status 1 from rank 3

However, I found that if I do not specify the environment variable PAMID_MPIR_NBC=1, then the test passes...

/bgusr/blocksom/mpich/build/test/mpi/coll$ runjob -n 4 --block R01-M0-N06 --timeout 120 : ./nonblocking
 No errors

Without the environment variable specified, the NBC code in pamid defaults to a blocking implementation, so I don't think this really tells us anything.

(Eventually, the pamid NBC code will use the native pamid non-blocking collectives - it just isn't fully implemented yet)

So, it seems to me to be something in the MPIR non-blocking code path - but I'm still digging.

comment:11 Changed 5 years ago by sssharka

This change is needed for all non-blocking collectives. MPI_Wait will release the request anyway once it is completed, and MPI_Wait will be called after all non-blocking calls, i.e., there are no memory leaks. There is no need for MPIDU_Sched_progress_state to release the request; doing so causes a double free.

comment:12 Changed 5 years ago by balaji

MPI_Wait just reduces the reference count on the request. Until the reference count reaches 0, the request doesn't actually get freed. So that's not a correct assessment of the problem.
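To illustrate the model described here, below is a minimal, self-contained sketch of reference-counted request release. The type and function names (request_t, request_release) are made up for illustration and are not the actual MPICH implementation: releasing a request only drops one reference, and the object is freed only when the last reference goes away.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical request object; MPICH's MPID_Request is far more involved. */
typedef struct request {
    int ref_count;   /* number of outstanding references */
} request_t;

/* Rough analogue of MPID_Request_release(): drop one reference and free
 * the object only when no references remain. */
static void request_release(request_t *req)
{
    req->ref_count--;
    if (req->ref_count == 0)
        free(req);
}

int main(void)
{
    /* Assume the request starts with two references: one held by the
     * user-visible handle and one held by the progress engine/schedule. */
    request_t *req = malloc(sizeof(*req));
    req->ref_count = 2;

    request_release(req);   /* schedule completes: object must survive */
    printf("after internal release: ref_count=%d\n", req->ref_count);

    request_release(req);   /* user's MPI_Wait/MPI_Test: now it is freed */
    return 0;
}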

comment:13 Changed 5 years ago by sssharka

MPIR_Wait_impl calls MPIR_Request_complete which releases the request. Here is the code snippet from MPIR_Request_complete:

case MPID_COLL_REQUEST:
{
    MPIR_Request_extract_status(request_ptr, status);
    MPID_Request_release(request_ptr);
    *request = MPI_REQUEST_NULL;
    break;
}

So it is doing exactly what MPIDU_Sched_progress_state does when a request is complete.

Thanks
Sameh

comment:14 Changed 5 years ago by balaji

Sameh,

MPIDU_Sched_progress_state only decrements the internal reference count. MPIR_Wait_impl does two things: (1) it decrements the reference count and (2) marks the user request as NULL. So these are two different things.

Did you try this change with CH3? I think this will break CH3.

So either CH3 is incorrectly decrementing the reference count somewhere it shouldn't, or pamid is not incrementing the reference count somewhere it should. I'll look into CH3. Can you do the same for pamid?

comment:15 Changed 5 years ago by sssharka

Hi Pavan,

Here is what I found so far.

MPI_Test or MPI_Wait both call MPIR_Request_complete which as I attached earlier will release the request and sets it to null:

case MPID_COLL_REQUEST:
{
    MPIR_Request_extract_status(request_ptr, status);
    MPID_Request_release(request_ptr);
    *request = MPI_REQUEST_NULL;
    break;
}

MPIDU_Sched_progress_state does the following:

if (s->idx == s->num_entries) {
    dprintf(stderr, "completing and dequeuing s=%p r=%p\n", s, s->req);

    /* dequeue this schedule from the state, it's complete */
    MPL_DL_DELETE(state->head, s);

    /* TODO refactor into a sched_complete routine? */
    MPID_REQUEST_SET_COMPLETED(s->req);
    MPID_Request_release(s->req);
    s->req = NULL;

So both MPI_Test/MPI_Wait and MPIDU_Sched_progress_state release the request and set it to NULL.

Regarding PAMID, I made sure PAMID doesn't change anything. Here is what I did to make sure I take PAMID completely out of the picture:

comm->coll_fns->Ibarrier_sched = MPIR_Ibarrier_intra;
comm->coll_fns->Ibcast_sched = MPIR_Ibcast_intra;
comm->coll_fns->Igather_sched = MPIR_Igather_intra;
.
.
.

and

comm->coll_fns->Ibarrier_req = NULL;
comm->coll_fns->Ibcast_req = NULL;
.
.

This way the code path is purely mpich code. NO PAMID.

Here is the testcase I used, ibarrier.c:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    MPI_Request barrier;
    int rank, i, done;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Ibarrier(MPI_COMM_WORLD, &barrier);
    for (i = 0, done = 0; !done; i++) {
        usleep(1000);
        /*printf("[%d] MPI_Test: %d\n", rank, i);*/
        MPI_Test(&barrier, &done, MPI_STATUS_IGNORE);
    }

    if (rank == 0)
        printf(" No Errors\n");

    MPI_Finalize();
    return 0;
}
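
For reference, a typical way to build and run this testcase would be something like the following (assuming the standard mpicc/mpiexec wrappers; on bgq the runjob invocation shown earlier would be used instead):

mpicc ibarrier.c -o ibarrier
mpiexec -n 4 ./ibarrier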

Maybe you could try disabling ch3 and testing? It could be that ch3 handles requests differently. I haven't looked at the ch3 code, so I can't really compare.

Please let me know what you find/think.

Thanks
Sameh

comment:16 Changed 5 years ago by balaji

The code works fine with ch3, so it's not an obvious thing.

I'll look into ch3 this week to see if I can find something wrong, but without a failing test case, it's really hard to say anything.

comment:17 Changed 5 years ago by sssharka

Hi Pavan,

Thanks for looking into this. Please try to disable ch3 and use the testcase above.

Thanks
Sameh

comment:18 Changed 4 years ago by sssharka

Hi Pavan,

Did you try disabling ch3 and testing with the ibarrier.c testcase? Any updates on this issue?

Thanks
Sameh

comment:19 Changed 4 years ago by balaji

Sameh,

I don't doubt that this is failing with pamid. But the patch is changing code at the MPI layer, which is working fine with ch3. So I'm not convinced that the bug is in that code and not in pamid. Since it works correctly with ch3, my guess is that you are doing something wrong in pamid.

FWIW, your conclusion on the progress engine above is incorrect. Setting s->req to NULL does not mean that the user request is being freed. The user request cannot be set to MPI_REQUEST_NULL until the user completes that request. The internal scheduler cannot do that for the user.
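
To illustrate that last point, here is a small self-contained sketch (hypothetical names, not MPICH code): the schedule can only release its own reference and clear its own pointer; only the MPI_Wait/MPI_Test path, which is handed the address of the user's handle, can set that handle to "request null".

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical types, made up for illustration; not MPICH code. */
typedef struct request { int ref_count; } request_t;
typedef struct sched   { request_t *req; } sched_t;

static void request_release(request_t *r)
{
    if (--r->ref_count == 0)
        free(r);
}

/* Rough analogue of the progress engine completing a schedule: it drops
 * its own reference and clears its own pointer only. */
static void sched_complete(sched_t *s)
{
    request_release(s->req);
    s->req = NULL;              /* the user's handle is untouched */
}

/* Rough analogue of MPI_Wait/MPI_Test: only here do we have the address
 * of the user's handle, so only here can it become "request null". */
static void wait_impl(request_t **user_request)
{
    request_release(*user_request);
    *user_request = NULL;       /* analogue of *request = MPI_REQUEST_NULL */
}

int main(void)
{
    request_t *obj = malloc(sizeof(*obj));
    obj->ref_count = 2;              /* one reference for the user, one for the schedule */

    sched_t s = { obj };
    request_t *user_handle = obj;    /* what the user got back from, say, MPI_Ibarrier */

    sched_complete(&s);              /* s.req is NULL, but user_handle is still valid */
    printf("user handle after schedule completion: %s\n", user_handle ? "set" : "null");

    wait_impl(&user_handle);         /* now the user's handle is released and cleared */
    printf("user handle after wait: %s\n", user_handle ? "set" : "null");
    return 0;
}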

comment:20 Changed 4 years ago by blocksom

Sameh found the bug .. in a macro that wasn't called anywhere in pamid until the nbc schedule code was added.

I've signed off on the commit and pushed it to branch mpich-ibm/ticket-1849.

This ticket can be closed once this commit is pushed to master.

comment:21 Changed 4 years ago by balaji

  • Resolution set to fixed
  • Status changed from new to closed

Pushed to mpich/master in [e41c212d].
