Opened 7 years ago

Last modified 22 months ago

#1420 new bug

Processes segfaulting when a fault occurs

Reported by: wangraying@…
Owned by: wbland
Priority: major
Milestone: mpich-3.3
Component: mpich
Keywords:
Cc:

Description (last modified by balaji)

In the following program, when process 0 exits without calling MPI_Finalize, the remaining processes seem to segfault.

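#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>   /* sleep(), for the commented-out line below */
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(NULL, NULL);
    /* Return error codes instead of aborting the job on a failure.
     * (MPI_Errhandler_set is the deprecated spelling of
     * MPI_Comm_set_errhandler.) */
    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* sleep(2); */
    /* Rank 0 exits without calling MPI_Finalize, simulating a failure. */
    if (rank == 0)
        exit(0);

    /* The surviving ranks spin forever. */
    while (1);

    MPI_Finalize();   /* never reached */

    return 0;
}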

It looks like the SIGUSR1 handler is not being set up in this case. If the "sleep(2)" line is uncommented, the SIGUSR1 handler is set up, but the segfault still seems to occur.
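
One way to check from inside a process whether a SIGUSR1 handler is currently installed is to query the disposition with sigaction() and a NULL new action. The following is only a rough sketch under that assumption: has_custom_handler and noop_handler are hypothetical names used for illustration, they are not part of MPICH, and the ticket does not say how the missing handler was originally observed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper (not an MPICH function): returns 1 if a
 * user-defined handler is installed for signo, 0 if the disposition
 * is SIG_DFL or SIG_IGN, and -1 if sigaction() itself fails. */
static int has_custom_handler(int signo)
{
    struct sigaction cur;

    /* A NULL new action means "query only"; nothing is changed. */
    if (sigaction(signo, NULL, &cur) != 0)
        return -1;

    /* SA_SIGINFO handlers are always user-defined. */
    if (cur.sa_flags & SA_SIGINFO)
        return 1;

    return cur.sa_handler != SIG_DFL && cur.sa_handler != SIG_IGN;
}

/* Do-nothing handler, used only to demonstrate the check. */
static void noop_handler(int signo)
{
    (void)signo;
}
```

In the program above, such a check could be called right after MPI_Init (with and without the sleep(2) line) and its result printed per rank, for example with printf("rank %d: %d\n", rank, has_custom_handler(SIGUSR1)) once the rank is known.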

Change History (14)

comment:1 Changed 7 years ago by balaji

Here's the command I used for launching the job:

./bin/mpiexec -disable-auto-cleanup -hosts localhost,127.0.1.1 -n 3 ./a.out

comment:2 Changed 7 years ago by buntinas

  • Resolution set to worksforme
  • Status changed from new to closed

This works for me on my laptop and bblogin.

comment:3 Changed 7 years ago by balaji

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Can you try killing one additional process (apart from the one that already exits in the program)? I can definitely reproduce this with two failed processes. With one failed process, it's a little harder.

comment:4 Changed 7 years ago by balaji

  • Reporter changed from balaji to wangraying@…

Adding Rui to the CC list.

comment:5 Changed 7 years ago by balaji

Err... I accidentally set Rui as the reporter instead of adding him to the CC list. Oh, well.

comment:6 Changed 7 years ago by buntinas

Send me your configure line.

comment:7 Changed 7 years ago by balaji

  • Milestone changed from mpich2-1.3.2 to mpich2-1.3.3

comment:8 Changed 7 years ago by balaji

  • Milestone changed from mpich2-1.3.3 to mpich2-1.4

Milestone mpich2-1.3.3 deleted

comment:9 Changed 6 years ago by balaji

  • Milestone changed from mpich2-1.4 to future

comment:10 Changed 6 years ago by balaji

  • Milestone changed from future to mpich2-1.4

comment:11 Changed 5 years ago by balaji

  • Milestone changed from mpich2-1.5 to mpich-3.0

FT is not a priority for the 1.5 release. Moving this to 3.0.

comment:12 Changed 5 years ago by balaji

  • Milestone changed from mpich-3.0 to mpich-3.0.1

comment:13 Changed 4 years ago by balaji

  • Description modified (diff)
  • Milestone changed from mpich-3.1 to mpich-3.2
  • Owner changed from buntinas to wbland
  • Status changed from reopened to new

comment:14 Changed 22 months ago by balaji

  • Milestone changed from mpich-3.2.1 to mpich-3.3

Milestone mpich-3.2.1 deleted
