Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#2054 closed bug (fixed)

MPICH failing electric-fence check

Reported by: matthieu.dorier@…
Owned by: Wesley Bland <wbland@…>
Priority: major
Milestone: future
Component: mpich
Keywords:
Cc:

Description

Hi,

I was trying to debug memory corruption in an MPI program with the electric-fence tool and noticed that electric-fence already reports an error inside MPI_Init. The program aborts there, so I cannot reach the actual memory corruption that happens later. The following minimal program reproduces the error:

#include <mpi.h>
int main(int argc, char** argv) {
  MPI_Init(&argc,&argv);
  MPI_Finalize();
  return 0;
}

The error output by electric-fence:

ElectricFence Aborting: Allocating 0 bytes, probably a bug.

And the backtrace output by gdb:

Program received signal SIGILL, Illegal instruction.
0x0012d422 in __kernel_vsyscall ()
(gdb) backtrace
#0 0x0012d422 in __kernel_vsyscall ()
#1 0x0040c976 in kill () at ../sysdeps/unix/syscall-template.S:82
#2 0x0012fc54 in EF_Abort () from /usr/lib/libefence.so.0
#3 0x0012f71b in memalign () from /usr/lib/libefence.so.0
#4 0x0012f88b in malloc () from /usr/lib/libefence.so.0
#5 0x001e3b6b in MPID_nem_init () from /home/mdorier/deploy/lib/libmpich.so.10
#6 0x001d2f4c in MPIDI_CH3_Init () from /home/mdorier/deploy/lib/libmpich.so.10
#7 0x001c8c57 in MPID_Init () from /home/mdorier/deploy/lib/libmpich.so.10
#8 0x0029d435 in MPIR_Init_thread () from /home/mdorier/deploy/lib/libmpich.so.10
#9 0x0029cd33 in PMPI_Init () from /home/mdorier/deploy/lib/libmpich.so.10
#10 0x0804859f in main (argc=1, argv=0xbffff994) at m.c:4

Environment: MPICH 3.0.4, gcc 4.6.4, Ubuntu 10.04, Linux kernel 2.6.32.

I suspect that MPICH calls malloc with a size of 0. MPICH checks the return value properly, but electric-fence treats the zero-byte request itself as an error and aborts.
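To illustrate the suspected pattern, here is a minimal sketch of one way to avoid issuing a zero-byte request at all. It is not taken from the MPICH source: the helper name `alloc_array_or_null` and its parameters are hypothetical, and it assumes the size comes from an element count that can legitimately be 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of MPICH.
 * Returns NULL without calling malloc when the request is empty,
 * so a checker such as electric-fence never sees malloc(0). */
static void *alloc_array_or_null(size_t count, size_t elem_size)
{
    if (count == 0 || elem_size == 0)
        return NULL;                 /* nothing to allocate */
    if (count > SIZE_MAX / elem_size)
        return NULL;                 /* count * elem_size would overflow */
    return malloc(count * elem_size);
}
```

A caller using this sketch has to treat NULL as an error only when `count` is nonzero. As a workaround on the debugging side, electric-fence also documents an `EF_ALLOW_MALLOC_0` environment variable that, when set to a nonzero value, stops it from trapping zero-size allocations.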

Change History (3)

comment:1 Changed 4 years ago by wbland

  • Milestone changed from mpich-3.1.1 to future

I tried to replicate this problem with valgrind, but didn't see any issues. I also poked through the code and didn't see anything wrong in MPID_nem_init. Can we call this done given that Matthieu's issue is fixed?

comment:2 Changed 4 years ago by Wesley Bland <wbland@…>

  • Owner set to Wesley Bland <wbland@…>
  • Resolution set to fixed
  • Status changed from new to closed

In e65d15dcd45aa30c20b697e7762e79016ff02ec7:

Fixes failing electric-fence check

electric-fence tool detects malloc(0) in mpid_nem_init.c

Fixes #2054

Signed-off-by: Wesley Bland <wbland@…>

comment:3 Changed 4 years ago by Pavan Balaji <balaji@…>

In 6f6e2dccc70fb7835275524e866f48357de81dab:

Fixes freeing uninitialized memory

Last fix corrected malloc(0) problem but forgot the free part.

Fixes #2054

Signed-off-by: Pavan Balaji <balaji@…>
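The follow-up above concerns the cleanup side: once the allocation is skipped for a zero size, the pointer must still hold a defined value before it is freed. A minimal sketch of that pairing, assuming a hypothetical `rank_table` structure and function names that are not taken from the MPICH source:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure for illustration only. */
typedef struct {
    int   *ranks;
    size_t count;
} rank_table;

/* Returns 0 on success, -1 if a nonzero allocation fails. */
static int rank_table_init(rank_table *t, size_t count)
{
    t->ranks = NULL;   /* always initialize, even when nothing is allocated */
    t->count = 0;
    if (count == 0)
        return 0;      /* skip malloc(0) entirely */
    t->ranks = malloc(count * sizeof *t->ranks);
    if (t->ranks == NULL)
        return -1;
    t->count = count;
    return 0;
}

static void rank_table_destroy(rank_table *t)
{
    free(t->ranks);    /* free(NULL) is a no-op in standard C */
    t->ranks = NULL;
    t->count = 0;
}
```

Because the pointer is set to NULL before the early return, the destroy path is safe whether or not an allocation actually happened.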
