Opened 6 years ago

Last modified 5 years ago

#1632 new bug

fortran examples hang on mac OS 10.7 / intel 12.1

Reported by: bourdin@… Owned by:
Priority: major Milestone: future
Component: mpich Keywords:
Cc:

Description (last modified by balaji)

Using mpich2-1.4.1p1 compiled with intel 12.1 under mac OS 10.7.4 (and possibly other releases of 10.7). All C examples run fine, but all fortran examples seem to hang in intel_cpu_indicator_init:

(gdb) where

#0  0x000000010acc99c8 in intel_cpu_indicator_init ()

#1  0x0000000000000000 in ?? ()

This is a problem since Intel only supports version 12 of its compilers under Lion (11.1 can be made to work with a few hacks).

I configured mpich2 using the following command line:

$ ./configure --prefix=/opt/HPC/mpich2-1.4.1p1-intel12.1 --enable-fast=O3 --enable-g=dbg --enable-romio --enable-shared --enable-sharedlibs=osx-gcc --with-device=ch3:sock --with-pm=gforker --without-mpe CC=icc CXX=icpc FC=ifort F77=ifort

The same command line works fine with version 11.1 of the intel compilers.

Has anybody seen this issue?

Blaise

Change History (10)

comment:1 Changed 5 years ago by balaji

  • Milestone set to mpich2-1.5

Are you seeing the error with gfortran (instead of the Intel compilers)?

comment:2 Changed 5 years ago by Bourdin@…

This only happens with version 12.x of the intel compilers under Mac OS. Everything is fine with gcc/gfortran or with icc/ifort from intel 11.1.

I have not tried mixing gnu and intel compilers.

Everything works fine under Linux with intel 11.1 and 12.1.

This is especially critical since intel 11.1 is not supported under Mac OS 10.7 and requires a few hacks to be made to work.

Thanks for your help,

Blaise

comment:3 Changed 5 years ago by balaji

  • Owner set to buntinas
  • Status changed from new to assigned

comment:4 Changed 5 years ago by buntinas

Unfortunately, we don't currently have access to Intel compilers on OSX, so we can't test this here.

Try running it with the environment variable MPICH_NO_LOCAL=1 set. E.g.,

MPICH_NO_LOCAL=1 mpiexec -n 2 foo

Let us know if this fixes the hang problem.

Also, test the OpenPA library. From the mpich2 directory do this:

cd src/openpa
make check

and let us know if any tests fail.

Thanks!

comment:5 follow-up: Changed 5 years ago by bourdin@…

Hi,

  • Setting the environment variable MPICH_NO_LOCAL=1 makes no difference. Actually, f77 and f90 programs hang even when launched directly (i.e. without going through mpiexec).
  • All tests in openpa pass.

I can create an account on a development machine here at LSU, if you send me an SSH public key.

Regards,
Blaise

comment:6 in reply to: ↑ 5 Changed 5 years ago by chan

Did you try adding -g and -traceback when compiling the fortran examples, to see if that provides more info about where it hangs?
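A minimal sketch of that rebuild, assuming the MPICH wrapper mpif90 is on the PATH and wraps ifort (-traceback is an ifort option). SRC and OUT are placeholder names, not files from this ticket; if the wrapper or the source file is missing, the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch: rebuild one Fortran example with debug symbols and ifort traceback data.
# SRC and OUT are hypothetical placeholders; point SRC at a real example source.
SRC=${SRC:-example.f90}
OUT=${OUT:-example_dbg}

# -g adds debug symbols; -traceback asks ifort to emit traceback information.
DEBUG_FLAGS="-g -traceback"

if command -v mpif90 >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # DEBUG_FLAGS is left unquoted on purpose so it splits into two options.
    mpif90 $DEBUG_FLAGS -o "$OUT" "$SRC"
else
    echo "would run: mpif90 $DEBUG_FLAGS -o $OUT $SRC"
fi
```

Since the hang is reported inside intel_cpu_indicator_init, a traceback may add little here, but the debug symbols should at least make the gdb frames easier to read.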

comment:7 Changed 5 years ago by buntinas

I found that I could get a 30-day demo version so I was able to reproduce the bug with intel 12.1.5 compilers.

Short version: If you don't build shared libraries (omit --enable-shared and --enable-sharedlibs=) then it works.

Long version: I built a non-MPI hello world program and found that when I linked it with the mpich libraries configured with --enable-shared, the program hung before reaching whatever the Fortran version of main() is. The bug exists in 1.4.1p1 as well as trunk ([44c3b88c3b439f740dc14640e7286d1b2af101fb]), which uses libtool.
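A rough sketch of that kind of reproducer, assuming the mpif90 wrapper from the affected install is on the PATH; the exact program and link line used in this test are not recorded in the ticket, so hello.f90 and the wrapper-based link step are illustrative assumptions. The program makes no MPI calls, but building it through the wrapper still links it against the MPICH libraries.

```shell
#!/bin/sh
# Sketch: a Fortran program with no MPI calls, linked against MPICH via the wrapper.
# hello.f90 is a hypothetical file name used only for this illustration.
cat > hello.f90 <<'EOF'
program hello
  print *, 'hello world'
end program hello
EOF

if command -v mpif90 >/dev/null 2>&1; then
    # With the bug present, the binary would hang before printing anything.
    mpif90 -o hello hello.f90 && ./hello
else
    echo "mpif90 not found; skipping build"
fi
```

Building the same file with plain ifort (no MPICH libraries on the link line) gives a baseline to compare against.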

At the end of the build, I got warnings like

ld: warning: direct access in __ZN3MPI8Datatype13Create_keyvalEPFiRKS0_iPvS3_S3_RbEPFiRS0_iS3_S3_ES3_
to global weak symbol __ZN3MPI8Datatype14NULL_DELETE_FNERS0_iPvS2_ means the weak symbol cannot
be overridden at runtime. This was likely caused by different translation units being compiled with 
different visibility settings.

which made me think that it might be weak-symbol related, but adding --disable-weak-symbols did not fix the bug (nor, however, did it make the warnings go away, which may mean that --disable-weak-symbols doesn't actually disable them).

I don't yet know whether this is a bug in the intel compiler or in how we generate shared libraries.

comment:8 Changed 5 years ago by bourdin@…

I can disable generation of dynamic libraries; it's not a huge deal.

Thanks for your help

comment:9 Changed 5 years ago by buntinas

  • Milestone changed from mpich2-1.5 to future

Since there's a workaround (using static libraries), I'm going to leave this for future work.
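For reference, a sketch of the static-only build: it reuses the configure options from the original report with --enable-shared and --enable-sharedlibs=osx-gcc removed. STATIC_OPTS is just a helper variable for this illustration, and the script only prints the command when no ./configure is present in the current directory.

```shell
#!/bin/sh
# Sketch: configure options from the original report, minus the two
# shared-library options (--enable-shared, --enable-sharedlibs=osx-gcc).
STATIC_OPTS="--prefix=/opt/HPC/mpich2-1.4.1p1-intel12.1 --enable-fast=O3 --enable-g=dbg --enable-romio --with-device=ch3:sock --with-pm=gforker --without-mpe CC=icc CXX=icpc FC=ifort F77=ifort"

if [ -x ./configure ]; then
    # STATIC_OPTS is left unquoted on purpose so each option is a separate argument.
    ./configure $STATIC_OPTS
else
    echo "would run: ./configure $STATIC_OPTS"
fi
```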

comment:10 Changed 5 years ago by balaji

  • Description modified (diff)
  • Owner buntinas deleted
  • Status changed from assigned to new