Opened 8 years ago

Last modified 5 years ago

#967 new bug

Issues with Pack/Unpack_external for MPI_LONG and MPI_DOUBLE

Reported by: goodell
Owned by:
Priority: minor
Milestone: future
Component: mpich
Keywords:
Cc: dalcinl@…

Description

Originally reported on mpich2-dev@ by Lisandro Dalcín

Sorry if this is a known issue, and sorry for pasting Python code;
I have no time right now to write a proper test case in C.

For a long time now, I have been having issues in the mpi4py test suite
when using Pack/Unpack_external. The following script should be enough to
exhibit all the issues I see on 32-bit and 64-bit Linux.

This is the Python code I'm testing. Basically, for various MPI
datatypes, I do a pack/unpack of an array with 5 items and print them
at the end, expecting input/output arrays to be the same:

from mpi4py import MPI
import numpy

for typecode, datatype in [('i', MPI.INT),
                          ('l', MPI.LONG),
                          ('q', MPI.LONG_LONG),
                          ('f', MPI.FLOAT),
                          ('d', MPI.DOUBLE),
                          ]:
   # temp array for packing
   nbytes = datatype.Pack_external_size('external32', 5)
   tmpbuf = numpy.empty(nbytes, dtype='B') # unsigned char

   # pack input array
   iarray = numpy.arange(1,6, dtype=typecode)
   datatype.Pack_external('external32', iarray, tmpbuf, 0)

   # unpack output array
   oarray = numpy.zeros(5, dtype=typecode)
   datatype.Unpack_external('external32', tmpbuf, 0, oarray)

   print datatype.Get_name(), datatype.Get_size()
   print 'input: ', iarray
   print 'output:', oarray
   print 'buffer:', tmpbuf, len(tmpbuf)
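
For comparison, here is a minimal pure-Python sketch of what a correct external32 round trip should produce for the same five datatypes. It does not call MPI at all. It relies only on the external32 rules from the MPI standard (big-endian byte order, fixed sizes, with MPI_LONG occupying 4 bytes whatever the native size of a C long). The names EXTERNAL32_FORMAT, reference_pack and reference_unpack are illustrative assumptions and are not part of MPICH2 or mpi4py.

```python
import struct

# Assumed mapping from MPI datatype name to a big-endian struct format code.
# external32 fixes the sizes: INT=4, LONG=4, LONG_LONG=8, FLOAT=4, DOUBLE=8.
EXTERNAL32_FORMAT = {
    'MPI_INT': 'i',
    'MPI_LONG': 'i',       # 4 bytes in external32, even where native long is 8
    'MPI_LONG_LONG': 'q',
    'MPI_FLOAT': 'f',
    'MPI_DOUBLE': 'd',
}

def reference_pack(name, values):
    """Encode values as big-endian, fixed-size items (hypothetical helper)."""
    fmt = '>%d%s' % (len(values), EXTERNAL32_FORMAT[name])
    return struct.pack(fmt, *values)

def reference_unpack(name, data):
    """Decode a buffer produced by reference_pack (hypothetical helper)."""
    code = EXTERNAL32_FORMAT[name]
    count = len(data) // struct.calcsize('>' + code)
    return list(struct.unpack('>%d%s' % (count, code), data))

for name in ('MPI_INT', 'MPI_LONG', 'MPI_LONG_LONG', 'MPI_FLOAT', 'MPI_DOUBLE'):
    is_float = EXTERNAL32_FORMAT[name] in 'fd'
    values = [float(v) if is_float else v for v in range(1, 6)]
    packed = reference_pack(name, values)
    restored = reference_unpack(name, packed)
    print('%s: %d bytes, round trip ok: %s'
          % (name, len(packed), restored == values))
```

The packed length computed this way is what Pack_external_size is expected to report for 5 items, so it gives a quick cross-check against the tmpbuf lengths printed by the script.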


These are the issues I see when I run the code above on 32-bit and 64-bit Linux:

1) Linux 32:

Every iteration except MPI_DOUBLE works. For the MPI_DOUBLE case, the
pack/unpack appears to produce the right result, but memory is being
corrupted. A run under valgrind reports errors like the following several
times (both invalid reads and invalid writes):

==10399== Invalid read of size 4
==10399==    at 0x48736AA: external32_float_convert (mpid_ext32_segment.h:167)
==10399==    by 0x487435D: MPID_Segment_contig_pack_external32_to_buf (mpid_ext32_segment.c:208)
==10399==    by 0x48E13ED: MPID_Segment_manipulate (segment.c:528)
==10399==    by 0x487326E: MPID_Segment_pack_external32 (mpid_ext32_segment.c:302)
==10399==    by 0x48B3088: PMPI_Pack_external (pack_external.c:140)
==10399==    by 0x469B16F: __pyx_pf_6mpi4py_3MPI_8Datatype_Pack_external (mpi4py.MPI.c:34203)
==10399==    by 0x5533077: PyCFunction_Call (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x558F072: PyEval_EvalFrameEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590E49: PyEval_EvalCodeEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590FB3: PyEval_EvalCode (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC25B: ??? (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC322: PyRun_FileExFlags (in /usr/lib/libpython2.6.so.1.0)
==10399==  Address 0x42e4b48 is 0 bytes after a block of size 40 alloc'd
==10399==    at 0x4005BDC: malloc (vg_replace_malloc.c:195)
==10399==    by 0x5222979: ??? (in /usr/lib/python2.6/site-packages/numpy/core/multiarray.so)
==10399==    by 0x52365C0: ??? (in /usr/lib/python2.6/site-packages/numpy/core/multiarray.so)
==10399==    by 0x5236CAE: ??? (in /usr/lib/python2.6/site-packages/numpy/core/multiarray.so)
==10399==    by 0x5533077: PyCFunction_Call (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x54F280C: PyObject_Call (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x558ED4F: PyEval_EvalFrameEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590E49: PyEval_EvalCodeEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590FB3: PyEval_EvalCode (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC25B: ??? (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC322: PyRun_FileExFlags (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AD8C0: PyRun_SimpleFileExFlags (in /usr/lib/libpython2.6.so.


2) Linux 64:

a) I have the same issue as before for MPI_DOUBLE.

b) Additionally, MPI_LONG does not seem to work. I get this output:

MPI_LONG 8
input:  [1 2 3 4 5]
output: [0 0 0 0 0]
buffer: [ 16  70 115   0   0   0   0   0 192 238 240  79   6  43   0   0 255 255 255 255] 20
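
To make the MPI_LONG failure concrete, the sketch below decodes the 20 bytes printed above as five big-endian 32-bit integers and compares them with the bytes a correct external32 encoding of [1 2 3 4 5] would contain. It uses only Python's struct module. The variable names are illustrative, and the assumption that MPI_LONG is a 4-byte big-endian integer in external32 comes from the MPI standard, not from the MPICH2 sources.

```python
import struct

# The 20 bytes printed as 'buffer:' in the MPI_LONG output above.
observed = bytes(bytearray([16, 70, 115, 0, 0, 0, 0, 0, 192, 238,
                            240, 79, 6, 43, 0, 0, 255, 255, 255, 255]))

# What external32 should contain for the input [1 2 3 4 5]:
# five big-endian 4-byte integers.
expected = struct.pack('>5i', 1, 2, 3, 4, 5)

decoded = struct.unpack('>5i', observed)

print('expected bytes: %r' % (list(bytearray(expected)),))
print('observed bytes: %r' % (list(bytearray(observed)),))
print('observed decoded as 5 big-endian int32: %r' % (decoded,))
print('buffers match: %s' % (observed == expected))
```

The buffer length of 20 agrees with 5 items of 4 bytes each, so the size computation looks right. The contents do not resemble the input, which is consistent with (though it does not prove) the pack step leaving the numpy.empty buffer unwritten.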


-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

Change History (3)

comment:1 Changed 8 years ago by goodell

  • Cc dalcinl@… added

comment:2 Changed 8 years ago by goodell

AFAIK, external32 is not fully implemented and it is not surprising that it does not work. We do not have any near-term plans to implement it.

We will update this ticket as the status of external32 support in MPICH2 changes.

comment:3 Changed 5 years ago by robl

Intel contributed external32 support for ROMIO, but it depends on bug-free pack/unpack_external32. I committed a version of it in [fec9f28d0b67a8e6912b1a309b2739a2551b83d2].
