Opened 7 years ago

Closed 5 years ago

#1477 closed bug (wontfix)

insufficient buffer space socket errors on Windows

Reported by: jayesh
Owned by: jayesh
Priority: major
Milestone: future
Component: mpich
Keywords:
Cc: roy_kuraisa@…

Description (last modified by balaji)

The user gets "insufficient buffer space" socket errors when broadcasting a large amount of data with MPICH2 1.3.2p1.

run (with my own debug info) and error message:
-----------------------------------------------
D:\roy>smpd -version
1.3.2p1

D:\roy>mpiexec -hosts 2 usctap3800 1 usctap3826 1 \\usdata011\MPRI-App\BAT\RoyTest\CorrelateMPI\CorrelateMPI.exe correlate_rep_traits.xml gwat_all_attieeric.h5 out.h5 debug
>>> Root process on computer: USCTAP3800
>>> Root process on computer: USCTAP3800
>>> No. of computers: 2
>>> Summary of InData
        Cfg file: correlate_rep_traits.xml
        Input hdf5 file: gwat_all_attieeric.h5
        Name of x data: repData
        Name of x ids: repIDs
        Name of y data: traitData
        Name of y ids: traitIDs
        Filter: pvalue
        Filter threshold: 1.0001
        Dataset name for correlation: correlations
        Dataset name for pvalue: pvalues
        Metric name: pearson
        Compression: 0
>>> Rank 0: reading input file: gwat_all_attieeric.h5
>>> File: out.h5 exists.
###
### Perf: Time to input data: 0 mins   2 secs
###
>>> in x data - Rank: 0 rows/cols/total: 39558/506/20016348
>>> in y data - Rank: 0 rows/cols/total: 347/506/175582
>>> Rank 0: Broadcasting input data to worker nodes
>>> rank: 1 metric: pearson Length: 7
>>> in x data - Rank: 1 rows/cols/total: 39558/506/20016348
>>> in y data - Rank: 1 rows/cols/total: 347/506/175582
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1430).................................: MPI_Bcast(buf=00000000018C0040, count=20016348, MPI_FLOAT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1273)............................:
MPIR_Bcast_intra(1107)...........................:
MPIR_Bcast_binomial(143).........................:
MPIC_Recv(110)...................................:
MPIC_Wait(540)...................................:
MPIDI_CH3I_Progress(353).........................:
MPID_nem_mpich2_blocking_recv(905)...............:
MPID_nem_newtcp_module_poll(37)..................:
MPID_nem_newtcp_module_connpoll(2669)............:
MPID_nem_newtcp_module_recv_success_handler(2364):
MPID_nem_newtcp_module_post_readv_ex(330)........:
MPIU_SOCKW_Readv_ex(392).........................: read from socket failed, An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
 (errno 10055)

job aborted:
rank: node: exit code[: error message]
0: usctap3800: 123
1: usctap3826: 1: process 1 exited without calling finalize
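
Possible workaround (not confirmed in this ticket): the failing call broadcasts count=20016348 MPI_FLOAT values in one operation, which is 39558 x 506 elements, or about 80 MB assuming 4-byte floats. Errno 10055 is Winsock's WSAENOBUFS. One thing an application could try is to split the broadcast into smaller pieces. The sketch below shows only the chunking loop. The names chunked_bcast, bcast_fn, tally and count_only are hypothetical and not part of MPICH; in a real program the callback would wrap MPI_Bcast(buf, (int)count, MPI_FLOAT, root, comm). Whether smaller messages actually avoid the error on Windows is an assumption, not something this ticket establishes.

```c
#include <stddef.h>

/* Hypothetical callback: broadcasts `count` floats starting at `buf`.
   In an MPI program this would wrap MPI_Bcast; it returns 0 on success. */
typedef int (*bcast_fn)(float *buf, size_t count, void *ctx);

/* Broadcast `total` floats in pieces of at most `chunk` elements.
   Returns 0 on success, -1 for a zero chunk size, or the first
   nonzero code returned by `fn`. */
static int chunked_bcast(float *buf, size_t total, size_t chunk,
                         bcast_fn fn, void *ctx)
{
    size_t offset = 0;

    if (chunk == 0)
        return -1;

    while (offset < total) {
        size_t n = total - offset;   /* elements still to send */
        int rc;

        if (n > chunk)
            n = chunk;               /* cap this piece at the chunk size */

        rc = fn(buf + offset, n, ctx);
        if (rc != 0)
            return rc;               /* stop at the first failure */

        offset += n;
    }
    return 0;
}

/* Stand-in callback for trying the loop without MPI:
   it only counts calls and elements. */
struct tally {
    size_t calls;
    size_t elements;
};

static int count_only(float *buf, size_t count, void *ctx)
{
    struct tally *t = (struct tally *)ctx;

    (void)buf;                       /* data is not touched in this stand-in */
    t->calls++;
    t->elements += count;
    return 0;
}
```

Every rank has to call the loop with the same total and chunk values so that the pieces match up across processes, and the chunk size must fit in the int count argument of MPI_Bcast. The chunk size itself is a tuning choice; nothing in this ticket suggests a particular value.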

Change History (2)

comment:1 Changed 7 years ago by balaji

  • Milestone changed from mpich2-1.5 to future

comment:2 Changed 5 years ago by balaji

  • Description modified (diff)
  • Resolution set to wontfix
  • Status changed from new to closed