Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#1080 closed bug (invalid)

large_messages fails on bb machines

Reported by: buntinas Owned by: buntinas
Priority: blocker Milestone: mpich2-1.3
Component: mpich Keywords:
Cc:

Description (last modified by buntinas)

large_messages fails pretty consistently on the bb machines when run over multiple nodes.

Attachments (2)

tcp_client.c (2.4 KB) - added by buntinas 4 years ago.
tcp_server.c (2.7 KB) - added by buntinas 4 years ago.

Change History (7)

comment:1 Changed 4 years ago by buntinas

  • Description modified (diff)

This looks like it's an issue with the Linux TCP stack.

strace shows that writev and readv are passed the correct message size, but the call fails. The 40-byte header is sent correctly, but the next writev call, which should write the rest of the message, fails. (It may be significant that the first writev call, given 2 iov elements (the 40-byte header and the large data buffer), writes only the 40 bytes.)

Reducing the size to 2^31-1 bytes works, but anything larger does not.

I'm going to verify this with a sockets test program.
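
For illustration only, here is a minimal sketch of one way a sender could stay under the size that is reported to work above. It is not the attached tcp_client.c, and it assumes the limit applies to the byte count requested in a single writev call, which this ticket does not establish. The names writev_all_capped and demo_roundtrip are hypothetical. The helper takes the same kind of 2-element iov (header plus data), never requests more than `cap` bytes per call, and resumes after partial writes. It assumes a blocking socket.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumed per-call ceiling: 2^31-1 bytes, the size reported to work. */
#define WRITEV_CAP ((size_t)INT_MAX)

/* Hypothetical helper: send every byte described by iov[0..iovcnt-1],
 * requesting at most `cap` bytes per writev call. The iov array is
 * modified in place. Returns 0 on success, -1 on error (errno is set). */
int writev_all_capped(int fd, struct iovec *iov, int iovcnt, size_t cap)
{
    enum { MAX_REQ = 8 };              /* elements passed per call */
    assert(cap > 0);

    while (iovcnt > 0) {
        if (iov[0].iov_len == 0) {     /* skip empty or finished elements */
            iov++;
            iovcnt--;
            continue;
        }

        /* Copy the front of the array, trimming it to `cap` bytes total. */
        struct iovec req[MAX_REQ];
        int n = 0;
        size_t budget = cap;
        while (n < iovcnt && n < MAX_REQ && budget > 0) {
            req[n] = iov[n];
            if (req[n].iov_len > budget)
                req[n].iov_len = budget;
            budget -= req[n].iov_len;
            n++;
        }

        ssize_t w = writev(fd, req, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: retry the same request */
            return -1;
        }
        if (w == 0) {                  /* no progress: report instead of spinning */
            errno = EIO;
            return -1;
        }

        /* Consume the bytes that were written from the front of iov. */
        size_t done = (size_t)w;
        while (done > 0) {
            if (done >= iov[0].iov_len) {
                done -= iov[0].iov_len;
                iov++;
                iovcnt--;
            } else {
                iov[0].iov_base = (char *)iov[0].iov_base + done;
                iov[0].iov_len -= done;
                done = 0;
            }
        }
    }
    return 0;
}

/* Small self-check over a local socket pair, using a tiny cap (64 bytes)
 * so the chunking path runs without needing a multi-gigabyte buffer. */
int demo_roundtrip(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    char hdr[40], data[1000], got[1040];
    memset(hdr, 'H', sizeof hdr);
    memset(data, 'D', sizeof data);
    struct iovec iov[2] = { { hdr, sizeof hdr }, { data, sizeof data } };

    int rc = writev_all_capped(sv[0], iov, 2, 64);
    size_t r = 0;
    while (rc == 0 && r < sizeof got) {
        ssize_t k = read(sv[1], got + r, sizeof got - r);
        if (k <= 0) { rc = -1; break; }
        r += (size_t)k;
    }
    if (rc == 0 && (memcmp(got, hdr, sizeof hdr) != 0 ||
                    memcmp(got + sizeof hdr, data, sizeof data) != 0))
        rc = -1;

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

In a real sender the last argument would be WRITEV_CAP rather than 64. Whether capping each call actually avoids the failure seen on the bb machines is untested here.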

comment:2 Changed 4 years ago by buntinas

  • Resolution set to invalid
  • Status changed from new to closed

Confirmed with a sockets program.

comment:3 Changed 4 years ago by balaji

Can you disable this test in the test suite then? Also, attach your sockets test program to this ticket.

comment:4 Changed 4 years ago by thakur

The test works in the old nightly tests on a single machine though.

comment:5 Changed 4 years ago by buntinas

This works on a single machine, so I don't think we should disable the tests. I'm attaching the test programs.

Changed 4 years ago by buntinas

Changed 4 years ago by buntinas
