Opened 5 years ago

#1637 new bug

`>` should be `>=` in r7359

Reported by: goodell Owned by:
Priority: minor Milestone: future
Component: mpich Keywords:
Cc:

Description

In [2e57c15465a9722a9e03ec18d0bfcf8ac0e758f0] (r7359) there is some code that looks buggy:

            if (lpid > comm_world_size ||

I think this comparison hits an edge case with dynamic processes, and that it should be `>=` instead. I spotted the bug a while back and intended to fix it, but I don't want to fix it without a test case, which I haven't had the time to write. This ticket is a reminder to both fix the bug and provide a regression test for it.
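To make the suspected off-by-one concrete, here is a minimal sketch of the two comparisons side by side. It assumes that the processes of MPI_COMM_WORLD hold lpids 0 through comm_world_size-1, so an lpid equal to comm_world_size would be the first one belonging to a dynamically added process. The function names are hypothetical and are not taken from the MPICH source.

```c
#include <stdbool.h>

/* Assumption: processes in MPI_COMM_WORLD have lpids 0 .. comm_world_size-1,
 * so lpid == comm_world_size already lies outside the original world. */

/* Comparison as quoted in this ticket: lpid == comm_world_size is
 * still treated as if it were inside the original world. */
bool outside_world_gt(int lpid, int comm_world_size)
{
    return lpid > comm_world_size;
}

/* Proposed comparison: lpid == comm_world_size is treated as outside. */
bool outside_world_ge(int lpid, int comm_world_size)
{
    return lpid >= comm_world_size;
}
```

Under that assumption the two functions disagree only when lpid == comm_world_size, which is exactly the value the first dynamically added process would receive.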

Off the top of my head, I think that the test would look something like:

  • launch the original MPI_COMM_WORLD job
  • split that world into two groups (A and B)
  • each of those groups spawns a child group (C and D respectively)
  • B now does a connect/accept to C and A does a connect/accept to D. This will result in an lpid ordering view for processes in group A of [(A&B),C,D], while the view for processes in group B will be [(A&B),D,C].
  • Now do an MPI_Group_translate_ranks operation at one process in group A and one process in group B, involving the last rank in (A&B) and the first process in each of C and D. Both operations should return consistent results.
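The two lpid views described above can be modeled without MPI to see which processes land on the boundary value. This is only a sketch: it assumes lpids are handed out contiguously in the order a process learns about each group, and the sizes, enum, and function name are hypothetical.

```c
/* Hypothetical sizes: 4 processes in the original world (A&B together),
 * 2 processes in each spawned child group. */
enum { WORLD_SIZE = 4, CHILD_SIZE = 2 };

typedef enum { GROUP_WORLD, GROUP_C, GROUP_D } group_id;

/* Returns the lpid of process `rank` of `group`, as seen by a process whose
 * view lists `first_child` directly after the original world.
 * Assumption: lpids are assigned contiguously in that order.
 *   view of group A: first_child == GROUP_C, i.e. [(A&B), C, D]
 *   view of group B: first_child == GROUP_D, i.e. [(A&B), D, C] */
int lpid_in_view(group_id first_child, group_id group, int rank)
{
    if (group == GROUP_WORLD)
        return rank;                         /* 0 .. WORLD_SIZE-1 */
    if (group == first_child)
        return WORLD_SIZE + rank;            /* starts at WORLD_SIZE */
    return WORLD_SIZE + CHILD_SIZE + rank;   /* the other child group */
}
```

In this model the first process of C gets lpid WORLD_SIZE in A's view, and the first process of D gets lpid WORLD_SIZE in B's view. Those are the ranks a regression test would need to translate, alongside the last rank of the original world.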
