Opened 7 years ago

Last modified 4 years ago

#1061 new feature

Avoid allocating the VCR for MPI_COMM_WORLD (Make the VCR API independent of the nature of the VCR list)

Reported by: jratt0@…
Owned by:
Priority: minor
Milestone: future
Component: mpich
Keywords:
Cc: jratt@…, goodell@…, buntinas@…

Description (last modified by balaji)

This patch was tested experimentally as a way to enable memory optimizations. On BGP (Blue Gene/P), for MPI_COMM_WORLD, MPID_VCR_Get_lpid(comm_world->vcr[i], &lpid) always returns i in lpid. This means that MPI_COMM_WORLD doesn't actually need a VCR list, except to make the array dereference work.

The gist of the patch is to replace VC calls that take ..., vcr[i], ... with ..., vcr, i, .... That way, a device can store a special value in vcr to signal that it isn't a normal pointer. The change isn't too hard; I was able to make most of the changes with a Perl one-liner.
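To make the change in call shape concrete, here is a minimal sketch. The type example_vc and both function names are hypothetical stand-ins, not the actual MPICH declarations; only the difference in parameters is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VC; the real MPICH type is richer. */
typedef struct example_vc {
    int lpid;
} example_vc;

/* Old shape: the caller dereferences the list and passes one element,
 * so the list must be a real array. */
int example_get_lpid_old(example_vc vc, int *lpid)
{
    assert(lpid != NULL);
    *lpid = vc.lpid;
    return 0;
}

/* New shape: the caller passes the list and the index separately, so the
 * device alone decides how (or whether) to dereference the list. */
int example_get_lpid_new(const example_vc *vcr, int i, int *lpid)
{
    assert(vcr != NULL && lpid != NULL && i >= 0);
    *lpid = vcr[i].lpid;
    return 0;
}
```

With the old shape a call site would read example_get_lpid_old(vcr[i], &lpid); with the new shape it becomes example_get_lpid_new(vcr, i, &lpid), which is the kind of mechanical rewrite a one-line script can handle.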

The attached patch doesn't show it, but a follow-up patch uses a VCR of ((void*)1) for MPI_COMM_WORLD. The DCMFd implementation then checks for that value: if vcr == (void*)1, it returns i; otherwise it returns vcr[i].lpid. We expect this to save a lot of memory as process counts increase, since the MPI_COMM_WORLD VCR list otherwise needs one entry per process.
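A rough sketch of that sentinel check follows. It is not the DCMFd code: example_vc, EXAMPLE_VCR_WORLD, and example_get_lpid are hypothetical names chosen for illustration, and only the (void*)1 convention and the "return i, else vcr[i].lpid" logic come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VC; only the lpid field matters here. */
typedef struct example_vc {
    int lpid;
} example_vc;

/* Assumed sentinel: address 1 is never a valid array, so it can mark
 * "this is MPI_COMM_WORLD, no list was allocated". */
#define EXAMPLE_VCR_WORLD ((example_vc *)1)

int example_get_lpid(const example_vc *vcr, int i, int *lpid)
{
    assert(lpid != NULL && i >= 0);
    if (vcr == EXAMPLE_VCR_WORLD) {
        /* For MPI_COMM_WORLD the lpid of rank i is i itself. */
        *lpid = i;
    } else {
        /* Any other communicator still carries a real list. */
        assert(vcr != NULL);
        *lpid = vcr[i].lpid;
    }
    return 0;
}
```

The sentinel is only ever compared, never dereferenced, so no memory is touched on the MPI_COMM_WORLD path.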

Attachments (1)

0001-Make-the-VCR-API-independent-of-the-nature-of-the-VC.diff (14.4 KB) - added by jratt0@… 7 years ago.
Patch in discussion


Change History (12)

Changed 7 years ago by jratt0@…

Patch in discussion

comment:1 Changed 7 years ago by jratt0@…

The diff should apply cleanly from inside the mpich2 directory. You can check with patch -p2 --dry-run < 0001-Make-the-VCR-API-independent-of-the-nature-of-the-VC.diff (drop --dry-run to actually apply it). However, the DCMFd code might have to be removed.

comment:2 Changed 7 years ago by goodell

  • Owner set to goodell
  • Status changed from new to accepted

I'll take a look at this later in the week. The idea sounds good, but I need to read the code and make sure that there isn't a better change as well.

Thanks for sending this along.

comment:3 Changed 7 years ago by goodell

  • Milestone changed from mpich2-1.3.2 to mpich2-1.4

comment:4 Changed 7 years ago by balaji

  • Milestone changed from mpich2-1.4 to mpich2-1.5

comment:5 Changed 6 years ago by jratt0@…

Is there any update on the status of this? Anything I can do to help?

comment:6 Changed 6 years ago by goodell

Hi Joe,

The patch you provided was unsuitable for CH3 for a few reasons, the specifics of which I can't recall right now. I think it mainly had to do with a lot of bad code in CH3 that assumed it knew how the VCs/VCRs/VCRTs were implemented and couldn't easily be improved to deal with the new interface.

I recently wrote a whole bunch of code to overhaul this area and yield the same effect. As a bonus, it greatly simplifies much of the upper-level code that deals with group and VCR/VCRT construction. It's sitting on a branch ("dev/mem-efficiency") right now, waiting for someone to make a pass to fix up the few small performance issues that are present. One of Bill's students is supposed to get to it soon; otherwise I'll probably do it. It will also need a little bit of work to incorporate the changes into dcmfd's successor.

So, in terms of help, I suppose you could take a look at the net diff on that branch relative to its branch point and let me know what you think of it. It's obviously a little rough still, but it should clean up easily enough.


comment:7 Changed 5 years ago by balaji

  • Milestone changed from mpich2-1.5 to mpich2-1.5.1

comment:8 Changed 5 years ago by balaji

  • Milestone changed from mpich2-1.5.1 to mpich-3.0

Milestone mpich2-1.5.1 deleted

comment:9 Changed 5 years ago by balaji

  • Milestone changed from mpich-3.0 to mpich-3.0.1

comment:10 Changed 4 years ago by balaji

  • Cc changed (addresses elided)
  • Description modified (diff)
  • Milestone changed from mpich-3.1 to future
  • Status changed from accepted to new

comment:11 Changed 4 years ago by balaji

  • Owner goodell deleted