Opened 8 years ago

Closed 6 years ago

#984 closed bug (wontfix)

dllchan build

Reported by: balaji
Owned by: gropp
Priority: major
Milestone: future
Component: mpich
Keywords:
Cc:

Description

The dllchan build appears to run configure for each "sub-channel" at make time. As a result, the environment propagation that configure normally provides does not apply here. Instead, the file src/mpid/ch3/channels/dllchan/Makefile.sm explicitly passes the variables it needs to the sub-channels.

Before [8f7007b8de69fdbd857dd47705f5a0c534c17790], CPPFLAGS was not among the variables passed to the sub-channels. To allow the MPL and OPA include paths to be set through CPPFLAGS, the list was changed to pass CPPFLAGS as well. However, doing this breaks the build.

On investigation, it looks like some header files, such as mpidi_ch3_impl.h, exist in every channel, including dllchan. So, when MPICH2 is configured with ch3:dllchan:sock, each file needs to pick up the correct copy. That is, the files in the sock sub-channel need to include the mpidi_ch3_impl.h in sock/include and not the one in dllchan/include.

Earlier, since CPPFLAGS was not passed to the sub-configures at all, each channel used its local header. However, when CPPFLAGS is passed, dllchan sets the flags to -I<some_path>/dllchan/include, after which each sub-channel appends its local path to the flags: -I<some_path>/dllchan/include -I<some_path>/sock/include. Because -I directories are searched in the order given, dllchan's headers are found first, when the headers from the sub-channel should be used.
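The ordering problem described above can be reproduced with a small model. The sketch below is only a simplified simulation of a first-match search over -I directories, not the actual preprocessor logic (for example, it ignores the search of the including file's own directory). The temporary directory layout and the helper names resolve_include, make_tree, and owner_of are hypothetical and exist only for illustration.

```python
import os
import tempfile

def resolve_include(header, include_dirs):
    """Return the first match in directory order, as a -I search would."""
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None

def make_tree(root):
    """Create dllchan/include and sock/include, each holding the same header name."""
    dirs = {}
    for channel in ("dllchan", "sock"):
        inc = os.path.join(root, channel, "include")
        os.makedirs(inc)
        with open(os.path.join(inc, "mpidi_ch3_impl.h"), "w") as f:
            f.write(f"/* {channel} version */\n")
        dirs[channel] = inc
    return dirs

def owner_of(path):
    """Name of the channel directory that owns a resolved header path."""
    return os.path.basename(os.path.dirname(os.path.dirname(path)))

with tempfile.TemporaryDirectory() as root:
    dirs = make_tree(root)

    # Order described in the ticket: dllchan's path first, sock's path appended.
    appended = [dirs["dllchan"], dirs["sock"]]
    picked_when_appended = owner_of(resolve_include("mpidi_ch3_impl.h", appended))

    # Earlier behaviour: CPPFLAGS not passed down, so only the local path is searched.
    sock_only = [dirs["sock"]]
    picked_when_local = owner_of(resolve_include("mpidi_ch3_impl.h", sock_only))

print(picked_when_appended, picked_when_local)
```

In this model the appended ordering resolves the header to the dllchan copy, while the local-only search resolves it to the sock copy, which matches the behaviour reported in the description.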

Attachments (1)

dllchan.patch (2.9 KB) - added by balaji 8 years ago.


Change History (11)

comment:1 Changed 8 years ago by balaji

A relatively easy fix for this might be to always add the include flags to the front of CPPFLAGS instead of appending them at the end. But this "fix" seems ugly.
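As a rough illustration of the prepend-versus-append idea, the sketch below builds the flag string both ways and lists the resulting search order. It is not taken from the attached patch; the function names and the /build prefix (standing in for <some_path>) are assumptions made for this example.

```python
import shlex

def append_include(cppflags, local_dir):
    """Current behaviour: the sub-channel's path goes after the inherited flags."""
    return f"{cppflags} -I{local_dir}".strip()

def prepend_include(cppflags, local_dir):
    """Proposed workaround: the sub-channel's path goes before the inherited flags."""
    return f"-I{local_dir} {cppflags}".strip()

def search_order(cppflags):
    """List the -I directories in the order they would be searched."""
    return [tok[2:] for tok in shlex.split(cppflags)
            if tok.startswith("-I") and len(tok) > 2]

# Hypothetical paths; "/build" stands in for <some_path> in the ticket.
inherited = "-I/build/dllchan/include"
local = "/build/sock/include"

appended_order = search_order(append_include(inherited, local))
prepended_order = search_order(prepend_include(inherited, local))

print(appended_order)
print(prepended_order)
```

With prepending, the sub-channel's own include directory comes first in the search order, so its copy of a same-named header would be found before dllchan's.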

Changed 8 years ago by balaji: attachment dllchan.patch added

comment:2 Changed 8 years ago by balaji

Attached a patch with the hack fix mentioned above.

comment:3 Changed 8 years ago by gropp

I've fixed this for the sock, ssm, and shm channels. But since the ssm and shm channels were deleted, I no longer have a good test case (I need at least two channels for a good test).

comment:4 Changed 8 years ago by thakur

sock and nemesis would be an option.

comment:5 Changed 8 years ago by thakur

On second thought, it may not be worth doing a dllchan across ch3 channels since we want users to develop new netmods under Nemesis rather than new ch3 channels. So dynamic loading of Nemesis netmods is probably more important.

comment:6 Changed 8 years ago by balaji

So far, all of the requests I have heard from users are for runtime selectable netmods, rather than dynamically loadable netmods (which achieve runtime selectability and more). Further, runtime selectable netmods also ensure correct abstraction of layers.

So, we need to be clear on which one we want -- runtime selectable or dynamically loadable netmods.
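To make the distinction concrete, here is a minimal sketch of the two approaches. It is a conceptual model in Python rather than MPICH2 code: the netmod names, the BUILTIN_NETMODS table, and both function names are hypothetical, and importlib stands in for a dlopen-style loader.

```python
import importlib

# Hypothetical stand-ins for netmods that were compiled into the library.
def tcp_init():
    return "tcp ready"

def shm_init():
    return "shm ready"

BUILTIN_NETMODS = {"tcp": tcp_init, "shm": shm_init}

def select_builtin(name):
    """Runtime selectable: choose among netmods linked in at build time."""
    try:
        return BUILTIN_NETMODS[name]
    except KeyError:
        raise ValueError(f"netmod {name!r} was not built into this library")

def load_dynamic(module_name, symbol="init"):
    """Dynamically loadable: locate and load code that was not linked in."""
    module = importlib.import_module(module_name)
    return getattr(module, symbol)

# Selection only works for names known at build time.
print(select_builtin("tcp")())

# Loading can bring in any module found at run time; "math" is just a stand-in.
print(load_dynamic("math", "sqrt")(9))
```

In this model, dynamic loading covers everything selection does, at the cost of requiring a clean boundary between the loader and the loaded code.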

comment:7 follow-up: ↓ 8 Changed 8 years ago by gropp

The primary case for dynamically loadable netmods (or channels) is for the case of an executable that must run on a foreign system and needs to load the netmod that supports the network that is present on the system. This is a case in which Intel is interested and that Open MPI supports (at least as I understand it).

The other reason for dynamically loadable netmods/channels is more indirect - to make this work, you need to have very clean boundaries between the various components (which we do not have yet). Thus, these serve as a way to test this aspect of the design and implementation.

Finally, the statement "we want users to develop new netmods under Nemesis" is, I hope, shorthand for "we want most users looking to add a new transport layer to strongly consider starting with a new netmod under Nemesis". MPICH2 must support the ADI layers that have allowed serious vendors to optimize their implementation; Nemesis, for all of its good features, is not the universal best interface. MPICH2 should, first and foremost, remain a research vehicle into MPI implementation issues that also serves as the basis for a quality MPI implementation.

comment:8 in reply to: ↑ 7 Changed 8 years ago by goodell

Replying to gropp:

> The primary case for dynamically loadable netmods (or channels) is for the case of an executable that must run on a foreign system and needs to load the netmod that supports the network that is present on the system. This is a case in which Intel is interested and that Open MPI supports (at least as I understand it).

This case is still handled by runtime-selectable netmods. The executable could just dynamically link against the local MPI library, which would have one or more selectable netmods that are appropriate for the system. There's no strict reason that the executable needs to ship with its own MPI library but with a limited set of netmods.

> Finally, the statement "we want users to develop new netmods under Nemesis" is, I hope, shorthand for "we want most users looking to add a new transport layer to strongly consider starting with a new netmod under Nemesis". MPICH2 must support the ADI layers that have allowed serious vendors to optimize their implementation; Nemesis, for all of its good features, is not the universal best interface. MPICH2 should, first and foremost, remain a research vehicle into MPI implementation issues that also serves as the basis for a quality MPI implementation.

I think Rajeev's point was in contrast to adding new CH3 channels. The channel interface has several deficiencies, many of which are corrected by the netmod interface. Also, channels cannot easily take advantage of nemesis' sophisticated shared memory support. Nobody wants to get rid of the ADI interface or discourage high-end vendors from implementing full devices when it is appropriate.

comment:9 Changed 6 years ago by balaji

There has been no discussion on this ticket for almost 2 years now. From the discussion so far, runtime selectability of netmods seems to meet all our needs. More importantly, dllchan has been broken for 2 years now.

If we do want to have dynamically loadable modules (either netmods or channels), we need to start with a better wiki design document and discuss it within the group before writing any code. The current design has several deficiencies, such as the build model reported in the original ticket text.

If there are no more comments on this ticket, I'd like to mark this as "wontfix" and delete dllchan from trunk.

comment:10 Changed 6 years ago by balaji

  • Resolution set to wontfix
  • Status changed from new to closed

dllchan has been removed in [b3ca28687bddd8083a0e0b0ff84da6ebe55545b0]. This ticket is no longer relevant.
