Opened 8 years ago

Last modified 4 years ago

#940 new feature

add mpdlistjobs-like functionality to hydra

Reported by: goodell Owned by:
Priority: minor Milestone: future
Component: mpich Keywords:
Cc: ashley@…

Description (last modified by balaji)

There's a chance that hydra already implements this, but I suspect it doesn't.

PADB figures out how to attach to processes by using mpdlistjobs and parsing the output of that command. Implementing this for hydra actually falls into two cases, persistent and on-demand. Something directly analogous to mpdlistjobs makes sense for the persistent case, but another solution will probably be needed for on-demand. Ashley Pittman (CCed) suggested it could be possible to use some of the debugger hooks for this. Signals or writing out a job description at mpiexec-time might be an alternative.

Change History (18)

comment:1 Changed 8 years ago by balaji

  • Owner set to balaji
  • Status changed from new to accepted

What is the exact functionality required here? Just a way to query the PIDs of all processes running on the different nodes?

comment:2 Changed 8 years ago by balaji

Adding Ashley's reply to the comment list (please directly append to the ticket comments in the future):

==================================================================
I need to be able to find three things:

1) A list of identifiers for currently running jobs.

2) A list of hostnames for a given job.

3) A way of converting from PIDs to ranks, given a job and a hostname.

Preferably, #3 would be done on the local host and for that host only;
however, getting this info on the frontend works as well. As a fourth
item, being able to run a "shadow" job on the same list of nodes would
be nice. I can do this with mpd by writing a hosts file and calling
mpdrun, but if this isn't possible I can just use pdsh instead, at least
at small node counts.

I'm around all week if you want to go over this in person.

Ashley,
==================================================================

comment:3 Changed 8 years ago by balaji

With respect to the first point, do you need to be able to query for all jobs running on the system? Even with mpd, unless the entire system shares a single mpd ring, you can't do that. If you just want unique identifiers for each job, can the local PID of the "mpiexec" process coupled with the hostname serve as a job identifier for you (e.g., foobar:8764)?
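A "host:pid" identifier like the one proposed here is easy to produce and to split back apart. The sketch below is a hypothetical illustration; `format_job_id` and `parse_job_id` are made-up names, not part of Hydra or padb.

```python
from typing import Tuple

# Hypothetical helpers for a "host:pid" job identifier (e.g. "foobar:8764").
# Neither function exists in Hydra; this is only a sketch of the format.


def format_job_id(hostname: str, mpiexec_pid: int) -> str:
    """Combine the mpiexec host and its local PID into one identifier."""
    return f"{hostname}:{mpiexec_pid}"


def parse_job_id(job_id: str) -> Tuple[str, int]:
    """Split "host:pid" back into its parts.

    rpartition splits on the last colon, so the PID is always the final field.
    """
    host, sep, pid = job_id.rpartition(":")
    if not sep or not host or not pid.isdigit():
        raise ValueError(f"not a host:pid job identifier: {job_id!r}")
    return host, int(pid)


if __name__ == "__main__":
    jid = format_job_id("foobar", 8764)
    print(jid)                # foobar:8764
    print(parse_job_id(jid))  # ('foobar', 8764)
```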

comment:4 Changed 8 years ago by Ashley Pittman <ashley@…>

Thank you for forwarding that mail; I was just logging on here to do it myself when I saw it along with your reply.

For the list of jobs I don't have a strong preference. slurm and rms have system-wide global identifiers, and you can run padb from anywhere on the system; mpd has ring-wide global identifiers, and you can run padb from anywhere in the ring; orte is closer to what you describe, in that the identifier is unique only on the node where mpirun is executing, and padb can only be run from that node.

Using the pid of mpiexec as the identifier is not a problem as long as step 2 can translate this pid into the host list. For example, mpdlistjobs shows a per-job id but not the pid of mpiexec, so I have to use the id. I'd also need a way of distinguishing which mpiexec processes are really hydra and which are some other implementation; I can probably do this by calling basename $(readlink /proc/$pid/exe), however.
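The `basename $(readlink /proc/$pid/exe)` check can be done directly from Python on Linux. This sketch assumes the Hydra launcher binary is named `mpiexec.hydra` (MPICH typically installs `mpiexec` as a link to it); that name and the helper names are assumptions, not a documented Hydra interface.

```python
import os
from typing import Optional

# Assumption: Hydra's launcher binary is named "mpiexec.hydra". /proc/<pid>/exe
# points at the resolved binary, so a process started through an "mpiexec"
# symlink still reports the real name.
HYDRA_EXE_NAMES = {"mpiexec.hydra"}


def exe_basename(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Python equivalent of `basename $(readlink /proc/$pid/exe)` (Linux only).

    Returns None if the process does not exist or its exe link cannot be read
    (for example, a process owned by another user, or a system without /proc).
    """
    try:
        target = os.readlink(os.path.join(proc_root, str(pid), "exe"))
    except OSError:
        return None
    return os.path.basename(target)


def is_hydra_mpiexec(pid: int, proc_root: str = "/proc") -> bool:
    """Heuristic: True if the process's executable looks like Hydra's mpiexec."""
    return exe_basename(pid, proc_root) in HYDRA_EXE_NAMES
```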

In summary, I can work with anything as long as I can tie steps 1, 2 and 3 together with the same identifier; the visible difference to the end user will be where they can run padb from.
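One way to picture "tying steps 1, 2 and 3 together" is a single lookup structure keyed by the job identifier. This is a hypothetical in-memory model for illustration only; `JobInfo`, `JobRegistry` and their methods are invented names, not an existing Hydra or padb API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical data model: every lookup padb needs is keyed by the same job id.


@dataclass
class JobInfo:
    job_id: str        # e.g. "foobar:8764"
    hosts: List[str]   # step 2: hosts participating in the job
    # step 3: per-host map from local PID to MPI rank
    pid_to_rank: Dict[str, Dict[int, int]] = field(default_factory=dict)


class JobRegistry:
    def __init__(self) -> None:
        self._jobs: Dict[str, JobInfo] = {}

    def add(self, job: JobInfo) -> None:
        self._jobs[job.job_id] = job

    def list_jobs(self) -> List[str]:
        """Step 1: identifiers of the currently known jobs."""
        return sorted(self._jobs)

    def hosts(self, job_id: str) -> List[str]:
        """Step 2: hostnames for a given job."""
        return list(self._jobs[job_id].hosts)

    def rank_of(self, job_id: str, host: str, pid: int) -> int:
        """Step 3: convert a PID on a given host to its rank in the job."""
        return self._jobs[job_id].pid_to_rank[host][pid]
```

Because all three methods accept the same `job_id`, whatever identifier scheme is chosen only has to be produced once (step 1) and then passed through unchanged.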

comment:5 Changed 8 years ago by balaji

You should still be able to run padb from any node. The way I'm envisioning this to work is to have a helper executable that takes this identifier (which contains the hostname + PID of the "mpiexec" process) as input, sends a query to this mpiexec process and gets all the information it needs from it.

Btw, the "helper executable" might be another instance of mpiexec itself with a different set of parameters, since it'd be convenient for another piece of work I'm doing.

comment:6 Changed 8 years ago by Ashley Pittman <ashley@…>

I can see how that would work for specifying the job and allowing your work to connect to it. How would I do #1 in this case, that is, find the list of jobs in the first place?

comment:7 Changed 8 years ago by balaji

The intention was not to automatically provide the list of all the running jobs.

Hydra can work with different bootstrap servers/resource managers (e.g., slurm, pbs, ssh). When Hydra is using slurm or pbs, you should be able to query the resource manager to get information about different jobs, independently of Hydra. But when using ssh, there's no resource manager; the user is explicitly managing resources. So isn't it the user's job to provide information on which jobs are running on the system? I'll be happy to add a -print-jobid option to mpiexec which dumps this information to stdout to make it convenient for the user.

comment:8 Changed 8 years ago by buntinas

Or you could add a per-node job file list in /tmp or something, or just /tmp/hydra-${USER}-${PID} files. That way, if the user starts padb on the same node where she started mpiexec, padb can give a list of active jobs without the user having to type the job ID manually.
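On the mpiexec side, the per-user marker file could be as simple as an empty file created at startup and removed at exit. A minimal Python sketch, assuming the hydra-<user>-<pid> naming from the comment above; the helper names are hypothetical and nothing here is existing Hydra behaviour.

```python
import atexit
import os
import tempfile
from typing import Optional


def marker_path(directory: str, user: str, pid: int) -> str:
    """Hypothetical naming scheme: <directory>/hydra-<user>-<pid>."""
    return os.path.join(directory, f"hydra-{user}-{pid}")


def remove_quietly(path: str) -> None:
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # already gone, e.g. removed by hand


def create_marker(directory: Optional[str] = None, user: Optional[str] = None) -> str:
    """Create an empty marker file for the calling (mpiexec) process."""
    directory = directory or tempfile.gettempdir()
    user = user or os.environ.get("USER", "unknown")
    path = marker_path(directory, user, os.getpid())
    # The file stays empty; its name alone carries the user and PID. A file
    # that already has our PID can only be left over from an earlier process,
    # so truncating it is safe.
    with open(path, "w"):
        pass
    # Best-effort cleanup on normal exit. atexit handlers do not run if the
    # process crashes or is killed by a signal it does not handle (e.g.
    # SIGKILL), so readers must treat these files as hints only.
    atexit.register(remove_quietly, path)
    return path
```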

comment:9 Changed 8 years ago by balaji

Even in that case the user will still need to provide the job ID, since the same user can have multiple jobs. We can possibly optimize special cases where padb is started on the same node and the user has a single job, but I'm not sure if it's worthwhile to do that.

comment:10 Changed 8 years ago by buntinas

I was thinking that padb could give the user a list of current jobs started on that node to select from.

comment:11 Changed 8 years ago by balaji

Is that adding much more than just a grep in the "ps ax" output?

comment:12 Changed 8 years ago by buntinas

Nope, just cleaner and easier to parse.

comment:13 follow-up: Changed 8 years ago by balaji

We would also need to take care of cleaning up /tmp files left by processes that terminated abnormally, and possibly of locking to avoid race conditions (e.g., if /tmp/hydra-foo has been created but no data has been written to it yet). Is telling users to give the hostname and PID of the mpiexec process they care about really that big a deal?
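If the files did carry a job description, the "created but not yet written" race could be avoided without locking by the usual write-to-temp-then-rename pattern. This is a generic POSIX technique sketched in Python, not something Hydra is known to do; `write_atomically` is a hypothetical name.

```python
import os
import tempfile

# Write the full contents to a temporary file in the same directory, then
# rename it into place. rename() within one filesystem is atomic on POSIX,
# so a reader sees either no file (or the old one) or the complete new file,
# never a partially written one.


def write_atomically(path: str, data: str) -> None:
    directory = os.path.dirname(path) or "."
    # The ".tmp-" prefix keeps half-written files from matching "hydra-*".
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the data durable before publishing it
        os.replace(tmp_path, path)  # atomic publish, overwriting any old file
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Note that `mkstemp` creates the file with mode 0600, which is fine here because only the same user's padb needs to read it.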

comment:14 in reply to: ↑ 13 Changed 8 years ago by buntinas

Replying to balaji:

We would also need to take care of cleaning up /tmp files left by processes that terminated abnormally, and possibly of locking to avoid race conditions (e.g., if /tmp/hydra-foo has been created but no data has been written to it yet).

These files would be empty. They're created when a hydra mpiexec is started, and possibly cleaned up when it terminates.

It's possible that you'd have some stale files (mpiexec terminated but didn't delete its file), but that just means that padb would have to verify that the process exists and is an mpiexec.

I see these as hints that would give padb higher-fidelity data than what it could get from just ps|grep (you could have non-hydra mpiexecs).

Is telling users to give the hostname and PID of the mpiexec process they care about really that big a deal?
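On the padb side, treating the marker files as hints means listing them and then confirming each PID before trusting it. A minimal sketch, assuming the hypothetical hydra-<user>-<pid> naming; the liveness test uses the standard `kill(pid, 0)` idiom, and a stricter `check` (for example, one that also inspects the executable name) can be passed in.

```python
import glob
import os
import re
from typing import Callable, List

# Hypothetical marker name: hydra-<user>-<pid>. The greedy user group lets
# user names contain hyphens; the PID is always the trailing digits.
_MARKER_RE = re.compile(r"^hydra-(?P<user>.+)-(?P<pid>\d+)$")


def pid_alive(pid: int) -> bool:
    """Signal 0 performs error checking only; no signal is delivered."""
    if pid <= 0:
        return False  # kill(0, ...) would target our own process group
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # stale marker: no such process
    except PermissionError:
        return True   # process exists but belongs to someone else
    return True


def live_jobs(directory: str, user: str,
              check: Callable[[int], bool] = pid_alive) -> List[int]:
    """Return PIDs from this user's marker files whose process passes `check`."""
    pids = []
    pattern = os.path.join(directory, f"hydra-{glob.escape(user)}-*")
    for path in glob.glob(pattern):
        m = _MARKER_RE.match(os.path.basename(path))
        if m and m.group("user") == user:
            pid = int(m.group("pid"))
            if check(pid):
                pids.append(pid)
    return sorted(pids)
```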

I don't know, you'll have to ask Ashley about that. But it just seems like an easy feature to add.

comment:15 Changed 8 years ago by balaji

I can add it if it's needed, but I really want to distinguish between what can be done and what's needed.

Ashley: thoughts?

comment:16 Changed 8 years ago by Ashley Pittman <ashley@…>

I'm not sure how hydra interacts with slurm; if it does, padb can just interact with slurm and use squeue as it does now. If users are running hydra with ssh, though, I need to be able to detect this and present a list of jobs to the user.

Typically I find that padb is run with the -a flag (--all), which targets all jobs on the system for the current user (there is also --any). I could just say that this isn't supported on hydra and that users need to specify a jobid, but this is going to be a pain for users, particularly as there doesn't appear to be any other way of finding out the jobid.

Ashley,

comment:17 Changed 4 years ago by balaji

  • Description modified
  • Status changed from accepted to new

comment:18 Changed 4 years ago by balaji

  • Owner balaji deleted