Opened 9 years ago

Closed 8 years ago

#634 closed bug (worksforme)

mpiexec command issue

Reported by: loc duong ding <mambom1902@…>
Owned by:
Priority: major
Milestone:
Component: mpich
Keywords:

Description (last modified by balaji)

Dear developers,

I have a problem when using MPICH2. I use MPICH2 to run the PWSCF code. In
normal cases it runs well. But when I run a large calculation (one that needs
more memory), the command fails to run and gives no error message.

When I run the same input on a single computer, it runs well, even though it
is very slow. How can I solve this problem?
The command I use:

mpiexec -machinefile /home/loc/machinefile -n 8 pw.x -npool 2 <input_hydrogen_for_C8OOH >test

Loc Duong Dinh
Ms-Ph.D Student
Sungkyunkwan Advanced Institute of Nanotechnology,
Sungkyunkwan University,
Suwon, 440-746, Korea

Attachments (1)

part0001.html (1.0 KB) - added by Jayesh Krishna 9 years ago.
Added by email2trac


Change History (7)

comment:1 Changed 9 years ago by loc duong ding

  • id set to 634


Changed 9 years ago by Jayesh Krishna

Added by email2trac

comment:2 Changed 9 years ago by Jayesh Krishna

Are you using the latest stable release, 1.1, of MPICH2? Please try the
latest release (see the MPICH2 downloads page) and get back to us if you
still have problems.

If you still have problems, please provide us with the following:

# The output of the configure command
# The complete error message
# Any relevant details that might help us narrow down the problem, e.g. when
did you start getting the errors (after an MPICH2 upgrade, a change in
problem size, etc.)?


comment:3 Changed 9 years ago by gropp

Is this the "stdin file is too large" problem with mpd? Would switching to hydra be a possible solution?
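The two workarounds hinted at in this comment could look roughly like the sketch below. It only prints candidate command lines and runs nothing. Two things are assumptions not confirmed anywhere in this ticket: that hydra is installed under the name mpiexec.hydra, and that this build of pw.x accepts an input file via -inp instead of reading stdin. The shell function names are purely illustrative.

```shell
#!/bin/sh
# Sketch: print two alternative launch lines for the job in this ticket.
# Nothing is executed; the lines are only echoed for inspection.
# Assumptions (not confirmed in the ticket): hydra is installed as
# "mpiexec.hydra", and this pw.x build accepts "-inp FILE".

MACHINEFILE=/home/loc/machinefile
INPUT=input_hydrogen_for_C8OOH
OUTPUT=test

# Variant 1: same stdin redirection, but launched through hydra instead of mpd.
hydra_cmd() {
    echo "mpiexec.hydra -f $MACHINEFILE -n 8 pw.x -npool 2 < $INPUT > $OUTPUT"
}

# Variant 2: keep the launcher, but let pw.x open the input file itself,
# so that nothing large has to be forwarded over stdin.
file_input_cmd() {
    echo "mpiexec -machinefile $MACHINEFILE -n 8 pw.x -npool 2 -inp $INPUT > $OUTPUT"
}

hydra_cmd
file_input_cmd
```

If either variant gets past the point where the original command stops, that would suggest the stdin forwarding in mpd is involved; if not, the cause is probably elsewhere.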

comment:4 Changed 9 years ago by thakur

  • Resolution set to wontfix
  • Status changed from new to closed

Resolving until we hear further.

comment:5 Changed 9 years ago by mambom1902@…

  • Resolution wontfix deleted
  • Status changed from closed to reopened

I am using MPICH2. The problem occurs when I run a program that requires a
lot of memory and a long calculation time. It suddenly stops partway through
the run. The error message is:

"rank 2 in job 57  master_42076   caused collective abort of all ranks
exit status of rank 2: killed by signal 9"

If I restart the job, it continues running and then suddenly stops again after
running for some time.

Do you have any suggestions for solving this problem?

I appreciate your help.

Sincerely,
Loc Duong Dinh
Ms-Ph.D Student
Sungkyunkwan Advanced Institute of Nanotechnology,
Sungkyunkwan University,
Suwon, 440-746, Korea

comment:6 Changed 8 years ago by balaji

  • Description modified (diff)
  • Resolution set to worksforme
  • Status changed from reopened to closed

It is not clear what the problem is here, but if the application is memory intensive, my guess is that it might be running out of memory on the node.
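One way to test this guess is to check each node for low available memory and for kernel out-of-memory messages, since "killed by signal 9" is consistent with (though not proof of) the Linux OOM killer. The sketch below assumes Linux nodes; the helper names are hypothetical, the MemAvailable field is only present on reasonably recent kernels, and the exact wording of kernel log messages varies between kernel versions.

```shell
#!/bin/sh
# Sketch: two helpers for checking whether a node ran out of memory.
# Assumes a Linux node; kernel log wording differs between kernel versions.

# Print available memory in MiB from a meminfo-style file.
# Defaults to /proc/meminfo; prints nothing if MemAvailable is missing.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Filter kernel log text on stdin for typical OOM-killer messages.
oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process'
}
```

On each host listed in the machinefile, running `mem_available_mib` while the job is active and `dmesg | oom_lines` after it dies should show whether pw.x was killed for lack of memory. If it was, using fewer processes per node or more nodes may help.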
