Dear developers,

I have a problem when using MPICH2. I use MPICH2 to run the PWSCF code. In normal
cases it runs well. But when I run a large calculation (one that needs more memory),
the command fails to run and gives no error message.

But when I run this input on a single computer, it runs well, even though it is
very slow. How can I solve this problem?

The command I use:

mpiexec -machinefile /home/loc/machinefile -n 8 pw.x -npool 2 <input_hydrogen_for_C8OOH >test

Loc Duong Dinh
Ms-Ph.D Student
Sungkyunkwan Advanced Institute of Nanotechnology,
Sungkyunkwan University,
Suwon, 440-746, Korea

comment:2 Changed 5 years ago by Jayesh Krishna

  Are you using the latest stable release, 1.1, of MPICH2? Please try the
latest release (see the MPICH2 downloads page) and get back to us if you
still have problems.

  If you still have problems, please provide the following:

# The output of the configure command
# The complete error message
# Any relevant details that might help us narrow down the problem - When
did you start getting the errors (After an MPICH2 upgrade, change in
problem size etc)?


comment:3 Changed 5 years ago by gropp

Is this the "stdin file is too large" problem with mpd? Would switching to hydra be a possible solution?

Resolving until we hear further.

I am using MPICH2. I have a problem when I run programs that require a large
amount of memory and a long calculation time. The job suddenly stops partway
through the run. The error message is:

"rank 2 in job 57  master_42076   caused collective abort of all ranks
exit status of rank 2: killed by signal 9"

If I restart the job, it continues running and then suddenly stops again after
running for a certain time.

Do you have any suggestions for solving this problem?

I appreciate your help.

Sincerely,
Loc Duong Dinh
Ms-Ph.D Student
Sungkyunkwan Advanced Institute of Nanotechnology,
Sungkyunkwan University,
Suwon, 440-746, Korea

It is not clear what the problem is here, but if the application is memory intensive, my guess is that it might be running out of memory on the node.
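
One rough way to test the out-of-memory guess is to compare the physical memory on a node with the number of ranks placed there. The sketch below assumes a Linux node (it reads /proc/meminfo); the function name mem_per_rank_mb and the rank count of 4 are illustrative only and do not come from the original report.

```shell
# Sketch: estimate how much physical memory each MPI rank can use on this node.
# Assumes Linux (/proc/meminfo). The rank count passed in is a hypothetical value.
mem_per_rank_mb() {
    ranks_per_node=$1
    # MemTotal is reported in kB.
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo $(( total_kb / 1024 / ranks_per_node ))
}

# Example: 4 ranks share this node; prints the per-rank share in MB.
mem_per_rank_mb 4

# After a failed run, the kernel log may show whether the OOM killer was involved
# (reading it may require root on some systems):
#   dmesg | grep -i -E 'out of memory|killed process'
```

If the per-rank figure is smaller than what the calculation needs, running fewer ranks per node or using more nodes may help. A rank "killed by signal 9" received SIGKILL, which is the signal the Linux out-of-memory killer sends, so a matching entry in the kernel log would support this explanation.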

