Subject: Integration of PVM into SGE 5.3
To: None <current-users@netbsd.org, tech-cluster@netbsd.org>
From: Co Thai Ngo <cngo@nmsu.edu>
List: current-users
Date: 06/16/2005 15:35:20
Hi,
I'm trying to integrate PVM into SGE 5.3, but for some reason PVM does not
start. Here is what I did:
- replaced SGE_ROOT/pvm with the version supplied on the SGE howto website
(http://gridengine.sunsource.net/howto/pvm-integration/pvm-integration.html)
- Since NetBSD is not supported by PVM, I added the following line (case nbsd-i386:) to
SGE_ROOT/pvm/src/aimk
********
case nbsd-i386:
case glinux:
case linux:
set CC = gcc
set CFLAGS = "-O -Wall -Werror -Wstrict-prototypes -DLINUX $DEBUG_FLAG $CFLAGS"
set LFLAGS = "$DEBUG_FLAG $LFLAGS"
...
*******
- Defined PE:
********
arbutus# qconf -mp pvm
pe_name pvm
queue_list all
slots 32
user_lists NONE
xuser_lists NONE
start_proc_args /usr/pkg/sge/pvm/startpvm.sh $pe_hostfile $host /usr/pkg/pvm3
stop_proc_args /usr/pkg/sge/pvm/stoppvm.sh $pe_hostfile $host
allocation_rule 1
control_slaves FALSE
job_is_first_task TRUE
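For reference, the $pe_hostfile passed to startpvm.sh above is the list of hosts SGE granted to the job. Below is a minimal sketch of extracting the host names from such a file. It assumes the usual "hostname slots queue processor_range" layout; the pe_hosts function and the sample contents (including the oenothera.q queue name) are hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch: list the hosts named in an SGE pe_hostfile.
# Assumption: each line is "hostname slots queue processor_range".
pe_hosts() {
    # $1 = path to the pe_hostfile; print column 1 (the host name), one per line
    awk 'NF > 0 { print $1 }' "$1"
}

# Hypothetical sample standing in for $pe_hostfile
sample=$(mktemp /tmp/pehosts.XXXXXX)
cat > "$sample" <<'EOF'
yucca.nmsu.edu 1 yucca.q UNDEFINED
oenothera.nmsu.edu 1 oenothera.q UNDEFINED
EOF

pe_hosts "$sample"    # one host name per line
rm -f "$sample"
```

Running the function against the real file under active_jobs while the job is alive would show whether both requested hosts were actually granted.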
Then I got the following errors when I ran the hello program:
*********
acacia: {36} more tester_loose.sh.pe481
[pvmd pid25581] 06/16 14:21:08 readhostfile() iflist failed
startpvm: Couldn't get all of the 2 requested hosts
rm: /tmp/481.1.yucca.q/hostfile: No such file or directory
libpvm [pid27666] /tmp/pvmd.1024: No such file or directory
libpvm [pid27666] /tmp/pvmd.1024: No such file or directory
libpvm [pid27666]: pvm_halt(): Can't contact local daemon
********
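The libpvm lines above complain about /tmp/pvmd.1024, which appears to be the per-user file in which pvmd records its address. Below is a minimal sketch of checking for that file on a host. It assumes the path is /tmp/pvmd.<uid>, as the messages suggest; the function names pvmd_file and pvmd_running are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a pvmd address file exists for a given uid.
# Assumption: pvmd writes its address file to /tmp/pvmd.<uid>.
pvmd_file() {
    # $1 = numeric uid, $2 = temp directory (defaults to /tmp)
    echo "${2:-/tmp}/pvmd.$1"
}

pvmd_running() {
    # succeeds if the address file for uid $1 exists in directory $2
    [ -e "$(pvmd_file "$1" "$2")" ]
}

uid=$(id -u)
if pvmd_running "$uid"; then
    echo "found $(pvmd_file "$uid")"
else
    echo "no $(pvmd_file "$uid"): pvmd not running for uid $uid"
fi
```

If the file is missing on the execution host while the job is active, that would be consistent with pvmd never having started there, rather than with a problem in the hello program itself.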
******
acacia: {37} more tester_loose.sh.po481
/usr/pkg/sge/default/spool/yucca/active_jobs/481.1/pe_hostfile
yucca.nmsu.edu /usr/pkg/pvm3
/var/tmp/tmp.0.00025581aa
startpvm.sh: startup failed - invoking cleanup script
/usr/pkg/sge/default/spool/yucca/active_jobs/481.1/pe_hostfile yucca.nmsu.edu
/usr/pkg/sge/default/spool/yucca/active_jobs/481.1/pe_hostfile yucca.nmsu.edu
/usr/pkg/sge/default/spool/oenothera/active_jobs/481.1/pe_hostfile
oenothera.nmsu.edu /usr/pkg/pvm3
/var/tmp/tmp.0.00026663aa
********
Does anyone know why PVM doesn't start? I've checked with the SGE people, and they
think the generated line "/var/tmp/tmp.0.00025581aa" shouldn't be there and that
maybe PVM is compiled in a special way on NetBSD, but they don't seem to know how
to fix it. I would highly appreciate it if anyone could help me fix this problem.
Thank you very much,
--
Co Thai Ngo
Dept. of Biology
New Mexico State University