[Beowulf] refunding reserved amount in gold

2011-10-12 Thread akshar bhosale
Hi, we are using PBS (Torque 2.4.8) and Gold version 2.1.7.1. One of the jobs started executing and reserved the equivalent amount. The same job then left execution and went back into the queue. This happened 30 times for the same job, and each time the job reserved the amount. Now finally

[Beowulf] error in job; jobs failing

2011-04-02 Thread akshar bhosale
Hi, we are getting a DAPL 4003 event error. We have RHEL 5.2 x64 and Intel MPI Library 4.3; dapl-1.2.7-1.ofed1.3.1. What can be the reason? We have a Torque and PBS setup for job runs. ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computin

Re: [Beowulf] Fwd: maui: how to set different walltime for different users

2011-02-23 Thread akshar bhosale
Thanks. I could not find a resource quota setting in Maui. Can anybody help me out? -akshar On Thu, Feb 24, 2011 at 12:44 AM, Reuti wrote: > On 23.02.2011 at 20:07, akshar bhosale wrote: > > > -- Forwarded message -- > > From: akshar bhosale > > Date: Fri,

[Beowulf] Fwd: maui: how to set different walltime for different users

2011-02-23 Thread akshar bhosale
-- Forwarded message -- From: akshar bhosale Date: Fri, Feb 18, 2011 at 12:01 AM Subject: maui: how to set different walltime for different users To: Beowulf Mailing List Hi, we have a cluster of 16 nodes where we run Torque+Maui. We have set a max walltime of 4 days for all

[Beowulf] maui: how to set different walltime for different users

2011-02-17 Thread akshar bhosale
Hi, we have a cluster of 16 nodes where we run Torque+Maui. We have set a max walltime of 4 days for all jobs. We want to set different max walltimes for different users, e.g. user abc wants 5 days as max walltime and user xyz wants 55 days as max walltime for a single job. We don't want to create new

[Beowulf] error starting job : stray job; master mom log says : can not compose message to sister

2011-01-07 Thread akshar bhosale
Hi, we have a 100-node cluster. We have a strange problem on the cluster with Torque 2.4.8: a job submitted interactively for 256 cores gives the following error on the PBS server: PBS_Server;LOG_ERROR::sync_node_jobs, stray job 2004.nodesvr.clust1.in found on node07.clust1.in PBS_Server;LOG_ERROR::sync_node_job

[Beowulf] shutting down pbs server and maui for half an hour will affect running jobs?

2010-07-09 Thread akshar bhosale
Hi, our PBS server is going down for half an hour for maintenance. Will it affect running jobs? Where is the timeout defined? Can it be increased? Do we need to change it on the PBS MOM side or the PBS server side? Is there any other parameter we need to check? Will it hold the already running jobs for half an hou

[Beowulf] guide for pbs/torque and mpi

2010-07-01 Thread akshar bhosale
Hi, we want a good reference guide for Torque (PBS), Maui and MPI. akshar

[Beowulf] in pbs submit script, setenv command is not working

2010-04-08 Thread akshar bhosale
Hi, we have a cluster of 8 nodes running RHEL 5.2 (64-bit). We have Torque, and here is my submit script: #!/bin/csh -f #PBS -l nodes=2:ppn=2 #PBS -r n #PBS -A ourproj #PBS -V #PBS -o output_pvd3.6.txt #PBS -e error_pvd3.6.txt echo PBS JOB id is $PBS_JOBID echo PBS_NODEFILE is $PBS_NODEFILE e

[Beowulf] error while using mpirun

2010-03-12 Thread akshar bhosale
I have installed mpich 1.2 6 on my desktop (Core 2 Duo). My test file is: #include #include int main(int argc,char *argv[]) { int rank=0; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD,&rank); printf("my second program rank is %d \n",rank); MPI_Fin

Re: [Beowulf] error while make mpijava on amd_64

2010-03-05 Thread akshar bhosale
Hi Mark, many thanks to you. Regards, Rigved On Thu, Mar 4, 2010 at 1:35 AM, Mark Hahn wrote: > we r not getting latest free download version for mpijava for linux for >> > > this is version 1.2.5 circa jan 2003, right? right away this should set > off some alarms, since any maintained packag

Re: [Beowulf] using watchdog timers to reboot a hung system automagically: Good idea or bad?

2009-10-23 Thread akshar bhosale
Hi Rahul, the same thing happens on our side. The node gets rebooted due to ASR and it doesn't crash. Can you suggest any remedy? On Fri, Oct 23, 2009 at 6:26 AM, Rahul Nabar wrote: > I wanted to get some opinions about if watchdog timers are a good idea > or not. I came across watchdogs again when reading t