Newbie help for using multiprocessing and subprocess packages for creating child processes

2009-06-16 Thread Rob Newman

Hi All,

I am new to Python, and have a very specific task to accomplish. I  
have a command line shell script that takes two arguments:


create_graphs.sh -v --sta=STANAME

where STANAME is a string 4 characters long.

create_graphs creates a series of graphs using Matlab (among other 3rd  
party packages).


Right now I can run this happily by hand, but I have to manually  
execute the command for each STANAME. What I want is to have a Python  
script that I pass a list of STANAMEs to, and it acts like a daemon  
and spawns as many child processes as there are processors on my  
server (64), until it goes through all the STANAMES (about 200).


I posted a message on Stack Overflow (ref: http://stackoverflow.com/questions/884650/python-spawn-parallel-child-processes-on-a-multi-processor-system-use-multipro) 
 and was recommended to use the multiprocessing and subprocess  
packages. In the Stack Overflow answers, it was suggested that I use  
the process pool class in multiprocessing. However, the server I have  
to use is a Sun Sparc (T5220, Sun OS 5.10) and there is a known issue  
with sem_open() (ref: http://bugs.python.org/issue3770), so it appears  
I cannot use the process pool class.
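
With Pool off the table, one workaround is to throttle plain multiprocessing.Process objects by hand. This is not code from the thread, just a minimal sketch (in Python 3 syntax, while the script below is Python 2), with a placeholder work function standing in for the create_graphs.sh call:

```python
import multiprocessing
import time

def work(staname):
    # Placeholder: the real worker would invoke create_graphs.sh
    # for this station via subprocess.
    return 0

def run_throttled(stanames, max_procs, worker):
    """Keep at most max_procs child processes alive at once, starting
    a new one whenever a slot frees up, until every item is done."""
    pending = list(stanames)
    running = []
    finished = []
    while pending or running:
        # Reap children that have exited, freeing their slots
        still_running = []
        for p in running:
            if p.is_alive():
                still_running.append(p)
            else:
                p.join()
                finished.append(p)
        running = still_running
        # Fill the free slots with new children
        while pending and len(running) < max_procs:
            p = multiprocessing.Process(target=worker,
                                        args=(pending.pop(0),))
            p.start()
            running.append(p)
        time.sleep(0.05)  # poll instead of busy-spinning
    return [p.exitcode for p in finished]

if __name__ == '__main__':
    print(run_throttled(['B10A', 'B11A', 'BNLO'], 2, work))
```

This only avoids the pool's semaphore-backed queues; whether bare Process objects work around the sem_open() problem on that Solaris build would still need a quick test.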


So, below is my script (controller.py) that I have attempted to use as  
a test, that just calls the 'ls' command on a file I know exists  
rather than firing off my shell script (which takes ~ 10 mins to run  
per STANAME):


#!/path/to/python

import sys
import os
import json
import multiprocessing
import subprocess

def work(verbose,staname):
  print 'function:',staname
  print 'parent process:', os.getppid()
  print 'process id:', os.getpid()
  print "ls /path/to/file/"+staname+"_info.pf"
  # cmd will eventually get replaced with the shell script with the
  # verbose and staname options
  cmd = [ "ls /path/to/file/"+staname+"_info.pf" ]
  return subprocess.call(cmd, shell=False)

if __name__ == '__main__':

  report_sta_list = ['B10A','B11A','BNLO']

  # Print out the complete station list for testing
  print report_sta_list

  # Get the number of processors available
  num_processes = multiprocessing.cpu_count()

  print 'Number of processes: %s' % (num_processes)

  print 'Now trying to assign all the processors'

  threads = []

  len_stas = len(report_sta_list)

  print "+++ Number of stations to process: %s" % (len_stas)

  # run until all the threads are done, and there is no data left
  while len(threads) < len(report_sta_list):

    # if we aren't using all the processors AND there is still data
    # left to compute, then spawn another thread

    print "+++ Starting to set off all child processes"

    if( len(threads) < num_processes ):

      this_sta = report_sta_list.pop()

      print "+++ Station is %s" % (this_sta)

      p = multiprocessing.Process(target=work,args=['v',this_sta])

      p.start()

      print p, p.is_alive()

      threads.append(p)

    else:

      for thread in threads:

        if not thread.is_alive():

          threads.remove(thread)

However, I seem to be running into a whole series of errors:

myhost{rt}62% controller.py
['B10A', 'B11A', 'BNLO']
Number of processes: 64
Now trying to assign all the processors
+++ Number of stations to process: 3
+++ Starting to set off all child processes
+++ Station is BNLO
&lt;Process(Process-1, started)&gt; True
+++ Starting to set off all child processes
+++ Station is B11A
function: BNLO
parent process: 22341
process id: 22354
ls /path/to/file/BNLO_info.pf
&lt;Process(Process-2, started)&gt; True
function: B11A
parent process: 22341
process id: 22355
ls /path/to/file/B11A_info.pf
Process Process-1:
Traceback (most recent call last):
  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in _bootstrap
    self.run()
  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "controller.py", line 104, in work
    return subprocess.call(cmd, shell=False)
  File "/opt/csw/lib/python/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
    errread, errwrite)
  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Process Process-2:
Traceback (most recent call last):
  File "/opt/csw/lib/python/multiprocessing/process.py", line 231, in _bootstrap
    self.run()
  File "/opt/csw/lib/python/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "controller.py", line 104, in work
    return subprocess.call(cmd, shell=False)
  File "/opt/csw/lib/python/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/opt/csw/lib/python/subprocess.py", line 595, in __init__
    errread, errwrite)
  File "/opt/csw/lib/python/subprocess.py", line 1092, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

The files are there:

mhost{me}1

Re: Newbie help for using multiprocessing and subprocess packages for creating child processes

2009-06-16 Thread Rob Newman

Thanks Matt - that worked.

Kind regards,
- Rob

On Jun 16, 2009, at 12:47 PM, Matt wrote:


Try replacing:
   cmd = [ "ls /path/to/file/"+staname+"_info.pf" ]
with:
   cmd = [ "ls", "/path/to/file/"+staname+"_info.pf" ]

Basically, the first is the conceptual equivalent of executing the
following in BASH:
'ls /path/to/file/FOO_info.pf'
The second is this:
'ls' '/path/to/file/FOO_info.pf'

The first searches for a command in your PATH named 'ls /path...'. The
second searches for a command named 'ls' and gives it the argument
'/path...'

Also, I think this is cleaner (but it's up to personal preference):
   cmd = [ "ls", "/path/to/file/%s_info.pf" % staname]
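
The difference can be demonstrated directly. A small sketch (Python 3 syntax here, whereas the thread's code is Python 2; /tmp is just a convenient path that exists on most systems):

```python
import subprocess

# With shell=False, subprocess treats the FIRST list element as the
# program name. A single string like "ls /tmp" is looked up as an
# executable literally named "ls /tmp", which does not exist --
# hence OSError: [Errno 2] No such file or directory.
try:
    subprocess.call(["ls /tmp"], shell=False)
except OSError as exc:
    print("single-string form fails:", exc)

# Splitting the program and its argument into separate list elements
# lets the OS find "ls" on the PATH and pass "/tmp" as its argument.
rc = subprocess.call(["ls", "/tmp"], shell=False)
print("list form exit code:", rc)
```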


~Matthew Strax-Haber
Northeastern University, CCIS & CBA
Co-op, NASA Langley Research Center
Student Government Association, Special Interest Senator
Resident Student Association, SGA Rep & General Councilor
Chess Club, Treasurer
E-mail: strax-haber.m=AT=neu.edu

On Tue, Jun 16, 2009 at 3:13 PM, Rob Newman wrote:
