I wrote a long description of how .communicate can deadlock.

Then I read the doco more carefully and saw this:

 Warning: Use communicate() rather than .stdin.write, .stdout.read
 or .stderr.read to avoid deadlocks due to any of the other OS
 pipe buffers filling up and blocking the child process.

This suggests that .communicate uses Threads to send and to gather data
independently, and that therefore the deadlock situation may not arise.
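
If so, the idea would be roughly like this hand-rolled sketch (not CPython's
actual code, just the shape of it; "drain" is a made-up helper name and url1
is your variable from the code below): one thread per stream, each reading its
pipe to EOF, so the child never blocks on a full pipe buffer:

import subprocess
from threading import Thread

def drain(pipe, chunks):
    # Read this stream to EOF; running in a thread means the other
    # stream is being drained at the same time, so neither pipe fills up.
    chunks.append(pipe.read())

proc = subprocess.Popen(['curl', '-sS', '-L', url1],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out, err = [], []
t_out = Thread(target=drain, args=(proc.stdout, out))
t_err = Thread(target=drain, args=(proc.stderr, err))
t_out.start()
t_err.start()
t_out.join()
t_err.join()
proc.wait()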

See what lsof and strace tell you; all my other advice stands regardless, and
the deadlock description may or may not be relevant. It is still worth reading
and understanding when looking at this kind of problem.

Cheers,
Cameron Simpson <c...@zip.com.au>

On 31Mar2017 09:43, Cameron Simpson <c...@zip.com.au> wrote:
On 30Mar2017 13:51, bruce <badoug...@gmail.com> wrote:
Trying to understand the "correct" way to run a sys command ("curl")
and to get the potential stderr. Checking Stackoverflow (SO), implies
that I should be able to use a raw/text cmd, with "shell=true".

I strongly recommend avoiding shell=True if you can. It has many problems. All stackoverflow advice needs to be considered with caution. However, that is not the source of your deadlock.

If I leave the stderr out, and just use
   s=proc.communicate()
the test works...

Any pointers on what I might inspect to figure out why this hangs on
the proc.communicate process/line??

When it is hung, run "lsof" on the processes from another terminal, i.e. lsof the python process and also lsof the curl process. That will make clear the connections between them, particularly which file descriptors ("fd"s) are associated with what.

The run "strace" on the processes. That shoud show you what system calls are in progress in each process.

My expectation is that you will see Python reading from one file descriptor and curl writing to a different one, and neither progressing.

Personally I avoid .communicate and do more work myself, largely to know precisely what is going on with my subprocesses.

The difficulty with .communicate is that Python must read both stderr and stdout separately, but it will be doing that sequentially: read one, then read the other. That is just great if the command is "short" and writes a small enough amount of data to each. The command runs, writes, and exits. Python reads one and sees EOF after the data, because the command has exited. Then Python reads the other and collects the data and sees EOF because the command has exited.

However, if the output of the command is large on whatever stream Python reads _second_, the command will stall writing to that stream. This is because Python is not reading the data, and therefore the buffers fill (stdio in curl plus the buffer in the pipe). So the command ("curl") stalls waiting for data to be consumed from the buffers. And because it has stalled, the command does not exit, and therefore Python does not see EOF on the _first_ stream. So it sits waiting for more data, never reading from the second stream.
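
Spelled out, the sequential pattern I am describing looks like this if you
write it by hand (a sketch using your url1; do not use it for commands that
may write a lot to both streams):

import subprocess

proc = subprocess.Popen(['curl', '-sS', '-L', url1],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out = proc.stdout.read()   # blocks until curl closes stdout...
err = proc.stderr.read()   # ...but curl may be stalled writing to a full
                           # stderr pipe, so neither side makes progress
proc.wait()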

[...snip...]
cmd='[r" curl -sS '
#cmd=cmd+'-A  "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"'
cmd=cmd+"-A  '"+user_agent+"'"
##cmd=cmd+'   --cookie-jar '+cname+' --cookie '+cname+'    '
cmd=cmd+'   --cookie-jar '+ff+' --cookie '+ff+'    '
#cmd=cmd+'-e "'+referer+'"   -d "'+tt+'"  '
#cmd=cmd+'-e "'+referer+'"    '
cmd=cmd+"-L '"+url1+"'"+'"]'
#cmd=cmd+'-L "'+xx+'" '

Might I recommend something like this:

cmd_args = [ 'curl', '-sS' ]
cmd_args.extend( [ '-A', user_agent ] )
cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
cmd_args.extend( [ '-L', url1 ] )

and using shell=False. This totally avoids any need to "quote" strings in the command, because the shell is not parsing the string - you're invoking "curl" directly instead of asking the shell to read a string and invoke "curl" for you.

Constructing shell commands is tedious and fiddly; avoid it when you don't need to.
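
For example, the user agent string from your commented-out line is full of
spaces and parentheses, but as a list element it needs no quoting at all
(a sketch using your url1):

import subprocess

user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0'
cmd_args = ['curl', '-sS', '-A', user_agent, '-L', url1]
# curl receives the whole user_agent string as a single argv element; no shell
# ever parses it, so the spaces and parentheses cannot break the command.
proc = subprocess.Popen(cmd_args,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)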

try_=1

It is preferable to say:

try_ = True

while(try_):

You don't need the brackets here:

while try_:

More readable, because less punctuation.

  proc=subprocess.Popen(cmd,
shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)

proc = subprocess.Popen(cmd_args,
         stdout=subprocess.PIPE,
         stderr=subprocess.PIPE)

  s,err=proc.communicate()
  s=s.strip()
  err=err.strip()
  if(err==0):
    try_=''

It is preferable to say:

try_ = False

Also, you should be looking at proc.returncode, _not_ err. Many programs write informative messages to stderr, and a nonempty stderr does not imply failure.

Instead, programs conventionally set their exit status to 0 for success and to various nonzero values for failure. So check:

if proc.returncode == 0:
  try_ = False

Or you could bypass try_ altogether and go:

while True:
  ... subprocess ...
  if proc.returncode == 0:
    break

That may not fit your larger scheme.
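
Putting the pieces together, the loop might look something like this (just a
sketch, using the cmd_args built above; add whatever retry limit or delay
suits your larger scheme):

import subprocess

while True:
    proc = subprocess.Popen(cmd_args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    s, err = proc.communicate()
    if proc.returncode == 0:
        break
    # Failed: report the exit status and whatever curl wrote to stderr,
    # then go around again.
    print("curl exited with status", proc.returncode, err)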

Cheers,
Cameron Simpson <c...@zip.com.au>
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
