I wrote a long description of how .communicate can deadlock.
Then I read the doco more carefully and saw this:
Warning: Use communicate() rather than .stdin.write, .stdout.read
or .stderr.read to avoid deadlocks due to any of the other OS
pipe buffers filling up and blocking the child process.
This suggests that .communicate uses threads to send and to gather data
independently, and that therefore the deadlock situation may not arise.
See what lsof and strace tell you; all my other advice stands regardless, and
the deadlock description may or may not be relevant. Still worth reading and
understanding it when looking at this kind of problem.
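
For completeness, the safe pattern the docs are pointing at looks roughly
like this minimal sketch (the curl options and URL here are placeholders,
not bruce's exact command):

import subprocess

# Let communicate() drain stdout and stderr for us; it reads both
# concurrently, so neither pipe buffer can fill up and stall curl.
proc = subprocess.Popen(
    ['curl', '-sS', '-L', 'https://example.com/'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
out, err = proc.communicate()
print('exit status:', proc.returncode)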
Cheers,
Cameron Simpson <c...@zip.com.au>
On 31Mar2017 09:43, Cameron Simpson <c...@zip.com.au> wrote:
On 30Mar2017 13:51, bruce <badoug...@gmail.com> wrote:
Trying to understand the "correct" way to run a sys command ("curl")
and to get the potential stderr. Checking Stackoverflow (SO), implies
that I should be able to use a raw/text cmd, with "shell=true".
I strongly recommend avoiding shell=True if you can. It has many
problems. All stackoverflow advice needs to be considered with
caution. However, that is not the source of your deadlock.
If I leave the stderr out, and just use
s=proc.communicate()
the test works...
Any pointers on what I might inspect to figure out why this hangs on
the proc.communicate process/line??
When it is hung, run "lsof" on the processes from another terminal,
i.e. lsof the python process and also lsof the curl process. That will
make clear the connections between them, particularly which file
descriptors ("fd"s) are associated with what.
Then run "strace" on the processes. That should show you what system
calls are in progress in each process.
My expectation is that you will see Python reading from one file
descriptor and curl writing to a different one, and neither
progressing.
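
If it helps to find the right pids to feed to lsof and strace, a rough
sketch like this will print them (the "sleep" command below is just a
stand-in child, not your curl command):

import os
import subprocess

proc = subprocess.Popen(
    ['sleep', '60'],             # stand-in for the real curl command
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
# From another terminal you can now run, for example:
#   lsof -p <pid>      to see which fds join the processes
#   strace -p <pid>    to see which system call each one is blocked in
print('python pid:', os.getpid())
print('child pid:', proc.pid)
proc.communicate()

Note that with shell=True the pid you get is the shell's and curl is a
child of that shell, which is one more small reason to prefer shell=False.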
Personally I avoid .communicate and do more work myself, largely to
know precisely what is going on with my subprocesses.
The difficulty with .communicate is that Python must read both stderr
and stdout separately, but it will be doing that sequentially: read
one, then read the other. That is just great if the command is "short"
and writes a small enough amount of data to each. The command runs,
writes, and exits. Python reads one and sees EOF after the data,
because the command has exited. Then Python reads the other and
collects the data and sees EOF because the command has exited.
However, if the output of the command is large on whatever stream
Python reads _second_, the command will stall writing to that stream.
This is because Python is not reading the data, and therefore the
buffers fill (stdio in curl plus the buffer in the pipe). So the
command ("curl") stalls waiting for data to be consumed from the
buffers. And because it has stalled, the command does not exit, and
therefore Python does not see EOF on the _first_ stream. So it sits
waiting for more data, never reading from the second stream.
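
To make that concrete, here is a minimal sketch (not bruce's code) of
draining each stream in its own thread, which is essentially what lets
.communicate avoid the stall described above; the URL is a placeholder:

import subprocess
import threading

def drain(pipe, chunks):
    # Read to EOF so the child never blocks writing to this pipe.
    chunks.append(pipe.read())

proc = subprocess.Popen(
    ['curl', '-sS', '-L', 'https://example.com/'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
out_chunks, err_chunks = [], []
t_out = threading.Thread(target=drain, args=(proc.stdout, out_chunks))
t_err = threading.Thread(target=drain, args=(proc.stderr, err_chunks))
t_out.start()
t_err.start()
t_out.join()
t_err.join()
proc.wait()
print('exit status:', proc.returncode)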
[...snip...]
cmd='[r" curl -sS '
#cmd=cmd+'-A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"'
cmd=cmd+"-A '"+user_agent+"'"
##cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
cmd=cmd+' --cookie-jar '+ff+' --cookie '+ff+' '
#cmd=cmd+'-e "'+referer+'" -d "'+tt+'" '
#cmd=cmd+'-e "'+referer+'" '
cmd=cmd+"-L '"+url1+"'"+'"]'
#cmd=cmd+'-L "'+xx+'" '
Might I recommend something like this:
cmd_args = [ 'curl', '-sS' ]
cmd_args.extend( [ '-A', user_agent ] )
cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
cmd_args.extend( [ '-L', url1 ] )
and using shell=False. This totally avoids any need to "quote" strings
in the command, because the shell is not parsing the string - you're
invoking "curl" directly instead of asking the shell to read a string
and invoke "curl" for you.
Constructing shell commands is tedious and fiddly; avoid it when you
don't need to.
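
(If you ever do have to hand a string to the shell, Python 3's
shlex.quote takes some of the fiddliness out of the quoting. A small
sketch with made-up values, not something the list form above needs:

import shlex

# Made-up values purely for illustration.
user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0'
url1 = 'https://example.com/'
cmd = 'curl -sS -A %s -L %s' % (shlex.quote(user_agent), shlex.quote(url1))
print(cmd)

But the argument list is still simpler.)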
try_=1
It is preferable to say:
try_ = True
while(try_):
You don't need the brackets here:
while try_:
More readable, because less punctuation.
proc=subprocess.Popen(cmd,
shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
With the cmd_args list from above and no shell, that becomes:
proc = subprocess.Popen(cmd_args,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
s,err=proc.communicate()
s=s.strip()
err=err.strip()
if(err==0):
try_=''
It is preferable to say:
try_ = False
Also, you should be looking at proc.returncode, _not_ err. Many
programs write informative messages to stderr, and a nonempty stderr
does not imply failure.
Instead, well-behaved programs set their exit status to 0 for success
and to various nonzero values for failure. So check:
if proc.returncode == 0:
try_ = False
Or you could bypass try_ altogether and go:
while True:
... subprocess ...
if proc.returncode == 0:
break
That may not fit your larger scheme.
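
Put together, the whole retry loop might look roughly like this sketch
(the user agent, cookie file and URL are placeholders; substitute your
real values):

import subprocess
import time

# Placeholder values purely for illustration.
user_agent = 'Mozilla/5.0'
ff = 'cookies.txt'
url1 = 'https://example.com/'

cmd_args = ['curl', '-sS', '-A', user_agent,
            '--cookie-jar', ff, '--cookie', ff,
            '-L', url1]
while True:
    proc = subprocess.Popen(cmd_args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    s, err = proc.communicate()
    if proc.returncode == 0:
        break
    # Nonzero exit status: curl failed; report stderr and retry.
    print('curl failed, exit status', proc.returncode)
    print(err.decode('utf-8', 'replace').strip())
    time.sleep(1)    # pause briefly before retrying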
Cheers,
Cameron Simpson <c...@zip.com.au>
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor