I wrote a long description of how .communicate can deadlock.

Then I read the doco more carefully and saw this:

 Warning: Use communicate() rather than .stdin.write, .stdout.read
 or .stderr.read to avoid deadlocks due to any of the other OS
 pipe buffers filling up and blocking the child process.

This suggests that .communicate uses Threads to send and to gather data
independently, and that therefore the deadlock situation may not arise.
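
If so, the idea would be roughly like this hand-rolled sketch (not CPython's
actual code, just the shape of it; "drain" is a made-up helper name and url1
is your variable from the code below): one thread per stream, each reading its
pipe to EOF, so the child never blocks on a full pipe buffer:

import subprocess
from threading import Thread

def drain(pipe, chunks):
    # Read this stream to EOF; running in a thread means the other
    # stream is being drained at the same time, so neither pipe fills up.
    chunks.append(pipe.read())

proc = subprocess.Popen(['curl', '-sS', '-L', url1],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out, err = [], []
t_out = Thread(target=drain, args=(proc.stdout, out))
t_err = Thread(target=drain, args=(proc.stderr, err))
t_out.start()
t_err.start()
t_out.join()
t_err.join()
proc.wait()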

See what lsof and strace tell you; all my other advice stands regardless, and
the deadlock description may or may not be relevant. It is still worth reading
and understanding when looking at this kind of problem.

Cheers,
Cameron Simpson <c...@zip.com.au>

On 31Mar2017 09:43, Cameron Simpson <c...@zip.com.au> wrote:
On 30Mar2017 13:51, bruce <badoug...@gmail.com> wrote:
Trying to understand the "correct" way to run a sys command ("curl")
and to get the potential stderr. Checking Stackoverflow (SO), implies
that I should be able to use a raw/text cmd, with "shell=true".

I strongly recommend avoiding shell=True if you can. It has many problems. All stackoverflow advice needs to be considered with caution. However, that is not the source of your deadlock.

If I leave the stderr out, and just use
   s=proc.communicate()
the test works...

Any pointers on what I might inspect to figure out why this hangs on
the proc.communicate process/line??

When it is hung, run "lsof" on the processes from another terminal, i.e. lsof the python process and also lsof the curl process. That will make clear the connections between them, particularly which file descriptors ("fd"s) are associated with what.

The run "strace" on the processes. That shoud show you what system calls are in progress in each process.

My expectation is that you will see Python reading from one file descriptor and curl writing to a different one, and neither progressing.

Personally I avoid .communicate and do more work myself, largely to know precisely what is going on with my subprocesses.

The difficulty with .communicate is that Python must read both stderr and stdout separately, but it will be doing that sequentially: read one, then read the other. That is just great if the command is "short" and writes a small enough amount of data to each. The command runs, writes, and exits. Python reads one and sees EOF after the data, because the command has exited. Then Python reads the other and collects the data and sees EOF because the command has exited.

However, if the output of the command is large on whatever stream Python reads _second_, the command will stall writing to that stream. This is because Python is not reading the data, and therefore the buffers fill (stdio in curl plus the buffer in the pipe). So the command ("curl") stalls waiting for data to be consumed from the buffers. And because it has stalled, the command does not exit, and therefore Python does not see EOF on the _first_ stream. So it sits waiting for more data, never reading from the second stream.
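
Spelled out, the sequential pattern I am describing looks like this if you
write it by hand (a sketch using your url1; do not use it for commands that
may write a lot to both streams):

import subprocess

proc = subprocess.Popen(['curl', '-sS', '-L', url1],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out = proc.stdout.read()   # blocks until curl closes stdout...
err = proc.stderr.read()   # ...but curl may be stalled writing to a full
                           # stderr pipe, so neither side makes progress
proc.wait()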

[...snip...]
cmd='[r" curl -sS '
#cmd=cmd+'-A  "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"'
cmd=cmd+"-A  '"+user_agent+"'"
##cmd=cmd+'   --cookie-jar '+cname+' --cookie '+cname+'    '
cmd=cmd+'   --cookie-jar '+ff+' --cookie '+ff+'    '
#cmd=cmd+'-e "'+referer+'"   -d "'+tt+'"  '
#cmd=cmd+'-e "'+referer+'"    '
cmd=cmd+"-L '"+url1+"'"+'"]'
#cmd=cmd+'-L "'+xx+'" '

Might I recommend something like this:

cmd_args = [ 'curl', '-sS' ]
cmd_args.extend( [ '-A', user_agent ] )
cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
cmd_args.extend( [ '-L', url1 ] )

and using shell=False. This totally avoids any need to "quote" strings in the command, because the shell is not parsing the string - you're invoking "curl" directly instead of asking the shell to read a string and invoke "curl" for you.

Constructing shell commands is tedious and fiddly; avoid it when you don't need to.
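
For example, the user agent string from your commented-out line is full of
spaces and parentheses, but as a list element it needs no quoting at all
(a sketch using your url1):

import subprocess

user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0'
cmd_args = ['curl', '-sS', '-A', user_agent, '-L', url1]
# curl receives the whole user_agent string as a single argv element; no shell
# ever parses it, so the spaces and parentheses cannot break the command.
proc = subprocess.Popen(cmd_args,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)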

try_=1

It is preferable to say:

try_ = True

while(try_):

You don't need the brackets here:

while try_:

More readable, because less punctuation.

  proc=subprocess.Popen(cmd,
shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)

proc = subprocess.Popen(cmd_args,
         stdout=subprocess.PIPE,
         stderr=subprocess.PIPE)

  s,err=proc.communicate()
  s=s.strip()
  err=err.strip()
  if(err==0):
    try_=''

It is preferable to say:

try_ = False

Also, you should be looking at proc.returncode, _not_ err. Many programs write informative messages to stderr, and a nonempty stderr does not imply failure.

Instead, programs conventionally set their exit status to 0 for success and to various nonzero values for failure. So check:

if proc.returncode == 0:
  try_ = False

Or you could bypass try_ altogether and go:

while True:
  ... subprocess ...
  if proc.returncode == 0:
    break

That may not fit your larger scheme.
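
Putting the pieces together, the loop might look something like this (just a
sketch, using the cmd_args built above; add whatever retry limit or delay
suits your larger scheme):

import subprocess

while True:
    proc = subprocess.Popen(cmd_args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    s, err = proc.communicate()
    if proc.returncode == 0:
        break
    # Failed: report the exit status and whatever curl wrote to stderr,
    # then go around again.
    print("curl exited with status", proc.returncode, err)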

Cheers,
Cameron Simpson <c...@zip.com.au>
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
