Re: Named fifo's causing hanging bash scripts

2015-01-16 Thread Chet Ramey
On 1/12/15 9:55 AM, wer...@linux-8jdz.site wrote:

> Bash Version: 4.3
> Patch Level: 33
> Release Status: release
> 
> Description:
> Named fifo's causing hanging bash scripts like
> 
> while IFS="|" read a b c ; do
>   [shell code]
> done < <(shell code)
> 
> can cause random hangs of the bash. An strace shows that the bash
> stays in wait4()

I can't reproduce this.  I spun up a VM running OpenSUSE 13 and ran the
attached script against a version of bash-4.3.33 that was modified to use
FIFOs instead of /dev/fd.  There were no hangs in any of about 30 runs.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: [bug-bash] Named fifo's causing hanging bash scripts

2015-01-16 Thread Chet Ramey
On 1/13/15 4:29 AM, Dr. Werner Fink wrote:

>>> Bash Version: 4.3
>>> Patch Level: 33
>>> Release Status: release
>>>
>>> Description:
>>> Named fifo's causing hanging bash scripts like
>>>
>>> while IFS="|" read a b c ; do
>>>   [shell code]
>>> done < <(shell code)
>>>
>>> can cause random hangs of the bash. An strace shows that the bash
>>> stays in wait4()
>>
>> And when you attach to one of the hanging bash processes using gdb, what
>> does the stack traceback look like?
> 
> Yes (and sorry for the wrong email address as this was done on a clean
> virtual system)
> 
> there are two hanging bash processes together with the find command:
> 
> werner   19062  0.8  0.0  11864  2868 ttyS0    S+   10:21   0:00 bash -x /tmp/brp-25-symlink
> werner   19063  0.0  0.0  11860  1920 ttyS0    S+   10:21   0:00 bash -x /tmp/brp-25-symlink
> werner   19064  0.2  0.0  16684  2516 ttyS0    S+   10:21   0:00 find . -type l -printf %p|%h|%l\n
> 
> the gdb -p 19062 and gdb -p 19063 show
> 
> (gdb) bt
> #0  0x7f530818a65c in waitpid () from /lib64/libc.so.6
> #1  0x0042b233 in waitchld (block=block@entry=1, wpid=19175) at jobs.c:3235
> #2  0x0042c6da in wait_for (pid=pid@entry=19175) at jobs.c:2496

What do ps and gdb tell you about pid 19175 (and the corresponding pid in
the call to waitchld in the other traceback)?  Running, terminated, reaped,
other?
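
On Linux, one way to answer the running/terminated/reaped question for a
given pid is to read its state from /proc.  The following is only a minimal
sketch: pid_state is a helper name made up here, and it assumes /proc is
mounted and readable.

```shell
# pid_state PID: print the kernel's view of a process (hypothetical helper).
#   "Z (zombie)"   -> terminated, but not yet reaped by its parent
#   "gone"         -> no such pid: already reaped, or it never existed
#   anything else  -> still alive (R running, S sleeping, D disk sleep, T stopped)
pid_state() {
    if [ -r "/proc/$1/status" ]; then
        # The line looks like "State:  S (sleeping)"; strip the label.
        sed -n 's/^State:[[:space:]]*//p' "/proc/$1/status"
    else
        echo "gone"
    fi
}

pid_state 19175   # the pid from the wait_for() frame above
```

A "gone" answer for a pid the shell is still waiting on would point at the
child having been reaped somewhere other than the blocked waitpid() call.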

Chet





Re: Named fifo's causing hanging bash scripts

2015-01-16 Thread Dr. Werner Fink
On Fri, Jan 16, 2015 at 09:09:25AM -0500, Chet Ramey wrote:
> On 1/12/15 9:55 AM, wer...@linux-8jdz.site wrote:
> 
> > Bash Version: 4.3
> > Patch Level: 33
> > Release Status: release
> > 
> > Description:
> > Named fifo's causing hanging bash scripts like
> > 
> > while IFS="|" read a b c ; do
> >   [shell code]
> > done < <(shell code)
> > 
> > can cause random hangs of the bash. An strace shows that the bash
> > stays in wait4()
> 
> I can't reproduce this.  I spun up a VM running OpenSUSE 13 and ran the
> attached script against a version of bash-4.3.33 that was modified to use
> FIFOs instead of /dev/fd.  There were no hangs in any of about 30 runs.

Hmmm ... what I see is

  werner   10920  0.0  0.0  11860  2876 pts/1    S+   15:59   0:00 bash /tmp/brp-25-symlink
  werner   10921  0.0  0.0  11856  1844 pts/1    S+   15:59   0:00 bash /tmp/brp-25-symlink
  werner   10922  0.0  0.0  16684  2476 pts/1    S+   15:59   0:00 find . -type l -printf %p|%h|%l\n

  d136:~ # ll /proc/10920/fd
  total 0
  lr-x------ 1 werner suse 64 Jan 16 15:59 0 -> pipe:[124428]
  lrwx------ 1 werner suse 64 Jan 16 15:59 1 -> /dev/pts/1
  lrwx------ 1 werner suse 64 Jan 16 15:59 10 -> /dev/pts/1
  lrwx------ 1 werner suse 64 Jan 16 15:59 2 -> /dev/pts/1
  lr-x------ 1 werner suse 64 Jan 16 15:59 255 -> /tmp/brp-25-symlink
  d136:~ # ll /proc/10921/fd
  total 0
  lrwx------ 1 werner suse 64 Jan 16 15:59 0 -> /dev/pts/1
  l-wx------ 1 werner suse 64 Jan 16 15:59 1 -> pipe:[124428]
  lrwx------ 1 werner suse 64 Jan 16 15:59 2 -> /dev/pts/1

... but in the build there is

  [  131s] checking for mkfifo... yes

  [  150s] execute_cmd.c: In function 'execute_command_internal':
  [  150s] execute_cmd.c:1034:12: warning: 'ofifo_list' may be used uninitialized in this function [-Wmaybe-uninitialized]
  [  150s]       free ((void *)ofifo_list);
  [  150s]            ^

and currently bash 4.3 is not usable for the OBS here. Also, my personal
chrootx script, which uses <() for fiddling with xauth, hangs until Ctrl-C.

Werner

-- 
  "Having a smoking section in a restaurant is like having
  a peeing section in a swimming pool." -- Edward Burr




Re: Named fifo's causing hanging bash scripts

2015-01-16 Thread Chet Ramey
On 1/16/15 10:25 AM, Dr. Werner Fink wrote:
> On Fri, Jan 16, 2015 at 09:09:25AM -0500, Chet Ramey wrote:
>> On 1/12/15 9:55 AM, wer...@linux-8jdz.site wrote:
>>
>>> Bash Version: 4.3
>>> Patch Level: 33
>>> Release Status: release
>>>
>>> Description:
>>> Named fifo's causing hanging bash scripts like
>>>
>>> while IFS="|" read a b c ; do
>>>   [shell code]
>>> done < <(shell code)
>>>
>>> can cause random hangs of the bash. An strace shows that the bash
>>> stays in wait4()
>>
>> I can't reproduce this.  I spun up a VM running OpenSUSE 13 and ran the
>> attached script against a version of bash-4.3.33 that was modified to use
>> FIFOs instead of /dev/fd.  There were no hangs in any of about 30 runs.
> 
> Hmmm ... what I see is

OK, but if I can't reproduce it, I can't investigate it.

> 
>   werner   10920  0.0  0.0  11860  2876 pts/1    S+   15:59   0:00 bash /tmp/brp-25-symlink
>   werner   10921  0.0  0.0  11856  1844 pts/1    S+   15:59   0:00 bash /tmp/brp-25-symlink
>   werner   10922  0.0  0.0  16684  2476 pts/1    S+   15:59   0:00 find . -type l -printf %p|%h|%l\n
> 
>   d136:~ # ll /proc/10920/fd
>   total 0
>   lr-x------ 1 werner suse 64 Jan 16 15:59 0 -> pipe:[124428]
>   lrwx------ 1 werner suse 64 Jan 16 15:59 1 -> /dev/pts/1
>   lrwx------ 1 werner suse 64 Jan 16 15:59 10 -> /dev/pts/1
>   lrwx------ 1 werner suse 64 Jan 16 15:59 2 -> /dev/pts/1
>   lr-x------ 1 werner suse 64 Jan 16 15:59 255 -> /tmp/brp-25-symlink
>   d136:~ # ll /proc/10921/fd
>   total 0
>   lrwx------ 1 werner suse 64 Jan 16 15:59 0 -> /dev/pts/1
>   l-wx------ 1 werner suse 64 Jan 16 15:59 1 -> pipe:[124428]
>   lrwx------ 1 werner suse 64 Jan 16 15:59 2 -> /dev/pts/1
> 
> ... but in the build there is
> 
>   [  131s] checking for mkfifo... yes

Sure, it's there, but if /dev/fd exists bash will prefer it.  Since the
VM I'm testing on has /dev/fd I had to manually edit config.h to disable
it.
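
A quick way to see which mechanism a given bash binary actually ends up
using is to print the path it substitutes for <(...).  This is a rough
sketch, not a definitive test: procsubst_kind is a name made up here, and
it assumes that any path outside /dev/fd (or /proc/*/fd) is a named pipe.

```shell
# procsubst_kind PATH: classify the path bash substitutes for <(...).
# Hypothetical helper, not part of bash.
procsubst_kind() {
    case $1 in
        /dev/fd/*|/proc/*/fd/*) echo "devfd" ;;    # file-descriptor based
        '')                     echo "unknown" ;;  # bash missing or no output
        *)                      echo "fifo" ;;     # assumed to be a named pipe
    esac
}

# Ask the bash under test which path it hands to the command.
path=$(bash -c 'echo <(:)' 2>/dev/null) || path=
echo "process substitution uses: ${path:-?} ($(procsubst_kind "$path"))"
```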

> 
>   [  150s] execute_cmd.c: In function 'execute_command_internal':
>   [  150s] execute_cmd.c:1034:12: warning: 'ofifo_list' may be used uninitialized in this function [-Wmaybe-uninitialized]
>   [  150s]       free ((void *)ofifo_list);

This isn't a useful warning.  The `free' is only called if the saved_fifo
flag is set, and that's only set if ofifo_list is initialized.





Re: [bug-bash] Named fifo's causing hanging bash scripts

2015-01-16 Thread Dr. Werner Fink
On Fri, Jan 16, 2015 at 09:22:36AM -0500, Chet Ramey wrote:
> On 1/13/15 4:29 AM, Dr. Werner Fink wrote:
> 
> >>> Bash Version: 4.3
> >>> Patch Level: 33
> >>> Release Status: release
> >>>
> >>> Description:
> >>> Named fifo's causing hanging bash scripts like
> >>>
> >>> while IFS="|" read a b c ; do
> >>>   [shell code]
> >>> done < <(shell code)
> >>>
> >>> can cause random hangs of the bash. An strace shows that the bash
> >>> stays in wait4()
> >>
> >> And when you attach to one of the hanging bash processes using gdb, what
> >> does the stack traceback look like?
> > 
> > Yes (and sorry for the wrong email address as this was done on a clean
> > virtual system)
> > 
> > there are two hanging bash processes together with the find command:
> > 
> > werner   19062  0.8  0.0  11864  2868 ttyS0    S+   10:21   0:00 bash -x /tmp/brp-25-symlink
> > werner   19063  0.0  0.0  11860  1920 ttyS0    S+   10:21   0:00 bash -x /tmp/brp-25-symlink
> > werner   19064  0.2  0.0  16684  2516 ttyS0    S+   10:21   0:00 find . -type l -printf %p|%h|%l\n
> > 
> > the gdb -p 19062 and gdb -p 19063 show
> > 
> > (gdb) bt
> > #0  0x7f530818a65c in waitpid () from /lib64/libc.so.6
> > #1  0x0042b233 in waitchld (block=block@entry=1, wpid=19175) at jobs.c:3235
> > #2  0x0042c6da in wait_for (pid=pid@entry=19175) at jobs.c:2496
> 
> What do ps and gdb tell you about pid 19175 (and the corresponding pid in
> the call to waitchld in the other traceback)?  Running, terminated, reaped,
> other?

  d136:~ # ps 10942
PID TTY  STAT   TIME COMMAND
  d136:~ #

... the process does not exist anymore. I guess that it could have been one of
the sed commands of the script.  The other process is showing

  d136: # ps 10922
PID TTY  STAT   TIME COMMAND
  13177 pts/1S+ 0:00 find . -type l -printf %p|%h|%l n

and the backtrace shows here

 0x7fccae8d4860 in __write_nocancel () from /lib64/libc.so.6
 #0  0x7fccae8d4860 in __write_nocancel () from /lib64/libc.so.6
 #1  0x7fccae86f6b3 in _IO_new_file_write () from /lib64/libc.so.6
 #2  0x7fccae86ed73 in new_do_write () from /lib64/libc.so.6
 #3  0x7fccae8704e5 in __GI__IO_do_write () from /lib64/libc.so.6
 #4  0x7fccae86fbe1 in __GI__IO_file_xsputn () from /lib64/libc.so.6
 #5  0x7fccae8416e0 in vfprintf () from /lib64/libc.so.6
 #6  0x7fccae8eec05 in __fprintf_chk () from /lib64/libc.so.6
 #7  0x004106d5 in ?? ()
 #8  0x0040a11b in ?? ()
 #9  0x0040afa9 in ?? ()
 #10 0x0040b0a6 in ?? ()
 #11 0x00409bfe in ?? ()
 #12 0x00409bfe in ?? ()
 #13 0x00404199 in ?? ()
 #14 0x00403911 in ?? ()
 #15 0x7fccae81cb05 in __libc_start_main () from /lib64/libc.so.6
 #16 0x004039dd in ?? ()

which IMHO could mean that the output of find is no longer being read(?)
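
One way to check that guess on Linux is to look for every process that
still holds the pipe find is writing to.  The sketch below scans /proc for
file descriptors pointing at a given pipe inode; pipe_holders is a name made
up for illustration, and it only sees processes whose fd directory is
readable by the caller.

```shell
# pipe_holders INODE: print every /proc/PID/fd/N that refers to pipe:[INODE].
# Hypothetical helper; needs a mounted /proc and readlink(1).
pipe_holders() {
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "pipe:[$1]" ]; then
            echo "$fd"
        fi
    done
}

pipe_holders 124428   # inode from the earlier /proc/10920/fd listing
```

If the reading shell's fd 0 still shows up in the list, the reader exists
but is not consuming data, which would fit a writer stuck in write().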


> 
> Chet

Werner



Re: [bug-bash] Named fifo's causing hanging bash scripts

2015-01-16 Thread Chet Ramey
On 1/16/15 10:32 AM, Dr. Werner Fink wrote:
> On Fri, Jan 16, 2015 at 09:22:36AM -0500, Chet Ramey wrote:
>> On 1/13/15 4:29 AM, Dr. Werner Fink wrote:
>>
>>>>> Bash Version: 4.3
>>>>> Patch Level: 33
>>>>> Release Status: release
>>>>>
>>>>> Description:
>>>>> Named fifo's causing hanging bash scripts like
>>>>>
>>>>> while IFS="|" read a b c ; do
>>>>>   [shell code]
>>>>> done < <(shell code)
>>>>>
>>>>> can cause random hangs of the bash. An strace shows that the bash
>>>>> stays in wait4()
>>>>
>>>> And when you attach to one of the hanging bash processes using gdb, what
>>>> does the stack traceback look like?
>>>
>>> Yes (and sorry for the wrong email address as this was done on a clean
>>> virtual system)
>>>
>>> there are two hanging bash processes together with the find command:
>>>
>>> werner   19062  0.8  0.0  11864  2868 ttyS0    S+   10:21   0:00 bash -x /tmp/brp-25-symlink
>>> werner   19063  0.0  0.0  11860  1920 ttyS0    S+   10:21   0:00 bash -x /tmp/brp-25-symlink
>>> werner   19064  0.2  0.0  16684  2516 ttyS0    S+   10:21   0:00 find . -type l -printf %p|%h|%l\n
>>>
>>> the gdb -p 19062 and gdb -p 19063 show
>>>
>>> (gdb) bt
>>> #0  0x7f530818a65c in waitpid () from /lib64/libc.so.6
>>> #1  0x0042b233 in waitchld (block=block@entry=1, wpid=19175) at jobs.c:3235
>>> #2  0x0042c6da in wait_for (pid=pid@entry=19175) at jobs.c:2496
>>
>> What do ps and gdb tell you about pid 19175 (and the corresponding pid in
>> the call to waitchld in the other traceback)?  Running, terminated, reaped,
>> other?
> 
>   d136:~ # ps 10942
> PID TTY  STAT   TIME COMMAND
>   d136:~ #
> 
> ... the process does not exist anymore. I guess that it could have been one of
> the sed commands of the script.

This is why I need to be able to reproduce it.  If the process got reaped,
when would it have happened and why would the call to wait_for() have
found a valid CHILD struct for it?  The whole loop runs with SIGCHLD
blocked, so it's not as if the signal handler could have reaped the
child out from under it.  I have questions but no way to find answers.





Re: [bug-bash] Named fifo's causing hanging bash scripts

2015-01-16 Thread Dr. Werner Fink
On Fri, Jan 16, 2015 at 10:46:02AM -0500, Chet Ramey wrote:
> >>
> >> What do ps and gdb tell you about pid 19175 (and the corresponding pid in
> >> the call to waitchld in the other traceback)?  Running, terminated, reaped,
> >> other?
> > 
> >   d136:~ # ps 10942
> > PID TTY  STAT   TIME COMMAND
> >   d136:~ #
> > 
> > ... the process does not exist anymore. I guess that it could have been one of
> > the sed commands of the script.
> 
> This is why I need to be able to reproduce it.  If the process got reaped,
> when would it have happened and why would the call to wait_for() have
> found a valid CHILD struct for it?  The whole loop runs with SIGCHLD
> blocked, so it's not as if the signal handler could have reaped the
> child out from under it.  I have questions but no way to find answers.

OK, thanks for your effort ... I've stripped the spec file down step by step and
found that the hang goes away once -DMUST_UNBLOCK_CHLD=1 is commented out
(mea culpa) ... many thanks for your help!

Werner



Re: [bug-bash] Named fifo's causing hanging bash scripts

2015-01-16 Thread Jonathan Hankins
Dr. Fink,

Have you tried getting rid of the stderr redirect on your find command to
make sure find isn't showing any errors?

If you eliminate most of the inside of your while loop, does it still
hang?  For example:

    while IFS="|" read link link_dir link_dest; do
        echo "$link,$link_dir,$link_dest"
    done < <(find . -type l -printf '%p|%h|%l\n' 2>/dev/null)

-Jonathan Hankins



-- 

Jonathan Hankins    Homewood City Schools

The simplest thought, like the concept of the number one,
has an elaborate logical underpinning. - Carl Sagan

jhank...@homewood.k12.al.us