Possible bug: Race condition when calling external commands during trap handling

2012-05-02 Thread Tillmann.Crueger
Hi,

I have a problem with a trap handler in a script that does some logging and
needs external commands (date and hostname). Every once in a while the script
fails with a syntax error. I am assuming a race condition, because the syntax
errors happen only very infrequently.

I have produced the following script as a small example:

---

#!/bin/bash

log() {
  local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"
  echo $text >> /dev/null
}

thread() {
  while true; do
    log "Thread is running"
    kill -ALRM $$
    sleep 1
  done
}

trap "log 'received ALRM'" ALRM 

thread &
trap "kill $?; exit 0" INT TERM


while true; do
  log "Main is running"
  sleep 1
done

---

Very infrequently this script will fail with a syntax error in line 5 (echo 
$text >> /dev/null). The actual error message is:

> /path/to/script.sh: command substitution: line 5: syntax error near 
> unexpected token `)' 
> /path/to/script.sh: command substitution: line 5: `hostname -s) $1' 

Since there is no "hostname -s) $1" in line 5, I am assuming there is also an
off-by-one error and line 4 is actually meant (local text="$(date +'%Y-%m-%d
%H:%M:%S') $(hostname -s) $1").

I have encountered this problem both on bash 4.2.24(1)-release 
(x86_64-pc-linux-gnu) on ubuntu 12.04 as well as on bash 4.1.2(1)-release 
(x86_64-redhat-linux-gnu) on RHEL 6.2.

There may be something wrong with the way traps are used in this case, but the
documentation is very sparse on this topic. I also opened a question on
StackOverflow.com
(http://stackoverflow.com/questions/10194837/concurrent-logging-in-bash-scripts)
but have not received any useful answers yet.

Since this is a race condition, it might take a while for the bug to hit. In 
some cases the script was running up to 30 minutes before the bug triggered.

Please let me know if you have any further questions or hints on how to resolve
this issue.

Thank you,
  Till Crueger


Possible bug: Race condition when calling external commands during trap handling

2012-05-03 Thread Tillmann.Crueger
Yes, you are correct, that line is buggy and contains a typo. I added it later 
in a hurry after I could reproduce the error, to ensure a clean shutdown of the 
script. What I meant to type was:

> trap "kill $!; exit 0" INT TERM

However, thinking about it, this also does not work as intended.

The problem exists, though, even if that line is deleted (one just has to kill 
all remaining threads manually after the crash or after ^C).

If you need, I can update the script with an INT and TERM handler that actually
kills the background thread. However, since this is not relevant to the problem
in question, I did not send a correction after I noticed the typo.
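
For reference, a minimal sketch of an INT/TERM handler that does kill the
background job. It assumes the PID is saved right after the job is started. The
names worker, worker_pid and cleanup are made up for illustration and are not
part of the script above. This only concerns the shutdown path and has not been
checked against the syntax error.

```shell
#!/bin/bash

# Stand-in for the thread() function of the example script.
worker() {
  while true; do
    sleep 1
  done
}

worker &
worker_pid=$!   # hypothetical name; $! is the PID of the most recent background job

# Kill the background job and reap it; errors are ignored in case it is already gone.
cleanup() {
  kill "$worker_pid" 2>/dev/null || true
  wait "$worker_pid" 2>/dev/null || true
}

# Single quotes: the trap string is expanded when the signal arrives,
# not when the trap is set.
trap 'cleanup; exit 0' INT TERM
```

The main loop of the script would follow the trap line unchanged.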


I am also aware of the strict restrictions on operations allowed during signal
handling in C and C++. I tried to find any documentation on operations allowed
in trap handlers for bash, but even after a prolonged search in the man and
info pages as well as online, I could not find any resources on that topic. The
low number of responses to the same question on SO also seems to show that
hardly anyone is aware of such restrictions. If such documentation exists, then
of course this is not a bug. In that case my personal suggestion would be to
mention the available documentation in the man pages. This would be especially
useful, since it is not very clear which operations need a malloc()
internally (note that in C most kinds of exec() do malloc() and therefore are
not reentrant; however, executing external commands is very common in bash, so
the restrictions cannot just be derived from the C side).

I also noted that the behaviour is different from the usual problems during
signal handling. The most likely result of a forbidden operation during signal
handling would be a deadlock (since the operation will try to lock the same
resource twice in the same thread). However, in this case the parser somehow
seems to mess up its internal state, resulting in the parser error I am seeing.

I hope this makes the problem clearer.

Thank you for your feedback,
  Till

-Original Message-
From: Bob Proulx [mailto:b...@proulx.com]
Sent: Thursday, 3 May 2012 09:08
To: Crueger, Tillmann
Cc: bug-bash@gnu.org
Subject: Re: Possible bug: Race condition when calling external commands during
trap handling

tillmann.crue...@telekom.de wrote:
> I have produced the following script as a small example:

A good small example!  I do not understand the problem but I do have a question 
about one of the lines in it and a comment about another.

> trap "kill $?; exit 0" INT TERM

What did you intend with "kill $?; exit 0"?  Did you mean "kill $$"
instead?

>   local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"

Note that GNU date can use "+%F %T" as a shortcut for "%Y-%m-%d %H:%M:%S".
It is useful to save typing.
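
A quick way to compare the two format strings, as a sketch assuming GNU date
(the -u and -d @0 options pin the call to a fixed instant in UTC so both
commands format the same timestamp; the variable names are only illustrative):

```shell
#!/bin/bash

# Format the same fixed instant (epoch 0, UTC) with both format strings.
long=$(date -u -d @0 +'%Y-%m-%d %H:%M:%S')
short=$(date -u -d @0 +'%F %T')

echo "$long"    # 1970-01-01 00:00:00
echo "$short"   # 1970-01-01 00:00:00
```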

And lastly I will comment that you are doing quite a bit inside of an interrupt
routine.  Typically in a C program it is not safe to perform any operation that
may call malloc() within an interrupt service routine since malloc isn't
reentrant.  Bash is a C program and I assume the same restriction would apply.
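
One way to do less inside the handler is the pattern commonly used in C: the
handler only sets a flag, and the real work happens later in the main loop.
Below is a minimal sketch of that idea for the example script. The names
alrm_pending and handle_pending are made up for illustration, and whether this
avoids the reported syntax error is an untested assumption.

```shell
#!/bin/bash

# Same logging function as in the example script (with $text quoted).
log() {
  local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"
  echo "$text" >> /dev/null
}

alrm_pending=0                # hypothetical flag name
trap 'alrm_pending=1' ALRM    # the handler only records that ALRM arrived

# Meant to be called from the main loop, outside the trap handler:
# does the logging if a signal was recorded, then clears the flag.
handle_pending() {
  if [ "$alrm_pending" -eq 1 ]; then
    alrm_pending=0
    log 'received ALRM'
  fi
}
```

In the example script, handle_pending would be called once per iteration of the
main loop, next to log "Main is running". Several signals arriving between two
calls are collapsed into a single log entry, which may or may not be acceptable.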

Bob