Re: BASH recursion segfault, FUNCNEST doesn't help

2022-06-02 Thread Gergely
Hi Martin,


>> There's a slim chance this might be exploitable.
> I would really be interested in an example.

I could not produce a scenario in 15 minutes that would indicate that
this corrupts other sections, as there is a considerable gap between the
stack and everything else. This is OS-dependent though and bash has no
control over what happens should this occur.

It's not inconceivable that other OS-es or even (old) Linux in certain
configurations will place the stack close to something else that is
valuable though. However, I'm not insisting that this is a
vulnerability. It technically might be, but I do understand the
hesitation to fix something that is hard and pretty much pointless
to exploit.
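
For reference, the scenario under discussion boils down to something
like this (a reconstruction, not my exact reproducer; the file name is
illustrative):

    # self-source.sh: the whole script is one line that sources itself.
    # Each `source' nests deeper inside bash's own C stack until the
    # process overruns it and is killed with SIGSEGV; no shell-level
    # error is reported and the script gets no chance to clean up.
    source "$0"

Running `bash self-source.sh' ends in "Segmentation fault" rather than
a nesting error, and FUNCNEST does not apply because no shell function
is involved.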

> There are many ways to exhaust memory (and other) recources, recursion is one 
> them. In your case a variable like SRCNEST (and all the code with its 
> performance impacts needed behind it) might help, but what exactly is the 
> advantage of a "maximum source nesting level exceeded" error over a 
> segmentation fault?
>
> Next we will need MAXARRUSAGE, MAXBRACEEXPAN, ...

Well, the issue is not that this is resource exhaustion, but
rather that it's entirely OS-dependent and the programmer has
zero control over it. What happens should the situation occur is not up
to bash or the programmer. The behaviour is not portable and not
recoverable. A programmer might expect a situation like this, but there
is no knob to turn to prevent an abrupt termination, unlike FUNCNEST.

Speaking for myself, I'd find an error a much MUCH more palatable
condition than a segfault in this case. In the case of an error I at
least have a chance to do cleanup or emit a message, as opposed to just
terminating out of the blue. I don't think most bash programs are
written with the expectation that they might cease to run at any moment
without any warning.

On top of that, this already works just fine for FUNCNEST, and even though
the default behaviour is still a segfault, a careful programmer now has a
fighting chance.
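
For example, something along these lines (a sketch; the limit is
arbitrary and the exact recovery behaviour may vary by version):

    FUNCNEST=100                     # cap shell function nesting
    recurse() { recurse; }
    recurse
    # with FUNCNEST set, bash reports "maximum function nesting level
    # exceeded", abandons the call, and continues with the next command;
    # without it, the same recursion eventually segfaults the shell
    echo "still running, cleanup is possible here" >&2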


Regarding performance:

I don't think many extremely high performance applications are written
in bash, but that might just be my ignorance. In any case I did some
rudimentary testing:

I ran the same recursive function until segfault 10x in a loop and
measured the total execution time, once without FUNCNEST set and once
with FUNCNEST set to a very large number.
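
The test was roughly along these lines (reconstructed; file name and
the FUNCNEST value are illustrative):

    # crash.sh recurses until bash overruns its C stack and segfaults
    printf 'recurse() { recurse; }\nrecurse\n' > crash.sh

    # ten crashing runs without FUNCNEST, then ten with it set very high
    time for i in {1..10}; do bash crash.sh; done
    time for i in {1..10}; do FUNCNEST=100000000 bash crash.sh; done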

Here are the results:

- with FUNCNEST: 1m2.233s

- without: 1m1.691s

The difference is less than 1% with an insanely unrealistic workload.

With this in mind I suggest that SRCNEST might not be
performance-prohibitive if the check only runs when sourcing is involved
or if it can be rolled into FUNCNEST. I am not sure how painful that
would be to implement, but I for one would happily sacrifice a bit of
speed for a better coding experience.


That said, I am in no position to judge whether what I'm suggesting
makes sense to implement, and since I am unable to do the work myself I
humbly thank you for yours and wish you a very nice day!


Gergely





Re: bash leaks the old var when using =~ in a function with local BASH_REMATCH

2022-06-02 Thread Chet Ramey

On 5/28/22 4:08 PM, Emanuele Torre wrote:


Bash Version: 5.1
Patch Level: 16
Release Status: release

Description:
If `[[ $str =~ $re ]]' is executed from a function in which
`BASH_REMATCH' is local, bash will "leak" the old *global*
`BASH_REMATCH' variable.

 This happens because in `sh_regmatch()', bash calls these two
functions:

unbind_variable_noref ("BASH_REMATCH");
rematch = make_new_array_variable ("BASH_REMATCH");

 `unbind_variable_noref()' will unbind and `free()' the first
 variable it can find named "BASH_REMATCH" (giving priority to
 local variables), while `make_new_array_variable()' will add a new
 variable named "BASH_REMATCH" to the global variables.


Thanks for the report. This is an issue new in bash-5.1; before that, the
BASH_REMATCH variable was readonly and you couldn't create a local
variable with that name in the way you do here. You could create the
variable yourself and add attributes before you performed any regexp
matches, but your settings would be discarded and the variable set to be
readonly the first time you did.

I changed it in bash-5.1 so the value could be saved and restored during
traps (specifically the DEBUG trap).


Fix:
The obvious fix is to use, instead of `unbind_variable_noref()',
a similar function that uses `global_variables' instead of
`shell_variables'.

That will remove the "variable leak", but it is still not great:
declaring a local `BASH_REMATCH' makes it impossible to access
the matches of `[[ $str =~ $re ]]' because bash will set the
global `BASH_REMATCH' instead of the local one, and
 `"${BASH_REMATCH[@]}"' will expand to local `BASH_REMATCH'.


I doubt there's very much code out there that tries to make BASH_REMATCH
local, since it throws an error before bash-5.1. So let's talk through a
change that's as backwards-compatible as possible.

Clearly there's a mismatch between allowing the local variable to be
created (and unbinding it) and setting it in the global scope. As you say,
the easiest thing to do is to act exclusively on the global scope and
warn the user -- at least in the documentation -- that local copies of the
variable aren't going to be used. That would be the closest to preserving
the pre-bash-5.1 behavior.
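
Concretely, the mismatch looks something like this (illustrative; what
is visible inside the function depends on the version and on which fix
is chosen):

    f() {
        local BASH_REMATCH           # permitted since bash-5.1
        [[ foo123 =~ [0-9]+ ]]
    }
    f
    # the match data ends up in the global BASH_REMATCH, not in the
    # local copy declared inside f()
    declare -p BASH_REMATCH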



 I think allowing `BASH_REMATCH' to be local-ised should be
 considered: it would be nice. (Also, it's a little confusing
that `MAPFILE', `REPLY', `COPROC', etc. can be localised, but
`BASH_REMATCH' cannot.)


Those are all default variables that can be overridden by names the user
chooses; that's one difference.



bash will currently (once the unbind part is fixed) try to
remove the global `BASH_REMATCH' and replace it with a brand new
array variable that contains the matches.


The idea is that there is one BASH_REMATCH instance, and the shell always
sets it to reflect the match status of the most recent regexp match. It was
easy to make that guarantee when it was readonly.



It could instead replace the local `BASH_REMATCH' variable with
a new local array variable (if a local `BASH_REMATCH' variable
of any type was present.)

 I am not sure if bash has any specific reason to use this
technique instead of just using `find_or_make_array_variable()'
like other features in bash that use arrays do.


Bash uses this technique all over the place for variables that are set as a
side effect of performing some other action. COMP_WORDS, COMP_LINE,
COMP_POINT, COMP_CWORD, POSIXLY_CORRECT, IGNOREEOF, etc. Anything that's
supposed to be immune from something weird the user does. When the variable
was readonly, this was also easy to guarantee.



I think bash could just make `[[ $str =~ $re ]]' use
 `find_or_make_array_variable()' like other bash features that
 use arrays do. If the variable that already exists has
 incompatible attributes (i.e. -A and -r), it could just print an
 error message (while still returning 0/1 depending on the result
 of the match; BASH_REMATCH not being settable should not
influence the exit status of `[[ $str =~ $re ]]'), or simply not
set BASH_REMATCH at all, silently.
 This would also allow using attributes like -l and -u with
BASH_REMATCH (`declare -l BASH_REMATCH'), which may be useful.

That is less backwards compatible than other options, but you could make an
argument that it's more consistent with the new non-readonly behavior.

One thing about BASH_REMATCH is that it's always set to whatever regexec
returns in the `matches' array, whether or not the match itself succeeds.
The return value from regexec only affects $?. It's always behaved this
way.
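
A quick way to see that on a given build (illustrative):

    [[ abc123 =~ [0-9]+ ]]
    declare -p BASH_REMATCH          # set from the successful match
    [[ abc123 =~ [[:space:]]+ ]]     # match fails, $? is 1...
    declare -p BASH_REMATCH          # ...but per the above, BASH_REMATCH
                                     # is reassigned anyway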

It's not clear that allowing other attributes to affect what gets put into
BASH_REMATCH is desirable: it's supposed to be exactly what regexec
returned.