Re: bash core dumps doing glob pattern on long string

2022-10-11 Thread Martin D Kealey
Broadly I meant translating into a "preferably" Deterministic
(stackless/non-backtracking) Finite State Automaton.

However I note that it's not always possible to make a Deterministic FSA
when you have repeatable groups which themselves don't have fixed lengths
(like a+(a|abbba|aabb*aba)b); either the extglob compiler would need to
start over and compile to a Non Deterministic (stacking) FSA, or just give
up and go back to the recursive approach.

Personally I would favour the addition of «shopt -s dfa_extglob» that would
block the fall-back, causing extglobs that would need a stack to be treated
as magic match-never tokens.

I say "extglob", but this would also speed up silly ordinary globs like
[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]

-Martin

On Tue, 11 Oct 2022 at 15:57, Phi Debian  wrote:

>
>
> On Tue, Oct 11, 2022 at 7:23 AM Martin D Kealey 
> wrote:
>
>>
>> 4) compile globs into state machines, the same way that regexes get
>> compiled, so that they can be matched without needing any recursion.
>>
>> -Martin
>>
>
> Do you mean the NFA colud trade recursion for malloc()'d backtracking
> point record? this would trade stacksize vs bss size. Or may be you mean
> converting the NFA into DFA, that I could not handle myself :)
>
> That's why at first I thought limiting the input string length was may be
> an option, but Chet demonstrated it was not.
>
> Then returning an error instead of core dumps on recursion too deep on the
> NFA walk seems to be the thing to do, and then I thought a stack
> availabitly check for a given recursive level could be done, this is pretty
> easy to do and almost costless regarding perf, specially because bash (and
> other shells) have access to alloca(), well we could admit that os/arch
> sans alloca() could still core dump.
>
> A stack check with alloca() ressemble something along this lines
>
> int stack_too_small(size_t s)
> { return alloca(s)!=NULL;
> }
>
> This non inlinable function do allocate the stack provision, return true
> if not possible, false if possible and return the provision to the stack,
> the caller is then assured there is enough space in the stack for its
> recursion frame.
>
> and the shell code would look like
> ... recursive_func(...)
> {...
>   if(stack_too_small(recursion_frame_size)
>   { handle nicely; // longjmp, return
> }
>
> What Chet call the right size to define is possibly something like the
> recursion frame size, that is the stack needed by the current recursion
> level instance (its locals) + all the frame size sub function would need
> until the next recurence, i.e if the call path look like
>
> a()-->b()-->c()-->a()...
> The recursion frame size  is sum(framesize(a)+framesize(b)+framesize(c))
>
> Some empiric constant can also be used, for instance on linux a max
> stacksize of 0x80 is pretty common deciding we big constant like
> 32K could be simple and good enough.
>
>


Re: bash core dumps doing glob pattern on long string

2022-10-11 Thread Phi Debian
On Tue, Oct 11, 2022 at 9:02 AM Martin D Kealey 
wrote:

> Broadly I meant translating into a "preferably" Deterministic
> (stackless/non-backtracking) Finite State Automaton.
>
> However I note that it's not always possible to make a Deterministic FSA
> when you have repeatable groups which themselves don't have fixed lengths
> (like a+(a|abbba|aabb*aba)b); either the extglob compiler would need to
> start over and compile to a Non Deterministic (stacking) FSA, or just give
> up and go back to the recursive approach.
>
> Personally I would favour the addition of «shopt -s dfa_extglob» that
> would block the fall-back, causing extglobs that would need a stack to be
> treated as magic match-never tokens.
>
> I say "extglob", but this would also speed up silly ordinary globs like
> [a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]
>
> -Martin
>
>
Ok I see, that's true that when you see a glob pattern like +(a) one would
think an optimiser could make it backtrack less.

«shopt -s dfa_extglob» could be an option, Chet may say it is hard to
explain when to use it... but we can go this path too, that is less of a
hack regarding stack provision, but more of a hack regarding the extglob
compiler :) Any implementation will please me :)


Re: bash core dumps doing glob pattern on long string

2022-10-11 Thread Chet Ramey

On 10/11/22 1:26 AM, Phi Debian wrote:

On the contrary I see nor reference saying the 'pattern matching for 
pathname expansion can also be used by extension to any strings, but now I 
understand it is.


The conditional command is explicit about it:

"When  the  == and != operators are used, the string to the right
 of the operator is considered a pattern and matched according to
 the rules described below under Pattern Matching, as if the ext-
 glob shell option were enabled."

The `case' command; the pattern substitution, pattern removal, and
case modification word expansions; and variables like EXECIGNORE and
GLOBIGNORE also apply pattern matching to arbitrary strings.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/




Re: bash core dumps doing glob pattern on long string

2022-10-11 Thread Koichi Murase
2022年10月11日(火) 16:22 Martin D Kealey :
> However I note that it's not always possible to make a Deterministic FSA
> when you have repeatable groups which themselves don't have fixed lengths
> (like a+(a|abbba|aabb*aba)b);

This is not true. Any non-deterministic finite automaton (NFA) can be
translated into a deterministic finite automaton (DFA). The "regular
expressions" that must be processed by stacking/recursion for
backtracking are typically the ones that involve backreferences, \1,
\2, etc., to capturing groups in the same pattern. However, such
"regular expressions" no longer define regular languages and cannot be
represented by NFAs (strictly speaking, unless all the referenced
capturing groups accept finite languages) because we need infinite
states to record the captured strings. As far as I know, glob/extglob
does not have constructs that cannot be represented by formal regular
languages so should always be able to be represented by NFAs and thus DFAs.



Re: bash core dumps doing glob pattern on long string

2022-10-11 Thread Greg Wooledge
On Tue, Oct 11, 2022 at 12:38:44PM -0700, Koichi Murase wrote:
> As far as I know, glob/extglob
> does not have constructs that cannot be represented by formal regular
> languages so should always be able to be represented by NFAs and thus DFAs.

It's been too many decades since I studied this stuff in college, but
earlier while reading the thread I was pondering converting extglobs
into (ERE) regular expressions.

Extglobs add 5 features:

  ?(pattern-list)
 Matches zero or one occurrence of the given patterns
  *(pattern-list)
 Matches zero or more occurrences of the given patterns
  +(pattern-list)
 Matches one or more occurrences of the given patterns
  @(pattern-list)
 Matches one of the given patterns
  !(pattern-list)
 Matches anything except one of the given patterns

The first 4 can be converted directly into ERE without any issues at all.
The last one cannot.  At least not easily.

The part I can't remember is whether !(foo) can be expressed in a regular
language.  My gut says "no", but there might be some trick that I've
forgotten.  A regular language has concatenation (abc), union (abc|def),
and close (a*bc*) but not negation.



Re: bash core dumps doing glob pattern on long string

2022-10-11 Thread Koichi Murase
2022年10月11日(火) 13:12 Greg Wooledge :
> On Tue, Oct 11, 2022 at 12:38:44PM -0700, Koichi Murase wrote:
> > As far as I know, glob/extglob
> > does not have constructs that cannot be represented by formal regular
> > languages so should always be able to be represented by NFAs and thus DFAs.
>
> It's been too many decades since I studied this stuff in college, but
> earlier while reading the thread I was pondering converting extglobs
> into (ERE) regular expressions.
>
> Extglobs add 5 features:
>
> [...]
>
>   !(pattern-list)
>  Matches anything except one of the given patterns
>
> The first 4 can be converted directly into ERE without any issues at all.
> The last one cannot.  At least not easily.
>
> The part I can't remember is whether !(foo) can be expressed in a regular
> language.  My gut says "no", but there might be some trick that I've
> forgotten.  A regular language has concatenation (abc), union (abc|def),
> and close (a*bc*) but not negation.

I also studied them more than ten years ago, but if I remember
correctly, the negation of a regular language is also regular. First,
the two representations of regular languages,
concatenation+union+Kleene closure and DFA/NFA, are equivalent,
meaning they can be converted either way. Then, the DFA of
!() can be obtained by replacing the set of accept states in
the DFA of  with its complement set. I guess it would
generally become complicated if represented in a combination of
concat+union+closure. Other complex patterns that contain !()
as a subexpression can be constructed by just replacing "!()"
with that complicated "concat+union+closure".

--
Koichi



Re: bash core dumps doing glob pattern on long string

2022-10-11 Thread Dennis Williamson
On Tue, Oct 11, 2022, 3:12 PM Greg Wooledge  wrote:

> On Tue, Oct 11, 2022 at 12:38:44PM -0700, Koichi Murase wrote:
> > As far as I know, glob/extglob
> > does not have constructs that cannot be represented by formal regular
> > languages so should always be able to be represented by NFAs and thus
> DFAs.
>
> It's been too many decades since I studied this stuff in college, but
> earlier while reading the thread I was pondering converting extglobs
> into (ERE) regular expressions.
>
> Extglobs add 5 features:
>
>   ?(pattern-list)
>  Matches zero or one occurrence of the given patterns
>   *(pattern-list)
>  Matches zero or more occurrences of the given patterns
>   +(pattern-list)
>  Matches one or more occurrences of the given patterns
>   @(pattern-list)
>  Matches one of the given patterns
>   !(pattern-list)
>  Matches anything except one of the given patterns
>
> The first 4 can be converted directly into ERE without any issues at all.
> The last one cannot.  At least not easily.
>
> The part I can't remember is whether !(foo) can be expressed in a regular
> language.  My gut says "no", but there might be some trick that I've
> forgotten.  A regular language has concatenation (abc), union (abc|def),
> and close (a*bc*) but not negation.
>


Can't you just match and negate?

>


pop_var_context msg when eval code with errexit set

2022-10-11 Thread Xavier Delaruelle
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2
uname output: Linux mosaic 5.18.17-200.fc36.x86_64 #1 SMP
PREEMPT_DYNAMIC Thu Aug 11 14:36:06 UTC 2022 x86_64 x86_64 x86_64
GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 2
Release Status: release

Description:
Starting version 5.2, when evaluating bash code (with eval builtin
command) with 'errexit' option set, a pop_var_context message appears
if the evaluating code leads to an error:

test_script: line 5: pop_var_context: head of shell_variables not
a function context

See the reproducer test code in Repat-By section.

I work on a project that provides a bash function in user
environment (https://modules.sourceforge.net/). This function produces
bash shell code that is evaluated in current shell session to update
it. When
users run their script in 'errexit' mode, I would expect not to obtain
this pop_var_context message, like in previous bash versions.

I first detected this issue on FreeBSD 13-1, but I have also
reproduced it on a Linux environment.

Repeat-By:
Script to reproduce the issue:

```test_script
myfunc() {
   eval "test 0 = 1;"
}
cmd="myfunc"
eval "$cmd"
```

Then run this test script with 'errexit' option set:

$ ./bash -e test_script
test_script: line 5: pop_var_context: head of shell_variables not
a function context

Regards,
Xavier