Re: bash core dumps doing glob pattern on long string
Broadly I meant translating into a "preferably" Deterministic (stackless/non-backtracking) Finite State Automaton. However I note that it's not always possible to make a Deterministic FSA when you have repeatable groups which themselves don't have fixed lengths (like a+(a|abbba|aabb*aba)b); either the extglob compiler would need to start over and compile to a Non Deterministic (stacking) FSA, or just give up and go back to the recursive approach. Personally I would favour the addition of «shopt -s dfa_extglob» that would block the fall-back, causing extglobs that would need a stack to be treated as magic match-never tokens. I say "extglob", but this would also speed up silly ordinary globs like [a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z] -Martin On Tue, 11 Oct 2022 at 15:57, Phi Debian wrote: > > > On Tue, Oct 11, 2022 at 7:23 AM Martin D Kealey > wrote: > >> >> 4) compile globs into state machines, the same way that regexes get >> compiled, so that they can be matched without needing any recursion. >> >> -Martin >> > > Do you mean the NFA colud trade recursion for malloc()'d backtracking > point record? this would trade stacksize vs bss size. Or may be you mean > converting the NFA into DFA, that I could not handle myself :) > > That's why at first I thought limiting the input string length was may be > an option, but Chet demonstrated it was not. > > Then returning an error instead of core dumps on recursion too deep on the > NFA walk seems to be the thing to do, and then I thought a stack > availabitly check for a given recursive level could be done, this is pretty > easy to do and almost costless regarding perf, specially because bash (and > other shells) have access to alloca(), well we could admit that os/arch > sans alloca() could still core dump. > > A stack check with alloca() ressemble something along this lines > > int stack_too_small(size_t s) > { return alloca(s)!=NULL; > } > > This non inlinable function do allocate the stack provision, return true > if not possible, false if possible and return the provision to the stack, > the caller is then assured there is enough space in the stack for its > recursion frame. > > and the shell code would look like > ... recursive_func(...) > {... > if(stack_too_small(recursion_frame_size) > { handle nicely; // longjmp, return > } > > What Chet call the right size to define is possibly something like the > recursion frame size, that is the stack needed by the current recursion > level instance (its locals) + all the frame size sub function would need > until the next recurence, i.e if the call path look like > > a()-->b()-->c()-->a()... > The recursion frame size is sum(framesize(a)+framesize(b)+framesize(c)) > > Some empiric constant can also be used, for instance on linux a max > stacksize of 0x80 is pretty common deciding we big constant like > 32K could be simple and good enough. > >
Re: bash core dumps doing glob pattern on long string
On Tue, Oct 11, 2022 at 9:02 AM Martin D Kealey wrote: > Broadly I meant translating into a "preferably" Deterministic > (stackless/non-backtracking) Finite State Automaton. > > However I note that it's not always possible to make a Deterministic FSA > when you have repeatable groups which themselves don't have fixed lengths > (like a+(a|abbba|aabb*aba)b); either the extglob compiler would need to > start over and compile to a Non Deterministic (stacking) FSA, or just give > up and go back to the recursive approach. > > Personally I would favour the addition of «shopt -s dfa_extglob» that > would block the fall-back, causing extglobs that would need a stack to be > treated as magic match-never tokens. > > I say "extglob", but this would also speed up silly ordinary globs like > [a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z] > > -Martin > > Ok I see, that's true that when you see a glob pattern like +(a) one would think an optimiser could make it backtrack less. «shopt -s dfa_extglob» could be an option, Chet may say it is hard to explain when to use it... but we can go this path too, that is less of a hack regarding stack provision, but more of a hack regarding the extglob compiler :) Any implementation will please me :)
Re: bash core dumps doing glob pattern on long string
On 10/11/22 1:26 AM, Phi Debian wrote: On the contrary I see nor reference saying the 'pattern matching for pathname expansion can also be used by extension to any strings, but now I understand it is. The conditional command is explicit about it: "When the == and != operators are used, the string to the right of the operator is considered a pattern and matched according to the rules described below under Pattern Matching, as if the ext- glob shell option were enabled." The `case' command; the pattern substitution, pattern removal, and case modification word expansions; and variables like EXECIGNORE and GLOBIGNORE also apply pattern matching to arbitrary strings. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: bash core dumps doing glob pattern on long string
2022年10月11日(火) 16:22 Martin D Kealey : > However I note that it's not always possible to make a Deterministic FSA > when you have repeatable groups which themselves don't have fixed lengths > (like a+(a|abbba|aabb*aba)b); This is not true. Any non-deterministic finite automaton (NFA) can be translated into a deterministic finite automaton (DFA). The "regular expressions" that must be processed by stacking/recursion for backtracking are typically the ones that involve backreferences, \1, \2, etc., to capturing groups in the same pattern. However, such "regular expressions" no longer define regular languages and cannot be represented by NFAs (strictly speaking, unless all the referenced capturing groups accept finite languages) because we need infinite states to record the captured strings. As far as I know, glob/extglob does not have constructs that cannot be represented by formal regular languages so should always be able to be represented by NFAs and thus DFAs.
Re: bash core dumps doing glob pattern on long string
On Tue, Oct 11, 2022 at 12:38:44PM -0700, Koichi Murase wrote: > As far as I know, glob/extglob > does not have constructs that cannot be represented by formal regular > languages so should always be able to be represented by NFAs and thus DFAs. It's been too many decades since I studied this stuff in college, but earlier while reading the thread I was pondering converting extglobs into (ERE) regular expressions. Extglobs add 5 features: ?(pattern-list) Matches zero or one occurrence of the given patterns *(pattern-list) Matches zero or more occurrences of the given patterns +(pattern-list) Matches one or more occurrences of the given patterns @(pattern-list) Matches one of the given patterns !(pattern-list) Matches anything except one of the given patterns The first 4 can be converted directly into ERE without any issues at all. The last one cannot. At least not easily. The part I can't remember is whether !(foo) can be expressed in a regular language. My gut says "no", but there might be some trick that I've forgotten. A regular language has concatenation (abc), union (abc|def), and close (a*bc*) but not negation.
Re: bash core dumps doing glob pattern on long string
2022年10月11日(火) 13:12 Greg Wooledge : > On Tue, Oct 11, 2022 at 12:38:44PM -0700, Koichi Murase wrote: > > As far as I know, glob/extglob > > does not have constructs that cannot be represented by formal regular > > languages so should always be able to be represented by NFAs and thus DFAs. > > It's been too many decades since I studied this stuff in college, but > earlier while reading the thread I was pondering converting extglobs > into (ERE) regular expressions. > > Extglobs add 5 features: > > [...] > > !(pattern-list) > Matches anything except one of the given patterns > > The first 4 can be converted directly into ERE without any issues at all. > The last one cannot. At least not easily. > > The part I can't remember is whether !(foo) can be expressed in a regular > language. My gut says "no", but there might be some trick that I've > forgotten. A regular language has concatenation (abc), union (abc|def), > and close (a*bc*) but not negation. I also studied them more than ten years ago, but if I remember correctly, the negation of a regular language is also regular. First, the two representations of regular languages, concatenation+union+Kleene closure and DFA/NFA, are equivalent, meaning they can be converted either way. Then, the DFA of !() can be obtained by replacing the set of accept states in the DFA of with its complement set. I guess it would generally become complicated if represented in a combination of concat+union+closure. Other complex patterns that contain !() as a subexpression can be constructed by just replacing "!()" with that complicated "concat+union+closure". -- Koichi
Re: bash core dumps doing glob pattern on long string
On Tue, Oct 11, 2022, 3:12 PM Greg Wooledge wrote: > On Tue, Oct 11, 2022 at 12:38:44PM -0700, Koichi Murase wrote: > > As far as I know, glob/extglob > > does not have constructs that cannot be represented by formal regular > > languages so should always be able to be represented by NFAs and thus > DFAs. > > It's been too many decades since I studied this stuff in college, but > earlier while reading the thread I was pondering converting extglobs > into (ERE) regular expressions. > > Extglobs add 5 features: > > ?(pattern-list) > Matches zero or one occurrence of the given patterns > *(pattern-list) > Matches zero or more occurrences of the given patterns > +(pattern-list) > Matches one or more occurrences of the given patterns > @(pattern-list) > Matches one of the given patterns > !(pattern-list) > Matches anything except one of the given patterns > > The first 4 can be converted directly into ERE without any issues at all. > The last one cannot. At least not easily. > > The part I can't remember is whether !(foo) can be expressed in a regular > language. My gut says "no", but there might be some trick that I've > forgotten. A regular language has concatenation (abc), union (abc|def), > and close (a*bc*) but not negation. > Can't you just match and negate? >
pop_var_context msg when eval code with errexit set
Configuration Information [Automatically generated, do not change]: Machine: x86_64 OS: linux-gnu Compiler: gcc Compilation CFLAGS: -g -O2 uname output: Linux mosaic 5.18.17-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Aug 11 14:36:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux Machine Type: x86_64-pc-linux-gnu Bash Version: 5.2 Patch Level: 2 Release Status: release Description: Starting version 5.2, when evaluating bash code (with eval builtin command) with 'errexit' option set, a pop_var_context message appears if the evaluating code leads to an error: test_script: line 5: pop_var_context: head of shell_variables not a function context See the reproducer test code in Repat-By section. I work on a project that provides a bash function in user environment (https://modules.sourceforge.net/). This function produces bash shell code that is evaluated in current shell session to update it. When users run their script in 'errexit' mode, I would expect not to obtain this pop_var_context message, like in previous bash versions. I first detected this issue on FreeBSD 13-1, but I have also reproduced it on a Linux environment. Repeat-By: Script to reproduce the issue: ```test_script myfunc() { eval "test 0 = 1;" } cmd="myfunc" eval "$cmd" ``` Then run this test script with 'errexit' option set: $ ./bash -e test_script test_script: line 5: pop_var_context: head of shell_variables not a function context Regards, Xavier