Re: bind -X shows inactive bindings (bindings removed using bind -r)
> I think you misunderstood my response. Wait for the next push to the
> devel git branch.
>
> Chet

I checked the latest push to the devel branch. Thank you very much for
fixing the behavior!

Best regards,
Koichi
[PATCH] Fix a problem that shadow `bind -x' does not work
Hi,

I still have several patches related to `bind'. My previous patches have
now been processed, so let me post them.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hp2019 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6 14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: maint

Description:

  When the key sequence of a binding is a prefix of other bindings (let
  me call it a shadow binding in this report), the shadow binding is
  triggered when the user input does not match any of the other
  bindings, or when there is no input within the timeout specified by
  the readline variable `keyseq-timeout'. When such a shadow binding
  was created by `bind -x', Bash fails to find the appropriate unix
  command and either produces an error message or triggers the wrong
  command.

  This is reproduced in Bash 4.4, 5.0 and the current devel branch.
  Bash 4.3 works properly in simple cases, but it still fails in
  complex test cases. In Bash 3.0 through 4.2, it causes a segmentation
  fault.

  Internally, the problem is that the content of the array
  `rl_executing_keyseq' is not updated on unmatched user input or on
  timeout. This array is used as the key sequence to look up the unix
  command in `cmd_xmap'.

Repeat-By:

  Test case 1: With the following settings, after typing `C-t' the
  command `echo world' is expected to be executed after the 500ms delay
  specified by the default value of the readline variable
  `keyseq-timeout'. However, we get an error message
  "bash_execute_unix_command: ..." instead:

    $ LANG=C bash --norc
    $ bind -v | grep keyseq-timeout
    set keyseq-timeout 500
    $ bind '"\C-t\C-t":"hello"'
    $ bind -x '"\C-t":echo world'
    $ <--
    bash: bash_execute_unix_command: cannot find keymap for command

  Test case 2: When the keyseq `\C-t\C-t' is also bound with `bind -x',
  `\C-t' + delay invokes the command for `\C-t\C-t' instead of the one
  for `\C-t'.

    $ LANG=C bash --norc
    $ bind -x '"\C-t\C-t":echo hello'
    $ bind -x '"\C-t":echo world'
    $ <--
    hello    #<-- expected result is "world"

  Test case 3: Similar results can also be obtained by typing some
  unexpected key before the timeout expires.

    $ LANG=C bash --norc
    $ bind '"\C-t\C-t":"hello"'
    $ bind -x '"\C-t":echo world'
    $ <-- a
    bash: bash_execute_unix_command: cannot find keymap for command
    $ a

  Test case 4: This is just a test case for code coverage. Internally,
  the following three errors are produced by different control paths,
  so we need to fix all of these control paths.

    $ LANG=C bash --norc
    $ bind '"\C-t\C-t\C-t\C-t":"hello"'
    $ bind -x '"\C-t":echo world'
    $ <-- a
    bash: bash_execute_unix_command: cannot find keymap for command
    $ bash: bash_execute_unix_command: cannot find keymap for command
    $ bash: bash_execute_unix_command: cannot find keymap for command
    $ a

Fix:

  I attach a patch. The patch inserts the following lines where they
  are needed. I think in principle just `rl_key_sequence_length--'
  should work, but I have written it as in the patch for safety.

    if (rl_key_sequence_length > 0)
      rl_executing_keyseq[--rl_key_sequence_length] = '\0';

Best regards,
Koichi

0001-_rl_dispatch_subseq-update-rl_executing_keyseq-on-un.patch
Description: Binary data
Re: Cygwin bash build- command substitution fails
On 12/17/19 4:00 PM, Dave Taylor wrote:

> Bash Version: 4.4
> Patch Level: 12
> Release Status: release
>
> Description:
>   I'm trying to do a Cygwin build of the bash git repo at bminor/bash
>   on github. The configure (no options) and make (no options) finish
>   successfully, but the build fails when doing $()-style command
>   substitutions, claiming that the trailing paren is unexpected:
>
>   % echo $(echo hiya)
>   bash: command substitution: line 9: syntax error near unexpected token `)'
>   bash: command substitution: line 9: `echo hiya)'

This is generally the result of using something other than bison to
generate the parser.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
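One quick way to act on this hint is to check which parser generator is
on PATH before running configure. The following is only a sketch: the
helper name `first_available' and the candidate list (bison, byacc,
yacc) are illustrative assumptions, and it does not inspect what
configure actually selected.

```shell
#!/usr/bin/env bash
# first_available TOOL...
# Print the first TOOL that resolves as a command and return 0;
# return 1 if none of them can be found.
# (Hypothetical helper, not part of bash's build system.)
first_available() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# The candidate list below is an assumption made for illustration.
if gen=$(first_available bison byacc yacc); then
  echo "parser generator found: $gen"
  [ "$gen" = bison ] || echo "warning: this is not bison"
else
  echo "no parser generator found on PATH"
fi
```

If the reported tool is not bison, installing bison and rebuilding from
a clean tree is the obvious next thing to try.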
Re: [PATCH] Fix a problem that shadow `bind -x' does not work
> Test case 2:
>
>   $ LANG=C bash --norc
>   $ bind -x '"\C-t\C-t":echo hello'
>   $ bind -x '"\C-t":echo world'
>   $ <--
>   hello    #<-- expected result is "world"

I'm sorry. I found that "Test case 2" was not yet fixed by the previous
patch. This is the patch with the additional fix for "Test case 2".

Best regards,
Koichi

0002-bash_execute_unix_command-check-shadow-binding-in-cm.patch
Description: Binary data
[PATCH] Fix a problem that shadow `bind -x' is not removed from `bind -X'
I found a case in which some removed bindings still remain in the
output of `bind -X' after the fix. Here is the report.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hp2019 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6 14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: maint

Description:

  The command string for a shadow `bind -x' key binding is not removed
  from the corresponding cmd_xmap and therefore remains in the list
  printed by `bind -X'.

Repeat-By:

  With the following commands, one can create a shadow binding for
  `\C-t' and then remove it. The binding is in fact removed and
  inactive after the unbind, but it remains in the output of `bind -X'.

    $ bind '"\C-t\C-t\C-t\C-t":"hello"'
    $ bind -x '"\C-t":echo world'
    $ bind -r '\C-t'
    $ bind -X
    "\C-t": "echo world"

Fix:

  I attach a patch. In the patch, if the original binding corresponding
  to the removed keyseq is `ISKMAP', its shadow entry
  `map[ANYOTHERKEY].function' is also checked to see whether it is
  `bash_execute_unix_command'.

Thank you,
Koichi

0001-unbind_keyseq-check-shadow-bindings-for-bash_execute.patch
Description: Binary data
[PATCH] Fix a problem `rl_bind_key' cannot create shadow binding for `C-@'
This is another report.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hp2019 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6 14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: maint

Description:

  One of readline's public interfaces, the function
  `rl_bind_key (key, function)', does not work with key = 0 (C-@) when
  there are already bindings of keyseqs starting with "\C-@". This is
  because, when `rl_bind_key' calls `rl_generic_bind', it fails to
  construct an appropriate untranslated keyseq for "\C-@".

Repeat-By:

  The function `rl_bind_key' is not widely used by the current Bash
  code, but to see the problem caused by this bug, one can use the
  older form bind 'C-SPC:...' to register a shadow binding.

    $ LANG=C ./bash-3a7c642e --norc
    $ bind '"\C-@\C-@":"hello"'
    $ bind 'C-SPC:backward-char'
    $ echo*    #<-- (* is the cursor position)

  In the above example, the expected result after the timeout is
  `ech*o', with `*' being the cursor position, but the cursor does not
  move. With the following newer form, however, we get the expected
  result:

    $ LANG=C ./bash-3a7c642e --norc
    $ bind '"\C-@\C-@":"hello"'
    $ bind '"\C-@":backward-char'

Fix:

  I attach a patch `0001-patch'. In the patch, the key '\0' is treated
  specially, in the same way as the key '\\'.

  By the way, I think there is a memory leak in the same function.
  Could you check the second attached patch `0002-patch'? I think that
  if the original binding is a macro, the memory block should be
  released before the pointer is overwritten. Actually I'm not quite
  sure, but at least in a similar function, `rl_generic_bind', the
  macro string is released.

  I think there is also a memory leak in `rl_generic_bind'. The shadow
  macro stored in `map[ANYOTHERKEY].function' is not released before
  it is overwritten. See the third patch `0003-patch'.

Thank you,
Koichi

0001-rl_bind_key-support-C.patch
Description: Binary data

0002-rl_bind_key-free-macro-strings.patch
Description: Binary data

0003-rl_generic_bind-fix-memleak.patch
Description: Binary data
Unicode range and enumeration support.
On 2019/12/16 08:39, Greg Wooledge wrote:
> On Sat, Dec 14, 2019 at 02:48:16AM -0800, L A Walsh wrote:
> > On 2019/12/13 10:42, Greg Wooledge wrote:
> > > There's a larger issue to be addressed first. The man page says,
> > >
> > > [...]
> > > sary. When characters are supplied, the expression expands to each
> > > character lexicographically between x and y, inclusive, using the
> > > default C locale.
> >
> > If it says letters that lends stronger support to including
> > unicode ranges of letters and numbers since the shell handles unicode
> > and brace expansions with unicode filenames works just fine. That
> > ranges don't seems a bit of a wart.
>
> No, it won't include Unicode, because it very clearly says "C locale"
> right up there.

At one point in time, Bash only supported the C locale for display and
input. That isn't the case in the current Bash. Just because it wasn't
so in the past doesn't mean things can't or won't change in the future.
If that were true we wouldn't have computers.

> The problem is, it is *not possible* to extract the set of characters
> out of an arbitrary locale. The locale interfaces simply are not built
> to allow it.
>
> You can do it in the C locale, simply because the C locale is a known,
> fixed quantity that you can hard-code. You can't do it in any other
> locale.

You can do it in Perl, JavaScript, Python, Ruby, C, C++ among others,
where range matching can identify characters of a specific type in
arbitrary locales.

For example (from https://www.regular-expressions.info/unicode.html):

  \p{L} or \p{Letter}: any kind of letter from any language.
  \p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an
    uppercase variant.
  \p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a
    lowercase variant.
  ...
  \p{Math_Symbol}: any mathematical symbol.
  \p{N} or \p{Number}: any kind of numeric character in any script.
  \p{Nd} or \p{Decimal_Digit_Number}: a digit zero through nine in any
    script except ideographic scripts.

Those can be cross-sectioned with script-name properties from any
script in Unicode (Common, Arabic, Braille, Cherokee,
Devanagari...Thai, Tibetan, Yi). The list of what is supported is very
extensive.

Tables are published in machine-readable form and are used to build
support for range matching and enumeration over a huge number of
characters. I.e. you can do it in pretty much any locale supported by
Unicode, not just the C locale.

I can't begin to list all the references for this, but just googling
"programming language support for ranges of numbers or alphabets in
unicode" will show a huge number of references.

Such features could be put in [a] loadable module[s], or made
"includable" at build time to manage memory if desired/needed.

OTOH, I already said that if one didn't want to do ranges, one could
follow the easier path (I think) and allow any arbitrary unicode range
to be enumerated while ensuring quoting of ASCII-ranged meta
characters.
Re: Unicode range and enumeration support.
On Wed, Dec 18, 2019 at 11:15:46AM -0800, L A Walsh wrote:
> On 2019/12/16 08:39, Greg Wooledge wrote:
> > The problem is, it is *not possible* to extract the set of characters
> > out of an arbitrary locale. The locale interfaces simply are not built
> > to allow it.
> >
> > You can do it in the C locale, simply because the C locale is a known,
> > fixed quantity that you can hard-code. You can't do it in any other locale.
>
> You can do it in Perl, JavaScript, Python, Ruby C, C++ among others,
> [...]
>   \p{L} or \p{Letter}: any kind of letter from any language.
>   \p{Ll} or \p{Lowercase_Letter}: a lowercase letter
>   that has an uppercase variant.

You misunderstood me, or perhaps I wasn't clear enough.

I agree that if you are GIVEN a character as input, you can determine
whether that character is a letter, or a lowercase letter (etc.) in the
current locale.

What you CANNOT do[1] is GENERATE all of the lowercase letters (etc.) in
the current locale.

To put it another way: you can write code that determines whether an
input character $c matches a glob or regex like [Z-a]. (Maybe.) But,
you CANNOT write code to generate all of the characters from Z to a.

Since this thread is about brace expansion, which must generate
characters, the feature you're looking for is simply impossible, to the
best of my knowledge.

(I'd be delighted for you to prove me wrong. Show me how to generate
all of the :alpha: characters in the en_US.utf8 locale in perl, or
python, or any other language.)

[1] The only way I know to get that information would be to take as
input *every conceivable character*, and, one by one, check whether
each of those characters matches the :alpha: class. Such a brute force
solution is not in the spirit of the mission. As such, I'll save you
the time and do that part myself.

wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s "$c"; fi; done; echo
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈ

Obviously I did not use *every conceivable character* as input -- just a
couple hundred, a completely arbitrary cut-off point, because this is
just a proof of concept. Trawling the entire Unicode code point space
is left as an adventure for braver souls than mine. As is comparing
the different locales on a system, or the same locale between different
operating systems.

Sorting these characters is also possible, once they have been generated.
This is (I think!) what allows things like [Z-a] to work at all: you
can check whether $c is >= 'Z' and <= 'a', without knowing what all of
the characters in between are. But you can't ask "what comes after Z".

wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s\\n "$c"; fi; done | sort | tr -d \\n; echo
aAªÁÀÂÅÄÃÆbBcCÇdDeEÈfFgGhHiIjJkKlLmMnNoOºpPqQrRsStTuUvVwWxXyYzZµ

Again, this is only PART of the set, and is not intended to be a
complete enumeration of the :alpha: characters in my system's locale.
Re: Unicode range and enumeration support.
On 12/18/19 2:46 PM, Greg Wooledge wrote:
> Sorting these characters is also possible, once they have been generated.
> This is (I think!) what allows things like [Z-a] to work at all: you
> can check whether $c is >= 'Z' and <= 'a', without knowing what all of
> the characters in between are. But you can't ask "what comes after Z".
>
> wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c
> "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s\\n "$c"; fi; done | sort
> | tr -d \\n; echo
> aAªÁÀÂÅÄÃÆbBcCÇdDeEÈfFgGhHiIjJkKlLmMnNoOºpPqQrRsStTuUvVwWxXyYzZµ
>
> Again, this is only PART of the set, and is not intended to be a
> complete enumeration of the :alpha: characters in my system's locale.

There's no need to sort ASCII characters, though, since the collation
order of [A-z] in the C locale is defined by their numeric codepoint
order. That is a guarantee that doesn't follow through in other locales.

So all bash needs to do to print {Z..a} is to take Z == ASCII decimal 90
and a == ASCII decimal 97, then enumerate the numbers 90-97 and
translate them into ascii. No locale awareness is needed, no heuristics,
no invocation of the locale subsystem, you don't even need to hardcode
the ASCII range in source code.

And that's why bash can support enumerating a range of ASCII characters
in LC_COLLATE=C order, when it cannot (easily) do so using other locales.

--
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User

signature.asc
Description: OpenPGP digital signature
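The enumeration described above can be sketched in a few lines of bash.
This is only an illustration of the idea (take the two code points,
count from one to the other, turn each number back into a character);
it is not a claim about how bash's brace expansion is implemented
internally. The function name `enumerate_range' is hypothetical, and
single-byte ASCII arguments are assumed.

```shell
#!/usr/bin/env bash
# enumerate_range FIRST LAST
# Print every character whose code point lies between FIRST and LAST,
# inclusive, separated by spaces. Assumes single-byte ASCII arguments.
# (Hypothetical helper for illustration only.)
enumerate_range() {
  local first last i oct ch out=()
  printf -v first '%d' "'$1"   # a leading quote yields the code point: Z -> 90
  printf -v last '%d' "'$2"    # a -> 97
  for ((i = first; i <= last; i++)); do
    printf -v oct '%03o' "$i"  # code point as three octal digits
    printf -v ch "\\$oct"      # octal escape -> the character itself
    out+=("$ch")
  done
  printf '%s\n' "${out[*]}"
}

enumerate_range Z a
```

No locale lookup is involved at any step; the only ordering used is the
numeric order of the code points. If FIRST sorts after LAST, the sketch
prints an empty line rather than counting downwards.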
Re: Unicode range and enumeration support.
On Wed, Dec 18, 2019 at 03:08:20PM -0500, Eli Schwartz wrote:
> So all bash needs to do to print {Z..a} is to take Z == ASCII decimal 90
> and a == ASCII decimal 97, then enumerate the numbers 90-97 and
> translate them into ascii. No locale awareness is needed, no heuristics,
> no invocation of the locale subsystem, you don't even need to hardcode
> the ASCII range in source code.

Until you want to use bash on an EBCDIC system. ;-)

> And that's why bash can support enumerating a range of ASCII characters
> in LC_COLLATE=C order, when it cannot (easily) do so using other locales.

Yup.
Re: Unicode range and enumeration support.
On 12/18/19 3:13 PM, Greg Wooledge wrote:
> On Wed, Dec 18, 2019 at 03:08:20PM -0500, Eli Schwartz wrote:
>> So all bash needs to do to print {Z..a} is to take Z == ASCII decimal 90
>> and a == ASCII decimal 97, then enumerate the numbers 90-97 and
>> translate them into ascii. No locale awareness is needed, no heuristics,
>> no invocation of the locale subsystem, you don't even need to hardcode
>> the ASCII range in source code.
>
> Until you want to use bash on an EBCDIC system. ;-)

Oof, that was mean. :p

(Also, why does this still exist.)

(But I guess we all realize that this just means bash needs to rely on
the existing support for translating the ASCII locale, and still doesn't
need to enumerate a lookup code of characters for this especial purpose.)

>> And that's why bash can support enumerating a range of ASCII characters
>> in LC_COLLATE=C order, when it cannot (easily) do so using other locales.
>
> Yup.
>

--
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User
read -t 0 fails to detect input.
It seems that read -t 0 should detect if there is input from a pipe
(and others).

From man bash:

>> If timeout is 0, read returns immediately, without trying to read any data.
>> The exit status is 0 if input is available on the specified file descriptor,
>> non-zero otherwise.

So, it seems that this should print 1:

    $ true | read -t 0 var; echo $?
    1

And this should print 0 (input available), but it doesn't (most of the
time).

    $ echo value | read -t 0 var ; echo $?
    1

A little delay seems to get it working:

    $ echo value | { read -t 0 var; } ; echo $?
    0

Related: Comment to what is wrong with read -t 0:
https://unix.stackexchange.com/questions/33049/how-to-check-if-a-pipe-is-empty-and-run-a-command-on-the-data-if-it-isnt/498065?noredirect=1#comment916652_497121

Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -fdebug-prefix-map=/build/bash-2bxm7h/bash-5.0=. -fstack-protector-strong -Wformat -We$
uname output: Linux iodeb 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 3
Release Status: release
Re: read -t 0 fails to detect input.
    Date:        Wed, 18 Dec 2019 19:40:45 -0400
    From:        Bize Ma
    Message-ID:

  | A little delay seems to get it working:
  |
  | $ echo value | { read -t 0 var; } ; echo $?
  | 0

It might, but that is adding no significant delay, and the results are
unpredictable.

jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0

It is all just a race condition - there's nothing specifying which side
of the pipe starts running first.

kre
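Given that race, one way a script might cope is to poll `read -t 0' a
bounded number of times instead of trusting a single call. The sketch
below is only an illustration: the function name `wait_for_input' and
its default retry count and delay are hypothetical, and it assumes a
`sleep' that accepts fractional seconds. It only narrows the race
window; a writer that is slower than the total polling time is still
reported as "no input".

```shell
#!/usr/bin/env bash
# wait_for_input [TRIES] [DELAY]
# Poll standard input with `read -t 0' up to TRIES times, sleeping DELAY
# seconds between polls. Returns 0 as soon as input is reported as
# available, 1 if every poll came back empty.
# (Hypothetical helper; name and defaults are illustrative assumptions.)
wait_for_input() {
  local tries=${1:-20} delay=${2:-0.05} i
  for ((i = 0; i < tries; i++)); do
    if read -t 0; then   # checks for input only, consumes nothing
      return 0
    fi
    sleep "$delay"       # assumes sleep accepts fractional seconds
  done
  return 1
}

# The writer may start before or after the reader; polling covers both.
echo value | { wait_for_input && read -r var && echo "got: $var"; }
```

When the goal is simply to read the data, a plain `read -t N' with a
non-zero timeout avoids the polling loop altogether, because it waits
for input rather than only testing for it.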