Re: bind -X shows inactive bindings (bindings removed using bind -r)

2019-12-18 Thread Koichi Murase
> I think you misunderstood my response. Wait for the next push to the
> devel git branch.
>
> Chet

I checked the latest push to the devel branch.
Thank you very much for fixing the behavior!

Best regards,
Koichi



[PATCH] Fix a problem that shadow `bind -x' does not work

2019-12-18 Thread Koichi Murase
Hi, I still have several patches related to `bind'. My previous
patches have now been processed, so let me post them.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hp2019 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6
14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: maint

Description:

  When the key sequence of a binding is a prefix of other bindings
  (let me call it a shadow binding in this report), the shadow binding
  is triggered when the user input does not match any of the other
  bindings or when there is no input within the timeout specified by
  the readline variable `keyseq-timeout'. When such a shadow binding
  is one created by `bind -x', Bash fails to find the appropriate unix
  command and either produces an error message or triggers the wrong
  command.

  This is reproduced in Bash 4.4, 5.0 and the current devel branch.
  Bash 4.3 works properly in simple cases, but it still fails in
  complex test cases. In Bash 3.0 through 4.2, it causes a
  segmentation fault.

  Internally, the problem is caused by the array `rl_executing_keyseq'
  not being updated on unmatched user input or on timeout. This array
  is used as the key sequence to look up the unix command in
  `cmd_xmap'.

Repeat-By:

  Test case 1:

  With the following settings, after typing `C-t' the command `echo
  world' is expected to be executed after the 500ms delay specified by
  the default value of the readline variable `keyseq-timeout'. However
  we get an error message "bash_execute_unix_command: ..." instead:

  $ LANG=C bash --norc
  $ bind -v | grep keyseq-timeout
  set keyseq-timeout 500
  $ bind '"\C-t\C-t":"hello"'
  $ bind -x '"\C-t":echo world'
  $ <-- 
  bash: bash_execute_unix_command: cannot find keymap for command


  Test case 2:

  When the keyseq `\C-t\C-t' is bound with `bind -x', `\C-t' + delay
  invokes the command for `\C-t\C-t' instead of the one for `\C-t'.

  $ LANG=C bash --norc
  $ bind -x '"\C-t\C-t":echo hello'
  $ bind -x '"\C-t":echo world'
  $ <-- 
  hello #<-- expected result is "world"


  Test case 3:

  Similar results can also be obtained by typing some unexpected
  key before the timeout expires.

  $ LANG=C bash --norc
  $ bind '"\C-t\C-t":"hello"'
  $ bind -x '"\C-t":echo world'
  $ <-- a
  bash: bash_execute_unix_command: cannot find keymap for command
  $ a

  Test case 4:

  This is just a test case for code coverage. Internally the following
  three errors are produced by different control paths so we need to
  fix all of these control paths.

  $ LANG=C bash --norc
  $ bind '"\C-t\C-t\C-t\C-t":"hello"'
  $ bind -x '"\C-t":echo world'
  $ <-- a
  bash: bash_execute_unix_command: cannot find keymap for command
  $
  bash: bash_execute_unix_command: cannot find keymap for command
  $
  bash: bash_execute_unix_command: cannot find keymap for command
  $ a


Fix:

  I attach a patch. The patch inserts the following lines where
  needed. I think in principle just `rl_key_sequence_length--'
  should work, but I have written it as below for safety.

if (rl_key_sequence_length > 0)
  rl_executing_keyseq[--rl_key_sequence_length] = '\0';

Best regards,
Koichi


0001-_rl_dispatch_subseq-update-rl_executing_keyseq-on-un.patch
Description: Binary data


Re: Cygwin bash build- command substitution fails

2019-12-18 Thread Chet Ramey

On 12/17/19 4:00 PM, Dave Taylor wrote:
> Bash Version: 4.4
> Patch Level: 12
> Release Status: release
>
> Description:
> I'm trying to do a Cygwin build of the bash git repo at bminor/bash on
> github.
>
> The configure (no options) and make (no options) finish successfully, but
> the build fails when doing $()-style command substitutions, claiming that
> the trailing paren is unexpected:
>
> % echo $(echo hiya)
> bash: command substitution: line 9: syntax error near unexpected token `)'
> bash: command substitution: line 9: `echo hiya)'


This is generally the result of using something other than bison to
generate the parser.
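
One way to check which parser generator a configured tree will use is to
read it back from the generated Makefile. This is only a minimal sketch:
it assumes the Makefile records the choice on a line of the form
"YACC = <command>", and the helper name `yacc_setting' is made up for
illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the parser generator recorded in a Makefile.
# Assumption: configure wrote a line of the form "YACC = <command>".
yacc_setting() {
  local makefile=${1:-Makefile}
  # Print only the value part of the YACC assignment, if there is one.
  sed -n 's/^YACC[[:space:]]*=[[:space:]]*//p' "$makefile"
}
```

Running `yacc_setting ./Makefile' in the build directory shows the
recorded command; a value that does not name bison would be consistent
with the cause described above.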

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [PATCH] Fix a problem that shadow `bind -x' does not work

2019-12-18 Thread Koichi Murase
>   Test case 2:
>
>   $ LANG=C bash --norc
>   $ bind -x '"\C-t\C-t":echo hello'
>   $ bind -x '"\C-t":echo world'
>   $ <-- 
>   hello #<-- expected result is "world"

I'm sorry. I found that "Test case 2" is not yet fixed by the
previous patch. Here is an additional patch that fixes "Test
case 2".

Best regards,
Koichi


0002-bash_execute_unix_command-check-shadow-binding-in-cm.patch
Description: Binary data


[PATCH] Fix a problem that shadow `bind -x' is not removed from `bind -X'

2019-12-18 Thread Koichi Murase
I found a case in which some removed bindings still remain in `bind -X'
after the fix. Here is the report.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hp2019 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6
14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: maint

Description:

  The command string for a shadow `bind -x' key binding is not removed
  from the corresponding cmd_xmap and therefore remains in the output
  of `bind -X'.

Repeat-By:

  With the following commands, one can create a shadow binding for
  `\C-t' and then remove it. The binding is in fact removed and inactive
  after the unbind, but it remains in the output of `bind -X'.

  $ bind '"\C-t\C-t\C-t\C-t":"hello"'
  $ bind -x '"\C-t":echo world'
  $ bind -r '\C-t'
  $ bind -X
  "\C-t": "echo world"

Fix:

  I attach a patch. In the patch, if the original binding
  corresponding to the removed keyseq is `ISKMAP', its shadow entry
  `map[ANYOTHERKEY].function' is also checked to see whether it is
  `bash_execute_unix_command'.

Thank you,
Koichi


0001-unbind_keyseq-check-shadow-bindings-for-bash_execute.patch
Description: Binary data


[PATCH] Fix a problem `rl_bind_key' cannot create shadow binding for `C-@'

2019-12-18 Thread Koichi Murase
This is another report.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hp2019 5.2.13-200.fc30.x86_64 #1 SMP Fri Sep 6
14:30:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: maint

Description:

  One of the public interfaces of readline, the function
  `rl_bind_key (key, function)', does not work with key = 0 (C-@) when
  there are already bindings for keyseqs starting with "\C-@".
  This is because `rl_bind_key' fails to construct an appropriate
  untranslated keyseq for "\C-@" when it calls `rl_generic_bind'.

Repeat-By:

  The function `rl_bind_key' is not widely used by the current Bash
  code, but to see the problem caused by this bug, one can use the
  older form bind 'C-SPC:...' to register a shadow binding.

  $ LANG=C ./bash-3a7c642e --norc
  $ bind '"\C-@\C-@":"hello"'
  $ bind 'C-SPC:backward-char'
  $ echo* #<--  (* is the cursor position)

  In the above example, the expected result is `ech*o' with `*' being
  the cursor position after the timeout, but the cursor does not move.
  But with the following newer form, we can get the expected result:

  $ LANG=C ./bash-3a7c642e --norc
  $ bind '"\C-@\C-@":"hello"'
  $ bind '"\C-@":backward-char'

Fix:

  I attach a patch `0001-patch'. In the patch, the key '\0' is given
  special treatment, in the same way as the key '\\'.

  By the way I think there is a memory leak in the same
  function. Could you check the second attached patch
  `0002-patch'? I think if the original binding is a macro the
  memory block should be released before the pointer is overwritten.
  Actually I'm not quite sure, but at least in a similar function
  `rl_generic_bind', the macro string is released.

  I think there is a memory leak also in the `rl_generic_bind'.  The
  shadow macro which is stored in `map[ANYOTHERKEY].function' is not
  released before the overwrite. See the third patch `0003-patch'.

Thank you,
Koichi


0001-rl_bind_key-support-C.patch
Description: Binary data


0002-rl_bind_key-free-macro-strings.patch
Description: Binary data


0003-rl_generic_bind-fix-memleak.patch
Description: Binary data


Unicode range and enumeration support.

2019-12-18 Thread L A Walsh

On 2019/12/16 08:39, Greg Wooledge wrote:
> On Sat, Dec 14, 2019 at 02:48:16AM -0800, L A Walsh wrote:
>> On 2019/12/13 10:42, Greg Wooledge wrote:
>>> There's a larger issue to be addressed first.  The man page says,
>>> [...]
>>> sary.  When characters are supplied, the expression expands to each
>>> character lexicographically between x and y, inclusive, using the
>>> default C locale.
>>
>>    If it says letters that lends stronger support to including
>> unicode ranges of letters and numbers since the shell handles unicode and
>> brace expansions with unicode filenames works just fine.  That ranges don't
>> seems a bit of a wart.
>
> No, it won't include Unicode, because it very clearly says "C locale"
> right up there.

   At one point in time, Bash only supported the C locale for display
and input.  That isn't the case in the current Bash.  Just because it
wasn't so in the past, doesn't mean things can't or won't change in the
future.  If that was true we wouldn't have computers.

> The problem is, it is *not possible* to extract the set of characters
> out of an arbitrary locale.  The locale interfaces simply are not built
> to allow it.
>
> You can do it in the C locale, simply because the C locale is a known,
> fixed quantity that you can hard-code.  You can't do it in any other locale.


   You can do it in Perl, JavaScript, Python, Ruby, C, C++ among others,
where range matching can identify characters of a specific type out of
arbitrary locales.  For example (from
https://www.regular-expressions.info/unicode.html):

   \p{L} or \p{Letter}: any kind of letter from any language.
   \p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an
   uppercase variant.
   \p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a
   lowercase variant.
   ...
   \p{Math_Symbol}: any mathematical symbol.
   \p{N} or \p{Number}: any kind of numeric character in any script.
   \p{Nd} or \p{Decimal_Digit_Number}: a digit zero through nine in any
   script except ideographic scripts.

   Those can be cross-sectioned with script-name properties from any
script in Unicode (Common, Arabic, Braille, Cherokee, Devanagari...Thai,
Tibetan, Yi).  The list of support is very extensive.  Tables are
published in machine readable form that are used to build support to allow
range matching and enumeration for a huge number of characters.

   I.e. you can do it in pretty much any locale supported by Unicode, not
just the C language.  I can't begin to list all the references for this,
but just googling on:

"programming language support for ranges of numbers or alphabets in
unicode"

will show a huge number of references.

Such features could be put in [a] loadable module[s], or made "includable"
at build time to manage memory if desired/needed.

   OTOH, I already said if one didn't want to do ranges, one could follow
the easier path (I think) and allow any arbitrary unicode range to be
enumerated while ensuring quoting of ASCII-ranged meta characters.



Re: Unicode range and enumeration support.

2019-12-18 Thread Greg Wooledge
On Wed, Dec 18, 2019 at 11:15:46AM -0800, L A Walsh wrote:
> On 2019/12/16 08:39, Greg Wooledge wrote:
> > The problem is, it is *not possible* to extract the set of characters
> > out of an arbitrary locale.  The locale interfaces simply are not built
> > to allow it.
> > 
> > You can do it in the C locale, simply because the C locale is a known,
> > fixed quantity that you can hard-code.  You can't do it in any other locale.

> You can do it in Perl, JavaScript, Python, Ruby, C, C++ among others,
> [...]
> \p{L} or \p{Letter}: any kind of letter from any language.
> \p{Ll} or \p{Lowercase_Letter}: a lowercase letter
> that has an uppercase variant.

You misunderstood me, or perhaps I wasn't clear enough.

I agree that if you are GIVEN a character as input, you can determine
whether that character is a letter, or a lowercase letter (etc.) in
the current locale.

What you CANNOT do[1] is GENERATE all of the lowercase letters (etc.) in
the current locale.

To put it another way: you can write code that determines whether
an input character $c matches a glob or regex like [Z-a].  (Maybe.)

But, you CANNOT write code to generate all of the characters from Z to a.

Since this thread is about brace expansion, which must generate
characters, the feature you're looking for is simply impossible, to
the best of my knowledge.  (I'd be delighted for you to prove me
wrong.  Show me how to generate all of the :alpha: characters in the
en_US.utf8 locale in perl, or python, or any other language.)

[1] The only way I know to get that information would be to take as input
*every conceivable character*, and, one by one, check whether each
of those characters matches the :alpha: class.  Such a brute force
solution is not in the spirit of the mission.  As such, I'll save you
the time and do that part myself.

wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s "$c"; fi; done; echo
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈ

Obviously I did not use *every conceivable character* as input -- just
a couple hundred, a completely arbitrary cut-off point, because this is
just a proof of concept.  Trawling the entire Unicode code point space
is left as an adventure for braver souls than mine.  As is comparing
the different locales on a system, or the same locale between different
operating systems.

Sorting these characters is also possible, once they have been generated.
This is (I think!) what allows things like [Z-a] to work at all: you
can check whether $c is >= 'Z' and <= 'a', without knowing what all of
the characters in between are.  But you can't ask "what comes after Z".
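
The membership check described above can be sketched in a few lines. This
is only an illustration under stated assumptions: the helper name
`in_range' is made up, and it forces the C locale so that the comparison
is by plain byte order. It answers "is $c between Z and a?" without ever
listing what lies in between.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if character $1 sorts between $2 and $3
# (inclusive) under C-locale collation, i.e. plain byte order.
in_range() {
  local LC_ALL=C   # assumption: C collation is the order being tested
  # "not less than the lower bound" and "not greater than the upper bound"
  [[ ! "$1" < "$2" && ! "$1" > "$3" ]]
}

in_range '^' Z a && echo "^ is within Z..a"
in_range b Z a || echo "b is outside Z..a"
```

Nothing in the function can produce the next character after Z; it can
only compare characters it is handed.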

wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s\\n "$c"; fi; done | sort | tr -d \\n; echo
aAªÁÀÂÅÄÃÆbBcCÇdDeEÈfFgGhHiIjJkKlLmMnNoOºpPqQrRsStTuUvVwWxXyYzZµ

Again, this is only PART of the set, and is not intended to be a
complete enumeration of the :alpha: characters in my system's locale.



Re: Unicode range and enumeration support.

2019-12-18 Thread Eli Schwartz
On 12/18/19 2:46 PM, Greg Wooledge wrote:
> Sorting these characters is also possible, once they have been generated.
> This is (I think!) what allows things like [Z-a] to work at all: you
> can check whether $c is >= 'Z' and <= 'a', without knowing what all of
> the characters in between are.  But you can't ask "what comes after Z".
> 
> wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s\\n "$c"; fi; done | sort | tr -d \\n; echo
> aAªÁÀÂÅÄÃÆbBcCÇdDeEÈfFgGhHiIjJkKlLmMnNoOºpPqQrRsStTuUvVwWxXyYzZµ
> 
> Again, this is only PART of the set, and is not intended to be a
> complete enumeration of the :alpha: characters in my system's locale.

There's no need to sort ASCII characters, though, since the collation
order of [A-z] in the C locale is defined by their numeric codepoint
order. That is a guarantee that doesn't follow through in other locales.

So all bash needs to do to print {Z..a} is to take Z == ASCII decimal 90
and a == ASCII decimal 97, then enumerate the numbers 90-97 and
translate them into ascii. No locale awareness is needed, no heuristics,
no invocation of the locale subsystem, you don't even need to hardcode
the ASCII range in source code.

And that's why bash can support enumerating a range of ASCII characters
in LC_COLLATE=C order, when it cannot (easily) do so using other locales.
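
The enumeration described above can be sketched in bash itself. This is
only an illustration of the idea, not how bash implements brace
expansion: the helper name `ascii_range' is made up, and it assumes both
end points are single ASCII characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the ASCII characters from $1 to $2 by walking
# their numeric codes, without consulting any locale tables.
ascii_range() {
  local LC_ALL=C from to i hex ch out=()
  printf -v from %d "'$1"   # "'Z" makes printf yield the character's code (90)
  printf -v to   %d "'$2"
  for ((i = from; i <= to; i++)); do
    printf -v hex %02x "$i"
    printf -v ch "\\x$hex"  # turn the code back into a character
    out+=("$ch")
  done
  printf '%s\n' "${out[*]}"
}

ascii_range Z a   # prints: Z [ \ ] ^ _ ` a
```

The loop only counts from 90 to 97; the "next character" is always the
next integer, which is exactly the guarantee other locales do not give.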

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User





Re: Unicode range and enumeration support.

2019-12-18 Thread Greg Wooledge
On Wed, Dec 18, 2019 at 03:08:20PM -0500, Eli Schwartz wrote:
> So all bash needs to do to print {Z..a} is to take Z == ASCII decimal 90
> and a == ASCII decimal 97, then enumerate the numbers 90-97 and
> translate them into ascii. No locale awareness is needed, no heuristics,
> no invocation of the locale subsystem, you don't even need to hardcode
> the ASCII range in source code.

Until you want to use bash on an EBCDIC system. ;-)

> And that's why bash can support enumerating a range of ASCII characters
> in LC_COLLATE=C order, when it cannot (easily) do so using other locales.

Yup.



Re: Unicode range and enumeration support.

2019-12-18 Thread Eli Schwartz
On 12/18/19 3:13 PM, Greg Wooledge wrote:
> On Wed, Dec 18, 2019 at 03:08:20PM -0500, Eli Schwartz wrote:
>> So all bash needs to do to print {Z..a} is to take Z == ASCII decimal 90
>> and a == ASCII decimal 97, then enumerate the numbers 90-97 and
>> translate them into ascii. No locale awareness is needed, no heuristics,
>> no invocation of the locale subsystem, you don't even need to hardcode
>> the ASCII range in source code.
> 
> Until you want to use bash on an EBCDIC system. ;-)

Oof, that was mean. :p (Also, why does this still exist.)

(But I guess we all realize that this just means bash needs to rely on
the existing support for translating the ASCII locale, and still doesn't
need to enumerate a lookup code of characters for this especial purpose.)

>> And that's why bash can support enumerating a range of ASCII characters
>> in LC_COLLATE=C order, when it cannot (easily) do so using other locales.
> 
> Yup.
> 


-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User





read -t 0 fails to detect input.

2019-12-18 Thread Bize Ma
It seems that read -t 0 should detect if there is input from a pipe (and
others).

From man bash:

>> If timeout is 0, read returns immediately, without trying to read any
>> data. The exit status is 0 if input is available on the specified file
>> descriptor, non-zero otherwise.

So, it seems that this should print 1:

$ true | read -t 0 var; echo $?
1

And this should print 0 (input available), but it doesn't (most of the
time).

$ echo value | read -t 0 var ; echo $?
1

A little delay seems to get it working:

$ echo value | { read -t 0 var; } ; echo $?
0

Related: Comment to what is wrong with read -t 0:
https://unix.stackexchange.com/questions/33049/how-to-check-if-a-pipe-is-empty-and-run-a-command-on-the-data-if-it-isnt/498065?noredirect=1#comment916652_497121

Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -fdebug-prefix-map=/build/bash-2bxm7h/bash-5.0=.
-fstack-protector-strong -Wformat -We$
uname output: Linux iodeb 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2
(2019-11-11) x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 3
Release Status: release


Re: read -t 0 fails to detect input.

2019-12-18 Thread Robert Elz
Date: Wed, 18 Dec 2019 19:40:45 -0400
From: Bize Ma
Message-ID:  


  | A little delay seems to get it working:
  |
  | $ echo value | { read -t 0 var; } ; echo $?
  | 0

It might, but that is adding no significant delay, and the
results are unpredictable.

jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0
jinx$ echo value | { read -t 0 var; } ; echo $?
1
jinx$ echo value | { read -t 0 var; } ; echo $?
0

It is all just a race condition - there's nothing specifying which
side of the pipe starts running first.
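
One way to make the probe less sensitive to that race is to retry it for
a short while instead of probing once. The following is a minimal sketch
under stated assumptions: the helper name `poll_input', the retry count
and the 10ms pause are all arbitrary choices made for illustration, and
`sleep' is assumed to accept fractional seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: probe stdin with `read -t 0' up to $1 times,
# pausing briefly between probes, instead of probing exactly once.
poll_input() {
  local tries=${1:-50}
  while ((tries-- > 0)); do
    read -t 0 && return 0   # a read on stdin would not block
    sleep 0.01              # assumption: sleep accepts fractional seconds
  done
  return 1
}

echo value | { poll_input && read -r var && echo "got: $var"; }
```

Retrying narrows the window but does not remove the race: a writer that
is slower than the total wait is still missed, and a successful probe
only says that a read would not block, so the following `read' still
decides whether any data actually arrived.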

kre