Re: Trailing newlines disappear

2017-06-13 Thread Peter & Kelly Passchier
On 14/06/2560 03:52, Chet Ramey wrote:
> You mean command substitution cutting off the input it reads at the
> occurrence of a NUL?

What I am really after is a shell option for command substitution not discarding trailing newlines.

Peter
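For readers following the thread, here is a minimal sketch of the behaviour being discussed. The sentinel-character workaround and the variable names are illustrative assumptions; they are not taken from the message.

```shell
# Command substitution strips every trailing newline from the captured output.
stripped=$(printf 'line\n\n\n')

# Common workaround (an assumption here, not from the thread): append a
# sentinel character inside the substitution, then remove it afterwards
# so the newlines in front of it survive.
kept=$(printf 'line\n\n\n'; printf 'x')
kept=${kept%x}

printf 'stripped length: %d\n' "${#stripped}"
printf 'kept length: %d\n' "${#kept}"
```

The sentinel approach needs no shell option, which is presumably why it is a common idiom; the request above is for a built-in switch that would make it unnecessary.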

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread L A Walsh
Chet Ramey wrote: On 6/13/17 8:35 PM, L A Walsh wrote: Chet Ramey wrote: That's not relevant to the issue of whether or not a particular character is classified as alphabetic in one locale and not another. That isn't relevant either. Unicode declares the categories that chara

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 9:27 PM, George wrote: > On Tue, 2017-06-13 at 20:14 -0400, Chet Ramey wrote: >> On 6/13/17 5:19 PM, tetsu...@scope-eye.net >> wrote: >>> In that case, the answer is simple: The shell swiftly rejects the >>> script, and provides a clear reason why it cann

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread George
On Tue, 2017-06-13 at 20:14 -0400, Chet Ramey wrote: > On 6/13/17 5:19 PM, tetsu...@scope-eye.net wrote: > > > > > > In that case, the answer is simple: > > > > The shell swiftly rejects the script, and provides a clear reason why > > it cannot be run. ("bash: Script requires the en_US.utf8 loca

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 8:35 PM, L A Walsh wrote: > > > Chet Ramey wrote: >> That's not relevant to the issue of whether or not a particular character >> is classified as alphabetic in one locale and not another. > That isn't relevant either. Unicode declares the categories > that characters are in -- GLOBAL

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread L A Walsh
Chet Ramey wrote: That's not relevant to the issue of whether or not a particular character is classified as alphabetic in one locale and not another. That isn't relevant either. Unicode declares the categories that characters are in -- GLOBALLY. It doesn't vary by locale. Even if a

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 7:58 PM, L A Walsh wrote: > > > Chet Ramey wrote: >> On 6/2/17 6:23 PM, L A Walsh wrote: >> >> >>> As for unsupported systems, there is a reason they are no longer >>> supported. The world is already using UTF-8. It's only a few >>> luddites clinging to ascii as a last refuge. ;-)

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/13/17 5:55 PM, tetsu...@scope-eye.net wrote: > > ...Though Apple now sticks to Bash 3.2 to avoid GPL v3 right? Makes 'em > kind of an odd use-case, and maybe a bit irrelevant with respect to the > future direction of Bash. Maybe, but that wasn't the question. Mac OS X arguably has the large

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 5:19 PM, tetsu...@scope-eye.net wrote: > > In that case, the answer is simple: > > The shell swiftly rejects the script, and provides a clear reason why > it cannot be run. ("bash: Script requires the en_US.utf8 locale which > is not installed on this system. Sorry, dude.") The shell

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 5:04 PM, tetsu...@scope-eye.net wrote: > > Well, my own reports have been to the mailing list (and just over the last > year) - to be fair, while I haven't necessarily gotten the answers I wanted > to hear, I did get answers. :) > > The "Support" and "Patches" pages on Savannah seem a

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 5:00 PM, Greg Wooledge wrote: > On Tue, Jun 13, 2017 at 04:44:08PM -0400, tetsu...@scope-eye.net wrote: >> For that to work, basically the character encoding used to interpret >> the script should be (potentially) distinct from the one used to >> interact with the rest of the system. >>

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread L A Walsh
tetsu...@scope-eye.net wrote: This is also why I think this should be an optional "encoding marker" --- Why? If it was the current encoding, it wouldn't have high-bits set. If it had any high bits set, it's fairly simple to either presume or validate the script as UTF-8, as it is self-sy

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 4:44 PM, tetsu...@scope-eye.net wrote: > > > Please excuse the top-posting, this mail client isn't very good... > > To some extent, tying the shell script language to the locale is > unavoidable. However, one of the points I was trying to make is that, in > principle, at least, this s

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread L A Walsh
Chet Ramey wrote: On 6/2/17 6:23 PM, L A Walsh wrote: As for unsupported systems, there is a reason they are no longer supported. The world is already using UTF-8. It's only a few luddites clinging to ascii as a last refuge. ;-) What display/OS do you have that you can't run UTF-8 on?

Re: Patch for unicode in varnames...

2017-06-13 Thread tetsujin
...Though Apple now sticks to Bash 3.2 to avoid GPL v3 right? Makes 'em kind of an odd use-case, and maybe a bit irrelevant with respect to the future direction of Bash.

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread tetsujin
In that case, the answer is simple: The shell swiftly rejects the script, and provides a clear reason why it cannot be run. ("bash: Script requires the en_US.utf8 locale which is not installed on this system. Sorry, dude.") This, in my opinion, is certainly preferable over the current situation,

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread tetsujin
Well, my own reports have been to the mailing list (and just over the last year) - to be fair, while I haven't necessarily gotten the answers I wanted to hear, I did get answers. :) The "Support" and "Patches" pages on Savannah seem a bit neglected, however. Items from 2011-2015 with no status an

Re: AddressSanitizer: heap-buffer-overflow _rl_find_prev_mbchar_internal / expand_prompt

2017-06-13 Thread Eduardo Bustamante
On Tue, Jun 13, 2017 at 3:30 PM, Chet Ramey wrote: [...] > I can't reproduce it with asan or without on Mac OS X. I'll look around > for a Linux system with asan to run it on. I had to use these exact same environment variables, otherwise the out of bounds read wouldn't happen. I'm not sure if it

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Greg Wooledge
On Tue, Jun 13, 2017 at 04:44:08PM -0400, tetsu...@scope-eye.net wrote: > For that to work, basically the character encoding used to interpret > the script should be (potentially) distinct from the one used to > interact with the rest of the system. > > ...But that gets complicated: the shell woul

Re: Trailing newlines disappear

2017-06-13 Thread Chet Ramey
On 6/12/17 8:55 PM, Peter & Kelly Passchier wrote:
> On 13/06/2560 02:54, Chet Ramey wrote:
>> If you want to effectively change it to a newline, specify NUL as the
>> line delimiter using the -d option (new in bash-4.4).
>
> Thanks, that sounds like a clean solution!
> I only reverted to mapfile
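As a rough illustration of the -d suggestion quoted above, the following sketch reads NUL-delimited records with mapfile. It assumes bash 4.4 or later; the array name and the sample data are hypothetical.

```shell
# Requires bash 4.4 or later: -d '' makes mapfile split on NUL bytes,
# and -t drops the delimiter from each stored element.
mapfile -t -d '' records < <(printf 'first\0second\nwith newline\n\0')

# Newlines inside a record, including a trailing one, are kept as data.
printf 'count: %d\n' "${#records[@]}"
printf 'second record: %q\n' "${records[1]}"
```

Because the record separator is NUL rather than a newline, embedded and trailing newlines arrive in the array unchanged.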

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread tetsujin
Please excuse the top-posting, this mail client isn't very good... To some extent, tying the shell script language to the locale is unavoidable. However, one of the points I was trying to make is that, in principle, at least, this shouldn't be the case. If a script is written in a particular cha

Re: AddressSanitizer: heap-buffer-overflow _rl_find_prev_mbchar_internal / expand_prompt

2017-06-13 Thread Chet Ramey
On 6/13/17 11:14 AM, Eduardo Bustamante wrote:
> It seems like this is another case of strlen reading too much.

I can't reproduce it with asan or without on Mac OS X. I'll look around for a Linux system with asan to run it on.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/13/17 2:30 PM, L A Walsh wrote: > > > Chet Ramey wrote: >> On 6/1/17 8:42 PM, L A Walsh wrote: >> >>> It would be a useful upgrade besides being a "good world citizen" ;-). >>> >> >> The only way this makes sense is to extend the allowable set of >> characters from the portable charac

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 4:08 PM, L A Walsh wrote: >Thanks for the history lesson (for those of us who were too lazy to > goog it ;-)). > Still, on what OS has it shown the most growth or popularity? The entire set of Linux machines is up there, but Mac OS X (and its descendents like iOS and tvOS) has many

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 10:40 AM, PePa wrote: > On 06/06/2560 21:20, Greg Wooledge wrote: >> Scripts that can only *run* in a UTF-8 encoding-locale are a bad idea. > > Even currently, when functions in a bash script are beyond ASCII, they > can still be run anywhere. I would imagine it would be the same when >

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 4:08 AM, Peter & Kelly Passchier wrote: > > > On 06/06/2560 14:37, George wrote: >> Broadly speaking I think the approach taken in Eduardo's patch >> (interpreting the byte sequence according to the rules of its character >> encoding) is better than the approach taken in current version

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 3:37 AM, George wrote: > that approach only works if you know the > correct character encoding to use when processing the script. The information > has to be provided in the script somehow. It can be provided in the usual way: by specifying the appropriate locale via assignments to the

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/5/17 8:40 PM, Peter & Kelly Passchier wrote: > On 06/06/2560 05:39, George wrote: >> So if you had "Pokémon" as an identifier in a Latin-1-encoded script (byte >> value 0xE9 between the "k" and "m") and then tried running that script in a >> UTF-8 locale, that byte sequence (0xE9 0x6D) would

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/5/17 6:39 PM, George wrote: > If Bash did go the route of using the locale to set the character encoding of > a script, I think it would be best to have a mechanism a script can use > to define the character encoding for the whole script file up front, rather > than setting LC_CTYPE to proc

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/5/17 4:52 AM, George wrote: > the character type for the character conversion is derived from the user's > locale > (which means there's not a reliable mechanism in place to run a script in a > locale whose character encoding doesn't match that of the script.) There isn't today. The burden

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/4/17 2:47 PM, L A Walsh wrote: > >Clarification, please, but it looks like with your > patch below, Unicode in variable names might be fairly close > to being achieved? Seeing how it was done for functions, > gave you insight into how variables could be done, yes? No. It's not `don

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/4/17 3:45 AM, dualbus wrote:
> I know I said I wasn't going to reply, but this changed my mind :-)
>
> I hadn't realized that bash already supports Unicode in function names!

When not in Posix mode, bash doesn't really have any restrictions on characters that can be used in function names.
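A small sketch of the point about function names, assuming bash running outside POSIX mode. The function name is made up for illustration and does not appear in the thread.

```shell
# Make sure POSIX mode is off (it already is when bash is started as "bash").
set +o posix

# Outside POSIX mode, bash accepts function names that are not valid
# POSIX identifiers, including names containing non-ASCII characters.
función() {
  printf 'called\n'
}

output=$(función)
printf '%s\n' "$output"
```

The same name would be rejected as a variable name, which is the asymmetry the RFE is about.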

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/3/17 2:00 PM, George wrote:
> As it is, Bash bug reports and
> feature requests are neglected for years on end,

If you have a bug report that's been neglected for that long, let me know.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brev

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/2/17 6:23 PM, L A Walsh wrote: > As for unsupported systems, there is a reason they are no longer > supported. The world is already using UTF-8. It's only a few > luddites clinging to ascii as a last refuge. ;-) > > What display/OS do you have that you can't run UTF-8 on? This is a red he

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/2/17 12:54 PM, tetsu...@scope-eye.net wrote: > I agree that allowing Unicode in parameter names is problematic: > - there are characters that should be equivalent in principle, but > aren't (For instance, the Greek letter pi (π) and the mathematical > symbol pi (𝛑) - in some fonts they may r

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread L A Walsh
Chet Ramey wrote: On 6/1/17 8:42 PM, L A Walsh wrote: It would be a useful upgrade besides being a "good world citizen" ;-). The only way this makes sense is to extend the allowable set of characters from the portable character set to the current locale. I'm guessing that if POS

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/2/17 1:16 AM, L A Walsh wrote: > > > dualbus wrote: >> >> - People then have to test the new implementation, to ensure that there >> are no regressions, and no new bugs introduced. I'm happy to volunteer >> once there's a working implementation. >> >> - There are some questions that must

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/2/17 12:52 AM, dualbus wrote:
> - There are some questions that must be answered first:
>
>   * How do you know how to decode multibyte character sequences into Unicode?
>     Should UTF-8 be assumed?

It has to be the current locale.

>   * Will the parsing of a script depend upon the user loca

Re: RFE: Please allow unicode ID chars in identifiers

2017-06-13 Thread Chet Ramey
On 6/1/17 8:42 PM, L A Walsh wrote: > It would be a useful upgrade besides being a "good world citizen" ;-). The only way this makes sense is to extend the allowable set of characters from the portable character set to the current locale. > I'm guessing that if POSIX was set, then they'd be limit

AddressSanitizer: heap-buffer-overflow _rl_find_prev_mbchar_internal / expand_prompt

2017-06-13 Thread Eduardo Bustamante
It seems like this is another case of strlen reading too much.

dualbus@debian:~/src/gnu/bash-build$ base64 < /home/dualbus/bash-fuzzing/read-readline/output/10/crashes/id:11,sig:06,src:001239+003201,op:splice,rep:2
GwMWF/zuFQAXCxcXFwAD6FNTALwAABAAgCkZGRkZ/zpQFxkZGRkZGRcXIH/6AAD6jlxchDP8GQAB

Re: consistency probs var & function re-use

2017-06-13 Thread Greg Wooledge
On Sun, Jun 11, 2017 at 03:51:26PM -0700, L A Walsh wrote: > I was pointing out that the reason 'declare' used a newline, was that he > had not quoted the input to 'x', which is expanded to a newline > before it is assigned to 'x'. You are completely wrong here. My variable 'x' had a newline in i