Re: RFE: Please allow unicode ID chars in identifiers

2017-06-03 Thread L A Walsh


"Every technical field -- and most nontechnical fields -- has developed
 conventional shorthand notation to make convenient the presentation and
 discussion involving frequently used concepts.  For example, because
 of long acquaintance,

 x+y*z

 is clearer to us than

 multiply y by z and add the result to x

 It is hard to overestimate the importance of concise notation for common
 operations."
-

Symbols that are specific to a field, or common to speakers of other
languages, express their meaning more concisely than spelled-out words that
may or may not carry the meaning associated with the symbol.  Indeed, it is
said that a picture is worth a thousand words.  In this case the exact symbol
is worth more than the associated words spelled out.


It's thought that having to translate eidetic concepts to abstract 
letters interferes with comprehension and communication.


If those aren't valid -- and essential -- reasons to support those extended
symbols, I don't know what would be.


I stumbled upon the passage above by accident while reviewing concepts
in the author's book.  The author: Bjarne Stroustrup, in the fourth (C++11)
edition of "The C++ Programming Language".

You may not believe me, but Stroustrup is saying the same thing.  The
symbols native to a discipline provide for faster and more accurate
communication -- as in getting across concepts.

Also to respond to tetsujin: shorter normalization forms are preferred over
longer forms. There are formulaic ways of determining the proper form that
are well suited to a computer that can apply the rules very quickly.

Some conventions regarding character set usage have already been "solved"
and encoded in binary properties of the characters.  For example, the
start and continue "ID" properties are best associated with names used
for variables.  And while the symbol for pi (𝛑) may look similar, it would
most likely be used where numbers (and numeric constants) are used, while
the greek letter would be used in identifiers.

I'm glad some people are willing to discuss things rather than run around
asserting that the sky will *cost* something ... Whether or not it costs
something shouldn't prevent people from forming ideas that they might
find desirable in the future.  It certainly doesn't mean such features are
expected next month, or even "anytime", from a specific person.  If they have
no interest in the work, I'd support them not doing it, as long as they
allow someone more interested to do the work.  Being open to doing it
yourself isn't required to be open to seeing something grow in specific
directions...

-l










Re: RFE: Please allow unicode ID chars in identifiers

2017-06-03 Thread George
On Sat, 2017-06-03 at 01:20 -0700, L A Walsh wrote:
> Some conventions regarding character set usage have already been "solved"
> and encoded in binary properties of the characters.  For example, the
> start and continue "ID" properties are best associated with names used
> for variables. 
Referring to "Unicode Identifier and Pattern Syntax", I guess:
http://unicode.org/reports/tr31
>  And while the symbol for pi (𝛑) may look similar, it would
> most likely be used where numbers (and numeric constants) are used, while
> the greek letter would be used in identifiers.
> 
> 

On some fonts, the two may look not only similar, but identical. I brought it
up mainly to acknowledge that there are some issues in Unicode handling
that make this issue potentially more complicated than "don't disallow it". In
my opinion, to do it right the syntax should recognize Unicode
whitespace at least, which in turn requires an explicit decision that the
source text is Unicode (and not generically-processed byte sequences outside
the ASCII range).

(The question of whether to support Unicode mathematical operators or Unicode
quotation marks introduces a lot of interesting possibilities - but
that's a whole other ball of wax.)

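
The whitespace point can be seen with a minimal sketch, assuming a POSIX-style
shell with the default IFS. The variable names are made up for the example, and
U+00A0 NO-BREAK SPACE is written as its UTF-8 bytes (octal 302 240) so the
script itself stays ASCII:

```shell
# Word splitting only honours the characters in IFS (space, tab and newline
# by default), so a no-break space does not separate words.
unset IFS                               # make sure the default IFS applies

ascii_words=$(printf 'a b')             # ASCII space between a and b
nbsp_words=$(printf 'a\302\240b')       # U+00A0 between a and b

# Unquoted on purpose: we want the shell to perform word splitting here.
set -- $ascii_words; ascii_count=$#
set -- $nbsp_words;  nbsp_count=$#

echo "ASCII space: $ascii_count word(s); no-break space: $nbsp_count word(s)"
```

The ASCII space yields two words and the no-break space yields one, which is
the "generically-processed byte sequences" behaviour described above.
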
Problems like Unicode normalization, and questions of what constitutes
equivalency in the scope of the programming language, can complicate things.
There's a series of trade-offs between keeping the implementation relatively
simple vs. supporting equivalency where the user may reasonably expect
it. There's potential for something like this to get complicated, depending on
how far the design goes in supporting equivalency for semi-redundant
Unicode characters. I don't think a programming language necessarily needs to
go too far down that rabbit hole, really, but I wanted to acknowledge
the issue.

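
As a rough illustration of the equivalency question (a sketch only, with
made-up variable names), the same visible character "é" can be encoded in
precomposed (NFC) or decomposed (NFD) form, and a shell that compares bytes
treats them as different strings:

```shell
# "é" precomposed (NFC, U+00E9) versus "e" + combining acute accent
# (NFD, U+0065 U+0301), written as UTF-8 octal escapes to keep this ASCII.
nfc=$(printf '\303\251')
nfd=$(printf 'e\314\201')

# The test builtin compares the strings as they are; nothing normalizes them.
if [ "$nfc" = "$nfd" ]; then
    same=yes
else
    same=no
fi

# Count bytes rather than characters, independent of the current locale.
nfc_bytes=$(printf '%s' "$nfc" | wc -c | tr -d ' ')
nfd_bytes=$(printf '%s' "$nfd" | wc -c | tr -d ' ')

echo "equal: $same (NFC form: $nfc_bytes bytes, NFD form: $nfd_bytes bytes)"
```

An implementation that wanted both spellings to name the same variable would
have to normalize identifiers itself before comparing them; whether that is
worth the cost is exactly the trade-off discussed above.
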
> I'm glad some people are willing to discuss things rather than run around
> asserting that the sky will *cost* something ... Whether or not it cost
> something shouldn't prevent people from forming ideas that that they might
> find desirable in the future.  It certainly doesn't mean such features are
> expected next month, or even "anytime" by a specific person (if they have
> no interest in the work) I'd support them not doing it as long as they
> allow someone more interested to do the work.  Being open to doing it
> yourself isn't required to be open to seeing something grow in specific
> directions...
> 
> 

To be fair, that reaction is understandable. When someone like me shows up and
starts trying to influence the direction of a project, even when I
contribute code to produce the changes I want to see in the project, will I be
the one maintaining that code in 5 years? To be quite honest, in my case
the answer is almost certainly "no". When I advocate for a feature to be added
to Bash, I am probably not going to be paying the bulk of the price for
it. Willingness to contribute code is a very low bar.

But speaking vaguely about "the future" and open-ended expectations with
respect to time is probably counterproductive. As it is, Bash bug reports and
feature requests are neglected for years on end, and new features make their
way to Bash years after they're pioneered in ksh. A vague "in the future"
could easily find us back here, ten years from now, still wondering when Bash
will support Unicode in parameter names. The best we can do is make a good
case for the features we want to see integrated, and do as much as we can to at
least reduce the _initial_ cost of the feature.


Re: RFE: Please allow unicode ID chars in identifiers

2017-06-03 Thread PePa
On 04/06/2560 01:00, George wrote:
> There's a series of trade-offs between keeping the implementation relatively 
> simple vs. supporting equivalency where the user may reasonably expect
> it.
I will personally never use non-ascii in a bash script, even though I
use unicode extensively, also in the shell.

But the fact that unicode functions are already supported does seem to
pave the way for allowing variable names in unicode. For consistency, it
should really be the same handling as function names -- I am expecting
no equivalency support in the current function name implementation, and
I am also guessing that many types of non-ascii space are also allowed
in function names already. Which does make sense: if people want to
shoot themselves in the foot by using similar/same looking but actually
different glyphs and spaces, it would be too tiresome to try to prevent
that.
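
The asymmetry between function names and variable names can be checked with a
small sketch like the following. It assumes a `bash` binary on PATH running
outside POSIX mode in a C or UTF-8 locale; the names `func_out`, `var_status`
and so on are only for illustration:

```shell
# A non-ASCII function name: bash accepts this outside POSIX mode.
func_out=$(bash -c 'π() { echo "function ok"; }; π' 2>&1)
func_status=$?

# The same name used as a variable: "π=3" is not recognised as an assignment,
# so bash tries to run a command literally named "π=3" and fails.
var_out=$(bash -c 'π=3' 2>&1)
var_status=$?

echo "function: status $func_status, output: $func_out"
echo "variable: status $var_status, output: $var_out"
```

The function call succeeds while the would-be assignment ends with a non-zero
status, which is the inconsistency the request is about.
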

Peter



Re: Builtin read with -n0 or -N0 (nchars == 0) behaves as a read with no -n/-N argument

2017-06-03 Thread Pranav Deshpande
Hello,
Sorry for the late reply.

My solution is to change *line 294* of builtins/read.def.

Change
if (code == 0 || intval < 0 || intval != (int)intval)

to

if (code == 0 || intval <= 0 || intval != (int)intval)

Command:

./bash -c 'read -n0 <<< "abc";declare -p REPLY'

Output:
./bash: line 0: read: 0: invalid number
./bash: line 0: declare: REPLY: not found

i.e. behaviour #1.

Is this solution ok?

Regards,
Pranav


Re: RFE: Please allow unicode ID chars in identifiers

2017-06-03 Thread L A Walsh



Greg Wooledge wrote:
>> Here is a demonstration of the cost of what you are proposing.  In my
>> mail user agent, your variable shows up as L??v.
>>
>> Source code with your UTF-8 identifiers would no longer even be
>> READABLE
>
> What display/OS do you have that you can't run UTF-8 on?

   Still curious - - -> me.  :-)




Re: RFE: Please allow unicode ID chars in identifiers

2017-06-03 Thread John McKown
On Sat, Jun 3, 2017 at 4:48 PM, L A Walsh  wrote:

>
> Greg Wooledge wrote:
>>
>>> Here is a demonstration of the cost of what you are proposing.  In my
>>> mail user agent, your variable shows up as L??v.
>>>
>>> Source code with your UTF-8 identifiers would no longer even be
>>> READABLE
>>>
>>
>> What display/OS do you have that you can't run UTF-8 on?
>>
>
>Still curious - - -> me.  :-)
>
>
OK, I did a port of BASH to an IBM "mainframe" system (IBM z) which uses
EBCDIC as its character set, rather than ASCII or UNICODE. Granted, this
system is a very minor player to most people. OTOH, most "Fortune 500"
companies will have one or more "hidden away" doing some "legacy" work and,
now, perhaps even some mobile (smartphone, tablet) back end work. There is
another mid-range IBM system which also uses EBCDIC, but I don't know if it
has a BASH port or not.



-- 
Windows. A funny name for a operating system that doesn't let you see
anything.

Maranatha! <><
John McKown


Re: RFE: Please allow unicode ID chars in identifiers

2017-06-03 Thread L A Walsh



John McKown wrote:
> On Sat, Jun 3, 2017 at 4:48 PM, L A Walsh wrote:
>
>> Greg Wooledge wrote:
>>>> Here is a demonstration of the cost of what you are proposing.  In my
>>>> mail user agent, your variable shows up as L??v.
>>>>
>>>> Source code with your UTF-8 identifiers would no longer even be
>>>> READABLE
>>>
>>> What display/OS do you have that you can't run UTF-8 on?
>>
>> Still curious - - -> me.  :-)
>
> OK, I did a port of BASH to an IBM "mainframe" system (IBM z) which
> uses EBCDIC as its character set, rather than ASCII or UNICODE.
> Granted, this system is a very minor player to most people. OTOH, most
> "Fortune 500" companies will have one or more "hidden away" doing some
> "legacy" work and, now, perhaps even some mobile (smartphone, tablet)
> back end work. There is another mid-range IBM system which also uses
> EBCDIC, but I don't know if it has a BASH port or not.

I know that Perl, which supports EBCDIC-flavored machines, has full
UTF-8 support now -- they were one of the first, about 10 years ago.
But no one uses EBCDIC outside of IBM machines, and those are not
consumer or home level machines.

I'm surprised you got bash ported to work with that charset though,
and certainly I can't imagine you using an EBCDIC machine as a home
machine... are you?



Re: RFE: Please allow unicode ID chars in identifiers

2017-06-03 Thread Peter & Kelly Passchier
On 04/06/2560 04:48, L A Walsh wrote:
>> Greg Wooledge wrote:
>>> Here is a demonstration of the cost of what you are proposing.  In my
>>> mail user agent, your variable shows up as L??v.
>>>
>>> Source code with your UTF-8 identifiers would no longer even be
>>> READABLE  
>>
>> What display/OS do you have that you can't run UTF-8 on?

So it's his mail client: reading unicode source in their old mail client
is going to be problematic for some people...

Peter




Re: Builtin read with -n0 or -N0 (nchars == 0) behaves as a read with no -n/-N argument

2017-06-03 Thread dualbus
On Sun, Jun 04, 2017 at 01:45:42AM +0530, Pranav Deshpande wrote:
[...]
> My solution is to change *line 294* of builtins/read.def.
> 
> Change
> if (code == 0 || intval < 0 || intval != (int)intval)
> 
> to
> 
> if (code == 0 || intval <= 0 || intval != (int)intval)
[...]

> Is this solution ok?

Yes. That works.


Chet went with the other option though:

  dualbus@debian:~/src/gnu/bash-build$ ./bash -c 'read -n0; echo $?; declare -p REPLY'
  0
  declare -- REPLY=""

You can see the change by navigating the `devel' branch of the git repository
in Savannah (commit 1110e30870a8782425067a060d89cc411b014418):

  http://git.savannah.gnu.org/cgit/bash.git/commit/?h=devel&id=1110e30870a8782425067a060d89cc411b014418

Although there's a problem with the solution:

  dualbus@debian:~$ for sh in bash ~/src/gnu/bash-build/bash ksh93 mksh; do $sh -c ': | read -n 0; echo $?'; done
  1
  0
  1
  1

Since the read(2) system call doesn't take place, `read -n 0' doesn't detect
the broken pipe. IMO, it should.

-- 
Eduardo Bustamante
https://dualbus.me/



Re: Builtin read with -n0 or -N0 (nchars == 0) behaves as a read with no -n/-N argument

2017-06-03 Thread Pranav Deshpande
Is that more advantageous?

On Sun, Jun 4, 2017 at 10:46 AM, dualbus  wrote:

> On Sun, Jun 04, 2017 at 01:45:42AM +0530, Pranav Deshpande wrote:
> [...]
> > My solution is to change *line 294* of builtins/read.def.
> >
> > Change
> > if (code == 0 || intval < 0 || intval != (int)intval)
> >
> > to
> >
> > if (code == 0 || intval <= 0 || intval != (int)intval)
> [...]
>
> > Is this solution ok?
>
> Yes. That works.
>
>
> Chet went with the other option though:
>
>   dualbus@debian:~/src/gnu/bash-build$ ./bash -c 'read -n0; echo $?; declare -p REPLY'
>   0
>   declare -- REPLY=""
>
> You can see the change by navigating the `devel' branch of the git
> repository in Savannah (commit 1110e30870a8782425067a060d89cc411b014418):
>
>   http://git.savannah.gnu.org/cgit/bash.git/commit/?h=devel&id=1110e30870a8782425067a060d89cc411b014418
>
> Although there's a problem with the solution:
>
>   dualbus@debian:~$ for sh in bash ~/src/gnu/bash-build/bash ksh93 mksh; do $sh -c ': | read -n 0; echo $?'; done
>   1
>   0
>   1
>   1
>
> Since the read(2) system call doesn't take place, `read -n 0' doesn't
> detect
> the broken pipe. IMO, it should.
>
> --
> Eduardo Bustamante
> https://dualbus.me/
>


Re: Builtin read with -n0 or -N0 (nchars == 0) behaves as a read with no -n/-N argument

2017-06-03 Thread Eduardo Bustamante
On Sun, Jun 4, 2017 at 12:16 AM, dualbus  wrote:
[...]
> Although there's a problem with the solution:
>
>   dualbus@debian:~$ for sh in bash ~/src/gnu/bash-build/bash ksh93 mksh; do $sh -c ': | read -n 0; echo $?'; done
>   1
>   0
>   1
>   1
>
> Since the read(2) system call doesn't take place, `read -n 0' doesn't detect
> the broken pipe. IMO, it should.

Err, I'm clearly wrong. SIGPIPE is sent to the writer, not to the
reader. My bad.
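
For reference, the writer/reader distinction can be seen with a sketch like
this one. It assumes `bash`, `yes` and `head` are available and that SIGPIPE
is not being ignored in the calling environment; the variable names are
illustrative only:

```shell
# The writer (yes) keeps writing after the reader (head) has exited, so the
# writer is the process that gets SIGPIPE; the reader finishes normally.
statuses=$(bash -c 'yes | head -n 1 >/dev/null; echo "${PIPESTATUS[0]} ${PIPESTATUS[1]}"')
writer_status=${statuses% *}
reader_status=${statuses#* }

# In the ": | read" test, the reader only ever sees end-of-file, and read
# reports EOF with a non-zero status. No signal is involved on that side.
eof_status=$(bash -c ': | read -r line; echo $?')

echo "writer: $writer_status, reader: $reader_status, read at EOF: $eof_status"
```

With default signal handling the writer status is typically 141 (128 plus the
SIGPIPE signal number), while the reader exits 0. The 1 printed by the
unpatched shells in the earlier loop is therefore an ordinary EOF result.
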

That doesn't mean the problem is not there though. I think this is a
better test:

  dualbus@debian:~$ for sh in bash ksh93 mksh ~/src/gnu/bash-build/bash; do echo $sh $($sh -c 'exec < /; read -n 0; echo $?' 2>&1); done
  bash bash: line 0: read: read error: 0: Is a directory 1
  ksh93 1
  mksh mksh: read: Is a directory 2
  /home/dualbus/src/gnu/bash-build/bash 0

Since bash "fakes" the read, it's not able to detect errors.

I currently have this:

dualbus@debian:~/src/gnu/bash$ git diff
diff --git a/builtins/read.def b/builtins/read.def
index 520a2b34..4e4c1b8a 100644
--- a/builtins/read.def
+++ b/builtins/read.def
@@ -362,10 +362,6 @@ read_builtin (list)
   input_string = (char *)xmalloc (size = 112); /* XXX was 128 */
   input_string[0] = '\0';

-  /* More input and options validation */
-  if (nflag == 1 && nchars == 0)
-    goto assign_vars;  /* bail early if asked to read 0 chars */
-
   /* $TMOUT, if set, is the default timeout for read. */
   if (have_timeout == 0 && (e = get_string_value ("TMOUT")))
     {
@@ -381,6 +377,10 @@ read_builtin (list)

   begin_unwind_frame ("read_builtin");

+  /* We were asked to read 0 chars. Do error detection and bail out */
+  if (nflag == 1 && nchars == 0 && (retval=read(fd, NULL, 0)) < 0)
+    goto handle_error;
+
 #if defined (BUFFERED_INPUT)
   if (interactive == 0 && default_buffered_input >= 0 && fd_is_bash_input (fd))
     sync_buffered_stream (default_buffered_input);
@@ -714,6 +714,7 @@ add_char:
     free (rlbuf);
 #endif

+handle_error:
   if (retval < 0)
     {
       t_errno = errno;


But it completely ignores the signal handling code down below.