Re: Unexpected delay in using arguments.

2018-08-15 Thread Bob Proulx
Bize Ma wrote:
> > but just the same it is almost always possible
> > to refactor a program into a different algorithm that avoids the need
> > to enumerate so many arguments in memory.
> 
> As you say: "almost".

I still believe that to be true.  Since you haven't shared what the
actual task is there is no way for us to propose any counter example
improvements.  So "almost" is as far as I can go.  Instead should I
say 99.44% of the time?  Since I have never come across a counter
example yet?
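
As an illustration of the kind of refactoring meant here, a minimal sketch
follows. It assumes the task is "do something with every regular file under
a directory"; the name process_tree and the counting body are hypothetical
placeholders, not anything taken from this thread.

```bash
# Hypothetical helper: visit every regular file under a directory one at a
# time, without ever holding the full list as positional parameters.
process_tree() {
    local dir=$1 count=0 file
    # find emits NUL-terminated names; read -d '' consumes them one by one,
    # so names containing spaces or newlines stay intact.
    while IFS= read -r -d '' file; do
        # Placeholder for the real per-file work.
        count=$((count + 1))
    done < <(find "$dir" -type f -print0)
    printf '%s\n' "$count"
}
```

Called as "process_tree /some/dir", it prints the number of files visited.
The memory used by the shell stays roughly constant no matter how many
files there are, because only one name is held at a time.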

> Take a look at the Stéphane Chazelas example to convince yourself.

Nothing Stéphane said changed my statement at all.

It does look like bash can be more efficient with argument handling.
Since, for example, dash does it.

Bob



Re: Assignment of $* to a var removes spaces on unset IFS.

2018-08-15 Thread Greg Wooledge
This reply was sent to me without Cc-ing the list.  I have added the Cc.

On Tue, Aug 14, 2018 at 11:39:20PM -0400, Bize Ma wrote:
> On Tue, 14 Aug 2018 12:34:31, Greg Wooledge said:
> 
> > I will also repeat, once more, my advice that one should NEVER write
> > a script containing an unquoted $* or $@ expansion.
> 
> That is plainly INCORRECT, Greg.

You are incorrect.

> > It breaks in all kinds of ways in more than one shell.
> 
> That several shells do different things is a bug on those shells, not bash.

Agreed.  And the fact that IN REAL LIFE, THOSE SHELLS EXIST AND HAVE
THOSE BUGS, which are triggered by incorrect code, is a reason to write
code correctly so as not to trigger those bugs.

Or maybe you're one of those people who doesn't care about reality.

> > Just don't do it, and these problems go away.
> 
> If I want the split+glob to take effect I can do:
> 
>     echo $*
> 
> 
> There is nothing wrong with that (don't claim that it changes in different
> shells, that is a different issue than using split+glob in bash, go back to
> the point above about other shells if you wish).

There is absolutely definitely positively 100% certainly something wrong
with that.

Let's break your script, shall we?

Here's your script, except I'm going to represent it as a function.  Doing
it as a script would have the same effect.

glob() {
    # "Return" (write to stdout, one per line) the expansions of all
    # arguments as globs against the current working directory.
    printf %s\\n $*
}

Will you at least agree that this is your intent, and a fair
representation of your proposed solution?  I'll take that as a "yes".

So now, you can pass SOME globs to it (properly quoted), and it will
appear to work for those globs:

wooledg:~$ glob '*.pdf'
0400_0001.pdf
11412687.pdf
11412859.pdf
[...]
Epic Web Service - PatientLookup.pdf
[...]

At this point, the naive script writer will say "Yay, it worked!"

The experienced script writer knows that it did not, in fact, "work".
It only "worked" for this one degenerate kindergarten-level example.

Let's test it again.

wooledg:~$ glob '*Web Service*'
*Web
Service*

But... but... but... the PREVIOUS glob worked!  Why didn't this one
work?

Because the script (function) does not ACTUALLY work.  It is broken.
You can't solve this problem using an unquoted $* expansion.  Not in
reality.
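
The failure can be reproduced in a scratch directory. This is a minimal
sketch: the function is renamed glob_naive here (a hypothetical name) and
the second file name is made up; it assumes IFS still has its default value.

```bash
# The function as quoted above: the unquoted $* undergoes word splitting
# first and pathname expansion second.
glob_naive() {
    printf '%s\n' $*
}

demo_dir=$(mktemp -d) || exit 1
(
    cd "$demo_dir" || exit 1
    touch 'Epic Web Service - PatientLookup.pdf' 'report.pdf'
    # No IFS characters in the pattern, so it survives splitting intact.
    glob_naive '*.pdf'
    # Split at the space into "*Web" and "Service*"; neither word matches
    # a file, so both are printed literally.
    glob_naive '*Web Service*'
)
rm -rf "$demo_dir"
```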

Some of us care about reality.

Other languages have no problem with this.  Tcl, for example, has a [glob]
command that takes a list of glob-patterns and returns a list of files
that match them.

wooledg:~$ tclsh
% puts [join [glob {*Web Service*}] \n]
Epic Web Service - PatientLookup.pdf
% puts [join [glob *.pdf] \n]
Infoblox_DNS_mgmt.pdf
prtest.pdf
PMCNoteCoverPage.pdf
[...]

Of course, it doesn't sort them, because I didn't call [lsort].  Adding
that is trivial.

Now, you tried to implement Tcl's [glob] in bash, but you did it naively
and the naive version does not work.  You took a shortcut.  That shortcut
falls off a cliff.

I'll leave the non-broken implementation as an exercise for the reader.

Now, back to the original points:

1) Certain shells have bugs in them.  These shells are in widespread use
   on real systems in real life.

2) One of those shells is bash, which makes it relevant to bug-bash.

3) Some of those bugs involve the unquoted expansions of $* and $@.

4) Even in the ABSENCE of such bugs, the use of an unquoted $* or $@
   expansion does not actually solve the problems you claim it solves.

5) Therefore, for TWO reasons, you should not use unquoted $* or $@
   in your shell scripts.

   Reason 1: it doesn't solve the problem.
   Reason 2: it sometimes breaks even worse due to shell bugs.
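
For the common case of passing arguments along unchanged, the difference
between the quoted and unquoted forms can be seen with a small sketch. The
function names are hypothetical, and default IFS is assumed.

```bash
# Print how many arguments were received.
count_args() { printf '%s\n' "$#"; }

# Forward the arguments with and without quotes.
pass_quoted()   { count_args "$@"; }
pass_unquoted() { count_args $@; }

pass_quoted   'a b' 'c'    # prints 2: each argument arrives intact
pass_unquoted 'a b' 'c'    # prints 3: 'a b' was split at the space
```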



Re: Assignment of $* to a var removes spaces on unset IFS.

2018-08-15 Thread Ilkka Virta

On 15.8. 15:44, Greg Wooledge wrote:
> glob() {
>     # "Return" (write to stdout, one per line) the expansions of all
>     # arguments as globs against the current working directory.
>     printf %s\\n $*
> }
>
> But... but... but... the PREVIOUS glob worked!  Why didn't this one
> work?

I'm sure you know what word splitting is.

> I'll leave the non-broken implementation as an exercise for the reader.


$ glob() { local IFS=; printf '%s\n' $*; }
$ touch "foo bar.txt" "foo bar.pdf"
$ glob "foo bar*"
foo bar.pdf
foo bar.txt

(Well, you'd probably want 'nullglob' too, and there's the minor issue
that printf '%s\n'  prints at least one line even if there are no
arguments but I'll ignore that for now.)


Of course, in most cases, an unquoted expansion is not what one wants,
but if there's a need to glob in the shell, then an unquoted expansion is
what has to be used. How IFS affects word splitting isn't just about
$* , the issue is the same even if you only have one glob in a regular
variable.
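
The two caveats in the parenthetical above could be handled roughly as
follows. This is only a sketch: it runs the body in a subshell so that the
IFS and nullglob changes do not leak into the caller, and it has only been
reasoned through for a single pattern argument.

```bash
glob() (
    IFS=                 # empty IFS: no word splitting, globbing still applies
    shopt -s nullglob    # patterns that match nothing expand to nothing
    set -- $*            # deliberate unquoted expansion: glob each pattern
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"
    fi
)
```

With no matches the function prints nothing at all, rather than an empty
line or the literal pattern.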

--
Ilkka Virta / itvi...@iki.fi



Re: Rational Range Interpretation for bash-5.0?

2018-08-15 Thread Ilkka Virta

On 6.8. 23:07, Chet Ramey wrote:
> Hi. I am considering making bash glob expansion implement rational range
> interpretation starting with bash-5.0 -- basically making globasciiranges
> the default. It looks like glibc is going to do this for version 2.28 (at
> least for a-z, A-Z, and 0-9), and other GNU utilities have done it for some
> time. What do folks think?


I tried to think of a counterpoint, some case where the current
(non-globasciiranges) behaviour would be useful, but I can't come up
with any. At least the part where [a-z] matches A, but not Z makes it a
bit useless.

If you're considering special-casing just those three, I'd suggest 
adding a-f and A-F too, for patterns matching hex digits.
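
A quick way to see the behaviour under discussion, as a sketch: it turns
the globasciiranges option on explicitly (the option exists since bash 4.3),
and is_hex is a hypothetical helper name.

```bash
# With globasciiranges set, ranges in bracket expressions are interpreted
# as in the C locale, regardless of the current collation order.
shopt -s globasciiranges

for c in a z A Z; do
    if [[ $c == [a-z] ]]; then
        printf '%s matches [a-z]\n' "$c"
    else
        printf '%s does not match [a-z]\n' "$c"
    fi
done

# Hypothetical helper: succeed only for a non-empty string of hex digits.
is_hex() {
    [[ -n $1 && $1 != *[!0-9a-fA-F]* ]]
}
```

With the option unset, the result for the uppercase letters depends on the
locale's collation order, which is the inconsistency described above.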


So yeah, +1 from me.

--
Ilkka Virta / itvi...@iki.fi



Re: Unexpected delay in using arguments.

2018-08-15 Thread Chet Ramey
On 8/15/18 3:36 AM, Bob Proulx wrote:

> It does look like bash can be more efficient with argument handling.
> Since, for example, dash does it.

Yes, it just needs new primitives to do it. The existing code for managing
the saved positional parameters has been in bash since the pre-1.0 days and
is pretty much unchanged since then. I'll take a look.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Assignment of $* to a var removes spaces on unset IFS.

2018-08-15 Thread Chet Ramey
On 8/13/18 10:52 PM, Bize Ma wrote:

> > This is a bug in bash and it should be fixed, not excused.
>
> To which I agree. After a year, nothing else has been said about it.
> 
> It seems about time to get it solved. Or say that it won't be.

It's fixed in the current development versions, and that fix will be in
the next release. It looks like it's been fixed since April, 2017 as part
of a code cleanup to implement Posix interp 888:

http://lists.nongnu.org/archive/html/bug-bash/2017-04/msg1.html




Re: Assignment of $* to a var removes spaces on unset IFS.

2018-08-15 Thread Chet Ramey
On 8/14/18 8:34 AM, Greg Wooledge wrote:
> On Mon, Aug 13, 2018 at 09:36:23PM -0400, Bize Ma wrote:
>> Note that 4.4.19 is newer than what is available in
>> https://www.gnu.org/software/bash/
> 
> The current patch level of bash can be obtained by downloading the most
> recent tarball from  and then applying the
> subsequent patches from ,
> or whichever directory is appropriate for the current version.

It's always easier to download

http://git.savannah.gnu.org/cgit/bash.git/snapshot/bash-master.tar.gz

That is the head of the master branch, which is always the current
released version plus all released patches.




Re: "sh -a" sets the POSIXLY_CORRECT *environment* variable

2018-08-15 Thread Chet Ramey
On 8/14/18 11:50 AM, Stephane Chazelas wrote:
> Hi,
> 
> This is from
> https://unix.stackexchange.com/questions/462333/why-does-a-in-bin-sh-a-affect-sed-and-set-a-doesnt
> (original investigation by Mark Plotnick)
> 
> Though not documented, enabling the POSIX mode in bash whether
> with
> 
> - bash -o posix
> - sh
> - env SHELLOPTS=posix bash
> - set -o posix # within bash
> 
> causes it to set the value of the $POSIXLY_CORRECT shell
> variable to "y" (if it was not already set)

Yes. This behavior dates from early 1997. It was put in on request so users
could get a posix environment from the shell, since GNU utilities
understand the POSIXLY_CORRECT variable. I could improve the documentation
there, but a 20-plus-year-old feature isn't going to change.




Re: Unexpected delay in using arguments.

2018-08-15 Thread Bize Ma
2018-08-15 3:36 GMT-04:00 Bob Proulx:

> Bize Ma wrote:
>


> I still believe that to be true.


You are entitled to have an opinion, even if incorrect.


> Since you haven't shared what the
> actual task is there is no way for us to propose any counter example
> improvements.  So "almost" is as far as I can go.  Instead should I
> say 99.44% of the time?  Since I have never come across a counter
> example yet?
>

Give it time.


> > Take a look at the Stéphane Chazelas example to convince yourself.
>
> Nothing Stéphane said changed my statement at all.
>

How do you process a «list of files» without the «list of files» ?


> It does look like bash can be more efficient with argument handling.
> Since, for example, dash does it.
>

That is true.


Re: "sh -a" sets the POSIXLY_CORRECT *environment* variable

2018-08-15 Thread Chet Ramey
On 8/15/18 11:05 AM, Chet Ramey wrote:

>> causes it to set the value of the $POSIXLY_CORRECT shell
>> variable to "y" (if it was not already set)
> 
> Yes. This behavior dates from early 1997. It was put in on request so users
> could get a posix environment from the shell, since GNU utilities
> understand the POSIXLY_CORRECT variable. I could improve the documentation
> there, but a 20-plus-year-old feature isn't going to change.

This is probably less clear than it should be. The `standard' GNU way to
indicate that an application is, or should be, in posix mode is to set
POSIXLY_CORRECT. Bash had a couple of different ways to do it (--posix,
-o posix), and the request was that I add the standard way to indicate
posix mode. As a side effect, users could then export the variable to get
a POSIX environment using the GNU utilities. Anyway, it all happened long
ago.
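
The behaviour described in this thread can be observed with a small
sketch. It assumes "bash" is on PATH and that env supports -u (GNU
coreutils does); show_posixly_correct is a hypothetical helper name.

```bash
# Start a fresh bash with the given options and print the value of the
# POSIXLY_CORRECT shell variable, or "<unset>" if it has none.
# env -u removes any value inherited from the calling environment.
show_posixly_correct() {
    env -u POSIXLY_CORRECT bash "$@" \
        -c 'printf "%s\n" "${POSIXLY_CORRECT-<unset>}"'
}

show_posixly_correct            # default mode
show_posixly_correct -o posix   # posix mode
```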




Re: Assignment of $* to a var removes spaces on unset IFS.

2018-08-15 Thread Bize Ma
2018-08-15 8:44 GMT-04:00 Greg Wooledge:

> This reply was sent to me without Cc-ing the list.  I have added the Cc.
>
> On Tue, Aug 14, 2018 at 11:39:20PM -0400, Bize Ma wrote:
> > On Tue, 14 Aug 2018 12:34:31, Greg Wooledge said:
> >
> > > I will also repeat, once more, my advice that one should NEVER write
> > > a script containing an unquoted $* or $@ expansion.
> >
> > That is plainly INCORRECT, Greg.
>
> You are incorrect.


I am always incorrect, I like to be so.
But we are discussing an issue, not my personal problems.


> > > It breaks in all kinds of ways in more than one shell.
> >
> > That several shells do different things is a bug on those shells, not
> > bash.
>
> Agreed.


Excellent, we agree.


> And the fact that IN REAL LIFE, THOSE SHELLS EXIST AND HAVE
> THOSE BUGS, which are triggered by incorrect code


No, sorry, but bugs are triggered by perfectly good code, that's why they
are called bugs.


> , is a reason to write
> code correctly so as not to trigger those bugs.
>

Writing code to *work around* bugs is *not* the correct solution.
It is only a way to *perpetuate* those bugs.
The correct solution is to resolve the bugs.

But this is just a smokescreen. There are no bugs about $@ and $* in Bash.

> Or maybe you're one of those people who doesn't care about reality.

Sometimes I don't, most of the time when I sleep, sometimes when I
daydream.


> > > Just don't do it, and these problems go away.
> >
> > If I want the split+glob to take effect I can do:
> >
> >     echo $*
> >
> >
> > There is nothing wrong with that (don't claim that it changes in different
> > shells, that is a different issue than using split+glob in bash, go back to
> > the point above about other shells if you wish).
>
> There is absolutely definitely positively 100% certainly something wrong
> with that.
>

Ah, yes, that's correct, thanks, I should have used printf, not echo.
But you corrected it, thanks again.

> Let's break your script, shall we?
>
> Here's your script, except I'm going to represent it as a function.  Doing
> it as a script would have the same effect.
>
> glob() {
>     # "Return" (write to stdout, one per line) the expansions of all
>     # arguments as globs against the current working directory.
>     printf %s\\n $*
> }
>
> Will you at least agree that this is your intent,


No, that is *not* my intent.
You can not implement a glob function when *both: split and glob* are in
effect.

If you want a glob function, stop the split, set IFS to null:

$ cat script
#!/bin/bash
glob() ( IFS=; printf %s\\n $* )
glob '*Web Service*'

$ ./script
Epic Web Service - PatientLookup.pdf


> and a fair
> representation of your proposed solution?  I'll take that as a "yes".
>

No it is not a fair representation of anything I said.
Only of a twist that you want to present here.

> So now, you can pass SOME globs to it (properly quoted), and it will
> appear to work for those globs:
>

But it will fail to work on the next example as you want to demonstrate,
therefore you have not coded correctly to work around the shells' bugs
(your own words).

> Some of us care about reality.
>

Yes, when I am awake.

> Other languages have no problem with this.  Tcl,

(...)

But we are in bash, are we not?


> Now, you tried to implement Tcl's [glob] in bash, but you did it naively
> and the naive version does not work.  You took a shortcut.  That shortcut
> falls off a cliff.
>

No, *you* did. You implemented a flawed glob function.


> I'll leave the non-broken implementation as an exercise for the reader.
>

You are not able to do it?


> Now, back to the original points:
>
> 1) Certain shells have bugs in them.  These shells are in widespread use
>    on real systems in real life.
>
> 2) One of those shells is bash, which makes it relevant to bug-bash.
>
> 3) Some of those bugs involve the unquoted expansions of $* and $@.
>

Not in bash.


> 4) Even in the ABSENCE of such bugs, the use of an unquoted $* or $@
>    expansion does not actually solve the problems you claim it solves.
>

I claimed nothing. I presented a valid use. Don't put words in my mouth.


> 5) Therefore, for TWO reasons, you should not use unquoted $* or $@
>    in your shell scripts.
>
>    Reason 1: it doesn't solve the problem.
>    Reason 2: it sometimes breaks even worse due to shell bugs.
>

There is no therefore if the premises are wrong.


Re: "sh -a" sets the POSIXLY_CORRECT *environment* variable

2018-08-15 Thread Stephane Chazelas
2018-08-15 11:05:06 -0400, Chet Ramey:
> On 8/14/18 11:50 AM, Stephane Chazelas wrote:
> > Hi,
> > 
> > This is from
> > https://unix.stackexchange.com/questions/462333/why-does-a-in-bin-sh-a-affect-sed-and-set-a-doesnt
> > (original investigation by Mark Plotnick)
> > 
> > Though not documented, enabling the POSIX mode in bash whether
> > with
> > 
> > - bash -o posix
> > - sh
> > - env SHELLOPTS=posix bash
> > - set -o posix # within bash
> > 
> > causes it to set the value of the $POSIXLY_CORRECT shell
> > variable to "y" (if it was not already set)
> 
> Yes. This behavior dates from early 1997. It was put in on request so users
> could get a posix environment from the shell, since GNU utilities
> understand the POSIXLY_CORRECT variable. I could improve the documentation
> there, but a 20-plus-year-old feature isn't going to change.
[...]

Maybe there was a misunderstanding.

It's fine that bash enters POSIX mode when $POSIXLY_CORRECT is
set. IOW, it's fine that bash enters POSIX mode when the users
request it.

The problem I'm trying to raise is about the reverse behaviour:
that bash takes upon itself to request POSIX mode of all other
utilities when it itself enters POSIX mode, that it sets
$POSIXLY_CORRECT when it enters POSIX mode.

The problem would show up mostly with

#! /bin/sh -a

scripts on systems where sh is bash, and where the script relies
on non-POSIX behaviours of some GNU utilities.

I can't see how that could be seen as a feature, I can't imagine
anyone wanting that. If one wants to get a POSIX environment on
a GNU system, they would do:

export POSIXLY_CORRECT=y

(and yes, it's good that bash does honour it) (and yes, it's not
a very good interface as that means it can break scripts called
within that environment and that rely on non-POSIX behaviour of
some utilities, but that's beside the point being made here).

If one wants a POSIX shell, they can use

#! /bin/sh -

Or:

#! /usr/bin/env bash
set -o posix

if they can't rely on /bin/sh being a POSIX shell.

But that should not affect the behaviour of all other utilities
called within the script.

Without "-a", it's OK as the $POSIXLY_CORRECT variable is not
exported (it's not very useful that bash sets it (especially
considering it's not documented), but at least it's harmless).
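
Whether a given bash build passes the variable on to child processes can
be checked with a sketch like the following. The helper names are
hypothetical, "bash" is assumed to be on PATH, and env is assumed to
support -u. No output is shown for the -a case because it may differ
between bash versions.

```bash
# Succeed if a bash started with the given options puts POSIXLY_CORRECT
# into the environment of the commands it runs.
child_sees_posixly_correct() {
    env -u POSIXLY_CORRECT bash "$@" -c 'env' | grep -q '^POSIXLY_CORRECT='
}

report() {
    if child_sees_posixly_correct "$@"; then
        printf 'bash %s: exported to child processes\n' "$*"
    else
        printf 'bash %s: not exported\n' "$*"
    fi
}

report -o posix       # posix mode alone
report -a -o posix    # posix mode plus allexport, as in "#! /bin/sh -a"
```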

-- 
Stephane