[bash-devel] Attempting to cd to the empty directory operand where ./ does not exist aborts

2024-01-29 Thread Kerin Millar
Hi,

This is with commit 138f3cc3591163d18ee4b6390ecd6894d5d16977 running on Linux 
6.7.2 and glibc-2.38.

$ mkdir -p ~/dir && cd ~/dir && rmdir ~/dir
$ cd ""
bash: cd: : No such file or directory

So far, so good. Now let's try to cd from a non-interactive instance.

$ bash -c 'declare -p BASH_VERSION; cd ""'
shell-init: error retrieving current directory: getcwd: cannot access parent 
directories: No such file or directory
declare -- BASH_VERSION="5.3.0(4)-devel"
chdir: error retrieving current directory: getcwd: cannot access parent 
directories: No such file or directory
malloc: ./cd.def:619: assertion botched
malloc: 0x56be197137b0: allocated: last allocated from pathcanon.c:109
free: start and end chunk sizes differ
Aborting...Aborted (core dumped)

And, again, with bash 5.2.

$ /bin/bash -c 'declare -p BASH_VERSION; cd ""'
shell-init: error retrieving current directory: getcwd: cannot access parent 
directories: No such file or directory
declare -- BASH_VERSION="5.2.26(1)-release"
chdir: error retrieving current directory: getcwd: cannot access parent 
directories: No such file or directory

-- 
Kerin Millar



Re: wait -n misses signaled subprocess

2024-01-29 Thread Robert Elz
Date:        Sun, 28 Jan 2024 18:21:42 -0500
From:        Chet Ramey
Message-ID:  <3347f790-529b-4bee-91fd-de39bed3f...@case.edu>

  | because `wait -n' doesn't look in the table
  | of saved statuses -- its job is to wait for `new' jobs to terminate, not
  | ones that have already been removed from the table.

That's very interesting, and most unexpected information.

I always wondered why the option was 'n' - I would have made it
be 'a' probably, as a shorthand for "any" - but then I decided
that perhaps 'n' was a better choice, as "a" could also be "all",
the option name would not be providing any real clue at all, so
I assumed you'd been ultra clever and used 'n' as the next char
in "any" and also as it can be read like the first part of "en" "ee"
(which you need to say out loud, or at least in your head, to get the
effect of).

It never even dawned on me that 'n' might mean "new", as in only
processes that hadn't terminated at the time the wait -n was done,
as that's essentially a recipe for script madness, race conditions
galore, as the one reported here.

What wait(1) needed was an alternative to its normal "all" semantic,
just "wait" waits for every background job to terminate, what's needed
is a way to wait for any one of them (whether already terminated, but
not previously waited for or not).   That's what I always assumed
wait -n was doing, and how I implemented it in the NetBSD shell.

Similarly "wait pid1 pid2 pid3" waits for all 3 of those to
terminate, so "wait -n pid1 pid2 pid3" should wait for any one
of them - already terminated or not.   When there's just one pid
in the list, the -n option always seemed useless to me, there
ought be no difference between "wait pid" and "wait -n pid"
(as in wait for all of one, and wait for any of one, mean the
same thing, wait for that one), but obviously should still be
supported for consistency.   To think that it might be interpreted
as "wait for a new process "pid" to terminate, ignoring the one that
just finished a few milliseconds ago" is simply astounding, completely
unbelievable.

And from what I have seen of the other comments, several from
long term & dedicated bash users, it is just as astounding to
them as well.   Please treat this as a bug, and fix it.  Quickly.

kre



Re: wait -n misses signaled subprocess

2024-01-29 Thread Andreas Schwab
On Jan 29 2024, Robert Elz wrote:

> I always wondered why the option was 'n'

n = next?

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



Re: wait -n misses signaled subprocess

2024-01-29 Thread Robert Elz
Date:        Mon, 29 Jan 2024 13:54:10 +0100
From:        Andreas Schwab
Message-ID:  

  | n = next?

That would be a reasonable interpretation, I guess, but
unfortunately not one which helps the current question,
as it doesn't answer "next what?"   It could be "the next
of these processes which terminates" (like the "new"
interpretation) or "the next of these processes that has
a status available" (like the "any" interpretation).

While I'm here, I will also mention that the bash man page
section for wait(1) does say "any" in one place, and equivalent
(but better) wording in another ("a single job"), but never
mentions "new" anywhere.

Further in both the -n and no -n cases, the wait utility is
stated to "wait for" (whatever is appropriate for the args given)
hence the operation should be assumed to be the same in both
cases, either an actual pause is required in both (until some
appropriate process changes status) or is not required in either
(if such a process has already terminated and is waiting for
shell level reaping).

Note that processes that have already been reported (via wait,
or jobs, or the prompt level jobs lookalike) have already been
reported, so if any of that had happened wait isn't expected to
be able to fetch status from them again.

kre



About `M-C-e` expand result `'` failed

2024-01-29 Thread A4-Tacks
Subject: About `M-C-e` expand result `'` failed

Configuration Information [Automatically generated, do not change]:
Machine: aarch64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -march=armv8-a -O2 -pipe -fstack-protector-strong -fno-plt 
-fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security   
  -fstack-clash-protection 
-DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' 
-DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' 
-DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS
uname output: Linux localhost 4.14.116 #1 SMP PREEMPT Tue Mar 22 15:13:10 CST 
2022 aarch64 GNU/Linux
Machine Type: aarch64-unknown-linux-gnu

Bash Version: 5.2
Patch Level: 21
Release Status: release

Description:
 input M-C-e (shell-expand-line),
 expand the result containing `'` (single quotation mark) failed.

Repeat-By:
 ```bash
 # echo "$(echo $'ab\n\'cd\nef')"
 ab
 'cd
 ef
 # echo "$(echo $'ab\n\'cd\nef')"  # input M-C-e
 # echo "$(echo $'ab\n\'cd\nef')"bash: bad substitution: no closing `)' in 
echo "$(echo $'ab\n\'cd\nef')"
 ```
 
 expected:
 ```bash
 # echo $'ab\n\'cd\nef'
 ```
 or
 ```bash
 # echo ab
 'cd
 ef
 ```
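
For comparison, here is a minimal sketch of quoting such a string so that it survives being parsed again. It uses the printf builtin's %q format rather than readline, so it only illustrates the kind of quoting the expected result needs; it is not a description of what shell-expand-line does internally. The variable names are illustrative.

```bash
#!/usr/bin/env bash
# The command substitution result from the report: three lines, one of
# which starts with a single quote.
original=$(echo $'ab\n\'cd\nef')

# %q asks bash's printf to emit the value in a form that can be reused
# as shell input; -v stores the result in a variable instead of printing it.
printf -v quoted '%q' "$original"
printf 'echo %s\n' "$quoted"

# Round trip: re-parsing the quoted form should give back the same string.
eval "copy=$quoted"
[ "$copy" = "$original" ] && echo "round trip ok"
```

The round-trip check is the point of the sketch: whatever quoting style is chosen, the expanded line has to parse back to the original three-line string.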





Re: wait -n misses signaled subprocess

2024-01-29 Thread Greg Wooledge
On Mon, Jan 29, 2024 at 08:52:37PM +0700, Robert Elz wrote:
> Date:        Mon, 29 Jan 2024 13:54:10 +0100
> From:        Andreas Schwab
> Message-ID:  
> 
>   | n = next?

This was my assumption as well.

> That would be a reasonable interpretation, I guess, but
> unfortunately not one which helps the current question,
> as it doesn't answer "next what?"

For the record, with bash 5.2:


unicorn:~$ cat foo
#!/bin/bash

sleep 1 &
sleep 37 &
sleep 2
time wait -n
unicorn:~$ ./foo
real 0.001  user 0.000  sys 0.001
unicorn:~$ ps
    PID TTY          TIME CMD
   1152 pts/3    00:00:00 bash
 542197 pts/3    00:00:00 sleep
 542200 pts/3    00:00:00 ps
unicorn:~$ ps -fp 542197
UID          PID    PPID  C STIME TTY          TIME CMD
greg      542197       1  0 08:59 pts/3    00:00:00 sleep 37


wait -n *does* appear to acknowledge the already-terminated child process,
despite a second child process still being active.



Re: [bash-devel] Attempting to cd to the empty directory operand where ./ does not exist aborts

2024-01-29 Thread Chet Ramey

On 1/29/24 5:51 AM, Kerin Millar wrote:


> $ bash -c 'declare -p BASH_VERSION; cd ""'
> shell-init: error retrieving current directory: getcwd: cannot access
> parent directories: No such file or directory
> declare -- BASH_VERSION="5.3.0(4)-devel"
> chdir: error retrieving current directory: getcwd: cannot access parent
> directories: No such file or directory
> malloc: ./cd.def:619: assertion botched
> malloc: 0x56be197137b0: allocated: last allocated from pathcanon.c:109
> free: start and end chunk sizes differ
> Aborting...Aborted (core dumped)


Thanks for the report. `cd' should not try to canonicalize empty pathnames;
it should just see what chdir(2) returns and go from there.
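
Until a fixed build is in use, a script that might end up passing an empty operand can guard against it itself. A minimal sketch, assuming a wrapper function is acceptable; safe_cd is a made-up name, not part of bash, and it only inspects its first argument, so options such as -P are not handled.

```bash
#!/usr/bin/env bash
# safe_cd is a hypothetical wrapper name; it is not part of bash.
safe_cd() {
    # An operand was supplied but it is the empty string: refuse it here
    # instead of handing it to cd.
    if [ "$#" -gt 0 ] && [ -z "$1" ]; then
        echo "safe_cd: empty directory operand" >&2
        return 1
    fi
    # Everything else (including no operand, which means $HOME) goes to cd.
    cd "$@"
}
```

Calling `safe_cd ""` then fails with status 1 without reaching cd, while `safe_cd /tmp` behaves like plain cd.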


> And, again, with bash 5.2.


I suspect your version of bash-5.2 is built without the bash malloc for
some reason, since I can reproduce this with bash-5.2.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [bash-devel] Attempting to cd to the empty directory operand where ./ does not exist aborts

2024-01-29 Thread Kerin Millar
On Mon, 29 Jan 2024 10:30:43 -0500
Chet Ramey  wrote:

> On 1/29/24 5:51 AM, Kerin Millar wrote:
> 
> > $ bash -c 'declare -p BASH_VERSION; cd ""'
> > shell-init: error retrieving current directory: getcwd: cannot access 
> > parent directories: No such file or directory
> > declare -- BASH_VERSION="5.3.0(4)-devel"
> > chdir: error retrieving current directory: getcwd: cannot access parent 
> > directories: No such file or directory
> > malloc: ./cd.def:619: assertion botched
> > malloc: 0x56be197137b0: allocated: last allocated from pathcanon.c:109
> > free: start and end chunk sizes differ
> > Aborting...Aborted (core dumped)
> 
> Thanks for the report. `cd' should not try to canonicalize empty pathnames;
> it should just see what chdir(2) returns and go from there.
> 
> > And, again, with bash 5.2.
> 
> I suspect your version of bash-5.2 is built without the bash malloc for
> some reason, since I can reproduce this with bash-5.2.

You are quite right.

https://gitlab.archlinux.org/archlinux/packaging/packages/bash/-/blob/c5dfc21dfe74524ca5766af83924cc8c3e3f1a0a/PKGBUILD#L60

-- 
Kerin Millar



Re: wait -n misses signaled subprocess

2024-01-29 Thread Chet Ramey

On 1/28/24 7:19 PM, Steven Pelley wrote:

> Thank you Chet for your thorough reply.
>
> You make a few comments about differences in output (stderr for not
> finding a job, notifications for jobs terminating) and in all cases I
> believe you are correct.  Let's assume job control is disabled.


OK, but remember:

"When job control isn't enabled (usually in a non-interactive shell), the
shell doesn't notify users about terminated background jobs, but it still
removes dead jobs from the jobs list before reading the next command. It
cleans the jobs table of notified jobs at other times, too, to move dead
jobs out of the jobs list and keep it a manageable size."

These exit statuses are still available to `wait pid' (but not `wait -n
pid') as POSIX specifies.
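
A minimal sketch of that `wait pid' case, for reference. It assumes a non-interactive bash and a system where SIGTERM is signal 15; the one-second sleep is just a crude way to make sure the child is already dead when wait is called.

```bash
#!/usr/bin/env bash
# Start a background child and terminate it with SIGTERM.
sleep 30 &
pid=$!
kill -TERM "$pid"

# Let the child die and let the shell notice before asking for its status.
sleep 1

# Plain `wait pid' still reports the saved status: 128 + 15 (SIGTERM) = 143.
wait "$pid"
status=$?
echo "wait $pid returned $status"
```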





> I expect the line ending (BUG) to indicate a return code of 143.


It might, if `wait -n' looked for already-notified jobs in the table of
saved exit statuses, but it doesn't. Should it, even if the user has
already been notified of the status of that job?


> When job control is disabled I get this output for the same test (just
> for consistent reference):


The results are consistent with what I described previously.



> There's no user notification of the job terminating because job
> control is disabled.  The "wait -n" returning 127 is the first
> opportunity the shell might have to notify the user of the job.


So should the shell require the user to periodically run `wait' in a non-
interactive shell without job control to clean dead jobs out of the jobs
list? I don't think so.


> In this context I think that "even if the user has already been notified
> of the status of that job" doesn't apply -- the user hasn't been
> notified of the job terminating.


See above.


> Even so, this behavior differs from a similar example but where the
> first job ends successfully, or at least without being killed by a
> signal.  It still terminates prior to calling "wait -n" (this is from
> Jan 24 but I'll post again to keep everything in a linear thread).
>
> echo "TEST: EXIT 0 PRIOR TO wait -n @${SECONDS}"
> { sleep 1; echo "child finishing @${SECONDS}"; exit 1; } &
> pid=$!
> echo "child proc $pid @${SECONDS}"
> sleep 2
> wait -n $pid
> echo "wait -n $pid return code $? @${SECONDS}"
>
> output (no job control):
> TEST: EXIT 0 PRIOR TO wait -n @0
> child proc 2779 @0
> child finishing @1
> wait -n 2779 return code 1 @2
>
> It does look in the table of saved exit statuses, returning 1.


It doesn't. In this case, the code path it follows marks the job as dead
but doesn't mark it as notified (since it exited normally), so it's still
in the jobs list when `wait -n' is called, and available for returning.
That's probably a bug there.



> I think the sticking point is the notion of the user being notified of
> the status of a job.


I think it's whether or not `wait -n pid' behaves the same as `wait pid'
and looks in the list of saved exit statuses if the pid isn't found in a
job in the jobs list.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: wait -n misses signaled subprocess

2024-01-29 Thread Chet Ramey

On 1/29/24 7:12 AM, Robert Elz wrote:

> Date:        Sun, 28 Jan 2024 18:21:42 -0500
> From:        Chet Ramey
> Message-ID:  <3347f790-529b-4bee-91fd-de39bed3f...@case.edu>
>
>   | because `wait -n' doesn't look in the table
>   | of saved statuses -- its job is to wait for `new' jobs to terminate, not
>   | ones that have already been removed from the table.
>
> That's very interesting, and most unexpected information.
>
> I always wondered why the option was 'n' - I would have made it
> be 'a' probably, as a shorthand for "any" - but then I decided
> that perhaps 'n' was a better choice, as "a" could also be "all",
> the option name would not be providing any real clue at all, so
> I assumed you'd been ultra clever and used 'n' as the next char
> in "any" and also as it can be read like the first part of "en" "ee"
> (which you need to say out loud, or at least in your head, to get the
> effect of).


You should have. You told me about your implementation using `-n' in
10/2017, long before I implemented it (4/2020).


> It never even dawned on me that 'n' might mean "new", as in only
> processes that hadn't terminated at the time the wait -n was done,
> as that's essentially a recipe for script madness, race conditions
> galore, as the one reported here.


What does `wait -n' without job arguments mean?


> What wait(1) needed was an alternative to its normal "all" semantic,
> just "wait" waits for every background job to terminate, what's needed
> is a way to wait for any one of them (whether already terminated, but
> not previously waited for or not).   That's what I always assumed
> wait -n was doing, and how I implemented it in the NetBSD shell.


OK. Since wait without options can already wait for the same pid multiple
times, the -n option has to bring some new functionality here.




> Similarly "wait pid1 pid2 pid3" waits for all 3 of those to
> terminate, so "wait -n pid1 pid2 pid3" should wait for any one
> of them - already terminated or not.


As long as it's still in the jobs list.



> When there's just one pid
> in the list, the -n option always seemed useless to me, there
> ought be no difference between "wait pid" and "wait -n pid"
> (as in wait for all of one, and wait for any of one, mean the
> same thing, wait for that one), but obviously should still be
> supported for consistency.


OK. We can agree there shouldn't be any difference between `wait pid'
and `wait -n pid'.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: wait -n misses signaled subprocess

2024-01-29 Thread Chet Ramey

On 1/29/24 12:07 PM, Chet Ramey wrote:

> On 1/29/24 7:12 AM, Robert Elz wrote:
>
>> Date:        Sun, 28 Jan 2024 18:21:42 -0500
>> From:        Chet Ramey
>> Message-ID:  <3347f790-529b-4bee-91fd-de39bed3f...@case.edu>
>>
>>   | because `wait -n' doesn't look in the table
>>   | of saved statuses -- its job is to wait for `new' jobs to terminate, not
>>   | ones that have already been removed from the table.
>>
>> That's very interesting, and most unexpected information.
>>
>> I always wondered why the option was 'n' - I would have made it
>> be 'a' probably, as a shorthand for "any" - but then I decided
>> that perhaps 'n' was a better choice, as "a" could also be "all",
>> the option name would not be providing any real clue at all, so
>> I assumed you'd been ultra clever and used 'n' as the next char
>> in "any" and also as it can be read like the first part of "en" "ee"
>> (which you need to say out loud, or at least in your head, to get the
>> effect of).
>
> You should have. You told me about your implementation using `-n' in
> 10/2017, long before I implemented it (4/2020).


Sorry, this is my mistake. That was a different feature. Bash implemented
`wait -n' first.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: wait -n misses signaled subprocess

2024-01-29 Thread Chet Ramey

On 1/29/24 12:33 PM, Chet Ramey wrote:


>> You should have. You told me about your implementation using `-n' in
>> 10/2017, long before I implemented it (4/2020).
>
> Sorry, this is my mistake. That was a different feature. Bash implemented
> `wait -n' first.


For those wondering, the `different feature' was having `wait -n' pay
attention to its pid/job arguments.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: wait -n misses signaled subprocess

2024-01-29 Thread Chet Ramey

On 1/29/24 7:54 AM, Andreas Schwab wrote:

> On Jan 29 2024, Robert Elz wrote:
>
>> I always wondered why the option was 'n'
>
> n = next?


Yes: the original implementation polled the non-terminated background jobs
and returned when one of them exited.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: wait -n misses signaled subprocess

2024-01-29 Thread Robert Elz
Date:        Mon, 29 Jan 2024 12:07:53 -0500
From:        Chet Ramey
Message-ID:  

  | What does `wait -n' without job arguments mean?

Find, or if there are none already, wait*(2) for, a process (job technically)
that has changed state (terminated in POSIX, and one day in the NetBSD
shell, that difference isn't relevant here) and return its status.
If there's already a terminated job (job which has changed status in bash)
then no wait type sys call gets performed (that already happened).

It also returns the status of that process, rather than simple "0" which
a bare "wait" does (and with the appropriate arg, tells you which process
it was).

  | OK. Since wait without options can already wait for the same pid multiple
  | times, the -n option has to bring some new functionality here.

Yes, without options, it waits until all listed arg processes (jobs) are
finished (or changed state) and returns the status of the last.   With -n
it waits for any one of them, just as the bash man page says it will.
The "any one" (vs "all") is the new functionality.
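
The "all" case can be sketched like this. It is only an illustration of the POSIX rule that wait with several operands returns the status of the last one listed; the exit values 4 and 9 and the variable names are arbitrary.

```bash
#!/usr/bin/env bash
# Two children with distinct, known exit statuses.
( exit 4 ) &
first=$!
( sleep 1; exit 9 ) &
second=$!

# With several operands, wait returns only after all of them have finished;
# its exit status is that of the last operand listed.
wait "$first" "$second"
status=$?
echo "wait returned $status"
```

The status of the first child (4) is not reported by this call, which is the gap that an "any one" mode with per-process status is meant to fill.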

  | As long as it's still in the jobs list.

Yes, of course - the final para of my message covered that case.

  | OK. We can agree there shouldn't be any difference between `wait pid'
  | and `wait -n pid'.

Yes, but just because that's a degenerate case of the more general commands,
which happens in each case to devolve into the same thing.

And from a different message:

chet.ra...@case.edu said:
  | So should the shell require the user to periodically run `wait' in a non-
  | interactive shell without job control to clean dead jobs out of the jobs
  | list? I don't think so. 

I do.   wait or jobs ("jobs >/dev/null" is a nice simple clean up, without
the potential hang waiting for things to terminate that the wait utility
imposes).   A new option to wait(1) (either a simple one, perhaps -t, to
only wait for already terminated jobs, or a timeout, where 0 indicates never
to wait at all (ie: don't do the wait sys call) which would be a more
general, but more costly, mechanism).   But as long as it is just a matter
of cleaning up, and jobs works for that, I don't currently see the need.

Of course, you're also allowed to dump processes from the lists if there
get to be too many of them, but on modern systems, it really should be
possible to retain hundreds, if not thousands, without any real problem.

And of course, you're not required to retain status of any job if there's
no way that the script can request it - but determining that these days is
difficult.  It used to be easy in the Sys V/POSIX model where if $! wasn't
saved, then there was no way for the script to request the status, as it
couldn't (reasonably - parsing job trees from ps output doesn't count) find
out the pid to wait for (and simple "wait" never returns any status).

These days, with the jobs command available, a script could do
pids=$(jobs -l | code to parse the output and print the pids)
and determine what it can wait for that way (the code isn't difficult)
- and it can also wait on %1 %2 ... without having any idea what the pids
might be, so in practice adding the (non-trivial) code to monitor references
to $! isn't worth the bother (IMO).
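
As an illustration of how little a script needs in order to recover the pids, here is a minimal sketch using the -p option of jobs, which prints process IDs only, in place of parsing `jobs -l' output. It assumes bash, where jobs can be used inside a command substitution; the two sleep commands are stand-ins for real work.

```bash
#!/usr/bin/env bash
# Two background jobs whose pids are deliberately not saved from $!.
sleep 30 &
sleep 30 &

# jobs -p prints process IDs only, so no parsing of `jobs -l' is needed.
pids=$(jobs -p)

# Split the list on whitespace and count the entries.
set -- $pids
count=$#
echo "found $count background jobs: $*"

# Clean up: terminate the jobs, then reap them.
kill "$@"
wait
```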

It's also a bit unusual for non-interactive code to run lots of async jobs
without waiting for results - doing that is a sure way to run into the
"max user processes" limit, and have things start failing.   If there are
less than that, then having the shell retain the info until the script
terminates isn't really a very big cost, should the script not bother to
ever clean up.

  | I think it's whether or not `wait -n pid' behaves the same as `wait pid' and
  | looks in the list of saved exit statuses if the pid isn't found in a job in
  | the jobs list. 

We have it simpler than that, there's just one list, which serves both
purposes.  Makes things easier I believe (in all three of: shell code, shell
doc, and user understanding), even if it does consume a few more bytes for
a little longer than is really needed (jobs needs the command strings, so
they can be printed, wait doesn't, so retaining that is an extra cost ... not
one large enough for anyone to have ever noticed though).

kre