[bash-devel] Attempting to cd to the empty directory operand where ./ does not exist aborts
Hi,

This is with commit 138f3cc3591163d18ee4b6390ecd6894d5d16977 running on Linux 6.7.2 and glibc-2.38.

$ mkdir -p ~/dir && cd ~/dir && rmdir ~/dir
$ cd ""
bash: cd: : No such file or directory

So far, so good. Now let's try to cd from a non-interactive instance.

$ bash -c 'declare -p BASH_VERSION; cd ""'
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
declare -- BASH_VERSION="5.3.0(4)-devel"
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
malloc: ./cd.def:619: assertion botched
malloc: 0x56be197137b0: allocated: last allocated from pathcanon.c:109
free: start and end chunk sizes differ
Aborting...Aborted (core dumped)

And, again, with bash 5.2.

$ /bin/bash -c 'declare -p BASH_VERSION; cd ""'
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
declare -- BASH_VERSION="5.2.26(1)-release"
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

--
Kerin Millar
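For anyone who wants to repeat the experiment without touching ~/dir, the steps in the report can be wrapped in a throwaway script. This is only a sketch, assuming a Linux system where a directory can be removed while a process is still inside it; the `scratch` and `status` variable names are illustrative and not part of the report. What the child shell prints, and whether it aborts, depends on the bash build being tested.

```bash
#!/bin/bash
# Sketch only: repeat the report's steps inside a temporary directory.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
rmdir "$scratch"          # the working directory no longer exists

# Run cd "" in a non-interactive child and record how it ended.
status=0
bash -c 'cd ""' || status=$?
echo "child bash exit status: $status"

cd /                      # return to a directory that exists
```

A status above 128 means the child was killed by a signal; on Linux an abort shows up as 134, which is 128 plus 6 for SIGABRT.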
Re: wait -n misses signaled subprocess
Date: Sun, 28 Jan 2024 18:21:42 -0500
From: Chet Ramey
Message-ID: <3347f790-529b-4bee-91fd-de39bed3f...@case.edu>

| because `wait -n' doesn't look in the table
| of saved statuses -- its job is to wait for `new' jobs to terminate, not
| ones that have already been removed from the table.

That's very interesting, and most unexpected information.

I always wondered why the option was 'n' - I would have made it be 'a' probably, as a shorthand for "any" - but then I decided that perhaps 'n' was a better choice, as "a" could also be "all", the option name would not be providing any real clue at all, so I assumed you'd been ultra clever and used 'n' as the next char in "any" and also as it can be read like the first part of "en" "ee" (which you need to say out loud, or at least in your head, to get the effect of).

It never even dawned on me that 'n' might mean "new", as in only processes that hadn't terminated at the time the wait -n was done, as that's essentially a recipe for script madness, race conditions galore, as the one reported here.

What wait(1) needed was an alternative to its normal "all" semantic, just "wait" waits for every background job to terminate, what's needed is a way to wait for any one of them (whether already terminated, but not previously waited for or not). That's what I always assumed wait -n was doing, and how I implemented it in the NetBSD shell.

Similarly "wait pid1 pid2 pid3" waits for all 3 of those to terminate, so "wait -n pid1 pid2 pid3" should wait for any one of them - already terminated or not.

When there's just one pid in the list, the -n option always seemed useless to me, there ought be no difference between "wait pid" and "wait -n pid" (as in wait for all of one, and wait for any of one, mean the same thing, wait for that one), but obviously should still be supported for consistency.

To think that it might be interpreted as "wait for a new process "pid" to terminate, ignoring the one that just finished a few milliseconds ago" is simply astounding, completely unbelievable.

And from what I have seen of the other comments, several from long term & dedicated bash users, it is just as astounding to them as well.

Please treat this as a bug, and fix it. Quickly.

kre
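The "all" versus "any one" distinction drawn in this message can be seen in the case nobody in the thread disputes, where both children are still running when `wait -n` is called. A small sketch, assuming a bash recent enough to honour pid arguments after `-n`; the subshells, exit codes and variable names are made up for illustration.

```bash
#!/bin/bash
# Two background children with distinct exit statuses (illustrative values).
( sleep 1; exit 3 ) & fast=$!
( sleep 3; exit 5 ) & slow=$!

# "any one": returns as soon as one of the listed children finishes.
first=0
wait -n "$fast" "$slow" || first=$?

# Waiting for one specific child returns that child's status.
second=0
wait "$slow" || second=$?

echo "wait -n returned $first, wait returned $second"
```

The disagreement in the thread is about what the first call should do when the faster child has already exited and been cleaned out of the jobs list before `wait -n` runs; this sketch deliberately avoids that situation.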
Re: wait -n misses signaled subprocess
On Jan 29 2024, Robert Elz wrote:

> I always wondered why the option was 'n'

n = next?

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Re: wait -n misses signaled subprocess
Date: Mon, 29 Jan 2024 13:54:10 +0100
From: Andreas Schwab
Message-ID:

| n = next?

That would be a reasonable interpretation, I guess, but unfortunately not one which helps the current question, as it doesn't answer "next what?"

It could be "the next of these processes which terminates" (like the "new" interpretation) or "the next of these processes that has a status available" (like the "any" interpretation).

While I'm here, I will also mention that the bash man page section for wait(1) does say "any" in one place, and equivalent (but better) wording in another ("a single job"), but never mentions "new" anywhere. Further in both the -n and no -n cases, the wait utility is stated to "wait for" (whatever is appropriate for the args given) hence the operation should be assumed to be the same in both cases, either an actual pause is required in both (until some appropriate process changes status) or is not required in either (if such a process has already terminated and is waiting for shell level reaping).

Note that processes that have already been reported (via wait, or jobs, or the prompt level jobs lookalike) have already been reported, so if any of that had happened wait isn't expected to be able to fetch status from them again.

kre
About `M-C-e` expand result `'` failed
Subject: About `M-C-e` expand result `'` failed

Configuration Information [Automatically generated, do not change]:
Machine: aarch64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -march=armv8-a -O2 -pipe -fstack-protector-strong -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' -DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' -DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS
uname output: Linux localhost 4.14.116 #1 SMP PREEMPT Tue Mar 22 15:13:10 CST 2022 aarch64 GNU/Linux
Machine Type: aarch64-unknown-linux-gnu

Bash Version: 5.2
Patch Level: 21
Release Status: release

Description:
Pressing M-C-e (shell-expand-line) fails when the expansion result contains `'` (a single quotation mark).

Repeat-By:
```bash
# echo "$(echo $'ab\n\'cd\nef')"
ab
'cd
ef
# echo "$(echo $'ab\n\'cd\nef')" # input M-C-e
# echo "$(echo $'ab\n\'cd\nef')"bash: bad substitution: no closing `)' in echo "$(echo $'ab\n\'cd\nef')"
```

expected:
```bash
# echo $'ab\n\'cd\nef'
```
or
```bash
# echo ab 'cd ef
```
Re: wait -n misses signaled subprocess
On Mon, Jan 29, 2024 at 08:52:37PM +0700, Robert Elz wrote:
> Date: Mon, 29 Jan 2024 13:54:10 +0100
> From: Andreas Schwab
> Message-ID:
>
> | n = next?

This was my assumption as well.

> That would be a reasonable interpretation, I guess, but
> unfortunately not one which helps the current question,
> as it doesn't answer "next what?"

For the record, with bash 5.2:

unicorn:~$ cat foo
#!/bin/bash
sleep 1 &
sleep 37 &
sleep 2
time wait -n
unicorn:~$ ./foo
real 0.001
user 0.000
sys 0.001
unicorn:~$ ps
    PID TTY          TIME CMD
   1152 pts/3    00:00:00 bash
 542197 pts/3    00:00:00 sleep
 542200 pts/3    00:00:00 ps
unicorn:~$ ps -fp 542197
UID          PID    PPID  C STIME TTY          TIME CMD
greg      542197       1  0 08:59 pts/3    00:00:00 sleep 37

wait -n *does* appear to acknowledge the already-terminated child process, despite a second child process still being active.
Re: [bash-devel] Attempting to cd to the empty directory operand where ./ does not exist aborts
On 1/29/24 5:51 AM, Kerin Millar wrote:

> $ bash -c 'declare -p BASH_VERSION; cd ""'
> shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
> declare -- BASH_VERSION="5.3.0(4)-devel"
> chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
> malloc: ./cd.def:619: assertion botched
> malloc: 0x56be197137b0: allocated: last allocated from pathcanon.c:109
> free: start and end chunk sizes differ
> Aborting...Aborted (core dumped)

Thanks for the report. `cd' should not try to canonicalize empty pathnames; it should just see what chdir(2) returns and go from there.

> And, again, with bash 5.2.

I suspect your version of bash-5.2 is built without the bash malloc for some reason, since I can reproduce this with bash-5.2.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: [bash-devel] Attempting to cd to the empty directory operand where ./ does not exist aborts
On Mon, 29 Jan 2024 10:30:43 -0500 Chet Ramey wrote:

> On 1/29/24 5:51 AM, Kerin Millar wrote:
>
> > $ bash -c 'declare -p BASH_VERSION; cd ""'
> > shell-init: error retrieving current directory: getcwd: cannot access
> > parent directories: No such file or directory
> > declare -- BASH_VERSION="5.3.0(4)-devel"
> > chdir: error retrieving current directory: getcwd: cannot access parent
> > directories: No such file or directory
> > malloc: ./cd.def:619: assertion botched
> > malloc: 0x56be197137b0: allocated: last allocated from pathcanon.c:109
> > free: start and end chunk sizes differ
> > Aborting...Aborted (core dumped)
>
> Thanks for the report. `cd' should not try to canonicalize empty pathnames;
> it should just see what chdir(2) returns and go from there.
>
> > And, again, with bash 5.2.
>
> I suspect your version of bash-5.2 is built without the bash malloc for
> some reason, since I can reproduce this with bash-5.2.

You are quite right.

https://gitlab.archlinux.org/archlinux/packaging/packages/bash/-/blob/c5dfc21dfe74524ca5766af83924cc8c3e3f1a0a/PKGBUILD#L60

--
Kerin Millar
Re: wait -n misses signaled subprocess
On 1/28/24 7:19 PM, Steven Pelley wrote:

Thank you Chet for your thorough reply. You make a few comments about differences in output (stderr for not finding a job, notifications for jobs terminating) and in all cases I believe you are correct. Let's assume job control is disabled.

OK, but remember: "When job control isn't enabled (usually in a non-interactive shell), the shell doesn't notify users about terminated background jobs, but it still removes dead jobs from the jobs list before reading the next command. It cleans the jobs table of notified jobs at other times, too, to move dead jobs out of the jobs list and keep it a manageable size." These exit statuses are still available to `wait pid' (but not `wait -n pid') as POSIX specifies.

I expect the line ending (BUG) to indicate a return code of 143.

It might, if `wait -n' looked for already-notified jobs in the table of saved exit statuses, but it doesn't. Should it, even if the user has already been notified of the status of that job?

When job control is disabled I get this output for the same test (just for consistent reference):

The results are consistent with what I described previously.

There's no user notification of the job terminating because job control is disabled. The "wait -n" returning 127 is the first opportunity the shell might have to notify the user of the job.

So should the shell require the user to periodically run `wait' in a non-interactive shell without job control to clean dead jobs out of the jobs list? I don't think so.

In this context I think that "even if the user has already been notified of the status of that job" doesn't apply -- the user hasn't been notified of the job terminating.

See above.

Even so, this behavior differs from a similar example but where the first job ends successfully, or at least without being killed by a signal. It still terminates prior to calling "wait -n" (this is from Jan 24 but I'll post again to keep everything in a linear thread).
echo "TEST: EXIT 0 PRIOR TO wait -n @${SECONDS}"
{ sleep 1; echo "child finishing @${SECONDS}"; exit 1; } &
pid=$!
echo "child proc $pid @${SECONDS}"
sleep 2
wait -n $pid
echo "wait -n $pid return code $? @${SECONDS}"

output (no job control):
TEST: EXIT 0 PRIOR TO wait -n @0
child proc 2779 @0
child finishing @1
wait -n 2779 return code 1 @2

It does look in the table of saved exit statuses, returning 1.

It doesn't. In this case, the code path it follows marks the job as dead but doesn't mark it as notified (since it exited normally), so it's still in the jobs list when `wait -n' is called, and available for returning. That's probably a bug there.

I think the sticking point is the notion of the user being notified of the status of a job.

I think it's whether or not `wait -n pid' behaves the same as `wait pid' and looks in the list of saved exit statuses if the pid isn't found in a job in the jobs list.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
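For reference, the 143 mentioned in this exchange is 128 plus 15, the signal number of SIGTERM. The sketch below shows the plain `wait pid` form picking up that status after the child is already gone, which is the behaviour the thread contrasts with `wait -n pid`. It assumes a non-interactive bash without job control; the `sleep` durations and variable names are illustrative only.

```bash
#!/bin/bash
# Start a child, terminate it with SIGTERM, and give the shell time to reap it.
sleep 30 & pid=$!
kill -TERM "$pid"
sleep 1                   # the child has terminated before we wait for it

# Plain `wait pid` still reports the saved status.
status=0
wait "$pid" || status=$?
echo "wait $pid return code $status"
```

Replacing `wait "$pid"` with `wait -n "$pid"` turns this into the signaled-child case that the subject line refers to.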
Re: wait -n misses signaled subprocess
On 1/29/24 7:12 AM, Robert Elz wrote:

> Date: Sun, 28 Jan 2024 18:21:42 -0500
> From: Chet Ramey
> Message-ID: <3347f790-529b-4bee-91fd-de39bed3f...@case.edu>
>
> | because `wait -n' doesn't look in the table
> | of saved statuses -- its job is to wait for `new' jobs to terminate, not
> | ones that have already been removed from the table.
>
> That's very interesting, and most unexpected information.
>
> I always wondered why the option was 'n' - I would have made it be 'a' probably, as a shorthand for "any" - but then I decided that perhaps 'n' was a better choice, as "a" could also be "all", the option name would not be providing any real clue at all, so I assumed you'd been ultra clever and used 'n' as the next char in "any" and also as it can be read like the first part of "en" "ee" (which you need to say out loud, or at least in your head, to get the effect of).

You should have. You told me about your implementation using `-n' in 10/2017, long before I implemented it (4/2020).

> It never even dawned on me that 'n' might mean "new", as in only processes that hadn't terminated at the time the wait -n was done, as that's essentially a recipe for script madness, race conditions galore, as the one reported here.

What does `wait -n' without job arguments mean?

> What wait(1) needed was an alternative to its normal "all" semantic, just "wait" waits for every background job to terminate, what's needed is a way to wait for any one of them (whether already terminated, but not previously waited for or not). That's what I always assumed wait -n was doing, and how I implemented it in the NetBSD shell.

OK. Since wait without options can already wait for the same pid multiple times, the -n option has to bring some new functionality here.

> Similarly "wait pid1 pid2 pid3" waits for all 3 of those to terminate, so "wait -n pid1 pid2 pid3" should wait for any one of them - already terminated or not.

As long as it's still in the jobs list.

> When there's just one pid in the list, the -n option always seemed useless to me, there ought be no difference between "wait pid" and "wait -n pid" (as in wait for all of one, and wait for any of one, mean the same thing, wait for that one), but obviously should still be supported for consistency.

OK. We can agree there shouldn't be any difference between `wait pid' and `wait -n pid'.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: wait -n misses signaled subprocess
On 1/29/24 12:07 PM, Chet Ramey wrote:

> On 1/29/24 7:12 AM, Robert Elz wrote:
>
> > Date: Sun, 28 Jan 2024 18:21:42 -0500
> > From: Chet Ramey
> > Message-ID: <3347f790-529b-4bee-91fd-de39bed3f...@case.edu>
> >
> > | because `wait -n' doesn't look in the table
> > | of saved statuses -- its job is to wait for `new' jobs to terminate, not
> > | ones that have already been removed from the table.
> >
> > That's very interesting, and most unexpected information.
> >
> > I always wondered why the option was 'n' - I would have made it be 'a' probably, as a shorthand for "any" - but then I decided that perhaps 'n' was a better choice, as "a" could also be "all", the option name would not be providing any real clue at all, so I assumed you'd been ultra clever and used 'n' as the next char in "any" and also as it can be read like the first part of "en" "ee" (which you need to say out loud, or at least in your head, to get the effect of).
>
> You should have. You told me about your implementation using `-n' in 10/2017, long before I implemented it (4/2020).

Sorry, this is my mistake. That was a different feature. Bash implemented `wait -n' first.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: wait -n misses signaled subprocess
On 1/29/24 12:33 PM, Chet Ramey wrote:

> > You should have. You told me about your implementation using `-n' in 10/2017, long before I implemented it (4/2020).
>
> Sorry, this is my mistake. That was a different feature. Bash implemented `wait -n' first.

For those wondering, the `different feature' was having `wait -n' pay attention to its pid/job arguments.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: wait -n misses signaled subprocess
On 1/29/24 7:54 AM, Andreas Schwab wrote:

> On Jan 29 2024, Robert Elz wrote:
>
> > I always wondered why the option was 'n'
>
> n = next?

Yes: the original implementation polled the non-terminated background jobs and returned when one of them exited.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: wait -n misses signaled subprocess
Date: Mon, 29 Jan 2024 12:07:53 -0500
From: Chet Ramey
Message-ID:

| What does `wait -n' without job arguments mean?

Find, or if there are none already, wait*(2) for, a process (job technically) that has changed state (terminated in POSIX, and one day in the NetBSD shell, that difference isn't relevant here) and return its status. If there's already a terminated job (job which has changed status in bash) then no wait type sys call gets performed (that already happened). It also returns the status of that process, rather than simple "0" which a bare "wait" does (and with the appropriate arg, tells you which process it was).

| OK. Since wait without options can already wait for the same pid multiple
| times, the -n option has to bring some new functionality here.

Yes, without args, it waits until all listed arg processes (jobs) are finished (or changed state) and returns the status of the last. With -n it waits for any one of them, just as the bash man page says it will. The "any one" (vs "all") is the new functionality.

| As long as it's still in the jobs list.

Yes, of course - the final para of my message covered that case.

| OK. We can agree there shouldn't be any difference between `wait pid'
| and `wait -n pid'.

Yes, but just because that's a degenerate case of the more general commands, which happens in each case to devolve into the same thing.

And from a different message:

chet.ra...@case.edu said:
| So should the shell require the user to periodically run `wait' in a non-
| interactive shell without job control to clean dead jobs out of the jobs
| list? I don't think so.

I do. wait or jobs ("jobs >/dev/null" is a nice simple clean up, without the potential hang waiting for things to terminate that the wait utility imposes).
A new option to wait(1) (either a simple one, perhaps -t, to only wait for already terminated jobs, or a timeout, where 0 indicates never to wait at all (ie: don't do the wait sys call) which would be a more general, but more costly, mechanism). But as long as it is just a matter of cleaning up, and jobs works for that, I don't currently see the need.

Of course, you're also allowed to dump processes from the lists if there get to be too many of them, but on modern systems, it really should be possible to retain hundreds, if not thousands, without any real problem.

And of course, you're not required to retain status of any job if there's no way that the script can request it - but determining that these days is difficult. It used to be easy in the Sys V/POSIX model where if $! wasn't saved, then there was no way for the script to request the status, as it couldn't (reasonably - parsing job trees from ps output doesn't count) find out the pid to wait for (and simple "wait" never returns any status). These days, with the jobs command available, a script could do pids=$(jobs -l | code to parse the output and print the pids) and determine what it can wait for that way (the code isn't difficult) - and it can also wait on %1 %2 ... without having any idea what the pids might be, so in practice adding the (non-trivial) code to monitor references to $! isn't worth the bother (IMO).

It's also a bit unusual for non-interactive code to run lots of async jobs without waiting for results - doing that is a sure way to run into the "max user processes" limit, and have things start failing. If there are less than that, then having the shell retain the info until the script terminates isn't really a very big cost, should the script not bother to ever clean up.

| I think it's whether or not `wait -n pid' behaves the same as `wait pid' and
| looks in the list of saved exit statuses if the pid isn't found in a job in
| the jobs list.
We have it simpler than that, there's just one list, which serves both purposes. Makes things easier I believe (in all three of: shell code, shell doc, and user understanding), even if it does consume a few more bytes for a little longer than is really needed (jobs needs the command strings, so they can be printed, wait doesn't, so retaining that is an extra cost ... not one large enough for anyone to have ever noticed though).

kre
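The idea of a script discovering its outstanding jobs without having saved $!, described in this message as parsing `jobs -l` output, can be sketched in bash with `jobs -p`, which lists only the process ID of each job's process group leader and so needs no parsing. This swaps in a different option from the one kre names and shows bash rather than the NetBSD shell; the variable names are illustrative.

```bash
#!/bin/bash
# Start a couple of background jobs without saving $!.
sleep 5 &
sleep 5 &

# Ask the shell which pids it is still tracking.
pids=$(jobs -p)
set -- $pids              # split the list into positional parameters
count=$#
echo "tracking $count jobs: $*"

# Clean up: terminate the jobs, then collect them with a bare wait.
kill "$@"
wait
```

The same list could equally be fed to `wait "$@"` or, one pid at a time, to `wait -n`, subject to the behaviour discussed in this thread.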