On 10/17/24 6:44 PM, Greg Wooledge wrote:
This issue came up on the Libera #bash IRC channel today:
Between bash 4.4 and 5.0, the definition of "IFS whitespace" has apparently been expanded:
POSIX defines whitespace as a character in the current locale's `space' character class, or a byte for which isspace() returns true. The word splitting section references this definition, but leaves it up to the application whether or not characters besides space/tab/newline are considered IFS whitespace when they appear in $IFS.

At the time (previous edition of the standard), POSIX defined whitespace as "In the POSIX locale, white space consists of one or more <blank> ( <space> and <tab> characters), <newline>, <carriage-return>, <form-feed>, and <vertical-tab> characters." The word splitting section wasn't quite as rigorous as the current version's, but it referenced this definition.

However, the conformance suite tests for this. Before bash-5.0, Oracle contacted me about the results of their running bash-4.4 through the conformance suite (they were considering shipping the next version of Solaris with bash as the POSIX shell and wanted it to pass the tests). Now, I had not run bash through this test suite myself -- that came later -- so I took them at their word. There were a couple of `read' tests for exactly this, including making sure that leading and trailing whitespace got stripped if the (non-space/tab/newline) characters were in $IFS.

So I changed it -- the test suite, something that companies have to pay to take and want to pass, was supposed to reflect the normative text -- and shipped bash-5.0. Oracle was happy, this was a minor change that affected few people, and then Oracle canceled Solaris 12 and decided to stick with Solaris 11 forever.
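For anyone who wants to see the behavior those `read' tests exercise, here is a minimal sketch. It is not taken from the conformance suite; the helper name read_with_ifs is made up for illustration. It reads one line into a single variable with a given IFS and prints the result in shell-quoted form, so any surviving form feeds are visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the conformance suite): read one line with
# the given IFS into a single variable and print it in shell-quoted form.
read_with_ifs() {
    local ifs=$1 input=$2 field
    IFS=$ifs read -r field <<< "$input"
    printf '%q\n' "$field"
}

# Space/tab/newline in IFS: leading and trailing blanks are stripped
# in every bash version.
read_with_ifs $' \t\n' '  hello  '

# Form feed as the only IFS character: whether the leading and trailing
# form feeds are stripped depends on whether this bash treats form feed
# as IFS whitespace (the change discussed in this thread).
read_with_ifs $'\f' $'\f\fhello\f\f'
```

The first call prints hello. The second call is the interesting one: compare its output across the bash versions you have installed rather than relying on a fixed expectation.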
In bash 4.4 and earlier, IFS whitespace is always space + tab + newline. But in 5.0 and later, it's "whatever the locale's isspace() allows",
Yep, as POSIX specifies.
along with some kind of 0x00 to 0x7f range check (thanks emanuele6).
The comment in locale_setblanks explains this: some systems, like macOS, return true from isspace() for characters between 0x80 and 0xff even though they introduce multibyte characters (every locale besides "C" in macOS uses UTF-8 encoding). Grisha reported it: https://lists.gnu.org/archive/html/bug-bash/2023-05/msg00132.html
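As a rough illustration of the set of characters in question, the sketch below lists which single-byte values in the 0x01 to 0x7f range match the current locale's [[:space:]] class. This is only a shell approximation and not bash's locale_setblanks code; the function name ascii_space_bytes is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the decimal codes of the characters in
# 0x01..0x7f that the current locale's [[:space:]] class matches.
ascii_space_bytes() {
    local i c out=()
    for ((i = 1; i < 128; i++)); do
        # Build the character from its octal code. NUL (0) is skipped
        # because it cannot be stored in a shell variable.
        printf -v c '%b' "\\0$(printf '%o' "$i")"
        if [[ $c == [[:space:]] ]]; then
            out+=("$i")
        fi
    done
    echo "${out[*]}"
}

ascii_space_bytes
```

In the C locale this lists 9 10 11 12 13 32, i.e. tab, newline, vertical tab, form feed, carriage return, and space, which matches the older POSIX wording quoted earlier. Values at 0x80 and above are deliberately left out, mirroring the range check mentioned above.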
Now, that's not necessarily bad, but the man page still says:
Yes, the man page and info file need to evolve in the same manner as the standard: define whitespace and then reference it as needed.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/