On Mon, Jul 17, 2023 at 3:29 PM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 7/7/23 5:05 PM, Grisha Levit wrote:
> > A few small tweaks for the macOS-specific normalization handling to
> > handle the issues below:
>
> The issue is that the behavior has to be different between cases where
> the shell is reading input from the terminal and gets NFC characters
> that need to be converted to NFD (which is how HFS+ and APFS store them)
> and when the shell is reading input from a file and doesn't need to (and
> should not) do anything with NFD characters.

NB: while HFS+ stores NFD names, APFS preserves normalization, so we can
get either NFC or NFD text back from readdir. Both are
normalization-insensitive:

"Being normalization-insensitive ensures that normalization variants of
a filename cannot be created in the same directory, and that a filename
can be found with any of its normalization variants." [1]

Currently, Bash never actually converts to NFD. The fnx_tofs() function
is there but it is never used. Instead, Bash converts filenames to NFC
with fnx_fromfs() before comparing with either the glob pattern or the
completion hint text (which is never converted).

Since access is normalization-insensitive, we just need to normalize to
_some_ form, so going to NFC is fine, but if we're going to do that we
should normalize both the filesystem name and the text being compared.

If there's a match, globs expand to the filenames (NFC or NFD) as
returned by readdir(), and Readline completes with NFC-normalized
versions of the names. I think this makes sense.

What doesn't work quite right currently though is that glob patterns
with NFD text never match anything, and completion prefixes with NFD
text never expand to anything.

[1]: https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html

> Does iconv work when taking NFD input that came from the file system and
> trying to convert it to NFD (UTF-8-MAC)? I've honestly never checked.

Converting to UTF-8-MAC always normalizes to NFD:

    $ printf '\303\251\0\145\314\201' | iconv -f UTF-8-MAC -t UTF-8-MAC | od -b -An
     145 314 201 000 145 314 201
    $ printf '\303\251\0\145\314\201' | iconv -f UTF-8 -t UTF-8-MAC | od -b -An
     145 314 201 000 145 314 201

But Bash only converts from UTF-8-MAC to UTF-8, which always normalizes
to NFC:

    $ printf '\303\251\0\145\314\201' | iconv -f UTF-8-MAC -t UTF-8 | od -b -An
     303 251 000 303 251
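
To illustrate the "normalize both sides" suggestion above, here is a
rough sketch in Python rather than in Bash's own C. It is not Bash's
code: the function names are made up for the example, and Python's
unicodedata module stands in for iconv's UTF-8-MAC conversion, which may
not agree with it for every code point. Glob-style matching returns the
name exactly as the directory listing gave it, and completion returns
the NFC form, as described above.

```python
import fnmatch
import unicodedata


def to_nfc(text):
    """Normalize text to NFC (assumed stand-in for UTF-8-MAC -> UTF-8)."""
    return unicodedata.normalize("NFC", text)


def glob_match(pattern, dir_entries):
    """Hypothetical glob: compare NFC(pattern) against NFC(name).

    Matching names are returned unchanged, i.e. in whatever form
    (NFC or NFD) the directory listing produced them.
    """
    nfc_pattern = to_nfc(pattern)
    return [name for name in dir_entries
            if fnmatch.fnmatchcase(to_nfc(name), nfc_pattern)]


def complete(prefix, dir_entries):
    """Hypothetical completion: normalize both sides, return NFC names."""
    nfc_prefix = to_nfc(prefix)
    return [nfc_name for nfc_name in map(to_nfc, dir_entries)
            if nfc_name.startswith(nfc_prefix)]


if __name__ == "__main__":
    nfd_entry = "cafe\u0301.txt"          # 'e' + COMBINING ACUTE ACCENT
    entries = [nfd_entry, "plain.txt"]
    # Both spellings of the pattern find the same directory entry.
    print(glob_match("caf\u00e9*", entries) == glob_match("cafe\u0301*", entries))
    # An NFD prefix still completes, and the result is NFC.
    print(complete("cafe\u0301", entries) == ["caf\u00e9.txt"])
```

With only the filesystem name normalized, as happens today, the NFD
pattern and the NFD prefix in this example would match nothing, which is
the failure described above.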