Hi Branden,

G. Branden Robinson wrote on Thu, Dec 06, 2018 at 11:17:11PM -0500:
> At 2018-12-06T18:44:18+0100, Ingo Schwarze wrote:

>> -	| tr '[:cntrl:]' ' '"
>> +	| tr '[:cntrl:]' '[ *32]'"

> This might not be portable _enough_.
>
> The number of characters in the class :cntrl: is locale-dependent; you
> are only guaranteed 32 such codepoints if LC_CTYPE=C (that is, ASCII).

I expected that even for exotic locales, the newline character would
likely be among the first 32 found, but maybe you are right that
this isn't guaranteed.

> POSIX says that the repeat count in the second argument to tr can be
> omitted, and the transliteration target will grow to fit the size of the
> source:
>
> https://pubs.opengroup.org/onlinepubs/009695399/utilities/tr.html

Interesting, i missed that.
The version you are linking to is outdated, but the current version
says the same:

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html

> ...on the other hand, Solaris's relationship with POSIX has been
> difficult at best,

True, but...

> so I wouldn't be surprised if omitting the repeat
> count is disallowed in its implementation.

... actually, omitting the repeat count works on Solaris 9 to 11.

> But I know nothing about the
> limitations of historical versions of tr.

Using something that is standard-conforming (like "[ *]") is clearly
better than using something that is unspecified (like ' '), even
if we don't know which historical systems support it.

> Another approach would be to force LC_CTYPE=C in the pipeline before
> calling tr.
>
> So either:
>
> | tr '[:cntrl:]' '[ *]'"
>
> or:
>
> | LC_CTYPE=C tr '[:cntrl:]' '[ *32]'"
>
> perhaps?

Is that safe?  I figure it might have catastrophic results if the
system default locale happens to be UTF-32 or something like that,
but i'm not sure.  In any case, not all character encodings are
supersets of ASCII.

So, i see three standard-conforming options that work on all of
Linux, OpenBSD, Solaris 11, Solaris 10, and Solaris 9:

 [1] tr '[:cntrl:]' '[ *32]'
     Slight risk that on some system in some locale, the newline
     might not be among the first 32 control characters.

 [2] tr '[:cntrl:]' '[ *]'
     Slight risk that some system might not support it.

 [3] tr '\\\\n' ' '
     Slight risk that on some system, we might get "\r\n" or "\r" -
     not sure that fear makes sense, it might be FUD.
     Also a slight risk that replacing other control characters
     is somehow important, even though i don't think it is, but
     maybe right before release is poor timing for such a change.

I tend to like [2] best - the change is minimal, addressing only
what must be changed and nothing else, compliant, and works.

So if somebody confirms that opinion and nobody objects, i think
i should put that in: s/32//.

Yours,
  Ingo
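
For anyone who wants to compare options [2] and [3] on their own system, here is a minimal sketch. The sample input and the names sample, opt2 and opt3 are made up for illustration and are not taken from the configure script. The newline is written with a single backslash because the command is run directly here rather than from inside a quoted string.

```shell
#!/bin/sh
# Hypothetical sample input: one tab and two newlines among plain letters.
sample() { printf 'a\tb\nc\n'; }

# Option [2]: map every control character to a space; "[ *]" repeats
# the space as often as needed to match the size of the first set.
opt2=$(sample | tr '[:cntrl:]' '[ *]')

# Option [3]: replace newlines only; the tab is passed through unchanged.
opt3=$(sample | tr '\n' ' ')

# Angle brackets make trailing spaces visible.
printf '[2] <%s>\n' "$opt2"
printf '[3] <%s>\n' "$opt3"
```

If the local tr supports the repeat construct without a count, the first result should contain only letters and spaces, while the second still contains the tab.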