On Thu, 19 Sep 2024, Brian Inglis via Cygwin wrote:

> On 2024-09-19 07:27, Christian Franke via Cygwin wrote:
> >
> > Yes, but Cygwin does not provide consistent forward/reverse UTF-8 <-> UTF-16
> > mappings.
>
> Surrogate halves are invalid for UTF-8 encoding; they should first be
> encoded as a valid UTF-16 code point.
> The encoder should just fail if it encounters any invalid sequence!
> Handling surrogates or other invalid values as anything other than invalid
> turns the encoding into what has been called WTF-8, where W may be for
> Windows! ;^>

This may be necessary though, in order to round-trip anything which is valid
in NTFS. In my opinion, rm -rf not failing in the face of potentially
maliciously named files/directories is more important than strictly adhering
to a standard that says 'fail if you see these values'.

https://cygwin.com/pipermail/cygwin/2024-June/256111.html
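
To make the trade-off concrete, here is a minimal sketch in Python (not Cygwin's code). It assumes a file name containing an unpaired low surrogate, which NTFS will accept, and uses Python's "surrogatepass" error handler as a rough stand-in for a WTF-8-style conversion. The helper names and the sample name are made up for illustration.

```python
# Illustration only: "surrogatepass" approximates a WTF-8-style conversion.
# It is not Cygwin's implementation, and the helper names are hypothetical.

def to_strict_utf8(name: str) -> bytes:
    """Strict UTF-8: raises UnicodeEncodeError on an unpaired surrogate."""
    return name.encode("utf-8")

def to_lenient_utf8(name: str) -> bytes:
    """Lenient conversion: an unpaired surrogate gets a 3-byte encoding."""
    return name.encode("utf-8", "surrogatepass")

def from_lenient_utf8(raw: bytes) -> str:
    """Inverse of to_lenient_utf8, so such a name survives a round trip."""
    return raw.decode("utf-8", "surrogatepass")

# Hypothetical file name with an unpaired low surrogate (U+DC00) in the middle.
odd_name = "evil\udc00name"

try:
    to_strict_utf8(odd_name)
    strict_ok = True
except UnicodeEncodeError:
    # The strict encoder refuses, so a tool relying on it could not name the file.
    strict_ok = False

lenient = to_lenient_utf8(odd_name)
round_tripped = from_lenient_utf8(lenient)

print(strict_ok)                   # False
print(lenient)                     # b'evil\xed\xb0\x80name'
print(round_tripped == odd_name)   # True
```

With the strict path a tool never gets a byte string it can hand back to the filesystem; with the lenient path the name converts back to exactly the same UTF-16 code units, which is what something like rm -rf needs.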