> > UTF-8 and UTF-16 Text Encoding Detection Library
That was posted in *2014??* Suddenly I've forgotten if time's flowing
backwards or forwards...

What's the rationale for choosing UTF-16 in the first place? It offers
nothing that UTF-8 can't already handle... (to my flimsy understanding)

On 23 July 2017 at 22:23, Mike Bianchi <mbian...@foveal.com> wrote:

> This library purports to be a way to approach the problem ...
>
> https://www.autoitconsulting.com/site/development/utf-8-utf-16-text-encoding-detection-library/
>
> UTF-8 and UTF-16 Text Encoding Detection Library
> by Jonathan Bennett | Aug 23, 2014 | Development |
>
> This post shows how to detect UTF-8 and UTF-16 text and presents a fully
> functional C++ and C# library that can be used to help with the detection.
>
> I recently had to upgrade the text file handling feature of AutoIt to better
> handle text files where no byte order mark (BOM) was present. The older
> version of code I was using worked fine for UTF-8 files (with or without BOM)
> but it wasn't able to detect UTF-16 files without a BOM. I tried to use the
> IsTextUnicode Win32 API function but this seemed extremely unreliable and
> wouldn't detect UTF-16 Big-Endian text in my tests.
>
> Note, especially for UTF-16 detection, there is always an element of ambiguity.
> This post by Raymond shows that however you try and detect encoding there will
> always be some sequence of bytes that will make your guesses look stupid.
>
> Here are the detection methods I'm currently using for the various types of
> text file. The order of the checks I perform is:
>
> BOM
> UTF-8
> UTF-16 (newline)
> UTF-16 (null distribution)
> :
> :
>
> --
> Mike Bianchi
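
For what it's worth, the order of checks quoted above (BOM, UTF-8, UTF-16 by newline, UTF-16 by null distribution) can be sketched in a few lines of Python. This is only my reading of that list, not the library's actual code: the name guess_encoding, the 70%/10% thresholds, and the rule that any NUL byte rules out UTF-8 are all assumptions of mine, and UTF-32 is ignored entirely.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Best-guess encoding label for BOM-less text (illustrative heuristics)."""
    # 1. BOM: an explicit byte order mark settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # 2. UTF-8: must decode strictly. Assumption: a NUL byte disqualifies
    #    UTF-8, because ASCII-heavy UTF-16 would otherwise pass this check.
    if 0 not in data:
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 3. UTF-16 (newline): look for U+000A as a 16-bit unit in either
    #    byte order, at even offsets only.
    le_newlines = be_newlines = 0
    for i in range(0, len(data) - 1, 2):
        unit = data[i:i + 2]
        if unit == b"\n\x00":
            le_newlines += 1
        elif unit == b"\x00\n":
            be_newlines += 1
    if le_newlines and not be_newlines:
        return "utf-16-le"
    if be_newlines and not le_newlines:
        return "utf-16-be"

    # 4. UTF-16 (null distribution): mostly-ASCII text puts NULs in the
    #    odd bytes for little-endian and in the even bytes for big-endian.
    units = len(data) // 2
    even_nuls = data[0::2].count(0)
    odd_nuls = data[1::2].count(0)
    if units:
        if odd_nuls > 0.7 * units and even_nuls < 0.1 * units:
            return "utf-16-le"
        if even_nuls > 0.7 * units and odd_nuls < 0.1 * units:
            return "utf-16-be"

    return "unknown"
```

With this sketch, guess_encoding("a\nb\n".encode("utf-16-be")) returns "utf-16-be" via the newline check, and a BOM-less "hello world" in UTF-16 BE is only caught by the null-distribution step. As the quoted post warns, any of these guesses can be fooled by the right sequence of bytes.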