Re: [Groff] mom : unicode in .INCLUDE'd files

John Gardner Sun, 23 Jul 2017 05:29:46 -0700

>
> UTF-8 and UTF-16 Text Encoding Detection Library


That was posted in *2014*?? Suddenly I've forgotten if time's flowing
backwards or forwards...

What's the rationale for choosing UTF-16 in the first place? It offers
nothing that UTF-8 can't already handle... (to my flimsy understanding)

On 23 July 2017 at 22:23, Mike Bianchi <mbian...@foveal.com> wrote:

> This library purports to be a way to approach the problem ...
>
>   https://www.autoitconsulting.com/site/development/utf-8-utf-16-text-encoding-detection-library/
>
>         UTF-8 and UTF-16 Text Encoding Detection Library
>         by Jonathan Bennett | Aug 23, 2014 | Development |
>
> This post shows how to detect UTF-8 and UTF-16 text and presents a fully
> functional C++ and C# library that can be used to help with the detection.
>
> I recently had to upgrade the text file handling feature of AutoIt to better
> handle text files where no byte order mark (BOM) was present.  The older
> version of code I was using worked fine for UTF-8 files (with or without BOM)
> but it wasn't able to detect UTF-16 files without a BOM. I tried to use the
> IsTextUnicode Win32 API function but this seemed extremely unreliable and
> wouldn't detect UTF-16 Big-Endian text in my tests.
>
> Note, especially for UTF-16 detection, there is always an element of ambiguity.
> This post by Raymond shows that however you try and detect encoding there will
> always be some sequence of bytes that will make your guesses look stupid.
>
> Here are the detection methods I'm currently using for the various types of
> text file.  The order of the checks I perform is:
>
>     BOM
>     UTF-8
>     UTF-16 (newline)
>     UTF-16 (null distribution)
>         :
>         :
>
> --
>  Mike Bianchi
>
>
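
For anyone who wants to poke at the check order quoted above (BOM, UTF-8,
UTF-16 newline, UTF-16 null distribution), here is a rough Python sketch of
the idea. It is not the code from the linked library: the function name
guess_encoding, the "no NUL bytes" condition on the UTF-8 step, and the
0.4 / 0.1 thresholds in the null-distribution step are all assumptions made
for illustration. UTF-32 BOMs are deliberately ignored, so a UTF-32 LE file
would be reported as UTF-16 LE.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Best-guess encoding label for `data`. Heuristic only, never certain."""
    # 1. BOM: an explicit signature wins over every heuristic.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # 2. UTF-8: accept if a strict decode succeeds.
    #    Assumption: text containing NUL bytes is more likely UTF-16,
    #    so those inputs fall through to the later checks.
    if b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 3. UTF-16 newline: look for U+000A in aligned 16-bit units.
    units = [data[i:i + 2] for i in range(0, len(data) - 1, 2)]
    le_newlines = units.count(b"\n\x00")
    be_newlines = units.count(b"\x00\n")
    if le_newlines and not be_newlines:
        return "utf-16-le"
    if be_newlines and not le_newlines:
        return "utf-16-be"

    # 4. UTF-16 null distribution: mostly-ASCII text has its NUL bytes
    #    at odd offsets in little-endian and even offsets in big-endian.
    #    The thresholds below are illustrative guesses.
    half = len(data) // 2
    if half:
        even_nuls = data[0::2].count(0)
        odd_nuls = data[1::2].count(0)
        if odd_nuls > 0.4 * half and even_nuls < 0.1 * half:
            return "utf-16-le"
        if even_nuls > 0.4 * half and odd_nuls < 0.1 * half:
            return "utf-16-be"

    return "unknown"
```

Calling guess_encoding(open(path, "rb").read()) on a BOM-less file gives a
label you could hand to a converter such as iconv before the text reaches
groff, bearing in mind the caveat in the quoted post that some byte
sequences will always fool this kind of guessing.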
