On 11/26/2012 05:28, Jon TURNEY wrote:

The initial output to the gdb window
stops before the 'ü' in "Dorothea Lütkehaus" (internally this is ISO-8859-1
encoded and presumably forms an invalid UTF-8 sequence)

Only the 7-bit ASCII subset of ISO 8859-x is legal UTF-8.
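
To make that concrete, here is a minimal Python sketch (not anything DDD itself does; `is_valid_utf8` is a hypothetical helper name). It assumes the string is stored as ISO-8859-1, so the 'ü' is the single byte 0xFC, and checks which parts of it decode as UTF-8:

```python
# "Dorothea Lütkehaus" encoded as ISO-8859-1: the umlaut is the single byte 0xFC.
latin1_bytes = "Dorothea L\u00fctkehaus".encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_prefix = latin1_bytes[:10]    # b"Dorothea L", pure 7-bit ASCII

print(is_valid_utf8(ascii_prefix))  # True
print(is_valid_utf8(latin1_bytes))  # False
```

The prefix up to the 'L' is fine, which matches the output stopping right before the 'ü'.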

This is listed in the ddd PROBLEMS file with solutions "don't use a UTF-8
locale" or "link with lesstif" :-)

Isn't the core problem that DDD has been semi-maintainerless[*] for years, and that these are the same as the years of UTF-8's ascendancy?

I ask because it's the basis of my guess that if you tried to provide a patch upstream, it would be ignored. That, or they'd invite you to become the new maintainer. :)

So, my advice is to either maintain a private patch for Cygwin DDD or take over maintainership of DDD.


[*] https://www.gnu.org/software/ddd/news.html

A minimal fix might be to just replace the
ISO-8859-1 encoded strings with their ASCII equivalents (e.g. ü -> u)

German umlaut-ed characters are generally Anglicized as "ue", "oe", etc.
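
If you go the ASCII route, a rough Python sketch of that transliteration might look like the following. The `anglicize` helper and the `GERMAN_ASCII` table are hypothetical names for illustration, and the sketch assumes the input really is ISO-8859-1 and contains only German non-ASCII letters:

```python
# Conventional German transliterations (illustrative table, not part of DDD).
GERMAN_ASCII = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",   # ä ö ü
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",   # Ä Ö Ü
    "\u00df": "ss",                                   # ß
})


def anglicize(latin1_data: bytes) -> bytes:
    """Decode ISO-8859-1 bytes, transliterate German letters, return ASCII."""
    text = latin1_data.decode("iso-8859-1")
    # encode("ascii") raises UnicodeEncodeError if any other
    # non-ASCII character is left after the translation.
    return text.translate(GERMAN_ASCII).encode("ascii")


print(anglicize(b"Dorothea L\xfctkehaus"))  # b'Dorothea Luetkehaus'
```

Failing loudly on anything outside the table seems safer than silently dropping characters, since names in other languages don't follow the German convention.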

a better solution might
be to actually fix ddd, but I have no idea what would be involved.

The easiest way to fix this is with iconv(1):

    $ iconv -f iso-8859-1 -t utf-8 < foo.c > foo-fixed.c
    $ wc -c foo.c foo-fixed.c
    N foo.c
    N+M foo-fixed.c
    $ mv foo-fixed.c foo.c

N is the original file size and M is the number of non-ASCII characters in the file: each ISO-8859-1 byte in the 0x80-0xFF range becomes a two-byte UTF-8 sequence, so the file grows by exactly 1 byte per converted character.
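
The size arithmetic can be checked for the ISO-8859-1 case with a small Python sketch. This is only an approximation of what the iconv command above does, using Python's built-in codecs; `latin1_to_utf8` is a hypothetical helper name and the sample string stands in for the file contents:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("iso-8859-1").encode("utf-8")


original = b"Dorothea L\xfctkehaus"   # 18 bytes, one of them non-ASCII
converted = latin1_to_utf8(original)

n = len(original)                     # N in the transcript above
m = len(converted) - n                # M, the growth in bytes
non_ascii = sum(1 for b in original if b >= 0x80)

print(n, n + m)        # 18 19
print(m == non_ascii)  # True
```

For this input the growth equals the count of high-bit bytes, because every ISO-8859-1 character above 0x7F encodes as two bytes in UTF-8.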

I suspect you'll run into a problem if you're using cygport. It will try to make a patch file for you when it sees that you've run iconv on the -src directory contents, since the corresponding -origsrc files are now different. But, since cygport is running in a UTF-8 locale, it's going to either truncate the 8859 input files or mangle them. You might have to do some shell script gymnastics to avoid this.

All the more reason to arm-twist upstream, if you can. Or become upstream. :)
