------- Comment #3 from burnus at gcc dot gnu dot org  2008-04-15 19:46 -------
> > Front end and library are ready to handle this when implemented.
> Front-end is ready?

Yes, it is: ENCODING= is supported, and the rest is implemented in neither the library nor the front end. Though I would not call this "ready".
> Is ENCODING="UTF-8" related to UCS-4 support?

I think it is, in the end. You can easily use UTF-8 encoding already now, but '(a2)' might print one (non-ASCII) or two (ASCII) characters. To have something well-defined, only one-byte-wide characters can currently be used. For anything beyond that, UCS-4 is needed in the front end.

Actually, I do not understand how to write things like

  character(kind=myUCS4,len=20) :: foo = myUCS4_'Some UCS4 string'

(The problem is switching the encoding within the same file; good luck finding an editor which supports this.) If one does not need non-ASCII character literals (i.e. one only reads from / writes to files), there is no problem.

Possible solutions?
a) Have a UCS-4 input file; then both default_'foo' and ucs4_'foo' work.
b) Expect that for myUCS4_'foo' literals the characters in the quotes are actually UTF-8.

I'm personally in favour of (b). I'm not quite sure whether this is really compatible with the Fortran standard, but I like this way of inputting the string. Otherwise, I think Fortran lacks a good way of inputting non-ASCII characters in an ASCII file. C99 offers '\uXXXX' but, unless I missed something, in Fortran the equivalent would be:

I think (c) is what most programmers want, but I actually do not see how this should work syntax-wise. Or should an ASCII literal automatically be handled as UTF-8? Then it would work: when assigning to a UCS-4 string, the UTF-8 gets properly converted and a non-ASCII character then has length one, while if one assigns to an ASCII string, non-ASCII characters of course need more bytes, and thus len('§') == 2.

(b) is also an interesting problem. And (a) of course works, but it is quite cumbersome to use: Fortran lacks C's \uXXXX way of specifying a Unicode character. One can probably work with

  myUCS4string = char(int(z'A0FF'), kind=myUCS4)

but this is awful. (Actually, I think the standard does not even guarantee that this works, as "char" is processor dependent.)
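Option (b), and the "handle an ASCII literal as UTF-8" variant of (c), both come down to decoding the bytes between the quotes into UCS-4 code points when the value is assigned to a UCS-4 variable. The following is a minimal, language-neutral sketch of that conversion, written in Python rather than Fortran and not taken from gfortran; the helper name utf8_to_ucs4 is hypothetical. It also shows the length mismatch mentioned above: '§' occupies two bytes as a default-kind string but is a single UCS-4 character.

```python
def utf8_to_ucs4(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of UCS-4 code points (hypothetical helper).

    Sketch only: stray continuation bytes and truncated sequences are rejected,
    but overlong forms and surrogate code points are not.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:        # 0xxxxxxx: ASCII, a single byte
            cp, extra = lead, 0
        elif lead < 0xC0:      # 10xxxxxx: continuation byte without a lead byte
            raise ValueError(f"unexpected continuation byte at offset {i}")
        elif lead < 0xE0:      # 110xxxxx: two-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead < 0xF0:      # 1110xxxx: three-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead < 0xF8:      # 11110xxx: four-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:  # every continuation byte must be 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        code_points.append(cp)
        i += extra + 1
    return code_points


if __name__ == "__main__":
    literal = "\u00a7".encode("utf-8")  # the two bytes C2 A7 an editor stores for '§'
    print(len(literal))                 # 2: byte count, as for a default-kind string
    print(len(utf8_to_ucs4(literal)))   # 1: one character once converted to UCS-4
    # Taking two bytes (the byte-per-character model behind '(a2)' above)
    # yields two ASCII characters but only one non-ASCII character:
    print(len(utf8_to_ucs4("ab".encode("utf-8")[:2])))       # 2
    print(len(utf8_to_ucs4("\u00a7x".encode("utf-8")[:2])))  # 1
```

Python's own bytes.decode("utf-8") does this conversion with full validation; the hand-written loop is only meant to make the byte arithmetic visible. A real front end would also have to reject overlong encodings and surrogates.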
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35863