Tim Landscheidt wrote:
> I don't know what the result
> of `quotearg ("äöü")' should look like and what it should
> depend on.

It depends on the output destination of the quoted string.
If it is for output on stderr, like in bison/src/parse-gram.y:193

  %printer { fputs (quotearg_style (c_quoting_style, $$), stderr); }

then you can most likely emit "äöü" with the same multibyte characters.

If it is for inclusion in a Java program, in comments, you also don't
need to do particular processing of multibyte characters.

If it is for use as a literal string in a Java program, then the
interpretation of source code depends on the -encoding parameter passed
as argument to the Java compiler (see [1]). If you emit "äöü" directly
into the source code, the developer needs to add a -encoding option;
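this is normally not welcome. To avoid this, the notation \unnnn
can be used in strings for UTF-16 code units, excluding LF and CR
(\u000A and \u000D are invalid inside strings, because the compiler
translates Unicode escapes into real line terminators before it
tokenizes the source). So, the algorithm is: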
  - Determine the encoding of the string's origin (if it's from a
    file name or a tty, you can assume locale_charset() is the right
    guess; if it's from a file, use a command-line argument to specify
    its encoding).
  - Convert the multibyte string to UTF-16 (either through module
    'striconv' or through hand-written code in the same style as
    lib/unicodeio.c [just in the reverse direction]).
  - Replace LF with \n, CR with \r, and all other UTF-16 code units
    outside the range U+0020..U+007E with \unnnn. (Inside that range,
    " and \ still need their usual backslash escapes.)
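
The last step can be sketched in C roughly as follows. This is only an
illustration, not gnulib code: the function name java_escape_utf16 and
its buffer-based interface are made up for this example, and it assumes
the input has already been converted to an array of UTF-16 code units.
It also escapes " and \, which is an assumption on top of the rule
about the range U+0020..U+007E.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of gnulib).  Writes the Java string
   literal escape of the UTF-16 code units SRC[0..N-1] into DST, which
   has room for DSTSIZE bytes including the terminating NUL.
   Returns the number of bytes written (without the NUL), or
   (size_t) -1 if DST is too small.  */
size_t
java_escape_utf16 (const uint16_t *src, size_t n, char *dst, size_t dstsize)
{
  size_t len = 0;

  assert (dst != NULL && dstsize > 0);

  for (size_t i = 0; i < n; i++)
    {
      char buf[7];              /* backslash, 'u', 4 hex digits, NUL */
      uint16_t u = src[i];
      size_t k;

      if (u == 0x000A)          /* LF: a backslash-u escape is invalid */
        strcpy (buf, "\\n");
      else if (u == 0x000D)     /* CR: likewise */
        strcpy (buf, "\\r");
      else if (u == 0x0022 || u == 0x005C)
        {
          /* Assumption: " and \ get a plain backslash escape.  */
          buf[0] = '\\';
          buf[1] = (char) u;
          buf[2] = '\0';
        }
      else if (u >= 0x0020 && u <= 0x007E)
        {
          /* Printable ASCII is copied unchanged.  */
          buf[0] = (char) u;
          buf[1] = '\0';
        }
      else
        /* Everything else, including each half of a surrogate pair,
           becomes a backslash-u escape with four hex digits.  */
        snprintf (buf, sizeof buf, "\\u%04X", (unsigned int) u);

      k = strlen (buf);
      if (len + k >= dstsize)
        return (size_t) -1;
      memcpy (dst + len, buf, k);
      len += k;
    }

  dst[len] = '\0';
  return len;
}
```

With the code units 0x00E4, 0x00F6, 0x00FC (that is, "äöü") as input,
this produces \u00E4\u00F6\u00FC. A character outside the BMP arrives
as two surrogate code units and is therefore emitted as two consecutive
escapes, which is what a Java string literal expects.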

Bruno

[1] http://download.oracle.com/javase/1,5.0/docs/tooldocs/solaris/javac.html
-- 
In memoriam Louis Philippe d'Orléans 
<http://en.wikipedia.org/wiki/Louis_Philippe_II,_Duke_of_Orléans>
