I have filed a bug report in Jira:
https://issues.apache.org/jira/browse/XERCESC-1846
I found this problem with Xerces 3.0 and ICU 4.0, but I guess it will
also be a problem with Xerces 2.8 and ICU 3.8 - the symptoms are there.
Thanks to the Xerces team for fixing this.
Jan Suchy

------------ Original message ------------
From: David Bertoni <[email protected]>
Subject: Re: xerces/ICU unicode alias for weak encoding when
serializing/converting to CP
Date: 16.12.2008 22:38:25
----------------------------------------
Jan Suchý wrote:
> Hi Jesse,
> thank you for your answer and ideas.
> I have found one possible solution, which is to patch the transcoder wrapper class:
> src\xercesc\util\Transcoders\ICU\ICUTransService.cpp
>
> by adding these lines to the ICUTranscoder::ICUTranscoder constructor:
>
>    UErrorCode uerr = U_ZERO_ERROR;
>    ucnv_setSubstChars(toAdopt, "?", 1, &uerr);
> ...
>
> Then the "?" character is used as the replacement character when using ICU.
> This is an ICU-specific solution and it is not clean, because it requires
> rebuilding the xerces library. I would like to see some kind of switch on the
> XMLFormatter class, but at that point the UConverter from ICU that will be
> used is unknown, because there is no way to know which transcoder will be
> called later.

Please create a Jira issue because this is a bug.  We should not let
ICU use a replacement character that we know will result in a document
that's not well-formed.
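
To illustrate why the choice of replacement character matters, here is a
minimal sketch. It is not Xerces or ICU code: toAscii is a hypothetical
stand-in for a converter to US-ASCII, and it assumes the converter's default
substitution byte is 0x1A (the ASCII SUB control character), which is not a
legal character in XML 1.0, whereas "?" is.

```cpp
#include <cassert>
#include <string>

// XML 1.0 "Char" production, restricted to the ASCII bytes produced below:
// of the control characters, only TAB, LF and CR are allowed.
bool isXml10Char(unsigned char c) {
    return c == 0x09 || c == 0x0A || c == 0x0D || c >= 0x20;
}

// Hypothetical stand-in for a converter to US-ASCII (not the ICU API):
// every code point the target encoding cannot represent becomes `subst`.
std::string toAscii(const std::u32string& text, char subst) {
    std::string out;
    for (char32_t cp : text) {
        out += (cp < 0x80) ? static_cast<char>(cp) : subst;
    }
    return out;
}

// True if every byte of the converted output is a legal XML 1.0 character.
bool allXml10Chars(const std::string& bytes) {
    for (unsigned char c : bytes) {
        if (!isXml10Char(c)) return false;
    }
    return true;
}
```

With the input U"caf\u00E9", toAscii(text, '?') yields output that passes
allXml10Chars, while toAscii(text, '\x1A') yields output that fails it, so
a document serialized that way would not be well-formed.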

Dave


