More specifically, if I replace “ ” with U+200B (zero width space) in a string that contains surrogate pairs, FOP parsing fails even if I just use xsl:value-of. I’m not going to pursue that at this time; it may be worth revisiting once FOP handles non-BMP characters.
Marc

From: Marc Kaufman [mailto:[email protected]]
Sent: Thursday, July 14, 2016 12:34 PM
To: [email protected]
Subject: RE: isolated high surrogate

I’ve isolated the problem to a template definition that is trying to replace space characters with non-breaking spaces. Evidently it clobbers some surrogate pairs. FWIW, here are the offending lines:

    <xsl:template name="zero_width_space_1">
      <xsl:param name="data"/>
      <xsl:param name="counter" select="0"/>
      <xsl:choose>
        <xsl:when test="$counter &lt; string-length($data)+1">
          <xsl:value-of select='concat(substring($data,$counter,1),"&#x200B;")'/>
          <xsl:call-template name="zero_width_space_2">
            <xsl:with-param name="data" select="$data"/>
            <xsl:with-param name="counter" select="$counter+1"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

    <xsl:template name="zero_width_space_2">
      <xsl:param name="data"/>
      <xsl:param name="counter"/>
      <xsl:value-of select='concat(substring($data,$counter,1),"&#x200B;")'/>
      <xsl:call-template name="zero_width_space_1">
        <xsl:with-param name="data" select="$data"/>
        <xsl:with-param name="counter" select="$counter+1"/>
      </xsl:call-template>
    </xsl:template>

So, not an FOP problem.

Marc

From: Marc Kaufman [mailto:[email protected]]
Sent: Thursday, July 14, 2016 12:22 PM
To: [email protected]
Subject: RE: isolated high surrogate

I tried that. It doesn’t work. I understand that non-BMP characters are not supported, and I’m prepared to live with two .notdef glyphs in the result, but I’m not sure why I’m getting a fatal error from the parser.

From: Glenn Adams [mailto:[email protected]]
Sent: Thursday, July 14, 2016 12:01 PM
To: FOP Users <[email protected]>
Subject: Re: isolated high surrogate

Non-BMP characters are not presently supported by FOP; see [1]. When they are supported, you would do best to encode them in a file using a single (not two) numeric character reference, e.g., &#x10001; (𐀁).
[1] https://issues.apache.org/jira/browse/FOP-1969

On Thu, Jul 14, 2016 at 12:51 PM, Marc Kaufman <[email protected]> wrote:

I’m stumped by this error:

    org.xml.sax.SAXParseException; lineNumber: 92; columnNumber: 51; java.lang.IllegalArgumentException: isolated high surrogate

I have text with surrogate pairs throughout the file, but this only occurs in this context:

    <fo:block padding-top="2em" padding-bottom=".5em" text-align="left" font-family="Kozuka Gothic PR6N" font-size="18pt" color="black">
      <xsl:call-template name="zero_width_space_1">
        <xsl:with-param name="data" select="@documentName"/>
      </xsl:call-template>
    </fo:block>

I’ve checked the input stream, and all the surrogates are correctly paired. I’ve tried escaping the surrogate pairs (e.g., “&#-integer-;”), but that doesn’t change the error.
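The thread doesn’t name the XSLT processor, so the following is only one plausible reading of “it clobbers some surrogate pairs”: if the processor’s substring() indexes UTF-16 code units (what a Java char holds) rather than whole characters, the zero_width_space templates emit U+200B between the two halves of a pair, leaving an isolated high surrogate for the parser to reject. The Python sketch below simulates that assumption. It does not call FOP or any XSLT processor; all function names are hypothetical, and Python’s strict UTF-16 decoder stands in for the parser’s surrogate check.

```python
ZWSP = 0x200B  # U+200B ZERO WIDTH SPACE


def to_utf16_units(text: str) -> list[int]:
    """Split a string into UTF-16 code units (what a Java char holds)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def from_utf16_units(units: list[int]) -> str:
    """Rebuild a string; strict decoding rejects unpaired surrogates."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le")  # raises UnicodeDecodeError on a lone surrogate


def interleave_by_code_unit(text: str) -> list[int]:
    """Assumed behaviour: substring($data, n, 1) returns one UTF-16 code unit."""
    out = []
    for unit in to_utf16_units(text):
        out += [unit, ZWSP]
    return out


def interleave_by_code_point(text: str) -> str:
    """Surrogate-safe variant: iterating a Python str yields whole code points."""
    return "".join(ch + chr(ZWSP) for ch in text)


def first_isolated_surrogate(units: list[int]) -> int:
    """Index of the first unpaired surrogate code unit, or -1 if none."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate must be followed by a low one
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair, skip both halves
                continue
            return i
        if 0xDC00 <= u <= 0xDFFF:  # low surrogate without a preceding high one
            return i
        i += 1
    return -1


if __name__ == "__main__":
    data = "A\U00010001B"  # U+10001 would be written &#x10001; in XML
    print(len(data), len(to_utf16_units(data)))  # 3 code points, 4 code units
    naive = interleave_by_code_unit(data)
    print(first_isolated_surrogate(naive))  # 2: the ZWSP split the pair
    safe = interleave_by_code_point(data)
    print(first_isolated_surrogate(to_utf16_units(safe)))  # -1
    try:
        from_utf16_units(naive)
    except UnicodeDecodeError as exc:
        print("decode failed:", exc.reason)
```

Under this assumption, a template that steps over a high surrogate and its low partner together (rather than one unit at a time) would avoid the split; whether a particular processor’s string functions count code units or characters is worth checking before relying on either behaviour.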
