[ 
https://issues.apache.org/jira/browse/COCOON-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471290
 ] 

Andrew Stevens commented on COCOON-2002:
----------------------------------------

Just a thought: won't the encoding that needs to be used depend on what was 
used in the input document?  For example, if the source document passed in from 
a file generator declares <?xml version="1.0" encoding="Big5"?>, would the 
above change cause similar problems?

Also, how is this affected by the char-encoding property in the tidy.properties 
configuration file?  Rather than making the above change, could you have solved 
your problem by ensuring that this property matches the source encoding used in 
your documents?  JTidy's default may well be latin-1.
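
For reference, the kind of tidy.properties entry being asked about might look 
like the fragment below. The property name is the one mentioned above; the 
value utf8 is an assumption about which encoding names the JTidy version in 
use accepts, so it should be checked against that version's documentation.

```properties
# Hypothetical setting: make JTidy read and write UTF-8 instead of its default.
char-encoding=utf8
```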

It seems to me that passing the above value into the getBytes call assumes 
that the AbstractSAXTransformer's text recording code is written to always use 
UTF-8 for the stored text (and to transcode where necessary).  Is this actually 
the case?
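
To make the concern concrete, here is a minimal, standalone sketch of how a 
charset mismatch around getBytes produces the reported symptom. It is not 
Cocoon or JTidy code: the class name EncodingDemo and the helper roundTrip are 
made up for illustration, and only standard JDK calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: encode text to bytes with one charset,
    // then decode those bytes back to a String with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String cjk = "\u4e2d\u6587";   // two CJK characters, not representable in latin-1
        String accented = "caf\u00e9"; // "cafe" with an acute accent, representable in latin-1

        // Encoding to latin-1 replaces every unmappable character with '?'.
        System.out.println(roundTrip(cjk, StandardCharsets.ISO_8859_1,
                                     StandardCharsets.ISO_8859_1));

        // Encoding and decoding with the same Unicode charset is lossless.
        System.out.println(roundTrip(cjk, StandardCharsets.UTF_8,
                                     StandardCharsets.UTF_8).equals(cjk));

        // Encoding as UTF-8 but decoding as latin-1 garbles non-ASCII text.
        System.out.println(roundTrip(accented, StandardCharsets.UTF_8,
                                     StandardCharsets.ISO_8859_1).equals(accented));
    }
}
```

The first case matches the "page of question marks" in the report. The third 
case is the risk raised here: hard-coding one encoding on the getBytes side 
only helps if whatever consumes those bytes decodes them with the same one.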


> HTML transformer  only works with latin-1 characters
> ----------------------------------------------------
>
>                 Key: COCOON-2002
>                 URL: https://issues.apache.org/jira/browse/COCOON-2002
>             Project: Cocoon
>          Issue Type: Bug
>          Components: Blocks: HTML
>    Affects Versions: 2.1.10, 2.1.11-dev (Current SVN)
>            Reporter: Abbas Mousavi
>            Priority: Critical
>
> When transforming HTML in encodings other than latin-1,
> the result is a page of question marks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.