Re: Malformed XML with exotic characters

2011-02-03 Thread Markus Jelsma
Hi, I've seen almost all funky charsets, but Gothic is always trouble. I'm also unsure if it's really a bug in Solr. It could well be Xerces being unable to cope. Besides, most systems indeed don't cope well with Gothic. This mail client does, but my terminal can't find its cursor after (properl

Re: Malformed XML with exotic characters

2011-02-01 Thread Robert Muir
Hi, it might only be a problem with your XML tools (e.g. Firefox). The problem here is characters outside of the Basic Multilingual Plane (in this case Gothic). XML tools typically fall apart on these portions of Unicode (in Lucene we recently reverted to a patched/hacked copy of Xerces specificall
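
To make the point about characters outside the Basic Multilingual Plane concrete, here is a minimal Python sketch. It is not taken from the thread: the choice of GOTHIC LETTER AHSA (U+10330) as the sample character and the helper name `to_surrogates` are assumptions for illustration. It shows that such a code point needs two UTF-16 code units (a surrogate pair), and that the high half for this character is 0xD800, the value a parser would complain about if it were ever emitted on its own.

```python
# Sample character (an assumption for illustration): GOTHIC LETTER AHSA, U+10330.
ch = "\U00010330"
cp = ord(ch)

def to_surrogates(code_point):
    """Split a supplementary code point (>= U+10000) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return high, low

high, low = to_surrogates(cp)

# UTF-8 needs four bytes for the character; UTF-16 needs two 16-bit units.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")

print(f"U+{cp:04X} -> surrogates 0x{high:04X} 0x{low:04X}")
print("UTF-8:", utf8_bytes.hex(), "UTF-16BE:", utf16_bytes.hex())
```

Code that treats each UTF-16 unit as a character of its own sees two values here, neither of which is a valid character by itself.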

Re: Malformed XML with exotic characters

2011-02-01 Thread Sascha Szott
Hi Markus, in my case the JSON response writer returns valid JSON. The same holds for the PHP response writer. -Sascha On 01.02.2011 18:44, Markus Jelsma wrote: You can exclude the input's involvement by checking if other response writers do work. For me, the JSONResponseWriter works perfect

Re: Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
You can exclude the input's involvement by checking if other response writers do work. For me, the JSONResponseWriter works perfectly with the same returned data in some AJAX environment. On Tuesday 01 February 2011 18:29:06 Sascha Szott wrote: > Hi folks, > > I've made the same observation whe

Re: Malformed XML with exotic characters

2011-02-01 Thread Sascha Szott
Hi folks, I've made the same observation when working with Solr's ExtractingRequestHandler on the command line (no browser interaction). When issuing the following curl command curl 'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&wt=xml&resource.name=foo.pdf' --da

Re: Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
Hi, There are no typical encoding issues on my system. I can index, query and display English, German, Chinese, Vietnamese etc. Cheers On Tuesday 01 February 2011 17:23:49 François Schiettecatte wrote: > Markus > > A few things to check, make sure whatever SOLR is hosted on is outputting > utf-

Re: Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
It's throwing out a lot of disturbing messages: select.xml:17: parser error : Char 0xD800 out of allowed range ki • Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ • ^ select.xml:17: parser error : PCDATA invalid Ch
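
The "Char 0xD800 out of allowed range" message is consistent with the XML 1.0 `Char` production, which excludes the surrogate block U+D800 to U+DFFF. The following Python sketch (the function names `is_xml_char` and `find_invalid` are hypothetical, not part of Solr or xmllint) checks code points against that production. Whether the response writer really emitted a lone surrogate is only one plausible reading of the error and is not confirmed in the thread.

```python
def is_xml_char(code_point):
    """Return True if the code point matches the XML 1.0 'Char' production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )

def find_invalid(text):
    """List (index, code point) for every character XML 1.0 does not allow."""
    return [(i, ord(c)) for i, c in enumerate(text) if not is_xml_char(ord(c))]

# A lone high surrogate is rejected, while the full Gothic code point is fine.
print(is_xml_char(0xD800))        # surrogate half
print(is_xml_char(0x10330))       # GOTHIC LETTER AHSA as a whole code point
print(find_invalid("a\ud800b"))   # string containing an unpaired surrogate
```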

Re: Malformed XML with exotic characters

2011-02-01 Thread François Schiettecatte
Markus A few things to check, make sure whatever SOLR is hosted on is outputting utf-8 ( URIEncoding="UTF-8" in the Connector section in server.xml on Tomcat for example), which it looks like here, also make sure that whatever http header there is tells firefox that it is getting utf-8 (otherw

Re: Malformed XML with exotic characters

2011-02-01 Thread Stefan Matheis
Hi Markus, to verify that it's not a Firefox issue, try xmllint on your shell to check the given XML. Regards Stefan On Tue, Feb 1, 2011 at 4:43 PM, Markus Jelsma wrote: > There is an issue with the XML response writer. It cannot cope with some very > exotic characters or possibly the right-to
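
The same kind of browser-independent check can be sketched in Python with the standard library's expat-based parser. This stands in for xmllint and is not what the thread used. The helper name `is_well_formed` and both sample documents are made up for illustration: one writes a Gothic letter as a single character reference, the other writes it as two surrogate references, which is an assumed failure mode rather than confirmed Solr output.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Supplementary character written as one reference to the full code point.
good = "<response><str>&#x10330;</str></response>"

# The same character written as two surrogate references (assumed failure mode).
bad = "<response><str>&#xD800;&#xDF30;</str></response>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

If a saved response fails a check like this outside the browser, Firefox can be ruled out as the cause.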

Malformed XML with exotic characters

2011-02-01 Thread Markus Jelsma
There is an issue with the XML response writer. It cannot cope with some very exotic characters or possibly the right-to-left writing systems. The issue can be reproduced by indexing the content of the home page of wikipedia as it contains a lot of exotic matter. The problem does not affect the