Re: XML parsing error

2007-07-26 Brian Whitman
On Jul 26, 2007, at 11:49 AM, Yonik Seeley wrote:
> Could you try it with jetty to see if it's the servlet container?
> It should be simple to just copy the index directory into solr's
> example/solr/data directory.

Yonik, sorry for my delay, but I did just try this in jetty -- it works (it doe

Re: XML parsing error

2007-07-26 Yonik Seeley
On 7/26/07, Brian Whitman <[EMAIL PROTECTED]> wrote:
> On Jul 26, 2007, at 11:25 AM, Yonik Seeley wrote:
>> OK, then perhaps it's a jetty bug with charset handling.
>
> I'm using resin btw

Could you try it with jetty to see if it's the servlet container?
It should be simple to just copy t

Re: XML parsing error

2007-07-26 Brian Whitman
On Jul 26, 2007, at 11:25 AM, Yonik Seeley wrote:
> OK, then perhaps it's a jetty bug with charset handling.

I'm using resin btw

> Could you run the same query, but use the python output? wt=python

Seems to be OK:
{'responseHeader':{'status':0,'QTime':0,'params':{'start':'7','fl':'conten

Re: XML parsing error

2007-07-26 Yonik Seeley
On 7/26/07, Brian Whitman <[EMAIL PROTECTED]> wrote:
> On Jul 26, 2007, at 11:10 AM, Yonik Seeley wrote:
>> If the '<' truly got destroyed, it's a server (Solr or Jetty) bug.
>>
>> One possibility is that the '<' does exist, but due to a charset
>> mismatch, it's being slurped into a m

Re: XML parsing error

2007-07-26 Brian Whitman
On Jul 26, 2007, at 11:10 AM, Yonik Seeley wrote:
> If the '<' truly got destroyed, it's a server (Solr or Jetty) bug.
>
> One possibility is that the '<' does exist, but due to a charset
> mismatch, it's being slurped into a multi-byte char.

Just dumped it with curl and did a hexdump:
5a0
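To make the "slurped into a multi-byte char" hypothesis concrete, here is a minimal Python sketch of a decoder that trusts each lead byte's declared length and never checks that the following bytes really are continuation bytes. `naive_utf8_decode` is a hypothetical helper for illustration only, not code from Solr, Jetty or Resin, and the sample character is arbitrary.

```python
def naive_utf8_decode(data: bytes) -> list:
    """Split bytes into 'characters' using only each lead byte's length.

    Deliberately naive: it never checks that the following bytes are
    UTF-8 continuation bytes (10xxxxxx), so a truncated sequence
    swallows whatever ASCII byte comes next.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: ASCII, 1 byte
            n = 1
        elif lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
            n = 2
        elif lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
            n = 3
        elif lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
            n = 4
        else:                        # stray or invalid byte: take it alone
            n = 1
        chars.append(data[i:i + n])
        i += n
    return chars


if __name__ == "__main__":
    # U+4E2D encodes as three bytes; drop the last one, then close a tag.
    broken = "中".encode("utf-8")[:-1] + b"</str>"
    print(naive_utf8_decode(broken))
    # [b'\xe4\xb8<', b'/', b's', b't', b'r', b'>']  <- '<' is inside the first chunk

    # A strict decoder refuses the same bytes instead of eating the '<'.
    try:
        broken.decode("utf-8")
    except UnicodeDecodeError as err:
        print("strict decode fails:", err.reason)
```

A consumer that accepts the first chunk as one character would then see the closing tag as `/str>`, which matches the symptom of a missing '<'. The thread does not establish which component, if any, decodes this way.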

Re: XML parsing error

2007-07-26 Yonik Seeley
On 7/26/07, Brian Whitman <[EMAIL PROTECTED]> wrote: > I ended up with this doc in solr: > > > > 0 name="QTime">17 name="fl">content"Pez"~1 name="rows">1 numFound="5381" start="7">Akatsuki - PE'Z > ҳ | ̳ | պ | ŷ | >>> Akatsuki - PE'Z ר | и  | > Ů  | ֶ  | պ  | ¸  | tӺ > &

Re: XML parsing error

2007-07-26 Marc Worrell - Mediamatic
Looks to me as if your document is not valid UTF-8 and is missing one byte at the end. Then the '<' of '' is included into the previous character.

Did you create the text snippet yourself? Maybe check if the string functions you are using are multi-byte aware.

Greetings,
Marc

On 26-jul-2
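Marc's advice can be sketched in Python, assuming the snippet is cut to a byte budget before it is indexed. Python slices `str` by code point, so the risk mainly arises when truncating encoded bytes, or in languages whose basic string functions count bytes (PHP's `substr`, for example, as opposed to `mb_substr`). `truncate_utf8` and `is_valid_utf8` are hypothetical helpers, not part of Solr.

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8 and cut it to at most max_bytes
    without splitting a multi-byte character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # If the byte at the cut point is a continuation byte (10xxxxxx),
    # cutting there would split a character: step back to its lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    snippet = "中文"                           # 6 bytes in UTF-8
    naive = snippet.encode("utf-8")[:4]        # byte slice splits the 2nd char
    safe = truncate_utf8(snippet, 4)           # backs off to a char boundary
    print(is_valid_utf8(naive))                # False
    print(is_valid_utf8(safe), safe.decode("utf-8"))  # True 中
```

Running a check like `is_valid_utf8` on each field before posting it would catch the incomplete trailing character before it reaches the server.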