Re: detailed Error reporting in Solr

2013-04-05 Thread Walter Underwood
It is not a bug. XML parsers are required to reject documents with undefined character entities. Try parsing it as HTML or XHTML. wunder On Apr 4, 2013, at 11:14 AM, eShard wrote: > Yes, that's it exactly. > I crawled a link with these ( ›) in each list item and solr > couldn't handle it threw
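To illustrate Walter's point, here is a minimal Python sketch. It is not from the thread: Solr and Tika are Java, and the list item below is a hypothetical stand-in. A strict XML parser rejects the HTML named entity `&nbsp;`, while an HTML parser decodes it to U+00A0 (no-break space).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical list item: it uses the HTML named entity &nbsp;, which XML
# does not predefine (XML only predefines amp, lt, gt, quot and apos).
SNIPPET = "<li>Item&nbsp;one</li>"


def parse_as_xml(markup):
    """Strict XML parse: an undefined entity is a fatal well-formedness error."""
    return ET.fromstring(markup).text


class TextCollector(HTMLParser):
    """Lenient HTML parse: HTML defines &nbsp;, so it is decoded to U+00A0."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Text content arrives here with character references already decoded.
        self.parts.append(data)


def parse_as_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


if __name__ == "__main__":
    try:
        parse_as_xml(SNIPPET)
    except ET.ParseError as err:
        print("XML parser rejected it:", err)
    print(repr(parse_as_html(SNIPPET)))
```

Run as a script, it reports the XML `ParseError` for the undefined entity and then shows the HTML-decoded text with a no-break space in place of `&nbsp;`.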

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
ality, in most cases, can simply be ignored. Yes, by all means ask on the Tika list. Solr is just wrapping the error Tika reports. -- Jack Krupansky -----Original Message----- From: eShard Sent: Thursday, April 04, 2013 2:14 PM To: solr-user@lucene.apache.org Subject: Re: detailed Error re

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
Yes, that's it exactly. I crawled a link with these ( ›) in each list item, and Solr couldn't handle it: it threw the XML parse error and the crawler terminated the job. Is this fixable? Or do I have to submit a bug to the Tika folks? Thanks, -- View this message in context: http://lucene.472066.

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
I'm trying to understand what the context is here... Are you trying to crawl web pages that have bad HTML? Or ... what? -- Jack Krupansky -----Original Message----- From: eShard Sent: Thursday, April 04, 2013 10:23 AM To: solr-user@lucene.apache.org Subject: detailed Error reporting in Solr Good

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
OK, one possible fix is to add the XML equivalent of nbsp, which is: ]> but how do I add this into the Tika configuration? -- View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053823.html Sent from the Solr - User mailing list archiv
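eShard's proposed fix works at the XML level: declare the entity so the parser knows what `&nbsp;` means. The declaration in the post was mangled by the archive, so this minimal Python sketch assumes the common form `<!ENTITY nbsp "&#160;">` in an internal DTD subset. The `doc` wrapper element is hypothetical. The sketch only shows that a standard XML parser then accepts the markup; it does not show how to wire this into Tika, which the thread leaves open.

```python
import xml.etree.ElementTree as ET

# Assumption: declare nbsp in an internal DTD subset as the character
# reference &#160; (U+00A0, no-break space). "doc" is a hypothetical wrapper
# element, not anything Tika or Solr defines.
NBSP_DOCTYPE = '<!DOCTYPE doc [<!ENTITY nbsp "&#160;">]>'


def parse_with_nbsp(fragment):
    """Wrap a fragment in <doc>, declare &nbsp; up front, then parse it as XML."""
    return ET.fromstring(NBSP_DOCTYPE + "<doc>" + fragment + "</doc>")


if __name__ == "__main__":
    root = parse_with_nbsp("<li>Item&nbsp;one</li>")
    print(repr(root.find("li").text))
```

This requires rewriting each crawled document before it reaches the XML parser. Walter's alternative, parsing the page as HTML instead of XML, avoids touching the markup at all.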