Re: detailed Error reporting in Solr

2013-04-05 Thread Walter Underwood
It is not a bug. XML parsers are required to reject documents with undefined character entities. Try parsing it as HTML or XHTML. wunder On Apr 4, 2013, at 11:14 AM, eShard wrote: > Yes, that's it exactly. > I crawled a link with these ( ›) in each list item and solr > couldn't handle it threw
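A minimal sketch of the behaviour described above, using Python's standard library rather than Tika or Solr (an assumption made purely for illustration; the sample markup is hypothetical). A strict XML parser fails on `&nbsp;` because that entity is never declared, while an HTML parser knows the HTML named entities and decodes them.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical list item resembling the crawled markup from the thread.
SNIPPET = "<li>Cyber Systems and Technology&nbsp;&rsaquo;</li>"


def parse_as_xml(markup):
    """Return None on success, or the parser's error message on failure."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as exc:
        return str(exc)


class TextCollector(HTMLParser):
    """Collects text content; named HTML entities are decoded by default."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Parse leniently as HTML and return the decoded text content."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


xml_error = parse_as_xml(SNIPPET)
html_text = parse_as_html(SNIPPET)
print("XML parser:", xml_error)          # an "undefined entity" error
print("HTML parser:", repr(html_text))   # text with the entities decoded
```

Whether Tika can be steered toward its HTML parser for a given document is a separate question, and one better suited to the Tika list.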

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
ality, in most cases, can simply be ignored. Yes, by all means ask on the Tika list. Solr is just wrapping the error Tika reports. -- Jack Krupansky -Original Message- From: eShard Sent: Thursday, April 04, 2013 2:14 PM To: solr-user@lucene.apache.org Subject: Re: detailed Error re

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
ene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053882.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
I'm trying to understand what the context is here... are you trying to crawl web pages that have bad HTML? Or, ... what? -- Jack Krupansky -Original Message- From: eShard Sent: Thursday, April 04, 2013 10:23 AM To: solr-user@lucene.apache.org Subject: detailed Error reporting in

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
OK, one possible fix is to add the XML equivalent of nbsp, which is: ]> but how do I add this into the Tika configuration? -- View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053823.html Sent from the Solr - User mailing l
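The declaration in the message above did not survive archiving, so the following is only a hedged illustration of the general idea: one common way to make `nbsp` known to an XML parser is an internal DTD subset that maps it to the numeric reference `&#160;`. The sketch uses Python's standard library, not Tika; the `with_nbsp_declared` helper, the exact DOCTYPE text, and the sample markup are all assumptions, and it does not show how to configure Tika.

```python
import xml.etree.ElementTree as ET

# Assumed form of an internal DTD subset that declares the nbsp entity.
NBSP_DOCTYPE = '<!DOCTYPE li [<!ENTITY nbsp "&#160;">]>'


def with_nbsp_declared(fragment):
    """Prepend the DOCTYPE so the XML parser can resolve &nbsp;."""
    return NBSP_DOCTYPE + fragment


# Hypothetical fragment; without the DOCTYPE this would fail to parse.
fragment = "<li>Cyber Systems&nbsp;and Technology</li>"
element = ET.fromstring(with_nbsp_declared(fragment))
print(repr(element.text))
```

This only works when the caller controls the document text before it reaches the parser, which may not be the case for crawled pages.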

detailed Error reporting in Solr

2013-04-04 Thread eShard
't handle. For example: Cyber Systems and Technology › My question is twofold: 1) how do I get Solr to report more detailed errors and 2) how do I get Tika to accept (or ignore) nbsp? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in