Re: unable to figure out nutch type highlighting in solr....

Adrian Sutton Fri, 05 Oct 2007 04:15:38 -0700

On 05/10/2007, at 4:07 PM, Ravish Bhagdev wrote:

(Query esp. Adrian):


If you are indexing XHTML, do you replace tags with entities before
giving it to solr, if so, when you get back snippets do you get tags
or entities or do you convert again to tags for presentation?  What's
the best way out?  It would help me a lot if you briefly explain your
configuration.

We happen to develop an HTML editor, so we know 100% for certain that the XHTML is valid XML. Given that, we just throw the raw XHTML at Solr, which uses the HTMLStripWhitespaceTokenizer. However, at this stage we haven't configured highlighting at all, so our index is used for search and retrieving a document ID. At some point I'd like to add highlighting, and it sounds like the best way to do so would be to index the document text instead of the HTML.
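A minimal sketch of the "index the document text instead of the HTML" idea, assuming the markup is stripped on the client before the document is sent to Solr. The names `TextExtractor` and `xhtml_to_text` are hypothetical and not part of Solr or of the setup described above; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data and ignores tags (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def xhtml_to_text(xhtml):
    """Return the visible text of an XHTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(xhtml)
    parser.close()  # flush any text still buffered by the parser
    # Joining with a space keeps words in adjacent block elements apart;
    # the split/join then collapses runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


print(xhtml_to_text("<p>Hello <b>world</b> &amp; all</p>"))
```

Snippets generated from a field filled this way contain no tags or entities, so whatever displays them only needs to escape the text for presentation.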

Beyond that, we also use Solr as an optimization for extracting information such as what content was most recently changed, which pages link to others etc. On the page linking, we actually identify what pages are linked to prior to indexing and store them as a separate field - Solr itself has no understanding of the linking.
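The link handling described above could look roughly like the following sketch, which pulls the `href` targets out of the XHTML before indexing and places them in their own field. The field names `id`, `content` and `links`, and the helpers `LinkExtractor` and `build_document`, are assumptions made for illustration; the actual schema is not given in this thread.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_document(doc_id, xhtml):
    """Build a dict to index, with link targets kept in a separate field."""
    parser = LinkExtractor()
    parser.feed(xhtml)
    parser.close()
    # dict.fromkeys removes duplicate targets while preserving first-seen order.
    links = list(dict.fromkeys(parser.links))
    # Field names are assumed for this sketch, not taken from a real schema.
    return {"id": doc_id, "content": xhtml, "links": links}


doc = build_document(
    "page-1",
    '<p><a href="/a">A</a> <a href="/b">B</a> <a href="/a">again</a></p>',
)
print(doc["links"])
```

With the targets stored in a multi-valued field, a query on that field for a given page would answer "which pages link here" without Solr needing to understand the markup.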

Oh and I should note, I'm very new to Solr so I'm probably not doing things the best way, but I'm getting great results anyway.


Regards,

Adrian Sutton
http://www.symphonious.net
