I managed to hack some highlighting into a request handler last night for a quick-and-dirty application demo, but it is less than ideal. Two things make this tricky right now: XMLWriter itself pulls the Document from the index, and it has no access to the Query. My hack lives entirely within the request handler's handleRequest method. It makes a second pass over the DocList, re-retrieves the Document objects to highlight them, and adds the highlighted text as additional XML elements in the response rather than inside the <doc>'s. So my current hack is not worth contributing.

Yonik also raised some other very good points regarding term vectors and stored fields. Stored fields would certainly be necessary for highlighting in the general case, but I envision some applications wanting to store the original text elsewhere and using a custom highlighting hook to retrieve it through other means.
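For illustration, such a hook might be a small interface that the highlighter calls to fetch the original text, with stored fields as just one implementation. This is a minimal sketch under stated assumptions: OriginalTextSource, StoredFieldTextSource and ExternalTextSource are hypothetical names (nothing like them exists in Solr or Lucene), and in-memory maps stand in for the index and for the external store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hook (not a Solr/Lucene API): the highlighter asks this for the
// original text of a field instead of assuming it is stored in the index.
interface OriginalTextSource {
    /** Returns the original text of fieldName for docId, or null if unavailable. */
    String getText(int docId, String fieldName);
}

// Default behavior: read stored fields. An in-memory map keyed by docId
// stands in for the index here.
final class StoredFieldTextSource implements OriginalTextSource {
    private final Map<Integer, Map<String, String>> storedFields;

    StoredFieldTextSource(Map<Integer, Map<String, String>> storedFields) {
        this.storedFields = storedFields;
    }

    @Override
    public String getText(int docId, String fieldName) {
        Map<String, String> doc = storedFields.get(docId);
        return doc == null ? null : doc.get(fieldName);
    }
}

// Application-specific behavior: the text lives outside the index
// (a database, a file store, ...), simulated with a map keyed by "docId/field".
final class ExternalTextSource implements OriginalTextSource {
    private final Map<String, String> store = new HashMap<>();

    void put(int docId, String fieldName, String text) {
        store.put(docId + "/" + fieldName, text);
    }

    @Override
    public String getText(int docId, String fieldName) {
        return store.get(docId + "/" + fieldName);
    }
}

final class TextSourceDemo {
    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Lucene in Action");
        Map<Integer, Map<String, String>> index = new HashMap<>();
        index.put(1, doc);

        ExternalTextSource external = new ExternalTextSource();
        external.put(1, "body", "full text kept in an external store");

        // The highlighting code would only ever see the interface type.
        OriginalTextSource[] sources = {new StoredFieldTextSource(index), external};
        System.out.println(sources[0].getText(1, "title"));
        System.out.println(sources[1].getText(1, "body"));
    }
}
```

The point of the indirection is that the highlighting code depends only on OriginalTextSource, so an application that doesn't store its text in the index could plug in its own lookup.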

I'm not quite sure where to go with highlighting from here, given that it seems to require a bit of an overhaul, either in where the Document objects are accessed or in getting the full context of the Query (and filters, etc.) down to the XMLWriter.

Thoughts?

        Erik



On Apr 4, 2006, at 9:18 PM, Chris Hostetter wrote:


For the record, I know next to nothing about highlighting in Lucene. I
can't remember if I read that chapter in LIA or not :)

: curious if the standard handler and configuration should evolve to
: handle the common need for search term highlighting, and if so how

+1

: would that ideally look in the configuration and search request?

One of the things I've been doing in my custom plugins (one of which is really generic and I'm hoping to get permission to commit it back to Solr real soon now) is to make every possible query param have a corresponding
identically named init param (in the Solr config) which it uses as the
default.  That way you can have...
    <str name="highlightFields">title description</str>
...in your solrconfig.xml, and clients that want different behavior can
override it with...
   highlightFields=title+description+body
...in the URL.
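A minimal sketch of that default/override lookup, assuming the request and init params have already been parsed into maps and URL-decoded (so "title+description+body" arrives as "title description body"). ParamDefaults is a hypothetical helper, not an existing Solr class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a request param overrides the identically named
// init param from solrconfig.xml, which acts as the default.
final class ParamDefaults {
    private final Map<String, String> initParams;

    ParamDefaults(Map<String, String> initParams) {
        this.initParams = new HashMap<>(initParams);
    }

    /** Request value if present, otherwise the configured default (may be null). */
    String get(Map<String, String> requestParams, String name) {
        String value = requestParams.get(name);
        return value != null ? value : initParams.get(name);
    }

    /** Splits a space separated field list such as "title description". */
    static List<String> splitFields(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        Map<String, String> init = new HashMap<>();
        init.put("highlightFields", "title description");
        ParamDefaults defaults = new ParamDefaults(init);

        Map<String, String> plain = new HashMap<>();
        Map<String, String> custom = new HashMap<>();
        // Decoded form of highlightFields=title+description+body
        custom.put("highlightFields", "title description body");

        System.out.println(splitFields(defaults.get(plain, "highlightFields")));
        System.out.println(splitFields(defaults.get(custom, "highlightFields")));
    }
}
```

If only the raw query string is at hand, java.net.URLDecoder.decode(value, "UTF-8") turns the '+' separators back into spaces before splitFields runs.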

: I am game for developing the highlighting piece in some way in the
: next few days, and would gladly contribute that feature back provided
: it was done in a way that fits with Solr's architecture.

From a usage standpoint, I think adding both a URL param and an init param
to StandardRequestHandler that takes a space-separated list of
field names to highlight makes a lot of sense ... the question is what do
we do with it?

Modifying XMLWriter and SolrQueryResponse to have "defaultHighlightFields" in the same way they currently have "defaultReturnFields" seems to make the most sense (especially since that way other plugins can use it
as well).  Then the XMLWriter can include a new <hi>word</hi> element in its
output any time it wants to highlight something.
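As a rough illustration of the proposed output, here is a sketch that escapes a field value for XML while wrapping matching terms in <hi>...</hi>. It deliberately uses naive tokenization (runs of letters and digits, compared lowercased) in place of Lucene's analyzers and its contrib Highlighter, so it is not how Solr or Lucene would actually match terms; HiMarkup is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class HiMarkup {
    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Wraps every whole word whose lowercase form is in terms with <hi>...</hi>
     * and escapes everything else. Assumes terms are already lowercase.
     */
    static String highlight(String text, Set<String> terms) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int start = i;
            if (Character.isLetterOrDigit(text.charAt(i))) {
                // Consume one "word" (a run of letters/digits).
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                String word = text.substring(start, i);
                if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                    out.append("<hi>").append(escape(word)).append("</hi>");
                } else {
                    out.append(escape(word));
                }
            } else {
                // Consume the separator run and copy it through escaped.
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
                out.append(escape(text.substring(start, i)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("lucene", "highlighting"));
        System.out.println(highlight("Highlighting in Lucene & Solr", terms));
    }
}
```

Escaping the surrounding text is what makes the <hi> elements safe to emit: any '<' or '&' in the original field value can no longer be mistaken for markup.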

(NOTE: Adding XML markup for highlighting probably means the default
"Protocol Version" should be bumped to 2.2, and highlighting should be
flat out disabled if the version is less than that, so older clients
aren't suddenly surprised to find XML markup in their strings if the server
configuration changes)
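The version check in that note might look something like the sketch below, assuming the client's requested protocol version arrives as a dotted string (or null when the client sends none, in which case the server default applies). HighlightVersionGate and its methods are hypothetical; versions are compared numerically so that a later "2.10" would still count as newer than "2.2".

```java
// Hypothetical gate: only emit <hi> markup for clients at protocol version 2.2 or later.
final class HighlightVersionGate {
    static final String MIN_HIGHLIGHT_VERSION = "2.2";

    /** Numeric comparison of dotted versions; missing components count as 0. */
    static int compareVersions(String a, String b) {
        String[] pa = a.trim().split("\\.");
        String[] pb = b.trim().split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Uses the client's version if given, otherwise the server's default version. */
    static boolean highlightingAllowed(String requestedVersion, String serverDefaultVersion) {
        String effective = requestedVersion != null ? requestedVersion : serverDefaultVersion;
        return compareVersions(effective, MIN_HIGHLIGHT_VERSION) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(highlightingAllowed("2.1", "2.2"));
        System.out.println(highlightingAllowed(null, "2.2"));
    }
}
```

With this shape, a client that explicitly asks for an older version never sees <hi> markup, whatever the server's configured default is.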


-Hoss
