I managed to hack some highlighting into a request handler last night
for a quick and dirty application demo, but it is less than ideal.
XMLWriter currently pulls the Document from the index itself and has no
access to the Query, which makes this tricky. My hack lives entirely in
the handleRequest method of the request handler: it makes a second pass
over the DocList, re-retrieves the Document objects to highlight them,
and adds the highlighted text to additional XML elements in the
response rather than to the <doc> elements. So my current hack is not
worth contributing.

Yonik additionally brought up some other very good points regarding
term vectors and stored fields. Stored fields would certainly be
necessary for highlighting in the general sense, but I envision some
applications wanting to store the original text elsewhere and use a
custom highlighting hook to retrieve it through other means.

I'm not quite sure where to go with this highlighting issue from here,
given that it seems to need a bit of an overhaul in where the Document
objects are accessed, or some way to get the full context of the
Query (and filters, etc.) down to the XMLWriter.

Thoughts?

Erik

On Apr 4, 2006, at 9:18 PM, Chris Hostetter wrote:
For the record, i know next to nothing about highlighting in Lucene. i
can't remember if i read that chapter in LIA or not :)
: curious if the standard handler and configuration should evolve to
: handle the common need for search term highlighting, and if so how
+1
: would that ideally look in the configuration and search request?
one of the things i've been doing in my custom plugins (one of which is
really generic and i'm hoping to get permission to commit it back to solr
real soon now) is to make every possible query param have a corresponding
identically named init param (in the solr config) which it uses as the
default. That way you can have...
<str name="highlightFields">title description</str>
...in your solrconfig.xml, and clients that want different behavior can
override it with...
highlightFields=title+description+body
...in the URL.
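
A minimal sketch of that "init param as default" lookup, in case it
helps make the idea concrete. The class and method names here
(ParamWithDefaults, get) are hypothetical and not part of Solr, and it
assumes the request params have already been URL-decoded into a plain
Map (so the "+" in the URL above arrives as a space):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Solr class: resolves a request parameter,
// falling back to the identically named init param from the config.
class ParamWithDefaults {
    private final Map<String, String> initParams;

    ParamWithDefaults(Map<String, String> initParams) {
        // Copy so later changes to the caller's map do not leak in.
        this.initParams = new HashMap<>(initParams);
    }

    // Returns the request value if present, otherwise the configured
    // default, or null if neither is set.
    String get(Map<String, String> requestParams, String name) {
        String value = requestParams.get(name);
        return (value != null) ? value : initParams.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> init = new HashMap<>();
        init.put("highlightFields", "title description");
        ParamWithDefaults params = new ParamWithDefaults(init);

        Map<String, String> request = new HashMap<>();
        // No override in the request: the config default is used.
        System.out.println(params.get(request, "highlightFields"));

        // Assumed to be the decoded form of title+description+body.
        request.put("highlightFields", "title description body");
        System.out.println(params.get(request, "highlightFields"));
    }
}
```

The point of keeping the names identical is that a plugin only needs
one lookup per parameter and never has to know whether the value came
from the config or from the client.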
: I am game for developing the highlighting piece in some way in the
: next few days, and would gladly contribute that feature back provided
: it was done in a way that fits with Solr's architecture.
from a usage standpoint, i think adding both a URL param and init param
to StandardRequestHandler that takes in a space separated list of
fieldNames to highlight makes a lot of sense ... the question is what do
we do with it?
Modifying XMLWriter and SolrQueryResponse to have "defaultHighlightFields"
in the same way they currently have "defaultReturnFields" seems like it
makes the most sense (especially since that way other plugins can use it
as well). Then the XMLWriter can include a new <hi>word</hi> in its
output anytime it wants to highlight something.
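
To illustrate what the <hi> markup in a field value could look like,
here is a rough sketch. It is not how XMLWriter or Lucene's highlighter
works: it swaps in a naive case-insensitive whole-word match against a
set of terms, and the names (HiMarkupSketch, highlight, escape) are
made up for the example. The one thing it does try to get right is
that the surrounding text still has to be XML-escaped while the <hi>
tags themselves are not:

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Solr code: wraps matching words in <hi>
// tags and XML-escapes everything else.
class HiMarkupSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Escapes the characters that would break XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // lowerCaseTerms is assumed to hold the query terms, lower-cased.
    static String highlight(String text, Set<String> lowerCaseTerms) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy (and escape) whatever sits between two words.
            out.append(escape(text.substring(last, m.start())));
            String word = m.group();
            if (lowerCaseTerms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<hi>").append(escape(word)).append("</hi>");
            } else {
                out.append(escape(word));
            }
            last = m.end();
        }
        // Trailing text after the last word.
        out.append(escape(text.substring(last)));
        return out.toString();
    }
}
```

A real implementation would need the parsed Query rather than a bare
set of words, which is exactly the piece XMLWriter does not have access
to today.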
(NOTE: Adding XML markup for highlighting probably means the default
"Protocol Version" should be rev'ed to 2.2, and highlighting should be
flat out disabled if the version is less than that so older clients
aren't suddenly surprised to find xml markup in their strings if the
server configuration changes)
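
The version check in that note could be sketched roughly as below. The
class and method names are hypothetical, and it assumes the version
arrives as a "major.minor" string. It compares the parts as integers
rather than parsing the whole thing as a float, so that a future "2.10"
is not treated as older than "2.2":

```java
// Hypothetical sketch, not Solr code: decides whether highlight markup
// may be emitted for a client that asked for a given protocol version.
class HighlightVersionGate {
    // Minimum version proposed in the note above.
    static final int MIN_MAJOR = 2;
    static final int MIN_MINOR = 2;

    // Assumes version looks like "2.2" or "2"; a missing minor part
    // is treated as 0.
    static boolean highlightingAllowed(String version) {
        String[] parts = version.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > MIN_MAJOR
                || (major == MIN_MAJOR && minor >= MIN_MINOR);
    }
}
```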
-Hoss