Martin,

You may want to follow Mark Miller's effort at https://issues.apache.org/jira/browse/LUCENE-1286 as it develops -- perhaps even help with it. He's developing a Lucene highlighter which would "run through query terms by using their offsets", making highlighting of large documents much faster. I would be interested to see something like this end up as a Solr highlighting option.

Revisiting some of your original thoughts:
What I see, though, is that the highlighting functionality is heavily tied
to the fragment (highlight context) functionality. This actually makes
it interesting to write a plain highlight method that just returns
metadata (so some other process can do the actual highlighting in some
custom fashion).
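
To make the "just return metadata" idea concrete, here is a minimal sketch of a method that reports character offsets and leaves the markup to the caller. It is only an illustration: the class, the Span type and findSpans are hypothetical names, and it uses naive case-insensitive substring matching rather than Lucene's analyzers or term vectors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class OffsetHighlightSketch {

    /** Hypothetical metadata record: the character span of one match. */
    public static final class Span {
        public final int start; // inclusive character offset
        public final int end;   // exclusive character offset

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /**
     * Returns the offsets of every non-overlapping occurrence of term in text.
     * Assumption: plain substring matching stands in for real analysis, so
     * "solr" would also match inside a longer word.
     */
    public static List<Span> findSpans(String text, String term) {
        List<Span> spans = new ArrayList<>();
        if (term.isEmpty()) {
            return spans; // nothing to look for
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        int from = 0;
        while (true) {
            int at = haystack.indexOf(needle, from);
            if (at < 0) {
                break;
            }
            spans.add(new Span(at, at + needle.length()));
            from = at + needle.length(); // continue after this match
        }
        return spans;
    }

    public static void main(String[] args) {
        // The caller decides what to do with the spans: wrap them in tags,
        // map them to image coordinates, and so on.
        for (Span s : findSpans("Solr highlights what Solr finds", "solr")) {
            System.out.println(s);
        }
    }
}
```

A client receiving such spans could apply whatever highlighting it likes without Solr building any fragments.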

So is it worthwhile to make sure that Solr is able to do multiple
different kinds of highlighting, even if it means passing metadata back
in the request? Should we have standard ways to index and read back
payload information if we're dealing with pages, books, co-ordinates
(for highlighting images) and other metadata which is used for
highlights (char offset, term offset, etc.)? I also noticed much of
the highlighting code to do with fragments being duplicated in custom
code.

My idea for highlighting, based on https://issues.apache.org/jira/browse/SOLR-380, was to include the coordinates for highlighting images as just another attribute in the input XML. The PayloadComponent then gives the coordinates associated with a given query as part of the XPath. I have written some code beyond what is posted there that takes some extra parameters and reconstructs the XPath into useful results based on the granularity of the information that is requested (roughly based on XQuery). Is that a "standard" enough way, or is there something else you're thinking about?

If you find anything I've contributed useful, feel free to improve it for the benefit of those who use Solr and Lucene.

Tricia
