Re: Highlighting Output

Tricia Williams Tue, 12 Aug 2008 09:17:53 -0700

Martin,

You may want to follow Mark Miller's effort https://issues.apache.org/jira/browse/LUCENE-1286 as it develops, and perhaps even help with it. He's developing a Lucene highlighter which would "run through query terms by using their offsets", making highlighting large documents much more time efficient. I would be interested to see something like this end up as a Solr highlighting option.
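
To illustrate the general idea of highlighting from stored offsets rather than re-analyzing the whole document, here is a minimal Python sketch. It is not the code from LUCENE-1286; the function name, the `(start, end)` character-offset format, and the `<em>` markers are all assumptions made for illustration, and it presumes the offsets of the matching terms are already available (for example, stored at index time).

```python
def highlight_by_offsets(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) character span of `text` in pre/post markers.

    `offsets` is a hypothetical list of (start, end) pairs for the matched
    query terms; `end` is exclusive. The text is never re-tokenized.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(offsets):
        if start < cursor:
            # Skip spans that overlap one already highlighted.
            continue
        pieces.append(text[cursor:start])           # untouched text before the hit
        pieces.append(pre + text[start:end] + post)  # the highlighted hit
        cursor = end
    pieces.append(text[cursor:])                     # remaining tail
    return "".join(pieces)


if __name__ == "__main__":
    print(highlight_by_offsets("Solr builds on Lucene", [(0, 4), (15, 21)]))
```

The work done here depends on the number of hits and the length of the text being copied, not on re-running an analyzer over the document, which is where the time saving for large documents would come from.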


Revisiting some of your original thoughts:

What I see though is that the highlighting functionality is heavily tied
to the fragment (highlight context) functionality. This actually makes
it interesting to write a plain highlight method that just returns meta
data (so some other process can do the actual highlighting in some
custom fashion).

So is it worthwhile to make sure that Solr is able to do multiple
different kinds of highlighting, even if it means passing meta data back
in the request? Should we have standard ways to index and read back
payload information if we're dealing with pages, books, co-ordinates
(for highlighting images) and other meta data which is used for
highlights (char offset, term offset, etc.)? I also noticed much of
the highlighting code to do with fragments being duplicated in custom
code.

My idea for highlighting based on https://issues.apache.org/jira/browse/SOLR-380 was to include the coordinates for highlighting images as just another attribute in the input xml. Then the PayloadComponent will give the coordinates associated with a given query as part of the xpath. I have written some code beyond what is posted there that takes some extra parameters and reconstructs the xpath into useful results based on the granularity of the information that is requested (roughly based on xquery). Is that a "standard" enough way or is there something else you're thinking about?

If you find anything I've contributed useful, feel free to improve it for the benefit of those that use Solr and Lucene.


Tricia
