Getting a list of matching terms and offsets

Justin Lee Sat, 04 Jun 2016 14:40:09 -0700

Is anyone aware of a way to get a list of the matching tokens and their
offsets after executing a search?  I want to do this because
I have the physical coordinates of each token in the original document
stored out of band, and I want to be able to highlight matches in the original
document.  I would really like to have Solr return the list of matching
tokens because then things like stemming and phrase matching will work as
expected. I'm thinking of something like the highlighter component, except
instead of returning html, it would return just the matching tokens and
their offsets.
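
To make the desired output concrete, here is a rough sketch of how it
would be consumed. It assumes that each match comes back as a term plus
start and end character offsets, and that the out-of-band coordinates
are keyed by token start offset. TokenMatch, Box, and boxesFor are
hypothetical names for illustration only, not part of Solr or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MatchOffsets {
    /** Hypothetical shape of one matched token: the indexed text and its character offsets. */
    static final class TokenMatch {
        final String term;
        final int startOffset;
        final int endOffset;

        TokenMatch(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Hypothetical physical location of a token in the original document, stored out of band. */
    static final class Box {
        final int page;
        final double x, y, width, height;

        Box(int page, double x, double y, double width, double height) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Looks up the stored box for each match by its start offset.
     * Matches with no stored box are skipped (an assumption; one could also fail loudly).
     */
    static List<Box> boxesFor(List<TokenMatch> matches, Map<Integer, Box> boxesByStartOffset) {
        List<Box> result = new ArrayList<>();
        for (TokenMatch m : matches) {
            Box box = boxesByStartOffset.get(m.startOffset);
            if (box != null) {
                result.add(box);
            }
        }
        return result;
    }
}
```

With output in that shape, a query for "run" that matches the stemmed
token "running" at offsets 8-15 would resolve directly to the box
stored for offset 8, with no HTML parsing involved.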


I have googled high and low and can't seem to find an exact answer to this
question, so I have spent the last few days examining the internals of the
various highlighting classes in Solr and Lucene.  I think the bulk of the
action is in WeightedSpanTermExtractor and its interaction with
getBestTextFragments in the Highlighter class.  But before I spend any more
time on this I thought I'd ask (1) whether anyone knows of an easier way of
doing this, and (2) whether I'm at least barking up the right tree.

Thanks much,
Justin
