Is anyone aware of a way to get a list of the matching tokens and their offsets after executing a search? I want this because I have the physical coordinates of each token in the original document stored out of band, and I want to be able to highlight matches in the original document. I would really like Solr itself to return the list of matching tokens, because then things like stemming and phrase matching will work as expected. I'm thinking of something like the highlighter component, except that instead of returning HTML, it would return just the matching tokens and their offsets.
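To make the desired output concrete, here is a toy sketch in plain Python. It is not Solr or Lucene code: `toy_stem`, `matching_tokens`, `locate`, and the `coords` layout are all hypothetical names I made up for illustration, and the suffix-stripping "stemmer" is only a crude stand-in for real analysis. It just shows the shape of the result I'm after (token, start offset, end offset) and how I'd join it against my out-of-band coordinates.

```python
import re

def toy_stem(word):
    # Crude stand-in for a real stemmer (assumption, not Lucene's analysis):
    # lowercase the word and strip one common suffix.
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def matching_tokens(text, query):
    # Return (token, start, end) for every token whose stem matches a query stem.
    query_stems = {toy_stem(t) for t in re.findall(r"\w+", query)}
    hits = []
    for m in re.finditer(r"\w+", text):
        if toy_stem(m.group()) in query_stems:
            hits.append((m.group(), m.start(), m.end()))
    return hits

def locate(hits, coords):
    # Look up each hit's physical coordinates by its start offset.
    # `coords` is a hypothetical out-of-band store: start offset -> (page, x, y).
    return [(token, coords[start]) for token, start, _end in hits if start in coords]

text = "Searching searches the search index"
hits = matching_tokens(text, "searches")
coords = {0: (1, 72, 700), 10: (1, 130, 700), 23: (1, 210, 700)}
boxes = locate(hits, coords)
```

With a list like `hits` coming back from the search, mapping each match to a rectangle in the original document would be a simple lookup on my side.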
I have googled high and low and can't find an exact answer to this question, so I have spent the last few days examining the internals of the various highlighting classes in Solr and Lucene. I think the bulk of the action is in WeightedSpanTermExtractor and its interaction with getBestTextFragments in the Highlighter class. But before I spend any more time on this, I thought I'd ask (1) whether anyone knows of an easier way of doing this, and (2) whether I'm at least barking up the right tree.

Thanks much,
Justin