It sounds like the TermVector component's output: https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
Perhaps with additional flags enabled (e.g. tv.offsets and/or tv.positions).

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

On 5 June 2016 at 07:39, Justin Lee <lee.justi...@gmail.com> wrote:
> Is anyone aware of a way of getting a list of each matching token and their
> offsets after executing a search? The reason I want to do this is because
> I have the physical coordinates of each token in the original document
> stored out of band, and I want to be able to highlight in the original
> document. I would really like to have Solr return the list of matching
> tokens because then things like stemming and phrase matching will work as
> expected. I'm thinking of something like the highlighter component, except
> instead of returning HTML, it would return just the matching tokens and
> their offsets.
>
> I have googled high and low and can't seem to find an exact answer to this
> question, so I have spent the last few days examining the internals of the
> various highlighting classes in Solr and Lucene. I think the bulk of the
> action is in WeightedSpanTermExtractor and its interaction with
> getBestTextFragments in the Highlighter class. But before I spend any more
> time on this I thought I'd ask (1) whether anyone knows of an easier way of
> doing this, and (2) whether I'm at least barking up the right tree.
>
> Thanks much,
> Justin
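
For reference, the TermVector component request suggested at the top of this reply could be built roughly as follows. This is only a sketch under several assumptions: the host, the collection name "docs", the field name "text" and the helper name build_term_vector_url are made up for illustration; a /tvrh request handler is assumed to be configured (as in the sample solrconfig.xml); and the field is assumed to be indexed with termVectors, termPositions and termOffsets enabled. As far as I know the component returns vectors for all terms of the returned documents, so filtering down to the terms that actually matched would still have to happen on the client.

```python
from urllib.parse import urlencode


def build_term_vector_url(base_url, collection, query, field):
    """Build a request URL for Solr's TermVector component.

    Hypothetical helper: it only assembles the URL and sends nothing.
    Assumes a /tvrh handler with the TermVector component is configured.
    """
    params = {
        "q": query,
        "fl": "id",              # assumes the uniqueKey field is called "id"
        "tv": "true",            # switch the TermVector component on
        "tv.fl": field,          # field(s) to return term vectors for
        "tv.offsets": "true",    # start/end character offsets per term
        "tv.positions": "true",  # token positions per term
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collection}/tvrh?{urlencode(params)}"


if __name__ == "__main__":
    # Host, collection and field below are placeholders.
    print(build_term_vector_url("http://localhost:8983/solr", "docs",
                                "text:solr", "text"))
```

The offsets in the response could then be joined against the out-of-band coordinates by character offset, assuming the stored coordinates are keyed the same way.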