Thanks, yeah, I looked at debug query too. Unfortunately its output doesn't quite do it. For example, if you use a wildcard query, it simply explains the score associated with that wildcard query, not the actual matching token. In other words, if you search for "hour*" and the actual matching text is "hours", debug query doesn't tell you that. Instead, it just reports the score associated with "hour*".
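
To make the desired output concrete, here is a minimal sketch in plain Java. It does not touch Solr or Lucene at all: the class `MatchingTokens`, the `TokenMatch` type, and `findMatches` are hypothetical names, and the `\w+` regex tokenizer and the wildcard-to-regex conversion are simple stand-ins for a real analyzer and query parser. It only illustrates the kind of result I'm after: the concrete tokens that matched "hour*", together with their character offsets in the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchingTokens {

    /** Hypothetical result type: a matched token plus its character offsets. */
    public static final class TokenMatch {
        public final String term;
        public final int startOffset; // inclusive
        public final int endOffset;   // exclusive

        public TokenMatch(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /** Translates a simple wildcard pattern ('*' and '?') into a regex. */
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns every token in the text that matches the wildcard, with offsets. */
    public static List<TokenMatch> findMatches(String text, String wildcard) {
        Pattern query = wildcardToRegex(wildcard);
        // Assumption: tokens are runs of word characters; a real analyzer differs.
        Matcher tokens = Pattern.compile("\\w+").matcher(text);
        List<TokenMatch> matches = new ArrayList<>();
        while (tokens.find()) {
            if (query.matcher(tokens.group()).matches()) {
                matches.add(new TokenMatch(tokens.group(), tokens.start(), tokens.end()));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Open 24 hours; call within the hour.";
        System.out.println(findMatches(text, "hour*"));
    }
}
```

With offsets like these in hand, I could look up the physical coordinates I store out of band. What this sketch cannot do is stemming or phrase matching, which is exactly why I want the list to come from Solr itself.
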

The closest example I've ever found is this:
https://lucidworks.com/blog/2013/05/09/update-accessing-words-around-a-positional-match-in-lucene-4/

But this kind of approach won't let me use the full power of the Solr ecosystem. I'd basically be back to dealing with Lucene directly, which I think is a step backwards. I think the right approach is to write my own SearchComponent, using the highlighter as a starting point. But I wanted to make sure there wasn't a simpler way.

On Sun, Jun 5, 2016 at 11:30 AM Ahmet Arslan <iori...@yahoo.com.invalid> wrote:

> Well debug query has the list of token that caused match.
> If i am not mistaken i read an example about span query and spans thing.
> It was listing the positions of the matches.
> Cannot find the example at the moment..
>
> Ahmet
>
>
> On Sunday, June 5, 2016 9:10 PM, Justin Lee <lee.justi...@gmail.com> wrote:
> Thanks for the responses Alex and Ahmet.
>
> The TermVector component was the first thing I looked at, but what it gives
> you is offset information for every token in the document. I'm trying to
> get a list of tokens that actually match the search query, and unless I'm
> missing something, the TermVector component doesn't give you that
> information.
>
> The TermSpans class does contain the right information, but again the hard
> part is: how do I reliably get a list of TokenSpans for the tokens that
> actually match the search query? That's why I ended up in the highlighter
> source code, because the highlighter has to do just this in order to create
> snippets with accurate highlighting.
>
> Justin
>
>
> On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
>
> > Hi,
> >
> > May be org.apache.lucene.search.spans.TermSpans ?
> >
> >
> > On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> > It sounds like TermVector component's output:
> > https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
> >
> > Perhaps with additional flags enabled (e.g. tv.offsets and/or
> > tv.positions).
> >
> > Regards,
> > Alex.
> > ----
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> > On 5 June 2016 at 07:39, Justin Lee <lee.justi...@gmail.com> wrote:
> > > Is anyone aware of a way of getting a list of each matching token and
> > > their offsets after executing a search? The reason I want to do this is
> > > because I have the physical coordinates of each token in the original
> > > document stored out of band, and I want to be able to highlight in the
> > > original document. I would really like to have Solr return the list of
> > > matching tokens because then things like stemming and phrase matching
> > > will work as expected. I'm thinking of something like the highlighter
> > > component, except instead of returning html, it would return just the
> > > matching tokens and their offsets.
> > >
> > > I have googled high and low and can't seem to find an exact answer to
> > > this question, so I have spent the last few days examining the internals
> > > of the various highlighting classes in Solr and Lucene. I think the bulk
> > > of the action is in WeightedSpanTermExtractor and its interaction with
> > > getBestTextFragments in the Highlighter class. But before I spend
> > > anymore time on this I thought I'd ask (1) whether anyone knows of an
> > > easier way of doing this, and (2) whether I'm at least barking up the
> > > right tree.
> > >
> > > Thanks much,
> > > Justin