https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-2878 provides a Lucene API for what you are trying to do, but it is not in Lucene yet. There is a fork that includes the change: https://github.com/flaxsearch/lucene-solr-intervals

On 12 Sep 2014 21:24, "Craig Longman" <clong...@iconect.com> wrote:
> In order to take our Solr usage to the next step, we really need to
> improve its highlighting abilities. What I'm trying to do is write a
> new component that can return the fields that matched the search
> (including numeric fields) and the start/end positions for the
> alphanumeric matches.
>
> I see three different approaches to take; any of them will require
> making some modifications to the lucene/solr parts, as it just does not
> appear to be doable as a completely standalone component.
>
> 1) At initial search time.
>
> This seemed like a good approach. I can follow IndexSearcher creating
> the TermContext that parses through AtomicReaderContexts to see if it
> contains a match and then adds it to the contexts available for later.
> However, at this point, inside SegmentTermsEnum.seekExact(), it seems
> like Solr is not really looking for matching terms as such, it's just
> scanning what looks like the raw index. So, I don't think I can easily
> extract term positions at this point.
>
> 2) Write a modified HighlighterComponent. We have managed to get phrases
> to highlight properly, but it seems like getting the full field matches
> would be more difficult in this module. Also, because it does its
> highlighting oblivious to any other criteria, we can't use it as is.
> For example, this search:
>
> (body:large+AND+user_id:7)+OR+user_id:346
>
> will highlight "large" in records that have user_id = 346 when
> technically (for our purposes at least) it should not be considered a
> hit because the "large" was accompanied by the user_id = 7 criteria.
> It's not immediately clear to me how difficult it would be to change
> this.
>
> 3) Make a modified DebugComponent and enhance the existing explain()
> methods (in the query types we require it at least) to include more
> information such as the start/end positions of the term that was hit.
> I'm exploring this now, but I don't easily see how I can figure out what
> those positions might be from the explain() information. Any pointers
> on how, at the point that TermQuery.explain() is being called, I can
> figure out which indexed token was the actual hit?
>
> Craig Longman
> C++ Developer
> iCONECT Development, LLC
> 519-645-1663
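
To make the requirement in approach 2 of the quoted message concrete, here is a rough sketch of the behaviour being asked for: a term counts as a hit only if the clause enclosing it actually matched, and each hit carries its start/end character offsets. This is a toy model in plain Python, not Lucene or Solr code. The names Term, And, Or and match are made up for illustration, and it works on raw field text rather than on an index.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy query tree; these are NOT Lucene classes.
@dataclass(frozen=True)
class Term:
    field: str
    value: str

@dataclass(frozen=True)
class And:
    clauses: tuple

@dataclass(frozen=True)
class Or:
    clauses: tuple

Hit = Tuple[str, int, int]  # (field name, start offset, end offset)

def match(query, doc: dict) -> List[Hit]:
    """Return the hits that contributed to a match; [] means no match."""
    if isinstance(query, Term):
        # Field values are compared as text, so numeric fields work too.
        text = str(doc.get(query.field, ""))
        pattern = r"\b" + re.escape(query.value) + r"\b"
        return [(query.field, m.start(), m.end())
                for m in re.finditer(pattern, text, re.IGNORECASE)]
    if isinstance(query, And):
        hits: List[Hit] = []
        for clause in query.clauses:
            sub = match(clause, doc)
            if not sub:
                # One failed clause voids the whole conjunction, so the
                # hits collected from its sibling clauses are discarded.
                return []
            hits.extend(sub)
        return hits
    if isinstance(query, Or):
        hits = []
        for clause in query.clauses:
            hits.extend(match(clause, doc))
        return hits
    raise TypeError("unsupported query node: %r" % (query,))

# (body:large AND user_id:7) OR user_id:346
query = Or((And((Term("body", "large"), Term("user_id", "7"))),
            Term("user_id", "346")))

print(match(query, {"body": "a large file", "user_id": 346}))
print(match(query, {"body": "a large file", "user_id": 7}))
```

For the record with user_id = 346, only the user_id field is reported, because the conjunction containing body:large failed. For the record with user_id = 7, both "large" and the user_id value are reported with their offsets. A real implementation would need to get this information from the index (term positions and offsets) rather than by re-scanning field text.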