Hi,

For completeness: I was able to fix the highlighter so that it works with token matches whose positions go beyond the length of the text field. The solution was to take the matched token positions modulo the length of the text whenever they exceed it.
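
A minimal sketch of that modulo step in plain Java, with no Lucene dependency. The class `PositionWrap`, the method `wrap`, and the sample numbers are hypothetical names and values chosen for illustration; they are not part of the Lucene or Solr API, and the actual change to the highlighter may look different. It assumes the position and the text length are measured in the same units.

```java
public final class PositionWrap {
    private PositionWrap() {}

    /**
     * Hypothetical helper: returns the position unchanged when it fits
     * inside the text, otherwise the position modulo the text length.
     */
    static int wrap(int position, int textLength) {
        if (textLength <= 0) {
            throw new IllegalArgumentException("textLength must be positive");
        }
        // Only positions that exceed the text length are wrapped.
        return position > textLength ? position % textLength : position;
    }

    public static void main(String[] args) {
        int textLength = 120;              // assumed length of the text field
        int[] matched = {7, 119, 1007};    // assumed matched token positions
        for (int p : matched) {
            System.out.println(p + " -> " + wrap(p, textLength));
        }
    }
}
```

Whether the wrapped value lands on the intended token depends on how the large position increments were chosen, which is not spelled out here.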
Dmitry

On Thu, Dec 27, 2012 at 10:13 AM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Hi,
>
> answering my own question for the record: experiments show that the
> described functionality is achievable with a TokenFilter class
> implementation. The only caveat is that the Highlighter component stops
> working properly if the match position goes beyond the length of the
> text field.
>
> As for performance, no major delays compared to the original proximity
> search implementation have been noticed.
>
> Best,
>
> Dmitry Kan
>
> On Wed, Dec 19, 2012 at 10:53 AM, Dmitry Kan <solrexp...@gmail.com> wrote:
>
>> Dear list,
>>
>> We are currently evaluating proximity searches ("term1 term2"~slop) for
>> a specific use case. In particular, each document contains artificial
>> delimiter characters (one character between each pair of sentences in
>> the text). Our goal is to match sentences individually for any
>> proximity search and to avoid matches that cross sentence boundaries.
>>
>> We figured that by using PositionIncrementAttribute as a field in a
>> descendant of the TokenFilter class, it is possible to set the position
>> increment of each artificial character (which is a term in Lucene / Solr
>> notation) to an arbitrarily large number. Any proximity search with a
>> reasonably small slop value should then automatically match only within
>> sentence boundaries.
>>
>> Does this sound like the right way to tackle the problem? Are there any
>> performance costs involved?
>>
>> Thanks in advance for any input,
>>
>> Dmitry Kan