Hi Bjørn,

As of Solr 4.4, the end-offset behavior of EdgeNGramFilterFactory changed
because of the following issue:

https://issues.apache.org/jira/browse/LUCENE-3907

The relevant source code from that patch is below:

==
+    if (version.onOrAfter(Version.LUCENE_44)) {
+      // Never update offsets
+      updateOffsets = false;
+    } else {
+      // if length by start + end offsets doesn't match the term text then assume
+      // this is a synonym and don't adjust the offsets.
+      updateOffsets = (tokStart + curTermLength) == tokEnd;
+    }
==
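To make the effect of that flag concrete, here is a minimal standalone sketch. It is not the Lucene code: the class name, the method name, and the boolean parameter are made up for illustration, and it only assumes that a front-edge gram of length n ends at tokStart + n when offsets are updated and at tokEnd otherwise.

```java
import java.util.Arrays;

// Standalone illustration only; this is NOT the Lucene implementation.
// All names here are hypothetical.
class EdgeNGramOffsetSketch {

    // Returns the end offset reported for each front-edge gram of `term`.
    // `matchVersion44OrLater` stands in for version.onOrAfter(Version.LUCENE_44).
    static int[] endOffsets(String term, int tokStart, int tokEnd,
                            int minGram, int maxGram, boolean matchVersion44OrLater) {
        boolean updateOffsets;
        if (matchVersion44OrLater) {
            // 4.4 and later: never update offsets
            updateOffsets = false;
        } else {
            // Before 4.4: adjust only when the offsets span exactly the term text
            updateOffsets = (tokStart + term.length()) == tokEnd;
        }

        int largest = Math.min(maxGram, term.length());
        int count = Math.max(0, largest - minGram + 1);
        int[] ends = new int[count];
        for (int i = 0; i < count; i++) {
            int gramLength = minGram + i;
            // Updated offsets end where the gram ends; otherwise every gram
            // keeps the end offset of the original token.
            ends[i] = updateOffsets ? tokStart + gramLength : tokEnd;
        }
        return ends;
    }

    public static void main(String[] args) {
        System.out.println("4.4+: " + Arrays.toString(endOffsets("test", 0, 4, 1, 20, true)));
        System.out.println("4.3 : " + Arrays.toString(endOffsets("test", 0, 4, 1, 20, false)));
    }
}
```

For the token "test" with offsets 0-4, this gives end offsets 4, 4, 4, 4 in the 4.4+ case and 1, 2, 3, 4 in the 4.3 case, which matches what you see on the analysis page.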

It seems there is no property for restoring the previous offset behavior
of LUCENE_43. So you may have to set luceneMatchVersion to deal with it,
as you mentioned. However, it would be better to apply luceneMatchVersion
only to the EdgeNGramFilterFactory, as below:

==
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1" luceneMatchVersion="4.3"/>
==

Setting <luceneMatchVersion>LUCENE_43</luceneMatchVersion> in solrconfig.xml
also affects other configurations.

Regards,
Mingchun

2014-12-19 23:26 GMT+09:00 Bjørn Hjelle <bjorn.hje...@gmail.com>:

> Hi,
>
> based on this example:
> http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
> I have earlier successfully implemented highlight of terms in
> (Edge)NGram-analyzed fields.
>
> In a new project, however, with Solr 4.10.2 it does not work.
>
> In the Solr admin analysis page I see the following in Solr 4.10.2
> (simplified):
>
> ENGTF  text   t  te  tes  test
>        start  0  0   0    0
>        end    4  4   4    4
>
> But if I change to LUCENE_43 in solrconfig.xml, and reload the
> analysis page I get this:
>
> ENGTF  text   t  te  tes  test
>        start  0  0   0    0
>        end    1  2   3    4
>
> So, in 4.10.2 it is not able to find the correct end-positions and the
> highlighter will instead highlight the complete word ("test" in this
> case).
>
>
> To reproduce this:
> 1. download Solr 4.10.2
> 2. In the collection1 schema.xml, add field type:
>
>
> <fieldType name="autocomplete_ngram" class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory"
>         mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
>         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory"
>         maxGramSize="20" minGramSize="1"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>         pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory"
>         mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="0" generateNumberParts="0" catenateWords="0"
>         catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>         pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>         pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>   </analyzer>
> </fieldType>
>
> 3. Start Solr and in the analysis page add "Test" to the Field Value (Index)
> field and check the output.
> 4. Then change to this in solrconfig.xml
>
> <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
>
> 5. reload the core and reload the analysis page.
> 6. you will now see that the end-positions are correct.
>
>
>
> Any ideas on how to make this work with Solr 4.10.2 without resorting
> to changing lucene version in solrconfig.xml?
>
>
> Thanks,
> Bjørn