Mingchun, yes, that is better, and it works fine.
Thank you!

Bjørn

On Sat, Dec 20, 2014 at 1:26 PM, Mingchun Zhao <mingchun.zha...@gmail.com> wrote:
> Hi Bjørn,
>
> From Solr 4.4, the behavior of end offsets in EdgeNGramFilterFactory
> was changed due to the following issue:
> https://issues.apache.org/jira/browse/LUCENE-3907
> The related source code in this patch is as below:
> ==
> +    if (version.onOrAfter(Version.LUCENE_44)) {
> +      // Never update offsets
> +      updateOffsets = false;
> +    } else {
> +      // if length by start + end offsets doesn't match the term text then assume
> +      // this is a synonym and don't adjust the offsets.
> +      updateOffsets = (tokStart + curTermLength) == tokEnd;
> +    }
> ==
>
> It seems that there is no property for specifying the previous
> offset behavior as in LUCENE_43.
> Therefore, you might have to set luceneMatchVersion to deal with it, as
> you mentioned.
> However, it would be better to apply luceneMatchVersion only to the
> EdgeNGramFilterFactory, as below:
> ==
> <filter class="solr.EdgeNGramFilterFactory"
>         maxGramSize="20" minGramSize="1" luceneMatchVersion="4.3"/>
> ==
> Setting <luceneMatchVersion>LUCENE_43</luceneMatchVersion> in
> solrconfig.xml would also affect other configurations.
>
> Regards,
> Mingchun
>
>
> 2014-12-19 23:26 GMT+09:00 Bjørn Hjelle <bjorn.hje...@gmail.com>:
>> Hi,
>>
>> based on this example:
>> http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
>> I have earlier successfully implemented highlighting of terms in
>> (Edge)NGram-analyzed fields.
>>
>> In a new project, however, with Solr 4.10.2, it does not work.
>>
>> In the Solr admin analysis page I see the following in Solr 4.10.2
>> (simplified):
>>
>> ENGTF  text   t  te  tes  test
>>        start  0  0   0    0
>>        end    4  4   4    4
>>
>> But if I change to LUCENE_43 in solrconfig.xml and reload the
>> analysis page, I get this:
>>
>> ENGTF  text   t  te  tes  test
>>        start  0  0   0    0
>>        end    1  2   3    4
>>
>> So, in 4.10.2 every n-gram gets the end offset of the complete token,
>> and the highlighter will therefore highlight the complete word
>> ("test" in this case) instead of just the matched prefix.
>>
>>
>> To reproduce this:
>> 1. Download Solr 4.10.2.
>> 2. In the collection1 schema.xml, add this field type:
>>
>>
>> <fieldType name="autocomplete_ngram" class="solr.TextField">
>>   <analyzer type="index">
>>     <charFilter class="solr.MappingCharFilterFactory"
>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EdgeNGramFilterFactory"
>>             maxGramSize="20" minGramSize="1"/>
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <charFilter class="solr.MappingCharFilterFactory"
>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="0" generateNumberParts="0" catenateWords="0"
>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
>>     <filter class="solr.PatternReplaceFilterFactory"
>>             pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>>   </analyzer>
>> </fieldType>
>>
>> 3. Start Solr and, in the analysis page, enter "Test" in the
>>    Field Value (Index) field and check the output.
>> 4. Then change to this in solrconfig.xml:
>>
>> <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
>>
>> 5. Reload the core and reload the analysis page.
>> 6. You will now see that the end offsets are correct.
>>
>>
>>
>> Any ideas on how to make this work with Solr 4.10.2 without resorting
>> to changing the Lucene version in solrconfig.xml?
>>
>>
>> Thanks,
>> Bjørn
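
For anyone reading this thread later: the offset behavior discussed above can be roughly sketched in a few lines of Python. This is only an illustration of the logic in the quoted patch, not Lucene or Solr code. The function names (edge_ngrams, highlight) and the lucene_44_or_later flag are made up for this example, and the highlight helper is a simplified stand-in for what a real highlighter does with offsets.

```python
def edge_ngrams(term, tok_start, tok_end, min_gram=1, max_gram=20,
                lucene_44_or_later=True):
    """Return (text, start_offset, end_offset) for each front edge n-gram.

    Hypothetical helper that mirrors the updateOffsets decision in the
    quoted LUCENE-3907 patch; it is not the real filter implementation.
    """
    if lucene_44_or_later:
        # 4.4 and later: never update offsets, every gram keeps the
        # offsets of the whole token.
        update_offsets = False
    else:
        # Before 4.4: adjust offsets only when the token length matches
        # its offsets (otherwise assume a synonym and leave them alone).
        update_offsets = (tok_start + len(term)) == tok_end

    grams = []
    for n in range(min_gram, min(max_gram, len(term)) + 1):
        end = tok_start + n if update_offsets else tok_end
        grams.append((term[:n], tok_start, end))
    return grams


def highlight(text, start, end):
    """Simplified highlighter: wrap text[start:end] in <em> tags."""
    return text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]


if __name__ == "__main__":
    for new_behavior in (True, False):
        label = "4.4+" if new_behavior else "4.3"
        for gram, start, end in edge_ngrams(
                "test", 0, 4, lucene_44_or_later=new_behavior):
            print(label, gram, start, end, highlight("Test", start, end))
```

With the 4.3 behavior, the gram "te" carries offsets 0-2, so only the prefix gets wrapped. With the 4.4+ behavior it carries 0-4, so the whole word gets wrapped, which matches what the analysis page shows above.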