Hi,
based on this example:
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
I have previously implemented highlighting of terms in
(Edge)NGram-analyzed fields successfully.
In a new project with Solr 4.10.2, however, it does not work.
In the Solr admin analysis page I see the following in Solr 4.10.2 (simplified):
ENGTF  text   t  te  tes  test
       start  0  0   0    0
       end    4  4   4    4
But if I set luceneMatchVersion to LUCENE_43 in solrconfig.xml and
reload the analysis page, I get this:
ENGTF  text   t  te  tes  test
       start  0  0   0    0
       end    1  2   3    4
So in 4.10.2 every n-gram reports the end offset of the whole token
(the "end" row) instead of its own, and the highlighter therefore
highlights the complete word ("test" in this case).
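To make the effect concrete, here is a small Python sketch. It is only a toy model of the two offset behaviours shown in the analysis output, not Solr's actual filter or highlighter code; the function names and the per_gram_offsets flag are made up for illustration.

```python
def edge_ngrams(token, start, min_gram=1, max_gram=20, per_gram_offsets=True):
    """Return (text, start_offset, end_offset) for each edge n-gram of token.

    per_gram_offsets=True mimics the LUCENE_43 output (end = 1, 2, 3, 4);
    per_gram_offsets=False mimics the 4.10.2 output (end = 4 for every gram).
    """
    token_end = start + len(token)
    grams = []
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        gram_end = start + n if per_gram_offsets else token_end
        grams.append((token[:n], start, gram_end))
    return grams


def highlight(text, gram, grams, pre="<em>", post="</em>"):
    """Wrap the span of text that the matching gram's offsets point to."""
    for term, s, e in grams:
        if term == gram:
            return text[:s] + pre + text[s:e] + post + text[e:]
    return text


lucene_43 = edge_ngrams("test", 0, per_gram_offsets=True)
solr_4102 = edge_ngrams("test", 0, per_gram_offsets=False)

print(highlight("test", "te", lucene_43))  # <em>te</em>st
print(highlight("test", "te", solr_4102))  # <em>test</em>
```

With per-gram offsets a query for "te" marks only the matched prefix; with whole-token offsets the same match marks all of "test", which is what I observe.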
To reproduce this:
1. Download Solr 4.10.2.
2. In the collection1 schema.xml, add this field type:
<fieldType name="autocomplete_ngram" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory"
            maxGramSize="20" minGramSize="1"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>
3. Start Solr and, on the analysis page, enter "Test" in the
Field Value (Index) field and check the output.
4. Then set this in solrconfig.xml:
<luceneMatchVersion>LUCENE_43</luceneMatchVersion>
5. Reload the core and reload the analysis page.
6. You will now see that the end offsets (the "end" row) are correct.
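The same check could probably be scripted instead of clicking through the admin UI. The Python sketch below is untested against a live server and rests on several assumptions: Solr listens on localhost:8983, the core is the example "collection1", the /analysis/field handler (FieldAnalysisRequestHandler) is registered in solrconfig.xml, and each token in the JSON response is an object with "text", "start" and "end" keys. The helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: default example host, port and core name.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def analysis_url(field_type, value, base=SOLR_CORE_URL):
    """Build a field-analysis request URL for one field type and value."""
    params = urlencode({
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": value,
        "wt": "json",
    })
    return base + "/analysis/field?" + params


def collect_tokens(node):
    """Recursively gather (text, start, end) from any token-like dicts.

    Walking the whole response avoids depending on its exact nesting,
    which is an assumption here.
    """
    found = []
    if isinstance(node, dict):
        if {"text", "start", "end"} <= node.keys():
            found.append((node["text"], node["start"], node["end"]))
        for child in node.values():
            found.extend(collect_tokens(child))
    elif isinstance(node, list):
        for child in node:
            found.extend(collect_tokens(child))
    return found


def fetch_offsets(field_type, value):
    """Call the running Solr instance and return all token offsets."""
    with urlopen(analysis_url(field_type, value)) as response:
        return collect_tokens(json.load(response))


# Only prints the URL; call fetch_offsets(...) once Solr is running.
print(analysis_url("autocomplete_ngram", "Test"))
```

Calling fetch_offsets("autocomplete_ngram", "Test") before and after step 4 should return the tokens of every stage in the chain, so the EdgeNGram stage can be compared between the two luceneMatchVersion settings.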
Any ideas on how to make this work with Solr 4.10.2 without resorting
to changing the Lucene match version in solrconfig.xml?
Thanks,
Bjørn