My guess is that the boundary scanner isn't configured correctly for your highlighter. Try setting the hl.bs.language and hl.bs.country parameters, either in your request or in the request handler.
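Here is a minimal sketch of the request-handler option. It assumes the FastVectorHighlighter is in use, because the boundary scanner (hl.boundaryScanner, hl.bs.*) is a FastVectorHighlighter feature. That also means the highlighted field would need termVectors, termPositions and termOffsets enabled. It also assumes the stock solrconfig.xml, which defines a boundaryScanner named "breakIterator". The handler name /select, hl.fl=textng and the de/DE locale values are illustrative assumptions, not tested settings.

```xml
<!-- Sketch only: handler name and locale values are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">textng</str>
    <!-- boundary scanners are only consulted by the FastVectorHighlighter -->
    <str name="hl.useFastVectorHighlighter">true</str>
    <!-- refers to the boundaryScanner named "breakIterator" in the highlighting component -->
    <str name="hl.boundaryScanner">breakIterator</str>
    <str name="hl.bs.type">WORD</str>
    <!-- language/country build the Locale passed to java.text.BreakIterator -->
    <str name="hl.bs.language">de</str>
    <str name="hl.bs.country">DE</str>
  </lst>
</requestHandler>
```

The same settings can instead be passed on an individual request, e.g. q=rosen&hl=true&hl.fl=textng&hl.useFastVectorHighlighter=true&hl.boundaryScanner=breakIterator&hl.bs.language=de&hl.bs.country=DE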
k/r,
Scott

On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes <jerome.bernar...@mappy.com> wrote:

> Dear Solr Users,
>
> I am facing a problem with highlighting on ngram fields.
> Highlighting works well, except for words containing the German character "ß".
>
> E.g. with q=rosen:
>
> "highlighting": {
>   "gcl3r:12723710:6643": {
>     "textng": [
>       "<em>Rosen</em>steinpark (Métro), Stuttgart (Allemagne)"
>     ]
>   },
>   "gcl3r:2267495:780930": {
>     "textng": [
>       "<em>Rosenstraße</em>, 94554 Moos (Allemagne)"
>     ]
>   }
> }
>
> Words without "ß" are highlighted partially (<em>Rosen</em>steinpark), but
> for words with "ß" the whole word is highlighted (<em>Rosenstraße</em>).
>
> -------------
> The character ß is mapped to "ss" at query and index time (using
> <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>).
>
> Here is the schema.xml definition for the highlighted field:
>
> <fieldType name="autocomplete_ngram" class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="0"
>             generateWordParts="1"
>             generateNumberParts="1"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="0"
>             splitOnCaseChange="1"
>             preserveOriginal="1"
>             types="wdfftypes.txt"
>     />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d \*&æøåÆØÅ ])" replacement="" replace="all"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="0"
>             generateWordParts="1"
>             generateNumberParts="0"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="0"
>             splitOnCaseChange="0"
>             preserveOriginal="1"
>             types="wdfftypes.txt"
>     />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d \*&æøåÆØÅ ])" replacement="" replace="all"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>   </analyzer>
> </fieldType>
>
> Is this a problem in our configuration or a known bug?
>
> Regards,
> Jérôme

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780
http://www.opensourceconnections.com