Yep, I misunderstood the problem. The multiple tokens stacked at the same offset (the n-grams) might be what's confusing the highlighter. One thing you can do is copyField into a field that doesn't have n-grams and set something like f.textng.hl.alternateField= in your solrconfig. That will make the highlighter use the other field. And yes, that will increase your index size on disk.
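To make that concrete, here is a rough sketch of what I mean. I haven't tested it against your setup. The names textplain and text_plain are made up. The analyzer is a trimmed-down version of your autocomplete_ngram chain: the n-gram filter, plus the word-delimiter, synonym and pattern-replace filters, are left out. I'm also assuming your highlighting defaults live on the /select handler.

```xml
<!-- schema.xml (sketch): an n-gram-free copy of textng.
     "text_plain" and "textplain" are hypothetical names. -->
<fieldType name="text_plain" class="solr.TextField">
  <analyzer>
    <!-- same accent mapping as autocomplete_ngram, so ß becomes ss here too -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;:\-\']"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no EdgeNGramFilterFactory: one token per word, one offset per token -->
  </analyzer>
</fieldType>

<!-- stored="true" so the highlighter has text to return from this field -->
<field name="textplain" type="text_plain" indexed="true" stored="true"/>
<copyField source="textng" dest="textplain"/>

<!-- solrconfig.xml (sketch): per-field highlighting defaults -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">textng</str>
    <!-- use the non-n-gram copy as the alternate field for textng -->
    <str name="f.textng.hl.alternateField">textplain</str>
  </lst>
</requestHandler>
```

One caveat: the Solr reference guide describes hl.alternateField as a backup summary that is used when no snippet can be generated from the highlighted field. Check that it actually kicks in for your queries. The copy has to be stored for its text to come back. Storing and indexing that copy is what increases your index size.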
On Fri, Oct 16, 2015 at 10:07 AM, Jérôme Bernardes <jerome.bernar...@mappy.com> wrote:

> Thanks for your reply, Scott.
>
> I tried
>
> bs.language=de&bs.country=de
>
> Unfortunately the problem still occurs.
> I have just discovered that the problem affects not only "ß" but also
> "æ" (which is mapped to "ae" at query and index time):
> q=hae --> <em>hæna</em>
> So it seems to me that the problem is related to any single character that
> is mapped to several characters using <charFilter
> class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
>
> Jérôme
>
>
> On 13/10/2015 07:46, Scott Stults wrote:
>
>> My guess is that the boundary scanner isn't configured right for your
>> highlighter. Try setting the bs.language and bs.country parameters either
>> in your request or in the requestHandler.
>>
>>
>> k/r,
>> Scott
>>
>> On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes <jerome.bernar...@mappy.com> wrote:
>>> Dear Solr Users,
>>> I am facing a problem with highlighting on n-gram fields.
>>> Highlighting works well, except for words containing the German character
>>> "ß".
>>> E.g., with q=rosen&
>>> "highlighting": {
>>>   "gcl3r:12723710:6643": {
>>>     "textng": [
>>>       "<em>Rosen</em>steinpark (Métro), Stuttgart (Allemagne)"
>>>     ]
>>>   },
>>>   "gcl3r:2267495:780930": {
>>>     "textng": [
>>>       "<em>Rosenstraße</em>, 94554 Moos (Allemagne)"
>>>     ]
>>>   }
>>> }
>>> Words without "ß" are partially highlighted (<em>Rosen</em>steinpark), but
>>> for words with "ß" the whole word is highlighted (<em>Rosenstraße</em>).
>>>
>>> -------------
>>> The character "ß" is mapped to "ss" at query and index time (using
>>> <charFilter class="solr.MappingCharFilterFactory"
>>> mapping="mapping-ISOLatin1Accent.txt"/>).
>>> Here is the schema.xml for the highlighted field.
>>> <fieldType name="autocomplete_ngram" class="solr.TextField">
>>>   <analyzer type="index">
>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>>>     <tokenizer class="solr.PatternTokenizerFactory"
>>>                pattern="[\s,;: \-\']"/>
>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>             splitOnNumerics="0"
>>>             generateWordParts="1"
>>>             generateNumberParts="1"
>>>             catenateWords="0"
>>>             catenateNumbers="0"
>>>             catenateAll="0"
>>>             splitOnCaseChange="1"
>>>             preserveOriginal="1"
>>>             types="wdfftypes.txt"
>>>     />
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
>>>             ignoreCase="true" expand="true"/>
>>>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
>>>             minGramSize="1"/>
>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>             pattern="([^\w\d \*&amp;æøåÆØÅ ])" replacement="" replace="all"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>>>     <tokenizer class="solr.PatternTokenizerFactory"
>>>                pattern="[\s,;: \-\']"/>
>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>             splitOnNumerics="0"
>>>             generateWordParts="1"
>>>             generateNumberParts="0"
>>>             catenateWords="0"
>>>             catenateNumbers="0"
>>>             catenateAll="0"
>>>             splitOnCaseChange="0"
>>>             preserveOriginal="1"
>>>             types="wdfftypes.txt"
>>>     />
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>             pattern="([^\w\d \*&amp;æøåÆØÅ ])" replacement="" replace="all"/>
>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>             pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> Is this a problem in our configuration, or a known bug?
>>> Regards
>>> Jérôme

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780
http://www.opensourceconnections.com