Hi guys, I'm going crazy with highlighting in Solr. The problem is the following: when I submit an exact phrase query, I get the matching results and the related snippets with highlighting. But I've noticed that the *single terms of the phrase are highlighted too*. Here is an example:
If I search for "quick brown fox", I get the correct result, with the doc which contains the phrase, but the snippets come back like this:

<lst name="highlighting">
  <lst name="14">
    <arr name="DocumentText">
      <str> The <em>quick brown fox</em> jump over the lazy dog. The <em>fox</em> is a nice animal. </str>
    </arr>
  </lst>
</lst>

Also, for some documents only single terms are highlighted instead of the exact phrase, even though the exact phrase is contained in the document, e.g.:

<lst name="highlighting">
  <lst name="14">
    <arr name="DocumentText">
      <str> The <em>fox</em> is a nice animal. </str>
    </arr>
  </lst>
</lst>

My understanding of highlighting is that if I search for an exact phrase, only the exact phrase should be highlighted.

Here is an extract of my solrconfig.xml and schema.xml.

solrconfig.xml:

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">500</int>
    </lst>
  </fragmenter>

  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">700</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
      <bool name="hl.usePhraseHighlighter">true</bool>
      <bool name="hl.highlightMultiTerm">true</bool>
    </lst>
  </fragmenter>

  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter">
    <lst name="highlighting">
      <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
      <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
  </formatter>

schema.xml:

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

Maybe I'm missing something, or my understanding of the highlighting feature is not correct. Any ideas?

As always, thanks for your support!

Regards,
Antonio
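
P.S. For anyone who wants to reproduce this, here is a minimal sketch of the kind of request involved. It only builds the select URL; it does not contact Solr. The host, port, path, the `hl.snippets` value, and the helper name `build_phrase_highlight_url` are assumptions for illustration, not my real setup. It passes `hl.usePhraseHighlighter` explicitly as a request parameter, which may or may not behave differently from setting it in the fragmenter defaults as in the config above.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host, port and path are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_phrase_highlight_url(phrase, field="DocumentText"):
    """Build a select URL that requests highlighting for an exact phrase.

    The phrase is wrapped in double quotes so that it is parsed as a
    phrase query against the given field.
    """
    params = {
        "q": f'{field}:"{phrase}"',
        "hl": "true",                       # turn highlighting on
        "hl.fl": field,                     # field to build snippets from
        "hl.usePhraseHighlighter": "true",  # ask for phrase-aware highlighting
        "hl.snippets": "2",                 # assumed value, for illustration only
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_phrase_highlight_url("quick brown fox"))
```

Comparing the snippets returned with and without `hl.usePhraseHighlighter=true` in the URL should show whether the parameter is being applied at all.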