Re: Highlighting exact phrases with Solr

Koji Sekiguchi Tue, 06 Oct 2009 09:40:13 -0700

Please try the hl.usePhraseHighlighter=true parameter.

(It should be true by default if you use the latest nightly build, but I think you don't.)
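The parameter can be appended to each request (&hl.usePhraseHighlighter=true) or set as a request handler default. In the configuration quoted below it sits inside the regex fragmenter's defaults, where it is likely only consulted for fragmenter settings rather than by the highlighter itself. A minimal sketch of the handler-default approach, assuming the stock "standard" SearchHandler (the handler name and the echoParams line are assumptions; adjust them to your solrconfig.xml):

```xml
<!-- Sketch: highlighting defaults on the request handler, so that every
     request served by it uses the phrase-aware highlighter.
     The handler name "standard" is an assumption. -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- turn highlighting on -->
    <bool name="hl">true</bool>
    <!-- highlight phrase terms only where they match as part of the phrase -->
    <bool name="hl.usePhraseHighlighter">true</bool>
  </lst>
</requestHandler>
```

Values passed explicitly on the request still override these defaults.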


Koji

Antonio Calò wrote:

Hi guys,

I'm going crazy with highlighting in Solr. The problem is the following:
when I submit an exact phrase query, I get the matching results and the
related highlighted snippets. But I've noticed that the *single terms of
the phrase are highlighted too*. Here is an example:

If I search for "quick brown fox", I get the correct result (the document
which contains the phrase), but the snippets come back like this:

<lst name="highlighting">
  <lst name="14">
    <arr name="DocumentText">
      <str>
The <em>quick brown fox</em> jump over the lazy dog. The <em>fox</em> is a
nice animal.
      </str>
    </arr>
  </lst>
</lst>


Also, with some documents only single terms are highlighted instead of the
exact phrase, even though the exact phrase is contained in the document,
for example:
<lst name="highlighting">
  <lst name="14">
    <arr name="DocumentText">
      <str>
The <em>fox</em> is a nice animal.
      </str>
    </arr>
  </lst>
</lst>


My understanding of highlighting is that if I search for an exact phrase,
only the exact phrase should be highlighted.

Here is an extract of my solrconfig.xml and schema.xml:

solrconfig.xml:

<highlighting>
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the "default" case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter">
    <lst name="defaults">
     <int name="hl.fragsize">500</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">700</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>

      <bool name="hl.usePhraseHighlighter">true</bool>

      <bool name="hl.highlightMultiTerm">true</bool>
    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter">
    <lst name="highlighting">
     <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
     <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
   </formatter>
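Separately from the phrase issue: the snippets above are wrapped in <em>, although this formatter asks for <strong>. One plausible cause (an assumption, not confirmed in this thread) is that the formatter's parameters are in a list named "highlighting", whereas Solr's example solrconfig.xml puts them in a list named "defaults" and marks the formatter default="true". A sketch of the formatter written that way:

```xml
<!-- Sketch: formatter parameters in a "defaults" list, as in Solr's
     example solrconfig.xml; default="true" makes this the formatter
     used when the request does not name one. -->
<formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
  <lst name="defaults">
    <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
    <str name="hl.simple.post"><![CDATA[</strong>]]></str>
  </lst>
</formatter>
```

The hl.simple.pre and hl.simple.post request parameters can also be passed per query to test the tags without editing the config.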


schema.xml:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldtype>
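For comparison, Solr's example schema enables the catenate options of WordDelimiterFilterFactory only at index time and turns them off at query time, presumably because extra catenated tokens in the query can interfere with phrase matching. Whether this contributes to the highlighting seen here is unverified. A sketch of the field type with that convention, where the name "text_it" is hypothetical and everything else mirrors the quoted schema:

```xml
<!-- Sketch only: "text_it" is a hypothetical name. Both analyzers mirror
     the quoted schema, except that the query-time
     WordDelimiterFilterFactory no longer catenates, as in Solr's
     example schema. -->
<fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
    <!-- split only; no catenation at query time -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Changing the index-time analyzer requires reindexing; changing only the query-time analyzer does not.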


Maybe I'm missing something, or my understanding of the highlighting feature
is not correct. Any ideas?

As always, thanks for your support!

Regards, Antonio
