Re: Highlighting exact phrases with Solr

Koji Sekiguchi Tue, 20 Oct 2009 07:11:33 -0700

Antonio,

Put the parameter into the <requestHandler/> element rather than <highlighting/>.

If you are using the "standard" request handler, set it like this:


 <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <bool name="hl">on</bool>
      <int name="hl.snippets">3</int>
      <str name="hl.fl">features</str>
      <bool name="hl.usePhraseHighlighter">true</bool>
    </lst>
 </requestHandler>

Or you can set it directly in your HTTP request:

http://localhost:8983/solr/select?q=something&hl=on&hl.fl=features&hl.usePhraseHighlighter=true
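A minimal Python sketch of building that same request programmatically, assuming the host, port, and `features` field from the example above; the helper name `build_highlight_url` is hypothetical. Wrapping the query text in double quotes makes it a phrase query, and `urllib.parse.urlencode` escapes the quotes and spaces:

```python
from urllib.parse import urlencode


def build_highlight_url(base, phrase, field):
    """Build a Solr select URL asking for phrase-aware highlighting (hypothetical helper)."""
    params = {
        "q": '"%s"' % phrase,               # double quotes make this a phrase query
        "hl": "on",                          # enable highlighting
        "hl.fl": field,                      # field(s) to highlight
        "hl.usePhraseHighlighter": "true",   # highlight phrase terms only where the phrase matches
    }
    # urlencode escapes the double quotes (%22) and spaces (+) in the query.
    return base + "?" + urlencode(params)


url = build_highlight_url("http://localhost:8983/solr/select", "quick brown fox", "features")
print(url)
```

Printing `url` gives `http://localhost:8983/solr/select?q=%22quick+brown+fox%22&hl=on&hl.fl=features&hl.usePhraseHighlighter=true`, that is, the request above with an exact-phrase query.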

Koji
--

http://www.rondhuit.com/en/


Antonio Calò wrote:

Hi Koji, many thanks for your suggestion.

Sorry for the delay in my feedback.

I've tried setting hl.usePhraseHighlighter=true, but it is still not working.

Here is my setup:

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
      <bool name="hl.usePhraseHighlighter">true</bool>
      <bool name="hl.highlightMultiTerm">true</bool>
    </lst>
  </fragmenter>

  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">100</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
      <bool name="hl.usePhraseHighlighter">true</bool>
      <bool name="hl.highlightMultiTerm">true</bool>
    </lst>
  </fragmenter>

  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter">
    <lst name="highlighting">
      <bool name="hl.usePhraseHighlighter">true</bool>
      <bool name="hl.highlightMultiTerm">true</bool>
      <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
      <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
  </formatter>
</highlighting>


Any help from other users is really appreciated.

2009/10/6 Koji Sekiguchi <k...@r.email.ne.jp>

Please try the hl.usePhraseHighlighter=true parameter.
(It should be true by default if you use the latest nightly build, but I think
you don't.)

Koji


Antonio Calò wrote:

Hi guys,

I'm going crazy with highlighting in Solr. The problem is the following:
when I submit an exact phrase query, I get the related results and the
related snippets with highlighting, but I've noticed that the *single terms of
the phrase are highlighted too*. Here is an example:

If I search for "quick brown fox", I get the correct result (the doc
which contains the phrase), but the snippets come back like this:

<lst name="highlighting">
  <lst name="14">
    <arr name="DocumentText">
      <str>
The <em>quick brown fox</em> jump over the lazy dog. The <em>fox</em> is a nice animal.
      </str>
    </arr>
  </lst>
</lst>


Also, with some documents only single terms are highlighted instead of the
exact phrase, even though the exact phrase is contained in the document, e.g.:
<lst name="highlighting">
  <lst name="14">
    <arr name="DocumentText">
      <str>
The <em>fox</em> is a nice animal.
      </str>
    </arr>
  </lst>
</lst>


My understanding of highlighting is that if I search for an exact phrase,
only the exact phrase should be highlighted.
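To check a returned snippet against that expectation, a small Python helper can list every highlighted span that is not the full phrase. This is a hypothetical diagnostic sketch, not part of Solr: it assumes the `<em>`/`</em>` markers seen in the output above (pass `pre`/`post` to match other markers such as `<strong>`), and it first merges highlights separated only by whitespace, on the assumption that a highlighter may wrap each term of a matched phrase separately:

```python
import re


def stray_highlights(snippet, phrase, pre="<em>", post="</em>"):
    """Return highlighted spans in `snippet` that are not the whole `phrase`.

    Hypothetical diagnostic helper; comparison is case-insensitive and
    ignores differences in whitespace.
    """
    # Merge neighbouring highlights separated only by whitespace, e.g.
    # "<em>quick</em> <em>brown</em>" becomes "<em>quick brown</em>".
    merged = re.sub(re.escape(post) + r"(\s+)" + re.escape(pre), r"\1", snippet)
    # Collect the text between each pre/post marker pair.
    spans = re.findall(re.escape(pre) + r"(.*?)" + re.escape(post), merged, flags=re.S)
    target = " ".join(phrase.lower().split())
    return [s for s in spans if " ".join(s.lower().split()) != target]


snippet = ("The <em>quick brown fox</em> jump over the lazy dog. "
           "The <em>fox</em> is a nice animal.")
print(stray_highlights(snippet, "quick brown fox"))
```

For the first snippet above this reports `['fox']`, the stray single-term highlight described in this message.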

Here is an extract of my solrconfig.xml and schema.xml:

solrconfig.xml:

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter">
   <lst name="defaults">
    <int name="hl.fragsize">500</int>
   </lst>
  </fragmenter>

  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
   <lst name="defaults">
     <!-- slightly smaller fragsizes work better because of slop -->
     <int name="hl.fragsize">700</int>
     <!-- allow 50% slop on fragment sizes -->
     <float name="hl.regex.slop">0.5</float>
     <!-- a basic sentence pattern -->
     <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>

     <bool name="hl.usePhraseHighlighter">true</bool>

     <bool name="hl.highlightMultiTerm">true</bool>
   </lst>
  </fragmenter>

  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter">
   <lst name="highlighting">
    <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
    <str name="hl.simple.post"><![CDATA[</strong>]]></str>
   </lst>
  </formatter>


schema.xml:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stop_italiano.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldtype>


Maybe I'm missing something, or my understanding of the highlighting
feature is not correct. Any idea?

As always, thanks for your support!

Regards, Antonio
