Re: HighLithing exact phrases with solr

Antonio Calò Tue, 20 Oct 2009 04:53:02 -0700

Hi Koji, many thanks for your suggestion.

Sorry for the delay in my feedback.


I've tried to set hl.usePhraseHighlighter=true, but it is still not working.

Here is my setup:

<highlighting>
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the "default" case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
     <bool name="hl.usePhraseHighlighter">true</bool>

      <bool name="hl.highlightMultiTerm">true</bool>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"
               default="true">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">100</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>

      <bool name="hl.usePhraseHighlighter">true</bool>

      <bool name="hl.highlightMultiTerm">true</bool>

    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter">
    <lst name="highlighting">
    <bool name="hl.usePhraseHighlighter">true</bool>

      <bool name="hl.highlightMultiTerm">true</bool>
     <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
     <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
   </formatter>


  </highlighting>
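
One thing that may be worth checking with a setup like the one above: hl.usePhraseHighlighter is normally read as a request parameter, so placing it inside the defaults of a fragmenter or formatter may have no effect on which highlighter is used. A minimal sketch of setting it as a request handler default instead, assuming the stock handler name "standard" (an assumption, replace it with whatever handler is actually queried) and the DocumentText field that appears in the highlighting output:

```xml
<!-- Sketch only: the handler name "standard" is an assumption taken from
     the stock example solrconfig.xml, not from this thread. -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- turn highlighting on and pick the field to highlight -->
    <str name="hl">true</str>
    <str name="hl.fl">DocumentText</str>
    <!-- ask for phrase-aware highlighting at the request level -->
    <str name="hl.usePhraseHighlighter">true</str>
  </lst>
</requestHandler>
```

The same flag can also be tested without touching the config by appending &hl.usePhraseHighlighter=true to the query URL.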


Any help from other users is really appreciated.

2009/10/6 Koji Sekiguchi <k...@r.email.ne.jp>

> Please try the hl.usePhraseHighlighter=true parameter.
> (It should be true by default if you use the latest nightly, but I think
> you don't)
>
> Koji
>
>
> Antonio Calò wrote:
>
>> Hi Guys
>>
>> I'm going crazy with the highlighting in Solr. The problem is the
>> following:
>> when I submit an exact phrase query, I get the related results and the
>> related snippets with highlighting. But I've noticed that the *single terms
>> of the phrase are highlighted too*. Here is an example:
>>
>> If I start a search for "quick brown fox", I obtain the correct result
>> with the doc which contains the phrase, but the snippets come back like
>> this:
>>
>> <lst name="highlighting">
>>     <lst name="14">
>>        <arr name="DocumentText">
>>            <str>
>> The <em>quick brown fox</em> jump over the lazy dog. The <em>fox</em> is a
>> nice animal.
>>            </str>
>>     </arr>
>>  </lst>
>> </lst>
>>
>>
>> Also, with some documents, only single terms are highlighted instead of
>> the exact phrase, even if the exact phrase is contained in the document,
>> for example:
>> <lst name="highlighting">
>>     <lst name="14">
>>        <arr name="DocumentText">
>>            <str>
>> The <em>fox</em> is a nice animal.
>>            </str>
>>     </arr>
>>  </lst>
>> </lst>
>>
>>
>> My understanding of highlighting is that if I search for an exact phrase,
>> only the exact phrase should be highlighted.
>>
>> Here is an extract of my solrconfig.xml & schema.xml
>>
>> solrconfig.xml:
>>
>> <highlighting>
>>   <!-- Configure the standard fragmenter -->
>>   <!-- This could most likely be commented out in the "default" case -->
>>   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter">
>>    <lst name="defaults">
>>     <int name="hl.fragsize">500</int>
>>    </lst>
>>   </fragmenter>
>>
>>   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
>>   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"
>>               default="true">
>>    <lst name="defaults">
>>      <!-- slightly smaller fragsizes work better because of slop -->
>>      <int name="hl.fragsize">700</int>
>>      <!-- allow 50% slop on fragment sizes -->
>>      <float name="hl.regex.slop">0.5</float>
>>      <!-- a basic sentence pattern -->
>>      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
>>
>>      <bool name="hl.usePhraseHighlighter">true</bool>
>>
>>      <bool name="hl.highlightMultiTerm">true</bool>
>>    </lst>
>>   </fragmenter>
>>
>>   <!-- Configure the standard formatter -->
>>   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter">
>>    <lst name="highlighting">
>>     <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
>>     <str name="hl.simple.post"><![CDATA[</strong>]]></str>
>>    </lst>
>>   </formatter>
>>
>>
>> schema.xml:
>>
>> <analyzer type="index">
>>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>>           words="stop_italiano.txt"/>
>>   <filter class="solr.WordDelimiterFilterFactory"
>>           generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>           catenateNumbers="1" catenateAll="0"/>
>>   <filter class="solr.LowerCaseFilterFactory"/>
>>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> </analyzer>
>>
>>
>> <analyzer type="query">
>>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>   <filter class="solr.SynonymFilterFactory"
>>           synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>>           words="stop_italiano.txt"/>
>>   <filter class="solr.WordDelimiterFilterFactory"
>>           generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>           catenateNumbers="1" catenateAll="1"/>
>>   <filter class="solr.LowerCaseFilterFactory"/>
>>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> </analyzer>
>> </fieldtype>
>>
>>
>> Maybe I'm missing something, or my understanding of the highlighting
>> feature is not correct. Any idea?
>>
>> As always, thanks for your support!
>>
>> Regards, Antonio
>>
>>
>>
>
>


-- 
Antonio Calò
------------------------------------------
Software Developer Engineer
@ Intellisemantic
Mail anton.c...@gmail.com
Tel. 011-56.90.429
------------------------------------------
