After researching further, I found that this was already a known issue. Sorry for the inconvenience.

http://issues.apache.org/jira/browse/SOLR-42

Pako


Francisco Sanmartin wrote:
Highlighting in Solr behaves strangely for some items: it highlights the wrong words. I am searching for the word "car" and telling Solr to wrap matches in <strong> and </strong>. The response is correct in most cases, but for some items the wrong words are highlighted. An example is included below; maybe someone can shed some light on it.


The problem in this example is that Solr highlights "his" (the tail of "This"), although the search term is "car".
This is the scenario:

Solr 1.2
The url:
http://solr-server:8983/solr/select/?q=id:11439968%20AND%20description%3Acar&hl=on&hl.fl=description&hl.simple.pre=%3Cstrong%3E&hl.simple.post=%20%3C%2Fstrong%3E

The same query as a parameter list:
<lst name="params">
<str name="hl.simple.pre"><strong></str>
<str name="hl.simple.post"> </strong></str>
<str name="hl.fl">description</str>
<str name="hl">on</str>
<str name="q">id:11439968 AND description:car</str>
</lst>

(I query by id to retrieve only the item whose highlighting fails, so that everything is clearer.)
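
For anyone who wants to reproduce the request, the parameters above can be assembled with a short script. This is only a minimal sketch, not part of the original report: it uses Python's standard library, the host `solr-server:8983` is taken from the URL above, and the variable names are illustrative. Note that `hl.simple.post` starts with a space, exactly as in the original URL.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Parameters copied from the report; the dict layout itself is an assumption.
params = {
    "q": "id:11439968 AND description:car",
    "hl": "on",
    "hl.fl": "description",
    "hl.simple.pre": "<strong>",
    "hl.simple.post": " </strong>",  # leading space, as in the original URL
}

# quote_via=quote encodes spaces as %20 instead of "+".
url = "http://solr-server:8983/solr/select/?" + urlencode(params, quote_via=quote)
print(url)

# Decoding the query string gives back the parameter list shown above.
decoded = parse_qs(urlsplit(url).query)
print(decoded["hl.simple.post"])
```

The encoded form differs slightly from the hand-written URL (for example, the colon in `id:11439968` is percent-encoded here), but it decodes to the same parameter values.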

The response:
<result name="response" numFound="1" start="0">
 <doc>
   ...
   <int name="id">11439968</int>
    ...
    <str name="description">
This is a one of a kind all custom &#39;95 Integra LS with 2005 TSX headlight and tailight conversion. It has GSR all black interior, 18 inch rims, strut bars, cd changer, coil overs, HID headlights, catback exhaust, intake, new clutch and brakes. Motor has 130,000 miles. No smoke or leaks. Runs great. This car is completly shaved. Paint is a two toned black/white with white ice flake. It is flawless and ready to show. This car has not even seen winter after being built! It is stored in a garage all year. Serious inquires only (203)994-0085. OR Email [EMAIL PROTECTED] $8,500 OR BEST OFFER!!!!!
   </str>
   ...
 </doc>
</result>
<lst name="highlighting">
   <lst name="11439968">
       <arr name="description">
           <str>
back exhaust, intake, new clutch and brakes. Motor has 130,000 miles. No smoke or leaks. Runs great. T<strong>his </strong>
           </str>
       </arr>
   </lst>
</lst>
</response>

The schema (relevant parts):

<field name="description" type="text_html" indexed="true" stored="true"/>

...

<fieldtype name="text_html" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
         <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
         <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
</fieldtype>
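
One plausible explanation for the symptom, offered here as an assumption rather than as a statement of what the linked issue says: the stored description contains the entity `&#39;` (5 characters), while an HTML-stripping tokenizer would see a single `'` (1 character). If token offsets are computed on the stripped text but applied to the raw stored text, every highlight after the entity lands 4 characters too early. The Python sketch below imitates that effect on a shortened stand-in for the description; the function name and the sample string are hypothetical, and none of this is Solr code.

```python
import html


def highlight_with_stripped_offsets(raw, term, pre="<strong>", post=" </strong>"):
    """Find `term` in the entity-decoded text, then apply those offsets to `raw`."""
    stripped = html.unescape(raw)  # "&#39;" (5 chars) becomes "'" (1 char)
    start = stripped.index(term)   # offset is valid for the stripped text only
    end = start + len(term)
    # Applying stripped-text offsets to the raw text is the suspected mistake.
    marked = raw[:start] + pre + raw[start:end] + post + raw[end:]
    return raw[start:end], marked


# Shortened, hypothetical stand-in for the stored description.
raw = "All custom &#39;95 Integra LS. Runs great. This car is shaved."
span, marked = highlight_with_stripped_offsets(raw, "car")
print(span)
print(marked)
```

With this sample the highlighted span is `his` instead of `car`, and because `hl.simple.post` begins with a space, the marked-up text contains `T<strong>his </strong>`, the same pattern as in the response above.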


Thanks in advance.

Pako




