After researching further, I found that this is already a known issue. Sorry for the inconvenience.
http://issues.apache.org/jira/browse/SOLR-42
Pako
Francisco Sanmartin wrote:
Highlighting in Solr behaves strangely for some items. I attach an
example in case anyone can shed some light on it. Basically, Solr
is highlighting the wrong words. I am searching for the word "car" and I tell
Solr to highlight it with the tags <strong> and </strong>. The
response is fine in most cases, but some items
come back with the wrong words highlighted. An example is at the
bottom.
The problem in this example is that Solr highlights the word "his",
although the search word is "car".
This is the scenario:
Solr 1.2
The URL:
http://solr-server:8983/solr/select/?q=id:11439968%20AND%20description%3Acar&hl=on&hl.fl=description&hl.simple.pre=%3Cstrong%3E&hl.simple.post=%20%3C%2Fstrong%3E
The query parameters in readable form:
<lst name="params">
<str name="hl.simple.pre"><strong></str>
<str name="hl.simple.post"> </strong></str>
<str name="hl.fl">description</str>
<str name="hl">on</str>
<str name="q">id:11439968 AND description:car</str>
</lst>
(I include the id in the query to retrieve only the item whose
highlighting fails, so everything is clearer).
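For reference, the URL above can be decoded to confirm exactly which parameters Solr receives. This is a minimal Python sketch using only the standard library; the variable names are illustrative. Note that hl.simple.post decodes with a leading space, which accounts for the trailing space inside the highlighted span in the response:

```python
from urllib.parse import urlsplit, parse_qs

# The request URL from this report, split across lines for readability.
url = (
    "http://solr-server:8983/solr/select/"
    "?q=id:11439968%20AND%20description%3Acar"
    "&hl=on&hl.fl=description"
    "&hl.simple.pre=%3Cstrong%3E"
    "&hl.simple.post=%20%3C%2Fstrong%3E"
)

# parse_qs percent-decodes each value and returns a list per key;
# every parameter appears once here, so keep the first value only.
params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

for name, value in params.items():
    # repr() makes leading or trailing whitespace visible.
    print(f"{name} = {value!r}")
```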
The response:
<result name="response" numFound="1" start="0">
<doc>
...
<int name="id">11439968</int>
...
<str name="description">
This is a one of a kind all custom '95 Integra LS with 2005
TSX headlight and tailight conversion. It has GSR all black interior,
18 inch rims, strut bars, cd changer, coil overs, HID headlights,
catback exhaust, intake, new clutch and brakes. Motor has 130,000
miles. No smoke or leaks. Runs great. This car is completly
shaved. Paint is a two toned black/white with white ice flake. It is
flawless and ready to show. This car has not even seen winter
after being built! It is stored in a garage all year. Serious inquires
only (203)994-0085. OR Email [EMAIL PROTECTED] $8,500 OR BEST
OFFER!!!!!
</str>
...
</doc>
<lst name="highlighting">
<lst name="11439968">
<arr name="description">
<str>
back exhaust, intake, new clutch and brakes. Motor has
130,000 miles. No smoke or leaks. Runs great. T<strong>his </strong>
</str>
</arr>
</lst>
</lst>
</response>
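The symptom in this response (a highlighted span that does not contain the searched term) can be checked mechanically. This is a rough Python sketch, assuming the pre/post markers are the <strong> tags used in this query. The function names are hypothetical, and the plain substring test will also flag legitimate stemmed or synonym matches, so treat the result as a hint only:

```python
import re

def highlighted_terms(fragment, pre="<strong>", post="</strong>"):
    """Return the text between each pre/post marker pair, whitespace-stripped."""
    pattern = re.escape(pre) + r"(.*?)" + re.escape(post)
    return [m.strip() for m in re.findall(pattern, fragment, flags=re.DOTALL)]

def wrong_highlights(fragment, term, pre="<strong>", post="</strong>"):
    """Return highlighted spans that do not contain the term (case-insensitive)."""
    return [
        h for h in highlighted_terms(fragment, pre, post)
        if term.lower() not in h.lower()
    ]

# Tail of the fragment returned in the highlighting section of this report.
fragment = "No smoke or leaks. Runs great. T<strong>his </strong>"

print(wrong_highlights(fragment, "car"))
```

An empty list means every highlighted span contains the term; here the list contains the misplaced span.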
The schema (relevant parts):
<field name="description" type="text_html" indexed="true"
stored="true"/>
...
<fieldtype name="text_html" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldtype>
Thanks in advance.
Pako