Seeing strange highlighting in multi-valued field (was: Why does highlight use the index analyzer)

Christian Vogler Wed, 27 Feb 2008 00:17:01 -0800

On Wednesday 27 February 2008 03:58:14 Chris Hostetter wrote:
> I'm not much of a highlighter expert, but this *seems* like it was probably
> intentional ... you are talking about the use case where you have a stored
> field, and no term positions, correct? ... so in order to highlight, the
> highlighter needs to analyze the stored text to find the word positions?


Yes, that is correct. I index and store the field, and have term positions 
disabled. Your explanation makes sense, thanks. 

However, to follow up, I have run into some strange highlighter behavior on 
multi-valued text fields. In particular, I have a field like this:

<fieldType name="text_de" class="solr.TextField" 
positionIncrementGap="100">...</fieldType>

The analyzers for indexing and query are identical, except that I put a 
compound word splitter in the indexer chain. I use this in a multi-valued 
category field:

<field name="category" type="text_de" indexed="true" stored="true" 
multiValued="true" />

Typical values from documents are:
<arr name="category"><str>Gebärdensprache</str><str>Recht</str></arr>

where the indexed terms, after analysis, are "gebard" and "sprach" for the 
first value and "recht" for the second. Now, if I query for "Gebärden" (which 
the analyzer transforms into "gebard"), I get matches, as expected, but the 
highlighter returns only the match on the first token of the first value, 
like this:

<arr name="category"><str>&lt;em&gt;Gebärden&lt;/em&gt;</str></arr>

The fragment, snippet, and merging parameters have no effect on this behavior; 
hl.requireFieldMatch is off; hl.fragmenter is gap.
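
For anyone who wants to reproduce this, a request with these settings can be 
sketched as below. This is only an illustration: the host, port and handler 
path, the field-qualified query, and the values given for hl.snippets and 
hl.fragsize are assumptions and are not taken from the setup described above.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, term):
    """Build a Solr select URL that asks for highlighting on the category field."""
    params = {
        "q": f"category:{term}",          # assumed: query restricted to the category field
        "hl": "true",                     # turn highlighting on
        "hl.fl": "category",              # field to highlight
        "hl.requireFieldMatch": "false",  # "off", as described above
        "hl.fragmenter": "gap",           # gap fragmenter, as described above
        "hl.snippets": "3",               # assumed value, for illustration only
        "hl.fragsize": "100",             # assumed value, for illustration only
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default
    return f"{base_url}?{urlencode(params)}"


# Hypothetical local endpoint; adjust to the actual Solr instance.
url = build_highlight_query("http://localhost:8983/solr/select", "Gebärden")
print(url)
```

Varying hl.snippets and hl.fragsize in such a request is one way to check the 
observation that these parameters do not change the result.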

What is a bit strange is that if the field has only one value, the 
highlighter returns the entire contents of the field; that is, if we have 
indexed

<arr name="category"><str>Gebärdensprache</str></arr>

then the highlighter will show

<arr name="category"><str>&lt;em&gt;Gebärden&lt;/em&gt;sprache</str></arr>

which is the behavior that I expected, irrespective of whether the field has 
one or more values.
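
To make the expected result concrete, here is a toy sketch of per-value 
highlighting. It is not how the Solr highlighter works internally: the 
function name is hypothetical, and a plain case-insensitive prefix test 
stands in for the real analysis chain (stemming and compound splitting).

```python
def highlight_values(values, prefix, pre="<em>", post="</em>"):
    """Wrap a leading match in each value and keep the rest of that value."""
    highlighted = []
    for value in values:
        # Toy matching: a prefix comparison replaces the real analyzer.
        if value.lower().startswith(prefix.lower()):
            head, tail = value[:len(prefix)], value[len(prefix):]
            highlighted.append(f"{pre}{head}{post}{tail}")
    return highlighted


single = highlight_values(["Gebärdensprache"], "Gebärden")
multi = highlight_values(["Gebärdensprache", "Recht"], "Gebärden")
print(single)
print(multi)
```

In this sketch the matching value comes back whole in both cases, which is 
the outcome I would expect from the highlighter for the two-valued field as 
well.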

Any idea what could be going on here?

Best regards
- Christian
