I've been wanting to try out the PostingsHighlighter, so I added
storeOffsetsWithPositions to my field definition, enabled the
highlighter in solrconfig.xml, reindexed and tried it out. When I
issue a query I'm getting this error:
field 'text' was indexed without offsets, cannot highlight
java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightDoc(PostingsHighlighter.java:545)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightField(PostingsHighlighter.java:467)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFieldsAsObjects(PostingsHighlighter.java:392)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFields(PostingsHighlighter.java:293)
I've been trying to figure out why the field wouldn't have offsets
indexed, but I just can't see it. Is there something in the analysis
chain that could be stripping out offsets?
This is the field definition:
<field name="text" type="text_en" indexed="true" stored="true"
       multiValued="false" termVectors="true" termPositions="true"
       termOffsets="true" storeOffsetsWithPositions="true" />
(Yes I know PH doesn't require term vectors; I'm keeping them around for
now while I experiment)
<fieldType name="text_en" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <!-- We are indexing mostly HTML so we need to ignore the tags -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- lower casing must happen before WordDelimiterFilter or
         protwords.txt will not work -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="1" protected="protwords.txt"/>
    <!-- This deals with contractions -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- lower casing must happen before WordDelimiterFilter or
         protwords.txt will not work -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"/>
    <!-- setting tokenSeparator="" solves issues with compound
         words and improves phrase search -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
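
For reference, enabling the highlighter in solrconfig.xml usually looks roughly like the following. This is a sketch of the documented Solr 4.x setup, assuming the PostingsSolrHighlighter class is the one in use; the exact config behind the error above may differ:

```xml
<!-- Sketch only: assumes Solr 4.x with the PostingsSolrHighlighter class -->
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>
```

With this in place, a query with hl=true and hl.fl=text goes through the PostingsHighlighter code path shown in the stack trace.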