Re: problems with PhraseHighlighter

AHMET ARSLAN Sun, 01 Nov 2009 09:07:22 -0800

> Copy-paste your field definition for
> the field you are trying to
> highlight/search on.
> 
> Cheers
> Avlesh


Thank you for your interest, Avlesh.

My field type mostly contains custom filters and tokenizers.

<fieldType name="XMLText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="XMLStripStandardTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
            ignoreCase="true" expand="true" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="CustomTokenizerFactory" />
    <filter class="CustomDeasciifyFilterFactory" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
</fieldType>


First I tried solr.HTMLStripCharFilterFactory to strip the XML tags. It works 
fine, but when it comes to highlighting, the <em> tags are placed at incorrect 
positions. The same happens with solr.HTMLStripStandardTokenizerFactory. 
Interestingly, the <em> tags are inserted exactly one character before the 
actual term. So I added a new token definition to StandardTokenizer's JFlex 
file that recognizes XML tags and ignores them. I confirmed with some test 
cases that it works. It strips XML tags at the tokenizer level. I am doing this 
because I display the original documents with XML + XSLT, so I need to 
highlight the XML files themselves.
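
To make the tokenizer-level idea concrete, here is a minimal Python sketch. It 
is not my JFlex grammar, only an illustration of the principle: tags are 
skipped while tokenizing, but every token keeps its start/end offsets into the 
original text, so the <em> tags land on the right characters. The regular 
expression and the function names tokenize and highlight are hypothetical.

```python
import re

# Matches either an XML tag (to be skipped) or a run of word characters.
# The tag alternative comes first so text inside a tag is never tokenized.
TOKEN_RE = re.compile(r"<[^>]*>|\w+")


def tokenize(text):
    """Yield (term, start, end); offsets refer to the original, unstripped text."""
    for m in TOKEN_RE.finditer(text):
        if not m.group().startswith("<"):
            yield m.group().lower(), m.start(), m.end()


def highlight(text, terms):
    """Wrap every token whose lowercased form is in `terms` in <em> tags."""
    out, pos = [], 0
    for term, start, end in tokenize(text):
        if term in terms:
            out.append(text[pos:start])  # untouched text, markup included
            out.append("<em>" + text[start:end] + "</em>")
            pos = end
    out.append(text[pos:])
    return "".join(out)


doc = "<doc><title>Solr</title> phrase highlighting</doc>"
print(highlight(doc, {"phrase"}))
```

Because the offsets are never recomputed against a stripped copy of the text, 
the original markup survives and can still be fed to the XSLT.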

And I am using ComplexPhraseQueryParser [1].

But I reproduced the problem with &defType=lucene&q="term1 term2"~5. I can see 
that term1 and term2 are within 5 terms of each other, which is why the 
document is returned, but the highlighting is empty. There are no XML tags 
(which the tokenizer would strip) between those terms in the original document.
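
For anyone who wants to try the same request, the following Python sketch 
builds the URL. The host, the /select path and the field name "content" are 
assumptions for illustration only; defType and q are the values quoted above, 
and hl, hl.fl and hl.usePhraseHighlighter are the standard Solr highlighting 
parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(base_url, field):
    """Build a Solr select URL for the sloppy phrase query with highlighting on."""
    params = {
        "defType": "lucene",
        "q": '"term1 term2"~5',             # phrase query with slop 5
        "hl": "true",                       # turn highlighting on
        "hl.fl": field,                     # field to highlight (hypothetical name)
        "hl.usePhraseHighlighter": "true",  # highlight only terms matching the phrase
    }
    return base_url + "?" + urlencode(params)


# Host, path and field name below are placeholders, not my real setup.
url = build_query_url("http://localhost:8983/solr/select", "content")
print(url)
```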

The hl.maxAnalyzedChars parameter applies to the original document, right? I 
mean, in my case it counts the XML tags too.

[1] 
http://lucene.apache.org/java/2_9_0/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/package-summary.html
