> Copy-paste your field definition for
> the field you are trying to
> highlight/search on.
>
> Cheers
> Avlesh
Thank you for your interest, Avlesh.

My field type consists mostly of custom filters and tokenizers:

<fieldType name="XMLText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="XMLStripStandardTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="CustomTokenizerFactory" />
    <filter class="CustomDeasciifyFilterFactory" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

First I tried solr.HTMLStripCharFilterFactory to strip the XML tags. The stripping itself works fine, but when it comes to highlighting, the <em> tags are placed at the wrong position. The same happens with solr.HTMLStripStandardTokenizerFactory. Interestingly, the <em> tags are inserted exactly one character before the actual term.

So I added a new token definition to StandardTokenizer's JFlex file that recognizes XML tags and ignores them. I confirmed with some test cases that it works: it strips the XML tags at the tokenizer level. I am doing this because I display the original documents with XML + XSLT, so I need to highlight the XML files themselves.

I am using ComplexPhraseQueryParser [1], but I reproduced the problem with

  &defType=lucene&q="term1 term2"~5

I can see that term1 and term2 are within 5 terms of each other, which is why the document is returned. But the highlighting is empty, and there are no XML tags (which the tokenizer would strip) between those terms in the original document.

The hl.maxAnalyzedChars parameter applies to the original document, right? I mean, in my case, including the XML tags too.

[1] http://lucene.apache.org/java/2_9_0/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/package-summary.html
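
To illustrate the offset symptom described above, here is a minimal Python sketch of offset-based highlighting. It is not Solr or Lucene code: the tokenize and highlight functions and the shift parameter are hypothetical names made up for this illustration. The only assumption is that a highlighter wraps each matching term using the start/end character offsets that the tokenizer reported against the original text, so a tokenizer that skips XML tags must still report offsets into the unstripped document.

```python
import re

# Matches either a whole XML tag or a run of word characters.
TAG_OR_WORD = re.compile(r"<[^>]*>|\w+")


def tokenize(text):
    """Return (term, start, end) for each word, skipping XML tags.

    Offsets always refer to positions in the original, unstripped text.
    """
    tokens = []
    for match in TAG_OR_WORD.finditer(text):
        if match.group().startswith("<"):
            continue  # a tag: produce no token, but offsets keep advancing
        tokens.append((match.group().lower(), match.start(), match.end()))
    return tokens


def highlight(text, terms, shift=0):
    """Wrap matching terms in <em> tags using the token offsets.

    shift is a hypothetical knob that simulates a tokenizer reporting
    offsets that are off by some number of characters.
    """
    pieces = []
    pos = 0
    for term, start, end in tokenize(text):
        if term in terms:
            start, end = start + shift, end + shift
            pieces.append(text[pos:start])
            pieces.append("<em>" + text[start:end] + "</em>")
            pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


doc = "<p>Hello <b>world</b></p>"
correct = highlight(doc, {"world"})           # offsets match the original text
shifted = highlight(doc, {"world"}, shift=-1)  # offsets one character too early
print(correct)
print(shifted)
```

With correct offsets the <em> tags wrap the term exactly; with offsets that are one character too small the opening <em> lands one character before the term, which matches the behaviour reported with the HTML-stripping components.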