PostingsHighlighter and analysis

Trey Hyde Mon, 11 Mar 2013 13:44:13 -0700

debug=timing has told me for a very long time that 99% of the query time for 
my slow queries is spent in the highlighting component, so I've been eagerly 
awaiting the PostingsHighlighter. Mean query times are 50ms or less, but 
certain queries can generate more than 30s worth of highlighting. Now that 
it's here I've been somewhat disappointed, because I can't use it: many common 
analyzers emit tokens out of order, which is apparently not compatible with 
storeOffsetsWithPositions.
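
To make "out of order" concrete, here is a minimal sketch of the kind of check involved, assuming the rule is that a token's start offset must not be smaller than the previous token's start offset. The function name and both token streams are hypothetical illustrations, not actual WordDelimiterFilter output or Lucene code.

```python
from typing import List, Optional, Tuple

# (term, start_offset, end_offset); an illustrative stand-in for a token stream
Token = Tuple[str, int, int]


def first_backward_offset(tokens: List[Token]) -> Optional[int]:
    """Return the index of the first token whose start offset is smaller
    than the previous token's start offset, or None if none goes backwards."""
    last_start = 0
    for i, (_term, start, end) in enumerate(tokens):
        if start < 0 or end < start:
            raise ValueError(f"invalid offsets at token {i}: {start}-{end}")
        if start < last_start:
            return i
        last_start = start
    return None


# Hypothetical streams for the input "PowerShot500".
in_order = [("Power", 0, 5), ("Shot", 5, 9), ("500", 9, 12)]
# A catenated token emitted after its parts points back to offset 0.
out_of_order = [("Power", 0, 5), ("Shot", 5, 9), ("PowerShot", 0, 9), ("500", 9, 12)]

print(first_backward_offset(in_order))
print(first_backward_offset(out_of_order))
```

In this sketch only the stream with the catenated token is flagged, which is why the catenate options are the first ones I would suspect.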


The only analyzer on the "bad" list in LUCENE-4641 that is really critical to 
our searches is the WordDelimiter filter.

My current index-time filter config (which I believe has been unchanged for me 
for 5+ years):
 <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0"/>

Does anyone have any suggestions for dealing with this? Perhaps limiting 
certain options will always produce tokens in order?

Thanks

Trey Hyde 
Director of Engineering
Email th...@centraldesktop.com

