debug=timing has told me for a very long time that 99% of my query time for
slow queries is spent in the highlighting component, so I've been eagerly
awaiting the PostingsHighlighter for quite some time. Mean query times are 50ms
or less, but certain queries can generate more than 30s worth of highlighting.
Now that it's here, I've been somewhat disappointed, because I can't use it: so
many common analyzers emit tokens out of order, which apparently is not
compatible with storeOffsetsWithPositions.
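Roughly, "out of order" here means that a token's start offset is smaller than
the start offset of a token emitted before it. The sketch below is only an
illustration under that assumption (the exact rule Lucene enforces should be
confirmed against its source): it scans a hypothetical token stream, written as
(term, start, end) tuples, and reports the first token that would break the
ordering. The tokens shown are made up for the example and are not real
WordDelimiter output.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    term: str
    start: int  # start character offset into the original text
    end: int    # end character offset (exclusive)


def first_offset_violation(tokens: List[Token]) -> Optional[int]:
    """Return the index of the first token whose offsets are invalid or go
    backwards relative to the previous token, or None if all are in order.

    Assumed rule: start >= 0, end >= start, and start offsets never decrease.
    """
    last_start = 0
    for i, tok in enumerate(tokens):
        if tok.start < 0 or tok.end < tok.start:
            return i  # malformed offsets on this token
        if tok.start < last_start:
            return i  # start offset went backwards
        last_start = tok.start
    return None


# Hypothetical streams for the text "wi-fi":
# a catenated token emitted after its parts jumps back to offset 0.
backwards = [Token("wi", 0, 2), Token("fi", 3, 5), Token("wifi", 0, 5)]
# the same tokens emitted with the catenation right after its first part.
in_order = [Token("wi", 0, 2), Token("wifi", 0, 5), Token("fi", 3, 5)]

print(first_offset_violation(backwards))  # 2
print(first_offset_violation(in_order))   # None
```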
The only component on the "bad" list in LUCENE-4641 that is really critical to
our searches is the WordDelimiter filter.
My current index-time filter config (which I believe has been unchanged for
5+ years):
<filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"/>
Does anyone have any suggestions for dealing with this? Perhaps limiting
certain options would guarantee that tokens are always produced in order?
Thanks
Trey Hyde
Director of Engineering
Central Desktop