Upayavira, thanks for the helpful suggestion; that works. However, I was looking for an option to turn off or circumvent that particular WordDelimiterFilter behavior completely. Since our indexes are hundreds of terabytes, reloading all the cores every time we find a new term that needs protecting would be a cumbersome process.

thanks

On Tue, Jul 21, 2015 at 12:57 AM, Upayavira <u...@odoko.co.uk> wrote:

> Looking at the javadoc for the WordDelimiterFilterFactory, it suggests
> this config:
>
> <fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             protected="protectedword.txt"
>             preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             generateWordParts="1" generateNumberParts="1"
>             stemEnglishPossessive="1"
>             types="wdfftypes.txt" />
>   </analyzer>
> </fieldType>
>
> Note the protected="xxxxx" attribute. I suspect if you put Yahoo! into a
> file referenced by that attribute, it may survive analysis. I'd be
> curious to hear whether it works.
>
> Upayavira
>
> On Tue, Jul 21, 2015, at 12:51 AM, Sathiya N Sundararajan wrote:
> > Question about WordDelimiterFilter. The search behavior that we
> > experience with WordDelimiterFilter satisfies well, except for the case
> > where there is a special character either at the leading or trailing
> > end of the term.
> >
> > For instance:
> >
> > *‘d&b’* --> Works as expected. Finds all docs with ‘d&b’.
> > *‘p!nk’* --> Works fine as above.
> >
> > But on cases when, there is a special character towards the trailing
> > end of the term, like ‘Yahoo!’
> >
> > *‘yahoo!’* --> Turns out to be a search for just *‘yahoo’* with the
> > special character *‘!’* stripped out. This WordDelimiterFilter behavior
> > is documented
> > http://lucene.apache.org/core/4_6_0/analyzers-common/index.html?org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
> >
> > What I would like to have is, the search performed without stripping out
> > the leading & trailing special character. Is there a way to achieve this
> > behavior with WordDelimiterFilter.
> >
> > This is current config that we have for the field:
> >
> > <fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
> >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> >             preserveOriginal="1"
> >             types="specialchartypes.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory" />
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
> >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> >             preserveOriginal="1"
> >             types="specialchartypes.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory" />
> >   </analyzer>
> > </fieldType>
> >
> > thanks