You can also use the types attribute to change the type of specific characters, for example to treat "!" or "&" as <ALPHA>, so that the filter no longer handles them as delimiters.
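
As a rough sketch, a types file along these lines would map both characters to ALPHA. The file name specialchartypes.txt is taken from the types attribute in the config quoted below. The two mappings are only an illustration and are untested against that schema. Changes to the index-side mappings only affect documents indexed after the change.

```text
# specialchartypes.txt (illustrative; list only the characters you need)
# Each line maps one character to a WordDelimiterFilter character type.
! => ALPHA
& => ALPHA
```

With mappings like these, a term such as Yahoo! should reach the filter as a single run of ALPHA characters, leaving nothing to split or strip.
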
-- Jack Krupansky

On Tue, Jul 21, 2015 at 7:43 PM, Sathiya N Sundararajan <ausat...@gmail.com> wrote:

> Upayavira,
>
> thanks for the helpful suggestion, that works. I was looking for an option
> to turn off/circumvent that particular WordDelimiterFilter's behavior
> completely. Since our indexes are hundred's of Terabytes, every time we
> find a term that needs to be added, it will be a cumbersome process to
> reload all the cores.
>
>
> thanks
>
> On Tue, Jul 21, 2015 at 12:57 AM, Upayavira <u...@odoko.co.uk> wrote:
>
> > Looking at the javadoc for the WordDelimiterFilterFactory, it suggests
> > this config:
> >
> > <fieldType name="text_wd" class="solr.TextField"
> >            positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             protected="protectedword.txt"
> >             preserveOriginal="0" splitOnNumerics="1"
> >             splitOnCaseChange="1"
> >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> >             generateWordParts="1" generateNumberParts="1"
> >             stemEnglishPossessive="1"
> >             types="wdfftypes.txt" />
> >   </analyzer>
> > </fieldType>
> >
> > Note the protected="xxxxx" attribute. I suspect if you put Yahoo! into a
> > file referenced by that attribute, it may survive analysis. I'd be
> > curious to hear whether it works.
> >
> > Upayavira
> >
> > On Tue, Jul 21, 2015, at 12:51 AM, Sathiya N Sundararajan wrote:
> > > Question about WordDelimiterFilter. The search behavior that we
> > > experience with WordDelimiterFilter satisfies well, except for the
> > > case where there is a special character either at the leading or
> > > trailing end of the term.
> > >
> > > For instance:
> > >
> > > *‘d&b’ * --> Works as expected. Finds all docs with ‘d&b’.
> > > *‘p!nk’* --> Works fine as above.
> > >
> > > But on cases when, there is a special character towards the trailing
> > > end of the term, like ‘Yahoo!’
> > >
> > > *‘yahoo!’* --> Turns out to be a search for just *‘yahoo’* with the
> > > special character *‘!’* stripped out. This WordDelimiterFilter
> > > behavior is documented
> > >
> > > http://lucene.apache.org/core/4_6_0/analyzers-common/index.html?org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
> > >
> > > What I would like to have is, the search performed without stripping
> > > out the leading & trailing special character. Is there a way to
> > > achieve this behavior with WordDelimiterFilter.
> > >
> > > This is current config that we have for the field:
> > >
> > > <fieldType name="text_wdf" class="solr.TextField"
> > >            positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
> > >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> > >             preserveOriginal="1"
> > >             types="specialchartypes.txt"/>
> > >     <filter class="solr.LowerCaseFilterFactory" />
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
> > >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> > >             preserveOriginal="1"
> > >             types="specialchartypes.txt"/>
> > >     <filter class="solr.LowerCaseFilterFactory" />
> > >   </analyzer>
> > > </fieldType>
> > >
> > >
> > > thanks