Thanks for the suggestion, Jack. We are already treating @ and # as <ALPHA>, and will see if it makes sense to go that route here as well.
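
A minimal sketch of what that route could look like in the types file. The actual contents of specialchartypes.txt are not shown in this thread, so the entries below are illustrative only. They assume the `character => TYPE` mapping format used by the stock Solr wdftypes.txt example, and that a leading "#" is read as a comment (hence the Unicode escape).

```
# Hypothetical contents for specialchartypes.txt (illustrative only).
# Format assumed: <character> => <TYPE>

# Characters described above as already treated as ALPHA.
@ => ALPHA
# A bare "#" would start a comment line, so it is written as an escape.
\u0023 => ALPHA

# Possible additions following the suggestion in the quoted reply.
! => ALPHA
& => ALPHA
```

With "!" typed as ALPHA, a token such as yahoo! should no longer be seen as ending in a delimiter, though that has not been verified here. Both the index and query analyzers in the quoted config reference the same types file, so the mapping would apply on both sides.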

On Tue, Jul 21, 2015 at 4:52 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> You can also use the types attribute to change the type of specific
> characters, such as to treat the "!" or "&" as an <ALPHA>.
>
> -- Jack Krupansky
>
> On Tue, Jul 21, 2015 at 7:43 PM, Sathiya N Sundararajan <ausat...@gmail.com> wrote:
>
> > Upayavira,
> >
> > thanks for the helpful suggestion, that works. I was looking for an option
> > to turn off/circumvent that particular WordDelimiterFilter's behavior
> > completely. Since our indexes are hundred's of Terabytes, every time we
> > find a term that needs to be added, it will be a cumbersome process to
> > reload all the cores.
> >
> > thanks
> >
> > On Tue, Jul 21, 2015 at 12:57 AM, Upayavira <u...@odoko.co.uk> wrote:
> >
> > > Looking at the javadoc for the WordDelimiterFilterFactory, it suggests
> > > this config:
> > >
> > > <fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer>
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >             protected="protectedword.txt"
> > >             preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="1"
> > >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> > >             generateWordParts="1" generateNumberParts="1"
> > >             stemEnglishPossessive="1"
> > >             types="wdfftypes.txt" />
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Note the protected="xxxxx" attribute. I suspect if you put Yahoo! into a
> > > file referenced by that attribute, it may survive analysis. I'd be
> > > curious to hear whether it works.
> > >
> > > Upayavira
> > >
> > > On Tue, Jul 21, 2015, at 12:51 AM, Sathiya N Sundararajan wrote:
> > >
> > > > Question about WordDelimiterFilter. The search behavior that we
> > > > experience with WordDelimiterFilter satisfies well, except for the
> > > > case where there is a special character either at the leading or
> > > > trailing end of the term.
> > > >
> > > > For instance:
> > > >
> > > > *‘d&b’* --> Works as expected. Finds all docs with ‘d&b’.
> > > > *‘p!nk’* --> Works fine as above.
> > > >
> > > > But on cases when, there is a special character towards the trailing
> > > > end of the term, like ‘Yahoo!’
> > > >
> > > > *‘yahoo!’* --> Turns out to be a search for just *‘yahoo’* with the
> > > > special character *‘!’* stripped out. This WordDelimiterFilter
> > > > behavior is documented
> > > >
> > > > http://lucene.apache.org/core/4_6_0/analyzers-common/index.html?org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
> > > >
> > > > What I would like to have is, the search performed without stripping
> > > > out the leading & trailing special character. Is there a way to
> > > > achieve this behavior with WordDelimiterFilter.
> > > >
> > > > This is current config that we have for the field:
> > > >
> > > > <fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
> > > >   <analyzer type="index">
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
> > > >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> > > >             preserveOriginal="1"
> > > >             types="specialchartypes.txt"/>
> > > >     <filter class="solr.LowerCaseFilterFactory" />
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
> > > >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> > > >             preserveOriginal="1"
> > > >             types="specialchartypes.txt"/>
> > > >     <filter class="solr.LowerCaseFilterFactory" />
> > > >   </analyzer>
> > > > </fieldType>
> > > >
> > > > thanks