I have configured WordDelimiterFilterFactory with custom character types for '&'
and '-'. For a few other delimiters (such as . _ :), we need to split only at
token boundaries, not inside a token.
For example:
test.com (should be tokenized as test.com)
newyear. (should be tokenized as newyear)
new_car (should be tokenized as new_car)
..
..
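To make the requirement concrete, here is a minimal Python sketch of the behavior I am after. It is not Solr code; the helper name strip_boundary_delims and the constant BOUNDARY_DELIMS are hypothetical and exist only for illustration.

```python
# Hypothetical helper, not part of Solr: it only illustrates the desired output.
# The delimiters '.', '_' and ':' are removed at the edges of a token
# and left untouched when they occur inside it.
BOUNDARY_DELIMS = "._:"


def strip_boundary_delims(token: str) -> str:
    """Remove '.', '_' and ':' from the start and end of a token only."""
    return token.strip(BOUNDARY_DELIMS)


if __name__ == "__main__":
    for raw in ["test.com", "newyear.", "new_car"]:
        print(raw, "->", strip_boundary_delims(raw))
```

With the three sample inputs above, only "newyear." changes (it becomes "newyear"); "test.com" and "new_car" come out unchanged.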
Below is the definition of the text field:
<fieldType name="text_general_preserved" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="false" />
<filter class="solr.WordDelimiterFilterFactory"
splitOnCaseChange ="0"
splitOnNumerics ="0"
stemEnglishPossessive ="0"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
preserveOriginal="0"
protected="protwords_general.txt"
types="wdfftypes_general.txt"
/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="false" />
<filter class="solr.WordDelimiterFilterFactory"
splitOnCaseChange ="0"
splitOnNumerics ="0"
stemEnglishPossessive ="0"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
preserveOriginal="0"
protected="protwords_general.txt"
types="wdfftypes_general.txt"
/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Below is the content of wdfftypes_general.txt:
& => ALPHA
- => ALPHA
_ => SUBWORD_DELIM
: => SUBWORD_DELIM
. => SUBWORD_DELIM
The types that can be used with the word delimiter filter are LOWER, UPPER,
ALPHA, DIGIT, ALPHANUM, and SUBWORD_DELIM. I could not find a description of how
each type is used. Going by the name, I thought SUBWORD_DELIM might meet my
need, but it does not seem to work.
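For contrast, as far as I understand it, a character typed as SUBWORD_DELIM is treated as a delimiter wherever it occurs in the token, not only at the edges. The Python sketch below is a rough, simplified model of that splitting under this assumption. It is not the Lucene implementation, and the function name split_on_subword_delims is hypothetical.

```python
# Simplified model (an assumption, not the real filter): every character in
# SUBWORD_DELIMS ends the current part, and empty parts are dropped.
# '&' and '-' are not in the set, mirroring their ALPHA mapping.
SUBWORD_DELIMS = "._:"


def split_on_subword_delims(token: str) -> list:
    """Split a token at every delimiter character, discarding empty parts."""
    parts = []
    current = ""
    for ch in token:
        if ch in SUBWORD_DELIMS:
            if current:
                parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    for raw in ["test.com", "newyear.", "new_car", "a&b-c"]:
        print(raw, "->", split_on_subword_delims(raw))
```

Under this model "newyear." comes out as wanted, but "test.com" and "new_car" are each split into two parts, which is not what I need.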
Can anybody suggest how I can configure WordDelimiterFilterFactory to meet this
requirement?
Thanks.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-WordDelimiterFactory-with-Custom-Tokenizer-to-split-only-on-Boundires-tp4058557.html
Sent from the Solr - User mailing list archive at Nabble.com.