I have configured WordDelimiterFilterFactory with custom handling for '&' and '-'. For a few other delimiter characters (such as . _ :), we need to split only at token boundaries.
For example:

    test.com   (should be tokenized to test.com)
    newyear.   (should be tokenized to newyear)
    new_car    (should be tokenized to new_car)
    ..
    ..

Below is the definition of the text field:

    <fieldType name="text_general_preserved" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="false"/>
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0"
                splitOnNumerics="0"
                stemEnglishPossessive="0"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                preserveOriginal="0"
                protected="protwords_general.txt"
                types="wdfftypes_general.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="false"/>
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0"
                splitOnNumerics="0"
                stemEnglishPossessive="0"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                preserveOriginal="0"
                protected="protwords_general.txt"
                types="wdfftypes_general.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Below is the content of wdfftypes_general.txt:

    & => ALPHA
    - => ALPHA
    _ => SUBWORD_DELIM
    : => SUBWORD_DELIM
    . => SUBWORD_DELIM

The types that can be used with the word delimiter filter are LOWER, UPPER, ALPHA, DIGIT, ALPHANUM and SUBWORD_DELIM. I could not find a description of how each type is meant to be used. Going by the name, I thought SUBWORD_DELIM might fulfill my need, but it doesn't seem to work.

Can anybody suggest how I can configure the WordDelimiterFilterFactory to meet this requirement?

Thanks.
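P.S. In case it helps to make the requirement concrete, the output I am after can be sketched in plain Python. This is only an illustration of the desired result for a single whitespace-separated token, not Solr code and not a description of how WordDelimiterFilterFactory behaves; the function name and the constant are made up for this example.

```python
# Illustration only: describes the desired token output, not Solr's behaviour.
# Hypothetical names; nothing here comes from the Solr or Lucene API.

BOUNDARY_ONLY_DELIMITERS = "._:"  # dropped at token edges, kept inside a token


def strip_boundary_delimiters(token: str) -> str:
    """Remove '.', '_' and ':' from both ends of a token, keeping inner ones."""
    return token.strip(BOUNDARY_ONLY_DELIMITERS)


if __name__ == "__main__":
    for raw in ["test.com", "newyear.", "new_car"]:
        print(raw, "->", strip_boundary_delimiters(raw))
```

So a delimiter inside a token (test.com, new_car) is kept, while one at the start or end of a token (newyear.) is removed.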
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-WordDelimiterFactory-with-Custom-Tokenizer-to-split-only-on-Boundires-tp4058557.html
Sent from the Solr - User mailing list archive at Nabble.com.