I am using Solr 4.7 and have got a serious problem with WordDelimiterFilterFactory.
WordDelimiterFilterFactory behaves differently on hyphenated terms depending on whether they consist of digits only or of letters (a-z, A-Z) and digits. Splitting up hyphenated terms is deactivated in my configuration.

*This is the fieldType setup from my schema:*

{code}
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_de.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="1"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}

The given search term is: *X-002-99-495*

WordDelimiterFilterFactory indexes the following word parts:
* X-002-99-495
* X (shouldn't be there)
* 00299495 (shouldn't be there)
* X00299495

The 'X' should not be indexed or queried as a single term. You can see that splitting is completely deactivated in the schema.
I can move the letter part around in the search term.

Searching for *002-abc-99-495* gives me:
* 002-abc-99-495
* 002 (shouldn't be there)
* abc (shouldn't be there)
* 99495 (shouldn't be there)
* 002abc99495

Searching for *002-99-495* (no letters) gives me:
* 002-99-495
* 00299495

This last result is what I would expect.

Any ideas?