My issue is with the use of WordDelimiterFilter and how the QueryParser (Dismax) converts the query into a MultiPhraseQuery.
This is on Solr 1.3 / Lucene 2.4.1. For example:

  1. yuma -> 3:10 to Yuma
  2. yUma -> no results

For #2, the query gets split into y + uma and becomes a MultiPhraseQuery that requires both terms, hence no results. I would rather it require either term, with a preference for documents matching both (or a preference for the joined term, or at least an OR query).

  1. joker-man -> Joker-Man Goes For Gold
  2. joKerman -> no results
  3. jo-kerman -> no results

  1. prom night -> Prom Night
  2. PromNight -> Prom Night
  3. promnight -> no results
  4. pRomnIght -> no results

Is there a way to configure this behavior? I need to support all of the above use cases.

I have a brute-force solution using a copyField and a non-WordDelimiterFilter analyzer (whitespace tokenizer, lowercase, pattern-replace punctuation, edge n-gram), which basically means dropping a second field (titleNameSubstring2) into solrconfig.xml. The two fields combined give pretty much what I need, but that costs memory and performance, whereas some tuning to avoid the MultiPhraseQuery would be a better fit.

Here are the schema.xml + solrconfig.xml bits that are not working.

[schema.xml]

<fieldType name="textSubstring" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="12"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

[solrconfig.xml]

<requestHandler name="stuff_title" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <str name="sort">score desc</str>
    <str name="qf">titleNameSubstring^200.0</str>
    <str name="pf">titleNameSubstring^2.0</str>
    <str name="bf">product(releaseYear,0.1)</str>
    <str name="mm">1</str>
  </lst>
  <lst name="appends">
    <str name="fq">searchable:true</str>
  </lst>
</requestHandler>

Any ideas?

-netcam
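
For reference, the brute-force workaround mentioned above would look roughly like the sketch below. This is an illustration rather than the exact config in use: the fieldType name textSubstringPlain is hypothetical, and the punctuation regex, the stored="false" setting, and the choice to skip the edge n-gram filter at query time are all assumptions. Only titleNameSubstring2 and the filter chain (whitespace tokenizer, lowercase, pattern-replace punctuation, edge n-gram) come from the description above.

```xml
<!-- Sketch only: fieldType name, regex, and stored="false" are assumptions. -->
<fieldType name="textSubstringPlain" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <!-- No WordDelimiterFilter here, so yUma stays one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Strip punctuation so joker-man is indexed as jokerman -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="12"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain minus the edge n-grams (assumed, mirroring textSubstring) -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="titleNameSubstring2" type="textSubstringPlain" indexed="true" stored="false"/>
<copyField source="titleNameSubstring" dest="titleNameSubstring2"/>
```

The second field then has to be listed in qf next to titleNameSubstring, which is where the extra memory and query cost comes from.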