Hi all, I am using a mostly out-of-the-box install of Solr that I'm using to search through our code repositories. I've run into a funny problem where searches for text that is camelCased aren't returning results unless the casing is exactly the same.
For example, a query for "getElementById" returns 364 results, but "getelementbyid" returns 0. There isn't a problem with all casings, though. For example, "function" and "Function" both return the same number of results, as does "FUNCTION" and "FUNCtion" (6,278 with my docs). However, "funcTION" returns only a few results--and it's where the word is actually split up (e.g. "func tion")! So it seems that something may be tokenizing words where casing appears in the middle of them! How can I get this to stop? Thanks! Ben Here's the definition for the text field type in my schema.xml: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <!-- in this example, we will only use synonyms at query time <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> --> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType>