On Jan 7, 2008 5:15 PM, Benjamin Higgins <[EMAIL PROTECTED]> wrote:
> Hi all, I am using a mostly out-of-the-box install of Solr that I'm
> using to search through our code repositories. I've run into a funny
> problem where searches for text that is camelCased aren't returning
> results unless the casing is exactly the same.
>
> For example, a query for "getElementById" returns 364 results, but
> "getelementbyid" returns 0.
>
> There isn't a problem with all casings, though. For example, "function"
> and "Function" both return the same number of results, as does
> "FUNCTION" and "FUNCtion" (6,278 with my docs). However, "funcTION"
> returns only a few results--and it's where the word is actually split up
> (e.g. "func tion")!
>
> So it seems that something may be tokenizing words where casing appears
> in the middle of them!
>
> How can I get this to stop?

Remove WordDelimiterFilter. It's funny though, since WordDelimiterFilter
should not have caused this to happen (a query of getelementbyid should
have matched a doc with getElementById).

-Yonik

> Thanks!
>
> Ben
>
>
> Here's the definition for the text field type in my schema.xml:
>
> <fieldType name="text" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>     -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
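
For reference, the suggested change might look roughly like the sketch below. It is the quoted field type with the WordDelimiterFilterFactory line dropped from both analyzers and everything else left as posted. This is an untested sketch, not a verified configuration, and it assumes the file names (stopwords.txt, synonyms.txt, protwords.txt) are the ones already in use.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <!-- WordDelimiterFilterFactory removed here: tokens are no longer
         split on case changes such as getElementById -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <!-- WordDelimiterFilterFactory removed here as well, so index-time
         and query-time analysis stay consistent -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Two caveats apply. A change to the index-time analyzer only affects documents indexed afterwards, so the existing documents would need to be reindexed. Also, with the filter gone, getElementById is indexed as one lowercased token, so a search for "element" on its own would no longer match it. If splitting on punctuation is still wanted, keeping the filter and setting splitOnCaseChange="0" in both analyzers may be an alternative worth trying.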