I have a field defined as:
    <field name="content" type="text" indexed="true" stored="false"
           termVectors="true" multiValued="true" />
where "text" is unmodified from the schema.xml example that came with Solr
1.4.1.
I have documents with some compound words indexed, such as Sandstone, and
in several cases camel-case words such as MaxSize. If I query in all lower
case (sandstone or maxsize), I get the documents I expect. If I query with
only the first letter capitalized (Sandstone or Maxsize), I also get the
documents I expect. However, if the query is camel case (MaxSize or
SandStone), it doesn't find the documents. The MaxSize case is particularly
frustrating because that is the exact case of the word that was indexed. Is
this expected behavior? The query analyzer definition for the "text" field
type is:
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true"
          synonyms="synonyms.txt"/>
  <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
          words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
          catenateAll="0" catenateNumbers="0" catenateWords="0"
          generateNumberParts="1" generateWordParts="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter language="English" class="solr.SnowballPorterFilterFactory"
          protected="protwords.txt"/>
</analyzer>

Is the order of the filters important? If LowerCaseFilterFactory came before
WordDelimiterFilterFactory, would that fix this? Would it break something
else?
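
To make the question concrete, the reordering I am asking about would look
roughly like the sketch below. It is untested and only illustrates the idea:
the filters and attributes are the same as in the query analyzer above, with
LowerCaseFilterFactory moved ahead of WordDelimiterFilterFactory.

```xml
<!-- Hypothetical, untested reordering of the query analyzer quoted above.
     Only the position of LowerCaseFilterFactory has changed. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true"
          synonyms="synonyms.txt"/>
  <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
          words="stopwords.txt" ignoreCase="true"/>
  <!-- Lowercasing first means MaxSize reaches the next filter as maxsize,
       so splitOnCaseChange finds no case transition to split on. -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
          catenateAll="0" catenateNumbers="0" catenateWords="0"
          generateNumberParts="1" generateWordParts="1"/>
  <filter language="English" class="solr.SnowballPorterFilterFactory"
          protected="protwords.txt"/>
</analyzer>
```

I am assuming the index-time analyzer would need the same ordering, followed
by a reindex, for the two sides to stay consistent.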

Thanks,
Ken

