Ralf,

Not sure whether you got this working, but a simple solution might be to 
change the default boolean operator from OR to AND.
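
As a rough sketch, assuming your schema.xml does not already set it, the 
default operator is controlled by the solrQueryParser element in schema.xml:

      <!-- schema.xml: require all clauses of a query to match by default -->
      <solrQueryParser defaultOperator="AND"/>

You can also try it per request by adding q.op=AND to the query parameters. 
I have not checked how this interacts with the subwords the compound filter 
emits for a single query term, so treat it as something to test rather than 
a confirmed fix.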

Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch 




________________________________
From: "Kraus, Ralf | pixelhouse GmbH" <r...@pixelhouse.de>
To: solr-user@lucene.apache.org
Sent: Friday, February 6, 2009 6:23:51 PM
Subject: Need help with DictionaryCompoundWordTokenFilterFactory

Hi,

I have run into another problem with the 
solr.DictionaryCompoundWordTokenFilterFactory :-(
If I search for the German word "Spargelcremesuppe", which contains "Spargel", 
"Creme" and "Suppe", Solr returns far too many results.
That is because Solr matches EVERY entry that contains any one of the three 
words :-(

Here is the relevant part of my schema.xml:

      <fieldType name="text_text" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
              <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
                      dictionary="dictionary.txt"
                      minWordSize="5"
                      minSubwordSize="2"
                      maxSubwordSize="15"
                      onlyLongestMatch="true"/>
              <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                      ignoreCase="true" expand="true"/>
              <filter class="solr.StopFilterFactory" ignoreCase="true"
                      words="stopwords.txt"/>
              <filter class="solr.LowerCaseFilterFactory"/>
              <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
              <filter class="solr.SnowballPorterFilterFactory" language="German"/>
          </analyzer>
      </fieldType>

Any help?

Greets,

Ralf Kraus
