Re: Need help with DictionaryCompoundWordTokenFilterFactory

Grant Ingersoll Sat, 07 Feb 2009 19:50:28 -0800

Sounds like you need some work on the analysis part. I would start by using the Solr Admin Analysis tool and playing around with your settings for that TokenFilter. Sounds to me like you might want a different approach to compound words. I'm not a German expert, so I can't offer too much there, but one thought that comes to mind is using phrases or ngrams, or, if it is just that one word, putting it in a protected words list.
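
One way to try a different approach, as an untested sketch rather than a confirmed fix: declare separate index and query analyzers and run the compound filter only at index time. This assumes the filter keeps the original compound token next to the subwords it emits, which is worth checking in the Analysis tool. The field type name, file names and filter settings are copied from your schema; the index/query split is the only change.

```xml
<fieldType name="text_text" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: decompound, so a document containing "Spargelcremesuppe"
       is also indexed under whatever subwords the dictionary yields -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="dictionary.txt"
            minWordSize="5"
            minSubwordSize="2"
            maxSubwordSize="15"
            onlyLongestMatch="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
  <!-- Query time: same chain without the compound filter, so the query
       "Spargelcremesuppe" stays one token instead of expanding into three -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
```

If that assumption holds, a query for "Spargel" should still match documents containing "Spargelcremesuppe", while a query for the full compound should no longer match every document that mentions only "Creme" or "Suppe". The documents need to be reindexed after the change.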


-Grant


On Feb 6, 2009, at 5:23 AM, Kraus, Ralf | pixelhouse GmbH wrote:

Hi,
Now I ran into another problem using the solr.DictionaryCompoundWordTokenFilterFactory :-( If I search for the German word "Spargelcremesuppe", which contains "Spargel", "Creme" and "Suppe", SOLR finds way too many results. That's because SOLR finds EVERY entry with any one of the three words in it :-(
Here is my schema.xml:
<fieldType name="text_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="dictionary.txt"
            minWordSize="5"
            minSubwordSize="2"
            maxSubwordSize="15"
            onlyLongestMatch="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>

Any help ?

Greets,

Ralf Kraus


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika) using Solr/Lucene:

http://www.lucidimagination.com/search
