Hi,
I have run into another problem, this time with
solr.DictionaryCompoundWordTokenFilterFactory :-(
If I search for the German word "Spargelcremesuppe", which is made up of
"Spargel", "Creme" and "Suppe", Solr returns far too many results.
That is because Solr matches EVERY entry that contains any one of the three
words :-(
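To illustrate what I think is happening, here is a minimal Python sketch. It is not Solr's actual code: the dictionary contents, the function names `decompound` and `matches_any`, and the plain OR matching are all my assumptions about the behavior I am seeing.

```python
# Simplified illustration only, NOT Solr's real implementation.
# Assumed contents of dictionary.txt (hypothetical).
DICTIONARY = {"spargel", "creme", "suppe"}


def decompound(token, dictionary, min_subword=2, max_subword=15):
    """Return the lowercased token plus every dictionary word found inside it."""
    lowered = token.lower()
    parts = [lowered]  # the original token is kept alongside the subwords
    for start in range(len(lowered)):
        for length in range(min_subword, max_subword + 1):
            candidate = lowered[start:start + length]
            # Skip slices that were cut short by the end of the token.
            if len(candidate) == length and candidate in dictionary:
                parts.append(candidate)
    return parts


def matches_any(query, document, dictionary):
    """Assumed OR semantics: one shared token is enough for a match."""
    query_tokens = set(decompound(query, dictionary))
    doc_tokens = {
        part
        for word in document.split()
        for part in decompound(word, dictionary)
    }
    return bool(query_tokens & doc_tokens)


query_tokens = decompound("Spargelcremesuppe", DICTIONARY)
hit = matches_any("Spargelcremesuppe", "Tomaten Suppe", DICTIONARY)
```

With these assumptions the query expands to the compound plus its three parts, so a document that only mentions "Suppe" already counts as a hit, which is the over-matching I am describing.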
Here is the relevant field type from my schema.xml:
<fieldType name="text_text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="dictionary.txt"
            minWordSize="5"
            minSubwordSize="2"
            maxSubwordSize="15"
            onlyLongestMatch="true"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="German"/>
  </analyzer>
</fieldType>
Any help would be appreciated.
Greets,
Ralf Kraus