Solr User Group -
I have a case where I need to be able to search against compound words, even
when the user delimits with a space. (e.g. baseball => base ball). I think
I've solved this by creating a compound-words dictionary file containing the
split words that I would want DictionaryCompoundWordTokenFilterFactory to split.
base
ball
I also added the following rule to the synonym file, so that a search for
baseball also gets a hit: baseball => base ball
<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
dictionary="compound-words.txt" minWordSize="5" minSubwordSize="2"
maxSubwordSize="15" onlyLongestMatch="true"/>
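A sketch of how the two pieces could sit together in a field type. The field
type name, the tokenizer, the lowercase filter, the synonyms.txt file name, and
the filter order are all assumptions for illustration and are not taken from
the actual schema. solr.SynonymFilterFactory is the older synonym filter; newer
Solr releases use solr.SynonymGraphFilterFactory instead.

```xml
<!-- Sketch only: field type name, tokenizer, and synonyms.txt are assumed -->
<fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms.txt is assumed to hold the rule: baseball => base ball -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- the compound-word filter with the same settings as above -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compound-words.txt" minWordSize="5" minSubwordSize="2"
            maxSubwordSize="15" onlyLongestMatch="true"/>
  </analyzer>
</fieldType>
```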
Two questions - If I could figure out in advance all the compound words I would
want to split, would it be better (more reliable results) for me to maintain
this compound-words file myself, or would it be better to throw one of those
OpenOffice dictionaries at the filter?
Also - Any better suggestions for dealing with this problem than the approach I
described, which uses both the dictionary filter and the synonym rule?
Thanks in advance!
Mike