Solr User Group -

I have a case where I need to be able to search against compound words, even
when the user separates them with a space (e.g. baseball => base ball). I think
I've solved this by creating a compound-words dictionary file containing the
subwords that I want DictionaryCompoundWordTokenFilterFactory to split
compounds into:
  base
  ball
I also added the following rule to the synonym file, to allow baseball to also
get a hit:

  baseball => base ball

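As a rough illustration of what an explicit mapping does, here is a minimal
Python sketch. The function names are hypothetical, and it only handles a
single-term left-hand side with one replacement phrase; Solr's synonym filters
also handle multi-term rules and emit a token graph. With `=>`, the left-hand
term is replaced by the right-hand tokens rather than kept alongside them.

```python
def parse_explicit_rule(rule):
    """Split a single-term explicit mapping such as 'baseball => base ball'."""
    lhs, rhs = rule.split("=>")
    return lhs.strip(), rhs.split()


def apply_explicit_rule(tokens, rule):
    """Replace each occurrence of the left-hand term with the right-hand tokens."""
    source, replacement = parse_explicit_rule(rule)
    out = []
    for tok in tokens:
        if tok == source:
            out.extend(replacement)  # the original term is NOT kept
        else:
            out.append(tok)
    return out


print(apply_explicit_rule(["baseball", "bat"], "baseball => base ball"))
```

For the tokens `baseball bat`, this prints `['base', 'ball', 'bat']`.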
The compound-word filter is configured as:

  <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
          dictionary="compound-words.txt" minWordSize="5" minSubwordSize="2"
          maxSubwordSize="15" onlyLongestMatch="true"/>

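To show roughly how these parameters interact, here is a simplified Python
re-implementation written for illustration, not taken from the Lucene source.
`decompound` is a hypothetical name; the sketch assumes a case-sensitive,
exact-match dictionary and returns only token strings (the real filter also
sets offsets and positions on the emitted subwords).

```python
def decompound(token, dictionary, min_word_size=5, min_subword_size=2,
               max_subword_size=15, only_longest_match=True):
    """Return the original token followed by dictionary subwords found in it."""
    out = [token]  # the original token is always kept
    if len(token) < min_word_size:
        return out  # too short to be considered a compound
    for start in range(len(token) - min_subword_size + 1):
        longest = None
        # Try every candidate length starting at this offset.
        for length in range(min_subword_size, max_subword_size + 1):
            if start + length > len(token):
                break
            candidate = token[start:start + length]
            if candidate in dictionary:
                if only_longest_match:
                    longest = candidate  # lengths grow, so the last hit is longest
                else:
                    out.append(candidate)
        # With only_longest_match, keep only the longest hit at this offset.
        if longest is not None:
            out.append(longest)
    return out


words = {"base", "ball"}
print(decompound("baseball", words))  # ['baseball', 'base', 'ball']
print(decompound("ball", words))      # ['ball'] (shorter than min_word_size)
```

With only `base` and `ball` in the dictionary, baseball yields the original
token plus both subwords, while a four-letter token is left alone because of
minWordSize="5".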
Two questions. First: if I could figure out in advance all the compound words I
want to split, would it be better (i.e. give more reliable results) to maintain
this compound-words file myself, or to point the filter at one of the
OpenOffice dictionaries instead?
Also, are there better ways of dealing with this problem than the approach I
described, which uses both the dictionary filter and the synonym rule?
Thanks in advance!
Mike
