Typo: *even when the user delimits with a space (e.g. "base ball" should find
"baseball").

Thanks,
From: Mike L. <javaone...@yahoo.com>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Tuesday, April 7, 2015 9:05 AM
Subject: DictionaryCompoundWordTokenFilterFactory - Dictionary/Compound-Words File
   

Solr User Group -

I have a case where I need to be able to search against compound words, even
when the user delimits with a space (e.g. baseball => base ball). I think
I've solved this by creating a compound-words dictionary file containing the
subwords that I want DictionaryCompoundWordTokenFilterFactory to split out:
   base
   ball
I also added the following rule to the synonym file: baseball => base ball
(to allow baseball to also get a hit)
       <filter class="solr.DictionaryCompoundWordTokenFilterFactory" 
dictionary="compound-words.txt" minWordSize="5" minSubwordSize="2" 
maxSubwordSize="15" onlyLongestMatch="true"/>           
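
For reference, the way these filter attributes interact can be sketched
roughly as follows. This is a simplified Python illustration of
dictionary-based decompounding, not the Lucene implementation. The function
name decompound and its arguments are hypothetical and only mirror the
attribute names in the filter definition above. It assumes the original token
is always kept and that onlyLongestMatch keeps just the longest dictionary
entry found at each start position.

```python
def decompound(token, dictionary, min_word_size=5, min_subword_size=2,
               max_subword_size=15, only_longest_match=True):
    """Return the original token followed by dictionary subwords found in it.

    Simplified sketch: an assumption about how the filter behaves,
    not a port of the Lucene code.
    """
    out = [token]  # the original token is always kept
    if len(token) < min_word_size:
        return out  # too short to be considered a compound

    lowered = token.lower()
    # Try every start position that still leaves room for a minimal subword.
    for start in range(0, len(lowered) - min_subword_size + 1):
        longest = None
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(lowered):
                break
            candidate = lowered[start:start + size]
            if candidate in dictionary:
                if only_longest_match:
                    if longest is None or len(candidate) > len(longest):
                        longest = candidate
                else:
                    out.append(candidate)
        if only_longest_match and longest is not None:
            out.append(longest)
    return out


if __name__ == "__main__":
    words = {"base", "ball"}  # contents of compound-words.txt
    print(decompound("baseball", words))
    print(decompound("base", words))
```

Under these assumptions, "baseball" yields the original token plus base and
ball, while a token shorter than minWordSize (such as "base" or "ball" typed
separately) passes through unchanged.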
  
Two questions - If I could figure out in advance all the compound words I would
want to split, would it be better (more reliable results) for me to maintain
this compound-words file, or would it be better to throw one of those
OpenOffice dictionaries at the filter?
Also - Any better suggestions for dealing with this problem than the approach I
described, which uses both the dictionary filter and the synonym rule?
Thanks in advance!
Mike



  
