Could someone help me understand the differences between TokenizerFactory, 
Tokenizer, and Analyzer?

Specifically, I'm interested in implementing auto-complete for tags that could 
contain both English and Chinese. I read this article 
(http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/).
In the article, KeywordTokenizerFactory is used as the tokenizer. I thought I'd 
try replacing that with CJKTokenizer. Two questions:

1) KeywordTokenizerFactory appears to be a tokenizer factory, while CJKTokenizer 
appears to be a plain tokenizer. Are they the same kind of thing, and can one 
stand in for the other? Could I simply replace 
<tokenizer class="solr.KeywordTokenizerFactory"/>
with
<tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
in the field type definition?
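
To make the question concrete, here is a rough sketch of the index-time field 
type as I understand it from the article. The field type name and the gram 
sizes are placeholders of mine and may not match the article exactly. The 
commented-out line is the substitution I'm asking about; I don't know whether 
Solr accepts it.

```xml
<!-- Sketch only: "autocomplete_text" and the gram sizes are assumed placeholders. -->
<fieldType name="autocomplete_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Current setup: the whole tag is kept as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Substitution in question (unverified):
         <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/> -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits the leading prefixes of each token so that partial input can match. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```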

2) I'm also interested in trying out SmartChineseAnalyzer 
(http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html).
However, SmartChineseAnalyzer doesn't seem to offer a separate tokenizer; it is 
only available as a complete analyzer. How do I use it in Solr?

Thanks.
Andy


      
