Could someone help me understand the differences between TokenizerFactory, Tokenizer, and Analyzer?
Specifically, I'm interested in implementing auto-complete for tags that could contain both English and Chinese. I read this article (http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/). In the article, KeywordTokenizerFactory is used as the tokenizer. I thought I'd try replacing that with CJKTokenizer.

Two questions:

1) KeywordTokenizerFactory seems to be a "tokenizer factory", while CJKTokenizer seems to be just a tokenizer. Are they the same type of thing at all? Could I just replace <tokenizer class="solr.KeywordTokenizerFactory"/> with <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>?

2) I'm also interested in trying out SmartChineseAnalyzer (http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html). However, SmartChineseAnalyzer doesn't offer a separate tokenizer. It's just an analyzer and that's it. How do I use it in Solr?

Thanks.
Andy