I'm jumping into the middle of the thread here.

CJK = Chinese, Japanese, Korean
German = etwas ganz anderes (something completely different)

Why are you trying to use CJKAnalyzer and CJKTokenizer for German? Have you tried the German analyzer from Lucene contrib?

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Xuesong Luo <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, June 22, 2007 8:54:37 AM
Subject: RE: add CJKTokenizer to solr

Thanks, Toru and Chris,

I tried both the CJKTokenizer and the CJKAnalyzer. Both return some unexpected highlight results when I tested with German. The field value I searched is "Ein Mann beißt den Hund". The search term is beißt.

When using CJKAnalyzer, beißt is treated as 2 separate terms (bei and ß); the highlight result is:
<str>Ein Mann <em>bei</em><em>ß</em>t den Hund</str>

When using CJKTokenizer, beißt is treated as 3 separate terms; the result is:
<str>Ein Mann <em>bei</em><em>ß</em><em>t</em> den Hund</str>

When using the standard tokenizer, beißt is treated as one word; the result is:
<str>Ein Mann <em>beißt</em> den Hund</str>

I understand why the standard tokenizer treats beißt as a word, but I don't know how CJKTokenizer and CJKAnalyzer work. Could anyone explain a little bit?

Thanks
Xuesong

-----Original Message-----
From: Toru Matsuzawa [mailto:[EMAIL PROTECTED]
Sent: Monday, June 18, 2007 10:29 PM
To: solr-user@lucene.apache.org
Subject: Re: add CJKTokenizer to solr

I'm sorry. The attachment could not be appended, so I am sending it again.

> > I got the error below after adding CJKTokenizer to schema.xml. I
> > checked the constructor of CJKTokenizer, it requires a Reader parameter,
> > I guess that's why I get this error, I searched the email archive, it
> > seems working for other users. Does anyone know what is the problem?
> >
> CJKTokenizerFactory that I am using is appended.
>
--
package org.apache.solr.analysis.ja;

import java.io.Reader;

import org.apache.lucene.analysis.cjk.CJKTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenizerFactory;

/**
 * CJKTokenizer for Solr
 * @see org.apache.lucene.analysis.cjk.CJKTokenizer
 * @author matsu
 */
public class CJKTokenizerFactory extends BaseTokenizerFactory {
  /**
   * Wraps the given Reader in a CJKTokenizer.
   * @see org.apache.solr.analysis.TokenizerFactory#create(Reader)
   */
  public TokenStream create(Reader input) {
    return new CJKTokenizer(input);
  }
}
--
Toru Matsuzawa
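
On the question above of why beißt comes out as bei / ß / t: the sketch below is one plausible way to reproduce that split. It is not Lucene's code. It assumes the CJK tokenizer puts Basic Latin (plain ASCII) letters and digits in one class and every other letter in a second class, and starts a new token whenever the class changes. Under that assumption ß, which lives in the Latin-1 Supplement block, lands in the "other" class and breaks the word. The class and method names (BlockSplitSketch, split) are made up for illustration. As far as I understand, the real tokenizer also emits overlapping two-character tokens for runs of non-Latin letters, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified stand-in for block-based splitting,
// NOT the actual org.apache.lucene.analysis.cjk.CJKTokenizer.
class BlockSplitSketch {

    /** True for letters and digits in the Basic Latin (ASCII) block. */
    static boolean isBasicLatin(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.BASIC_LATIN;
    }

    /**
     * Splits text into tokens, starting a new token at whitespace or
     * punctuation and whenever the character class (Basic Latin vs.
     * any other letter or digit) changes.
     */
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean currentIsLatin = false;

        for (char c : text.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                // Whitespace and punctuation always end the current token.
                flush(tokens, current);
                continue;
            }
            boolean latin = isBasicLatin(c);
            if (current.length() > 0 && latin != currentIsLatin) {
                // Class changed mid-word (e.g. 'i' -> 'ß'): end the token here.
                flush(tokens, current);
            }
            current.append(c);
            currentIsLatin = latin;
        }
        flush(tokens, current);
        return tokens;
    }

    /** Adds the pending token, if any, to the list and resets the buffer. */
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // "\u00df" is the letter ß, written as an escape to avoid encoding issues.
        System.out.println(split("Ein Mann bei\u00dft den Hund"));
        // prints [Ein, Mann, bei, ß, t, den, Hund]
    }
}
```

The three pieces bei, ß and t match the CJKTokenizer highlight output quoted in this thread. The missing highlight on t under CJKAnalyzer is presumably down to that analyzer's stop-word filter dropping the single-letter token, but I have not verified that against the source.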