Hi Shawn,

Thank you for your reply.
> CJKBigramFilter shouldn't care what tokenizer you're using.  It should
> work with any tokenizer.  What problem are you seeing that you're trying
> to solve?  What version of Solr, what configuration, and what does it do
> that you're not expecting, and what do you want it to do?

I am sorry for the lack of information.

I tried this with Solr 5.5.5 and 7.5.0, and here is the analyzer configuration from my managed-schema:

<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>

What I want to do is:

1. create CJK bigram tokens, and
2. extract each word that contains a hyphen and stopwords (e.g. as-is, to-be, etc.) as a single token from CJK and English sentences.

CJKBigramFilter seems to check the TOKEN_TYPES attribute added by StandardTokenizer when creating CJK bigram tokens. (See
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKBigramFilter.java#L64 )
ClassicTokenizer also adds token types, but they are the obsolete ones: "CJ" for CJ tokens and "ALPHANUM" for the Korean alphabet, and neither is a target for CJKBigramFilter... (A rough sketch of the workaround I am considering is below, after the quoted message.)

Thanks,
Yasufumi

On Tue, Oct 2, 2018 at 0:05, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/30/2018 10:14 PM, Yasufumi Mizoguchi wrote:
> > I am looking for the way to create CJK bigram tokens with
> > ClassicTokenizer.
> > I tried this by using CJKBigramFilter, but it only supports for
> > StandardTokenizer...
>
> CJKBigramFilter shouldn't care what tokenizer you're using.  It should
> work with any tokenizer.  What problem are you seeing that you're trying
> to solve?  What version of Solr, what configuration, and what does it do
> that you're not expecting, and what do you want it to do?
>
> I don't have access to the systems where I was using that filter, but if
> I recall correctly, I was using the whitespace tokenizer.
>
> Thanks,
> Shawn
>
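For anyone following the thread: the workaround I am considering (untested sketch only, the class name CJTypeRemapFilter and the mapping are my own, not an existing Lucene class) is a small custom TokenFilter placed between ClassicTokenizerFactory and CJKBigramFilterFactory that rewrites ClassicTokenizer's "<CJ>" type to StandardTokenizer's "<IDEOGRAPHIC>" type, so CJKBigramFilter recognizes those tokens:

  // Hypothetical sketch -- CJTypeRemapFilter is not an existing Lucene/Solr class.
  // It remaps ClassicTokenizer's legacy "<CJ>" token type to StandardTokenizer's
  // "<IDEOGRAPHIC>" so a following CJKBigramFilter will bigram those tokens.
  import java.io.IOException;

  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardTokenizer;
  import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

  public final class CJTypeRemapFilter extends TokenFilter {

    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

    public CJTypeRemapFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false;
      }
      // ClassicTokenizer tags Chinese/Japanese characters as "<CJ>", but
      // CJKBigramFilter only looks for StandardTokenizer's type strings.
      if ("<CJ>".equals(typeAtt.type())) {
        typeAtt.setType(StandardTokenizer.TOKEN_TYPES[StandardTokenizer.IDEOGRAPHIC]);
      }
      return true;
    }
  }

To use it from the schema I would still have to write a matching TokenFilterFactory and register it between the tokenizer and CJKBigramFilterFactory. Also note this does not help for Korean, since ClassicTokenizer tags Hangul as "<ALPHANUM>" and there is no way to tell those tokens apart from English ones by type alone.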