This does not address the question. A single-ideogram query will not find ideograms in the middle of phrases.
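To make the problem concrete, here is a toy sketch in Python. It is not the Lucene CJKBigram filter, only a simplified model of the behavior described in the quoted message: overlapping two-character terms, with a single character emitted only when there is nothing next to it to pair with. The function names `bigrams` and `unigrams` are made up for illustration, and the sketch ignores script detection and non-CJK text entirely.

```python
def bigrams(text):
    """Toy model: overlapping character bigrams.

    A lone character is emitted as a unigram only when it has no
    neighbour to pair with (assumption based on the quoted description).
    """
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def unigrams(text):
    """Toy model of a unigram field: one term per character."""
    return list(text)


phrase = "いろは革命歌"  # example string from the quoted message
bigram_terms = bigrams(phrase)
unigram_terms = unigrams(phrase)

# A single-ideogram query is a single term; it can only match a field
# whose indexed terms contain that exact term.
query = "革"
print(bigram_terms)
print("bigram field matches: ", query in bigram_terms)
print("unigram field matches:", query in unigram_terms)
```

In this model the bigram field never contains the bare term 革, because the character always has a neighbour inside the phrase, while the unigram field does contain it. That is the case a separate unigram field is meant to cover.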
I have also found that phrase slop does not work with bigrams. At all. I
created a separate field type with unigrams. The CJK fields use the
StandardAnalyzer. I made an analysis stack with just the StandardAnalyzer,
which gives raw terms for European-language text and single-character terms
for CJK ideograms. This worked well for exact phrase and phrase slop queries.

You should use both kinds of fields: the bigram field helps boost similar
phrases. You should also try the SmartChineseAnalyzer and the new Japanese
analyzer suite.

I've discovered that CJK search is very tricky, and different use cases call
for different strategies.

On Fri, Apr 27, 2012 at 10:57 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> Bigrams across character types seems like a useful thing, especially for
> indexing adjective and verb endings.
>
> An n-gram approach is always going to generate a lot of junk along with the
> gold. Tighten the rules and good stuff is missed, guaranteed. The only way to
> sort it out is to use a tokenizer with some linguistic rules.
>
> wunder
>
> On Apr 27, 2012, at 10:43 AM, Burton-West, Tom wrote:
>
>> I have a few questions about the CJKBigram filter.
>>
>> About 10% of our queries that contain Han characters are single character
>> queries. It looks like the CJKBigram filter only outputs single characters
>> when there are no adjacent bigrammable characters in the input. This means
>> we would have to create a separate field to index Han unigrams in order to
>> address single character queries. Is this correct?
>>
>> For Japanese, the default settings form bigrams across character types. So
>> for a string containing Hiragana and Han characters bigrams containing a
>> mixture of Hiragana and Han characters are formed:
>> いろは革命歌 => “いろ” “ろは” “は革” “革命” “命歌”
>>
>> Is there a way to specify that you don’t want bigrams across character types?
>>
>> Tom
>>
>> Tom Burton-West
>> Digital Library Production Service
>> University of Michigan Library
>>
>> http://www.hathitrust.org/blogs/large-scale-search
>>
>

--
Lance Norskog
goks...@gmail.com