Hi Shawn,

>>For an input of 田中角栄 the bigram filter works like you described, and what I would expect. If I add a space at the point where the ICU
>>tokenizer would have split them anyway, the bigram filter output is very different.

If I'm understanding what you are reporting, I suspect this is behavior as designed. My guess is that the bigram filter figures that if there was a space in the original input (to the whole filter chain), it should not create a bigram across it.

Tom

BTW: if you can show a few examples of Japanese queries that show the original problem and the reason it's a problem (without, of course, showing anything proprietary), I'd love to see them. I'm always interested in learning more about Japanese query processing.
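
P.S. To make the guess about spaces concrete, here is a toy Python model of "bigram within a run of characters, but never across a space". It is only a sketch of the behavior I am guessing at, not the actual filter code: the function name `bigrams_within_runs` is made up for illustration, and it ignores the ICU tokenizer entirely by assuming whitespace is the only boundary.

```python
def bigrams_within_runs(text):
    """Toy model (assumption, not the real filter): build overlapping
    bigrams inside each whitespace-delimited run, never across a space.
    A run of a single character is emitted as a unigram."""
    tokens = []
    for run in text.split():
        if len(run) == 1:
            tokens.append(run)
        else:
            # Overlapping pairs: positions (0,1), (1,2), ...
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(bigrams_within_runs("田中角栄"))   # ['田中', '中角', '角栄']
print(bigrams_within_runs("田中 角栄"))  # ['田中', '角栄']
```

Under this model, adding the space removes the middle bigram 中角, which would be one way for the output to change when a space is added. Whether that matches what you are actually seeing is something the analysis output for both inputs would confirm.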