Re: Analysis of Japanese characters

Tom Burton-West Thu, 03 Apr 2014 10:59:30 -0700

Hi Shawn,

>> For an input of 田中角栄 the bigram filter works like you described, and what
>> I would expect.  If I add a space at the point where the ICU tokenizer
>> would have split them anyway, the bigram filter output is very different.


If I'm understanding your report correctly, I suspect this is behavior as
designed.  My guess is that the bigram filter figures that if there was a
space in the original input (to the whole filter chain), it should not
create a bigram across it.
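
To make that guess concrete, here is a toy sketch in Python. It is not the Lucene filter's actual code; the function name bigrams_within_runs is made up for illustration, and it simply assumes that each whitespace-delimited run is bigrammed on its own.

```python
def bigrams_within_runs(text):
    """Toy model: form character bigrams inside each whitespace-delimited
    run, never across the whitespace between runs."""
    out = []
    for run in text.split():
        if len(run) == 1:
            # A lone character has no neighbor, so emit it as a unigram.
            out.append(run)
        else:
            # Overlapping two-character windows within this run only.
            out.extend(run[i:i + 2] for i in range(len(run) - 1))
    return out


print(bigrams_within_runs("田中角栄"))   # ['田中', '中角', '角栄']
print(bigrams_within_runs("田中 角栄"))  # ['田中', '角栄']
```

Under that assumption, the space removes the middle bigram 中角, which would account for the output being different once the space is added.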

Tom

BTW: if you can show a few examples of Japanese queries that show the
original problem and the reason it's a problem (without, of course, showing
anything proprietary), I'd love to see them.  I'm always interested in
learning more about Japanese query processing.
