My analysis chain includes CJKBigramFilter at both index and query time. outputUnigrams is enabled on the index side but disabled on the query side, and this causes a problem with phrase queries. The screenshot below shows a subset of my index analysis for three terms, which appear separated by spaces in the ICUNF step:

https://www.dropbox.com/s/9q1x9pdbsjhzocg/bigram-position-problem.png

Note that in the CJKBF step, the second unigram is output at position 2, pushing the English terms to positions 3 and 4.

When the customer runs a phrase filter query (Lucene query parser) for the first two terms on this specific field, it doesn't match, because the query analysis doesn't output the unigrams and therefore the query positions don't line up with the indexed positions.

I would have expected both unigrams to be at position 1. Is this a bug or expected behavior?
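
To make the mismatch concrete, here is a rough sketch in Python. It is not Lucene code, just a toy model of exact phrase matching on token positions. The token names (C1, C2, C1C2, eng1, eng2) are hypothetical placeholders for the real terms, and I am assuming the bigram sits at position 1 alongside the first unigram.

```python
# Toy model of exact (slop 0) phrase matching on token positions.
# All token names are hypothetical placeholders, not the real terms.

def phrase_matches(index_positions, query_tokens):
    """Return True if the query tokens occur in the index at the same
    relative positions as in the analyzed query."""
    first_term, first_pos = query_tokens[0]
    for start in index_positions.get(first_term, []):
        if all(
            (start + (pos - first_pos)) in index_positions.get(term, [])
            for term, pos in query_tokens
        ):
            return True
    return False

# Index side as described: second unigram at position 2,
# English terms pushed to positions 3 and 4.
# Assumption: the bigram shares position 1 with the first unigram.
actual_index = {
    "C1": [1],
    "C1C2": [1],
    "C2": [2],
    "eng1": [3],
    "eng2": [4],
}

# The layout I expected: both unigrams at position 1.
expected_index = {
    "C1": [1],
    "C1C2": [1],
    "C2": [1],
    "eng1": [2],
    "eng2": [3],
}

# Query side with outputUnigrams disabled: only the bigram, then the
# first English term at the next position.
query_tokens = [("C1C2", 1), ("eng1", 2)]

print(phrase_matches(actual_index, query_tokens))
print(phrase_matches(expected_index, query_tokens))
```

With the positions I am seeing, the phrase check fails because the bigram and the first English term are two positions apart in the index but only one apart in the query. With the layout I expected, the same check would succeed.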

Thanks,
Shawn
