My analysis chain includes CJKBigramFilter on both the index and query
analyzers. I have outputUnigrams enabled on the index side, but disabled on
the query side. This causes a problem with phrase queries.
This is a subset of my index analysis for the three terms (separated by
spaces) that you can see in the ICUNF step:
https://www.dropbox.com/s/9q1x9pdbsjhzocg/bigram-position-problem.png
Note that in the CJKBF step, the second unigram is output at position 2,
pushing the English terms to positions 3 and 4.
When the customer runs a phrase filter query (Lucene query parser) for
the first two terms on this specific field, it doesn't match: the query
analysis doesn't output the unigrams, so the query positions don't line
up with the indexed positions.
I would have expected both unigrams to be at position 1. Is this a bug
or expected behavior?
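To make the mismatch concrete, here is a small self-contained Java simulation (Java 16+ for records). It does not use Lucene and is not Lucene's PhraseQuery implementation. It only turns position increments into absolute positions and checks an exact (slop 0) phrase by relative offset. C1, C2, eng1 and eng2 are hypothetical placeholders for the terms in the screenshot. Stacking the bigram C1C2 at position 1 with an increment of 0 is my assumption about the layout.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhrasePositionDemo {

    // Hypothetical token: term text plus its position increment (0 = stacked on the previous token).
    record Token(String term, int posInc) {}

    // Index side, outputUnigrams=true, as observed: C2 lands at position 2, English terms at 3 and 4.
    // Assumption: the bigram C1C2 is stacked on C1 at position 1.
    static final List<Token> INDEX_OBSERVED = List.of(
            new Token("C1", 1), new Token("C1C2", 0), new Token("C2", 1),
            new Token("eng1", 1), new Token("eng2", 1));

    // Index side as I expected it: both unigrams at position 1, English terms at 2 and 3.
    static final List<Token> INDEX_EXPECTED = List.of(
            new Token("C1", 1), new Token("C1C2", 0), new Token("C2", 0),
            new Token("eng1", 1), new Token("eng2", 1));

    // Query side, outputUnigrams=false: the bigram, then the first English term.
    static final List<Token> QUERY = List.of(new Token("C1C2", 1), new Token("eng1", 1));

    // Turns position increments into absolute positions; the first token lands at position 1.
    static Map<Integer, Set<String>> positions(List<Token> stream) {
        Map<Integer, Set<String>> byPos = new HashMap<>();
        int pos = 0;
        for (Token t : stream) {
            pos += t.posInc();
            byPos.computeIfAbsent(pos, k -> new HashSet<>()).add(t.term());
        }
        return byPos;
    }

    // Exact phrase (slop 0): every query term must occur at the same relative offset in the index.
    static boolean phraseMatches(List<Token> index, List<Token> query) {
        Map<Integer, Set<String>> idx = positions(index);
        Map<Integer, Set<String>> qry = positions(query);
        for (int anchor : idx.keySet()) {
            int offset = anchor - 1; // align query position 1 with this index position
            boolean all = true;
            for (Map.Entry<Integer, Set<String>> e : qry.entrySet()) {
                Set<String> there = idx.getOrDefault(e.getKey() + offset, Set.of());
                if (!there.containsAll(e.getValue())) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("observed index: " + phraseMatches(INDEX_OBSERVED, QUERY)); // false
        System.out.println("expected index: " + phraseMatches(INDEX_EXPECTED, QUERY)); // true
    }
}
```

With the observed layout, eng1 sits two positions after the bigram in the index but only one position after it in the query, so no alignment works. If C2 were stacked at position 1, the relative offsets would agree and the phrase would match.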
Thanks,
Shawn