Did you read through the CJK article series? Maybe there is something in there? http://discovery-grindstone.blogspot.com/2013/10/cjk-with-solr-for-libraries-part-1.html
Sorry, no help on actual Japanese.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Fri, Apr 18, 2014 at 12:50 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 4/10/2014 11:53 AM, Shawn Heisey wrote:
>> My analysis chain includes CJKBigramFilter on both the index and query.
>> I have outputUnigrams enabled on the index side, but it is disabled on
>> the query side. This has resulted in a problem with phrase queries.
>> This is a subset of my index analysis for the three terms you can see in
>> the ICUNF step, separated by spaces:
>>
>> https://www.dropbox.com/s/9q1x9pdbsjhzocg/bigram-position-problem.png
>>
>> Note that in the CJKBF step, the second unigram is output at position 2,
>> pushing the English terms to 3 and 4.
>>
>> When the customer does a phrase filter query (lucene query parser) for
>> the first two terms on this specific field, it doesn't match, because the
>> query analysis doesn't output the unigrams and therefore the positions
>> don't match.
>>
>> I would have expected both unigrams to be at position 1. Is this a bug
>> or expected behavior?
>
> It's been a week with no reply.
>
> First I worked around this problem by disabling outputUnigrams on the
> index side, to match the query side. At that point, the customer was
> unable to search for a single character and find longer strings
> containing that character. I knew this would happen ... I did tell our
> project manager, but I do not know whether it was communicated to the
> customer.
>
> Then I tried setting outputUnigrams to true on both index and query.
> Just as I had anticipated, the customer was unhappy with getting results
> where a "word" containing only one character of their multi-character
> search string was present.
>
> Re-stating the underlying problem and my question:
>
> The outputUnigrams option sets one of the unigrams from each bigram to
> the same position as the bigram, but then puts the other one at the next
> position, breaking phrase queries. This sounds like a bug. Is it a
> bug? If not, I would REALLY like a config option to produce the
> behavior that I expected.
>
> Thanks,
> Shawn
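
For anyone following along, here is a rough Python sketch of the position mismatch described above. It is only a toy model of the behavior Shawn reports, not Lucene or Solr code: the functions analyze and phrase_matches, the sample text "東京 tower guide", and the character ranges treated as CJK are all made up for illustration, and the phrase check is far simpler than what the lucene query parser actually builds.

```python
import re

# Toy model only: hypothetical helpers, not the Lucene CJKBigramFilter itself.
# Assumed CJK range: hiragana, katakana, and the common Han block.
CJK = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def is_cjk(word):
    """True if every character of the word falls in the assumed CJK range."""
    return all(CJK.match(ch) for ch in word)


def analyze(text, output_unigrams):
    """Return (term, position) pairs for whitespace-separated words.

    Models the behavior reported in the thread: with unigrams enabled, the
    first unigram shares the bigram's position and the second unigram takes
    the next position.
    """
    tokens = []
    pos = 0
    for word in text.split():
        if not is_cjk(word) or len(word) == 1:
            pos += 1
            tokens.append((word, pos))
            continue
        if output_unigrams:
            for i in range(len(word)):
                pos += 1
                tokens.append((word[i], pos))
                if i + 1 < len(word):
                    tokens.append((word[i:i + 2], pos))
        else:
            for i in range(len(word) - 1):
                pos += 1
                tokens.append((word[i:i + 2], pos))
    return tokens


def phrase_matches(index_tokens, query_tokens):
    """Simplified exact-phrase check: every query term must appear in the
    index at the same relative position."""
    index_set = set(index_tokens)
    first_term, first_pos = query_tokens[0]
    for term, pos in index_tokens:
        if term != first_term:
            continue
        shift = pos - first_pos
        if all((t, p + shift) in index_set for t, p in query_tokens):
            return True
    return False


index_side = analyze("東京 tower guide", output_unigrams=True)
query_side = analyze("東京 tower", output_unigrams=False)
print(index_side)
print(query_side)
print(phrase_matches(index_side, query_side))
```

In this model the index side puts "tower" at position 3 while the query side expects it directly after the bigram, so the phrase check fails. Using the same outputUnigrams setting on both sides makes the positions line up again, at the cost of the side effects Shawn describes.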