Speaking from experience: if you are using bigrams for CJK, do not highlight.
The results will look very wrong to someone who knows the language.
Even with a dictionary-based tokenizer, you'll need a client dictionary for
local terms.
wunder
On Jan 2, 2013, at 10:51 AM, Tom Burton-West wrote:
Hello all,
What are the best practices for setting up the highlighter to work with CJK?
We are using the ICUTokenizer with the CJKBigramFilter, so overlapping
bigrams are what are actually being searched. However, the highlighter seems
to only highlight the first of any two overlapping bigrams. i
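
To make the overlap problem concrete, here is a minimal standalone sketch in
plain Python. It is not Solr or Lucene code and does not reproduce what the
ICUTokenizer or CJKBigramFilter actually emit. It assumes the text is a plain
run of Han characters and that every adjacent pair of characters becomes one
bigram. The function names (bigrams_with_offsets, merge_spans, highlight) are
hypothetical and chosen only for this illustration. The idea shown is that the
offsets of adjacent matching bigrams overlap, so they have to be merged into
one span before the markup is inserted.

```python
def bigrams_with_offsets(text):
    """Return (term, start, end) for each overlapping character bigram."""
    return [(text[i:i + 2], i, i + 2) for i in range(len(text) - 1)]


def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans into maximal runs."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every merged run of matching bigrams in pre/post markers."""
    query_terms = {term for term, _, _ in bigrams_with_offsets(query)}
    spans = [(start, end)
             for term, start, end in bigrams_with_offsets(text)
             if term in query_terms]
    out, pos = [], 0
    for start, end in merge_spans(spans):
        out.append(text[pos:start])
        out.append(pre + text[start:end] + post)
        pos = end
    out.append(text[pos:])
    return "".join(out)


# The query 北京天 yields the bigrams 北京 and 京天, which overlap at 京.
print(highlight("我爱北京天安门", "北京天"))  # 我爱<em>北京天</em>安门
```

Without the merge step, wrapping each matching bigram separately would either
nest the markers or, if the second overlapping match is skipped, leave the
last character of the run unhighlighted.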