Speaking from experience: if you are using bigrams for CJK, do not highlight. 
The results will look very wrong to someone who knows the language.
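To make the bigram mechanics in the quoted question concrete, here is a minimal sketch. It is not Lucene's CJKBigramFilter or any Solr highlighter API. It assumes a simplified tokenizer that emits overlapping character bigrams with start/end offsets, so 大亚湾 becomes 大亚 [0,2) and 亚湾 [1,3). It then merges overlapping offset ranges so that both bigrams yield one highlight span. The class and method names (`BigramHighlightSketch`, `bigrams`, `mergeSpans`, `highlight`) are hypothetical. Even merged spans are only character-level matches, though, and can cover strings that are not real words. That is the concern raised above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene/Solr code.
public class BigramHighlightSketch {

    // A token with its start (inclusive) and end (exclusive) offsets into the text.
    record Token(String term, int start, int end) {}

    // Emit overlapping character bigrams, e.g. "ABC" -> AB[0,2), BC[1,3).
    // Simplification: treats every char as CJK and ignores surrogate pairs.
    static List<Token> bigrams(String text) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            out.add(new Token(text.substring(i, i + 2), i, i + 2));
        }
        return out;
    }

    // Merge overlapping or touching offset ranges, so AB[0,2) and BC[1,3)
    // become a single span [0,3). Assumes tokens are sorted by start offset.
    static List<int[]> mergeSpans(List<Token> matched) {
        List<int[]> spans = new ArrayList<>();
        for (Token t : matched) {
            if (!spans.isEmpty() && t.start() <= spans.get(spans.size() - 1)[1]) {
                int[] last = spans.get(spans.size() - 1);
                last[1] = Math.max(last[1], t.end());
            } else {
                spans.add(new int[] {t.start(), t.end()});
            }
        }
        return spans;
    }

    // Wrap each merged span in <em>...</em>.
    static String highlight(String text, List<int[]> spans) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] s : spans) {
            sb.append(text, pos, s[0]).append("<em>").append(text, s[0], s[1]).append("</em>");
            pos = s[1];
        }
        sb.append(text.substring(pos));
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "大亚湾";
        List<Token> tokens = bigrams(text);
        tokens.forEach(System.out::println);
        System.out.println(highlight(text, mergeSpans(tokens)));
    }
}
```

Highlighting each bigram separately would mark 大亚 and 亚湾 as overlapping fragments. Merging by offsets gives one `<em>大亚湾</em>` span instead.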

Even with a dictionary-based tokenizer, you'll need a client dictionary for 
local terms.

wunder

On Jan 2, 2013, at 10:51 AM, Tom Burton-West wrote:

> Hello all,
> 
> What are the best practices for setting up the highlighter to work with CJK?
> We are using the ICUTokenizer with the CJKBigramFilter, so overlapping
> bigrams are what are actually being searched. However, the highlighter seems
> to highlight only the first of any two overlapping bigrams, i.e. ABC is
> searched as AB BC, but only AB gets highlighted even if the matching string is
> ABC. (Where ABC are Chinese characters such as 大亚湾 => searched as 大亚 亚湾,
> but only 大亚 is highlighted rather than 大亚湾.)
> 
> Is there some highlighting parameter that might fix this?
> 
> Tom Burton-West