[frameworks-baloo] [Bug 425020] baloosearch's search support for Chinese, Japanese and Korean is very weak.

bugzilla_noreply Thu, 06 Aug 2020 03:27:04 -0700

https://bugs.kde.org/show_bug.cgi?id=425020


--- Comment #17 from 2wxsy5823...@opayq.com ---
> ICU BreakIterators can be used to locate the following kinds of text 
> boundaries:
> 1. Character Boundary
> 2. Word Boundary
> 3. Line-break Boundary
> 4. Sentence Boundary

For Chinese and Japanese, I believe "Character Boundary" is applicable but
"Word Boundary" is not.
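A minimal sketch of the difference, assuming Python string iteration as a stand-in for a character BreakIterator. Iterating a str yields code points. For CJK ideographs and kana these are the same as user-perceived characters, but ICU also handles combining sequences and emoji, which this sketch does not. The helper names are hypothetical and are not part of ICU or Baloo.

```python
# Hypothetical helpers contrasting space-delimited splitting with
# per-character segmentation (code-point iteration approximates
# ICU's character boundaries only for plain CJK text).

def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace, as a space-delimited word splitter would."""
    return text.split()

def character_tokens(text: str) -> list[str]:
    """Emit every non-whitespace code point as its own token."""
    return [ch for ch in text if not ch.isspace()]

english = "full text search"
chinese = "全文搜索"  # "full-text search", written without spaces

print(whitespace_tokens(english))  # ['full', 'text', 'search']
print(whitespace_tokens(chinese))  # ['全文搜索']: a single token
print(character_tokens(chinese))   # ['全', '文', '搜', '索']
```

With whitespace splitting, an exact-token lookup for "搜索" finds nothing in the Chinese string. Per-character tokens at least make each character individually matchable.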

Since these two languages do not use spaces to separate words [1], I believe
word segmentation [2] is difficult without a dictionary or a statistical (AI)
model.
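To show what "using a dictionary" means in the simplest case, here is a sketch of forward maximum matching, a classic greedy dictionary-based segmenter. It is not how ICU or Baloo segments text. TOY_DICTIONARY is a hypothetical five-word list, and a real segmenter would need a large lexicon and a way to resolve ambiguity.

```python
# Forward maximum matching: at each position, take the longest word found
# in the dictionary; fall back to a single character if none matches.
# TOY_DICTIONARY is a hypothetical word list for illustration only.

TOY_DICTIONARY = {"全文", "搜索", "支持", "中文", "日文"}

def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens

print(forward_max_match("支持中文全文搜索", TOY_DICTIONARY))
# ['支持', '中文', '全文', '搜索']
```

Greedy matching only works as well as its word list. Unknown words degrade to single characters, and ambiguous strings can be split wrongly, which is why practical segmenters rely on large dictionaries or trained models.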

Links:
[1] https://en.wikipedia.org/wiki/Word_divider
[2] https://en.wikipedia.org/wiki/Text_segmentation#Word_segmentation
