pshinjo added a comment.

  First of all, thanks for the work. A few things come to mind, especially 
regarding classical Hangul and half-completed Hangul characters.
  
  1. U+AC00 .. U+D7AF (Hangul Syllables) - no problem: a single QChar maps to 
one Hangul character
  2. U+3130 .. U+318F (Hangul Compatibility Jamo) - same as the Hangul 
Syllables block (a single QChar maps to one Hangul character), as characters 
in this range are non-combining
  3. Hangul Jamo, Hangul Jamo Extended-A and Extended-B (U+1100 .. U+11FF, 
U+A960 .. U+A97F, U+D7B0 .. U+D7FF) - here is the tricky part, as what users 
see as a single "Hangul character" is not always a single "QChar".
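
  As a rough illustration of the three cases above, here is a minimal Python 
sketch that classifies a code point by block. The names (hangul_kind and the 
range tables) are hypothetical and not part of Calligra or Qt. All of these 
ranges are in the BMP, so one Python code point corresponds to one QChar here.

```python
# Hypothetical helper for illustration; the ranges are the ones listed above.
PRECOMPOSED = [(0xAC00, 0xD7AF)]                 # Hangul Syllables
COMPATIBILITY = [(0x3130, 0x318F)]               # Hangul Compatibility Jamo
CONJOINING = [(0x1100, 0x11FF),                  # Hangul Jamo
              (0xA960, 0xA97F),                  # Hangul Jamo Extended-A
              (0xD7B0, 0xD7FF)]                  # Hangul Jamo Extended-B


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def hangul_kind(ch):
    """Return 'syllable', 'compatibility', 'conjoining', or None for non-Hangul."""
    cp = ord(ch)
    if _in_ranges(cp, PRECOMPOSED):
        return "syllable"        # case 1: one QChar, one character
    if _in_ranges(cp, COMPATIBILITY):
        return "compatibility"   # case 2: non-combining, one QChar, one character
    if _in_ranges(cp, CONJOINING):
        return "conjoining"      # case 3: may combine with its neighbours
    return None
```

  Only the "conjoining" case needs extra care when counting characters.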
  
  Take '나랏말ᄊᆞ미' as an example. The 'ᄊᆞ' part may be displayed as a single 
character if the rendering font combines U+110A and U+119E. This and other 
classical Hangul characters cannot be "normalized" into a single Unicode code 
point/QChar, and neither can half-completed characters (cho+jong, jung+jong). 
If the underlying font does not combine the two (e.g. the font does not 
support classical Hangul), users will see two separate characters; otherwise 
they will see a single character. If we can get the font information here, 
the statistics could follow how the font renders these characters (as two or 
as one). If not, KS X 1026-1 [1] could be used as a guideline for determining 
the boundary of a single character.
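
  To make the boundary question concrete, here is a minimal Python sketch that 
groups conjoining jamo into syllable clusters. It is only an approximation: it 
uses the Hangul syllable rules from Unicode text segmentation (L, V, T, LV, LVT 
types), not KS X 1026-1 itself, and it ignores fonts entirely. The function 
names are hypothetical. It also checks that NFC leaves U+110A U+119E as two 
code points.

```python
import unicodedata


def syllable_type(ch):
    """Hangul syllable type of one code point, or None if it has none."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"    # leading consonant (choseong)
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"    # vowel (jungseong)
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"    # trailing consonant (jongseong)
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None


# For a previous type, the set of following types that stay in the same cluster.
NO_BREAK = {
    "L": {"L", "V", "LV", "LVT"},
    "LV": {"V", "T"},
    "V": {"V", "T"},
    "LVT": {"T"},
    "T": {"T"},
}


def hangul_clusters(text):
    """Split text into clusters, keeping conjoining jamo sequences together."""
    clusters = []
    prev = None
    for ch in text:
        cur = syllable_type(ch)
        if clusters and prev is not None and cur in NO_BREAK[prev]:
            clusters[-1] += ch
        else:
            clusters.append(ch)
        prev = cur
    return clusters


# The example string, written with escapes: U+110A U+119E is the fourth character.
sample = "\uB098\uB78F\uB9D0\u110A\u119E\uBBF8"
clusters = hangul_clusters(sample)   # 6 code points, 5 clusters

# NFC composes a modern pair such as U+1102 U+1161, but not U+110A U+119E.
modern = unicodedata.normalize("NFC", "\u1102\u1161")
classical = unicodedata.normalize("NFC", "\u110A\u119E")
```

  Under these rules a cho+jong pair with no vowel in between (for example 
U+1100 U+11A8) is still split into two clusters, so half-completed characters 
would need a separate decision.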
  
  Have you checked how other word processors handle this issue? We could also 
build some test cases around it.
  
  [1] http://www.unicode.org/L2/L2008/08225-n3422.pdf

REPOSITORY
  R8 Calligra

REVISION DETAIL
  https://phabricator.kde.org/D21553

To: daehyuns, Calligra-Devel-list
Cc: jachin, pshinjo, hein, dcaliste, cochise, vandenoever
