https://bugs.kde.org/show_bug.cgi?id=463848
Bug ID: 463848
Summary: KDE Text Encoding for Korean (applies to KWrite and SubtitleComposer in Flatpaks)
Classification: I don't know
Product: kde
Version: unspecified
Platform: Ubuntu
OS: Linux
Status: REPORTED
Severity: normal
Priority: NOR
Component: general
Assignee: unassigned-b...@kde.org
Reporter: j_j_chiare...@posteo.net
Target Milestone: ---

SUMMARY
The text encoding for Korean is wrong in KDE software such as KWrite and SubtitleComposer. "Save As with Encoding ... EUC-KR" does not actually write EUC-KR; it writes Unified Hangul Code (UHC), also known as Windows-949 or CP 949. "Save As with Encoding ... CP 949" corrupts every single non-ASCII character.

STEPS TO REPRODUCE
1. Create a text file in Unicode (UTF-8), which is the default.
2. Insert Korean Hangul text like `로씨써쑤쪼뢔쌰쎼쓔쬬`.
3a. Save As with Encoding ... EUC-KR
or
3b. Save As with Encoding ... CP 949

OBSERVED RESULT
With EUC-KR, all of the characters `로씨써쑤쪼뢔쌰쎼쓔쬬` are present in the file. (`로씨써쑤쪼` are in EUC-KR; `뢔쌰쎼쓔쬬` are theoretically possible syllables but are *not* in EUC-KR.)
With CP 949, all of the characters `로씨써쑤쪼뢔쌰쎼쓔쬬` become `??????????`. (`로씨써쑤쪼뢔쌰쎼쓔쬬` *are* all in Windows-949/CP 949/UHC.)

EXPECTED RESULT
With EUC-KR, the characters `로씨써쑤쪼뢔쌰쎼쓔쬬` should become `로씨써쑤쪼?????` or `로씨써쑤쪼`, because `로씨써쑤쪼` *are* in EUC-KR but `뢔쌰쎼쓔쬬` are *not*, despite being theoretically possible arrangements of letters into pre-composed blocks.
With CP 949, the characters `로씨써쑤쪼뢔쌰쎼쓔쬬` should all be preserved as `로씨써쑤쪼뢔쌰쎼쓔쬬`, because *all* of them are in CP 949/Windows-949/UHC.

SOFTWARE/OS VERSIONS
Latest Flatpak as of 2022-12-31, running on Linux (Ubuntu)

ADDITIONAL INFORMATION
EUC-KR *does* have `로씨써쑤쪼`, but it does *not* have `뢔쌰쎼쓔쬬` or `낥` or thousands of other theoretical possibilities (it covers only 2,350 of the 11,172 possible modern syllable blocks). In Korean, one types letters to form blocks. `낥` is theoretically possible: one just types `ㄴ` and `ㅏ` and `ㄹ` and `ㅌ`. Then the IME assembles these into the block `낥`, and the computer saves this block as a pre-composed block in Unicode.
However, this syllable `낥` never occurs in any native or borrowed words. It is *not* in EUC-KR. The English/Latin script equivalent is writing "igloo" as "ig" and "loo" in pre-composed blocks. Korean usually uses its own alphabet, but with letters arranged into monospaced blocks by morpho-phonemic syllable.

(Unicode also has combining individual letters, the conjoining jamo. It can store `낥` as a jamo sequence; under canonical decomposition that is three code points: leading `ㄴ` (U+1102), vowel `ㅏ` (U+1161), and the final cluster `ㄾ` (U+11B4). However, Unicode included pre-composed blocks for the sake of round-trip conversion, and no IME has ever moved away from pre-composed blocks. In other words, you will always see the pre-composed blocks in real-life text.)

To correct this deficiency, Microsoft added *all* possible pre-composed Hangul blocks to a new encoding, which it called "Windows-949" or "Code Page 949" or "Unified Hangul Code (UHC)." The cost was sacrificing true ASCII compatibility: this encoding, like Shift JIS and others, can have an ASCII byte (0xxxxxxx) as a sole byte (an ASCII character) or as the trailing byte in a two-byte character. This price was worth it to ensure that a typo character (`낥` instead of `날`) would not be lost.

UTF-8 everywhere is the way to go, of course. Still, many of us need to work with the legacy encodings, especially with smart TVs. (Smart TVs and players only seem to support some form of ISO-8859-# or a variable 1-2-byte encoding.)

KDE's KWrite and SubtitleComposer currently write Windows-949/CP 949/UHC, but the menu option is erroneously titled `EUC-KR`. There is also a menu option for CP 949 that does not work at all. This is confusing.

SUGGESTION
1. Change the behavior of the menu entry that says `EUC-KR` so that it behaves as expected and rejects characters like `낥`.
2. Make the menu entry that says `CP 949` do what the menu entry called `EUC-KR` does right now.
OR ...
1. Change the menu entry that currently and erroneously says `EUC-KR` so that it says `EUC-KR (Windows)` or `CP 949` or `Windows-949` or `UHC`.
2. Remove the broken menu entry that currently and erroneously claims to support CP 949.
3. Forget about true `EUC-KR` support on saving.
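
For reference, here is a minimal Python sketch of the expected results described in this report. It is not KDE code. It uses Python's built-in `cp949` codec as a stand-in reference, and it assumes that a character is in EUC-KR (KS X 1001) exactly when both bytes of its CP 949 encoding fall in 0xA1-0xFE. The helper names (`in_euc_kr`, `expected_euc_kr_save`, `expected_cp949_save`, `has_ascii_trail_byte`) are hypothetical and exist only for illustration.

```python
# Sketch only: models the EXPECTED save results from this report.
# Assumption: KS X 1001 (EUC-KR) characters are the CP 949 characters whose
# two bytes are both in 0xA1-0xFE; UHC-only additions use a lower lead or
# trail byte.

SAMPLE = "로씨써쑤쪼뢔쌰쎼쓔쬬"


def in_euc_kr(ch: str) -> bool:
    """Return True if ch is ASCII or (by the assumption above) in EUC-KR."""
    if ord(ch) < 0x80:
        return True
    try:
        raw = ch.encode("cp949")
    except UnicodeEncodeError:
        return False  # not representable even in CP 949
    return len(raw) == 2 and raw[0] >= 0xA1 and raw[1] >= 0xA1


def expected_euc_kr_save(text: str) -> str:
    """Strict EUC-KR save: characters outside EUC-KR become '?'."""
    return "".join(ch if in_euc_kr(ch) else "?" for ch in text)


def expected_cp949_save(text: str) -> str:
    """CP 949 save: round-trip through the codec, '?' for anything unsupported."""
    return text.encode("cp949", errors="replace").decode("cp949")


def has_ascii_trail_byte(ch: str) -> bool:
    """True if the CP 949 form is two bytes and the second is in the ASCII range."""
    raw = ch.encode("cp949")
    return len(raw) == 2 and raw[1] < 0x80


if __name__ == "__main__":
    print(expected_euc_kr_save(SAMPLE))   # strict EUC-KR view of the sample
    print(expected_cp949_save(SAMPLE))    # CP 949 view of the sample
    print(has_ascii_trail_byte("갂"))     # a UHC-only syllable
```

Under these assumptions, the first function reproduces the `로씨써쑤쪼?????` result expected for a true EUC-KR save, and the second preserves the whole sample, which is what the `CP 949` menu entry should do. The last helper illustrates the ASCII-compatibility cost: some UHC-only syllables carry a trailing byte that, taken alone, looks like an ASCII letter.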