https://bugs.kde.org/show_bug.cgi?id=465305
Bug ID: 465305
Summary: character counts are wrong when text includes emojis; each counted as two (2)
Classification: I don't know
Product: kde
Version: unspecified
Platform: Other
OS: Other
Status: REPORTED
Severity: normal
Priority: NOR
Component: general
Assignee: unassigned-b...@kde.org
Reporter: kdeb...@toeai.com
Target Milestone: ---

SUMMARY
Product/Component unknown; sorry. Observed in kate and konsole, so it probably affects something they both depend on.

STEPS TO REPRODUCE (kate)
1. In kate, enter an emoji, e.g. 😊
2. Move the cursor back and forth between the positions before and after the emoji.
3. Look at the line:column indicator at the bottom.

OBSERVED RESULT (kate)
The column jumps by two for a single character.

EXPECTED RESULT (kate)
The column should increase by only one per character.

STEPS TO REPRODUCE (konsole)
1. In kate, copy and paste the emoji until you have OVER 4000 (e.g. 4001). (Remember that the column number will say 8003 at the end of a line with 4001 emojis.)
2. Select them all and copy to the clipboard.
3. In konsole, run 'python3'. Then type: len("""
4. Press Ctrl+Shift+V (or go to Edit, Paste; or right-click and select Paste).

OBSERVED RESULT (konsole)
Konsole asks whether you want to paste X characters (e.g. 8002) instead of the correct number (e.g. 4001). Answer 'yes', then complete the python expression with: """) and press Enter. Python displays the correct number of characters (e.g. 4001).

EXPECTED RESULT (konsole)
It should count the characters correctly, not double-count them.

SOFTWARE/OS VERSIONS
Kubuntu 22.10
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Kate 22.08.2
Konsole 22.08.2

ADDITIONAL INFORMATION
For casual users, the number of characters may not really matter, but people like me who do programming or work on data projects need correct character counts, without having to wonder where X characters went or where X characters magically came from. If it's a single Unicode code point (e.g. U+1F60A), then it needs to be treated as just one character, regardless of how many bytes it might require in a particular encoding. The whole point of working with text instead of bytes is that you can work with characters without worrying about how things are encoded under the hood.
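
The doubled numbers are consistent with counting UTF-16 code units rather than Unicode code points. Qt's QString stores text as UTF-16 code units, and U+1F60A lies outside the Basic Multilingual Plane, so it needs a surrogate pair (two units). Whether this is the actual cause in kate and konsole is an assumption, not something confirmed in this report. A minimal Python sketch of the two counts follows; the helper names are hypothetical and exist only for illustration:

```python
def count_code_points(text: str) -> int:
    """Number of Unicode code points; this is what Python's len() reports."""
    return len(text)


def count_utf16_code_units(text: str) -> int:
    """Number of 16-bit units needed to encode the text as UTF-16.

    Assumption: this mirrors the count a UTF-16 based string type reports.
    """
    # utf-16-le emits no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


emoji = "\U0001F60A"   # the emoji from the report
pasted = emoji * 4001  # the 4001-emoji line from the konsole steps

print(count_code_points(emoji), count_utf16_code_units(emoji))    # 1 2
print(count_code_points(pasted), count_utf16_code_units(pasted))  # 4001 8002
```

The second line of output matches both figures from the konsole steps: 8002 in the paste prompt and 4001 from Python's len().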