[kde] [Bug 465305] New: character counts are wrong when text includes emojis; each counted as two (2)

bugzilla_noreply Sat, 04 Feb 2023 21:24:39 -0800

https://bugs.kde.org/show_bug.cgi?id=465305


            Bug ID: 465305
           Summary: character counts are wrong when text includes emojis;
                    each counted as two (2)
    Classification: I don't know
           Product: kde
           Version: unspecified
          Platform: Other
                OS: Other
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: unassigned-b...@kde.org
          Reporter: kdeb...@toeai.com
  Target Milestone: ---

SUMMARY
Product/Component unknown; sorry.  Observed in kate and konsole, so probably
affects something they both depend on.

STEPS TO REPRODUCE (kate)
1. In kate, enter an emoji, e.g. 😊
2. Move the cursor back and forth from before to after the emoji.
3. Look at line:column indicator at bottom.

OBSERVED RESULT (kate)
Column jumps by two for a single character.

EXPECTED RESULT (kate)
Column should increase by only one per character.

STEPS TO REPRODUCE (konsole)
1. In kate, copy and paste the emoji until you have OVER 4000 (e.g. 4001). 
(Remember that the column number will say 8003 at the end of a line with 4001
emojis.)
2. Select them all and copy to clipboard.
3. In konsole, run 'python3'.  Then type:
len("""
4. Press Ctrl+Shift+V (or go to Edit, Paste; or right-click and select Paste).

OBSERVED RESULT (konsole)
Konsole asks whether you want to paste X characters (e.g. 8002) instead of
the correct number (e.g. 4001).
Answer 'yes'.  Then complete the python expression with:
""")
and hit enter.
The correct number of characters (e.g. 4001) is displayed.

EXPECTED RESULT (konsole)
It should count the characters correctly, not double-count them.
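
A minimal sketch of the arithmetic above, assuming the doubled figure comes
from counting UTF-16 code units (two per emoji outside the Basic Multilingual
Plane) rather than code points.  That cause is a guess and has not been checked
against the Kate, Konsole, or Qt sources; the helper name utf16_units is made
up for illustration.

```python
# Assumption: the paste dialog's figure is a count of UTF-16 code units.
# This is not verified against KDE/Qt code; it only reproduces the numbers.

def utf16_units(text: str) -> int:
    """Return how many UTF-16 code units are needed to encode text."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2

pasted = "\U0001F60A" * 4001        # 4001 copies of U+1F60A

code_points = len(pasted)           # what python3's len() reports
code_units = utf16_units(pasted)    # what a UTF-16 based count would report

print(code_points, code_units)      # prints: 4001 8002
```

If that assumption holds, the 8002 in the paste dialog and the 4001 from
python3 are both "correct" for different units, and only the second one
matches what is described here as a character.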

SOFTWARE/OS VERSIONS
Kubuntu 22.10
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Kate 22.08.2
Konsole 22.08.2

ADDITIONAL INFORMATION
For casual users, the number of characters may not really matter, but people
like me who do programming or work on data projects need correct character
counts, and should not be left wondering where X characters went or where X
characters magically came from.  If it's a single Unicode code point (e.g.
U+1F60A), then it needs to be treated as just one character, regardless of how
many bytes it might require to encode in a particular encoding.  The whole
point of working with text instead of bytes is that you can work with
characters without worrying about how they are encoded under the hood.
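
To illustrate the point about encodings, here is a hedged Python sketch showing
that U+1F60A is one code point but takes a different number of units in each
encoding, plus a hypothetical helper (code_point_offset, not taken from any KDE
or Qt API) that converts an offset counted in UTF-16 code units into an offset
counted in code points.

```python
# Illustration only: the function names below are hypothetical and are not
# part of Kate, Konsole, or Qt.

def encoded_sizes(text: str) -> dict:
    """Return the size of text in code points and in three encodings."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }

def code_point_offset(text: str, utf16_offset: int) -> int:
    """Convert an offset in UTF-16 code units to an offset in code points.

    An offset that falls inside a surrogate pair is rounded up to the
    position just after that character.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        # Code points above U+FFFF need two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)

print(encoded_sizes("\U0001F60A"))
line = "a\U0001F60Ab"
print(code_point_offset(line, 3))   # offset just after the emoji
```

For the single emoji, the first call reports 1 code point, 4 UTF-8 bytes,
2 UTF-16 units, and 1 UTF-32 unit.  For the line "a😊b", an offset of 3 UTF-16
units maps to 2 code points, which is the kind of column value requested in
this report.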
