Hi all, I'd like to propose the addition of a new object-oriented abstraction for representing full Unicode characters in Java: `UnicodeCharacter`.
This class addresses a fundamental limitation of the current `Character` type, which wraps a single `char` and therefore cannot represent Unicode characters outside the Basic Multilingual Plane (BMP). With the growing importance of supplementary characters (e.g., emoji, historic and minority scripts, rare CJK ideographs), a more complete, object-oriented Unicode abstraction would be beneficial to the JDK.

### Motivation

The `Character` type wraps a single 16-bit `char` unit, so it cannot represent a character that requires a surrogate pair (any code point above U+FFFF). Java developers working with text must often juggle `codePointAt`, `toChars`, and `offsetByCodePoints`, which tends to produce fragile, error-prone code. Furthermore, there is no immutable object type that cleanly encapsulates a single logical Unicode character.

### Proposed Class: UnicodeCharacter

This proposal introduces a final, immutable class that wraps a valid Unicode code point and exposes convenient methods for working with it.

A reference implementation is available here:
https://github.com/pponec/ujorm/blob/master/project-m2/ujo-tools/src/main/java/org/ujorm/tools/common/UnicodeCharacter.java

Highlights:

```java
// API outline only; method bodies are omitted.
public final class UnicodeCharacter implements CharSequence, Comparable<UnicodeCharacter> {

    public static UnicodeCharacter of(final int codePoint);
    public static UnicodeCharacter of(final CharSequence text, final int unicodeIndex);

    public int codePoint();
    public char[] toChars();
    public int charCount();
    public boolean equals(char c);

    @Override public String toString();
    @Override public int length();
    @Override public char charAt(int index);
    @Override public CharSequence subSequence(int start, int end);
}
```

### Benefits

- Proper support for the full Unicode range, including supplementary characters.
- Immutable and type-safe object model.
- Simpler and safer text iteration and processing.
- Aligns well with modern Java idioms, e.g. a `Stream<UnicodeCharacter>` from a `String`.
- Object-oriented alternative to repeated calls to `Character.toChars(...)`, `codePointAt(...)`, etc.

### Compatibility

The proposed class is entirely new and does not break any existing APIs. It complements existing types and uses only standard Java APIs. It can be introduced in the `java.lang` or `java.text` package without VM-level changes.

### Adoption

This type can be used by libraries, UI frameworks, editors, and any text-processing tools where proper Unicode character semantics are critical. It promotes correctness in multilingual and emoji-rich applications.

Please let me know if there's interest. I'm happy to develop this idea further into a JEP if the community agrees it's worth exploring.

Best regards,
Pavel Ponec
ppo...@gmail.com
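
P.S. For anyone who wants to see the limitation described under Motivation concretely, here is a minimal sketch that uses only existing JDK APIs. The class name `SupplementaryDemo` and the helper `codePointStrings` are made up for this illustration; they are not part of the reference implementation, and the helper is only a rough stand-in for what a per-character object type might offer.

```java
import java.util.List;
import java.util.stream.Collectors;

class SupplementaryDemo {

    /** Hypothetical helper: one String per code point, built from standard JDK calls. */
    static List<String> codePointStrings(String text) {
        return text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (encoded as a surrogate pair), 'b'
        String text = "a\uD83D\uDE00b";

        // char-based view: the supplementary character counts as two units
        System.out.println(text.length());                              // 4
        // code-point view: three logical characters
        System.out.println(text.codePointCount(0, text.length()));      // 3
        // charAt(1) returns only half of the character
        System.out.println(Character.isHighSurrogate(text.charAt(1)));  // true
        // per-code-point iteration keeps the pair together
        System.out.println(codePointStrings(text).size());              // 3
    }
}
```

The helper returns plain `String` values, so a caller cannot tell from the type that each element holds exactly one code point; that missing guarantee is the gap a dedicated type is meant to close.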