Ken Hu created TINKERPOP-3240:
---------------------------------
Summary: Relax String escaping rules
Key: TINKERPOP-3240
URL: https://issues.apache.org/jira/browse/TINKERPOP-3240
Project: TinkerPop
Issue Type: Improvement
Components: driver
Affects Versions: 4.0.0
Reporter: Ken Hu
{{GremlinLang.argAsString()}} uses {{StringEscapeUtils.escapeJava()}} from
Apache Commons Text to escape string and character values when serializing
parameters to gremlin-lang string literals. This method escapes all non-ASCII
characters to \uXXXX form, which is more aggressive than necessary for Gremlin
string literals.
For example, the string "café" gets serialized as "caf\u00E9" rather than
preserving the literal é character. {{escapeJava()}} was designed to produce
ASCII-safe Java source code, a concern rooted in legacy encoding issues that
are largely irrelevant in a modern UTF-8 world. This behavior is inconsistent
with how other languages (Python, JavaScript, Groovy, Go) treat printable
Unicode in string literals, where characters like é, ñ, or ü are preserved
as-is and only control characters are escaped.
For gremlin-lang serialization, the escaping only needs to target characters
that cannot appear literally inside a string literal: backslashes, quotes, and
control characters (e.g. \n, \t, \r, the NUL character, etc.). Printable
Unicode should pass through unchanged. We should consider replacing
{{escapeJava()}} with a lighter escaping approach that matches this expectation.
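As a rough illustration of what such a lighter approach might look like, here
is a minimal sketch. The class name {{GremlinStringEscaper}} and its
{{escape()}} method are hypothetical and not part of TinkerPop. The sketch also
assumes that the gremlin-lang grammar accepts Java-style short escapes (\b, \f,
\') and \uXXXX escapes for the remaining control characters, which would need
to be confirmed against the grammar.

```java
// Hypothetical helper, not an existing TinkerPop class.
final class GremlinStringEscaper {
    private GremlinStringEscaper() {}

    /**
     * Escapes backslashes, quotes, and control characters only.
     * Printable characters, including non-ASCII ones, are copied as-is.
     */
    static String escape(final String input) {
        final StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            final char c = input.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\'': sb.append("\\'");  break;
                case '\n': sb.append("\\n");  break;
                case '\t': sb.append("\\t");  break;
                case '\r': sb.append("\\r");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                default:
                    if (Character.isISOControl(c)) {
                        // Remaining control characters (NUL, DEL, C1 range)
                        // become a four-digit hex unicode escape.
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        // Assumption: everything else is safe to emit literally.
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With this sketch, "café" is returned unchanged, while a string containing a
newline or a double quote has only those characters escaped. Whether single
quotes need escaping depends on which quote style the serializer emits, so that
case may be unnecessary in practice.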
This issue should also be investigated across all the GLVs to ensure that each
one produces consistent GremlinLang output for the same Traversal.