Ken Hu created TINKERPOP-3240:
---------------------------------

             Summary: Relax String escaping rules
                 Key: TINKERPOP-3240
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3240
             Project: TinkerPop
          Issue Type: Improvement
          Components: driver
    Affects Versions: 4.0.0
            Reporter: Ken Hu


{{GremlinLang.argAsString()}} uses {{StringEscapeUtils.escapeJava()}} from 
Apache Commons Text to escape string and character values when serializing 
parameters to gremlin-lang string literals. This method escapes all non-ASCII 
characters to \uXXXX form, which is more aggressive than necessary for Gremlin 
string literals.

For example, the string "café" gets serialized as "caf\u00E9" rather than 
preserving the literal é character. escapeJava() was designed to produce 
ASCII-safe Java source code, a concern rooted in legacy encoding issues that 
are largely irrelevant in a modern UTF-8 world. This behavior is inconsistent 
with how other languages (Python, JavaScript, Groovy, Go) treat printable 
Unicode in string literals, where characters like é, ñ, or ü are preserved 
as-is and only control characters are escaped.

For gremlin-lang serialization, escaping only needs to target characters that 
are not safely printable: backslashes, quotes, and control characters (e.g. \n, 
\t, \r, NUL). Printable Unicode should pass through unchanged. We should 
consider replacing escapeJava() with a lighter escaping approach that matches 
this expectation.
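A minimal sketch of such a lighter escaper follows, written in Java because 
{{GremlinLang}} lives in the Java driver. The class and method names 
({{LightEscaper.escape()}}) are hypothetical and are not existing TinkerPop API. 
The sketch assumes the literal is wrapped in double quotes, so only the double 
quote is escaped among quote characters. It uses short escapes for common 
control characters and falls back to \uXXXX only for the remaining ones.

```java
/**
 * Hypothetical sketch of a lighter escaper for gremlin-lang string literals.
 * Assumes the literal is wrapped in double quotes. Only backslash, double
 * quote and control characters are escaped; printable Unicode passes through.
 */
public final class LightEscaper {

    private LightEscaper() {}

    public static String escape(final String s) {
        final StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            switch (c) {
                // Characters that would otherwise terminate or corrupt the literal.
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                // Short escapes for the common control characters.
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                case '\r': sb.append("\\r"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                default:
                    if (Character.isISOControl(c)) {
                        // Any other control character (C0, DEL, C1) gets a
                        // four-digit uppercase hex unicode escape.
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        // Printable characters, including non-ASCII ones, are kept as-is.
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(escape("café"));         // prints café
        System.out.println(escape("line1\nline2")); // prints line1\nline2
    }
}
```

Because the loop works on UTF-16 chars and surrogate halves are not ISO control 
characters, supplementary characters such as emoji also pass through unchanged.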

This issue should also be investigated across all the GLVs to ensure that each 
one produces consistent GremlinLang for a given Traversal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
