On 07/01/2025 10:55, r...@apache.org wrote:
This is an automated email from the ASF dual-hosted git repository.

remm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tomcat.git


The following commit(s) were added to refs/heads/main by this push:
      new 2bdb19a504 BZ69521: Allow more non latin languages in EL
2bdb19a504 is described below

commit 2bdb19a504b2871c8cc2e6832fff4c0653bec179
Author: remm <r...@apache.org>
AuthorDate: Tue Jan 7 11:54:32 2025 +0100

     BZ69521: Allow more non latin languages in EL

-1 (veto)

The allowed ranges must only include those characters that are permitted in a Java identifier.

For example, this change permits the use of \uff00, which is not valid in a Java identifier.

I have thrown some code together that generates the valid ranges. I've only looked at Character.isJavaIdentifierPart() but it would be easy to extend to cover start as well.
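
A minimal sketch of what such range generation could look like (this is not the code referred to above; the class and method names are hypothetical). It walks 0x0000-0xFFFF, groups consecutive characters accepted by Character.isJavaIdentifierPart() into inclusive ranges, and prints them in the "\uXXXX"-"\uXXXX" form used by the grammar. The output depends on the Unicode version of the JDK running it, and it includes the ignorable characters that isJavaIdentifierPart() accepts.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentifierRanges {

    /**
     * Returns inclusive [start, end] ranges of BMP characters for which
     * Character.isJavaIdentifierPart() is true on the running JDK.
     */
    static List<int[]> partRanges() {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a range"
        for (int c = 0; c <= 0xFFFF; c++) {
            boolean valid = Character.isJavaIdentifierPart((char) c);
            if (valid && start < 0) {
                start = c; // open a new range
            } else if (!valid && start >= 0) {
                ranges.add(new int[] { start, c - 1 }); // close the range
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] { start, 0xFFFF }); // range open at the end
        }
        return ranges;
    }

    /** Formats one range in the quoted-escape style used by the grammar file. */
    static String format(int[] range) {
        return String.format("\"\\u%04x\"-\"\\u%04x\"", range[0], range[1]);
    }

    public static void main(String[] args) {
        for (int[] range : partRanges()) {
            System.out.println(format(range) + ",");
        }
    }
}
```

Swapping in Character.isJavaIdentifierStart() would give the ranges for the start of an identifier.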

Looking at the current parser source, LETTER and DIGIT are only a close(ish) approximation to what Java allows.

The current EL spec uses the same ranges as Tomcat before this commit BUT it also states that the grammar is only intended as a guide.

My reading of the EL spec is that it is clear that IDENTIFIER == Java Identifier.

My current thinking is:
- add the code that does the range generation as a test case so we can use it to generate the ranges for the grammar
- add tests for each character in the range 0x0000-0xFFFF that confirm that Java and Tomcat's EL implementation treat them the same way for both starting an identifier and being part of an identifier.
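
As a rough illustration of the second point, the comparison could be sketched as below. This is an assumption-laden sketch, not Tomcat test code: the class name and the mismatches() helper are hypothetical, and the EL side is stood in for by a predicate covering only the "\u2c60"-"\uffef" range added by the commit. A real test would call into the EL parser instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class IdentifierParityCheck {

    /**
     * Returns every value in 0x0000-0xFFFF on which the candidate predicate
     * and the reference predicate disagree.
     */
    static List<Integer> mismatches(IntPredicate candidate, IntPredicate reference) {
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c <= 0xFFFF; c++) {
            if (candidate.test(c) != reference.test(c)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the EL side: only the single range added by the commit.
        IntPredicate committedRange = c -> c >= 0x2c60 && c <= 0xffef;
        // Reference behaviour: what Java accepts as part of an identifier.
        IntPredicate javaPart = c -> Character.isJavaIdentifierPart((char) c);

        List<Integer> diff = mismatches(committedRange, javaPart);
        System.out.println(diff.size() + " code points differ");
        System.out.println("U+FF00 differs: " + diff.contains(0xFF00));
    }
}
```

The same helper can be run a second time with Character.isJavaIdentifierStart() as the reference to cover the start of an identifier. A passing test would expect an empty list in both cases.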

I am slightly surprised that it has taken this long for this issue to emerge.

     The parser needs to be regenerated but I need to look if the formatting
     changes are intentional.

They are. My thinking was that, with the code generator changing its formatting from time to time, generating and then applying our own standard formatting before committing would make differences easier to spot. It was also easier to apply the formatting to the whole package than to exclude the generated classes.

Mark


---
  java/org/apache/el/parser/ELParser.jjt | 6 +-----
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/java/org/apache/el/parser/ELParser.jjt b/java/org/apache/el/parser/ELParser.jjt
index 1a9cc31dd6..30795f8486 100644
--- a/java/org/apache/el/parser/ELParser.jjt
+++ b/java/org/apache/el/parser/ELParser.jjt
@@ -566,11 +566,7 @@ java.util.Deque<Integer> deque = new java.util.ArrayDeque<Integer>();
          "\u00d8"-"\u00f6",
          "\u00f8"-"\u00ff",
          "\u0100"-"\u1fff",
-        "\u3040"-"\u318f",
-        "\u3300"-"\u337f",
-        "\u3400"-"\u3d2d",
-        "\u4e00"-"\u9fff",
-        "\uf900"-"\ufaff"
+        "\u2c60"-"\uffef"
          ]
      >
  |    < #DIGIT:


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org


