Re: (tomcat) branch main updated: BZ69521: Allow more non latin languages in EL

Mark Thomas Tue, 07 Jan 2025 03:57:07 -0800

On 07/01/2025 10:55, [email protected] wrote:

This is an automated email from the ASF dual-hosted git repository.


remm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tomcat.git


The following commit(s) were added to refs/heads/main by this push:
      new 2bdb19a504 BZ69521: Allow more non latin languages in EL
2bdb19a504 is described below

commit 2bdb19a504b2871c8cc2e6832fff4c0653bec179
Author: remm <[email protected]>
AuthorDate: Tue Jan 7 11:54:32 2025 +0100

     BZ69521: Allow more non latin languages in EL


-1 (veto)

The allowed ranges must only include those characters that are permitted in a Java identifier.

For example, this change permits the use of \uff00, which is not valid for a Java identifier.

I have thrown some code together that generates the valid ranges. I've only looked at Character.isJavaIdentifierPart() but it would be easy to extend to cover start as well.
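
For illustration only, here is a minimal sketch of what a range generator along these lines might look like. It is not the code referred to above; the class name IdentifierRanges and the methods ranges() and toJavaCC() are hypothetical. It walks 0x0000-0xFFFF, collapses the code points accepted by a predicate into inclusive ranges, and prints them in the quoted form used by the LETTER token in the grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Tomcat.
public class IdentifierRanges {

    // Collapse every code point in 0x0000-0xFFFF accepted by the predicate
    // into a list of inclusive {start, end} pairs.
    static List<int[]> ranges(IntPredicate accepted) {
        List<int[]> result = new ArrayList<>();
        int start = -1;
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (accepted.test(cp)) {
                if (start < 0) {
                    start = cp;
                }
            } else if (start >= 0) {
                result.add(new int[] { start, cp - 1 });
                start = -1;
            }
        }
        if (start >= 0) {
            // Close a range that runs up to the end of the scanned interval.
            result.add(new int[] { start, 0xFFFF });
        }
        return result;
    }

    // Format one range as a quoted, four-digit-hex grammar entry.
    static String toJavaCC(int[] range) {
        if (range[0] == range[1]) {
            return String.format("\"\\u%04x\"", range[0]);
        }
        return String.format("\"\\u%04x\"-\"\\u%04x\"", range[0], range[1]);
    }

    public static void main(String[] args) {
        // Swap in Character::isJavaIdentifierStart to cover the start case.
        for (int[] range : ranges(Character::isJavaIdentifierPart)) {
            System.out.println(toJavaCC(range) + ",");
        }
    }
}
```

The output depends on the Unicode version supported by the JDK used to run it, so the generated ranges would need to be regenerated when the minimum Java version changes.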

Looking at the current parser source, LETTER and DIGIT are only a close(ish) approximation to what Java allows.

The current EL spec uses the same ranges as Tomcat before this commit BUT it also states that the grammar is only intended as a guide.

My reading of the EL spec is that it is clear that IDENTIFIER == Java Identifier.


My current thinking is:

- add the code that does the range generation as a test case so we can use it to generate the ranges for the grammar
- add tests for each character in the range 0x0000-0xFFFF that confirm that Java and Tomcat's EL implementation treat them the same way, both for starting an identifier and for being part of an identifier.
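
As a rough sketch of the second point, the check below compares two predicates over 0x0000-0xFFFF and lists the code points that only the first one accepts. It does not call Tomcat's EL parser. Instead it uses, as a stand-in, only the four LETTER ranges visible in the diff hunk after this commit (the token has earlier entries that the hunk does not show, so the stand-in is partial). The class name IdentifierMismatch and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Tomcat.
public class IdentifierMismatch {

    // The ranges shown in the diff hunk after the commit. Assumption: this
    // is only the tail of the LETTER token, not the complete definition.
    static final int[][] VISIBLE_LETTER_RANGES = {
        { 0x00d8, 0x00f6 },
        { 0x00f8, 0x00ff },
        { 0x0100, 0x1fff },
        { 0x2c60, 0xffef },
    };

    static boolean inVisibleLetterRanges(int cp) {
        for (int[] range : VISIBLE_LETTER_RANGES) {
            if (cp >= range[0] && cp <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Code points in 0x0000-0xFFFF accepted by the first predicate but
    // rejected by the second.
    static List<Integer> acceptedOnlyByFirst(IntPredicate first, IntPredicate second) {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (first.test(cp) && !second.test(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> extra = acceptedOnlyByFirst(
                IdentifierMismatch::inVisibleLetterRanges,
                Character::isJavaIdentifierPart);
        System.out.println(extra.size() + " code points accepted by the ranges but not by Java");
        for (int cp : extra.subList(0, Math.min(10, extra.size()))) {
            System.out.println(String.format("U+%04X", cp));
        }
    }
}
```

A real test would replace inVisibleLetterRanges() with a call into the EL implementation, run the comparison in both directions, and repeat it with Character::isJavaIdentifierStart.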

I am slightly surprised that it has taken this long for this issue to emerge.

     The parser needs to be regenerated but I need to look if the formatting
     changes are intentional.

They are. My thinking was that, with the code generator changing the formatting from time to time, if we generate and then apply our own standard formatting before committing, it would be easier to spot differences. It was also easier to just apply it to the whole package and not exclude the generated classes.


Mark

---
  java/org/apache/el/parser/ELParser.jjt | 6 +-----
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/java/org/apache/el/parser/ELParser.jjt b/java/org/apache/el/parser/ELParser.jjt
index 1a9cc31dd6..30795f8486 100644
--- a/java/org/apache/el/parser/ELParser.jjt
+++ b/java/org/apache/el/parser/ELParser.jjt
@@ -566,11 +566,7 @@ java.util.Deque<Integer> deque = new java.util.ArrayDeque<Integer>();
          "\u00d8"-"\u00f6",
          "\u00f8"-"\u00ff",
          "\u0100"-"\u1fff",
-        "\u3040"-"\u318f",
-        "\u3300"-"\u337f",
-        "\u3400"-"\u3d2d",
-        "\u4e00"-"\u9fff",
-        "\uf900"-"\ufaff"
+        "\u2c60"-"\uffef"
          ]
      >
  |    < #DIGIT:


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


