On 07/01/2025 10:55, r...@apache.org wrote:
This is an automated email from the ASF dual-hosted git repository.
remm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tomcat.git
The following commit(s) were added to refs/heads/main by this push:
new 2bdb19a504 BZ69521: Allow more non latin languages in EL
2bdb19a504 is described below
commit 2bdb19a504b2871c8cc2e6832fff4c0653bec179
Author: remm <r...@apache.org>
AuthorDate: Tue Jan 7 11:54:32 2025 +0100
BZ69521: Allow more non latin languages in EL
-1 (veto)
The allowed ranges must only include those characters that are permitted
in a Java identifier.
For example, this change permits the use of \uff00 which is not valid
for a Java identifier.
I have thrown some code together that generates the valid ranges. I've
only looked at Character.isJavaIdentifierPart() but it would be easy to
extend to cover start as well.
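A minimal sketch of what such a range generator could look like (not the code referred to above, which is not included in this message). It walks 0x0000-0xFFFF, groups consecutive characters accepted by a predicate into runs, and prints them in the style of the grammar's character lists. The class and method names are hypothetical. Passing Character::isJavaIdentifierStart instead would cover the start case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, for illustration only.
final class IdentifierRanges {

    /** Returns maximal runs [first, last] of code points in 0x0000-0xFFFF accepted by the predicate. */
    static List<int[]> ranges(IntPredicate accepted) {
        List<int[]> result = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a run"
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (accepted.test(cp)) {
                if (start < 0) {
                    start = cp;
                }
            } else if (start >= 0) {
                result.add(new int[] { start, cp - 1 });
                start = -1;
            }
        }
        if (start >= 0) {
            // The last run extends to the end of the scanned interval.
            result.add(new int[] { start, 0xFFFF });
        }
        return result;
    }

    /** Formats one run as a quoted escape, or as a pair of quoted escapes joined by '-'. */
    static String format(int[] range) {
        if (range[0] == range[1]) {
            return String.format("\"\\u%04x\"", range[0]);
        }
        return String.format("\"\\u%04x\"-\"\\u%04x\"", range[0], range[1]);
    }

    public static void main(String[] args) {
        List<int[]> parts = ranges(Character::isJavaIdentifierPart);
        for (int i = 0; i < parts.size(); i++) {
            String separator = (i < parts.size() - 1) ? "," : "";
            System.out.println(format(parts.get(i)) + separator);
        }
    }
}
```

The output depends on the Unicode version of the JDK running it, so the generated ranges would need to be regenerated if the minimum supported Java version changes.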
Looking at the current parser source, LETTER and DIGIT are only a
close(ish) approximation to what Java allows.
The current EL spec uses the same ranges as Tomcat before this commit
BUT it also states that the grammar is only intended as a guide.
My reading of the EL spec is that it is clear that IDENTIFIER == Java
Identifier.
My current thinking is:
- add the code that does the range generation as a test case so we can
use it to generate the ranges for the grammar
- add tests for each character in the range 0x0000-0xFFFF that confirm
that Java and Tomcat's EL implementation treat them the same way, both
for starting an identifier and for being part of an identifier.
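As a rough illustration of the per-character comparison, the sketch below checks a set of grammar ranges against Character.isJavaIdentifierPart() and lists the characters the ranges accept but Java rejects. It is an assumption-laden stand-in: it does not call Tomcat's EL parser, the class and method names are hypothetical, and the table holds only the ranges visible in the diff hunk after the commit, so only that one direction of mismatch is meaningful here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical check, for illustration only. A real test would ask the EL
// parser itself whether each character is accepted.
final class GrammarRangeCheck {

    // Only the ranges visible in the diff hunk after the commit. The full
    // LETTER definition in the grammar has further entries not shown there.
    static final int[][] VISIBLE_LETTER_RANGES = {
        { 0x00d8, 0x00f6 },
        { 0x00f8, 0x00ff },
        { 0x0100, 0x1fff },
        { 0x2c60, 0xffef },
    };

    static boolean inRanges(int cp, int[][] ranges) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    /** Code points in 0x0000-0xFFFF that the ranges accept but the Java rule rejects. */
    static List<Integer> acceptedButRejectedByJava(int[][] ranges, IntPredicate javaRule) {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0x0000; cp <= 0xFFFF; cp++) {
            if (inRanges(cp, ranges) && !javaRule.test(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> bad = acceptedButRejectedByJava(
                VISIBLE_LETTER_RANGES, Character::isJavaIdentifierPart);
        System.out.println(bad.size() + " code points accepted by the ranges but not by Java");
        // Print a small sample rather than the whole list.
        for (int cp : bad.subList(0, Math.min(10, bad.size()))) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

The same loop run with Character::isJavaIdentifierStart would cover the identifier-start case.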
I am slightly surprised that it has taken this long for this issue to
emerge.
The parser needs to be regenerated but I need to look if the formatting
changes are intentional.
They are. My thinking was with the code generator changing the
formatting from time to time, if we generate and then apply our own
standard formatting before committing, it would be easier to spot
differences. It was also easier to just apply it to the whole package
and not exclude the generated classes.
Mark
---
java/org/apache/el/parser/ELParser.jjt | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/java/org/apache/el/parser/ELParser.jjt b/java/org/apache/el/parser/ELParser.jjt
index 1a9cc31dd6..30795f8486 100644
--- a/java/org/apache/el/parser/ELParser.jjt
+++ b/java/org/apache/el/parser/ELParser.jjt
@@ -566,11 +566,7 @@ java.util.Deque<Integer> deque = new java.util.ArrayDeque<Integer>();
"\u00d8"-"\u00f6",
"\u00f8"-"\u00ff",
"\u0100"-"\u1fff",
- "\u3040"-"\u318f",
- "\u3300"-"\u337f",
- "\u3400"-"\u3d2d",
- "\u4e00"-"\u9fff",
- "\uf900"-"\ufaff"
+ "\u2c60"-"\uffef"
]
>
| < #DIGIT:
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org