On Tue, Jan 7, 2025 at 12:57 PM Mark Thomas <ma...@apache.org> wrote:
>
> On 07/01/2025 10:55, r...@apache.org wrote:
> > This is an automated email from the ASF dual-hosted git repository.
> >
> > remm pushed a commit to branch main
> > in repository https://gitbox.apache.org/repos/asf/tomcat.git
> >
> >
> > The following commit(s) were added to refs/heads/main by this push:
> >       new 2bdb19a504 BZ69521: Allow more non latin languages in EL
> > 2bdb19a504 is described below
> >
> > commit 2bdb19a504b2871c8cc2e6832fff4c0653bec179
> > Author: remm <r...@apache.org>
> > AuthorDate: Tue Jan 7 11:54:32 2025 +0100
> >
> >      BZ69521: Allow more non latin languages in EL
>
> -1 (veto)
>
> The allowed ranges must only include those characters that are permitted
> in a Java identifier.
>
> For example, this change permits the use of \uff00 which is not valid
> for a Java identifier.
>
> I have thrown some code together that generates the valid ranges. I've
> only looked at Character.isJavaIdentifierPart() but it would be easy to
> extend to cover start as well.
>
> Looking at the current parser source, LETTER and DIGIT are only a
> close(ish) approximation to what Java allows.
>
> The current EL spec uses the same ranges as Tomcat before this commit
> BUT it also states that the grammar is only intended as a guide.
>
> My reading of the EL spec is that it is clear that IDENTIFIER == Java
> Identifier.
>
> My current thinking is:
> - add the code that does the range generation as a test case so we can
>    use it to generate the ranges for the grammar
> - add tests for each character in the range 0x0000-0xFFFF that confirm
>    that Java and Tomcat's EL implementation treat them the same way
>    for both starting an identifier and being part of an identifier.
>
> I am slightly surprised that it has taken this long for this issue to
> emerge.

Yes, only Chinese and Japanese were allowed. I'm not sure why it is
important to validate the identifiers so much though? UnicodeBlock has
the full list (in particular blockStarts) but it's not really possible
to query it directly.

> > The parser needs to be regenerated but I need to look if the formatting
> > changes are intentional.
>
> They are. My thinking was with the code generator changing the
> formatting from time to time, if we generate and then apply our own
> standard formatting before committing, it would be easier to spot
> differences. It was also easier to just apply it to the whole package
> and not exclude the generated classes.

Ok, that's what I thought.

Rémy

> Mark
>
>
> > ---
> >   java/org/apache/el/parser/ELParser.jjt | 6 +-----
> >   1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/java/org/apache/el/parser/ELParser.jjt
> > b/java/org/apache/el/parser/ELParser.jjt
> > index 1a9cc31dd6..30795f8486 100644
> > --- a/java/org/apache/el/parser/ELParser.jjt
> > +++ b/java/org/apache/el/parser/ELParser.jjt
> > @@ -566,11 +566,7 @@ java.util.Deque<Integer> deque = new
> > java.util.ArrayDeque<Integer>();
> >           "\u00d8"-"\u00f6",
> >           "\u00f8"-"\u00ff",
> >           "\u0100"-"\u1fff",
> > -        "\u3040"-"\u318f",
> > -        "\u3300"-"\u337f",
> > -        "\u3400"-"\u3d2d",
> > -        "\u4e00"-"\u9fff",
> > -        "\uf900"-"\ufaff"
> > +        "\u2c60"-"\uffef"
> >           ]
> >       >
> >   |   < #DIGIT:
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: dev-h...@tomcat.apache.org
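
For illustration, the range generation described in the quoted veto could be sketched roughly as below. This is not the code Mark refers to, which is not included in this thread. The class name IdentifierRanges and its two helper methods are hypothetical; the only library call relied on is Character.isJavaIdentifierPart(int). The exact output depends on the Unicode version of the JDK in use, and it includes the "ignorable" control characters that Java also accepts as identifier parts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tomcat: lists the contiguous ranges of
// BMP characters that Java accepts as part of an identifier.
public class IdentifierRanges {

    // Returns inclusive {start, end} pairs covering every character in
    // 0x0000-0xFFFF for which Character.isJavaIdentifierPart() is true.
    static List<int[]> identifierPartRanges() {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a valid range"
        for (int c = 0x0000; c <= 0xFFFF; c++) {
            boolean valid = Character.isJavaIdentifierPart(c);
            if (valid && start < 0) {
                start = c;                              // a range opens here
            } else if (!valid && start >= 0) {
                ranges.add(new int[] { start, c - 1 }); // the range closed at c - 1
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] { start, 0xFFFF });    // range runs to the end
        }
        return ranges;
    }

    // Formats one range in the quoted-escape style used by the grammar file.
    static String format(int[] range) {
        if (range[0] == range[1]) {
            return String.format("\"\\u%04x\"", range[0]);
        }
        return String.format("\"\\u%04x\"-\"\\u%04x\"", range[0], range[1]);
    }

    public static void main(String[] args) {
        for (int[] range : identifierPartRanges()) {
            System.out.println(format(range) + ",");
        }
        // The character cited in the veto as wrongly permitted by the commit.
        System.out.println("0xFF00 valid: " + Character.isJavaIdentifierPart(0xFF00));
    }
}
```

Swapping in Character.isJavaIdentifierStart(int) would give the corresponding ranges for the first character of an identifier.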