Re: (tomcat) branch main updated: BZ69521: Allow more non latin languages in EL

Rémy Maucherat Tue, 07 Jan 2025 06:19:25 -0800

On Tue, Jan 7, 2025 at 12:57 PM Mark Thomas <ma...@apache.org> wrote:
>
> On 07/01/2025 10:55, r...@apache.org wrote:
> > This is an automated email from the ASF dual-hosted git repository.
> >
> > remm pushed a commit to branch main
> > in repository https://gitbox.apache.org/repos/asf/tomcat.git
> >
> >
> > The following commit(s) were added to refs/heads/main by this push:
> >       new 2bdb19a504 BZ69521: Allow more non latin languages in EL
> > 2bdb19a504 is described below
> >
> > commit 2bdb19a504b2871c8cc2e6832fff4c0653bec179
> > Author: remm <r...@apache.org>
> > AuthorDate: Tue Jan 7 11:54:32 2025 +0100
> >
> >      BZ69521: Allow more non latin languages in EL
>
> -1 (veto)
>
> The allowed ranges must only include those characters that are permitted
> in a Java identifier.
>
> For example, this change permits the use of \uff00 which is not valid
> for a Java identifier.
>
> I have thrown some code together that generates the valid ranges. I've
> only looked at Character.isJavaIdentifierPart() but it would be easy to
> extend to cover start as well.
>
> Looking at the current parser source, LETTER and DIGIT are only a
> close(ish) approximation to what Java allows.
>
> The current EL spec uses the same ranges as Tomcat before this commit
> BUT it also states that the grammar is only intended as a guide.
>
> My reading of the EL spec is that it is clear that IDENTIFIER == Java
> Identifier.
>
> My current thinking is:
> - add the code that does the range generation as a test case so we can
> use it to generate the ranges for the grammar
> - add tests for each character in the range 0x0000-0xFFFF that confirm
> Java and Tomcat's EL implementation treat them the same way, both for
> starting an identifier and for being part of an identifier.
>
> I am slightly surprised that it has taken this long for this issue to
> emerge.
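
For illustration only (this is not the code Mark refers to above), the
range generation he describes could be sketched roughly as follows. It
walks the BMP, groups consecutive characters accepted by
Character.isJavaIdentifierPart() into ranges, and prints them in the
"\uXXXX"-"\uXXXX" form used in ELParser.jjt. The class and method names
are hypothetical, and the output depends on the Unicode version of the
JDK that runs it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tomcat.
final class IdentifierRanges {

    /** Collects contiguous BMP ranges {start, end} accepted by isJavaIdentifierPart. */
    static List<int[]> partRanges() {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a range"
        for (int c = 0; c <= 0xFFFF; c++) {
            boolean ok = Character.isJavaIdentifierPart((char) c);
            if (ok && start < 0) {
                start = c;
            } else if (!ok && start >= 0) {
                ranges.add(new int[] { start, c - 1 });
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] { start, 0xFFFF });
        }
        return ranges;
    }

    /** Formats one range in the style of the character lists in the grammar. */
    static String format(int[] range) {
        String first = String.format("\"\\u%04x\"", range[0]);
        if (range[0] == range[1]) {
            return first;
        }
        return first + "-" + String.format("\"\\u%04x\"", range[1]);
    }

    public static void main(String[] args) {
        for (int[] range : partRanges()) {
            System.out.println(format(range) + ",");
        }
        // The example from the veto: 0xFF00 is inside the committed
        // 0x2c60-0xffef range but is not a valid identifier part.
        System.out.println(Character.isJavaIdentifierPart((char) 0xFF00));
    }
}
```

Swapping in Character.isJavaIdentifierStart() would give the ranges for
the first character of an identifier in the same way.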


Yes, only Chinese and Japanese were allowed. I'm not sure why it is
so important to validate the identifiers, though.
UnicodeBlock has the full list (in particular blockStarts), but it's
not really possible to query it directly.
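
One possible workaround, sketched here under the assumption that going
through the public Character.UnicodeBlock.of(int) lookup is acceptable,
is to scan a code point range and record where each block first
appears. The class and method names are hypothetical, and the blocks
reported depend on the JDK's Unicode version.

```java
import java.lang.Character.UnicodeBlock;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tomcat.
final class BlockScan {

    /** Maps each block seen in [from, to] to the first code point found in it. */
    static Map<UnicodeBlock, Integer> blocksIn(int from, int to) {
        Map<UnicodeBlock, Integer> firstSeen = new LinkedHashMap<>();
        for (int cp = from; cp <= to; cp++) {
            // of() returns null for code points outside any defined block
            UnicodeBlock block = UnicodeBlock.of(cp);
            if (block != null) {
                firstSeen.putIfAbsent(block, cp);
            }
        }
        return firstSeen;
    }

    public static void main(String[] args) {
        // Blocks touched by the range added in the commit under discussion
        for (Map.Entry<UnicodeBlock, Integer> e : blocksIn(0x2c60, 0xffef).entrySet()) {
            System.out.println(String.format("%04x %s", e.getValue(), e.getKey()));
        }
    }
}
```

This only recovers block boundaries; it says nothing about which
characters inside a block are valid in an identifier.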

> >      The parser needs to be regenerated but I need to look if the formatting
> >      changes are intentional.
>
> They are. My thinking was with the code generator changing the
> formatting from time to time, if we generate and then apply our own
> standard formatting before committing, it would be easier to spot
> differences. It was also easier to just apply it to the whole package
> and not exclude the generated classes.

Ok, that's what I thought.

Rémy

> Mark
>
>
> > ---
> >   java/org/apache/el/parser/ELParser.jjt | 6 +-----
> >   1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/java/org/apache/el/parser/ELParser.jjt b/java/org/apache/el/parser/ELParser.jjt
> > index 1a9cc31dd6..30795f8486 100644
> > --- a/java/org/apache/el/parser/ELParser.jjt
> > +++ b/java/org/apache/el/parser/ELParser.jjt
> > @@ -566,11 +566,7 @@ java.util.Deque<Integer> deque = new java.util.ArrayDeque<Integer>();
> >           "\u00d8"-"\u00f6",
> >           "\u00f8"-"\u00ff",
> >           "\u0100"-"\u1fff",
> > -        "\u3040"-"\u318f",
> > -        "\u3300"-"\u337f",
> > -        "\u3400"-"\u3d2d",
> > -        "\u4e00"-"\u9fff",
> > -        "\uf900"-"\ufaff"
> > +        "\u2c60"-"\uffef"
> >           ]
> >       >
> >   |    < #DIGIT:
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: dev-h...@tomcat.apache.org
> >
