On 30/08/2022 13:21, Konstantin Kolinko wrote:
вт, 30 авг. 2022 г. в 10:13, Mark Thomas <ma...@apache.org>:
On 29/08/2022 14:06, Christopher Schultz wrote:
Mark,
On 8/29/22 02:39, ma...@apache.org wrote:
<snip/>
public static String get(final String key, final Object... args) {
String value = get(key);
+ // Convert all Number arguments to String else MessageFormat
may try to
+ // format them in unexpected ways.
+ if (args != null) {
+ for (int i = 0; i < args.length; i++) {
+ if (args[i] instanceof Number) {
+ args[i] = args[i].toString();
+ }
+ }
+ }
+
This might represent a big change in behavior, especially with
floating-point numbers. I'm not sure what role MessageFormat plays in
the whole EL ecosystem... is it any part of the spec, or only for like
error messages and things like that?
It is only for error messages and the like.
oss-fuzz found an edge case where MessageFormat would output a number
with hundreds of thousands of digits as an integer rather than using
exponential form.
Any such instances would be application bugs (the issue is in parsing
the EL expression so there is no way for users to trigger this). It
seems unlikely that this would occur in practice.
1. I think we are actually dealing with a JRE bug here.
As such, while only MessageFactory in EL was tested, in theory it
concerns other copies of this class.
2. Personally, as a programmer, I do like when numbers are printed in
a programmer-friendly way, i.e. via toString() like here, but
generally it means losing i18n.
3. For future reference
- Javadoc of MessageFormat. Java 17:
https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html
- messages used by EL code come from
static final ResourceBundle bundle =
ResourceBundle.getBundle("org.apache.el.LocalStrings");
and look like
error.convert=Cannot convert [{0}] of type [{1}] to [{2}]
4. I think that the JRE bug is as follows:
(1) The Javadoc of MessageFormat says (see a table there) that to
format a number you specify the format with FormatType = "number". The
FormatStyle may be omitted.
E.g. "{0, number}" can be used to clearly declare that the argument
has to be formatted as a Number.
(2) It says that for FormatType=none, it creates a Subformat of null.
Looking into the source code (OpenJDK 17.0.4)
parsing the format string is implemented in method #makeFormat(...).
See the branch with "case TYPE_NULL:" there.
It results in assigning the null value to an entry in "formats[]" array.
(3) I suspect that a null format was actually intended to serve as a
"text" format. My concerns:
- The MessageFormat lacks an explicit "text" FormatType.
In comparison, with a java.util.Formatter I can use an explicit %s
pattern to specify textual format.
- There is a comment in the source code of MessageFormat #toPattern():
[[[
if (fmt == null) {
// do nothing, string format
} else if ...
]]]
- I do not see any documentation in MessageFormat javadoc on how the
null Subformat is supposed to be treated.
Nor I found it in the official tutorial.
https://docs.oracle.com/javase/tutorial/i18n/format/messageFormat.html
I guess the behaviour may be "specified" in a hidden way, via TCK.
(4) The actual implementation of formatting (implemented in
MessageFormat#subformat(...)) is different:
[[[
} else if (formats[i] != null) {
subFormatter = formats[i];
if (subFormatter instanceof ChoiceFormat) {
arg = formats[i].format(obj);
if (arg.indexOf('{') >= 0) {
subFormatter = new MessageFormat(arg, locale);
obj = arguments;
arg = null;
}
}
} else if (obj instanceof Number) {
// format number if can
subFormatter = NumberFormat.getInstance(locale);
} else if (obj instanceof Date) {
// format a Date if can
subFormatter = DateFormat.getDateTimeInstance(
DateFormat.SHORT, DateFormat.SHORT, locale);//fix
} else if (obj instanceof String) {
arg = (String) obj;
} else {
arg = obj.toString();
if (arg == null) arg = "null";
}
]]]
I.e. if format[i] is null it does not interpret it as "text", but
tries "to be smart" and does some processing for Numbers.
(5) According to oss-fuzz we now know that such processing is broken.
It assumes that any Number can be sanely formatted by a
"NumberFormat.getInstance(locale);".
5. What to do. I see the following ways for a better workaround of the JRE bug:
(a) It is possible to narrow Mark's patch:
narrowing the type test to look for BigInteger and BigDecimal instead
of a generic Number.
On the fact that
- There are known problems with some of such numbers.
- Those classes know how to print themselves better than a NumberFormat knows.
Pros: It fixes a rare edge case, and doesn't change anything else.
Cons: I think that it breaks formatting if a pattern is using an
explicit FormatType of "number" or of "choice".
(b) It is possible to obtain formats from a MessageFormat. See the
getFormatsByArgumentIndex() method.
I propose
if the format is null then format the value with String.valueOf().
I.e. interpret "{0}" as a text format.
If it causes a regression somewhere, it is possible to update a
specific format pattern to revert to the old behaviour:
just replace {0} with {0,number} for numbers, or with a pair of
{0,date} {0, time} for dates.
I assume that the type of an argument in such patterns is known.
Pros:
- Explicitly controlled behaviour.
Cons:
- If restoring the old behaviour is required in many places,
updating all them may be a notable amount of work.
Other notes:
- The default format for Dates when printed with toString() is"dow mon
dd hh:mm:ss zzz yyyy", as documented in Javadoc
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Date.html#toString()
It may look uglier than the old behaviour, but no data is lost: both
date and time are printed.
- I am not handling the case of nested format patterns, i.e.
the case of ChoiceFormat in the code fragment above.
I think that usage of ChoiceFormat is rare, and that it is always used
with numbers, not with some random values of unknown type.
So I think that we are safe to ignore this case.
(c) It is possible to combine (b) with a type check,
e.g. apply it for Numbers only, but skip Dates.
Is there a worth in such behaviour?
My preference is for (b).
That seems reasonable to me. I'll make the changes.
I think the chances of this being an issue for users is low but having a
way to revert to the previous behaviour is a sensible precaution.
Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org