from:"Archie Cobbs"

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

2024-06-06 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains 12 commits:

 - Bump @since from 23 to 24.
 - Merge branch 'master' into JDK-8322256
 - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
 - Simplify code by eliminating an impossible case.
 - Field name change and Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Clarify exceptions: sometimes ZipException, sometimes EOFException.
 - Merge branch 'master' into JDK-8322256
 - ... and 2 more: https://git.openjdk.org/jdk/compare/75dc2f85...f845a75b

-

Changes: https://git.openjdk.org/jdk/pull/18385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=04
  Stats: 318 lines in 2 files changed: 292 ins; 13 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: Read-only view of ByteArrayInputStream content

2024-07-09 Thread Archie Cobbs

On Tue, Jul 9, 2024 at 10:51 AM Martin Desruisseaux <
martin.desruisse...@geomatys.com> wrote:

> JDK-22 has modified ByteArrayInputStream.transferTo(OutputStream)
> implementation for performing a defensive copy of the byte array when the
> destination is not trusted (JDK-8321053). However, I was using the previous
> implementation as a trick for reading the array without copy. The reason is
> because we have millions of ByteArrayInputStream instances when reading
> BLOBs from a database (e.g., geometries from a spatial database), and some
> popular JDBC drivers implement ResultSet.getBinaryStream(int) by reading
> all data in a byte[] array and wrapping the array in a
> ByteArrayInputStream. As a cleaner replacement for my old trick, would it
> be possible to add something like the following method in
> ByteArrayInputStream?
>
> /**
>  * Returns the remaining content of this input stream as a read-only buffer.
>  * The sequence of remaining bytes in the buffer is the same as the sequence
>  * of bytes that would be returned by {@link #readAllBytes()}.
>  *
>  * @return the remaining content of this input stream, as a read-only buffer
>  */
> public synchronized ByteBuffer asByteBuffer() {
> return ByteBuffer.wrap(buf, pos, count - pos).slice().asReadOnlyBuffer();
> }
>
> The call to slice() is for blocking callers from accessing the bytes
> before the current stream position. I assumed that it could be a security
> issue. If this proposal is acceptable, I can create a pull request.
>
A few random thoughts...

First, let's consider what you are optimizing for. The difference in the
old vs. new behavior is the use of a 128k temporary transfer buffer. So if
I understand this correctly the performance problem you are addressing (in
terms of blown cache) is not proportional to the size of the original BLOB,
but to at most 128K (in both the old and new cases, you have to scan the
entire BLOB, so that's a wash). Does that 128K create a noticeable
difference?

Anyway, IMHO more interoperability between the different Java "ways of
doing things" is good. E.g., converting between Instant and Date, etc. This
falls into that category. In fact, in the past I've wished for the reverse
- i.e., ByteBuffer.asOutputStream() - and ended up creating a simple
wrapper class that does that. So perhaps these two additions could go
together - just a thought.

Minor nit - the Javadoc would need to clarify that operations on the
returned ByteBuffer have no effect on the original OutputStream.

-Archie

-- 
Archie L. Cobbs

Re: Read-only view of ByteArrayInputStream content

2024-07-09 Thread Archie Cobbs

Hi Martin,

On Tue, Jul 9, 2024 at 12:31 PM Martin Desruisseaux <
martin.desruisse...@geomatys.com> wrote:

> I was using a custom OutputStream implementation for getting the value of
> that field, avoiding any form of copy.
>
Gotcha - so in other words, you want a way to effectively "unwrap" the
original byte[] array so you can access the whole thing at one time (random
access), as opposed to just accessing it in online fashion as a stream of
bytes.

It seems that the root of this dilemma is really the java.sql API. For
example, you would think there should be a method
java.sql.Blob.asByteBuffer() (and Clob.asCharBuffer()). Maybe adding such
methods would be a more precise way to fix this.

In looking at this, I noticed that Blob.getBytes() does not specify
whether, when the entire array is asked for, a copy must (or must not) be
returned. So you can't rely on that method for any particular behavior
either.

Basically, the BLOB API seems clearly designed to allow the implementation
to stream the data on demand if it wants to (which is good), but as a side
effect it doesn't provide a way for the caller to guarantee avoidance of
copying the entire array (if the implementation happens to not stream the
data on demand).

-Archie

-- 
Archie L. Cobbs

Re: Read-only view of ByteArrayInputStream content

2024-07-10 Thread Archie Cobbs

On Tue, Jul 9, 2024 at 1:42 PM Martin Desruisseaux <
martin.desruisse...@geomatys.com> wrote:

>
> Basically, the BLOB API seems clearly designed to allow the implementation
> to stream the data on demand if it wants to (which is good), but as a side
> effect it doesn't provide a way for the caller to guarantee avoidance of
> copying the entire array (if the implementation happens to not stream the
> data on demand).
>
> Right. The wish to "unwrap" the ByteArrayInputStream original array come
> from the empirical observation that many of the JDBC drivers that we are
> using do not stream.
>

So would you accept a solution to this problem in the java.sql package
instead? For example by adding two new methods:

java.sql.Blob.asByteBuffer()
java.sql.Clob.asCharBuffer()

-Archie

-- 
Archie L. Cobbs

Re: Read-only view of ByteArrayInputStream content

2024-07-10 Thread Archie Cobbs

On Wed, Jul 10, 2024 at 12:05 PM Martin Desruisseaux <
martin.desruisse...@geomatys.com> wrote:

> Le 2024-07-10 à 18 h 00, Archie Cobbs a écrit :
>
> So would you accept a solution to this problem in the java.sql package
> instead? For example by adding two new methods:
> java.sql.Blob.asByteBuffer()
> java.sql.Clob.asCharBuffer()
>
> Yes, it would work for me if JDBC drivers provide their own
> implementation. But it would not be exactly equivalent. A
> Blob.asByteBuffer() method would be efficient when the JDBC driver loads
> all bytes anyway, but may be expansive if the JDBC driver was really doing
> streaming. Maybe some JDBC drivers do streaming only when the Blob size is
> over some threshold (I didn't verified). Since a Blob can be very large,
> wouldn't it be safer to avoid adding API that performs an unconditional
> loading of all bytes? Unless the plan was to return an Optional.
>
But the API already has that problem - see Blob.getBytes().

So we wouldn't be adding any new constraints on how it must behave.

In any case, the API seems designed to let you choose how you want to
access the data - Blob.getBytes() for random access, Blob.getBinaryStream()
for stream access.

The bug here is that Blob.getBytes() seems to force the driver to make a
copy of the entire blob (or at least the portion you want to access) simply
by the way it's designed - because the "copyness" is unspecified, a
defensive copy of the array data is required.

The nice thing about a Blob.asByteBuffer() method is that it would fix that
problem. And a smart implementation could download and cache regions of the
blob on demand as they were accessed, providing a continuous middle ground
between the current "all or none" downloading choices.

> By contrast, the ByteArrayInputStream.asByteBuffer() proposal never loads
> bytes that were not already in memory. Therefore, it can be used in a more
> opportunistic way ("only if all bytes happen to be in memory anyway,
> opportunistically use them directly without copy"). Also, it does not
> require changes in JDBC drivers, and could be used in contexts other than
> JDBC as well.
>
I'm not opposed to this either, but it's a separate (nice) idea.

I'm curious to hear what other people think about both ideas...

-Archie

-- 
Archie L. Cobbs

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

2024-07-15 Thread Archie Cobbs

On Mon, 15 Jul 2024 13:12:41 GMT, Jaikiran Pai  wrote:

>> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 153:
>> 
>>> 151:  */
>>> 152: public GZIPInputStream(InputStream in, int size,
>>> 153: boolean allowConcatenation, boolean ignoreExtraBytes) 
>>> throws IOException {
>> 
>> I haven't reviewed the javadoc changes. I will do that separately once we 
>> settle down on the API and implementation changes.
>> As for this new constructor, I think the only new parameter we should 
>> introduce is whether or not the underlying input stream `in` is 
>> expected/allowed to have concatenated GZIP stream(s). So I think we should 
>> remove the "ignoreExtraBytes" new parameters from this constructor and. 
>> Additionally, I think the `allowConcatenation` should instead be named 
>> `allowConcatenatedGZIPStream`:
>> 
>> 
>> public GZIPInputStream(InputStream in, int size, boolean 
>> allowConcatenatedGZIPStream) throws IOException {
>
> Just to provide a more concrete input, here's a very minimal tested version 
> of what I had in mind:
> 
> 
> --- a/src/java.base/share/classes/java/util/zip/GZIPInputStream.java
> +++ b/src/java.base/share/classes/java/util/zip/GZIPInputStream.java
> @@ -55,6 +55,8 @@ public class GZIPInputStream extends InflaterInputStream {
>  
>  private boolean closed = false;
>  
> +private final boolean allowConcatenatedGZIPStream;
> +
>  /**
>   * Check to make sure that this stream has not been closed
>   */
> @@ -76,14 +78,7 @@ private void ensureOpen() throws IOException {
>   * @throwsIllegalArgumentException if {@code size <= 0}
>   */
>  public GZIPInputStream(InputStream in, int size) throws IOException {
> -super(in, createInflater(in, size), size);
> -usesDefaultInflater = true;
> -try {
> -readHeader(in);
> -} catch (IOException ioe) {
> -this.inf.end();
> -throw ioe;
> -}
> +this(in, size, true);
>  }
>  
>  /*
> @@ -111,7 +106,28 @@ private static Inflater createInflater(InputStream in, 
> int size) {
>   * @throwsIOException if an I/O error has occurred
>   */
>  public GZIPInputStream(InputStream in) throws IOException {
> -this(in, 512);
> +this(in, 512, true);
> +}
> +
> +/**
> + * WIP
> + * @param in the input stream
> + * @param size the input buffer size
> + * @param allowConcatenatedGZIPStream true if the input stream is 
> allowed to contain
> + *concatenated GZIP streams. false 
> otherwise
> + * @throws IOException if an I/O error has occurred
> + */
> +public GZIPInputStream(InputStream in, int size, boolean 
> allowConcatenatedGZIPStream)
> +throws IOException {
> +super(in, createInflater(in, size), size);
> +this.allowConcatenatedGZIPStream = allowConcatenatedGZIPStream;
> +usesDefaultInflater = true;
> +try {
> +readHeader(in);
> +} catch (IOException ioe) {
> +this.inf.end();
> +throw ioe;
> +}
>  }
>  
>  /**
> @@ -150,10 +166,45 @@ public int read(byte[] buf, int off, int len) throws 
> IOException {
>  }
>  int n = super.read(buf, off, len);
>  if (n == -1) {
> -if (readTrailer())
> +// reading of deflated data is now complete, we now read the 
> GZIP stream trailer
> +readTrailer();
> +if (!allowConcatenatedGZIPStream) {
>  eos = true;
> -else
> +...

Hi @jaikiran,

>  I think the only new parameter we should introduce is whether or not the 
> underlying input stream in is expected/allowed to have concatenated GZIP 
> stream(s).

I disagree with you here. Based on what I think the goal is here, we need two 
separate flags.

Here's my thinking, please see if this makes sense to you...

The overall problem being addressed here is that this class does not give the 
user precise control of how the input is to be decoded. In particular, it fails 
in two ways:
* It doesn't give the user the ability to stop reading (precisely) after the 
first GZIP stream
* It doesn't give the user the ability to detect (some) erroneous inputs after 
the first GZIP stream

Detecting erronous input is just a basic capability that any "decoder" should 
have. For example, we would never accept a class file reader that failed to 
detect and report unknown bytecode.

While this class does correctly detect invalid GZIP data frames that appear at 
the start of the overall input, or in the middle of a compressed GZIP stream, 
in trying to be "smart" by automatically handling concatenated GZIP streams, it 
ends up failing to detect corruption where extra bytes are added to the end of 
the input.

Instead, it will randomly do one of the following:
  * Completely ignore the corruption, or
  * Throw an excepti

RFR: 4452735: Add GZIPOutputStream constructor to specify Deflater

2024-07-17 Thread Archie Cobbs

The class `GZIPOutputStream` extends `DeflaterOutputStream`, which is logical 
because the GZIP encoding is based on ZLIB "deflate" encoding.

However, while `DeflaterOutputStream` provides constructors that take a custom 
`Deflator` argument supplied by the caller, `GZIPOutputStream` has no such 
constructors.

As a result, it's not possible to do entirely reasonable customization, such as 
configuring a `GZIPOutputStream` for a non-default compression level.

This change adds a new `GZIPOutputStream` constructor that accepts a custom 
`Deflater`, and also adds a basic unit test for it and all of the other 
`GZIPOutputStream` constructors, based on the existing test 
`BasicGZIPInputStreamTest.java` which does the same thing for `GZIPInputStream`.

-

Commit messages:
 - Add a GZIPOutputStream() constructor that takes a Deflator.

Changes: https://git.openjdk.org/jdk/pull/20226/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20226&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-4452735
  Stats: 139 lines in 2 files changed: 139 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20226.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20226/head:pull/20226

PR: https://git.openjdk.org/jdk/pull/20226

Re: RFR: 6356745: (coll) Add PriorityQueue(Collection, Comparator) [v6]

2024-07-17 Thread Archie Cobbs

On Thu, 28 Dec 2023 00:09:09 GMT, Valeh Hajiyev  wrote:

>> This commit addresses the current limitation in the `PriorityQueue` 
>> implementation, which lacks a constructor to efficiently create a priority 
>> queue with a custom comparator and an existing collection. In order to 
>> create such a queue, we currently need to initialize a new queue with custom 
>> comparator, and after that populate the queue using `addAll()` method, which 
>> in the background calls `add()` method (which takes `O(logn)` time) for each 
>> element of the collection (`n` times).  This is resulting in an overall time 
>> complexity of `O(nlogn)`. 
>> 
>> 
>> PriorityQueue pq = new PriorityQueue<>(customComparator);
>> pq.addAll(existingCollection);
>> 
>> 
>> The pull request introduces a new constructor to streamline this process and 
>> reduce the time complexity to `O(n)`.  If you create the queue above using 
>> the new constructor, the contents of the collection will be copied (which 
>> takes `O(n)` time) and then later  `heapify()` operation (Floyd's algorithm) 
>> will be called once (another `O(n)` time). Overall the operation will be 
>> reduced from `O(nlogn)` to `O(2n)` -> `O(n)` time.
>> 
>> 
>> PriorityQueue pq = new PriorityQueue<>(existingCollection, 
>> customComparator);
>
> Valeh Hajiyev has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix style

Hmm. Why did this PR expire without further action? Are there any remaining 
blockers (other than it still needs a CSR and a review)?

-

PR Comment: https://git.openjdk.org/jdk/pull/17045#issuecomment-2234671764

Re: RFR: 4452735: Add GZIPOutputStream constructor to specify Deflater

2024-07-19 Thread Archie Cobbs

On Fri, 19 Jul 2024 10:41:05 GMT, Lance Andersen  wrote:

> I understand the request here, but is there a current use case for needing a 
> custom Deflater?

I think the primary use case is when you want to set a non-default compression 
level, e.g., "best" or "fast". This is a pretty normal thing to do and matches 
what people expect from the `gzip(1)` command line flags. Allowing a custom 
`Deflater` is the simplest way to accomplish this; it also solves some other 
less common use cases, e.g., you want to set "no compression" for an already 
compressed file, or you want to keep the `Deflater` open so you can gather 
stats or whatever.

> Before adding additional features, I think GZIP could benefit with more test 
> coverage.

Agreed. `GZIPOutputStream` does get some coverage in some of the 
`GZIPInputStream` tests, and this PR adds more testing, but certainly more is 
better.

-

PR Comment: https://git.openjdk.org/jdk/pull/20226#issuecomment-2239264081

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

2024-07-19 Thread Archie Cobbs

On Fri, 19 Jul 2024 09:56:46 GMT, Eirik Bjørsnøs  wrote:

> The term "extra" here feels somewhat open to interpretation. Specifically, 
> "extra" sounds like something that is out of the ordinary, but not uncommon 
> or wrong. It could be used when describing an optional feature in a format 
> specification.
> 
> The API description referes to "unexpected data". Perhaps the word 
> "unexpected" is a more precise and descriptive term to use in this option? So 
> ignoreUnexpectedBytes or ignoreUnexpectedData.

This is a good point.. because it's actually a bit subtle what kind of data is 
being referred to here as being "ignored". Even "unexpected" doesn't quite 
capture it.

To be precise, here's what bytes the  `ignoreExtraBytes` parameter is going to 
cause to be ignored:
* In the case `allowConcatenation == false`, it really does refer to "extra" or 
"extraneous" data, that is: one or more bytes having any value appearing 
_after_ the end of the GZIP trailer frame.
* In the case `allowConcatenation == true`, it means a truncated or invalid 
GZIP _header frame_ appearing after the end of any GZIP trailer frame.

The second case means "unexpected" seems wrong because why would unexpected 
data in a header frame be treated any differently from unexpected data any 
other frame (which of course is not going to be ignored but instead will 
trigger an exception)?

So maybe this is just more evidence that we shouldn't use boolean parameters - 
because they're not really orthogonal.

What about something like:

public enum ConcatPolicy {
DISALLOW_STRICT,// one GZIP stream, nothing else
DISALLOW_LENIENT,   // one GZIP stream, ignore any extra byte(s)
ALLOW_STRICT,   // N GZIP streams, nothing else
ALLOW_LENIENT;  // N GZIP streams, stop at, and ignore, an invalid 
header frame
}

The legacy behavior would of course correspond to `ALLOW_LENIENT`.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1685106562

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

2024-07-22 Thread Archie Cobbs

On Mon, 22 Jul 2024 18:08:52 GMT, Eirik Bjørsnøs  wrote:

>>> The term "extra" here feels somewhat open to interpretation. Specifically, 
>>> "extra" sounds like something that is out of the ordinary, but not uncommon 
>>> or wrong. It could be used when describing an optional feature in a format 
>>> specification.
>>> 
>>> The API description referes to "unexpected data". Perhaps the word 
>>> "unexpected" is a more precise and descriptive term to use in this option? 
>>> So ignoreUnexpectedBytes or ignoreUnexpectedData.
>> 
>> This is a good point.. because it's actually a bit subtle what kind of data 
>> is being referred to here as being "ignored". Even "unexpected" doesn't 
>> quite capture it.
>> 
>> To be precise, here's what bytes the  `ignoreExtraBytes` parameter is going 
>> to cause to be ignored:
>> * In the case `allowConcatenation == false`, it really does refer to "extra" 
>> or "extraneous" data, that is: one or more bytes having any value appearing 
>> _after_ the end of the GZIP trailer frame.
>> * In the case `allowConcatenation == true`, it means a truncated or invalid 
>> GZIP _header frame_ appearing after the end of any GZIP trailer frame.
>> 
>> The second case means "unexpected" seems wrong because why would unexpected 
>> data in a header frame be treated any differently from unexpected data any 
>> other frame (which of course is not going to be ignored but instead will 
>> trigger an exception)?
>> 
>> So maybe this is just more evidence that we shouldn't use boolean parameters 
>> - because they're not really orthogonal.
>> 
>> What about something like:
>> 
>> 
>> public enum ConcatPolicy {
>> DISALLOW_STRICT,// one GZIP stream, nothing else
>> DISALLOW_LENIENT,   // one GZIP stream, ignore any extra byte(s)
>> ALLOW_STRICT,   // N GZIP streams, nothing else
>> ALLOW_LENIENT;  // N GZIP streams, stop at, and ignore, an invalid 
>> header frame
>> }
>> 
>> The legacy behavior would of course correspond to `ALLOW_LENIENT`.
>
> Another thought:
> 
> Are we sure we would want to introduce a new mode now which does _not_ allow 
> concatenation, but _does_ ignore trailing data?
> 
> If the ignore mode is really discouraged, why open a new mode of config 
> allowing it?
> 
> In other words, could we instead (at least conceptually)  have the policies 
> LEGACY, SINGLE_STREAM, CONCATENATED_STREAM where the latter two always reject 
> trailing data?

> Are we sure we would want to introduce a new mode now which does not allow 
> concatenation, but does ignore trailing data?

I don't see it as necessary. It's certainly not a case that anyone is currently 
relying on. So it's fine with me to omit it.

If are calling this a "concatenated stream policy" then we could have:

public enum ConcatPolicy {
DISALLOW,
ALLOW,
ALLOW_LENIENT;
}

(I'm avoiding `LEGACY` which makes sense today but might make less sense N 
years from now.)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1687042483

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v6]

2024-07-22 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains 14 commits:

 - Change concatenation policy config from booleans to new ConcatPolicy enum.
 - Merge branch 'master' into JDK-8322256
 - Bump @since from 23 to 24.
 - Merge branch 'master' into JDK-8322256
 - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
 - Simplify code by eliminating an impossible case.
 - Field name change and Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - ... and 4 more: https://git.openjdk.org/jdk/compare/388fcf03...bb78bd08

-

Changes: https://git.openjdk.org/jdk/pull/18385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=05
  Stats: 383 lines in 2 files changed: 357 ins; 13 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v7]

2024-07-24 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains 16 commits:

 - Merge branch 'master' into JDK-8322256
 - Wording tweak.
 - Change concatenation policy config from booleans to new ConcatPolicy enum.
 - Merge branch 'master' into JDK-8322256
 - Bump @since from 23 to 24.
 - Merge branch 'master' into JDK-8322256
 - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
 - Simplify code by eliminating an impossible case.
 - Field name change and Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - ... and 6 more: https://git.openjdk.org/jdk/compare/332df83e...72b60092

-

Changes: https://git.openjdk.org/jdk/pull/18385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=06
  Stats: 383 lines in 2 files changed: 357 ins; 13 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v7]

2024-07-26 Thread Archie Cobbs

On Fri, 26 Jul 2024 13:36:43 GMT, Lance Andersen  wrote:

>> Archie Cobbs has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 16 commits:
>> 
>>  - Merge branch 'master' into JDK-8322256
>>  - Wording tweak.
>>  - Change concatenation policy config from booleans to new ConcatPolicy enum.
>>  - Merge branch 'master' into JDK-8322256
>>  - Bump @since from 23 to 24.
>>  - Merge branch 'master' into JDK-8322256
>>  - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
>>  - Simplify code by eliminating an impossible case.
>>  - Field name change and Javadoc wording tweaks.
>>  - Merge branch 'master' into JDK-8322256
>>  - ... and 6 more: https://git.openjdk.org/jdk/compare/332df83e...72b60092
>
> First thank you for your efforts and patience with this PR Archie.
> 
> I am trying to better grasp the problem to be solved outside of better 
> clarity surrounding how GZIPInputStream handles concatenated gzip files
> 
> - I could see possibly  providing an enum which indicates to stop 
> decompressing after processing the first gzip file or to decompress until the 
> end of input.  
> - This would be similar to what  the commons-compress 
> GzipCompressorInputStream class does.
> - It is also what Eirik and Jai are suggesting I believe.  
> - If support for processing multiple gzip files is enabled, which would be 
> the default, then the existing behavior is not changed.  
> - As I have mentioned previously, we should add more testing including taking 
> a couple of concatenated examples created via the gzip tool (or equivalent) 
> give how light our coverage is.
> 
> I don't believe we want to do more than is needed given the age of the API 
> and there is been minimal requests for improvements at this time.

Hi @LanceAndersen,

Thanks for taking a look. Let's achieve consensus on what the goal(s) should be 
here, and then I think a proper API for that will come naturally. Up until now 
the discussion has been a shifting blend of API design and API goals which 
makes things a little bit chaotic (probably my fault).

**Proposed Goal №1:**

This one seems relatively non-controversial: Allow creation of a 
`GZIPInputStream` that only decompresses one GZIP stream.

OK but what does that mean exactly?
* Option A: The `GZIPInputStream` will never read past the end of the GZIP 
trailer frame
  * Sounds great, but how exactly would that work?
* Option A.1: Caller must provide a `BufferedInputStream`, so we can back 
it up if any extra bytes have been read past the trailer frame
* Option A.2: `GZIPInputStream` provides a way for caller to access the 
extra bytes read once EOF is detected
* Option A.3: `GZIPInputStream` never buffers more than 8 bytes (size of 
trailer frame)
* Option B: `GZIPInputStream` attempts to read the byte after the trailer frame 
and throws `IOException` if EOF is not returned

My opinion: All of the A options are unsatisfactory (although A.1 is the least 
bad), so it seems option B is the answer here.

**Proposed Goal №2:**

This one seems more controversial: Allow the caller to configure 
`GZIPInputStream` for "strict mode".

Just to be clear and restate what this is referring to. Currently, 
`GZIPInputStream` is hard-wired for "lenient mode". This means that it is 
indeterminate (a) how many additional bytes (if any) were read beyond the GZIP 
trailer frame, and (b) whether reading stopped due to EOF, invalid data, or an 
underlying`IOException`.

My opinion: this is, at best, a random intermittent bug - **and at worst a 
giant security hole** - waiting to happen.

Reasoning: If the caller is OK with lenient mode, then the caller obviously 
doesn't (or shouldn't) care about the data that follows the input stream (if 
any), because the caller won't even know whether there was any, how much of it 
has been discarded (if any), or whether it was actually another valid GZIP 
stream that got corrupted or threw an`IOException`.

So why would any caller actually want to use this feature to concatenate 
anything useful after a GZIP trailer frame?

In particular: if an underlying `IOException` occurs at just the wrong time, an 
application that wrote `GZIP-STREAM-1`, `GZIP-STREAM-2`, `GZIP-STREAM-3` could 
read back `GZIP-STREAM-1`, `GZIP-STREAM-2` and never know that anything was 
wrong.

How useful is concatenated GZIP stream support if it's not even reliable??

This just seems completely wonky to me (and oh, did I mention it's a potential 
security hole waiting to happen? :)

It seems like if concatenation is going to be a supported feature, then it 
should be possible to access a _reliable_ version of it. In other words, there 
should be a way for an a

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v7]

2024-07-26 Thread Archie Cobbs

On Fri, 26 Jul 2024 16:12:11 GMT, Jaikiran Pai  wrote:

>> Archie Cobbs has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 16 commits:
>> 
>>  - Merge branch 'master' into JDK-8322256
>>  - Wording tweak.
>>  - Change concatenation policy config from booleans to new ConcatPolicy enum.
>>  - Merge branch 'master' into JDK-8322256
>>  - Bump @since from 23 to 24.
>>  - Merge branch 'master' into JDK-8322256
>>  - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
>>  - Simplify code by eliminating an impossible case.
>>  - Field name change and Javadoc wording tweaks.
>>  - Merge branch 'master' into JDK-8322256
>>  - ... and 6 more: https://git.openjdk.org/jdk/compare/332df83e...72b60092
>
> Hello Archie, I don't think we need any enum in the API. GZIPInputStream 
> class should only allow parsing GZIP streams. The InputStream passed to the 
> GZIPInputStream constructor must be a InputStream which has either 1 GZIP 
> stream or N GZIP streams (concatenated). The user should be allowed to state 
> whether the passed InputStream is expected contain more than 1 GZIP stream. 
> This is the only addition that I think we should be introducing. 
> 
> The way to allow users to state that the InputStream being passed is allowed 
> to have more than 1 GZIP stream is through the new proposed boolean to the 
> constructor. The boolean can be named "allowConcatenated".
> 
> When allowConcatenated is true, then during the read() operation on the 
> GZIPInputStream, when we internally read a GZIP stream trailer, we should 
> check if the underlying InputStream has a next GZIP stream header:
>   - If there's no bytes present after the trailer, we just return 
> whatever data we have inflated so far and that also would be the 
> end-of-stream of the GZIPInputStream. 
>   - If there's bytes after the trailer, then those next bytes MUST be the 
> next GZIP stream header. If those bytes aren't a GZIP stream header, then we 
> throw an IOException. This implies whatever data we had deflated and any 
> additional bytes that we had read after the trailer will not be returned to 
> the user. That's fine because the underlying InputStream was "corrupt" - the 
> GZIPInputStream shouldn't expect any other data other than GZIP stream(s) in 
> the underlying InputStream.
> 
> When allowConcatenated is false, then during the read() operation on the 
> GZIPInputStream, when we internally read a GZIP stream trailer, we should 
> check if the underlying InputStream has any more bytes remaining:
>   - If there are bytes remaining then we consider that an error and raise 
> an IOException. We do this because we were instructed not to allow 
> concatenated GZIP streams. So presence of any data after the first GZIP 
> stream trailer should be an exception.
>   - If there are no more bytes remaining then we have reached the 
> end-of-stream and we return back whatever GZIP data we have inflated during 
> the read operation and that also would be the end-of-stream of the GZIPStream 
> instance.
> 
> The default value for allowConcatenated should be true to allow for 
> supporting concatenated GZIP streams (like we do now). There however will be 
> a change in behaviour if the stream content after the trailer isn't a GZIP 
> stream. With this new proposed change we will throw an IOException ...

Hi @jaikiran,

> There however will be a change in behaviour if the stream content after the 
> trailer isn't a GZIP stream. With this new proposed change we will throw an 
> IOException (unlike previously). I think that's fine and in fact the correct 
> thing to do.

Ah! This simplifies everything. I was just assuming that preserving the 
existing behavior was a hard requirement - but if not, we can make this class 
"always precise", and the only thing left to configure is whether to allow 1 
stream or N streams. I am in complete agreement with you now, especially 
because this eliminates the aforementioned security concern.

I'm assuming you and @LanceAndersen have communicated and he is OK with this as 
well. I will update the PR and CSR.

Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2253125776

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v7]

2024-07-26 Thread Archie Cobbs

On Fri, 26 Jul 2024 16:46:28 GMT, Archie Cobbs  wrote:

> With this new proposed change we will throw an IOException (unlike 
> previously). I think that's fine and in fact the correct thing to do.

I've run into a problem.

With this change, `GZIPInZip.java` fails; the test is annotated like this:

/* @test
 * @bug 7021870 8023431 8026756
 * @summary Reading last gzip chain member must not close the input stream.
 *  Garbage following gzip entry must be ignored.
 */

Note the "Garbage following gzip entry must be ignored" line. This must be what 
bugs 8023431 8026756 refer to, but those bugs are not accessible ("You can't 
view this issue. It may have been deleted or you don't have permission to view 
it.") so I can't see what the problem was that required ignoring the extra 
garbage. Maybe you guys have access or know what this was about? Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2253721455

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-07-27 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request incrementally with two additional 
commits since the last revision:

 - Refactor to eliminate "lenient" mode (still failng test: GZIPInZip.java).
 - Add GZIP input test for various gzip(1) command outputs.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18385/files
  - new: https://git.openjdk.org/jdk/pull/18385/files/72b60092..f3939c05

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=06-07

  Stats: 282 lines in 3 files changed: 137 ins; 89 del; 56 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-07-29 Thread Archie Cobbs

On Mon, 29 Jul 2024 13:06:24 GMT, Lance Andersen  wrote:

> So where does that leave us:
> 
>Keep the code as is and document the current behavior
>Continue to add additional test coverage for the current API
>We probably do not need a new constructor given it probably adds no new 
> additional value the existing API

Just so I understand... are you saying that you are OK with this class being 
fundamentally unreliable _and_ providing no way to avoid that unreliability? 
(Just to restate a previous example of this: if an underlying `IOException` 
occurs at just the wrong time, an application that wrote `GZIP-STREAM-1`, 
`GZIP-STREAM-2`, `GZIP-STREAM-3` could read back `GZIP-STREAM-1`, 
`GZIP-STREAM-2` and never know that anything was wrong.)

If that's _not_ what you're saying, then I don't the understand "We probably do 
not need a new constructor" part.

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2256010579

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-07-29 Thread Archie Cobbs

On Mon, 29 Jul 2024 14:27:28 GMT, Lance Andersen  wrote:

> ... we really need more datapoint to better understand the risks/benefits in 
> order to make an informed decision.

Agreed.

Here's some what I've come up with after a little bit of research:

First, we shouldn't confuse GZIP with ZIP files. They are only indirectly 
related:
* The [DEFLATE format](https://en.wikipedia.org/wiki/Deflate) is a single file 
compression format
* The [GZIP format](https://en.wikipedia.org/wiki/Gzip) is a wrapper around the 
DEFLATE format.
* Individual entries in a [ZIP 
archive](https://en.wikipedia.org/wiki/ZIP_(file_format)) is typically 
DEFLATE'd, but never GZIP'd

The "overlap" that is relevant here is simply when the file you are storing in 
a ZIP file happens to itself be a GZIP file e.g. `foo.gz` (so yes, you are 
likely compressing `foo` twice).

Specifically, the failing test `GZIPInZip.java` is doing this:
1. Create a normal GZIP data file `b.gz`
1. Create a normal ZIP file
1. Write `b.gz` _plus one extra byte_ into the ZIP file
1. Close the ZIP file
1. Open the ZIP file
1. Find the `InputStream` corresponding to the entry for `b.gz`
1. Read and decompress it with a `GZIPInputStream`
1. Verify the extra byte was ignored

So this has nothing to do with the ZIP file format itself, rather it's related 
to the behavior of how other software might build ZIP files.

Nor does it directly relate to the GZIP format or how any GZIP decoder (taken 
in isolation) should behave.

So what I don't understand is what is the motivation for the behavior this test 
is verifying? It looks like, at best, a bug workaround for some unknown other 
broken software.

Here's the commit it was added:

commit 71f33254816a17ff741e0119e16db28181d7b43b
Author: Ivan Gerasimov 
Date:   Tue Oct 15 21:15:17 2013 +0400

8023431: Test java/util/zip/GZIP/GZIPInZip.java failed

Properly close PipedStreams. Additional testing for malformed input

Reviewed-by: darcy, sherman

Did you double check the unviewable issues? What does JDK-8023431 say?

I also took a look at how Apache's commons-compress `GzipCompressorInputStream` 
behaves.

`GzipCompressorInputStream` is the commons-compress version of 
`GZIPInputStream`. From what I can infer, part of its original motivation was 
that pre-JDK7 `GZIPInputStream` didn't support concatenated streams. This class 
provides an optional `decompressConcatenated` boolean constructor parameter 
(default `false`).

So how does `GzipCompressorInputStream` behave...?

* `IOException`'s thrown by the underlying input stream are never suppressed; 
this includes `IOException`'s thrown while reading "extra garbage" that may 
follow the first (if `decompressConcatenated = false`) or any (if 
`decompressConcatenated = true`) GZIP trailer frame.
* When `decompressConcatenated = false`, after reading the first GZIP trailer 
frame, the underlying input is `reset()` if possible to the stream's boundary. 
If `reset()` is not supported, some indeterminate number of extra bytes will 
have been read (note, this exception is missing from the Javadoc).
* When `decompressConcatenated = true`, the entire input stream is read, and it 
must contain only whole, valid GZIP streams with no trailing garbage.

Summary: GzipCompressorInputStream is never "lenient"; it either reads the 
whole stream (if `decompressConcatenated = true`) or else just the first GZIP 
stream and stops as soon as it can thereafter (exactly if `markSupported()`, 
othewise inexactly).

Side note: the reliance on `mark()`/`reset()` is a nice trick so to speak: if 
you can provide an input stream that supports it, you get precision in return, 
and this is accomplished without requiring any new API surface.

This brings up the question, how would commons-compress behave in the scenario 
being tested by the `GZIPInZip.java` test?

Answer: Let's assume you replace `new GZIPInputStream(zis)` with `new 
GzipCompressorInputStream(zis)` in that test. Then the test would succeed, 
because:
* By default, `GzipCompressorInputStream` sets `decompressConcatenated = false`.
* The extra byte might get read, but because `decompressConcatenated = false` 
decompression stops after the last byte of  `b.gz`

Overall the API and behavior design of `GzipCompressorInputStream` seems like a 
good one. I would even recommend that we adopt it, except that it's 
incompatible with the current behavior (we assume `decompressConcatenated = 
true` by default).

So here's a stab at what we might do (opening debate): Basically, we replicate 
the `GzipCompressorInputStream` behavior with the exception that our 
`allowConcatenated` defaults to `true` to preserve existing behavior. In 
summary:
* We utilize `mark()`/`reset()` to guarantee precise stopping behavior when 
`allowConcatenated = false` - but only if `markSupported()` returns true; 
otherwise, best effort/imprecise stop that may read past the GZIP end.
* We eliminate all "lenient" behavior - e.g., underlying `IOExcepti

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-07-30 Thread Archie Cobbs

On Tue, 30 Jul 2024 17:35:33 GMT, Lance Andersen  wrote:

> Based on the above, I am reluctant to change the current behavior given it 
> appears to have been modeled after gzip/gunzip as well as WinZip.

That's a reasonable conclusion... and I have no problem with preserving the 
existing behavior if that's what we want. But in that case I would lobby that 
we should also provide some new way to configure a `GZIPInputStream` that 
guarantees reliable behavior.

The key question here is: "Exactly what current behavior of `new 
GZIPInputStream(in)` do we want to preserve?"

Let's start by assuming that we want your above test to pass. Putting that into 
words: "Given a single GZIP stream followed by trailing garbage, `new 
GZIPInputStream(in)` should successfully decode the GZIP stream and ignore the 
trailing garbage".

Note however that what `new GZIPInputStream(in)` currently provides is stronger 
than that:
1. Trailing garbage is ignored
1. Any `IOException` thrown while reading trailing garbage is ignored
1. Concatenated streams are automatically decoded

So we know we want to preserve 1 - What about 2 and/or 3? Your thoughts?

My personal opinions:

* I think 2 is inherently bad and it should not be implemented in any variant
* I think 3 is not required _by default_, but one should be able to enable it 
somehow

If we were to accept those opinions (preserving only 1), then we would end up 
at the same place as `GzipCompressorInputStream`:
* Underlying `IOException`'s are never suppressed
* `new GZIPInputStream(in)` decodes only one GIZP stream and ignores any 
trailing garbage
* `new GZIPInputStream(in, true)` decodes concatenated streams; trailing 
garbage causes `IOException`

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2258919532

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-07-30 Thread Archie Cobbs

On Sat, 27 Jul 2024 15:00:51 GMT, Archie Cobbs  wrote:

>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
>> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
>> order to do this, after a GZIP trailer frame is read, it attempts to read a 
>> GZIP header frame and, if successful, proceeds onward to decompress the new 
>> stream. If the attempt to decode a GZIP header frame fails, or happens to 
>> trigger an `IOException`, it just ignores the trailing garbage and/or the 
>> `IOException` and returns EOF.
>> 
>> There are several issues with this:
>> 
>> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
>> trailing garbage are not documented, much less precisely specified.
>> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
>> or other data corruption that an application would rather be notified about. 
>> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
>> data is read, but obviously that doesn't happen in the trailing garbage 
>> scenario (so N concatenated streams where the last one has a corrupted 
>> header frame is indistinguishable from N-1 valid streams).
>> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
>> stream concatenation.
>> 
>> On the other hand, `GZIPInputStream` is an old class with lots of existing 
>> usage, so it's important to preserve the existing behavior, warts and all 
>> (note: my the definition of "existing behavior" here includes the bug fix in 
>> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
>> 
>> So this patch adds a new constructor that takes two new parameters 
>> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
>> enabled by setting both to true; otherwise, they do what they sound like. In 
>> particular, when `allowTrailingGarbage` is false, then the underlying input 
>> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
>> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
>> exception is guaranteed.
>
> Archie Cobbs has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Refactor to eliminate "lenient" mode (still failng test: GZIPInZip.java).
>  - Add GZIP input test for various gzip(1) command outputs.

Another (more conservative) possibility is to preserve both 1 ("Trailing 
garbage is ignored") and 3 ("Concatenated streams are automatically decoded") 
in the default configuration.

Then basically all we would be changing is no longer suppressing 
`IOException`'s.

And then - as a separate remaining question - if we wanted to also provide more 
control there could be new constructor(s) to control concatenation and/or 
tolerance for trailing garbage.

(In all cases, I think using `mark()`/`reset()` (when available) to make 
trailing garbage detection precise is a good idea.)

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2259012402

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-07-31 Thread Archie Cobbs

On Sat, 27 Jul 2024 15:00:51 GMT, Archie Cobbs  wrote:

>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
>> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
>> order to do this, after a GZIP trailer frame is read, it attempts to read a 
>> GZIP header frame and, if successful, proceeds onward to decompress the new 
>> stream. If the attempt to decode a GZIP header frame fails, or happens to 
>> trigger an `IOException`, it just ignores the trailing garbage and/or the 
>> `IOException` and returns EOF.
>> 
>> There are several issues with this:
>> 
>> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
>> trailing garbage are not documented, much less precisely specified.
>> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
>> or other data corruption that an application would rather be notified about. 
>> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
>> data is read, but obviously that doesn't happen in the trailing garbage 
>> scenario (so N concatenated streams where the last one has a corrupted 
>> header frame is indistinguishable from N-1 valid streams).
>> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
>> stream concatenation.
>> 
>> On the other hand, `GZIPInputStream` is an old class with lots of existing 
>> usage, so it's important to preserve the existing behavior, warts and all 
>> (note: my the definition of "existing behavior" here includes the bug fix in 
>> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
>> 
>> So this patch adds a new constructor that takes two new parameters 
>> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
>> enabled by setting both to true; otherwise, they do what they sound like. In 
>> particular, when `allowTrailingGarbage` is false, then the underlying input 
>> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
>> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
>> exception is guaranteed.
>
> Archie Cobbs has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Refactor to eliminate "lenient" mode (still failng test: GZIPInZip.java).
>  - Add GZIP input test for various gzip(1) command outputs.

It seems like we're going around in circles a bit here... but maybe the circles 
are getting smaller? But I'm also running out of steam...

> We don't want to change this long standing behavior as it has the potential 
> of breaking existing applications and it is consistent with gzip and also 
> winzip.

Sounds great. In fact my original plan was to not change the existing behavior 
at all (by putting all of the new/improved behavior behind a new constructor)...

> So through this PR, we should clarify the javadoc as to what current 
> GZIPInputStream implementation does and add additional tests to expand the 
> coverage

Event just doing this has some subtlety to it... Note, none of the current 
behavior is actually specified (the entirety of the Javadoc says: "This class 
implements a stream filter for reading compressed data in the GZIP file 
format"), so any description we add creates a new actual specification for the 
future.

Are you sure we want to do that for all of the current behavior?

I would think maybe for some we do and for some we don't:
* Swallowing `IOException`'s - this we should not specify, so that we may 
someday eliminate it to make this class reliable
* Concatenated GZIP's - this would be good to document
* Reading extra bytes - the only thing to document would be that "there is a 
chance extra bytes will be read" which is not very useful, so what's the point?

What of that would you want to specify?

My original idea was to only add specification for the new/improved 
functionality, and leave the existing functionality. If we go this route, then 
the only thing to decide would be what new/improved functionality we want to 
add.

> A separate discussion can take place to discuss the merits of whether there 
> is perceived value in throwing an IOException when trailing garbage is 
> encountered as well as any benefit of not supporting concatenated gzip files. 
> It will also allow time for further review of other tools/apis that support 
> gzip to see what they may or may not do.

Guess I thought we were sort-of having those discussions now...

What do we agree on?
* Supporting concatenation is a good thing
* Supporting ignoring trailing garbage is a good thing

What is there debate about?
* Swallowing trailin

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-08-02 Thread Archie Cobbs

On Fri, 2 Aug 2024 13:12:00 GMT, Jaikiran Pai  wrote:

>> Archie Cobbs has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - Refactor to eliminate "lenient" mode (still failng test: GZIPInZip.java).
>>  - Add GZIP input test for various gzip(1) command outputs.
>
> Hello Archie,
> 
>> It seems like we're going around in circles a bit here... but maybe the 
>> circles are getting smaller?
> 
> We try to avoid/reduce changes in zip, gzip and other similar areas because 
> of several open to interpretation parts of the specification and the fact 
> that there are tools outside of the JDK (like the one we found out from 
> Lance's example) which behave in a certain manner. I agree we sometimes do 
> end up having to re-evaluate the original proposal. It's not a bad thing 
> though. In fact the current discussion in this PR and the experiments have 
> been useful to go back and check why the code behaves the way it currently 
> does.
> 
>> IMO doing nothing is a bad option, because as it stands today the class is 
>> fundamentally unreliable
> 
> Lance, me and others have been discussing the changes in this PR for the past 
> several days. We started this off with an intention to introduce a new 
> constructor to allow more tighter control over not processing the "corrupt" 
> bytes after a GZIP trailer. Experiments with the other tools (gunzip, winzip 
> and even 7zip) has shown us that any garbage/corrupt data after a GZIP 
> trailer is considered equivalent to end of stream and any deflated content 
> thus far is returned from that stream - just like what the `GZIPInputStream` 
> currently does in mainline. So its behaviour is at least consistent with 
> these other commonly used tools. 
> 
> 
>> Are you sure we want to do that for all of the current behavior?
>>
>> I would think maybe for some we do and for some we don't:
>>
>>Swallowing IOException's - this we should not specify, so that we may 
>> someday eliminate it to make this class reliable
>>Concatenated GZIP's - this would be good to document
>>Reading extra bytes - the only thing to document would be that "there is 
>> a chance extra bytes will be read" which is not very useful, so what's the 
>> point?
>>
>> What of that would you want to specify?
> 
> The `GZIPInputStream` currently states:
> 
>> This class implements a stream filter for reading compressed data in the 
>> GZIP file format.
> 
> and that's it, nothing more. What we want to do is specify that the 
> InputStream that is passed to the GZIPInputStream constructor is allowed to 
> have one or more GZIP streams. A stream with more than one GZIP stream is 
> considered a concatenated GZIP stream. The GZIPInputStream will stop 
> processing any further content after an individual GZIP stream's trailer if 
> that content doesn't represent a valid GZIP stream header of the next GZIP 
> stream. 
> (Of course, it mi...

Hi @jaikiran,

Thanks. Sorry to sound frustrated. It's not due to the back & forth with you 
guys but more because there are several subtle, overlapping technical issues 
(and judgement calls) and hashing it all out over the simplex channel of github 
PR comments is inherently cumbersome.

I do want to clarify one thing...

> Experiments with the other tools (gunzip, winzip and even 7zip) has shown us 
> that any garbage/corrupt data after a GZIP trailer is considered equivalent 
> to end of stream and any deflated content thus far is returned from that 
> stream - just like what the GZIPInputStream currently does in mainline. So 
> its behaviour is at least consistent with these other commonly used tools.

Yes I agree with that. My main worry is with the swallowing of `IOException`'s 
from the underlying stream. In theory a carefully timed malicious TCP reset 
could result in truncated data (even encrypted data) being read without any 
error reported. For some reason I'm morally offended by this :) I'm also 
skeptical there is code out there that actually relies on this part of the 
current behavior. As a result I think this particular behavior should be at 
least a candidate for deprecation, if not outright fixing or an alternative 
provided. But of course I could be wrong and realize my opinion is just one of 
many.

> What we want to do is specify that the InputStream that is passed to the 
> GZIPInputStream constructor is allowed to have one or more GZIP streams. A 
> stream with more than one GZIP stream is considered a concatenated GZIP 
> stream. The GZIPInputStream will stop processing any further content after an 
> indivi

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v9]

2024-08-11 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Revert all functional changes, leaving only tests & Javadoc.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18385/files
  - new: https://git.openjdk.org/jdk/pull/18385/files/f3939c05..a69bf2a9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=07-08

  Stats: 133 lines in 3 files changed: 18 ins; 84 del; 31 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-08-11 Thread Archie Cobbs

On Sat, 27 Jul 2024 15:00:51 GMT, Archie Cobbs  wrote:

>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
>> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
>> order to do this, after a GZIP trailer frame is read, it attempts to read a 
>> GZIP header frame and, if successful, proceeds onward to decompress the new 
>> stream. If the attempt to decode a GZIP header frame fails, or happens to 
>> trigger an `IOException`, it just ignores the trailing garbage and/or the 
>> `IOException` and returns EOF.
>> 
>> There are several issues with this:
>> 
>> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
>> trailing garbage are not documented, much less precisely specified.
>> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
>> or other data corruption that an application would rather be notified about. 
>> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
>> data is read, but obviously that doesn't happen in the trailing garbage 
>> scenario (so N concatenated streams where the last one has a corrupted 
>> header frame is indistinguishable from N-1 valid streams).
>> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
>> stream concatenation.
>> 
>> On the other hand, `GZIPInputStream` is an old class with lots of existing 
>> usage, so it's important to preserve the existing behavior, warts and all 
>> (note: my the definition of "existing behavior" here includes the bug fix in 
>> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
>> 
>> So this patch adds a new constructor that takes two new parameters 
>> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
>> enabled by setting both to true; otherwise, they do what they sound like. In 
>> particular, when `allowTrailingGarbage` is false, then the underlying input 
>> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
>> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
>> exception is guaranteed.
>
> Archie Cobbs has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Refactor to eliminate "lenient" mode (still failng test: GZIPInZip.java).
>  - Add GZIP input test for various gzip(1) command outputs.

OK I've reverted any functional changes. The net result should just be 
additional Javadoc and two new unit tests. Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2282855270

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

2024-08-21 Thread Archie Cobbs

On Wed, 31 Jul 2024 15:51:48 GMT, Lance Andersen  wrote:

>> Another (more conservative) possibility is to preserve both 1 ("Trailing 
>> garbage is ignored") and 3 ("Concatenated streams are automatically 
>> decoded") in the default configuration.
>> 
>> Then basically all we would be changing is no longer suppressing 
>> `IOException`'s.
>> 
>> And then - as a separate remaining question - if we wanted to also provide 
>> more control there could be new constructor(s) to control concatenation 
>> and/or tolerance for trailing garbage.
>> 
>> (In all cases, I think using `mark()`/`reset()` (when available) to make 
>> trailing garbage detection precise is a good idea.)
>
>> Another (more conservative) possibility is to preserve both 1 ("Trailing 
>> garbage is ignored") and 3 ("Concatenated streams are automatically 
>> decoded") in the default configuration.
>> 
>> Then basically all we would be changing is no longer suppressing 
>> `IOException`'s.
>> 
>> And then - as a separate remaining question - if we wanted to also provide 
>> more control there could be new constructor(s) to control concatenation 
>> and/or tolerance for trailing garbage.
>> 
>> (In all cases, I think using `mark()`/`reset()` (when available) to make 
>> trailing garbage detection precise is a good idea.)
> 
> We don't want to change this long standing behavior as it has the potential 
> of breaking existing applications and it is consistent with gzip and also 
> winzip.
> 
> So through this PR, we should clarify the javadoc as to what  current 
> GZIPInputStream implementation  does  and add additional tests to  expand the 
> coverage
> 
> A separate discussion can take place to discuss the merits of whether there 
> is perceived value in throwing an IOException when trailing garbage is 
> encountered as well as any benefit of not supporting concatenated gzip files. 
>  It will also allow time for further review of other tools/apis that support 
> gzip to see what they may or may not do.

@LanceAndersen & @jaikiran,

When you get a chance please review the 
[CSR](https://bugs.openjdk.org/browse/JDK-8330195) and let me know if you think 
the new wording is appropriate.

Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2302856157

Re: New candidate JEP: 484: Class-File API

2024-08-27 Thread Archie Cobbs

Question... would it be appropriate for this JEP to mention that support
for older-than-current classfile versions is an explicit non-goal?

Otherwise I think there could be many repeats of this discussion

from the other day.

To be clear, I don't disagree with the design choice, I just think it might
be worthwhile to address that point directly and clarify the thinking
behind it so there's no ambiguity.

As it's written today, a casual reading of the JEP comes across as if we're
talking about a great new JDK-sanctioned tool with state of the art design
that will help get all of the classfile manipulation libraries on the same
page to allow analysis/transformation of any class file on the classpath.
Or at least, it doesn't do anything to dispel that notion (unless I'm
missing something).

-Archie

On Tue, Aug 27, 2024 at 8:58 AM Mark Reinhold 
wrote:

> https://openjdk.org/jeps/484
>
>   Summary: Provide a standard API for parsing, generating, and
>   transforming Java class files.
>
> - Mark

-- 
Archie L. Cobbs

Re: Namespace Prefix Issue in XML to XSL Transformation

2024-08-27 Thread Archie Cobbs

Hi Hempushpa,

What you describe appears to be this issue: JDK-8168664
.

-Archie

On Tue, Aug 27, 2024 at 12:29 AM Hempushpa Sahu 
wrote:

> Problem Description :
>
> XSLT transformation creating unique namespace prefixes.
>
>
>
> Analysis & Observation :
>
> When upgrading from Java 8 to Java 17, the XSLT transformation is
> generating a new namespace prefix for every XML element, even when the
> namespace value matches that of the parent element. This leads to large XML
> file transformations exceeding the file size and memory limits on our
> systems.
>
>
>
> The behaviour in OpenJDK has remained consistent since JDK 8; however, the
> namespace prefix issue is not something we find acceptable.
>
>
>
> Our investigation into the OpenJDK code led us to defect
> https://bugs.openjdk.org/browse/JDK-8167179, which addressed the
> namespace prefix issue in the OpenJDK 8 release. Despite this, we are still
> able to reproduce the issue in OpenJDK versions 8, 11, 17, and 22.
>
>
>
>
>
> Releases:
>
> OpenJDK 8, 11, 17 & 22 the issue is seen.
>
>
>
> Next steps:
>
> Please review and suggest if the above understanding is right. And please
> suggest solution to resolve the issue.
>
>
>
> NOTE:
>
> Please find the attachment with below files-
>
> XML file - sourceFile.xml
>
> XSL file - ie_si_to_spe.xsl
>
> Current output - sourceFile-test-transform.xml
>
> Expected output - sourceFile-expected-transform.xml
>
>
>


-- 
Archie L. Cobbs

Re: [External] : Re: New candidate JEP: 484: Class-File API

2024-08-28 Thread Archie Cobbs

Hi Adam,

My apologies, this was a misunderstanding on my part. I think I misread this
comment
<https://mail.openjdk.org/pipermail/core-libs-dev/2024-August/127983.html>
and didn't notice the word "preview":

Unfortunately, ClassFile API's scope is only to interpret correctly the
> Class Files that are accepted by the current VM and support writing such
> class files; for example, for release N, we will not support correct
> parsing or writing of *preview* class files from N-1, N-2, etc.


So I mistakenly thought it was saying the ClassFile API in JDK N would not
support class files from previous versions N-1, etc.

-Archie

On Wed, Aug 28, 2024 at 2:13 AM Adam Sotona  wrote:

> Class files conforming to the Chapter 4. “The class File Format” of the
> Java Virtual Machine Specification are supported. It covers class file
> versions far to the history.
>
>
>
> Feel free to report cases where the support is insufficient.
>
>
>
> Thank you,
>
> Adam
>
>
>
> *From: *Archie Cobbs 
> *Date: *Tuesday, 27 August 2024 at 18:49
> *To: *core-libs-dev@openjdk.org 
> *Cc: *Adam Sotona , jdk-...@openjdk.org <
> jdk-...@openjdk.org>
> *Subject: *[External] : Re: New candidate JEP: 484: Class-File API
>
> Question... would it be appropriate for this JEP to mention that support
> for older-than-current classfile versions is an explicit non-goal?
>
>
>
> Otherwise I think there could be many repeats of this discussion
> <https://mail.openjdk.org/pipermail/core-libs-dev/2024-August/127982.html>
> from the other day.
>
>
>
> To be clear, I don't disagree with the design choice, I just think it
> might be worthwhile to address that point directly and clarify the thinking
> behind it so there's no ambiguity.
>
>
>
> As it's written today, a casual reading of the JEP comes across as if
> we're talking about a great new JDK-sanctioned tool with state of the art
> design that will help get all of the classfile manipulation libraries on
> the same page to allow analysis/transformation of any class file on the
> classpath. Or at least, it doesn't do anything to dispel that notion
> (unless I'm missing something).
>
>
>
> -Archie
>
>
>
> On Tue, Aug 27, 2024 at 8:58 AM Mark Reinhold 
> wrote:
>
> https://openjdk.org/jeps/484
>
>   Summary: Provide a standard API for parsing, generating, and
>   transforming Java class files.
>
> - Mark
>
>
>
> --
>
> Archie L. Cobbs
>


-- 
Archie L. Cobbs

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v9]

2024-08-28 Thread Archie Cobbs

On Wed, 28 Aug 2024 17:50:36 GMT, Lance Andersen  wrote:

>> Archie Cobbs has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Revert all functional changes, leaving only tests & Javadoc.
>
> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 43:
> 
>> 41:  * frame that marks the end of the compressed data. Therefore it's 
>> possible for the underlying
>> 42:  * input to contain additional data beyond the end of the compressed 
>> GZIP data. In particular,
>> 43:  * some GZIP compression tools function by partitioning the input, 
>> compressing each parttion
> 
> I think we need to simplify the above  keeping the focus on that 
> GZIPInputStream supports the reading of concatenated  Gzip streams.  After 
> reading  a streams trailer(though some docs refer to it as a footer, but I 
> think we go with trailer),  if an additional gzip header is found, the stream 
> will be decompressed... etc.. 
> 
> If there is additional data/byes after the trailer that does not represent 
> another gzip, header, it will indicate you have reached the end of the input 
> stream

Sounds good. How would you like to do that?

E.g. we could just remove the words "In particular, some GZIP compression tools 
function by partitioning the input, compressing each partition separately, and 
then concatenating the resulting compressed data streams. To support this kind 
of input".  We could also remove "In the latter case, the number of additional 
bytes that were read is unspecified." Or something else (what would you 
suggest?)

Thanks.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1735101196

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v10]

2024-08-28 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Shorten the description of concatenation behavior per review comments.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18385/files
  - new: https://git.openjdk.org/jdk/pull/18385/files/a69bf2a9..34f2a21c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=08-09

  Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Withdrawn: 8322256: Define and document GZIPInputStream concatenated stream semantics

2024-08-29 Thread Archie Cobbs

On Tue, 19 Mar 2024 21:48:14 GMT, Archie Cobbs  wrote:

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

This pull request has been closed without being integrated.

-

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v10]

2024-08-29 Thread Archie Cobbs

On Wed, 28 Aug 2024 21:56:50 GMT, Archie Cobbs  wrote:

>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
>> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
>> order to do this, after a GZIP trailer frame is read, it attempts to read a 
>> GZIP header frame and, if successful, proceeds onward to decompress the new 
>> stream. If the attempt to decode a GZIP header frame fails, or happens to 
>> trigger an `IOException`, it just ignores the trailing garbage and/or the 
>> `IOException` and returns EOF.
>> 
>> There are several issues with this:
>> 
>> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
>> trailing garbage are not documented, much less precisely specified.
>> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
>> or other data corruption that an application would rather be notified about. 
>> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
>> data is read, but obviously that doesn't happen in the trailing garbage 
>> scenario (so N concatenated streams where the last one has a corrupted 
>> header frame is indistinguishable from N-1 valid streams).
>> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
>> stream concatenation.
>> 
>> On the other hand, `GZIPInputStream` is an old class with lots of existing 
>> usage, so it's important to preserve the existing behavior, warts and all 
>> (note: my the definition of "existing behavior" here includes the bug fix in 
>> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
>> 
>> So this patch adds a new constructor that takes two new parameters 
>> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
>> enabled by setting both to true; otherwise, they do what they sound like. In 
>> particular, when `allowTrailingGarbage` is false, then the underlying input 
>> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
>> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
>> exception is guaranteed.
>
> Archie Cobbs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Shorten the description of concatenation behavior per review comments.

Thanks for all the helpful comments. Unfortunately due to unrelated/personal 
reasons I need to pause working on this issue.

Closing this PR for now but will plan to possibly reopen sometime in the future.

-

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2318155817

Re: Draft: Deprecate toLowerCase()/toUpperCase() and provide locale insensitive alternative

2023-04-11 Thread Archie Cobbs

On Tue, Apr 11, 2023 at 4:52 PM Glavo  wrote:

> i'm not sure that the methods lowercase() and uppercase() are a good
>> addition.
>> Their names are too close to toLowerCase/toUpperCase thus too easy to
>> misuse ones for the others.
>>
>

> I'm fine with users having to write toLowerCase(Locale.ROOT) or
>> toUpperCase(Locale.Root) because this is already what we teach.
>
>
> This is the current situation, but I don't think it is good.
>
> This idiom forces users to import Locale even in locale-insensitive
> scenarios.
>

Possible compromise? How about toUpper() and toLower()?

This would be consistent with the trend whereby newer replacement names
tend to be less verbose (e.g., Enumeration.hasMoreElements() ->
Iterator.hasNext()), while also being different enough that people
shouldn't mistake them for something that already exists, and will
understand that they must mean something new.

-Archie

-- 
Archie L. Cobbs

Re: Draft: Deprecate toLowerCase()/toUpperCase() and provide locale insensitive alternative

2023-04-13 Thread Archie Cobbs

On Thu, Apr 13, 2023 at 9:14 AM Roger Riggs  wrote:

> And yes, it is a problem to cause many warnings; it creates an immediate
> and pressing need for projects that have tight source requirements and
> don't allow warnings, for example, the JDK itself.
>

Hmm.. I'd agree with this statement if most of the warnings were false
positives. But the argument here is that (quoting): "**Almost all** use
cases of these two methods are misuse"

In that case, isn't there something a little backwards about saying we
should continue sweeping them under the rug? (Am I being too idealistic?)

-Archie

-- 
Archie L. Cobbs

Re: RFR: 8305748: Clarify reentrant behavior of close() in FileInputStream, FileOutputStream, and RandomAccessFile [v3]

2023-05-09 Thread Archie Cobbs

On Fri, 7 Apr 2023 18:24:42 GMT, Archie Cobbs  wrote:

>> IO stream classes like `FileOutputStream` can have assocated NIO channels.
>> 
>> When `close()` is invoked on one of these classes, it in turn invokes 
>> `close()` on the associated channel (if any). But when the associated 
>> channel's `close()` method is invoked, it in turn invokes `close()` on the 
>> associated stream (if any).
>> 
>> As a result, these IO stream `close()` methods invoke themselves reentrantly 
>> when there is an associated channel.
>> 
>> This is not a problem for these classes because they are written to handle 
>> this (i.e., they are idempotent), but it can be surprising (or worse, just 
>> silently bug-inducing) for subclasses that override `close()`.
>> 
>> There are two possible ways to address this:
>> 1. Modify the code to detect and avoid the (unnecessary) reentrant 
>> invocations
>> 2. Add a `@implNote` to the Javadoc so subclass implementers are made aware
>> 
>> This patch takes the second, more conservative approach.
>
> Archie Cobbs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Address review comments.

pingbot

-

PR Comment: https://git.openjdk.org/jdk/pull/13379#issuecomment-1540952824

Integrated: 8305748: Clarify reentrant behavior of close() in FileInputStream, FileOutputStream, and RandomAccessFile

2023-05-10 Thread Archie Cobbs

On Thu, 6 Apr 2023 22:36:33 GMT, Archie Cobbs  wrote:

> IO stream classes like `FileOutputStream` can have assocated NIO channels.
> 
> When `close()` is invoked on one of these classes, it in turn invokes 
> `close()` on the associated channel (if any). But when the associated 
> channel's `close()` method is invoked, it in turn invokes `close()` on the 
> associated stream (if any).
> 
> As a result, these IO stream `close()` methods invoke themselves reentrantly 
> when there is an associated channel.
> 
> This is not a problem for these classes because they are written to handle 
> this (i.e., they are idempotent), but it can be surprising (or worse, just 
> silently bug-inducing) for subclasses that override `close()`.
> 
> There are two possible ways to address this:
> 1. Modify the code to detect and avoid the (unnecessary) reentrant invocations
> 2. Add a `@implNote` to the Javadoc so subclass implementers are made aware
> 
> This patch takes the second, more conservative approach.

This pull request has now been integrated.

Changeset: 0198afca
Author:Archie Cobbs 
Committer: Brian Burkhalter 
URL:   
https://git.openjdk.org/jdk/commit/0198afca3ac1a7c421b0669ae2180eee3e4f1482
Stats: 17 lines in 4 files changed: 16 ins; 0 del; 1 mod

8305748: Clarify reentrant behavior of close() in FileInputStream, 
FileOutputStream, and RandomAccessFile

Reviewed-by: alanb, bpb

-

PR: https://git.openjdk.org/jdk/pull/13379

Re: javac to emit static flag for local and anonymous classes in static contexts

2023-05-22 Thread Archie Cobbs

Hi Liang,

On Sun, May 21, 2023 at 11:29 PM -  wrote:

> Thus, I suggest emitting static flags for the InnerClasses attribute
> of anonymous/local classes in static/pre-initialization contexts,
>

Related to your question on PR#13656 ... local/anonymous classes in a
pre-initialization context (which we're now calling a "pre-construction"
context) are not static, they're just not instantiable until after
superclass construction.

-Archie

-- 
Archie L. Cobbs

Re: RFR: 8304478: Initial nroff manpage generation for JDK 22

2023-06-14 Thread Archie Cobbs

On Wed, 14 Jun 2023 04:53:58 GMT, David Holmes  wrote:

> The following changes, to javac.1, were never applied to the closed sources 
> and are "lost" by this update. These changes will need to be re-applied 
> directly in JDK 21 and JDK 22

Just curious, since you have access to the secret closed sources, can you not 
backport these changes yourself? Instead of just deleting them and expecting 
someone else to rescue them from oblivion?

To be clear, I'm not blaming you for this situation - the problem is that there 
exist "closed sources" at all (why??) - but that doesn't seem like a good 
excuse for deleting work without a clear path to ensuring it is preserved (and 
the simplest way to do that is to do it yourself). Obviously I'd do it myself 
if I could.

Thanks.

-

PR Comment: https://git.openjdk.org/jdk/pull/14462#issuecomment-1591849483

Re: RFR: 8304478: Initial nroff manpage generation for JDK 22

2023-06-14 Thread Archie Cobbs

On Wed, 14 Jun 2023 21:28:01 GMT, David Holmes  wrote:

> > Just curious, since you have access to the secret closed sources, can you 
> > not backport these changes yourself? Instead of just deleting them and 
> > expecting someone else to rescue them from oblivion?
> 
> @archiecobbs we (Oracle) will take care of restoring this text so please 
> don't be concerned about that. It will just be a temporary glitch. 

OK thanks. Putting my trust in you :)

-

PR Comment: https://git.openjdk.org/jdk/pull/14462#issuecomment-1592023452

RFR: 8322027: One XMLStreamException constructor fails to initialize cause

2023-12-13 Thread Archie Cobbs

One of the three `XMLStreamException` constructors that takes a `Throwable` 
fails to pass it to the superclass constructor.

This simple patch fixes that omission.

-

Commit messages:
 - Propagate cause to superclass constructor in XMLStreamException constructor.

Changes: https://git.openjdk.org/jdk/pull/17090/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17090&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8322027
  Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17090.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17090/head:pull/17090

PR: https://git.openjdk.org/jdk/pull/17090

Re: RFR: 8322027: One XMLStreamException constructor fails to initialize cause

2023-12-13 Thread Archie Cobbs

On Wed, 13 Dec 2023 20:06:03 GMT, Archie Cobbs  wrote:

> One of the three `XMLStreamException` constructors that takes a `Throwable` 
> fails to pass it to the superclass constructor.
> 
> This simple patch fixes that omission.
> 
> It's worth considering if there is any code out there that is working around 
> this problem already by invoking `initCause()` manually. If so, that code 
> would start throwing an `IllegalStateException` after this change.
> 
> So a more conservative fix would be to just add another constructor taking 
> the same arguments in a different order. But then again that's not much 
> better than just saying "always use initCause() with the broken constructor", 
> i.e., don't change anything.
> 
> Hmm.

After filing this PR, I had some second thoughts and added them to the summary 
but by then it was too late for the `core-libs-dev` email, so I'll repeat here 
in case anyone has some comments:

> It's worth considering if there is any code out there that is working around 
> this problem already by invoking `initCause()` manually. If so, that code 
> would start throwing an `IllegalStateException` after this change.
> 
> So a more conservative fix would be to just add another constructor taking 
> the same arguments in a different order. But then again that's not much 
> better than just saying "always use `initCause()` with the broken 
> constructor", i.e., don't change anything.
> 
> Hmm.

-

PR Comment: https://git.openjdk.org/jdk/pull/17090#issuecomment-1854723627

Re: [External] : Re: Introduce constructor for PriorityQueue with existing collection and custom comparator

2023-12-14 Thread Archie Cobbs

On Thu, Dec 14, 2023 at 4:56 AM Viktor Klang 
wrote:

> I presume that the precondition to have the original collection be
> pre-ordered according to the supplied Comparator can be verified by
> checking before adding each element in the collection to the PQ that it
> compareTo equal-or-greater to the previous one?
>

Hmm...  something is not adding up.

Either (a) there is a precondition that the collection already be sorted or
(b) there's no such precondition.

In case (a):

   - Why isn't the new constructor invoking initElementsFromCollection()
   instead of initFromCollection()?
   - The precondition is not very obvious from the Javadoc description, and
   moreover what happens when the precondition is not met is not documented at
   all.

In case (b):

   - The Javadoc is misleading because it (ambiguously) implies there is a
   precondition with the wording "collection that orders its elements
   according to the specified comparator" (the referent of "that" is ambiguous
   - does it refer to the collection or the PriorityQueue?)

>From the PR description it seems clear that there is no such precondition.
So maybe the Javadoc should say this instead:

Creates a {@code PriorityQueue} containing the elements in the specified
> collection. The {@code PriorityQueue} will order its elements according to
> the specified comparator.
>

-Archie

-- 
Archie L. Cobbs

Re: RFR: 8322027: One XMLStreamException constructor fails to initialize cause

2023-12-14 Thread Archie Cobbs

On Wed, 13 Dec 2023 20:06:03 GMT, Archie Cobbs  wrote:

> One of the three `XMLStreamException` constructors that takes a `Throwable` 
> fails to pass it to the superclass constructor.
> 
> This simple patch fixes that omission.
> 
> It's worth considering if there is any code out there that is working around 
> this problem already by invoking `initCause()` manually. If so, that code 
> would start throwing an `IllegalStateException` after this change.
> 
> So a more conservative fix would be to just add another constructor taking 
> the same arguments in a different order. But then again that's not much 
> better than just saying "always use initCause() with the broken constructor", 
> i.e., don't change anything.
> 
> Hmm.

I agree - let's go the CSR route.

-

PR Comment: https://git.openjdk.org/jdk/pull/17090#issuecomment-1856510107

Re: Bug in GZIPInputStream.read() causing data loss

2023-12-14 Thread Archie Cobbs

Hi Louis,

On first glance this looks easy to fix. I've filed a draft PR here (pending
tests) https://github.com/openjdk/jdk/pull/17113

-Archie

On Thu, Dec 14, 2023 at 1:10 PM Louis Bergelson 
wrote:

> Hello.  This is my first time posting here so I apologize if this is the
> wrong forum.  I wanted to bring up an issue in the GZipInputStream which
> was first identified in 2011, confirmed as a bug, and then never resolved.
>
> When reading certain GZIP files from certain types of InputStreams the
> GZIPInputStream can misidentify the end of the stream and close early
> resulting in silently truncated data.
>
> You can see the bug report which has a detailed description here:
> https://bugs.openjdk.org/browse/JDK-7036144
>
> In short it comes down to incorrect use of the (quite confusing)
> InputStream.available() method to detect the end of stream.  This typically
> works fine with local files, but with network streams that might not have
> bytes available at any given moment it fails nondeterministically.
>
> How could I go about getting this fixed?  I can contribute a patch or
> additional examples if necessary.
>
> Genomics data is typically encoded as block gzipped files, so this comes
> up regularly and causes a lot of confusion.  The workaround is to just not
> use the GZIPInput stream.  It seems like a core java class though so it
> would be nice if it worked.
>
> Thank you,
> Louis
>


-- 
Archie L. Cobbs

Re: [External] : Re: Introduce constructor for PriorityQueue with existing collection and custom comparator

2023-12-14 Thread Archie Cobbs

Looks great - thanks (I'm not an official reviewer so I can't approve it
though).

-Archie

On Thu, Dec 14, 2023 at 2:37 PM Valeh Hajiyev 
wrote:

> Yes, there's no such precondition. Thanks for having a look, I updated the
> javadoc as you suggested, and linked it to the old existing ticket. It's
> now ready for review.
>
> I would appreciate if you could have a look again.
>
> Cheers,
> Valeh
>
> On Thu, Dec 14, 2023 at 4:22 PM Archie Cobbs 
> wrote:
>
>> On Thu, Dec 14, 2023 at 4:56 AM Viktor Klang 
>> wrote:
>>
>>> I presume that the precondition to have the original collection be
>>> pre-ordered according to the supplied Comparator can be verified by
>>> checking before adding each element in the collection to the PQ that it
>>> compareTo equal-or-greater to the previous one?
>>>
>>
>> Hmm...  something is not adding up.
>>
>> Either (a) there is a precondition that the collection already be sorted
>> or (b) there's no such precondition.
>>
>> In case (a):
>>
>>- Why isn't the new constructor invoking initElementsFromCollection()
>>instead of initFromCollection()?
>>- The precondition is not very obvious from the Javadoc description,
>>and moreover what happens when the precondition is not met is not
>>documented at all.
>>
>> In case (b):
>>
>>- The Javadoc is misleading because it (ambiguously) implies there is
>>a precondition with the wording "collection that orders its elements
>>according to the specified comparator" (the referent of "that" is 
>> ambiguous
>>- does it refer to the collection or the PriorityQueue?)
>>
>> From the PR description it seems clear that there is no such
>> precondition. So maybe the Javadoc should say this instead:
>>
>> Creates a {@code PriorityQueue} containing the elements in the specified
>>> collection. The {@code PriorityQueue} will order its elements according to
>>> the specified comparator.
>>>
>>
>> -Archie
>>
>> --
>> Archie L. Cobbs
>>
>

-- 
Archie L. Cobbs

RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream

2023-12-14 Thread Archie Cobbs

`GZIPInputStream`, when looking for a concatenated stream, relies on what the 
underlying `InputStream` says is how many bytes are `available()`. But this is 
inappropriate because `InputStream.available()` is just an estimate and is 
allowed (for example) to always return zero.

The fix is to ignore what's `available()` and just proceed and see what 
happens. If fewer bytes are available than required, the attempt to extend to 
another stream is canceled just as it was before, e.g., when the next stream 
header couldn't be read.

-

Commit messages:
 - Fix bug in GZIPInputStream when underlying available() returns short.

Changes: https://git.openjdk.org/jdk/pull/17113/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-7036144
  Stats: 101 lines in 2 files changed: 86 ins; 9 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v2]

2023-12-14 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Address review comments.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/c07554d0..c7087e55

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=00-01

  Stats: 29 lines in 1 file changed: 6 ins; 7 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v2]

2023-12-14 Thread Archie Cobbs

On Thu, 14 Dec 2023 21:14:33 GMT, Eirik Bjorsnos  wrote:

>> Archie Cobbs has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Address review comments.
>
> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 37:
> 
>> 35: public static void main(String [] args) throws IOException {
>> 36: 
>> 37: // Create gz data
> 
> Perhaps expand the comment to explain that you are creating a concatenated 
> stream?

Thanks - should be addressed in c7087e55319.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 54:
> 
>> 52: try (GZIPInputStream in = new GZIPInputStream(new 
>> ByteArrayInputStream(gz32))) {
>> 53: count1 = countBytes(in);
>> 54: }
> 
> Consider rewriting countBytes to take the 
> ByteArrayInputStream/ZeroAvailableInputStream as a parameter, so you could 
> just do:
> 
>  ```suggestion
> long count1 = countBytes(new ByteArrayInputStream(gz32));

Thanks - should be addressed in c7087e55319.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 56:
> 
>> 54: }
>> 55: 
>> 56: // (a) Read it from a stream where available() always returns 
>> zero
> 
> Suggestion:
> 
> // (b) Read it from a stream where available() always returns zero

Thanks - should be addressed in c7087e55319.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 64:
> 
>> 62: // They should be the same
>> 63: if (count2 != count1)
>> 64: throw new AssertionError(count2 + " != " + count1);
> 
> Consider converting the test to JUnit, this would allow:
> Suggestion:
> 
> asssertEquals(count1, count2);

Thanks - should be addressed in c7087e55319.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1427507217
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1427507107
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1427507154
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1427507189

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v3]

2023-12-15 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Address second round of review comments.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/c7087e55..074b8455

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=01-02

  Stats: 40 lines in 1 file changed: 13 ins; 6 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v2]

2023-12-15 Thread Archie Cobbs

On Fri, 15 Dec 2023 14:09:12 GMT, Eirik Bjorsnos  wrote:

>> Archie Cobbs has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Address review comments.
>
> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 52:
> 
>> 50: buf.reset();
>> 51: for(int i = 0; i < 100; i++)
>> 52: buf.write(gz);
> 
> Whitespace after `for`, braces are recommended even when optional in the 
> language:
> 
> Suggestion:
> 
> for (int i = 0; i < 100; i++) {
> buf.write(gz);
> }

Thanks - should be addressed in 074b8455b11.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 53:
> 
>> 51: for(int i = 0; i < 100; i++)
>> 52: buf.write(gz);
>> 53: final byte[] gz32 = buf.toByteArray();
> 
> Drop final, consider finding a more expressive name:
> 
> Suggestion:
> 
> byte[] concatenatedGz = buf.toByteArray();

Thanks - should be addressed in 074b8455b11.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 56:
> 
>> 54: 
>> 55: // (a) Read it back from a stream where available() is accurate
>> 56: long count1 = countBytes(new GZIPInputStream(new 
>> ByteArrayInputStream(gz32)));
> 
> This is the expected number of bytes to be read. This could be calculated 
> directly from the uncompressed data. At least the name should express that 
> this is the expected number of bytes:
> 
> Suggestion:
> 
> long expectedBytes = countBytes(new GZIPInputStream(new 
> ByteArrayInputStream(gz32)));
> 
> 
> Similarly, `count2` could be renamed `actualBytes`

Thanks - should be addressed in 074b8455b11.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 62:
> 
>> 60: 
>> 61: // They should be the same
>> 62: Assert.assertEquals(count2, count1);
> 
> This could be a static import. And for the assertion failure message to be 
> correct, the expected value must come first:
> 
> Suggestion:
> 
> assertEquals(expectedBytes, actualBytes);
> 
> 
> An alternative here could be to store the original uncompressed data and 
> compare that to the full inflated  data read from GZIPInputStream using 
> assertArrayEquals. The length alone is a bit of a weak proxy for equality.

Thanks - should be addressed in 074b8455b11.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428174716
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428175505
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428174894
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428175182

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v3]

2023-12-15 Thread Archie Cobbs

On Fri, 15 Dec 2023 20:38:19 GMT, Eirik Bjorsnos  wrote:

> The test is shaping up nicely. Since it's a new test it should use JUnit 5.
> 
> Also included a couple suggestions, I'll stop now, promise! :)

No prob - they're all reasonable suggestions :)

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 30:
> 
>> 28:  */
>> 29: 
>> 30: import org.junit.Test;
> 
> Let's use JUnit 5:
> Suggestion:
> 
> import org.junit.jupiter.api.Test;

Should be fixed in cf457eff38c.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 36:
> 
>> 34: import java.util.zip.*;
>> 35: 
>> 36: import static org.junit.Assert.assertArrayEquals;
> 
> Let's use JUnit 5:
> 
> Suggestion:
> 
> import static org.junit.jupiter.api.Assertions.assertArrayEquals;

Should be fixed in cf457eff38c.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 53:
> 
>> 51: byte[] compressedN = repeat(compressed1, NUM_COPIES);
>> 52: 
>> 53: // (a) Read back copied compressed data from a stream where 
>> available() is accurate and verify
> 
> Suggestion:
> 
> // (a) Read back inflated data from a stream where available() is 
> accurate and verify

Should be fixed in cf457eff38c.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 54:
> 
>> 52: 
>> 53: // (a) Read back copied compressed data from a stream where 
>> available() is accurate and verify
>> 54: byte[] readback1 = new GZIPInputStream(new 
>> ByteArrayInputStream(compressedN)).readAllBytes();
> 
> These readback lines got rather long. Perhaps consider extracting the GZIP 
> reading into a method taking the source InputStream as a parameter?
> 
> Suggestion:
> 
> byte[] readback1 = inflateFrom(new ByteArrayInputStream(compressedN));

Should be fixed in cf457eff38c.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 57:
> 
>> 55: assertArrayEquals(uncompressedN, readback1);
>> 56: 
>> 57: // (b) Read back copied compressed data from a stream where 
>> available() always returns zero and verify
> 
> Suggestion:
> 
> // (b) Read back inflated data from a stream where available() always 
> returns zero and verify

Should be fixed in cf457eff38c.

> test/jdk/java/util/zip/GZIP/GZIPInputStreamAvailable.java line 58:
> 
>> 56: 
>> 57: // (b) Read back copied compressed data from a stream where 
>> available() always returns zero and verify
>> 58: byte[] readback2 = new GZIPInputStream(new 
>> ZeroAvailableStream(new ByteArrayInputStream(compressedN))).readAllBytes();
> 
> Suggestion:
> 
> byte[] readback2 = inflateFrom(new ZeroAvailableStream(new 
> ByteArrayInputStream(compressedN)));

Should be fixed in cf457eff38c.

-

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1858489270
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428486646
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428487198
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428486708
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428487465
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428486783
PR Review Comment: https://git.openjdk.org/jdk/pull/17113#discussion_r1428486924

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v4]

2023-12-15 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Address third round of review comments.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/074b8455..cf457eff

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=02-03

  Stats: 12 lines in 1 file changed: 4 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v4]

2023-12-16 Thread Archie Cobbs

On Sat, 16 Dec 2023 05:53:34 GMT, Bernd  wrote:

> I wonder: if the stream does no longer depend on this `available()` condition 
> to be true, does that mean it’s no longer (indirectly) verified?

I'm not sure I understand ... what do you mean by "verified"?

If what you're saying is "Previously we were implicitly verifying that the data 
reported by `available()` was actually there, and now we're no longer verifying 
that" then that's not correct. The code that was previously conditional on an 
`available()` check was just catching and discarding `IOException`, so even if 
`available()` was lying before, that would not have ever been detected.

-

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1858883789

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v4]

2023-12-17 Thread Archie Cobbs

On Sun, 17 Dec 2023 13:47:00 GMT, Eirik Bjorsnos  wrote:

> The current behavior of allowing/ignoring trailing malformed data seems to 
> have a complicated history...

Thanks for researching all of that. I agree this should be cleaned up and have 
created [JDK-8322256](https://bugs.openjdk.org/browse/JDK-8322256) to that end.

-

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1859210088

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v4]

2023-12-19 Thread Archie Cobbs

On Tue, 19 Dec 2023 08:11:42 GMT, Bernd  wrote:

> > If what you're saying is "Previously we were implicitly verifying that the 
> > data reported by `available()` was actually there, and now we're no longer 
> > verifying that" then that's not correct.
> 
> I mean it verified the non-zero behavior, not that the data length was 
> correct. Not sure if that is somewhere tested now.

Ok gotcha.

The test `GZIPInputStreamRead.java` verifies that `available()` returns zero 
once all of the compressed data has been read out. So this verifies 
`available()` at the end of a stream. It doesn't appear there are any tests 
which verify `available()` in the middle of a stream. Adding such a test would 
be a great idea but is beyond the scope of this bug of course.

In any case, I don't think the code that was there before was providing much in 
the way of implicit testing of `available()` either:

// try concatenated case
if (this.in.available() > 0 || n > 26) {
int m = 8;  // this.trailer
try {
m += readHeader(in);// next.header
} catch (IOException ze) {
return true;  // ignore any malformed, do nothing
}
inf.reset();
if (n > m)
inf.setInput(buf, len - n + m, n - m);
return false;
}
return true;

As you can see, in the previous version, when `available() > 0`, there would be 
no noticeable side effect if there was actually less data than that ("do 
nothing").

-

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1863006155

Integrated: 8322027: One XMLStreamException constructor fails to initialize cause

2024-01-02 Thread Archie Cobbs

On Wed, 13 Dec 2023 20:06:03 GMT, Archie Cobbs  wrote:

> One of the three `XMLStreamException` constructors that takes a `Throwable` 
> fails to pass it to the superclass constructor.
> 
> This simple patch fixes that omission.
> 
> It's worth considering if there is any code out there that is working around 
> this problem already by invoking `initCause()` manually. If so, that code 
> would start throwing an `IllegalStateException` after this change.
> 
> So a more conservative fix would be to just add another constructor taking 
> the same arguments in a different order. But then again that's not much 
> better than just saying "always use initCause() with the broken constructor", 
> i.e., don't change anything.
> 
> Hmm.

This pull request has now been integrated.

Changeset: 5852f3ea
Author:Archie Cobbs 
Committer: Jaikiran Pai 
URL:   
https://git.openjdk.org/jdk/commit/5852f3eafe4509a064c727371962ff249886e115
Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod

8322027: One XMLStreamException constructor fails to initialize cause

Reviewed-by: joehw, jpai

-

PR: https://git.openjdk.org/jdk/pull/17090

Re: RFR: 8322027: One XMLStreamException constructor fails to initialize cause

2024-01-02 Thread Archie Cobbs

On Tue, 2 Jan 2024 06:47:52 GMT, Jaikiran Pai  wrote:

>> One of the three `XMLStreamException` constructors that takes a `Throwable` 
>> fails to pass it to the superclass constructor.
>> 
>> This simple patch fixes that omission.
>> 
>> It's worth considering if there is any code out there that is working around 
>> this problem already by invoking `initCause()` manually. If so, that code 
>> would start throwing an `IllegalStateException` after this change.
>> 
>> So a more conservative fix would be to just add another constructor taking 
>> the same arguments in a different order. But then again that's not much 
>> better than just saying "always use initCause() with the broken 
>> constructor", i.e., don't change anything.
>> 
>> Hmm.
>
> Hello Archie, the changes look fine to me. I see that Joe has approved this 
> change and the CSR too has been approved. I've triggered a CI run to verify 
> there's no unexpected issues. Once that completes, I'll go ahead and sponsor 
> this.

@jaikiran - thanks for the review!

-

PR Comment: https://git.openjdk.org/jdk/pull/17090#issuecomment-1874251561

Is MessageFormat.toPattern() supposed to be reversible?

2024-01-13 Thread Archie Cobbs

Clarification question regarding MessageFormat.toPattern() before I file a
bug...

Is it supposed to be the case that given a MessageFormat object fmt, new
MessageFormat(fmt.toPattern()) is equivalent to the original fmt object?

Thanks,
-Archie

-- 
Archie L. Cobbs

RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern

2024-01-14 Thread Archie Cobbs

Please consider this fix to ensure that going from `MessageFormat` to pattern 
string via `toPattern()` and then back via `new MessageFormat()` results in a 
format that is equivalent to the original.

The quoting and escaping rules for `MessageFormat` pattern strings are really 
tricky. I admit not completely understanding them. At a high level, they work 
like this: The normal way one would "nest" strings containing special 
characters is with straightforward recursive escaping like with the `bash` 
command line. For example, if you want to echo `a "quoted string" example` then 
you enter `echo "a "quoted string" example"`. With this scheme it's always the 
"outer" layer's job to (un)escape special characters as needed. That is, the 
echo command never sees the backslash characters.

In contrast, with `MessageFormat` and friends, nested subformat pattern strings 
are always provided "pre-escaped". So to build an "outer" string (e.g., for 
`ChoiceFormat`) the "inner" subformat pattern strings are more or less just 
concatenated, and then only the `ChoiceFormat` option separator characters 
(e.g., `<`, `#`, `|`, etc.) are escaped.

The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
indicates the beginning of a format argument. However, it doesn't escape `}` 
characters. This is OK because the format string parser treats any "extra" 
closing braces (where "extra" means not matching an opening brace) as plain 
characters.

So far, so good... at least, until a format string containing an extra closing 
brace is nested inside a larger format string, where the extra closing brace, 
which was previously "extra", can now suddenly match an opening brace in the 
outer pattern containing it, thus truncating it by "stealing" the match from 
some subsequent closing brace.

An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
{1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
trailing closing brace in plain text. If you create a `MessageFormat` with this 
string, you see a trailing `}` only with the second option.

However, if you then invoke `toPattern()`, the result is `"{0,choice,0.0#option 
A: {1}|1.0#option B: {1}}}"`. Oops, now because the "extra" closing brace is no 
longer quoted, it matches the opening brace at the beginning of the string, and 
the following closing  brace, which was the previous match, is now just plain 
text in the outer `MessageFormat` string.

As a result, invoking `f.format(new Object{} { 0, 5 })` will return `"option A: 
5"` using the original `MessageFormat` but `"option A: 5}"` from a new one 
created with the string from `toPattern()`.

This patch fixes this problem by adding quotes around "extra" closing braces in 
the subformat pattern strings.

-

Commit messages:
 - Quote runs of extra unquoted closing braces in subformat patterns.

Changes: https://git.openjdk.org/jdk/pull/17416/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8323699
  Stats: 103 lines in 2 files changed: 100 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: Is MessageFormat.toPattern() supposed to be reversible?

2024-01-14 Thread Archie Cobbs

Nevermind, I believe now it is and have filed JDK-8323699. Thanks.

On Sat, Jan 13, 2024 at 12:45 PM Archie Cobbs 
wrote:

> Clarification question regarding MessageFormat.toPattern() before I file a
> bug...
>
> Is it supposed to be the case that given a MessageFormat object fmt, new
> MessageFormat(fmt.toPattern()) is equivalent to the original fmt object?
>
> Thanks,
> -Archie
>
> --
> Archie L. Cobbs
>


-- 
Archie L. Cobbs

Re: [External] : Re: [POTENTIAL BUG] Potential FIFO violation in BlockingQueue under high contention and suggestion for fair mode in ArrayBlockingQueue and LinkedBlockingQueue

2024-09-05 Thread Archie Cobbs

Hi Kim,

I think there may be an issue with your test. Specifically, this code is
suspect:

// Wait for the producer thread to enter BLOCKED state
// This ensures that the thread is waiting on the full queue
while (thread.getState() == State.RUNNABLE || thread.getState() ==
State.NEW);

I don't think that actually guarantees that the thread is blocked in put().

I was able to reproduce the bug using ArrayBlockingQueue(), but could no
longer reproduce it after I changed the above code to this:

// Wait for the producer thread to block in queue.put()
// This ensures that the thread is waiting on the full queue
while (!queue.isPutting(thread));

after also making this change to ArrayBlockingQueue.java:

---
a/src/java.base/share/classes/java/util/concurrent/ArrayBlockingQueue.java
+++
b/src/java.base/share/classes/java/util/concurrent/ArrayBlockingQueue.java
@@ -365,10 +365,31 @@ public void put(E e) throws InterruptedException {
 Objects.requireNonNull(e);
 final ReentrantLock lock = this.lock;
 lock.lockInterruptibly();
+final boolean added = puttingThreads.add(Thread.currentThread());
 try {
 while (count == items.length)
 notFull.await();
 enqueue(e);
+} finally {
+if (added)
+puttingThreads.remove(Thread.currentThread());
+lock.unlock();
+}
+}
+
+private final java.util.HashSet puttingThreads = new
java.util.HashSet<>();
+
+/**
+ * Test for putting thread.
+ *
+ * @param thread thread to test
+ * @return true if thread is a putting thread
+ */
+public boolean isPutting(Thread thread) {
+final ReentrantLock lock = this.lock;
+lock.lock();
+try {
+return puttingThreads.contains(thread);
 } finally {
 lock.unlock();
 }

So the original test may not be correct due to Java memory model
subtleties, etc.

-Archie

-- 
Archie L. Cobbs

Re: [External] : Re: [POTENTIAL BUG] Potential FIFO violation in BlockingQueue under high contention and suggestion for fair mode in ArrayBlockingQueue and LinkedBlockingQueue

2024-09-05 Thread Archie Cobbs

Apologies in advance if I'm misunderstanding anything...

On Thu, Sep 5, 2024 at 2:05 PM Viktor Klang 
wrote:

>  Thread state polling aside, for as long as Condition::await() is allowed
> to spuriously wake, FIFO just cannot be "guaranteed".

What about this statement in the Javadoc for ReentrantLock.newCondition():

The ordering of lock reacquisition for threads returning from waiting
> methods is the same as for threads initially acquiring the lock, which is
> in the default case not specified, but for *fair* locks favors those
> threads that have been waiting the longest.

So what you're saying is that a spurious wakeup on a Condition is not the
same thing as a spurious signal() on a Condition; if it were, then the
above statement would apply and FIFO ordering would be preserved.

Of course, a spurious wakeup would not find the condition being waited on
satisfied unless there was a big coincidence. So an ordering violation that
actually mattered should be exceedingly rare.

Anyway, this does seem to be a glitch in how things are supposed to work.
That is: there can be no guaranteed ordering for Condition waiters when
there can be spurious wakeups.

Maybe this corner case should be documented. Or better yet, fix the bug by
requiring Condition to "filter out" spurious wakeups if preserving FIFO
ordering (it should be possible).

-Archie

-- 
Archie L. Cobbs

Re: [External] : Re: [POTENTIAL BUG] Potential FIFO violation in BlockingQueue under high contention and suggestion for fair mode in ArrayBlockingQueue and LinkedBlockingQueue

2024-09-07 Thread Archie Cobbs

Hi Kim,

On Sat, Sep 7, 2024 at 10:36 AM 김민주  wrote:

> Here's a clearer outline of the scenario:
>
>- Threads T1 to T10 are waiting on Condition::await() because the
>queue is full.
>- T11 calls take() and holds the lock via lock.lockInterruptibly().
>- T12 calls queue.put() and enters the wait queue for
>lock.lockInterruptibly(). (As I understand, the wait queue for
>ReentrantLock and Condition are separate.)
>- T11 reduces the count and sends a signal, then releases the lock.
>- T1 receives the signal and moves to the lock queue. Since the
>ReentrantLock is in fair mode, T12 (which was already waiting) gets
>priority, and T1 will acquire the lock later.
>- T12 acquires the lock and successfully enqueues.
>
> From one reading of the Javadoc, your step #5 ("T12 gets priority") is not
supposed to happen that way. Instead, one of T1 through T10 should be
guaranteed to acquire the lock.

Here it is again (from ReentrantLock.newCondition()):

The ordering of lock reacquisition for threads returning from waiting
> methods is the same as for threads initially acquiring the lock, which is
> in the default case not specified, but for *fair* locks favors those
> threads that have been waiting the longest.

But part of the problem here is that this documentation is ambiguous.

The ambiguity is: are ALL threads trying to acquire the lock, whether on an
initial attempt or after a condition wakeup, ordered for fairness together
in one big pool? → In this case step #5 can't happen. Call this
Interpretation A.

Or is this merely saying that threads waiting on a condition reacquire the
lock based on when they started waiting on the condition, but there are no
ordering guarantees between those threads and any other unrelated threads
trying to acquire the lock? → In this case step #5 can happen. Call this
Interpretation B.

So I think we first need to clarify which interpretation is correct here, A
or B.

On that point, Victor said this:

I've re-read ReentrantLock and AQS, and from my understanding on the logic
> the Condition's place in the wait queue should be maintained, which means
> that T3 shouldn't be able to "barge".

So it sounds like Victor is confirming interpretation A. Victor do you
agree?

If so, then it seems like we need to do two things:

1. File a Jira ticket to clarify the Javadoc, e.g. to say something like
this:

The ordering of lock reacquisition for threads returning from waiting
> methods is the same as for threads initially acquiring the lock, which is
> in the default case not specified, but for *fair* locks favors those
> threads that have been waiting the longest. *In the latter case, the
> ordering consideration includes all threads attempting to acquire the lock,
> regardless of whether or not they were previously blocked on the condition.*
>

2. Understand why Kim's updated test case is still failing (it must be
either a bug in the test or a bug in ArrayBlockingQueue).

-Archie

-- 
Archie L. Cobbs

Re: [External] : Re: [POTENTIAL BUG] Potential FIFO violation in BlockingQueue under high contention and suggestion for fair mode in ArrayBlockingQueue and LinkedBlockingQueue

2024-09-08 Thread Archie Cobbs

and the next node in AQS is unparked.
>
>[Now, AQS holds T1, while ConditionNode holds T2 to T10.]
>9.
>
>T1, having reacquired the lock after Condition::await(), fails to exit
>the while loop and waits again.
>
>[Now, ConditionNode holds T1 and T2 to T10.]
>
>
> This is how I currently understand the situation.
>
> If there are any mistakes in my understanding, I would greatly appreciate
> your clarification.
>
> Best Regards,
>
>  Kim Minju
>
> 2024. 9. 8. 오전 3:34, Archie Cobbs  작성:
>
> Hi Kim,
>
> On Sat, Sep 7, 2024 at 10:36 AM 김민주  wrote:
>
>> Here's a clearer outline of the scenario:
>>
>>- Threads T1 to T10 are waiting on Condition::await() because the
>>queue is full.
>>- T11 calls take() and holds the lock via lock.lockInterruptibly().
>>- T12 calls queue.put() and enters the wait queue for
>>lock.lockInterruptibly(). (As I understand, the wait queue for
>>ReentrantLock and Condition are separate.)
>>- T11 reduces the count and sends a signal, then releases the lock.
>>- T1 receives the signal and moves to the lock queue. Since the
>>ReentrantLock is in fair mode, T12 (which was already waiting) gets
>>priority, and T1 will acquire the lock later.
>>- T12 acquires the lock and successfully enqueues.
>>
>> From one reading of the Javadoc, your step #5 ("T12 gets priority") is
> not supposed to happen that way. Instead, one of T1 through T10 should be
> guaranteed to acquire the lock.
>
> Here it is again (from ReentrantLock.newCondition()):
>
> The ordering of lock reacquisition for threads returning from waiting
>> methods is the same as for threads initially acquiring the lock, which is
>> in the default case not specified, but for *fair* locks favors those
>> threads that have been waiting the longest.
>
>
> But part of the problem here is that this documentation is ambiguous.
>
> The ambiguity is: are ALL threads trying to acquire the lock, whether on
> an initial attempt or after a condition wakeup, ordered for fairness
> together in one big pool? → In this case step #5 can't happen. Call this
> Interpretation A.
>
> Or is this merely saying that threads waiting on a condition reacquire the
> lock based on when they started waiting on the condition, but there are no
> ordering guarantees between those threads and any other unrelated threads
> trying to acquire the lock? → In this case step #5 can happen. Call this
> Interpretation B.
>
> So I think we first need to clarify which interpretation is correct here,
> A or B.
>
> On that point, Victor said this:
>
> I've re-read ReentrantLock and AQS, and from my understanding on the logic
>> the Condition's place in the wait queue should be maintained, which means
>> that T3 shouldn't be able to "barge".
>
>
> So it sounds like Victor is confirming interpretation A. Victor do you
> agree?
>
> If so, then it seems like we need to do two things:
>
> 1. File a Jira ticket to clarify the Javadoc, e.g. to say something like
> this:
>
> The ordering of lock reacquisition for threads returning from waiting
>> methods is the same as for threads initially acquiring the lock, which is
>> in the default case not specified, but for *fair* locks favors those
>> threads that have been waiting the longest. *In the latter case, the
>> ordering consideration includes all threads attempting to acquire the lock,
>> regardless of whether or not they were previously blocked on the condition.*
>>
>
> 2. Understand why Kim's updated test case is still failing (it must be
> either a bug in the test or a bug in ArrayBlockingQueue).
>
> -Archie
>
> --
> Archie L. Cobbs
>
>
>

-- 
Archie L. Cobbs

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v5]

2024-01-16 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains five additional commits since 
the last revision:

 - Merge branch 'master' into JDK-7036144
 - Address third round of review comments.
 - Address second round of review comments.
 - Address review comments.
 - Fix bug in GZIPInputStream when underlying available() returns short.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/cf457eff..4f1a0459

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=03-04

  Stats: 35310 lines in 1143 files changed: 22955 ins; 7236 del; 5119 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern

2024-01-17 Thread Archie Cobbs

On Tue, 16 Jan 2024 22:05:03 GMT, Justin Lu  wrote:

>> Please consider this fix to ensure that going from `MessageFormat` to 
>> pattern string via `toPattern()` and then back via `new MessageFormat()` 
>> results in a format that is equivalent to the original.
>> 
>> The quoting and escaping rules for `MessageFormat` pattern strings are 
>> really tricky. I admit not completely understanding them. At a high level, 
>> they work like this: The normal way one would "nest" strings containing 
>> special characters is with straightforward recursive escaping like with the 
>> `bash` command line. For example, if you want to echo `a "quoted string" 
>> example` then you enter `echo "a "quoted string" example"`. With this scheme 
>> it's always the "outer" layer's job to (un)escape special characters as 
>> needed. That is, the echo command never sees the backslash characters.
>> 
>> In contrast, with `MessageFormat` and friends, nested subformat pattern 
>> strings are always provided "pre-escaped". So to build an "outer" string 
>> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
>> less just concatenated, and then only the `ChoiceFormat` option separator 
>> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
>> 
>> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
>> indicates the beginning of a format argument. However, it doesn't escape `}` 
>> characters. This is OK because the format string parser treats any "extra" 
>> closing braces (where "extra" means not matching an opening brace) as plain 
>> characters.
>> 
>> So far, so good... at least, until a format string containing an extra 
>> closing brace is nested inside a larger format string, where the extra 
>> closing brace, which was previously "extra", can now suddenly match an 
>> opening brace in the outer pattern containing it, thus truncating it by 
>> "stealing" the match from some subsequent closing brace.
>> 
>> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
>> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
>> trailing closing brace in plain text. If you create a `MessageFormat` with 
>> this string, you see a trailing `}` only with the second option.
>> 
>> However, if you then invoke `toPattern()`, the result is 
>> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
>> "extra" closing brace is no longer quoted, it matches the opening brace at 
>> the beginning of the string, and the following closing  brace, which was the 
>> previous match, is now just plain text in the outer `MessageFormat` string.
>> 
>> As a result, invoking `f.format(new ...
>
> Hi Archie, thanks for the proposed fix. I am still taking a look, but I 
> wanted to demonstrate a current issue,
> 
> (Jshell with your patch)
> 
> 
> var pattIn = "Test: {0,number,foo'{'#.00}";
> MessageFormat mFmt = new MessageFormat(pattIn);
> var pattOut  = mFmt.toPattern(); // returns "Test: {0,number,foo{#.00}";
> 
> 
> 
> var pattIn = "Test: {0,number,foo'}'#.00}";
> MessageFormat mFmt = new MessageFormat(pattIn);
> var pattOut  = mFmt.toPattern(); // returns "Test: {0,number,foo'}'#.00}";
> 
> 
> As it stands, it would be inconsistent to have the closing bracket quoted and 
> the opening bracket not quoted. 
> 
> Also in response to your earlier question on core-libs-dev, ideally invoking 
> toPattern() can roundtrip, but there are known issues, such as a custom user 
> defined Format subclass, or one of the newer Format subclasses that do not 
> implement the toPattern() method. I am working on making this apparent in the 
> specification of the method in a separate issue.

Hi @justin-curtis-lu,

Thanks a lot for taking a look, I am glad for any other set of eyes on this 
tricky stuff!

> As it stands, it would be inconsistent to have the closing bracket quoted and 
> the opening bracket not quoted.

You're right - the problem exists with both `{` and `}`, as is shown here 
(unmodified jshell):


$ jshell
|  Welcome to JShell -- Version 17.0.9
|  For an introduction type: /help intro

jshell> import java.text.*;

jshell> new MessageFormat("Test: {0,number,foo'{'#.00}");
$2 ==> java.text.MessageFormat@951c5b58

jshell> $2.toPattern()
$3 ==> "Test: {0,number,foo{#.00}"

jshell> new MessageFormat($3)
|  Exception java.lang.IllegalArgumentException: Unmatched braces in the 
pattern.
|at MessageFormat.applyPattern (MessageFormat.java:521)
|at MessageFormat. (MessageFormat.java:371)
|at (#4:1)


I've been missing the forest for the trees a bit and I think the fix can be 
simpler now.

For the record, here is my latest understanding of what's going on...

1. When `MessageFormat.toPattern()` constructs the combined pattern string, it 
concatenates the plain text bits, suitably escaped, and the pattern strings 
from each subformat.
1. The subformat strings are already escaped, in the sense that you can take 
(for example) a `ChoiceFormat` format

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v2]

2024-01-17 Thread Archie Cobbs

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Quote '{' and '}' in subformat patterns, but only it not already quoted.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17416/files
  - new: https://git.openjdk.org/jdk/pull/17416/files/688f5748..a9d78c76

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=00-01

  Stats: 344 lines in 4 files changed: 300 ins; 18 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern

2024-01-18 Thread Archie Cobbs

On Thu, 18 Jan 2024 20:08:27 GMT, Justin Lu  wrote:

>> Hi @justin-curtis-lu,
>> 
>> Thanks a lot for taking a look, I am glad for any other set of eyes on this 
>> tricky stuff!
>> 
>>> As it stands, it would be inconsistent to have the closing bracket quoted 
>>> and the opening bracket not quoted.
>> 
>> You're right - the problem exists with both `{` and `}`, as is shown here 
>> (unmodified jshell):
>> 
>> 
>> $ jshell
>> |  Welcome to JShell -- Version 17.0.9
>> |  For an introduction type: /help intro
>> 
>> jshell> import java.text.*;
>> 
>> jshell> new MessageFormat("Test: {0,number,foo'{'#.00}");
>> $2 ==> java.text.MessageFormat@951c5b58
>> 
>> jshell> $2.toPattern()
>> $3 ==> "Test: {0,number,foo{#.00}"
>> 
>> jshell> new MessageFormat($3)
>> |  Exception java.lang.IllegalArgumentException: Unmatched braces in the 
>> pattern.
>> |at MessageFormat.applyPattern (MessageFormat.java:521)
>> |at MessageFormat. (MessageFormat.java:371)
>> |at (#4:1)
>> 
>> 
>> I've been missing the forest for the trees a bit and I think the fix can be 
>> simpler now.
>> 
>> For the record, here is my latest understanding of what's going on...
>> 
>> 1. When `MessageFormat.toPattern()` constructs the combined pattern string, 
>> it concatenates the plain text bits, suitably escaped, and the pattern 
>> strings from each subformat.
>> 1. The subformat strings are already escaped, in the sense that you can take 
>> (for example) a `ChoiceFormat` format string and use it as-is to recreate 
>> that same `ChoiceFormat`, but they are escaped _only for their own purposes_.
>> 1. The original example is an example of where this matters - `ChoiceFormat` 
>> needs to escape `#`, `|`, etc., but doesn't need to escape `{` or `}` - but 
>> a `MessageFormat` format string does need to escape those.
>> 1. Therefore, the correct fix is for `MessageFormat.toPattern()` to modify 
>> the subformat strings by quoting any _unquoted_ `{` or `}` characters while 
>> leaving any already-quoted characters alone.
>> 
>> Updated in a9d78c76f5f. I've also updated the test so it does more thorough 
>> round-trip checks.
>
> Hi @archiecobbs , I think the recent commit is good progress.
> 
> First off I think this change will need a CSR as there are behavioral 
> concerns, although minimal. Please let me know if you have access on JBS to 
> file one, otherwise I can file one for you.
> 
>> Therefore, the correct fix is for MessageFormat.toPattern() to modify the 
>> subformat strings by quoting any unquoted {or } characters while leaving any 
>> already-quoted characters alone.
> 
> Yes, I think this behavior allows for the String returned by `toPattern()` to 
> create a MessageFormat that can format equivalently to the original 
> MessageFormat. Although to clarify, the original String pattern will not be 
> guaranteed to be equivalent to the one returned by `toPattern()` as we are 
> adding quotes to all brackets in the subformatPattern, regardless if they 
> were quoted in the original String. I think the former is more important 
> here, so the latter is okay.
> 
> For DecimalFormat and SimpleDateFormat subformats, adding quotes to brackets 
> found in the subformatPattern is always right, those subformats can not be 
> used to create recursive format behavior. Thus you would **never** except a 
> nested ArgumentIndex in the subformatPattern (ex: `"{0,number,{1}foo0.0"}`), 
> and can confidently escape all brackets found for these subFormats (which you 
> do).
> 
> To me, ChoiceFormat is where there is some concern. Consider the following,
> 
> `new MessageFormat("{0,choice,0# {1} {2} {3} }”)`
> 
> With this change, invoking `toPattern()` on the created MessageFormat would 
> return 
> 
> `"{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}' }"`.
> 
> This would technically be incorrect. One would think instead of allowing 3 
> nested elements, we are now printing the literal `{1} {2} {3}` since the 
> brackets are escaped. But, this is not actually the case. Escaping brackets 
> with a choice subFormat does not function properly. This is due to the fact 
> that a recursive messageFormat is created, but the pattern passed has already 
> lost the quotes.
> 
> This means that `new MessageFormat("{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}' 
> }")`.
> 
> is still equivalent to `new MessageFormat("{0,choice,0# {1} {2} {3} }”).`
> 
> So the behavior of quoting all brackets would still guarantee _the String 
> returned by `toPattern()` to create a MessageFormat that can format 
> equivalently to the original MessageFormat_ but only because the current 
> behavior of formatting with a choice subFormat is technically wrong. I think 
> this is okay, since this incorrect behavior is long-standing, but a CSR would 
> be good to ad...

Hi @justin-curtis-lu,

Thanks for your comments.

> First off I think this change will need a CSR as there are behavioral 
> concerns, although minimal. Please let me know if you have access on JBS to 
> file one, oth

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v2]

2024-01-18 Thread Archie Cobbs

On Thu, 18 Jan 2024 23:02:57 GMT, Justin Lu  wrote:

> Sorry if I'm going over stuff you already know...

No apology needed, this stuff is obscure detail madness!

> So with this change, although calling `toPattern()` on `new 
> MessageFormat("{0,choice,0.0#foo {1} {2} {3} }")` would return the String 
> `"{0,choice,0.0#foo '{'1'}' '{'2'}' '{'3'}' }"` which **I** would think, 
> formats differently, **in reality**, they format equivalently, so I think it 
> is okay. I brought it up not necessarily as an issue, but just something to 
> make apparent. Although I suppose something like this is up for 
> interpretation, so perhaps its only wrong to me.

OK gotcha. I totally agree that what you think it should do makes a lot more 
sense that what it actually does!

-

PR Comment: https://git.openjdk.org/jdk/pull/17416#issuecomment-1899385354

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v3]

2024-01-19 Thread Archie Cobbs

On Fri, 19 Jan 2024 23:30:43 GMT, Archie Cobbs  wrote:

>> Please consider this fix to ensure that going from `MessageFormat` to 
>> pattern string via `toPattern()` and then back via `new MessageFormat()` 
>> results in a format that is equivalent to the original.
>> 
>> The quoting and escaping rules for `MessageFormat` pattern strings are 
>> really tricky. I admit not completely understanding them. At a high level, 
>> they work like this: The normal way one would "nest" strings containing 
>> special characters is with straightforward recursive escaping like with the 
>> `bash` command line. For example, if you want to echo `a "quoted string" 
>> example` then you enter `echo "a "quoted string" example"`. With this scheme 
>> it's always the "outer" layer's job to (un)escape special characters as 
>> needed. That is, the echo command never sees the backslash characters.
>> 
>> In contrast, with `MessageFormat` and friends, nested subformat pattern 
>> strings are always provided "pre-escaped". So to build an "outer" string 
>> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
>> less just concatenated, and then only the `ChoiceFormat` option separator 
>> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
>> 
>> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
>> indicates the beginning of a format argument. However, it doesn't escape `}` 
>> characters. This is OK because the format string parser treats any "extra" 
>> closing braces (where "extra" means not matching an opening brace) as plain 
>> characters.
>> 
>> So far, so good... at least, until a format string containing an extra 
>> closing brace is nested inside a larger format string, where the extra 
>> closing brace, which was previously "extra", can now suddenly match an 
>> opening brace in the outer pattern containing it, thus truncating it by 
>> "stealing" the match from some subsequent closing brace.
>> 
>> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
>> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
>> trailing closing brace in plain text. If you create a `MessageFormat` with 
>> this string, you see a trailing `}` only with the second option.
>> 
>> However, if you then invoke `toPattern()`, the result is 
>> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
>> "extra" closing brace is no longer quoted, it matches the opening brace at 
>> the beginning of the string, and the following closing  brace, which was the 
>> previous match, is now just plain text in the outer `MessageFormat` string.
>> 
>> As a result, invoking `f.format(new ...
>
> Archie Cobbs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add @implNote to Javadoc for toPattern().

Adding an additional note of explanation here just for the record (this is 
copied from the CSR):

> We need to ask how do we know it is safe to quote the unquoted curly brace 
> characters in the subformat patterns? If curly braces are not special to the 
> subformat, then quoting them clearly does no harm. So we only need worry 
> about subformat patterns where curly braces are special. But the only 
> subformat pattern strings supported by `MessageFormat` are for 
> `DecimalFormat`, `SimpleDateFormat`, and `ChoiceFormat`, and curly braces are 
> not special for any of these classes, so we're good.
> 
> However, it should be noted that there is some confusing special logic that 
> clouds this question. If the string that results from evaluating a 
> `ChoiceFormat` subformat of a `MessageFormat` contains an opening curly 
> brace, then a new `MessageFormat` is created from that string and evaluated, 
> and _that_ string replaces the original. This behavior doesn't impact how 
> subformats should be quoted, only how their results are interpreted at "run 
> time".

-

PR Comment: https://git.openjdk.org/jdk/pull/17416#issuecomment-1901291484

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v3]

2024-01-19 Thread Archie Cobbs

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Add @implNote to Javadoc for toPattern().

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17416/files
  - new: https://git.openjdk.org/jdk/pull/17416/files/a9d78c76..36d70b8a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=01-02

  Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v3]

2024-01-23 Thread Archie Cobbs

On Tue, 23 Jan 2024 22:13:09 GMT, Justin Lu  wrote:

>> Archie Cobbs has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add @implNote to Javadoc for toPattern().
>
> src/java.base/share/classes/java/text/MessageFormat.java line 1:
> 
>> 1: /*
> 
> For this file, please also bump the latter copyright year to 2024.

Will do.

> src/java.base/share/classes/java/text/MessageFormat.java line 585:
> 
>> 583: if (fmt instanceof DecimalFormat) {
>> 584: result.append(",number");
>> 585: subformatPattern = 
>> ((DecimalFormat)fmt).toPattern();
> 
> Could use pattern matching `instanceof` here and in the other instances, but 
> I understand if you're trying to stay consistent with the existing code.

Agreed! Old habits :) I'll fix.

> test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 
> 56:
> 
>> 54: 
>> 55: private static final int NUM_RANDOM_TEST_CASES = 1000;
>> 56: private static final int MAX_FORMAT_NESTING = 3;
> 
> Can you add a comment defining `MAX_FORMAT_TESTING`, might not be immediately 
> understood without further investigation

Will do.

> test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 
> 96:
> 
>> 94: @ParameterizedTest
>> 95: @MethodSource("testCases")
>> 96: public void testRoundTrip(MessageFormat format1) {
> 
> Can we also include the original String pattern that created format1, to help 
> with debugging. I find myself wondering what the original String was.
> 
> Since technically, the full round trip is _pattern string -> MessageFormat1 
> -> pattern string -> MessageFormat2_

I would have done that but it's not (easily) possible. The `MessageFormat`'s 
are created not from format strings, but by piecing together plain text and 
sub-`Format` objects manually. This was we are sure what we're dealing with.

Trying to create format strings with multiple levels of nesting from scratch is 
too complex for my brain due to all the levels of quoting required.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1464104293
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1464104138
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1464104181
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1464104467

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v3]

2024-01-23 Thread Archie Cobbs

On Tue, 23 Jan 2024 23:00:29 GMT, Naoto Sato  wrote:

>> src/java.base/share/classes/java/text/MessageFormat.java line 556:
>> 
>>> 554:  * does not necessarily equal the previously applied pattern.
>>> 555:  *
>>> 556:  * @implNote The string returned by this method can be used to 
>>> create
>> 
>> Hmm, I'm not sure about the current note, because its not true in all cases 
>> (for example, some unknown subclass of Format). Maybe @naotoj has thoughts?
>
> Yes, my comment to the CSR was to explain the default implementation as the 
> `@implNote`. Making it explicit would be helpful to readers.

Yes my mistake, I'll change this to:

@implNote The implementation in {@link MessageFormat} returns a string
that can be used to create a new instance that is semantically equivalent
to this instance.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1464104922

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v3]

2024-01-23 Thread Archie Cobbs

On Tue, 23 Jan 2024 22:15:40 GMT, Justin Lu  wrote:

>> Archie Cobbs has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add @implNote to Javadoc for toPattern().
>
> test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 
> 90:
> 
>> 88: // A few more test cases from the PR#17416 discussion
>> 89: testRoundTrip(new MessageFormat("Test: {0,number,foo'{'#.00}"));
>> 90: testRoundTrip(new MessageFormat("Test: {0,number,foo'}'#.00}"));
> 
> Can we add some more concrete test cases, for example, escaped quotes within 
> a MessageFormat pattern (represented by doubled single quotes). 
> 
> Another funny case is something like a MessageFormat created by `"{0,number,' 
> abc }'' ' 0.00}"` which becomes `"{0,number, abc '}'''  #0.00}"` after 
> created from the value returned by toPattern() from the first MessageFormat. 
> The Strings appear really different but format equivalently, it would be nice 
> to have suspicious cases like these explicitly defined.

Sure thing - added in 58e8cc68f45.

> test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 
> 154:
> 
>> 152: // Generate a "random" MessageFormat. We do this by creating a 
>> MessageFormat with "{0}" placeholders
>> 153: // and then substituting in random DecimalFormat, DateFormat, and 
>> ChoiceFormat subformats. The goal here
>> 154: // is to avoid using pattern strings to construct formats, because 
>> they're what we're trying to check.
> 
> Can you add a general example of a randomly generated MessageFormat in the 
> comments, (I know they vary by quite a lot), but it would be hard for someone 
> to piece if together without digging into the code. And unfortunately the 
> current toString() of MessageFormat is not implemented, so although JUnit 
> prints the MessageFormat out, it's just gibberish.

Added in 58e8cc68f45 as a test case.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1464143414
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1464143622

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v4]

2024-01-23 Thread Archie Cobbs

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

Archie Cobbs has updated the pull request incrementally with six additional 
commits since the last revision:

 - Add more test cases and more pattern string variety.
 - Make it easier to debug & show what the test is doing.
 - Add comment explaining what MAX_FORMAT_NESTING is for.
 - Clean up code a bit by using instanceof patterns.
 - Tweak @implNote to clarify only referring to MessageFormat class.
 - Update copyright year in MessageFormat.java.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17416/files
  - new: https://git.openjdk.org/jdk/pull/17416/files/36d70b8a..58e8cc68

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=02-03

  Stats: 94 lines in 2 files changed: 50 ins; 12 del; 32 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v5]

2024-01-25 Thread Archie Cobbs

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Change MessageFormat.toPattern() @implNote to @implSpec.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17416/files
  - new: https://git.openjdk.org/jdk/pull/17416/files/58e8cc68..7f96511e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v5]

2024-01-25 Thread Archie Cobbs

On Thu, 25 Jan 2024 22:49:39 GMT, Justin Lu  wrote:

> In terms of the CSR,...

Hi Justin, thanks again very much for the comments and careful review. The CSR 
should be updated accordingly.

-

PR Comment: https://git.openjdk.org/jdk/pull/17416#issuecomment-1911134650

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v5]

2024-01-26 Thread Archie Cobbs

On Fri, 26 Jan 2024 18:58:49 GMT, Justin Lu  wrote:

> I wasn't sure if you were waiting to make additional changes or something 
> else, but you can go ahead and re-finalize the CSR (when you are ready), now 
> that it has been reviewed.

Thanks - I wasn't sure about that. I've updated it now.

-

PR Comment: https://git.openjdk.org/jdk/pull/17416#issuecomment-1912570659

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v5]

2024-01-26 Thread Archie Cobbs

On Fri, 26 Jan 2024 20:30:42 GMT, Roger Riggs  wrote:

> The "can be used to create" seems conditional. 

It is conditional - in the sense that you don't _have_ to use it to create a 
new instance of `MessageFormat`. You can also use it for something else, in 
other words.

But I also understand how it comes across as a bit wishy-washy...

Hmm, what do you think about this wording?


@implSpec The implementation in {@link MessageFormat} returns a string that,
when passed to the {@link MessageFormat(String)} constructor, produces an
instance that is semantically equivalent to this instance.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1468155069

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v5]

2024-01-26 Thread Archie Cobbs

On Fri, 26 Jan 2024 21:28:47 GMT, Justin Lu  wrote:

>>> The "can be used to create" seems conditional. 
>> 
>> It is conditional - in the sense that you don't _have_ to use it to create a 
>> new instance of `MessageFormat`. You can also use it for something else, in 
>> other words.
>> 
>> But I also understand how it comes across as a bit wishy-washy...
>> 
>> Hmm, what do you think about this wording?
>> 
>> 
>> @implSpec The implementation in {@link MessageFormat} returns a string that,
>> when passed to the {@link MessageFormat(String)} constructor, produces an
>> instance that is semantically equivalent to this instance.
>
> Not sure which wording will ultimately be used, but if the wording ends up 
> including the constructor, it's probably worth mentioning the `applyPattern` 
> method as well.

Good point... maybe this?

@implSpec The implementation in {@link MessageFormat} returns a string that, 
when passed
to a {@code MessageFormat()} constructor or {@link #applyPattern 
applyPattern()}, produces
an instance that is semantically equivalent to this instance.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1468170808

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v6]

2024-01-28 Thread Archie Cobbs

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Reword the @implSpec note for clarity per PR discussion.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17416/files
  - new: https://git.openjdk.org/jdk/pull/17416/files/7f96511e..a4d82c67

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=04-05

  Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v7]

2024-01-30 Thread Archie Cobbs

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Fix misspelling in comment.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17416/files
  - new: https://git.openjdk.org/jdk/pull/17416/files/a4d82c67..76e3d6ef

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=05-06

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: JDK-8318761: MessageFormat pattern support for CompactNumberFormat, ListFormat, and DateTimeFormatter

2024-01-31 Thread Archie Cobbs

On Wed, 31 Jan 2024 22:24:14 GMT, Justin Lu  wrote:

> Please review this PR and [CSR](https://bugs.openjdk.org/browse/JDK-8319344) 
> which adds MessageFormat pattern support for the following subformats: 
> ListFormat, CompactNumberFormat, and DateTimeFormatter. This change is 
> intended to provide pattern support for the more recently added JDK Format 
> subclasses, as well as improving java.time formatting within i18n. The draft 
> javadoc can be viewed here: 
> https://cr.openjdk.org/~jlu/docs/api/java.base/java/text/MessageFormat.html. 
> Please see the CSR for more in-depth behavioral changes, as well as 
> limitations.
> 
> The `FormatTypes`:  dtf_date, dtf_time, dtf_datetime, pre-defined 
> DateTimeFormatter(s), and list are added.
> The `FormatStyles`: compact_short, compact_long, or, and unit are added.
> 
> For example, previously,
> 
> 
> Object[] args = {LocalDate.of(2023, 11, 16), LocalDate.of(2023, 11, 27)};
> MessageFormat.format("It was {0,date,full}, now it is {1,date,full}", args);
> 
> 
> would throw `Exception java.lang.IllegalArgumentException: Cannot format 
> given Object as a Date`
> 
> Now, a user can call
> 
> 
> MessageFormat.format("It was {0,dtf_date,full}, now it is {1,dtf_date,full}", 
> args);
> 
> 
> which returns "It was Thursday, November 16, 2023, now it is Friday, November 
> 17, 2023"

Slightly tangential question (since you're talking about adding new 
subformats)...

Has it ever been considered to add `MessageFormat` itself as a subformat option?

It's not as crazy an idea as it sounds :) Here's an example:

MessageFormat f = new MessageFormat(
  "Result: {0,choice,0#no letters|1#one letter: {1,message,'{0}'}|2#two 
letters: {1,message,'{0} and {1}'}}");
f.format(new Object[] { 2, new Object[] { "A", "B" } });// "Result: two 
letters: A and B"

-

PR Comment: https://git.openjdk.org/jdk/pull/17663#issuecomment-1920211513

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v8]

2024-02-01 Thread Archie Cobbs

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

Archie Cobbs has updated the pull request incrementally with four additional 
commits since the last revision:

 - Add a couple more pattern test cases suggested in review.
 - Update unit test @summary to better reflect what it tests.
 - Use curly braces consistently in patch.
 - Update copyright year in modified unit test files.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17416/files
  - new: https://git.openjdk.org/jdk/pull/17416/files/76e3d6ef..59e88126

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=06-07

  Stats: 13 lines in 4 files changed: 6 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/17416.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v7]

2024-02-01 Thread Archie Cobbs

On Thu, 1 Feb 2024 23:57:24 GMT, Naoto Sato  wrote:

>> Archie Cobbs has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Fix misspelling in comment.
>
> src/java.base/share/classes/java/text/MessageFormat.java line 1675:
> 
>> 1673: i++;
>> 1674: } else
>> 1675: quoted = !quoted;
> 
> Nit: be consistent on curly-braces. Same for some other places as well.

Thanks, should be fixed.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475419901

Re: RFR: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern [v7]

2024-02-01 Thread Archie Cobbs

On Thu, 1 Feb 2024 23:02:59 GMT, Justin Lu  wrote:

>> Archie Cobbs has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Fix misspelling in comment.
>
> test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 
> 26:
> 
>> 24: /*
>> 25:  * @test
>> 26:  * @summary Verify MessageFormat.toPattern() is equivalent to original 
>> pattern
> 
> Could make the summary a little more specific for future readers

Thanks, should be fixed.

> test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 
> 70:
> 
>> 68: @BeforeAll
>> 69: public static void setup() {
>> 70: savedLocale = Locale.getDefault();
> 
> I'm not sure we need to save the default locale and restore it, unless I'm 
> missing something.

We are verifying output that includes floating point numbers, and the current 
locale affects that:

jshell> Locale.setDefault(Locale.US);

jshell> new MessageFormat("{0}").format(new Object[] { 1.23 });
$9 ==> "1.23"

jshell> Locale.setDefault(Locale.FRENCH);

jshell> new MessageFormat("{0}").format(new Object[] { 1.23 });
$11 ==> "1,23"

> test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 
> 104:
> 
>> 102: Arguments.of("{0,choice,0.0#option A: {0}|1.0#option B: 
>> {0}'}'}", "option B: 1.23}"),
>> 103: Arguments.of("{0,choice,0.0#option A: {0}|2.0#option B: 
>> {0}'}'}", "option A: 1.23"),
>> 104: 
> 
> Suggestion:
> 
> // Absurd double quote examples
> Arguments.of("Foo '}''''''''}' {0,number,bar'}' '}' } baz ", "Foo }''''} bar} 
> } 1 baz "),
> Arguments.of("'''}''{'''}''''}'"), "'}'{'}''}"),

Thanks, should be fixed.

> test/jdk/java/text/Format/MessageFormat/MessageFormatsByArgumentIndex.java 
> line 35:
> 
>> 33: import java.text.NumberFormat;
>> 34: 
>> 35: public class MessageFormatsByArgumentIndex {
> 
> Can you bump the latter copyright year for this file

Thanks, should be fixed.

> test/jdk/java/text/Format/MessageFormat/MessageRegression.java line 114:
> 
>> 112:  * MessageFormat.toPattern has weird rounding behavior.
>> 113:  */
>> 114: @Test
> 
> This file as well

Thanks, should be fixed.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475420027
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475420069
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475419965
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475420147
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475420112

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v6]

2024-02-05 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains six additional commits since the 
last revision:

 - Merge branch 'master' into JDK-7036144
 - Merge branch 'master' into JDK-7036144
 - Address third round of review comments.
 - Address second round of review comments.
 - Address review comments.
 - Fix bug in GZIPInputStream when underlying available() returns short.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/4f1a0459..f4178acc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=04-05

  Stats: 38832 lines in 1817 files changed: 17181 ins; 8675 del; 12976 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Integrated: 8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat pattern

2024-02-05 Thread Archie Cobbs

On Sun, 14 Jan 2024 15:32:12 GMT, Archie Cobbs  wrote:

> Please consider this fix to ensure that going from `MessageFormat` to pattern 
> string via `toPattern()` and then back via `new MessageFormat()` results in a 
> format that is equivalent to the original.
> 
> The quoting and escaping rules for `MessageFormat` pattern strings are really 
> tricky. I admit not completely understanding them. At a high level, they work 
> like this: The normal way one would "nest" strings containing special 
> characters is with straightforward recursive escaping like with the `bash` 
> command line. For example, if you want to echo `a "quoted string" example` 
> then you enter `echo "a "quoted string" example"`. With this scheme it's 
> always the "outer" layer's job to (un)escape special characters as needed. 
> That is, the echo command never sees the backslash characters.
> 
> In contrast, with `MessageFormat` and friends, nested subformat pattern 
> strings are always provided "pre-escaped". So to build an "outer" string 
> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or 
> less just concatenated, and then only the `ChoiceFormat` option separator 
> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
> 
> The "pre-escape" escaping algorithm escapes `{` characters, because `{` 
> indicates the beginning of a format argument. However, it doesn't escape `}` 
> characters. This is OK because the format string parser treats any "extra" 
> closing braces (where "extra" means not matching an opening brace) as plain 
> characters.
> 
> So far, so good... at least, until a format string containing an extra 
> closing brace is nested inside a larger format string, where the extra 
> closing brace, which was previously "extra", can now suddenly match an 
> opening brace in the outer pattern containing it, thus truncating it by 
> "stealing" the match from some subsequent closing brace.
> 
> An example is the `MessageFormat` string `"{0,choice,0.0#option A: 
> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a 
> trailing closing brace in plain text. If you create a `MessageFormat` with 
> this string, you see a trailing `}` only with the second option.
> 
> However, if you then invoke `toPattern()`, the result is 
> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the 
> "extra" closing brace is no longer quoted, it matches the opening brace at 
> the beginning of the string, and the following closing  brace, which was the 
> previous match, is now just plain text in the outer `MessageFormat` string.
> 
> As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...

This pull request has now been integrated.

Changeset: f1f93988
Author:Archie Cobbs 
Committer: Justin Lu 
URL:   
https://git.openjdk.org/jdk/commit/f1f93988fba3de0665fc7f69a5219dd04323c6f5
Stats: 442 lines in 4 files changed: 430 ins; 0 del; 12 mod

8323699: MessageFormat.toPattern() generates non-equivalent MessageFormat 
pattern

Reviewed-by: jlu, naoto

-

PR: https://git.openjdk.org/jdk/pull/17416

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v6]

2024-02-26 Thread Archie Cobbs

On Mon, 26 Feb 2024 06:51:12 GMT, Jaikiran Pai  wrote:

>> Archie Cobbs has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains six additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-7036144
>>  - Merge branch 'master' into JDK-7036144
>>  - Address third round of review comments.
>>  - Address second round of review comments.
>>  - Address review comments.
>>  - Fix bug in GZIPInputStream when underlying available() returns short.
>
> Hello Archie, the proposal to not depend on the `available()` method of the 
> underlying `InputStream` to decide whether to read additional bytes from the 
> underlying stream to detect the "next" header seems reasonable.
> 
> What's being proposed here is that we proceed and read the underlying 
> stream's few additional bytes to detect the presence or absence of a GZIP 
> member header and if that attempt fails (with an IOException) then we 
> consider that we have reached the end of GZIP stream and just return back.  
> 
> For this change, I think we would also need to consider whether we should 
> "unread" the read bytes from the `InputStream` if those don't correspond to a 
> "next" GZIP member header. That way any underlying `InputStream` which was 
> implemented in a way that it would return availability as 0 when it knew that 
> the GZIP stream was done and yet had additional (non GZIP) data to read on 
> the underlying stream, would still be able to read that data after this 
> change. It's arguable whether we should have been doing that "unread" even 
> when we were doing the `available() > 0` check and the decision that comes 
> out of https://bugs.openjdk.org/browse/JDK-8322256 might cover that.

Hi @jaikiran,

I agree with your comments. My only question is whether we should do all of 
this in one stage or two stages.

My initial thought is to do this in two stages:
* A narrow fix for the bug described here as implemented by this PR
* A larger change (requiring a separate bug, CSR, and PR) to:
  * More precisely define and specify the expected behavior, with support for 
concatenated streams
  * Eliminate situations where we read beyond the end-of-stream (i.e., 
"unreading" if/when necessary)

The reason I think this two stage approach is appropriate is because there is 
no downside to doing it this way - that is, the problem you describe of reading 
beyond the end-of-stream is _already_ a problem in the current code, with the 
exception of the one corner case where this bug fix applies, namely, when 
`in.available()` returns zero and yet there actually _is_ more data available.

Your thoughts?

-

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1964372772

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v6]

2024-03-06 Thread Archie Cobbs

On Wed, 6 Mar 2024 14:14:56 GMT, Jaikiran Pai  wrote:

>> Archie Cobbs has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains six additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-7036144
>>  - Merge branch 'master' into JDK-7036144
>>  - Address third round of review comments.
>>  - Address second round of review comments.
>>  - Address review comments.
>>  - Fix bug in GZIPInputStream when underlying available() returns short.
>
> Hello Archie, I hope to finish off running the necessary analysis to see if 
> there is any obvious impact because of this change, in the coming days. Based 
> on the intial runs, the changes proposed in this PR look OK to me.
> 
> In the meantime, given the nature of this change, I am marking this as 
> requiring a CSR. Would you be willing to come up with the CSR text for this 
> (https://wiki.openjdk.org/display/csr/Main)?

Hi @jaikiran,

> Would you be willing to come up with the CSR text for this

Happy to - please see 
[JDK-8327489](https://bugs.openjdk.org/browse/JDK-8327489).

-

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1981533001

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v7]

2024-03-07 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains eight additional commits since 
the last revision:

 - Document the handling of concatenated streams.
 - Merge branch 'master' into JDK-7036144
 - Merge branch 'master' into JDK-7036144
 - Merge branch 'master' into JDK-7036144
 - Address third round of review comments.
 - Address second round of review comments.
 - Address review comments.
 - Fix bug in GZIPInputStream when underlying available() returns short.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/f4178acc..d557d8c6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=05-06

  Stats: 123995 lines in 2894 files changed: 27722 ins; 83983 del; 12290 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v6]

2024-03-07 Thread Archie Cobbs

On Thu, Mar 7, 2024 at 2:20 PM Louis Bergelson  wrote:

> >> `GZIPInputStream`, when looking for a concatenated stream, relies on
> what the underlying `InputStream` says is how many bytes are `available()`.
> But this is inappropriate because `InputStream.available()` is just an
> estimate and is allowed (for example) to always return zero.
>
> Hello, I just wanted to check if there was anything way I could help move
> this along.
>

Hi Louis,

Thanks for offering to help. The process is slow but moving forward and we
are getting close. Because there are implications to the specified behavior
it is being reviewed for backward compatibility, etc.

-Archie

-- 
Archie L. Cobbs

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v8]

2024-03-08 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Back-out Javadoc addition; to be added in a separate issue.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/d557d8c6..04072c19

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=06-07

  Stats: 19 lines in 1 file changed: 0 ins; 19 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v9]

2024-03-17 Thread Archie Cobbs

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains 10 additional commits since the 
last revision:

 - Merge branch 'master' into JDK-7036144
 - Back-out Javadoc addition; to be added in a separate issue.
 - Document the handling of concatenated streams.
 - Merge branch 'master' into JDK-7036144
 - Merge branch 'master' into JDK-7036144
 - Merge branch 'master' into JDK-7036144
 - Address third round of review comments.
 - Address second round of review comments.
 - Address review comments.
 - Fix bug in GZIPInputStream when underlying available() returns short.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17113/files
  - new: https://git.openjdk.org/jdk/pull/17113/files/04072c19..d567cef2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17113&range=07-08

  Stats: 26396 lines in 568 files changed: 13590 ins; 10571 del; 2235 mod
  Patch: https://git.openjdk.org/jdk/pull/17113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17113/head:pull/17113

PR: https://git.openjdk.org/jdk/pull/17113

Integrated: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream

2024-03-20 Thread Archie Cobbs

On Thu, 14 Dec 2023 20:15:39 GMT, Archie Cobbs  wrote:

> `GZIPInputStream`, when looking for a concatenated stream, relies on what the 
> underlying `InputStream` says is how many bytes are `available()`. But this 
> is inappropriate because `InputStream.available()` is just an estimate and is 
> allowed (for example) to always return zero.
> 
> The fix is to ignore what's `available()` and just proceed and see what 
> happens. If fewer bytes are available than required, the attempt to extend to 
> another stream is canceled just as it was before, e.g., when the next stream 
> header couldn't be read.

This pull request has now been integrated.

Changeset: d3f3011d
Author:Archie Cobbs 
Committer: Jaikiran Pai 
URL:   
https://git.openjdk.org/jdk/commit/d3f3011d56267360d65841da3550eca79cf1575b
Stats: 111 lines in 2 files changed: 96 ins; 9 del; 6 mod

7036144: GZIPInputStream readTrailer uses faulty available() test for 
end-of-stream

Reviewed-by: jpai

-

PR: https://git.openjdk.org/jdk/pull/17113

RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics

2024-04-02 Thread Archie Cobbs

`GZIPInputStream` supports reading data from multiple concatenated GZIP data 
streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
order to do this, after a GZIP trailer frame is read, it attempts to read a 
GZIP header frame and, if successful, proceeds onward to decompress the new 
stream. If the attempt to decode a GZIP header frame fails, or happens to 
trigger an `IOException`, it just ignores the trailing garbage and/or the 
`IOException` and returns EOF.

There are several issues with this:

1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
trailing garbage are not documented, much less precisely specified.
2. Ignoring trailing garbage is dubious because it could easily hide errors or 
other data corruption that an application would rather be notified about. 
Moreover, the API claims that a `ZipException` will be thrown when corrupt data 
is read, but obviously that doesn't happen in the trailing garbage scenario (so 
N concatenated streams where the last one has a corrupted header frame is 
indistinguishable from N-1 valid streams).
3. There's no way to create a `GZIPInputStream` that does _not_ support stream 
concatenation.

On the other hand, `GZIPInputStream` is an old class with lots of existing 
usage, so it's important to preserve the existing behavior, warts and all 
(note: my the definition of "existing behavior" here includes the bug fix in 
[JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).

So this patch adds a new constructor that takes two new parameters 
`allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is enabled 
by setting both to true; otherwise, they do what they sound like. In 
particular, when `allowTrailingGarbage` is false, then the underlying input 
must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
`allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
exception is guaranteed.

-

Commit messages:
 - Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Clarify exceptions: sometimes ZipException, sometimes EOFException.
 - Merge branch 'master' into JDK-8322256
 - Javadoc wording tweaks.
 - Add "allowConcatenation" and "allowTrailingGarbage" to GZIPInputStream.

Changes: https://git.openjdk.org/jdk/pull/18385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8322256
  Stats: 303 lines in 2 files changed: 286 ins; 2 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v2]

2024-04-12 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains eight commits:

 - Field name change and Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Clarify exceptions: sometimes ZipException, sometimes EOFException.
 - Merge branch 'master' into JDK-8322256
 - Javadoc wording tweaks.
 - Add "allowConcatenation" and "allowTrailingGarbage" to GZIPInputStream.

-

Changes: https://git.openjdk.org/jdk/pull/18385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=01
  Stats: 304 lines in 2 files changed: 287 ins; 2 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v3]

2024-04-12 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Simplify code by eliminating an impossible case.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18385/files
  - new: https://git.openjdk.org/jdk/pull/18385/files/48b38049..32ff1abd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=01-02

  Stats: 8 lines in 1 file changed: 0 ins; 6 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v4]

2024-04-15 Thread Archie Cobbs

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.

Archie Cobbs has updated the pull request incrementally with one additional 
commit since the last revision:

  Relabel "trailing garbage" as "extra bytes" to sound less accusatory.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18385/files
  - new: https://git.openjdk.org/jdk/pull/18385/files/32ff1abd..df302a62

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=02-03

  Stats: 31 lines in 2 files changed: 0 ins; 0 del; 31 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

1 2 3 >

1 - 100 of 212 matches

Mail list logo