[
https://issues.apache.org/jira/browse/HADOOP-19400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925664#comment-17925664
]
ASF GitHub Bot commented on HADOOP-19400:
-----------------------------------------
steveloughran commented on code in PR #7367:
URL: https://github.com/apache/hadoop/pull/7367#discussion_r1949623287
##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md:
##########
@@ -175,6 +175,24 @@ must block until at least one byte is returned. Thus, for
any data source
of length greater than zero, repeated invocations of this `read()` operation
will eventually read all the data.
+#### Implementation Notes
+
+1. If the caller passes a `null` buffer, then an unchecked exception is
thrown. The base JDK
+`InputStream` implementation throws `NullPointerException`. HDFS historically
used
+`IllegalArgumentException`. Either of these are acceptable.
+1. If the caller passes a negative value for `offset`, then
`IndexOutOfBoundsException` is thrown.
+1. If the caller passes a negative value for `length`, then an unchecked
exception is thrown. The
+base JDK `InputStream` implementation throws `IndexOutOfBoundsException`. HDFS
historically used
+`IllegalArgumentException`. Either of these are acceptable.
+1. If the caller passes an `offset + length` that would run past the length of
`buffer`, then
+`IndexOutOfBoundsException` is thrown.
+1. A read of `length` 0 is a no-op, and the returned `result` is 0. No
exception is thrown, assuming
+all other arguments are valid.
+1. Reads through any method are expected to return the same data.
Review Comment:
ooh, good one this. It's implicit in the model of data as an array of bytes,
but yes, nice to call out.
##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md:
##########
@@ -175,6 +175,24 @@ must block until at least one byte is returned. Thus, for
any data source
of length greater than zero, repeated invocations of this `read()` operation
will eventually read all the data.
+#### Implementation Notes
Review Comment:
1. is there a way to add these specification statements in the python
statements? as that's designed to be what people write tests off.
2. Please you the SHALL/MUST/MAYR than other terms "is expected to" etc.
Yes, yours is the better prose, but we want no ambiguity here
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractOpenTest.java:
##########
@@ -408,4 +411,137 @@ public void testFloatingPointLength() throws Throwable {
.isEqualTo(len);
}
+ @Test
+ public void testInputStreamReadNullBuffer() throws Throwable {
+ // The JDK base InputStream (and by extension LocalFSFileInputStream)
throws
+ // NullPointerException. Historically, DFSInputStream has thrown
IllegalArgumentException
+ // instead. Allow either behavior.
+ describe("Attempting to read into a null buffer should throw
IllegalArgumentException or " +
+ "NullPointerException");
+ Path path = methodPath();
+ FileSystem fs = getFileSystem();
+ int len = 4096;
+ createFile(fs, path, true,
+ dataset(len, 0x40, 0x80));
+ try (FSDataInputStream is = fs.openFile(path).build().get()) {
+ Assertions.assertThatThrownBy(() -> is.read(null, 0, 10))
+ .isInstanceOfAny(IllegalArgumentException.class,
NullPointerException.class);
Review Comment:
not seen this before, interesting.
I wonder if we could extend intercept() to take a list of classes.
I prefer intercept because it does two things assertj doesn't
* returns the assert for future analysis
* if there is no exception raised, returns the result of any
lambda-expression which doesn't return void.
> Expand specification and contract test coverage for InputStream reads.
> ----------------------------------------------------------------------
>
> Key: HADOOP-19400
> URL: https://issues.apache.org/jira/browse/HADOOP-19400
> Project: Hadoop Common
> Issue Type: Improvement
> Components: documentation, fs, test
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Priority: Major
> Labels: pull-request-available
>
> This issue is a spin-off from HADOOP-19389, specifically [this code review
> discussion|https://github.com/apache/hadoop/pull/7291#discussion_r1920495312].
> We can enhance the FS specification and contract tests to cover expected
> semantics of the {{InputStream}} single-byte and multi-byte read methods:
> * Multi-byte read should validate the arguments passed to it, according to
> the pattern established in the JDK base {{InputStream}} class.
> * You should get the same bytes whether going through single-byte or
> multi-byte read.
> * It is legal to mix calls to single-byte and multi-byte read, and this
> should also yield the same bytes.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]