[jira] [Commented] (HADOOP-19400) Expand specification and contract test coverage for InputStream reads.

ASF GitHub Bot (Jira) Mon, 10 Feb 2025 09:58:06 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-19400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925664#comment-17925664
 ]


ASF GitHub Bot commented on HADOOP-19400:
-----------------------------------------

steveloughran commented on code in PR #7367:
URL: https://github.com/apache/hadoop/pull/7367#discussion_r1949623287


##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md:
##########
@@ -175,6 +175,24 @@ must block until at least one byte is returned. Thus, for any data source
 of length greater than zero, repeated invocations of this `read()` operation
 will eventually read all the data.
 
+#### Implementation Notes
+
+1. If the caller passes a `null` buffer, then an unchecked exception is thrown. The base JDK
+`InputStream` implementation throws `NullPointerException`. HDFS historically used
+`IllegalArgumentException`. Either of these are acceptable.
+1. If the caller passes a negative value for `offset`, then `IndexOutOfBoundsException` is thrown.
+1. If the caller passes a negative value for `length`, then an unchecked exception is thrown. The
+base JDK `InputStream` implementation throws `IndexOutOfBoundsException`. HDFS historically used
+`IllegalArgumentException`. Either of these are acceptable.
+1. If the caller passes an `offset + length` that would run past the length of `buffer`, then
+`IndexOutOfBoundsException` is thrown.
+1. A read of `length` 0 is a no-op, and the returned `result` is 0. No exception is thrown, assuming
+all other arguments are valid.
+1. Reads through any method are expected to return the same data.

Review Comment:
   ooh, good one this. It's implicit in the model of data as an array of bytes, 
but yes, nice to call out.
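
To make the "same data through any read method" note concrete, here is a minimal sketch of how that property could be checked. It is only an illustration: it runs against a plain `java.io.ByteArrayInputStream` rather than a Hadoop `FSDataInputStream`, and the names `ReadConsistencySketch`, `readSingleBytes`, `readMultiBytes` and `readMixed` are made up for this example. It is not the contract test from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadConsistencySketch {

  /** Drain the stream one byte at a time through read(). */
  static byte[] readSingleBytes(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1) {
      out.write(b);
    }
    return out.toByteArray();
  }

  /** Drain the stream through read(byte[], int, int); bufferSize must be > 0. */
  static byte[] readMultiBytes(InputStream in, int bufferSize) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[bufferSize];
    int n;
    while ((n = in.read(buffer, 0, buffer.length)) != -1) {
      out.write(buffer, 0, n);
    }
    return out.toByteArray();
  }

  /** Alternate between the single-byte and multi-byte reads on one stream. */
  static byte[] readMixed(InputStream in, int bufferSize) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[bufferSize];
    boolean single = true;
    while (true) {
      if (single) {
        int b = in.read();
        if (b == -1) {
          break;
        }
        out.write(b);
      } else {
        int n = in.read(buffer, 0, buffer.length);
        if (n == -1) {
          break;
        }
        out.write(buffer, 0, n);
      }
      single = !single;
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Illustrative dataset; the size and contents are arbitrary.
    byte[] data = new byte[4096];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) (0x40 + i % 0x40);
    }
    byte[] single = readSingleBytes(new ByteArrayInputStream(data));
    byte[] multi = readMultiBytes(new ByteArrayInputStream(data), 100);
    byte[] mixed = readMixed(new ByteArrayInputStream(data), 100);
    System.out.println("single == multi: " + Arrays.equals(single, multi));
    System.out.println("single == mixed: " + Arrays.equals(single, mixed));
  }
}
```

A real contract test would presumably open the same file several times through the filesystem under test and compare the results in the same way.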



##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md:
##########
@@ -175,6 +175,24 @@ must block until at least one byte is returned. Thus, for any data source
 of length greater than zero, repeated invocations of this `read()` operation
 will eventually read all the data.
 
+#### Implementation Notes

Review Comment:
   1. Is there a way to add these specification statements to the Python statements? Those are designed to be what people write tests from.
   2. Please use SHALL/MUST/MAY rather than other terms such as "is expected to". Yes, yours is the better prose, but we want no ambiguity here.



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractOpenTest.java:
##########
@@ -408,4 +411,137 @@ public void testFloatingPointLength() throws Throwable {
         .isEqualTo(len);
   }
 
+  @Test
+  public void testInputStreamReadNullBuffer() throws Throwable {
+    // The JDK base InputStream (and by extension LocalFSFileInputStream) throws
+    // NullPointerException. Historically, DFSInputStream has thrown IllegalArgumentException
+    // instead. Allow either behavior.
+    describe("Attempting to read into a null buffer should throw IllegalArgumentException or " +
+        "NullPointerException");
+    Path path = methodPath();
+    FileSystem fs = getFileSystem();
+    int len = 4096;
+    createFile(fs, path, true,
+        dataset(len, 0x40, 0x80));
+    try (FSDataInputStream is = fs.openFile(path).build().get()) {
+      Assertions.assertThatThrownBy(() -> is.read(null, 0, 10))
+          .isInstanceOfAny(IllegalArgumentException.class, NullPointerException.class);

Review Comment:
   not seen this before, interesting.
   I wonder if we could extend intercept() to take a list of classes.
   
   I prefer intercept() because it does two things AssertJ doesn't:
   * returns the assert for future analysis
   * if there is no exception raised, returns the result of any lambda-expression which doesn't return void.
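
As a rough idea of what an `intercept()` variant accepting several exception classes might look like, here is a sketch. The helper `interceptAny` and the class `InterceptAnySketch` are hypothetical; they are not part of Hadoop's test utilities or of AssertJ, and the behaviour is an assumption based on the two points above: hand the caught exception back to the caller, and report the lambda's result when nothing was thrown.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.Callable;

public class InterceptAnySketch {

  /**
   * Hypothetical helper: evaluate the callable and expect it to throw an
   * exception that is an instance of at least one of the given classes.
   * Returns the caught exception so the caller can inspect it further.
   */
  @SafeVarargs
  static <T> Exception interceptAny(Callable<T> eval,
      Class<? extends Exception>... expected) {
    T result;
    try {
      result = eval.call();
    } catch (Exception e) {
      for (Class<? extends Exception> clazz : expected) {
        if (clazz.isInstance(e)) {
          return e;
        }
      }
      // Wrong exception type: fail, keeping the original as the cause.
      throw new AssertionError("Unexpected exception: " + e, e);
    }
    // Nothing was thrown: fail and include the returned value in the message.
    throw new AssertionError("Expected one of " + Arrays.toString(expected)
        + " but the call returned: " + result);
  }

  public static void main(String[] args) {
    InputStream in = new ByteArrayInputStream(new byte[16]);
    Exception caught = interceptAny(() -> in.read(null, 0, 10),
        IllegalArgumentException.class, NullPointerException.class);
    System.out.println("Caught " + caught.getClass().getName());
  }
}
```

Returning the exception keeps follow-up checks on its message or cause possible, which the AssertJ chain in the test above handles through further fluent assertions instead.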





> Expand specification and contract test coverage for InputStream reads.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-19400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19400
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation, fs, test
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Major
>              Labels: pull-request-available
>
> This issue is a spin-off from HADOOP-19389, specifically [this code review 
> discussion|https://github.com/apache/hadoop/pull/7291#discussion_r1920495312].
>  We can enhance the FS specification and contract tests to cover expected 
> semantics of the {{InputStream}} single-byte and multi-byte read methods:
> * Multi-byte read should validate the arguments passed to it, according to 
> the pattern established in the JDK base {{InputStream}} class.
> * You should get the same bytes whether going through single-byte or 
> multi-byte read.
> * It is legal to mix calls to single-byte and multi-byte read, and this 
> should also yield the same bytes.
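
For the first bullet, the argument validation pattern of the JDK base `InputStream.read(byte[], int, int)` can be sketched roughly as below. The class and method names (`ReadArgumentChecksSketch`, `validateReadArguments`) are illustrative only and do not come from Hadoop or the JDK. The sketch follows the JDK exception choices (`NullPointerException`, `IndexOutOfBoundsException`); an HDFS-style implementation would throw `IllegalArgumentException` in the null-buffer and negative-length cases instead.

```java
public class ReadArgumentChecksSketch {

  /**
   * Validate the arguments of a multi-byte read in the style of the JDK
   * base InputStream. A zero-length read with otherwise valid arguments
   * passes validation; the read itself would then return 0.
   */
  static void validateReadArguments(byte[] buffer, int offset, int length) {
    if (buffer == null) {
      throw new NullPointerException("buffer is null");
    }
    if (offset < 0) {
      throw new IndexOutOfBoundsException("negative offset: " + offset);
    }
    if (length < 0) {
      throw new IndexOutOfBoundsException("negative length: " + length);
    }
    // offset is known to be >= 0 here, so the subtraction cannot overflow.
    if (length > buffer.length - offset) {
      throw new IndexOutOfBoundsException("offset " + offset + " + length "
          + length + " exceeds buffer length " + buffer.length);
    }
  }

  public static void main(String[] args) {
    byte[] buffer = new byte[8];
    validateReadArguments(buffer, 0, 8);   // whole buffer: valid
    validateReadArguments(buffer, 8, 0);   // zero-length read at the end: valid
    try {
      validateReadArguments(buffer, 4, 5); // runs past the end of the buffer
    } catch (IndexOutOfBoundsException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```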


