[
https://issues.apache.org/jira/browse/HADOOP-17657?focusedWorklogId=589716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589716
]
ASF GitHub Bot logged work on HADOOP-17657:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 27/Apr/21 13:01
Start Date: 27/Apr/21 13:01
Worklog Time Spent: 10m
Work Description: steveloughran commented on a change in pull request #2949:
URL: https://github.com/apache/hadoop/pull/2949#discussion_r621183749
##########
File path:
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSequenceFile.java
##########
@@ -730,6 +732,33 @@ public void testSerializationAvailability() throws IOException {
}
}
+ @Test
+ public void testSequenceFileWriter() throws Exception {
+ FileSystem fs = FileSystem.getLocal(conf);
+ Path p = new Path(GenericTestUtils.getTempPath("testCreateUsesFSArg.seq"));
+ FSDataOutputStream out = fs.create(p);
+ FSDataOutputStream spyOut = Mockito.spy(out);
+ SequenceFile.Writer writer = SequenceFile.createWriter(
+ fs, conf, p, NullWritable.class, NullWritable.class);
+
+ writer.setOutputStream(spyOut);
+ writer.flush();
+ writer.hflush();
+ writer.hasCapability(StreamCapabilities.HFLUSH);
+ Mockito.verify(spyOut, times(1)).hflush();
+ Mockito.verify(spyOut, times(1)).flush();
+ Mockito.verify(spyOut, times(1)).hasCapability(StreamCapabilities.HFLUSH);
+
+ writer.setOutputStream(null);
+
+ // below statements should not throw NullPointerException, although the output stream is null
+ writer.flush();
+ writer.hflush();
+ Assert.assertFalse(writer.hasCapability(StreamCapabilities.HFLUSH));
+
+ writer.close();
Review comment:
Better to use try-with-resources, so that the writer is closed even when an
assertion fails.
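For illustration, a minimal sketch of that suggestion, reusing the names from
the diff above (the Mockito setup is elided):
```
try (SequenceFile.Writer writer = SequenceFile.createWriter(
    fs, conf, p, NullWritable.class, NullWritable.class)) {
  writer.setOutputStream(spyOut);
  writer.flush();
  writer.hflush();
  // ...remaining assertions as in the diff; close() now runs even if one fails
}
```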
##########
File path:
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSequenceFile.java
##########
@@ -730,6 +732,33 @@ public void testSerializationAvailability() throws IOException {
}
}
+ @Test
+ public void testSequenceFileWriter() throws Exception {
Review comment:
I see what you are trying to do here and think it's actually overkill.
And as mockito backporting is an utter nightmare (believe me), I don't want
mockito code unless it's really needed.
How about just creating a writer on the localfs (no mocking), verifying that
the stream supports "HSYNC", and calling hflush and hsync on it?
Then a full end-to-end test would be to write data, flush it, and verify that
the file length is now > 0.
```
writer.append(NullWritable.get(), NullWritable.get());
writer.hflush();
writer.hsync();
Assertions.assertThat(fs.getFileStatus(p).getLen()).isGreaterThan(0);
```
That should be enough: it probes the capability passthrough and verifies that
the write goes through.
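For reference, a sketch of how the whole non-mockito test might look, reusing
the names from the diff above (the test method name is hypothetical, and it
assumes the local filesystem stream reports the HSYNC capability):
```
@Test
public void testSequenceFileWriterSyncable() throws Exception {
  FileSystem fs = FileSystem.getLocal(conf);
  Path p = new Path(GenericTestUtils.getTempPath("testSequenceFileWriter.seq"));
  try (SequenceFile.Writer writer = SequenceFile.createWriter(
      fs, conf, p, NullWritable.class, NullWritable.class)) {
    // probe passthrough: the writer should report what the local stream supports
    Assertions.assertThat(writer.hasCapability(StreamCapabilities.HSYNC)).isTrue();
    // write a record, then flush/sync it to disk
    writer.append(NullWritable.get(), NullWritable.get());
    writer.hflush();
    writer.hsync();
    // the flushed data must be visible: file length > 0
    Assertions.assertThat(fs.getFileStatus(p).getLen()).isGreaterThan(0L);
  }
}
```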
##########
File path:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java
##########
@@ -1051,6 +1051,11 @@ public static Option compression(CompressionType value,
return new CompressionOption(value, codec);
}
+
+ @org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting
Review comment:
I'm proposing elsewhere to not use mockito, which avoids exposing this
operation... I'd be worried about exposing something like this in a broadly
used API, as inevitably someone will use it in production.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 589716)
Time Spent: 1h 50m (was: 1h 40m)
> SequenceFile.Writer should implement StreamCapabilities
> ------------------------------------------------------
>
> Key: HADOOP-17657
> URL: https://issues.apache.org/jira/browse/HADOOP-17657
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Kishen Das
> Assignee: Kishen Das
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> The following exception is thrown whenever we invoke ProtoMessageWriter.hflush
> on S3 from Tez, which internally calls
> org.apache.hadoop.io.SequenceFile$Writer.hflush ->
> org.apache.hadoop.fs.FSDataOutputStream.hflush ->
> S3ABlockOutputStream.hflush, which is not implemented and throws
> java.lang.UnsupportedOperationException:
>
> bdffe22d96ae [mdc@18060 class="yarn.YarnUncaughtExceptionHandler"
> level="ERROR" thread="HistoryEventHandlingThread"] Thread
> Thread[HistoryEventHandlingThread,5,main] threw an Exception.
> java.lang.UnsupportedOperationException: S3A streams are not Syncable
>     at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.hflush(S3ABlockOutputStream.java:657)
>     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:136)
>     at org.apache.hadoop.io.SequenceFile$Writer.hflush(SequenceFile.java:1367)
>     at org.apache.tez.dag.history.logging.proto.ProtoMessageWriter.hflush(ProtoMessageWriter.java:64)
>     at org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService.finishCurrentDag(ProtoHistoryLoggingService.java:239)
>     at org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService.handleEvent(ProtoHistoryLoggingService.java:198)
>     at org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService.loop(ProtoHistoryLoggingService.java:153)
>     at java.lang.Thread.run(Thread.java:748)
>
> In order to fix this issue, we should implement StreamCapabilities in
> SequenceFile.Writer. Also, we should fall back to flush() if hflush() is not
> supported.
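For illustration, a minimal sketch of the shape such a fix could take inside
SequenceFile.Writer, assuming (as the stack trace suggests) that the writer's
underlying out field is an FSDataOutputStream; this is a sketch, not the
committed patch:
```
@Override  // StreamCapabilities
public boolean hasCapability(String capability) {
  // pass the probe straight through to the wrapped stream
  return out != null && out.hasCapability(capability);
}

@Override  // Syncable
public void hflush() throws IOException {
  if (out == null) {
    return;
  }
  if (out.hasCapability(StreamCapabilities.HFLUSH)) {
    out.hflush();
  } else {
    // fall back to flush() where hflush() is unsupported (e.g. S3A streams)
    out.flush();
  }
}
```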
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]