[
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530520#comment-15530520
]
Chris Nauroth commented on HADOOP-13560:
----------------------------------------
At this point, I've read through all of patch 006 except for the documentation
updates. Very nice patch! Here is more feedback on the code, and I'll start
reviewing the documentation changes next.
I see the checks in {{S3ADataBlocks#validateWriteArgs}} are taken from
{{java.io.OutputStream#write}}. Would you please add a note in the JavaDoc
that this is implemented specifically to match that contract?
It seems the validation check in {{DataBlock#verifyState}} would only ever
throw an exception if there was a bug in our S3A code's state transition logic,
not a recoverable situation for the client. If so, then is a runtime exception
more appropriate, such as {{IllegalStateException}}? If so, then that would
also avoid the ignored exception in {{DataBlock#enterClosedState}}.
The JavaDoc for {{DataBlock#flush}} states that it is only valid in writing
state. Does it make sense to move the validation check for that from
{{DiskBlock#flush}} up into the base class?
In {{ByteArrayBlock#startUpload}}, can the call to {{buffer.reset()}} be
removed? Resetting sets an internal counter without releasing any of the
underlying heap space, which is what the {{buffer = null}} immediately
afterwards achieves.
In the single-byte {{ByteBufferInputStream#read}}, I think checking
{{hasRemaining()}} would be more readable than handling
{{BufferUnderflowException}}, and it would look consistent with similar logic
in the multi-byte {{read}}.
Please mark {{final}} on the member fields of {{ByteBufferBlock}} and
{{DiskBlock}} where appropriate.
{code}
try {
out.flush();
out.close();
} finally {
out.close();
out = null;
}
{code}
I was a bit confused by the above code in {{DiskBlock#startUpload}}. I think
what this is trying to do (and please correct me if I'm wrong) is to make sure
we propagate an exception thrown from {{flush}}. Therefore, we can't simplify
this code to just a single {{close}} call, because even though
{{BufferedOutputStream#close}} automatically flushes, it ignores exceptions
from flush, and we'd lose those error details. If this is the intent, would
you please add a comment explaining it?
Should {{DiskBlock#startUpload}} wrap the returned stream in a
{{BufferedInputStream}}?
Is {{ForwardingInputStream}} necessary? It looks like {{FilterInputStream}},
so can {{FileDeletingInputStream}} subclass that instead?
In {{MultiPartUpload#shouldRetry}}, is {{Thread.interrupted()}} supposed to be
{{Thread.currentThread().interrupt()}}? If {{InterruptedException}} has been
thrown, then the interrupted flag has been cleared already, so I think
{{Thread.interrupted()}} would effectively be a no-op.
Can we avoid creating the {{LocalDirAllocator}} when we are not using
{{S3AOutputStream}} or {{DiskBlockFactory}}?
I'm unclear why {{MAX_THREADS}} was downtuned to 1 in
{{ITestS3ABlockingThreadPool}}. If that's intentional, please also update the
class-level comment that states 2 threads and describe why the test is still
sufficient with only 1 thread.
> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
> Key: HADOOP-13560
> URL: https://issues.apache.org/jira/browse/HADOOP-13560
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Attachments: HADOOP-13560-branch-2-001.patch,
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch,
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rname and verify that the copy really
> works
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very
> large commit operations for committers using rename
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]