[
https://issues.apache.org/jira/browse/HADOOP-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023042#comment-17023042
]
Steve Loughran commented on HADOOP-16711:
-----------------------------------------
* I'm still waiting for the declaration of the endpoint. You know about the
no-endpoint-no-review policy - it hasn't changed.
* What has changed since then is that we've moved to GitHub PRs for ease of
review - please create a PR there with this JIRA # in the title.
Having had a quick glance at the patch
h2. -1 to reusing an existing property for something completely different.
In particular, if you selectively choose directories to treat as authoritative
via fs.s3a.authoritative.path, the option you are reusing here will be false.
So as soon as you tried to use this feature in a new deployment you'd find it
didn't work; you'd want the same behaviour when that path was non-empty, you'd
need to consider that the path can list multiple buckets, etc, etc.
Now, HADOOP-15990 has existed for a long time, but we've left it alone for
fear of backwards compatibility, especially with third-party object stores.
With your "don't check" policy, we now have three modes, and a switch becomes
justifiable:
0. no check
1. v1 check
2. v2 check
I'm going to propose a property which can be used to set this, say
fs.s3a.bucket.probe:
# read it via intOption() with a minimum value of 0 and a default of 2 (the
v2 check of HADOOP-15990)
# use Preconditions to fail if a value > 2 is supplied
# feed the result to a switch to choose the operation
This could all actually be hidden in verifyBucketExists(); no need to
complicate initialize() any more than it is today.
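A minimal sketch of how that could hang together, as a standalone model:
the class and method names below are illustration only, and the bounds
check stands in for the real intOption() + Preconditions combination:

```java
// Illustration only: a self-contained model of the proposed probe
// selection. In the real codebase this would use S3AUtils.intOption()
// and Guava's Preconditions, and live inside verifyBucketExists().
final class BucketProbeSketch {

  /** The proposed property. */
  static final String BUCKET_PROBE = "fs.s3a.bucket.probe";

  /** Default: the v2 existence check of HADOOP-15990. */
  static final int DEFAULT_BUCKET_PROBE = 2;

  /** Reject values outside [0, 2], as intOption() + Preconditions would. */
  static int validatedProbe(int value) {
    if (value < 0 || value > 2) {
      throw new IllegalArgumentException(
          "Value of " + BUCKET_PROBE + " must be in [0, 2]: " + value);
    }
    return value;
  }

  /** The switch choosing which probe (if any) to run. */
  static String chooseProbe(int value) {
    switch (validatedProbe(value)) {
    case 0:
      return "no check";   // skip the existence probe entirely
    case 1:
      return "v1 check";   // the existing v1-style bucket probe
    default:
      return "v2 check";   // the HADOOP-15990 v2-style probe
    }
  }
}
```

With that shape, an out-of-range value fails fast at initialization time
rather than surfacing as a confusing error later.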
I'm changing the title to indicate this.
h2. Testing
There aren't any tests yet. We're going to need to play with the set of
options to see what is there:
* for our test runs, we could turn off the checks - we create a lot of stores
ourselves, after all
* which we'd do in test/resources/core-site.xml
* and then have two tests to try the other settings on filesystems.
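That core-site.xml override might look something like this, assuming the
property name proposed above and that value 0 means "no check":

```xml
<!-- test/resources/core-site.xml: disable the bucket existence probe for
     the test runs; property name and semantics as proposed above. -->
<property>
  <name>fs.s3a.bucket.probe</name>
  <value>0</value>
</property>
```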
One thing I am very curious about: if we skip this check and the first
interaction with the store is an operation which we retry on - how is it
handled?
# S3AUtils.translateException must recognise this and translate it to
something meaningful
# it must not be misinterpreted as a simple FileNotFoundException, as some
operations retry on that, such as the rename and read code which tries to
handle 404 caching
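To illustrate the distinction (pure sketch: the exception class below is a
placeholder I've invented, not what the real S3AUtils.translateException
does - the point is only that a missing bucket must surface as something
other than FileNotFoundException):

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Sketch only: models the required dispatch, not the real S3AUtils code.
final class Translate404Sketch {

  /** Placeholder type for "the bucket itself is missing"; the real name
   *  is up to the patch - the point is it is NOT FileNotFoundException. */
  static class MissingBucketException extends IOException {
    MissingBucketException(String message) { super(message); }
  }

  /**
   * Translate a 404: a missing bucket must fail fast with a distinct
   * type, while a missing key keeps the existing FNFE semantics that
   * the rename/read 404-caching handling relies on.
   */
  static IOException translate404(String operation, boolean bucketMissing) {
    if (bucketMissing) {
      return new MissingBucketException(
          "Bucket not found during " + operation);
    }
    return new FileNotFoundException("No such key during " + operation);
  }
}
```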
If we skip the checks during initialisation, we had better make sure that real
operations fail fast and meaningfully.
Which means, of course, another test.
Overall then, the patch needs a new test, say ITestBucketExistence, with (at
least) 3 tests:
* verify checks 1 and 2
* verify check 0 against a filesystem we know does not exist, and attempt to
run 1+ IO calls which go to S3 and then fail. It probably makes sense to
disable S3Guard there in case it interferes.
h2. Docs
Yes, we need those too. Maybe mention it in the performance.md file.
> With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs
> init()
> ---------------------------------------------------------------------------------
>
> Key: HADOOP-16711
> URL: https://issues.apache.org/jira/browse/HADOOP-16711
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Rajesh Balamohan
> Priority: Minor
> Labels: performance
> Attachments: HADOOP-16711.prelim.1.patch
>
>
> When authoritative mode is enabled with S3Guard, it would be good to skip
> the verifyBuckets call during S3A filesystem init(). This would save a call
> to S3 during the init method.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)