[ https://issues.apache.org/jira/browse/HADOOP-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023042#comment-17023042 ]

Steve Loughran commented on HADOOP-16711:
-----------------------------------------

* I'm still waiting for the declaration of an endpoint. You know about the 
no-endpoint-no-review policy - it hasn't changed.

* what has changed since then is that we've moved to GitHub PRs for ease of 
review - please create a PR there with this JIRA # in the title

Having had a quick glance at the patch:

h2. -1 to reusing an existing property for something completely different.

Especially given that if you selectively choose directories to treat as 
authoritative in fs.s3a.authoritative.path, the option you are using here will 
be false. So as soon as you tried to use this feature in a new deployment you'd 
find it didn't work; you'd want the same behaviour when that path was 
non-empty; you'd need to consider that the path can list multiple buckets; 
etc., etc.

Now, HADOOP-15990 has existed for a long time, but we've left it alone for 
fear of breaking backwards compatibility, especially with third-party object 
stores. With your policy of "don't check", we now have three modes and a 
switch becomes justifiable.

0. no check
1. v1 check
2. v2 check

I'm going to propose a property which can be used to set this, say 
'fs.s3a.bucket.probe':

# read it via intOption() with a min value of 0 and a default of 2 (the 
behaviour of HADOOP-15990)
# use Preconditions to fail if a value > 2 is supplied
# feed it to a switch to choose the operation

This could all actually be hidden in verifyBucketExists(); no need to 
complicate initialize() any more than it is today.
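To make the proposal concrete, here's a hedged sketch of that read/validate/switch sequence. The property name and default come from this comment; the method shapes, the Map-based stand-in for Configuration/intOption(), and the returned strings are invented for illustration, not the real S3A code.

```java
import java.util.Map;

public class BucketProbe {
    static final String BUCKET_PROBE = "fs.s3a.bucket.probe";
    static final int DEFAULT_BUCKET_PROBE = 2;   // the HADOOP-15990 behaviour

    // Stand-in for S3AUtils.intOption(conf, key, default, min): parse with a floor.
    static int intOption(Map<String, String> conf, String key, int defVal, int min) {
        String v = conf.get(key);
        int value = (v == null) ? defVal : Integer.parseInt(v.trim());
        if (value < min) {
            throw new IllegalArgumentException("Value of " + key + " must be >= " + min);
        }
        return value;
    }

    // What verifyBucketExists() could do internally: read, validate, switch.
    static String verifyBucketExists(Map<String, String> conf) {
        int probe = intOption(conf, BUCKET_PROBE, DEFAULT_BUCKET_PROBE, 0);
        // Preconditions.checkArgument equivalent: reject values > 2.
        if (probe > 2) {
            throw new IllegalArgumentException(
                "Unsupported " + BUCKET_PROBE + " value: " + probe);
        }
        switch (probe) {
            case 0:  return "no check";
            case 1:  return "v1 check";
            case 2:  return "v2 check";
            default: throw new IllegalStateException("unreachable");
        }
    }
}
```

The point of keeping the switch inside verifyBucketExists() is that initialize() only ever makes one call, whatever the probe mode.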

I'm changing the title to indicate this.

h2. Testing

There aren't any tests yet; we're going to need to play with the set of 
directory options to see what is there.

* for our test runs, we could turn off the checks. We create a lot of stores 
ourselves, after all.
* which we'd do in test/resources/core-site.xml
* and then have two tests to try the other settings on filesystems. 

One thing I am very curious about: if we skip this check and the first 
interaction with the store is an operation which we retry on - how is it 
handled?

# S3AUtils.translateException must recognise this and translate it to 
something meaningful
# it must not be misinterpreted as a simple FileNotFoundException, as some 
operations retry on that, such as the rename and read code which tries to 
handle 404 caching

If we skip the checks during initialisation, we had better make sure that real 
operations fail fast and meaningfully.
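As a hedged sketch of that translation rule: here the AWS service exception is reduced to a (status, errorCode) pair, and the distinct bucket-missing exception is a plain IOException. The real translateException takes the SDK exception itself, and the actual exception class chosen for a missing bucket is an open design question, not decided here.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class TranslateSketch {
    static IOException translate(int status, String errorCode, String path) {
        if ("NoSuchBucket".equals(errorCode)) {
            // Missing bucket: fail fast with a distinct, meaningful error.
            // Deliberately NOT a FileNotFoundException, so the rename/read
            // 404-caching retry logic cannot mistake it for a missing key.
            return new IOException("Bucket does not exist: " + path);
        }
        if (status == 404) {
            // Missing key: the usual FileNotFoundException.
            return new FileNotFoundException("Not found: " + path);
        }
        return new IOException("Status " + status + " on " + path);
    }
}
```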

Which means, of course, another test.

Overall then, the patch needs a new test suite, say ITestBucketExistence, 
with (at least) 3 tests:
* verify checks 1 and 2
* verify check 0 against a filesystem we know does not exist, attempting to 
run 1+ IO calls which go to S3 and then fail. It probably makes sense to 
disable S3Guard there in case it interferes.
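The behaviour those tests pin down can be sketched against a fake store rather than S3; the class and method names here are invented for illustration, and the real ITestBucketExistence would of course run against a live endpoint.

```java
import java.io.FileNotFoundException;
import java.util.Set;

public class FakeS3Fs {
    private final Set<String> buckets;
    private final int probe;   // 0 = no check, 1 = v1 check, 2 = v2 check

    FakeS3Fs(Set<String> buckets, int probe) {
        this.buckets = buckets;
        this.probe = probe;
    }

    // initialize() only probes the bucket when probe > 0.
    void initialize(String bucket) throws FileNotFoundException {
        if (probe > 0 && !buckets.contains(bucket)) {
            throw new FileNotFoundException("Bucket does not exist: " + bucket);
        }
    }

    // The first real IO call must fail fast even when the probe was skipped.
    String getFileStatus(String bucket, String path) throws FileNotFoundException {
        if (!buckets.contains(bucket)) {
            throw new FileNotFoundException("Bucket does not exist: " + bucket);
        }
        return "found " + path;
    }
}
```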


h2. Docs

Yes, we need those too. Maybe mention it in the performance.md file.



> With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs 
> init()
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-16711
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16711
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>              Labels: performance
>         Attachments: HADOOP-16711.prelim.1.patch
>
>
> When authoritative mode is enabled with s3guard, it would be good to skip 
> verifyBuckets call during S3A filesystem init(). This would save call to S3 
> during init method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
