steveloughran commented on code in PR #2149:
URL: https://github.com/apache/hadoop/pull/2149#discussion_r2251063163
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -4086,25 +4175,41 @@ public boolean exists(Path f) throws IOException {
}
/**
- * Override superclass so as to add statistic collection.
+ * Optimized probe for a path referencing a dir.
+ * Even though it is optimized to a single HEAD, applications
+ * should not over-use this method...it is all too common.
* {@inheritDoc}
*/
@Override
@SuppressWarnings("deprecation")
public boolean isDirectory(Path f) throws IOException {
Review Comment:
Not good. File a PR, including what you can of the stack of checks.
What is probably happening is that the method calling this assumes all
the paths are directories (which this call is optimised for), but as all the
paths are files it ends up doing
LIST path
so yes, it would be a step backwards. The code should be calling
getFileStatus to really get everything about a file.
How, and why, are you providing a list of many, many files, given that Spark
expects to be working on a directory at a time?
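A rough sketch of the getFileStatus approach suggested above: fetch the status once, then branch on it, rather than probing with isDirectory. `PathProbe`, `Kind`, `Status` and `StatusSource` are hypothetical stand-ins so that the snippet compiles without Hadoop on the classpath. In real code the calls would presumably be `FileSystem.getFileStatus(Path)` and `FileStatus.isDirectory()`, with `FileNotFoundException` signalling a missing path.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Illustrative only: classify a path with a single status lookup. */
final class PathProbe {

  enum Kind { FILE, DIRECTORY, MISSING }

  /** Hypothetical stand-in for the one FileStatus accessor used here. */
  static final class Status {
    private final boolean directory;

    Status(boolean directory) {
      this.directory = directory;
    }

    boolean isDirectory() {
      return directory;
    }
  }

  /** Hypothetical stand-in for FileSystem.getFileStatus(Path). */
  interface StatusSource {
    Status getFileStatus(String path) throws IOException;
  }

  /**
   * One lookup per path; the caller learns whether the path is a file,
   * a directory, or absent, without a separate isDirectory probe.
   */
  static Kind classify(StatusSource fs, String path) throws IOException {
    try {
      Status status = fs.getFileStatus(path);
      return status.isDirectory() ? Kind.DIRECTORY : Kind.FILE;
    } catch (FileNotFoundException e) {
      // A missing path is reported as a result, not as a failure.
      return Kind.MISSING;
    }
  }
}
```

Other IOExceptions are left to propagate, so a store failure is not mistaken for a missing path.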
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]