Steve Loughran created HADOOP-13371:
---------------------------------------
Summary: S3A globber to use bulk listObject call over recursive
directory scan
Key: HADOOP-13371
URL: https://issues.apache.org/jira/browse/HADOOP-13371
Project: Hadoop Common
Issue Type: Sub-task
Components: fs, fs/s3
Affects Versions: 2.8.0
Reporter: Steve Loughran
HADOOP-13208 produces O(1) listing of directory trees in
{{FileSystem.listStatus}} calls, but does nothing for
{{FileSystem.globStatus()}}, which uses a completely different codepath: a
selective recursive scan that pattern-matches each path component on the way
down and prunes the subtrees that don't match. The cost is
O(matching-directories) + the cost of examining the files.
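To make the O(matching-directories) cost concrete, here is a rough sketch of a pattern-filtered treewalk over a flat key space. It is a simplified model, not Hadoop's Globber. The class name {{TreewalkGlobSketch}}, the in-memory sorted key set standing in for a bucket, and the one-LIST-per-directory cost model are all assumptions made for illustration. Only the {{*}} and {{?}} wildcards are handled, and every level is listed, even when a path component is a literal.

```java
import java.util.*;
import java.util.regex.Pattern;

/**
 * Hypothetical model of a pattern-filtered tree walk: every directory whose
 * name matches the glob so far costs one LIST request. A simplification of,
 * not a copy of, Hadoop's Globber.
 */
class TreewalkGlobSketch {
    private final NavigableSet<String> keys;  // flat key space standing in for a bucket
    int listCalls = 0;                        // LIST requests issued so far

    TreewalkGlobSketch(Collection<String> keys) {
        this.keys = new TreeSet<>(keys);
    }

    /** One LIST request for the immediate children of prefix; directories end in "/". */
    private SortedSet<String> listChildren(String prefix) {
        listCalls++;
        SortedSet<String> children = new TreeSet<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) break;   // sorted keys: the prefix range is contiguous
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            children.add(slash < 0 ? rest : rest.substring(0, slash + 1));
        }
        return children;
    }

    /** Regex for one path component; only '*' and '?' are supported here. */
    static Pattern componentRegex(String glob) {
        StringBuilder re = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') re.append("[^/]*");
            else if (c == '?') re.append("[^/]");
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(re.toString());
    }

    /** Expand the pattern one component at a time, descending only into matching directories. */
    List<String> glob(String pattern) {
        String[] parts = pattern.split("/");
        Collection<String> frontier = List.of("");   // directory prefixes still to expand
        for (int i = 0; i < parts.length; i++) {
            Pattern p = componentRegex(parts[i]);
            boolean last = (i == parts.length - 1);
            Set<String> next = new TreeSet<>();
            for (String prefix : frontier) {
                for (String child : listChildren(prefix)) {
                    boolean isDir = child.endsWith("/");
                    String name = isDir ? child.substring(0, child.length() - 1) : child;
                    if (!p.matcher(name).matches()) continue;   // prune non-matching subtree
                    if (last) next.add(prefix + name);         // final component: a result
                    else if (isDir) next.add(prefix + child);  // descend on the next pass
                }
            }
            frontier = next;
        }
        return new ArrayList<>(frontier);
    }

    public static void main(String[] args) {
        TreewalkGlobSketch store = new TreewalkGlobSketch(List.of(
                "logs/2015/12/c.log", "logs/2016/06/a.log",
                "logs/2016/07/b.log", "data/x.csv"));
        List<String> hits = store.glob("logs/2016/*/*.log");
        // prints "[logs/2016/06/a.log, logs/2016/07/b.log] after 5 LIST calls"
        System.out.println(hits + " after " + store.listCalls + " LIST calls");
    }
}
```

In this model, the request count is one for the root plus one for every directory that survives the match at each level. Adding another matching month directory under {{logs/2016/}} adds another LIST call, even if that directory holds only one file.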
It should be possible to do the glob status listing in S3A not through the
filtered treewalk, but through a bulk list + filter operation. This would be an
O(files) lookup, paid *before any filtering takes place*.
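A minimal sketch of the list + filter alternative follows. It does one paged, flat listing of every key under the pattern's literal prefix and then applies a whole-path regex to each key. The class name {{BulkListGlobSketch}}, the in-memory key set, and the prefix-extraction rule are hypothetical. The page size of 1000 keys matches the S3 maximum per LIST response and is used here only to count simulated requests. Only {{*}} and {{?}} are handled, and neither wildcard crosses a {{/}}, so each component still has to match at the same depth. The sketch returns matching object keys only. Patterns meant to match directories themselves would also need the implied directory paths derived from the keys, which this sketch does not do.

```java
import java.util.*;
import java.util.regex.Pattern;

/**
 * Hypothetical list-then-filter glob: one paged bulk listing of every key under
 * the pattern's literal prefix, then a regex filter. Not Hadoop or S3A code.
 */
class BulkListGlobSketch {
    static final int PAGE_SIZE = 1000;        // max keys per S3 LIST response
    private final NavigableSet<String> keys;  // flat key space standing in for a bucket
    int listCalls = 0;                        // LIST requests (pages) issued so far

    BulkListGlobSketch(Collection<String> keys) {
        this.keys = new TreeSet<>(keys);
    }

    /** Whole-path regex; '*' and '?' never cross a '/', so depth is preserved. */
    static Pattern globToRegex(String glob) {
        StringBuilder re = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') re.append("[^/]*");
            else if (c == '?') re.append("[^/]");
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(re.toString());
    }

    /** The directory prefix before the first wildcard, e.g. "logs/2016/". */
    static String literalPrefix(String glob) {
        int wild = glob.length();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' || c == '?') { wild = i; break; }
        }
        return glob.substring(0, glob.lastIndexOf('/', wild) + 1);
    }

    /** List everything under the prefix page by page, filtering only after listing. */
    List<String> glob(String pattern) {
        String prefix = literalPrefix(pattern);
        Pattern regex = globToRegex(pattern);
        List<String> matches = new ArrayList<>();
        int scanned = 0;
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) break;          // end of the prefix range
            if (scanned % PAGE_SIZE == 0) listCalls++;   // each new page is one LIST request
            scanned++;
            if (regex.matcher(key).matches()) matches.add(key);
        }
        if (scanned == 0) listCalls++;                   // an empty listing still costs a request
        return matches;
    }

    public static void main(String[] args) {
        BulkListGlobSketch store = new BulkListGlobSketch(List.of(
                "logs/2015/12/c.log", "logs/2016/06/a.log",
                "logs/2016/07/b.log", "data/x.csv"));
        List<String> hits = store.glob("logs/2016/*/*.log");
        // prints "[logs/2016/06/a.log, logs/2016/07/b.log] after 1 LIST call(s)"
        System.out.println(hits + " after " + store.listCalls + " LIST call(s)");
    }
}
```

The trade-off is visible in the loop. Every key under the prefix is fetched, including ones the pattern later rejects, so the cost tracks the number of files under the prefix divided by the page size, not the number of directories. With 2500 keys spread over 12 month directories, this model issues 3 LIST calls, while the treewalk model would issue one per matching directory.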