[ 
https://issues.apache.org/jira/browse/HADOOP-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678873#comment-17678873
 ] 

Steve Loughran commented on HADOOP-18599:
-----------------------------------------

1. AzureBlobFileSystemStore shouldn't be public at all; in fact we should make 
sure the @ scope annotations say so. we are free to change those methods 
whenever we feel like, without worrying about breaking anything
2. going through the 

if it makes you feel better, the specification language we use for the fs api 
is sort of python, or more precisely a specification language similar to Z but 
written in python so people can read and write it. 
https://github.com/apache/hadoop/tree/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem
however, you do need to spend time using the api, writing tests etc before 
defining new bits of the API

the problem here is one of long term commitment. you can use the 
AzureBlobFileSystemStore if you want, just don't be surprised if it breaks for 
no obvious reason, and know that if you complain we will say "you shouldn't 
use that". 

but if the use case is relevant, well, it is something which a "cloud first" 
list API call could offer, as it would also benefit amazon s3 and be usable 
more broadly by other apps.

> Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-18599
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18599
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/azure
>    Affects Versions: 3.3.2, 3.3.4
>            Reporter: Thomas Newton
>            Priority: Major
>
> When working with Azure blob storage, listing operations can often be quite 
> slow, even on storage accounts with the hierarchical namespace. 
> This can be mitigated by listing only a specific subset of directories using 
> a function like 
> [https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-]
> which accepts a `startFrom` argument and lists all files in order starting 
> from there.
> I'm wondering if we could add a method to `AzureBlobFileSystem`, 
> something like:
> ```
> public FileStatus[] listStatus(final Path f, final String startFrom) throws IOException
> ```
> This exposes the functionality that already exists on the underlying 
> `AzureBlobFileSystemStore`. My understanding from reading a bit of the code 
> is that users should mainly be dealing with `AzureBlobFileSystem`s, and 
> they seem easier to use to me, hence the benefit of exposing it on 
> `AzureBlobFileSystem`.
>  
> I'm very unfamiliar with Java, but I'm told that keeping strictly to 
> interfaces is strongly preferred. However, I can see some examples already on 
> `AzureBlobFileSystem` that do not belong to any interface (e.g. `breakLease`), 
> so I'm hoping it's acceptable to add a method like the one I described only 
> for the one `FileSystem` implementation.
>  
> The specific motivation for this is to unblock 
> [https://github.com/delta-io/delta/issues/1568]
> I would be willing to contribute this if maintainers think the plan is 
> reasonable. 
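
For illustration only, here is a minimal in-memory sketch of the listing behaviour the description attributes to `startFrom`: entries come back in sorted order, beginning at the given name. It does not call Hadoop or ABFS. The class `StartFromListingSketch` and method `listFrom` are hypothetical names, and the inclusive lower bound and plain `String` ordering are assumptions rather than documented ABFS behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: models "list in order, starting from a name" in memory.
// This is not the ABFS implementation and uses no Hadoop classes.
class StartFromListingSketch {

    /**
     * Returns the entry names that sort at or after startFrom, in String order.
     * A null or empty startFrom returns every entry (assumed behaviour).
     */
    public static List<String> listFrom(Iterable<String> entryNames, String startFrom) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String name : entryNames) {
            sorted.add(name);
        }
        if (startFrom == null || startFrom.isEmpty()) {
            return new ArrayList<>(sorted);
        }
        // tailSet(x, true) keeps elements >= x; inclusivity is an assumption here.
        return new ArrayList<>(sorted.tailSet(startFrom, true));
    }

    public static void main(String[] args) {
        List<String> names = List.of("part-0003", "part-0001", "part-0002");
        // prints [part-0002, part-0003]
        System.out.println(listFrom(names, "part-0002"));
    }
}
```

The point of the sketch is that a caller who knows roughly where the interesting entries begin can skip everything that sorts earlier, which is where the listing-time saving described above would come from.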



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
