That is the only way to do it using the client API.

Just curious why you need the mapping.

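For reference, a minimal sketch of that approach in Scala (assuming Hadoop
2.7.x on the classpath; the /data/mydir path and the blockMap name are just
illustrative). One detail worth noting: the LocatedFileStatus entries returned
by listFiles already carry block locations, so the extra per-file
getFileBlockLocations round trip can be skipped:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{BlockLocation, FileSystem, Path}
import scala.collection.mutable

val fs = FileSystem.get(new Configuration())

// Recursive listing of every file under the directory.
val it = fs.listFiles(new Path("/data/mydir"), true)

// file path -> block locations (offset, length, hosts) of that file
val blockMap = mutable.Map.empty[String, Array[BlockLocation]]

while (it.hasNext) {
  val status = it.next()
  // The listing RPC already returned block locations with each
  // LocatedFileStatus, so no extra getFileBlockLocations call is needed.
  blockMap(status.getPath.toString) = status.getBlockLocations
}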

On Tue, Dec 31, 2019, 00:41 Davide Vergari <[email protected]> wrote:

> Hi all,
> I need to create a block map for all files in a specific directory (and
> subdir) in HDFS.
>
> I'm using the fs.listFiles API, then I loop over the
> RemoteIterator[LocatedFileStatus] returned by listFiles and for each
> LocatedFileStatus I call the getFileBlockLocations API to get all the block
> ids of that file, but it takes a long time because I have millions of files
> in the HDFS directory.
> I also tried to use Spark to parallelize the execution, but HDFS's APIs are
> not serializable.
>
> Is there a better way? I know there is the "hdfs oiv" command, but I can't
> access the Namenode directory directly; also, the fsimage file could be
> outdated and I can't force safemode to execute the saveNamespace
> command.
>
> I'm using Scala 2.11 with Hadoop 2.7.1 (HDP 2.6.3)
>
> Thank you
>
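Regarding the Spark attempt mentioned in the quoted message: a common
workaround is to ship only plain path strings to the executors and build the
FileSystem inside each task, so nothing non-serializable is captured in the
closure. A rough sketch, where filePaths is an illustrative list of paths and
sc is the SparkContext (e.g. from spark-shell):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val filePaths: Seq[String] = Seq("/data/mydir/part-00000")  // illustrative

val blockMap = sc.parallelize(filePaths)
  .mapPartitions { paths =>
    // FileSystem is not serializable, so create it on the executor.
    val fs = FileSystem.get(new Configuration())
    paths.map { p =>
      val status = fs.getFileStatus(new Path(p))
      // Keep only plainly serializable fields (offset, length, hosts) per block.
      p -> fs.getFileBlockLocations(status, 0, status.getLen)
             .map(b => (b.getOffset, b.getLength, b.getHosts.toSeq))
    }
  }
  .collectAsMap()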
