[
https://issues.apache.org/jira/browse/HADOOP-19543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945576#comment-17945576
]
ASF GitHub Bot commented on HADOOP-19543:
-----------------------------------------
anujmodi2021 opened a new pull request, #7632:
URL: https://github.com/apache/hadoop/pull/7632
PR in trunk: https://github.com/apache/hadoop/pull/7614
Commit CP'd:
https://github.com/apache/hadoop/commit/810c42f88cc63a8054edc5a16baeb9a90e3bd523
JIRA: https://issues.apache.org/jira/browse/HADOOP-19543
### Description of PR
On FNS-Blob, the List Blobs API is known to return duplicate entries for
non-empty explicit directories. One entry corresponds to the directory itself,
and the other corresponds to the marker blob that the driver internally creates
and maintains to mark that path as a directory. This behaviour was already
known, and the driver removed such duplicates, but only within the set of
entries returned by a single list iteration.
Because of a possible partition split, the two duplicate entries can be
returned in separate iterations. That case was not handled, so the caller
could get back a result containing duplicate entries, as happened in this case.
The logic to remove duplicates was designed before the possibility of a
partition split was recognized.
This PR fixes the bug by removing duplicates across iterations as well.
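The idea of de-duplicating across iterations can be illustrated with a minimal
sketch. This is not the code in this PR: `ListingDeduplicator` and
`filterPage` are hypothetical names, entries are reduced to plain path
strings, and the sketch assumes that the directory entry and its marker blob
surface under the same path. State is kept between pages so that a duplicate
arriving in a later iteration is still dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not a class in the ABFS driver.
final class ListingDeduplicator {
  // Paths already handed to the caller in earlier list iterations.
  private final Set<String> seenPaths = new HashSet<>();

  /** Returns the entries of one page whose paths were not returned before. */
  List<String> filterPage(List<String> page) {
    List<String> unique = new ArrayList<>();
    for (String path : page) {
      // Set.add returns false if the path was already seen, either earlier
      // in this page or in a previous page.
      if (seenPaths.add(path)) {
        unique.add(path);
      }
    }
    return unique;
  }

  public static void main(String[] args) {
    ListingDeduplicator dedup = new ListingDeduplicator();
    // First iteration ends with the directory entry.
    System.out.println(dedup.filterPage(Arrays.asList("/a/file1", "/a/dir")));
    // Second iteration starts with the duplicate for the same directory.
    System.out.println(dedup.filterPage(Arrays.asList("/a/dir", "/a/file2")));
  }
}
```

Keeping every seen path is the simplest way to be correct regardless of
ordering, at the cost of memory that grows with the listing. If the duplicate
pair is guaranteed to be adjacent in the listing order, remembering only the
last entry of the previous iteration would be enough; that guarantee is an
assumption this sketch does not rely on.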
### How was this patch tested?
A new test for the failing scenario was added, and the existing test suite was
run to validate the changes across all combinations.
> ABFS: [FnsOverBlob] Remove Duplicates from Blob Endpoint Listing Across
> Iterations
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-19543
> URL: https://issues.apache.org/jira/browse/HADOOP-19543
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.5.0, 3.4.1
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Blocker
> Labels: pull-request-available
>
> On FNS-Blob, the List Blobs API is known to return duplicate entries for
> non-empty explicit directories. One entry corresponds to the directory itself,
> and the other corresponds to the marker blob that the driver internally
> creates and maintains to mark that path as a directory. This behaviour was
> already known, and the driver removed such duplicates, but only within the
> set of entries returned by a single list iteration.
> Because of a possible partition split, the two duplicate entries can be
> returned in separate iterations. That case was not handled, so the caller
> could get back a result containing duplicate entries, as happened in this
> case. The logic to remove duplicates was designed before the possibility of
> a partition split was recognized.
> This PR fixes this bug.