[ 
https://issues.apache.org/jira/browse/HADOOP-19543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945576#comment-17945576
 ] 

ASF GitHub Bot commented on HADOOP-19543:
-----------------------------------------

anujmodi2021 opened a new pull request, #7632:
URL: https://github.com/apache/hadoop/pull/7632

   PR in trunk: https://github.com/apache/hadoop/pull/7614
   Commit CP'd: 
https://github.com/apache/hadoop/commit/810c42f88cc63a8054edc5a16baeb9a90e3bd523
   JIRA: https://issues.apache.org/jira/browse/HADOOP-19543
   
   ### Description of PR
   On FNS-Blob, the List Blobs API is known to return duplicate entries for 
non-empty explicit directories. One entry corresponds to the directory itself, 
and the other corresponds to the marker blob that the driver internally creates 
and maintains to mark that path as a directory. This behaviour was already 
known, and the driver handled it by removing such duplicates from the set of 
entries returned within a single list iteration.
   
   Because of a possible partition split, the two duplicate entries can be 
returned in separate iterations. That case was not handled, so the caller could 
get back a result containing duplicate entries, as happened in this case. The 
de-duplication logic was designed before the partition-split behaviour was 
known.
   
   This PR fixes this bug by removing duplicates across list iterations as well.
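   
   As a rough illustration of the idea (not the code in this PR), the sketch 
below keeps state across list iterations so that a path already returned in an 
earlier page is dropped when it shows up again in a later one. The class name 
`CrossIterationDeduplicator`, the method `filterPage`, and the use of plain 
path strings are all hypothetical; the real driver works with its own listing 
types.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper for illustration only; this is NOT the ABFS driver code.
 * One instance is assumed to live for the duration of a single listing call
 * that spans several iterations (pages).
 */
class CrossIterationDeduplicator {
    // Paths already handed back to the caller in this or earlier iterations.
    private final Set<String> seenPaths = new HashSet<>();

    /**
     * Filters one page of results, keeping only paths that have not been
     * returned before. Order within the page is preserved.
     */
    List<String> filterPage(List<String> page) {
        List<String> unique = new ArrayList<>();
        for (String path : page) {
            // Set.add returns false if the path was already recorded.
            if (seenPaths.add(path)) {
                unique.add(path);
            }
        }
        return unique;
    }
}
```

   With this sketch, if the first page contains `/parent/dir` and a later page 
contains `/parent/dir` again, only the first occurrence reaches the caller. 
Remembering every path is the simplest choice but costs memory on large 
listings; an actual implementation may track less state.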
   
   ### How was this patch tested?
   A new test for the failing scenario was added, and the existing test suite 
was run to validate the changes across all combinations.




> ABFS: [FnsOverBlob] Remove Duplicates from Blob Endpoint Listing Across 
> Iterations
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-19543
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19543
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.5.0, 3.4.1
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Blocker
>              Labels: pull-request-available
>
> On FNS-Blob, the List Blobs API is known to return duplicate entries for 
> non-empty explicit directories. One entry corresponds to the directory itself, 
> and the other corresponds to the marker blob that the driver internally 
> creates and maintains to mark that path as a directory. This behaviour was 
> already known, and the driver handled it by removing such duplicates from the 
> set of entries returned within a single list iteration.
> Because of a possible partition split, the two duplicate entries can be 
> returned in separate iterations. That case was not handled, so the caller could 
> get back a result containing duplicate entries, as happened in this case. The 
> de-duplication logic was designed before the partition-split behaviour was 
> known.
> This PR fixes this bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
