[ 
https://issues.apache.org/jira/browse/HBASE-29863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069298#comment-18069298
 ] 

Hudson commented on HBASE-29863:
--------------------------------

Results for branch master
        [build #1429 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1429/]: 
(x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1429/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1429/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


> Add API to KeyValueScanner to retrieve the set of StoreFiles accessed during 
> a scan
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-29863
>                 URL: https://issues.apache.org/jira/browse/HBASE-29863
>             Project: HBase
>          Issue Type: New Feature
>          Components: API, regionserver, Scanners
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2
>
>
> *Goal:* Introduce a mechanism to track and expose the specific HFiles 
> involved in a scan operation.
> *Use-case:* This is essential for client-side validation that the right 
> set of files was scanned (when a source of truth is available, for example 
> the snapshot data manifest during snapshot-based scans), for debugging 
> performance issues, and for analyzing data access patterns.
> *Proposed API:* Add {{Set<Path> getScannerInitializedFiles()}} to the 
> {{KeyValueScanner}} interface.
> *Implementation Details*
>  * *Capturing the list of files when the scanner is initialized*
>  ** Leaf Scanners
>  *** StoreFileScanner: Returns a singleton set containing the path of the 
> associated {{HFile}}.
>  *** SnapshotSegmentScanner / CollectionBackedScanner / SegmentScanner: 
> Returns an empty set.
>  ** Composite Scanners
>  *** StoreScanner & ReversedStoreScanner: Aggregates files from all active 
> {{StoreFileScanners}}
>  *** KeyValueHeap & ReversedKeyValueHeap: Aggregates files from its internal 
> priority queue of scanners.
>  ** Abstract Scanners
>  *** NonLazyKeyValueScanner / NonReversedNonLazyKeyValueScanner: Returns 
> an empty set.
>  * *Exposing via RegionScanner & TableSnapshotRecordReader*
>  ** RegionScanner: Aggregates files from all underlying StoreScanners
>  ** TableSnapshotRecordReader: Proxies the call through 
> ClientSideRegionScanner to allow MapReduce jobs to access this for 
> snapshot-based scans.
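
The leaf/composite split described above can be illustrated with a minimal, self-contained sketch. This is not the HBase implementation: {{FileTrackingScanner}}, {{SingleFileScanner}}, {{InMemoryScanner}} and {{CompositeScanner}} are hypothetical stand-ins for {{KeyValueScanner}} and its implementations, and {{java.nio.file.Path}} is assumed here in place of the Hadoop {{Path}} type so the example runs without HBase on the classpath. Only the method name {{getScannerInitializedFiles()}} is taken from the proposal.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for KeyValueScanner; only the proposed method is modeled.
interface FileTrackingScanner {
  Set<Path> getScannerInitializedFiles();
}

// Leaf scanner backed by exactly one file (analogous to StoreFileScanner).
final class SingleFileScanner implements FileTrackingScanner {
  private final Path file;

  SingleFileScanner(Path file) {
    this.file = file;
  }

  @Override
  public Set<Path> getScannerInitializedFiles() {
    // A file-backed leaf reports a singleton set with its own path.
    return Collections.singleton(file);
  }
}

// Leaf scanner over in-memory data (analogous to SegmentScanner).
final class InMemoryScanner implements FileTrackingScanner {
  @Override
  public Set<Path> getScannerInitializedFiles() {
    // No file is involved, so the set is empty.
    return Collections.emptySet();
  }
}

// Composite scanner (analogous to StoreScanner or KeyValueHeap).
final class CompositeScanner implements FileTrackingScanner {
  private final List<FileTrackingScanner> children;

  CompositeScanner(List<FileTrackingScanner> children) {
    // Defensive copy so later changes to the caller's list are not observed.
    this.children = new ArrayList<>(children);
  }

  @Override
  public Set<Path> getScannerInitializedFiles() {
    // Union of the files reported by every child scanner.
    Set<Path> files = new HashSet<>();
    for (FileTrackingScanner child : children) {
      files.addAll(child.getScannerInitializedFiles());
    }
    return Collections.unmodifiableSet(files);
  }
}
```

Under these assumptions, a client holding an expected file set (for example, one derived from a snapshot manifest) could compare it against the result of the top-level scanner's {{getScannerInitializedFiles()}} with a plain {{Set.equals}} check. Because composites nest, wrapping one {{CompositeScanner}} in another gives the same union, which mirrors how the description has RegionScanner aggregate over StoreScanners.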



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
