[ 
https://issues.apache.org/jira/browse/HBASE-28878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathyusha reassigned HBASE-28878:
----------------------------------

    Assignee: Prathyusha

> Introduce Cache for SFT instances created via StoreFileTrackerFactory
> ---------------------------------------------------------------------
>
>                 Key: HBASE-28878
>                 URL: https://issues.apache.org/jira/browse/HBASE-28878
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prathyusha
>            Assignee: Prathyusha
>            Priority: Major
>
> As part of HBASE-28564, the creation of HStoreFile was made SFT-aware:
> whenever a store file is created, it needs an SFT instance.
> As a result, every interaction with an HStoreFile now needs an SFT instance.
> In the case of FileBasedStoreFileTracker, each instance loads its backing
> .filelist file, which can be a costly operation on S3.
> This Jira aims to introduce a cache layer in StoreFileTrackerFactory for
> SFT instances, keyed per
> _TableName + Region + CF + Mode_ (Write/ReadOnly mode of SFT).
> A more detailed discussion of the idea is
> [here|https://github.com/apache/hbase/pull/5939#discussion_r1759312918]
> {code:java}
> Every time we create a StoreFileTracker object it will have no state, and so 
> it will either need to go to the filesystem and list the directory or read 
> the tracker file depending on which type it is in order to initialize as soon 
> as we try to use it.
> It's fine... because the original code causes IO to happen also.. however,
> What do you think about the possibility of reuse? This is a more general 
> question than a comment about this particular call site. Should the 
> StoreFileTrackerFactory cache instances and return the cached instances that 
> match the arguments to StoreFileTrackerFactory.create() rather than make a 
> new instance? Can StoreFileTracker instances be made thread safe so they can 
> be cached and shared?
> If we have reuse, and all the relevant filesystem ops go through the 
> StoreFileTracker, then we could potentially save a lot of filesystem or 
> object store IO, because a reused StoreFileTracker would have the ground 
> truth already and would not need to go to the filesystem or object store and 
> do IO in order to e.g. return the StoreFileInfo of a given path.
> {code}
> ----
>  
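
The caching idea in the description can be illustrated with a minimal sketch. This is not HBase code: SftCacheSketch, TrackerKey, Tracker, TrackerCache, getOrCreate and invalidateRegion are hypothetical names, and Tracker is only a stand-in for a StoreFileTracker whose construction is expensive. It assumes the cached instances are safe to share between threads, which is exactly the open question raised in the quoted review comment, and it assumes entries are dropped explicitly (for example when a region closes).

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; none of these types are HBase classes. */
final class SftCacheSketch {

  /** Write/ReadOnly mode of the tracker, as named in the issue. */
  enum Mode { WRITE, READ_ONLY }

  /** Cache key made of TableName + Region + CF + Mode. */
  static final class TrackerKey {
    final String table;
    final String region;
    final String family;
    final Mode mode;

    TrackerKey(String table, String region, String family, Mode mode) {
      this.table = table;
      this.region = region;
      this.family = family;
      this.mode = mode;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof TrackerKey)) return false;
      TrackerKey k = (TrackerKey) o;
      return table.equals(k.table) && region.equals(k.region)
          && family.equals(k.family) && mode == k.mode;
    }

    @Override
    public int hashCode() {
      return Objects.hash(table, region, family, mode);
    }
  }

  /** Stand-in for a tracker instance that is costly to initialize. */
  static final class Tracker {
    final TrackerKey key;
    Tracker(TrackerKey key) { this.key = key; }
  }

  /** Factory-side cache that reuses one tracker per key. */
  static final class TrackerCache {
    private final ConcurrentMap<TrackerKey, Tracker> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    Tracker getOrCreate(String table, String region, String family, Mode mode) {
      TrackerKey key = new TrackerKey(table, region, family, mode);
      // computeIfAbsent runs the loader at most once per absent key.
      return cache.computeIfAbsent(key, k -> {
        loads.incrementAndGet(); // stands in for the expensive initial load
        return new Tracker(k);
      });
    }

    /** Assumed invalidation hook: drop all entries of one region. */
    void invalidateRegion(String table, String region) {
      cache.keySet().removeIf(k -> k.table.equals(table) && k.region.equals(region));
    }

    /** Number of times a tracker had to be created. */
    int loadCount() { return loads.get(); }
  }
}
```

In this sketch, two calls with the same table, region, family and mode return the same Tracker object, while a different mode yields a separate instance. When and how entries should be invalidated in a real implementation is left open here.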



