[
https://issues.apache.org/jira/browse/HBASE-28878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prathyusha reassigned HBASE-28878:
----------------------------------
Assignee: Prathyusha
> Introduce Cache for SFT instances created via StoreFileTrackerFactory
> ---------------------------------------------------------------------
>
> Key: HBASE-28878
> URL: https://issues.apache.org/jira/browse/HBASE-28878
> Project: HBase
> Issue Type: Improvement
> Reporter: Prathyusha
> Assignee: Prathyusha
> Priority: Major
>
> As part of HBASE-28564, the creation of HStoreFile was made SFT
> (StoreFileTracker) aware: any time a store file is created, it needs an
> SFT instance.
> As a result, all interactions with HStoreFiles now need an SFT instance.
> In the case of FileBasedStoreFileTracker, each instance loads the backing
> .filelist file, which can be a costly operation on S3.
> This Jira aims to introduce a cache layer in StoreFileTrackerFactory for
> SFT instances, keyed by
> _TableName + Region + CF + Mode_ (Write/ReadOnly mode of the SFT).
> A more detailed discussion of the idea is
> [here|https://github.com/apache/hbase/pull/5939#discussion_r1759312918]
> {code:java}
> Every time we create a StoreFileTracker object it will have no state, and so
> it will either need to go to the filesystem and list the directory or read
> the tracker file depending on which type it is in order to initialize as soon
> as we try to use it.
> It's fine... because the original code causes IO to happen also.. however,
> What do you think about the possibility of reuse? This is a more general
> question than a comment about this particular call site. Should the
> StoreFileTrackerFactory cache instances and return the cached instances that
> match the arguments to StoreFileTrackerFactory.create() rather than make a
> new instance? Can StoreFileTracker instances be made thread safe so they can
> be cached and shared?
> If we have reuse, and all the relevant filesystem ops go through the
> StoreFileTracker, then we could potentially save a lot of filesystem or
> object store IO, because a reused StoreFileTracker would have the ground
> truth already and would not need to go to the filesystem or object store and
> do IO in order to e.g. return the StoreFileInfo of a given path.
> {code}
> ----
>
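For illustration only, the keyed cache described above could be sketched roughly as follows. This is a minimal sketch, not the HBase implementation: SftCacheSketch, Key, Mode, Tracker, getOrCreate and invalidate are all hypothetical names, and the load counter merely simulates the cost of reading the .filelist file. It assumes a ConcurrentHashMap keyed by TableName + Region + CF + Mode is an acceptable cache structure.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these types are real HBase classes.
final class SftCacheSketch {

  /** Stand-in for the Write/ReadOnly mode of an SFT. */
  enum Mode { WRITE, READ_ONLY }

  /** Cache key built from TableName + Region + CF + Mode. */
  static final class Key {
    final String table;
    final String region;
    final String family;
    final Mode mode;

    Key(String table, String region, String family, Mode mode) {
      this.table = table;
      this.region = region;
      this.family = family;
      this.mode = mode;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Key)) return false;
      Key k = (Key) o;
      return table.equals(k.table) && region.equals(k.region)
          && family.equals(k.family) && mode == k.mode;
    }

    @Override
    public int hashCode() {
      return Objects.hash(table, region, family, mode);
    }
  }

  /** Stand-in tracker; the counter simulates the costly .filelist load. */
  static final class Tracker {
    static final AtomicInteger LOADS = new AtomicInteger();
    final Key key;

    Tracker(Key key) {
      this.key = key;
      LOADS.incrementAndGet(); // pretend this is the expensive IO
    }
  }

  private static final ConcurrentMap<Key, Tracker> CACHE = new ConcurrentHashMap<>();

  /** Returns the cached tracker for the key, creating it at most once. */
  static Tracker getOrCreate(String table, String region, String family, Mode mode) {
    return CACHE.computeIfAbsent(new Key(table, region, family, mode), Tracker::new);
  }

  /** Drops a cached tracker, e.g. when a region closes (assumed lifecycle hook). */
  static void invalidate(String table, String region, String family, Mode mode) {
    CACHE.remove(new Key(table, region, family, mode));
  }

  public static void main(String[] args) {
    Tracker a = getOrCreate("t1", "r1", "cf", Mode.WRITE);
    Tracker b = getOrCreate("t1", "r1", "cf", Mode.WRITE);
    Tracker c = getOrCreate("t1", "r1", "cf", Mode.READ_ONLY);
    System.out.println(a == b); // same key, so the cached instance is reused
    System.out.println(a == c); // different mode, so a separate instance
    System.out.println(Tracker.LOADS.get()); // number of simulated loads
  }
}
```

The sketch only shows instance reuse per key. It does not address whether the tracker instances themselves are thread safe, or when entries should be evicted; both are open questions raised in the quoted review comment.
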
--
This message was sent by Atlassian Jira
(v8.20.10#820010)