[ https://issues.apache.org/jira/browse/HBASE-28878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prathyusha reassigned HBASE-28878:
----------------------------------

    Assignee: Prathyusha

> Introduce Cache for SFT instances created via StoreFileTrackerFactory
> ---------------------------------------------------------------------
>
>                 Key: HBASE-28878
>                 URL: https://issues.apache.org/jira/browse/HBASE-28878
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prathyusha
>            Assignee: Prathyusha
>            Priority: Major
>
> As part of HBASE-28564, the creation of HStoreFile was made SFT
> (StoreFileTracker) aware, so any time a store file is created, it needs an
> SFT instance.
> As a result, all interactions with HStoreFiles now need an SFT instance.
> In the case of FileBasedStoreFileTracker, each instance loads the backing
> .filelist file, which can be a costly operation on S3.
> This Jira aims to introduce a cache layer at StoreFileTrackerFactory for
> SFT instances, keyed by
> _TableName + Region + CF + Mode_ (Write/ReadOnly mode of SFT).
> A more detailed discussion of the idea is
> [here|https://github.com/apache/hbase/pull/5939#discussion_r1759312918]
> {code:java}
> Every time we create a StoreFileTracker object it will have no state, and so
> it will either need to go to the filesystem and list the directory or read
> the tracker file depending on which type it is in order to initialize as soon
> as we try to use it.
> It's fine, because the original code also causes IO to happen. However,
> what do you think about the possibility of reuse? This is a more general
> question than a comment about this particular call site. Should the
> StoreFileTrackerFactory cache instances and return the cached instances that
> match the arguments to StoreFileTrackerFactory.create() rather than make a
> new instance? Can StoreFileTracker instances be made thread safe so they can
> be cached and shared?
> If we have reuse, and all the relevant filesystem ops go through the
> StoreFileTracker, then we could potentially save a lot of filesystem or
> object store IO, because a reused StoreFileTracker would have the ground
> truth already and would not need to go to the filesystem or object store and
> do IO in order to e.g. return the StoreFileInfo of a given path.
> {code}
>
> ----

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
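
For illustration only, the caching idea described in the issue (one shared tracker per TableName + Region + CF + Mode) could be sketched roughly as below. This is not HBase code: SftCacheSketch, Key, Tracker, getOrCreate, invalidateRegion and loadCount are all hypothetical names, Tracker is a stand-in for a real StoreFileTracker, and dropping entries when a region goes away is an assumption, not something the issue specifies.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types are HBase's real classes.
final class SftCacheSketch {

  /** Cache key: table + region + column family + mode, as the issue proposes. */
  static final class Key {
    final String table;
    final String region;
    final String family;
    final boolean readOnly;

    Key(String table, String region, String family, boolean readOnly) {
      this.table = table;
      this.region = region;
      this.family = family;
      this.readOnly = readOnly;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return readOnly == other.readOnly
          && table.equals(other.table)
          && region.equals(other.region)
          && family.equals(other.family);
    }

    @Override
    public int hashCode() {
      return Objects.hash(table, region, family, readOnly);
    }
  }

  /** Stand-in for a tracker whose initialization is expensive (e.g. reading .filelist). */
  static final class Tracker {
    final Key key;

    Tracker(Key key) {
      this.key = key;
    }
  }

  private final ConcurrentHashMap<Key, Tracker> cache = new ConcurrentHashMap<>();
  private final AtomicInteger loads = new AtomicInteger();

  /** Returns the cached tracker for the key, creating it at most once per key. */
  Tracker getOrCreate(String table, String region, String family, boolean readOnly) {
    Key key = new Key(table, region, family, readOnly);
    return cache.computeIfAbsent(key, k -> {
      loads.incrementAndGet(); // counts how often the "expensive" creation runs
      return new Tracker(k);
    });
  }

  /** Assumed policy: drop all entries of a region, e.g. when the region closes. */
  void invalidateRegion(String table, String region) {
    cache.keySet().removeIf(k -> k.table.equals(table) && k.region.equals(region));
  }

  /** Number of tracker creations so far. */
  int loadCount() {
    return loads.get();
  }
}
```

Two calls to getOrCreate with the same table, region, family and mode return the same Tracker object, while a different mode yields a separate instance. Sharing instances like this only helps if the real trackers are thread safe, which is the open question raised in the quoted review comment.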