Prathyusha created HBASE-28878:
----------------------------------

Summary: Introduce Cache for SFT instances created via StoreFileTrackerFactory
Key: HBASE-28878
URL: https://issues.apache.org/jira/browse/HBASE-28878
Project: HBase
Issue Type: Improvement
Reporter: Prathyusha

As part of HBASE-28564, the creation of HStoreFile was made SFT aware: any time a store file is created, it needs an SFT instance. As a result, every interaction with HStoreFiles now needs an SFT instance. In the case of FileBasedStoreFileTracker, each instance loads the backing .filelist file, which can be a costly operation on S3.

This Jira aims to introduce a cache layer in StoreFileTrackerFactory for SFT instances, keyed by _TableName + Region + CF + Mode_ (Write/ReadOnly mode of the SFT).

A more detailed thought process is [here|https://github.com/apache/hbase/pull/5939#discussion_r1759312918]:

{code:java}
Every time we create a StoreFileTracker object it will have no state, and so it will
either need to go to the filesystem and list the directory or read the tracker file
depending on which type it is in order to initialize as soon as we try to use it.
It's fine... because the original code causes IO to happen also.. however,

What do you think about the possibility of reuse? This is a more general question than
a comment about this particular call site.

Should the StoreFileTrackerFactory cache instances and return the cached instances that
match the arguments to StoreFileTrackerFactory.create() rather than make a new instance?
Can StoreFileTracker instances be made thread safe so they can be cached and shared?

If we have reuse, and all the relevant filesystem ops go through the StoreFileTracker,
then we could potentially save a lot of filesystem or object store IO, because a reused
StoreFileTracker would have the ground truth already and would not need to go to the
filesystem or object store and do IO in order to e.g. return the StoreFileInfo of a
given path.
{code}
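
The cache described above could be sketched roughly as follows. This is only an illustration of the keying idea (TableName + Region + CF + Mode), not the HBase implementation: the class names SftCacheKey and SftCache are hypothetical, the type parameter T stands in for StoreFileTracker, and the loader function stands in for whatever StoreFileTrackerFactory.create() does today. It also assumes the cached tracker is safe to share across threads, which is the open question raised in the review comment.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache key: table name + region + column family + mode. */
final class SftCacheKey {
  final String table;
  final String region;
  final String family;
  final boolean readOnly; // assumption: mode is modelled as a simple read-only flag

  SftCacheKey(String table, String region, String family, boolean readOnly) {
    this.table = Objects.requireNonNull(table);
    this.region = Objects.requireNonNull(region);
    this.family = Objects.requireNonNull(family);
    this.readOnly = readOnly;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SftCacheKey)) {
      return false;
    }
    SftCacheKey other = (SftCacheKey) o;
    return readOnly == other.readOnly && table.equals(other.table)
      && region.equals(other.region) && family.equals(other.family);
  }

  @Override
  public int hashCode() {
    return Objects.hash(table, region, family, readOnly);
  }
}

/** Hypothetical cache; T stands in for StoreFileTracker. */
final class SftCache<T> {
  private final ConcurrentMap<SftCacheKey, T> cache = new ConcurrentHashMap<>();
  // Stand-in for the expensive creation path (e.g. loading the .filelist file).
  private final Function<SftCacheKey, T> loader;

  SftCache(Function<SftCacheKey, T> loader) {
    this.loader = Objects.requireNonNull(loader);
  }

  /** Returns the cached instance for the key, creating it at most once per key. */
  T get(SftCacheKey key) {
    return cache.computeIfAbsent(key, loader);
  }

  /** Drops an entry, e.g. when a region closes (the trigger is an assumption). */
  void invalidate(SftCacheKey key) {
    cache.remove(key);
  }

  int size() {
    return cache.size();
  }
}
```

With a cache of this shape, two callers asking for the same table, region, family and mode would get the same instance, so the tracker file would be loaded once rather than on every store file creation. When and how entries are invalidated (region close, split, merge, tracker migration) is left open here and would need to be decided as part of this Jira.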