Prathyusha created HBASE-28878:
----------------------------------

             Summary: Introduce Cache for SFT instances created via StoreFileTrackerFactory
                 Key: HBASE-28878
                 URL: https://issues.apache.org/jira/browse/HBASE-28878
             Project: HBase
          Issue Type: Improvement
            Reporter: Prathyusha


As part of HBASE-28564, the creation of HStoreFile was made SFT-aware, and any time 
a store file is created, it needs an SFT instance.
As a result, every interaction with HStoreFiles now needs an SFT instance.
In the case of FileBasedStoreFileTracker, each instance loads the backing 
.filelist file, which can be a costly operation on S3.


This Jira aims to introduce a cache layer in StoreFileTrackerFactory for SFT 
instances, keyed by 
_TableName + Region + CF + Mode_ (the Write/ReadOnly mode of the SFT).
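A minimal sketch of such a cache, in Java 16+ (it uses records). The names {{SftCacheSketch}}, {{Mode}}, {{Key}}, {{Tracker}} and {{CachingFactory}} are hypothetical and are not HBase APIs: {{Tracker}} stands in for a StoreFileTracker, and the {{creator}} function stands in for the costly construction (e.g. loading the .filelist). Instances are keyed by _TableName + Region + CF + Mode_, and {{ConcurrentHashMap.computeIfAbsent}} ensures that concurrent callers asking for the same key share a single instance:

{code:java}
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SftCacheSketch {

  // Hypothetical stand-in for the Write/ReadOnly mode of an SFT.
  enum Mode { WRITE, READ_ONLY }

  // Hypothetical cache key: TableName + Region + CF + Mode.
  record Key(String table, String region, String family, Mode mode) {}

  // Hypothetical stand-in for a StoreFileTracker; not the HBase interface.
  interface Tracker {}

  static final class CachingFactory {
    private final Map<Key, Tracker> cache = new ConcurrentHashMap<>();
    private final Function<Key, Tracker> creator;

    CachingFactory(Function<Key, Tracker> creator) {
      this.creator = Objects.requireNonNull(creator);
    }

    // Returns the cached tracker for this key; computeIfAbsent applies
    // the creator at most once per key, even under concurrent calls.
    Tracker getOrCreate(String table, String region, String family, Mode mode) {
      return cache.computeIfAbsent(new Key(table, region, family, mode), creator);
    }

    int size() {
      return cache.size();
    }
  }

  public static void main(String[] args) {
    AtomicInteger created = new AtomicInteger();
    CachingFactory factory = new CachingFactory(k -> {
      created.incrementAndGet(); // stands in for the costly .filelist load
      return new Tracker() {};
    });

    Tracker a = factory.getOrCreate("t1", "r1", "cf", Mode.WRITE);
    Tracker b = factory.getOrCreate("t1", "r1", "cf", Mode.WRITE);
    Tracker c = factory.getOrCreate("t1", "r1", "cf", Mode.READ_ONLY);

    System.out.println(a == b);        // true: same key, same instance
    System.out.println(a == c);        // false: different mode, separate entry
    System.out.println(created.get()); // 2
  }
}
{code}

Including the mode in the key keeps Write and ReadOnly trackers for the same store separate. One question the sketch leaves aside is invalidation: entries would presumably need to be evicted when a region is closed or moved, otherwise the map grows without bound.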

A more detailed discussion of the reasoning is 
[here|https://github.com/apache/hbase/pull/5939#discussion_r1759312918]:
{quote}
Every time we create a StoreFileTracker object it will have no state, and so it will either need to go to the filesystem and list the directory or read the tracker file depending on which type it is in order to initialize as soon as we try to use it.
It's fine... because the original code causes IO to happen also.. however,
What do you think about the possibility of reuse? This is a more general question than a comment about this particular call site. Should the StoreFileTrackerFactory cache instances and return the cached instances that match the arguments to StoreFileTrackerFactory.create() rather than make a new instance? Can StoreFileTracker instances be made thread safe so they can be cached and shared?

If we have reuse, and all the relevant filesystem ops go through the StoreFileTracker, then we could potentially save a lot of filesystem or object store IO, because a reused StoreFileTracker would have the ground truth already and would not need to go to the filesystem or object store and do IO in order to e.g. return the StoreFileInfo of a given path.
{quote}
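The comment above asks whether trackers can be made thread-safe so that a cached instance can be shared and answer lookups from the ground truth it already holds. A minimal sketch of one way to do that, assuming the tracked state is just a set of store file names. {{SharedTrackerSketch}}, {{SharedTracker}} and its {{loader}} are hypothetical (the loader stands in for reading the .filelist or listing the store directory), and this is not how HBase implements StoreFileTracker. The state is loaded lazily, exactly once, via double-checked locking on a {{volatile}} field, and updates publish a new immutable snapshot:

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SharedTrackerSketch {

  // Hypothetical shareable tracker; NOT the HBase StoreFileTracker interface.
  static final class SharedTracker {
    // Stands in for reading the .filelist file or listing the store directory.
    private final Supplier<List<String>> loader;
    // In-memory ground truth; null until the first access triggers the load.
    private volatile Set<String> files;

    SharedTracker(Supplier<List<String>> loader) {
      this.loader = loader;
    }

    // Double-checked lazy initialization: the loader runs at most once,
    // even when many threads share this instance.
    private Set<String> files() {
      Set<String> f = files;
      if (f == null) {
        synchronized (this) {
          f = files;
          if (f == null) {
            f = Set.copyOf(loader.get());
            files = f;
          }
        }
      }
      return f;
    }

    // Served from memory after the first load: no further filesystem IO.
    boolean contains(String name) {
      return files().contains(name);
    }

    // Updates go through the tracker and publish a new immutable snapshot,
    // so readers never observe a half-applied change. Persisting the change
    // (e.g. rewriting the .filelist) is omitted from this sketch.
    synchronized void add(String name) {
      Set<String> updated = new HashSet<>(files());
      updated.add(name);
      files = Set.copyOf(updated);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger loads = new AtomicInteger();
    SharedTracker tracker = new SharedTracker(() -> {
      loads.incrementAndGet(); // counts simulated filesystem/object store reads
      return List.of("hfile-a", "hfile-b");
    });

    // Several threads share one instance, as a cached tracker would be shared.
    Thread[] readers = new Thread[8];
    for (int i = 0; i < readers.length; i++) {
      readers[i] = new Thread(() -> tracker.contains("hfile-a"));
      readers[i].start();
    }
    for (Thread t : readers) {
      t.join();
    }

    tracker.add("hfile-c");
    System.out.println(tracker.contains("hfile-c")); // true
    System.out.println(loads.get());                 // 1
  }
}
{code}

Running {{main}} should print {{true}} and then {{1}}: eight concurrent readers and a later update together trigger a single load. A real tracker would also need to persist updates and coordinate with writers outside this instance, which this sketch ignores.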
----
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
