keith-turner commented on issue #4538:
URL: https://github.com/apache/accumulo/issues/4538#issuecomment-2104870011
Chatted w/ @cshannon about this. One challenge we identified is large
compaction operations. For example, if a large number of external compaction
processes are temporarily stood up and then a large compaction operation is
initiated, it may start to generate a large number of files in a short time
that should be deleted. In the worst case, if the GC delays deleting files, the
compaction operation could fill up DFS. This implies that the delay may need to
adjust depending on DFS free space and the number of files whose deletion is
delayed. Dynamically adjusting the delay makes it harder for the scan servers
to reason about it.
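One way the delay could adjust is to scale it down as DFS free space shrinks relative to the bytes held in delayed-delete candidates. A minimal sketch of that idea follows; all names (`DelayPolicy`, `adjustedDelay`, the 2x headroom threshold) are hypothetical illustrations, not Accumulo APIs:

```java
import java.time.Duration;

public class DelayPolicy {

  // Hypothetical policy: shrink the configured delete delay as DFS free
  // space tightens relative to the bytes reclaimable from delayed files.
  static Duration adjustedDelay(Duration configured, long dfsFreeBytes,
      long delayedBytes) {
    if (delayedBytes <= 0) {
      return configured; // nothing pending, no pressure
    }
    // headroom < 1.0 means free space is under 2x the reclaimable bytes,
    // so the delay is scaled down proportionally, reaching zero when
    // free space is exhausted.
    double headroom = (double) dfsFreeBytes / (delayedBytes * 2.0);
    double factor = Math.max(0.0, Math.min(1.0, headroom));
    return Duration.ofMillis((long) (configured.toMillis() * factor));
  }
}
```

This illustrates the downside noted above: the effective delay becomes a moving target, so scan servers can no longer assume a fixed window in which delayed files remain available.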
Another thing discussed was race conditions. This change would
conceptually create another set of files for the GC to track:
* File references (existing set)
* GC candidates (existing set)
* Delayed delete files (new set)
* Deleting files (existing set)
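The four sets above form a one-way pipeline for each file. A small illustrative model (not Accumulo code; the enum and method names are made up for this sketch) of the legal transitions:

```java
public class GcFileState {

  // The four sets the GC would track under this proposal.
  enum State { FILE_REFERENCE, GC_CANDIDATE, DELAYED_DELETE, DELETING }

  // A file only advances along the pipeline: once dereferenced it becomes a
  // candidate, then a delayed delete, then a deleting file. It never moves
  // backward (e.g. from DELETING back to GC_CANDIDATE).
  static boolean canTransition(State from, State to) {
    switch (from) {
      case FILE_REFERENCE:
        return to == State.GC_CANDIDATE;
      case GC_CANDIDATE:
        return to == State.DELAYED_DELETE;
      case DELAYED_DELETE:
        return to == State.DELETING;
      default:
        return false;
    }
  }
}
```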
When the GC moves a file from the new delayed_delete set to the
deleting_files set, it must be done in such a way that it accounts for what
the scan servers are using and avoids race conditions. We talked through a
few ways to do this, but each of those possible solutions had race
conditions, so we still need to figure out something for that.
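To make the race concrete, here is a sketch of the naive check-then-move approach (all names hypothetical; this is an illustration of the problem, not a proposed fix). The scan-server ref check and the move between sets are two separate steps, so a scan server can register a reference in the window between them:

```java
import java.util.Set;

public class NaiveMove {

  // Naive, racy attempt to promote a file from delayed_delete to
  // deleting_files. Returns true if the move happened.
  static boolean tryMoveToDeleting(String file, Set<String> scanServerRefs,
      Set<String> delayedDelete, Set<String> deleting) {
    if (scanServerRefs.contains(file)) {
      return false; // still in use by a scan server
    }
    // RACE WINDOW: a scan server may add a ref to `file` right here,
    // after the check above but before the move below, and would then
    // be reading a file the GC is about to delete.
    if (delayedDelete.remove(file)) {
      deleting.add(file);
      return true;
    }
    return false;
  }
}
```

Any real scheme would need the check and the move to be atomic with respect to scan servers adding refs, which is the part that still needs to be worked out.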
This offers a potential speedup for scan servers when writing out scan server
refs to the metadata table. The scan server will still have to read tablet
files from the metadata table, which is an extra cost relative to tablet
servers. That suggests another potential way to lower scan time: some way to
pre-load tablets on select scan servers, which may be something to explore in
tandem.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]