camuel commented on issue #624: URL: https://github.com/apache/iceberg-rust/issues/624#issuecomment-2412544556

Does anyone have any insight into how computation-heavy the compaction workload really is? On a beefy machine, what compaction rate is achievable? 1 GB/s? 10 GB/s? A ballpark figure would help. Separately, what compaction rate is possible from first principles, and what is the typical per-node compaction rate with today's Iceberg implementations?

I think that in the overwhelming majority of use cases, data is primarily partitioned by time, and most of the data in a huge table is old, low-value data that absolutely should not be touched routinely by every compaction job. Am I right, or is my experience not representative? If so, it is not the size of the table that matters but the rate at which new data arrives between compaction jobs, since it is mostly that new data that gets compacted and rewritten, and only once in its lifetime (unless DML touches old data, which should be extremely rare compared to routine compaction). A rough back-of-envelope sketch of this is at the end of this comment.

Separately, what compaction exactly means is a bit confusing to me: should data be sorted across the resulting (post-compaction) Parquet files in a partition, or is it enough that each resulting Parquet file is internally sorted, with the files allowed to overlap anywhere within the partition's range? As far as I read the specification, sorting each resulting Parquet file internally is enough, and moreover the sort order can differ per file, which hints that there is no global sort within a partition. I looked into the Java compaction code and it does sort across files, but not across all files in a partition, only across some files in a partition which it calls a "group". "Group" is not a concept that exists in the specification, which puzzles me as to how the Rust implementation should approach it. It is quite a biggie here. (A toy sketch of such grouping is also below.)

Assuming only a local sort within each resulting Parquet file (as per the current spec), a file size of around 100 MB (currently the default), partitioning by time, and data arriving more or less chronologically so that compaction is only needed for data that arrived between compaction jobs (as it always is in my experience), a single beefy node running compaction code implemented in Rust should have no trouble serving a genuinely huge table.

Of course there are corner cases, such as a changed partition scheme with a request to repartition the entire table on the next compaction job, but I don't think that will ever happen in practice. If an entire table has to be repartitioned, it won't simply be scheduled as the next compaction; it will more likely be a custom job on a separate cluster and probably won't even be called "compaction".

Will appreciate any feedback. Thanks!
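
To make the "only new data matters" point concrete, here is a tiny back-of-envelope sketch. All numbers (ingest rate, compaction interval, per-node throughput) are made-up assumptions for illustration, not measurements of any implementation:

```rust
// Back-of-envelope: the volume a compaction job must rewrite is bounded by the
// amount of data that arrived since the previous run, not by total table size.
// Every number below is an assumption for illustration only.
fn main() {
    let ingest_gb_per_hour = 50.0;   // assumed arrival rate of new data
    let compaction_interval_h = 1.0; // assumed time between compaction runs
    let node_throughput_gb_s = 1.0;  // assumed single-node rewrite throughput

    let to_rewrite_gb = ingest_gb_per_hour * compaction_interval_h;
    let seconds = to_rewrite_gb / node_throughput_gb_s;

    println!("rewrite {to_rewrite_gb} GB per run -> ~{seconds:.0}s of work on one node");
    // With these assumptions, an hourly run rewrites ~50 GB in ~50s on a single
    // node, regardless of whether the table holds 1 TB or 1 PB of older data.
}
```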
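
And on the "group" question, here is a toy sketch of what I imagine the grouping step looks like: within one partition, small files are bin-packed into groups of roughly a target size, and each group is rewritten (and sorted) independently. This is only a thought experiment in the spirit of what I saw in the Java rewrite code; the types, sizes, and the `plan_file_groups` helper are made up and are not the iceberg-rust API:

```rust
// Hypothetical sketch: greedy bin-packing of a partition's data files into
// "file groups" capped at a target size; each group would become one rewrite
// task that sorts only its own output. Not the actual iceberg-rust API.

#[derive(Debug)]
struct DataFile {
    path: String,
    size_bytes: u64,
}

/// Pack files into groups no larger than `target_group_bytes` (greedy first-fit).
fn plan_file_groups(mut files: Vec<DataFile>, target_group_bytes: u64) -> Vec<Vec<DataFile>> {
    // Largest-first makes the greedy packing a bit tighter.
    files.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));

    let mut groups: Vec<Vec<DataFile>> = Vec::new();
    let mut sizes: Vec<u64> = Vec::new();

    for file in files {
        match sizes
            .iter()
            .position(|&s| s + file.size_bytes <= target_group_bytes)
        {
            Some(i) => {
                sizes[i] += file.size_bytes;
                groups[i].push(file);
            }
            None => {
                sizes.push(file.size_bytes);
                groups.push(vec![file]);
            }
        }
    }
    groups
}

fn main() {
    // Ten small 20 MB files in one partition (illustrative values).
    let files = (0..10)
        .map(|i| DataFile {
            path: format!("part-{i}.parquet"),
            size_bytes: 20 * 1024 * 1024,
        })
        .collect();

    // Pack into ~128 MB groups; each group is an independent rewrite/sort task.
    for (i, group) in plan_file_groups(files, 128 * 1024 * 1024).iter().enumerate() {
        println!("group {i}: {group:?}");
    }
}
```

If something like this is the intended shape, then whether the sort is per file, per group, or per partition is exactly the design decision I would like clarified for the Rust implementation.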
Does anyone has any insights on how computation heavy is the compaction workload really? Like on a beefy machine what compaction rate will be possible? Like 1GB/sec? 10GB/sec? A ball part figure? Separately what is possible compaction rate from first principles and what is typical compaction rate per node with today Iceberg impl.? I think in the overwhelming majority of usecases, data is primarily partitioned by time, and most data of a huge table is old data, of low value and absolutely not to be routinely touched on every compaction job! Am I right? Or my experience is not representative? If so, it is not the the size of the table which matters, but the rate of new data arrival between compaction jobs, as mostly those new data is be compacted and rewritten and only once in its lifetime (unless we have DML touching old data which should be extremely rare compared to routine compaction). Separately, what is exactly compaction is a bit confusing to me, should data be sorted across resulting (after compaction) parquet files in a partition, or it is enough that each resulting parquet file is only internally sorted and it is permissible for all parquet files in a partition to be all over the partition range? As per specification, it is enough to sort internally each resulting parquet file and not only that, the sort order can be different for each file that hints that there is no global sort in a partition. I looked into java compaction code and it indeed sorts it across files but not across all files in a partition but across some files in a partition which they call "a group" and "group" is not a concept that exists in the specification which puzzles me how Rust implementation should approach it. It is quite a biggie here. Assuming only a local sort within a resulting parquet file (as per current spec) and assuming the file size around 100MB (which is currently the default) and assuming partition by time and that data comes more or less chronologically so compaction is needed only for data arrived between the compaction jobs (like always in my experience) the single beefy node running compaction code implemented in rust must not find it infeasible to serve quite a huge table really. Of course there are corner cases, like partition scheme is changed and it is being asked to re-partition the entire table on the next compaction job but I don't think this will ever happen in practice. If the entire table is to be repartitioned, this won't be just scheduled for the next compaction, perhaps it will be a custom job on a separate cluster and won't be even titled "compaction". Will appreciate any feedback. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org