camuel commented on issue #624: URL: https://github.com/apache/iceberg-rust/issues/624#issuecomment-2412544556

Does anyone have any insight into how computation-heavy the compaction workload really is? On a beefy machine, what compaction rate is achievable? 1 GB/s? 10 GB/s? A ballpark figure would help. Separately, what compaction rate is possible from first principles, and what is the typical per-node compaction rate with today's Iceberg implementations?

I think that in the overwhelming majority of use cases, data is primarily partitioned by time, and most of the data in a huge table is old, low-value data that absolutely should not be touched routinely by every compaction job. Am I right, or is my experience not representative? If so, it is not the size of the table that matters but the rate at which new data arrives between compaction jobs, since it is mostly that new data that gets compacted and rewritten, and only once in its lifetime (unless DML touches old data, which should be extremely rare compared to routine compaction). A rough back-of-envelope sketch of this is at the end of this comment.

Separately, what compaction exactly means is a bit confusing to me: should data be sorted across the resulting (post-compaction) Parquet files in a partition, or is it enough that each resulting Parquet file is internally sorted, with the files allowed to overlap anywhere within the partition's range? As far as I read the specification, sorting each resulting Parquet file internally is enough, and moreover the sort order can differ per file, which hints that there is no global sort within a partition. I looked into the Java compaction code and it does sort across files, but not across all files in a partition, only across some files in a partition which it calls a "group". "Group" is not a concept that exists in the specification, which puzzles me as to how the Rust implementation should approach it. It is quite a biggie here. (A toy sketch of such grouping is also below.)

Assuming only a local sort within each resulting Parquet file (as per the current spec), a file size of around 100 MB (currently the default), partitioning by time, and data arriving more or less chronologically so that compaction is only needed for data that arrived between compaction jobs (as it always is in my experience), a single beefy node running compaction code implemented in Rust should have no trouble serving a genuinely huge table.

Of course there are corner cases, such as a changed partition scheme with a request to repartition the entire table on the next compaction job, but I don't think that will ever happen in practice. If an entire table has to be repartitioned, it won't simply be scheduled as the next compaction; it will more likely be a custom job on a separate cluster and probably won't even be called "compaction".

Will appreciate any feedback. Thanks!
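
To make the "only new data matters" point concrete, here is a tiny back-of-envelope sketch. All numbers (ingest rate, compaction interval, per-node throughput) are made-up assumptions for illustration, not measurements of any implementation:

```rust
// Back-of-envelope: the volume a compaction job must rewrite is bounded by the
// amount of data that arrived since the previous run, not by total table size.
// Every number below is an assumption for illustration only.
fn main() {
    let ingest_gb_per_hour = 50.0;   // assumed arrival rate of new data
    let compaction_interval_h = 1.0; // assumed time between compaction runs
    let node_throughput_gb_s = 1.0;  // assumed single-node rewrite throughput

    let to_rewrite_gb = ingest_gb_per_hour * compaction_interval_h;
    let seconds = to_rewrite_gb / node_throughput_gb_s;

    println!("rewrite {to_rewrite_gb} GB per run -> ~{seconds:.0}s of work on one node");
    // With these assumptions, an hourly run rewrites ~50 GB in ~50s on a single
    // node, regardless of whether the table holds 1 TB or 1 PB of older data.
}
```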
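
And on the "group" question, here is a toy sketch of what I imagine the grouping step looks like: within one partition, small files are bin-packed into groups of roughly a target size, and each group is rewritten (and sorted) independently. This is only a thought experiment in the spirit of what I saw in the Java rewrite code; the types, sizes, and the `plan_file_groups` helper are made up and are not the iceberg-rust API:

```rust
// Hypothetical sketch: greedy bin-packing of a partition's data files into
// "file groups" capped at a target size; each group would become one rewrite
// task that sorts only its own output. Not the actual iceberg-rust API.

#[derive(Debug)]
struct DataFile {
    path: String,
    size_bytes: u64,
}

/// Pack files into groups no larger than `target_group_bytes` (greedy first-fit).
fn plan_file_groups(mut files: Vec<DataFile>, target_group_bytes: u64) -> Vec<Vec<DataFile>> {
    // Largest-first makes the greedy packing a bit tighter.
    files.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));

    let mut groups: Vec<Vec<DataFile>> = Vec::new();
    let mut sizes: Vec<u64> = Vec::new();

    for file in files {
        match sizes
            .iter()
            .position(|&s| s + file.size_bytes <= target_group_bytes)
        {
            Some(i) => {
                sizes[i] += file.size_bytes;
                groups[i].push(file);
            }
            None => {
                sizes.push(file.size_bytes);
                groups.push(vec![file]);
            }
        }
    }
    groups
}

fn main() {
    // Ten small 20 MB files in one partition (illustrative values).
    let files = (0..10)
        .map(|i| DataFile {
            path: format!("part-{i}.parquet"),
            size_bytes: 20 * 1024 * 1024,
        })
        .collect();

    // Pack into ~128 MB groups; each group is an independent rewrite/sort task.
    for (i, group) in plan_file_groups(files, 128 * 1024 * 1024).iter().enumerate() {
        println!("group {i}: {group:?}");
    }
}
```

If something like this is the intended shape, then whether the sort is per file, per group, or per partition is exactly the design decision I would like clarified for the Rust implementation.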
Does anyone has any insights on how computation heavy is the compaction workload really? Like on a beefy machine what compaction rate will be possible? Like 1GB/sec? 10GB/sec? A ball part figure? Separately what is possible compaction rate from first principles and what is typical compaction rate per node with today Iceberg impl.? I think in the overwhelming majority of usecases, data is primarily partitioned by time, and most data of a huge table is old data, of low value and absolutely not to be routinely touched on every compaction job! Am I right? Or my experience is not representative? If so, it is not the the size of the table which matters, but the rate of new data arrival between compaction jobs, as mostly those new data is be compacted and rewritten and only once in its lifetime (unless we have DML touching old data which should be extremely rare compared to routine compaction). Separately, what is exactly compaction is a bit confusing to me, should data be sorted across resulting (after compaction) parquet files in a partition, or it is enough that each resulting parquet file is only internally sorted and it is permissible for all parquet files in a partition to be all over the partition range? As per specification, it is enough to sort internally each resulting parquet file and not only that, the sort order can be different for each file that hints that there is no global sort in a partition. I looked into java compaction code and it indeed sorts it across files but not across all files in a partition but across some files in a partition which they call "a group" and "group" is not a concept that exists in the specification which puzzles me how Rust implementation should approach it. It is quite a biggie here. Assuming only a local sort within a resulting parquet file (as per current spec) and assuming the file size around 100MB (which is currently the default) and assuming partition by time and that data comes more or less chronologically so compaction is needed only for data arrived between the compaction jobs (like always in my experience) the single beefy node running compaction code implemented in rust must not find it infeasible to serve quite a huge table really. Of course there are corner cases, like partition scheme is changed and it is being asked to re-partition the entire table on the next compaction job but I don't think this will ever happen in practice. If the entire table is to be repartitioned, this won't be just scheduled for the next compaction, perhaps it will be a custom job on a separate cluster and won't be even titled "compaction". Will appreciate any feedback. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org