hililiwei commented on PR #7460: URL: https://github.com/apache/iceberg/pull/7460#issuecomment-1539307538
> @hililiwei, out of curiosity, you mention it is needed for large tables. In use cases you have, what's the main problem? Is it the time to analyze all of the metadata or is it more about compacting only fresh data?

It is more about compacting only fresh data. We have scheduled jobs that rewrite newly written data.

> I feel like snapshot start and end is the wrong way to go on this, instead do we have a way of just specifying timestamp? IE only rewrite files created before timestamp X ? I've been thinking about this as being part of an extension of rewrite datafiles that enables writing predicates on file properties or metadata instead of data properties.
>
> Not sure if this is possible but I was wondering if we could support something like
>
> rewrite where file.created_at < some timepoint

Our current use case also rewrites data files by timestamp. The usage is as follows:

```sql
-- Bounded window: both start-timestamp and end-timestamp (epoch milliseconds)
CALL %s.system.rewrite_data_files(
  table => '%s',
  options => map('start-timestamp', '1682677842000', 'end-timestamp', '1682677843000')
)

-- Open-ended window: only end-timestamp
CALL %s.system.rewrite_data_files(
  table => '%s',
  options => map('end-timestamp', '1682677843000')
)
```

> `rewrite where file.created_at < some timepoint`

This syntax is very intuitive for users, but we don't seem to keep the creation time of data files in the metadata. Could using the snapshot time achieve the same effect?
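To make the snapshot-time idea concrete, here is a minimal Python sketch of the selection semantics. It treats the commit time of the snapshot that added a file as a proxy for that file's creation time. It also skips files that are no longer live, for example because a later snapshot removed or rewrote them.

Several parts of the sketch are assumptions:

* `SnapshotInfo` and `files_in_time_window` are hypothetical names, not Iceberg APIs.
* The window is start-inclusive and end-exclusive, which may not match what this PR implements.
* Timestamps are epoch milliseconds, like the values in the calls above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class SnapshotInfo:
    """Hypothetical, simplified stand-in for an Iceberg snapshot (not the Iceberg API)."""
    snapshot_id: int
    committed_at_ms: int  # snapshot commit time, epoch milliseconds
    added_files: List[str] = field(default_factory=list)


def files_in_time_window(
    snapshots: Sequence[SnapshotInfo],
    live_files: Set[str],
    start_ms: Optional[int] = None,
    end_ms: Optional[int] = None,
) -> List[str]:
    """Select live files whose adding snapshot committed in [start_ms, end_ms).

    The snapshot commit time stands in for the file creation time.
    A bound of None leaves that side of the window open (assumed semantics).
    """
    selected: List[str] = []
    for snap in snapshots:
        if start_ms is not None and snap.committed_at_ms < start_ms:
            continue  # committed before the window opens
        if end_ms is not None and snap.committed_at_ms >= end_ms:
            continue  # committed at or after the window closes
        # Skip files that a later snapshot has already removed or rewritten.
        selected.extend(f for f in snap.added_files if f in live_files)
    return selected


snapshots = [
    SnapshotInfo(1, 1682677841000, ["a.parquet"]),
    SnapshotInfo(2, 1682677842500, ["b.parquet", "c.parquet"]),
    SnapshotInfo(3, 1682677843500, ["d.parquet"]),
]
live = {"a.parquet", "b.parquet", "c.parquet", "d.parquet"}

# Analogous to map('start-timestamp', '1682677842000', 'end-timestamp', '1682677843000')
print(files_in_time_window(snapshots, live, 1682677842000, 1682677843000))
# ['b.parquet', 'c.parquet']

# Analogous to map('end-timestamp', '1682677843000')
print(files_in_time_window(snapshots, live, end_ms=1682677843000))
# ['a.parquet', 'b.parquet', 'c.parquet']
```

One caveat applies to the snapshot-time proxy. If the snapshots that added some files have already been expired, their commit times are no longer available. A real implementation would need a policy for such files.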