hililiwei commented on PR #7460:
URL: https://github.com/apache/iceberg/pull/7460#issuecomment-1539307538

   > @hililiwei, out of curiosity, you mention it is needed for large tables. 
In use cases you have, what's the main problem? Is it the time to analyze all 
of the metadata or is it more about compacting only fresh data?
   
   It is more about compacting only fresh data. We have scheduled jobs that 
rewrite the newly written data.
   
   
   > I feel like snapshot start and end is the wrong way to go on this, instead 
do we have a way of just specifying timestamp? IE only rewrite files created 
before timestamp X ? I've been thinking about this as being part of an 
extension of rewrite datafiles that enables writing predicates on file 
properties or metadata instead of data properties.
   > 
   > Not sure if this is possible but I was wondering if we could support 
something like
   > 
   > rewrite where file.created_at < some timepoint
   
   Our current use case also rewrites data files by timestamp. The usage is as 
follows:
   ```
   CALL %s.system.rewrite_data_files(
           table => '%s', 
           options => map('start-timestamp','1682677842000', 'end-timestamp','1682677843000')
   )
   
   CALL %s.system.rewrite_data_files(
           table => '%s', 
           options => map('end-timestamp','1682677843000')
   )
   ```
   
   > `rewrite where file.created_at < some timepoint`.
   
   This syntax is very intuitive for users, but we don’t seem to keep the 
creation time of files in the metadata. Could using the snapshot time achieve 
the same effect?
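
To make the snapshot-time idea concrete, here is a minimal sketch in plain Python. It does not use the Iceberg API: `Snapshot`, `DataFile`, and `files_in_time_range` are hypothetical names for illustration only. It assumes each data file can be traced back to the snapshot that added it, and treats that snapshot's commit time as a stand-in for the file's creation time. Inclusive bounds are also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a table snapshot.
    snapshot_id: int
    timestamp_ms: int  # commit time of the snapshot, epoch millis


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for a data file entry.
    path: str
    added_snapshot_id: int  # snapshot that added this file


def files_in_time_range(
    snapshots: List[Snapshot],
    files: List[DataFile],
    start_ms: Optional[int] = None,
    end_ms: Optional[int] = None,
) -> List[DataFile]:
    """Select files whose adding snapshot committed within [start_ms, end_ms].

    A bound of None means "unbounded" on that side.
    """
    commit_time = {s.snapshot_id: s.timestamp_ms for s in snapshots}
    selected = []
    for f in files:
        ts = commit_time.get(f.added_snapshot_id)
        if ts is None:
            # The adding snapshot is unknown (for example, already expired),
            # so no commit time is available; this sketch skips the file.
            continue
        if start_ms is not None and ts < start_ms:
            continue
        if end_ms is not None and ts > end_ms:
            continue
        selected.append(f)
    return selected


# Sample data mirroring the timestamps used in the calls above.
snapshots = [
    Snapshot(1, 1682677841000),
    Snapshot(2, 1682677842500),
    Snapshot(3, 1682677844000),
]
files = [
    DataFile("a.parquet", 1),
    DataFile("b.parquet", 2),
    DataFile("c.parquet", 3),
]

both_bounds = files_in_time_range(snapshots, files, 1682677842000, 1682677843000)
end_only = files_in_time_range(snapshots, files, end_ms=1682677843000)
print([f.path for f in both_bounds])
print([f.path for f in end_only])
```

One caveat with this approach: if the snapshot that added a file is no longer retained, its commit time cannot be looked up, so such files would need separate handling.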
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

