wmoustafa commented on PR #9830:
URL: https://github.com/apache/iceberg/pull/9830#issuecomment-2125945879

   > @wmoustafa, I read this today and was wondering whether there is something 
we can utilize from a CDC perspective (considering Iceberg has support for 
that). How expensive are refreshes of PB-size tables, and what is the ideal 
frequency of updates in this model, if you can share some data points? The 
rewrite that gets an incremental refresh by computing deltas between snapshots, 
joining them with other deltas, and taking the union of those does seem 
user-friendly, though.
   
   It really depends on the query, the size of the delta, the size of the whole 
table, and so on. An extension of that work is currently under way to get an 
idea of the cost of some basic queries (e.g., a few joins/aggregations + 
filters & projections) and to come up with a reasonable cost model (including 
choosing not to refresh incrementally at all if the incremental refresh is 
deemed more expensive than a full one).
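
   To make the trade-off concrete, here is a toy sketch of such a decision for 
a two-table, append-only join. It is not the cost model being developed; 
`TableStats`, the cost functions, and the "cost = rows scanned" assumption are 
all hypothetical and only illustrate why the incremental rewrite 
(dA JOIN B_old) UNION ALL (A_new JOIN dB) is not always cheaper than a full 
recompute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics for one base table of the view."""
    total_rows: int  # rows in the current snapshot
    delta_rows: int  # rows appended since the last refresh


def full_cost(a: TableStats, b: TableStats) -> int:
    """Assumed cost of recomputing A JOIN B: scan both inputs once."""
    return a.total_rows + b.total_rows


def incremental_cost(a: TableStats, b: TableStats) -> int:
    """Assumed cost of (dA JOIN B_old) UNION ALL (A_new JOIN dB).

    Each join term costs a scan of both of its inputs; a term whose
    delta is empty produces no rows and is skipped entirely.
    """
    cost = 0
    if a.delta_rows > 0:
        b_old = b.total_rows - b.delta_rows
        cost += a.delta_rows + b_old           # dA JOIN B_old
    if b.delta_rows > 0:
        cost += a.total_rows + b.delta_rows    # A_new JOIN dB
    return cost


def choose_refresh(a: TableStats, b: TableStats) -> str:
    """Pick the cheaper strategy under the toy cost functions above."""
    if incremental_cost(a, b) < full_cost(a, b):
        return "incremental"
    return "full"


if __name__ == "__main__":
    # Only A changed: one join term, so the incremental refresh wins.
    print(choose_refresh(TableStats(1_000_000, 1_000), TableStats(500_000, 0)))
    # Both changed: two join terms scan more rows than a full recompute.
    print(choose_refresh(TableStats(1_000_000, 1_000), TableStats(500_000, 500)))
```

   Under these assumptions, a change to only one side favors the incremental 
path, while changes on both sides make the two delta joins scan more rows than 
a single full join. A realistic model would also need to account for file 
pruning, deletes/updates, and aggregation state.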


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

