jackye1995 commented on issue #6781:
URL: https://github.com/apache/iceberg/issues/6781#issuecomment-1425214434

   Thanks for the explanation!
   
   I am not sure how Delta leverages its logs. Does each log have a unique ID? 
Is that ID usable by end users? In Iceberg, users can query and time travel 
by snapshot ID, and they can look up snapshot IDs in the `snapshots` system 
table. Is there a similar feature in Delta?
   
   If the log ID is an internal concept, then I would opt for solution 2 only. 
Even if the log ID is available to users, I would still say we should 
prioritize solution 2, because as an end user I should not have to care about 
the starting Delta log version when I want to do a migration. It should just 
work. I would not even expose any configuration at this point, and would just 
treat the file-not-found error as a bug that we fix with your proposal:
   
   > we can catch the IOException when trying to build the DataFile and skip 
the whole snapshot if any parquet file can not be found. Specifically, we 
should do this when there has been no version migrated yet. If there are some 
successfully migrated snapshot earlier, then the IOException must be caused by 
something else and we shall not skip the version as delta logs are consecutive. 
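
   A minimal sketch of the quoted proposal, not the actual migration code. Every name here (`DeltaMigrationSketch`, `buildDataFile`, `migrate`) is hypothetical, and a set of existing paths stands in for the file system. It assumes Delta versions are processed in ascending order and skips a version with a missing file only while nothing has been migrated yet:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;

class DeltaMigrationSketch {

  // Hypothetical stand-in for building a DataFile from a Parquet path.
  // Throws if the file is no longer present in storage.
  static String buildDataFile(String path, Set<String> existingFiles) throws IOException {
    if (!existingFiles.contains(path)) {
      throw new FileNotFoundException("Missing data file: " + path);
    }
    return path;
  }

  // Walks the versions in ascending order and returns the versions that were migrated.
  // A version whose files cannot all be found is skipped only if no earlier version
  // has been migrated; after the first success, any IOException is rethrown.
  static List<Long> migrate(SortedMap<Long, List<String>> versions, Set<String> existingFiles)
      throws IOException {
    List<Long> migrated = new ArrayList<>();
    for (Map.Entry<Long, List<String>> version : versions.entrySet()) {
      try {
        for (String path : version.getValue()) {
          buildDataFile(path, existingFiles);
        }
        migrated.add(version.getKey());
      } catch (IOException e) {
        if (!migrated.isEmpty()) {
          // Delta logs are consecutive, so a failure after a successful
          // migration must have another cause and should surface.
          throw e;
        }
        // Nothing migrated yet: treat this version as unrecoverable and skip it.
      }
    }
    return migrated;
  }
}
```

   With this shape the user never has to pick a starting version: the migration starts at the first version whose files all still exist.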
   
   Also, I am curious whether Databricks Delta behaves the same way. I think 
it should not, because there needs to be a process that keeps the Delta log 
short. @ericlgoodman do you know anything about this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

