JarroVGIT commented on issue #12263:
URL: https://github.com/apache/iceberg/issues/12263#issuecomment-3697574473

   Just to butt in, let me try to summarise:
   
   - Zero-copy cloning creates a table with a lifecycle independent of that of the source table. This holds both at the catalog level (for example, access control) and at the storage level (e.g. when the source table is dropped, the clone should not become corrupted).
   - Branching in Iceberg does **not** create a table with an independent lifecycle, but rather another version of the same logical table. As [per the documentation](https://iceberg.apache.org/docs/1.10.0/branching/#schema-selection-with-branches-and-tags), a pertinent difference is that when you write data to a branch of a table, the table-level schema must be used. Changing the schema of the table on the main branch automatically changes the schema of the branched version as well.
   
   I can see a few issues with supporting this (although I would LOVE for this 
to become a feature). 
   
   1. Any maintenance activity on the source (or, for that matter, on the clone) might accidentally delete data files that are referenced by the other table. This is a tough problem, but I might have an idea on this. [1]
   2. The clone is a completely new table, but its data files live in the base location of the source table. The clone's storage must not interfere with the source table, so its metadata must be written to a new location, but its manifests will point to a "foreign" location. I can see how this might cause issues in REST catalogs that do credential vending, as this is (afaik) based on the base path of a table.
   
   [1]: I think the spec can be extended somewhat to support this using tags and additional table metadata. The key is to create a double reference (not sure if this is a **good** idea, but it certainly is **an** idea). The source table should know that a clone was created from a certain snapshot-id, and the clone should know what the source table was. I imagine a workflow something like:
   1. Generate a new table UUID for the clone.
   2. Create a tag on the source, extended with a "cloned-to-ids" array of table references, including the cloned table's UUID.
   3. Copy the relevant table metadata (not the whole history, independent lifecycle and all; just everything for the last snapshot-id) and store it in a new location.
   4. Add some sort of identifier marking this table as a clone, referencing both the source table and the metadata file it was cloned from. I can think of two ways:
     - Either add a new field to the table metadata like "cloned-from-id" with 
the table UUID of the source table.
     - Or, add a tag with a "cloned-from-id" field.
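   To make the idea concrete, here is a minimal Python sketch of steps 1–4. Everything here is hypothetical: the "cloned-to-ids" and "cloned-from-id" fields, the `clone-<snapshot-id>` tag name, and the metadata shapes are spec extensions I'm proposing, not anything that exists in Iceberg today.

```python
import uuid

def clone_table(source_meta: dict, snapshot_id: int) -> tuple[dict, dict]:
    """Sketch of the proposed clone workflow (steps 1-4 above).

    All field names ("cloned-to-ids", "cloned-from-id", the clone tag)
    are hypothetical spec extensions, not part of the Iceberg spec today.
    """
    clone_uuid = str(uuid.uuid4())  # step 1: new table UUID for the clone

    # Step 2: tag the cloned snapshot on the source and record the clone.
    tag_name = f"clone-{snapshot_id}"
    refs = source_meta.setdefault("refs", {})
    tag = refs.setdefault(tag_name, {"snapshot-id": snapshot_id, "type": "tag"})
    tag.setdefault("cloned-to-ids", []).append(clone_uuid)

    # Step 3: copy only what the cloned snapshot needs; no history carries over.
    snapshot = next(s for s in source_meta["snapshots"]
                    if s["snapshot-id"] == snapshot_id)
    clone_meta = {
        "table-uuid": clone_uuid,
        "location": f"s3://bucket/clones/{clone_uuid}",  # fresh base location
        "current-schema-id": source_meta["current-schema-id"],
        "schemas": source_meta["schemas"],
        "snapshots": [snapshot],  # its manifests still point at source files
        "current-snapshot-id": snapshot_id,
        # Step 4 (first option): record the lineage in the table metadata.
        "cloned-from-id": source_meta["table-uuid"],
    }
    return source_meta, clone_meta
```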
   
   Now, this, in combination with a few rules, should get us most of the way there:
   - Maintenance must never touch files outside the base path of a table (so that dropping a clone cannot delete data files belonging to the source table).
   - Snapshots referenced in tags with a non-empty "cloned-to-ids" array or a 
non-empty "cloned-from-id" field are never to be expired. 
   - Table drops (especially PURGE) should respect still-referenced snapshot-ids (not sure this is feasible, to be honest; it seems very expensive to determine what is, and what isn't, safe to delete from storage).
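   The expiration rule above amounts to a filter over a table's refs. A sketch, again assuming the hypothetical "cloned-to-ids" / "cloned-from-id" fields:

```python
def expirable_snapshot_ids(table_meta: dict) -> set:
    """Snapshots that snapshot expiration may remove under the proposed rules.

    A snapshot is pinned while any tag referencing it carries a non-empty
    "cloned-to-ids" array or a "cloned-from-id" field (hypothetical fields).
    """
    pinned = {
        ref["snapshot-id"]
        for ref in table_meta.get("refs", {}).values()
        if ref.get("cloned-to-ids") or ref.get("cloned-from-id")
    }
    all_ids = {s["snapshot-id"] for s in table_meta.get("snapshots", [])}
    return all_ids - pinned
```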
   
   Now, we can have independent lifecycles:
   - I can clone a table, and query it, without copying data files, delete 
files, puffin files or even manifests and the manifest list.
   - Doing things to the source table has no impact on the target table.
   - Adding data to the target table has no impact on the source table.
   - Dropping the source table should only delete non-clone-tagged 
data/delete/manifest files.
   - Dropping the clone should remove the clone UUID from the array on the tag in the source table's metadata, if the source table still exists.
   - If the snapshot-id that was cloned is expired in the clone, the clone UUID should likewise be removed from the array on the source's tag, if the source table still exists.
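   The last two bookkeeping rules share the same shape: remove the clone's UUID from the source's clone tag, and once the array is empty, the tag itself can go and normal expiration applies again. A sketch (tag name and fields hypothetical, as above):

```python
def on_clone_dropped(source_meta: dict, snapshot_id: int, clone_uuid: str) -> None:
    """Run when a clone is dropped, or when it expires the cloned snapshot.

    Removes the clone's UUID from the source's clone tag; when no clones
    remain, deletes the tag so the snapshot becomes expirable again.
    """
    refs = source_meta.get("refs", {})
    tag_name = f"clone-{snapshot_id}"
    tag = refs.get(tag_name)
    if tag is None:
        return  # source already dropped, or tag already cleaned up
    ids = tag.get("cloned-to-ids", [])
    if clone_uuid in ids:
        ids.remove(clone_uuid)
    if not ids:
        del refs[tag_name]  # last clone gone: pin is released
```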
   
   The big issue I see with this is lingering files: right now a table can be dropped and purged because all the files live in the same base location and it's pretty much guaranteed that nobody depends on any of them. By preventing deletes through tags (if that is possible at all), it becomes possible that some files are **never** deleted. For example, in the event of:
   1. Clone a table.
   2. Drop and purge the source (the cloned snapshot-id is somehow retained).
   3. Drop and purge the clone (the files in the new base location are deleted).
   
   This would result in lingering files in the source table's location. It is also not possible to simply delete the "left-over files" in the source location during a clone purge, even if the "cloned-to-ids" array has only one element; there could be a later version of that table that was also cloned and relies on the same files as our original clone. Given the immutability of files, I think this is inevitable.
   
   I tried to work this out without a catalog, but the more I wrote this up, the more convinced I became that this is, in fact, something that would require a catalog. Catalogs are better positioned to keep track of all this, but the very next question would probably be "how do we clone cross-catalog?", which is (I think) impossible to codify currently. Or maybe we somehow introduce a new file in a well-known location that simply lists snapshot-id -> clones mappings, which is never purged until the last cloned table removes its own (last) reference from it?
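   To illustrate that last, very speculative idea: the well-known file would behave like a reference-counted map, deleted only when the last clone releases its reference. A sketch, with the file modelled as a plain dict (its location, format, and name are all open questions):

```python
from typing import Optional

def release_clone(mapping: dict, snapshot_id: int, clone_uuid: str) -> Optional[dict]:
    """Sketch of the speculative 'well-known file': a snapshot-id ->
    clone-UUIDs mapping kept alongside the table.

    Returns the updated mapping to write back, or None when the last
    reference is gone and the file itself can finally be deleted.
    """
    clones = mapping.get(snapshot_id, [])
    if clone_uuid in clones:
        clones.remove(clone_uuid)
    if not clones:
        mapping.pop(snapshot_id, None)  # no clones left for this snapshot
    return mapping if mapping else None
```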

