W-I-D-EE opened a new issue, #8339: URL: https://github.com/apache/iceberg/issues/8339
### Query engine

Spark 3.2.3

### Question

We have a scenario where we need to export data files to long-term tape storage, but still need the ability to re-add those data files if the data in question is needed again. Based on our current understanding of Iceberg, our procedure is as follows.

Exporting
1. Copy the data files and their partition folders to tape storage.
2. Execute the `deleteFile` API on each of the exported data files to remove them from the Iceberg table (sketch below).

Reloading
1. Copy the file structure back into the Iceberg table's data folder.
2. Execute `add_files` on the partition folders that were copied back in (second sketch below).

Based on what I described, does anyone see potential issues with this approach? Is there something better recommended?

One thing I'm concerned about is how `add_files` would handle partition/spec evolution. A simple example: say that when we exported the data we had a bucket size of 16, but a year later, when we go to re-import the data, the table spec now uses a bucket size of 32. Is this a problem? Would we need to essentially rewrite the archived data files to match the current spec?

This is probably an unorthodox setup, but my situation is isolated environments with limited data storage resources, so there is a need to move data around as it's needed and, more importantly, to make room for new data being generated. Appreciate any feedback.
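To make the export step concrete, here is a minimal sketch of removing already-archived files from the table's metadata with Iceberg's `DeleteFiles` API (`Table.newDelete()`). The `table` handle and the `archivedPaths` collection are assumptions for illustration; the paths would be the exact data file locations that were copied to tape.

```java
import org.apache.iceberg.DeleteFiles;
import org.apache.iceberg.Table;

public class ArchiveDataFiles {

  // Drops the archived data files from table metadata in a single commit.
  // `table` is assumed to be loaded from your catalog; `archivedPaths` are the
  // file paths that were already copied to tape (hypothetical input).
  static void dropArchivedFiles(Table table, Iterable<String> archivedPaths) {
    DeleteFiles delete = table.newDelete();
    for (String path : archivedPaths) {
      delete.deleteFile(path);  // removes the file from metadata; bytes on disk are untouched
    }
    delete.commit();            // one atomic commit covering all removed files
  }
}
```

Note this only unregisters the files; the physical copies still have to be cleaned up (or left to expire/orphan-file cleanup) once the tape copy is verified.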

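For the reload step, a hedged sketch of calling the `add_files` procedure from Spark, assuming the restored files are plain Parquet placed back under the table location. The catalog name, table name, and path below are placeholders, not values from the original report.

```java
import org.apache.spark.sql.SparkSession;

public class ReloadArchivedFiles {

  // Re-registers restored Parquet files with the Iceberg table via add_files.
  // `my_catalog`, `db.tbl`, and the restored path are hypothetical names.
  static void addFilesBack(SparkSession spark) {
    spark.sql(
        "CALL my_catalog.system.add_files(" +
        "  table => 'db.tbl'," +
        "  source_table => '`parquet`.`/warehouse/db/tbl/data/restored_partition`'" +
        ")");
  }
}
```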