Gowthami03B opened a new issue, #10262:
URL: https://github.com/apache/iceberg/issues/10262

   ### Feature Request / Improvement
   
   Hello
   
The current snapshot procedure (https://iceberg.apache.org/docs/nightly/spark-procedures/?h=spark_catalog#snapshot) appears to be useful only for migrating external Hive tables to Iceberg tables.
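   For reference, the documented usage assumes the source table resolves in the Spark session catalog (i.e. spark_catalog), roughly along these lines (the table names here are placeholders):
   
   ```
   # documented pattern from the Spark procedures page: snapshot a
   # session-catalog (e.g. Hive) table into an Iceberg table;
   # 'db.sample' and 'db.snap' are placeholder names
   spark.sql(
       """
       CALL my_catalog.system.snapshot(
           source_table => 'db.sample',
           table => 'db.snap'
       )
       """
   )
   ```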
   
But we have a unique use case where we want to **migrate** _some of our tables from one namespace to another_ and later run alter-schema operations (which are metadata-only). The **snapshot** procedure would have worked perfectly for this, since it reuses the underlying data files while writing the new table's metadata to a new location.
   The rest of the tables in the old namespace will have to be backfilled because they have major changes, but we could avoid a lot of effort and storage space (we're talking TBs here) if we could use the **snapshot** procedure for the others.
   
   ```
   spark_jdbc_config = {
       "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
       "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
       "spark.sql.catalog.my_catalog.catalog-impl": "org.apache.iceberg.jdbc.JdbcCatalog",
       "spark.sql.catalog.my_catalog.uri": "jdbc:comdb2://",
       "spark.sql.catalog.my_catalog.warehouse": "s3a://abc",
   }

   # fails with: SparkConnectGrpcException:
   # (org.apache.iceberg.exceptions.NoSuchTableException)
   # Cannot not find source table 'datasets.equitynamr'
   spark.sql(
       """
       CALL my_catalog.system.snapshot(
           source_table => 'ns1.src_dataset',
           table => 'ns2.src_dataset',
           location => 's3a://abc'
       )
       """
   )
   ```
   my_catalog here is the JDBC catalog that holds both namespaces (ns1 and ns2) and all of our tables.
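   (For context, the config dict above is applied when building the session, something like the following; the app name is arbitrary and this part is only illustrative:)
   
   ```
   # illustrative: apply the spark_jdbc_config dict above when building
   # the session; the app name is an arbitrary placeholder
   from pyspark.sql import SparkSession

   builder = SparkSession.builder.appName("iceberg-namespace-migration")
   for key, value in spark_jdbc_config.items():
       builder = builder.config(key, value)
   spark = builder.getOrCreate()
   ```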
   
   When I try to provide source_table as a fully qualified name (my_catalog.ns1.src_dataset), I get this: `IllegalArgumentException: Cannot snapshot a table that isn't in the session catalog (i.e. spark_catalog). Found source catalog: test.`
   
   I also tried explicitly creating the table under a catalog entry for 'spark_catalog', and that resulted in: `IllegalArgumentException: Cannot use non-v1 table 'ns1.src_datasets' as a source`.
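   (That attempt registered spark_catalog as an Iceberg session catalog over the same JDBC catalog, roughly like this; the exact properties mirror the my_catalog config above:)
   
   ```
   # sketch of the spark_catalog registration; properties mirror the
   # my_catalog config above, with SparkSessionCatalog wrapping the
   # built-in session catalog
   spark_session_catalog_config = {
       "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
       "spark.sql.catalog.spark_catalog.catalog-impl": "org.apache.iceberg.jdbc.JdbcCatalog",
       "spark.sql.catalog.spark_catalog.uri": "jdbc:comdb2://",
       "spark.sql.catalog.spark_catalog.warehouse": "s3a://abc",
   }
   ```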
   
   Is there any workaround to achieve my use case? Does this seem like a valid 
request that can be accommodated?
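   Would the register_table procedure be a reasonable workaround? It points a new table name at an existing metadata file, e.g. (untested for this case; the metadata file path below is a placeholder):
   
   ```
   # possible workaround sketch (untested): register the source table's
   # current metadata file under the new namespace; the metadata path
   # here is a placeholder
   spark.sql(
       """
       CALL my_catalog.system.register_table(
           table => 'ns2.src_dataset',
           metadata_file => 's3a://abc/ns1/src_dataset/metadata/v1.metadata.json'
       )
       """
   )
   ```
   
   The docs warn that registering the same metadata file in more than one place can lead to missing updates and table corruption, so it's unclear whether this is safe for our case.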
   
   ### Query engine
   
   None

