GitHub user hugokitano edited a discussion: Glue catalog updating
I'm able to write Parquet data files and metadata/manifest files with the **Glue** catalog, but the catalog never updates with new snapshots, and subsequent writes to the same table make it clear the catalog always thinks it's the first write. I'm confused about how to tell the catalog that I've committed.

```
// instantiating catalog like so....
// let catalog_config = iceberg_catalog_glue::GlueCatalogConfig::builder()
//     .warehouse(format!("s3://{}/{}", bucket_name, path_prefix).to_string())
//     .build();
//
// let glue_catalog = iceberg_catalog_glue::GlueCatalog::new(catalog_config).await
//     .map_err(|e| iceberg::Error::new(
//         iceberg::ErrorKind::Unexpected,
//         format!("Failed to create Glue catalog: {}", e)
//     ))?;
//
// IcebergCatalog::Glue(glue_catalog)

pub struct IcebergParquetWriteSession {
    writer: Box<dyn IcebergWriter>,
    table: Table,
    tokio_runtime: Arc<tokio::runtime::Runtime>,
    catalog: IcebergCatalog
}

pub fn commit_parquet_writes(
    mut session: IcebergParquetWriteSession
) -> Result<Vec<IcebergDataFile>, iceberg::Error> {
    session.tokio_runtime.block_on(async {
        let data_files = session.writer.close().await?;
        println!("Writer closed, got {} data files", data_files.len());

        let tx = Transaction::new(&session.table);
        println!("Transaction created for table {} at {}",
            session.table.identifier().name(),
            session.table.metadata().location());
        println!("Table current snapshot before commit: {:?}",
            session.table.metadata().current_snapshot_id());

        let mut fast_append = tx.fast_append(None, vec![])?;
        fast_append.add_data_files(data_files.clone())?;
        println!("Added {} data files to transaction", data_files.len());

        println!("Applying transaction...");
        let updated_tx = fast_append.apply().await?;
        println!("Transaction applied successfully");

        // I tried adding the below to see if it works, but I get `Error: FeatureUnsupported => Updating a table is not supported yet`
        // println!("Committing transaction to catalog...");
        // match &session.catalog {
        //     IcebergCatalog::Memory(mem_catalog) => {
        //         updated_tx.commit(mem_catalog).await?;
        //     },
        //     IcebergCatalog::Glue(glue_catalog) => {
        //         updated_tx.commit(glue_catalog).await?;
        //     }
        // }
        // println!("Transaction committed successfully");

        let table_ident = session.table.identifier().clone();
        let catalog_ref = match &session.catalog {
            IcebergCatalog::Memory(mem_catalog) => {
                let refreshed_table = mem_catalog.load_table(&table_ident).await?;
                println!("Table snapshot after commit: {:?}",
                    refreshed_table.metadata().current_snapshot_id());
            },
            IcebergCatalog::Glue(glue_catalog) => {
                let refreshed_table = glue_catalog.load_table(&table_ident).await?;
                println!("Table snapshot after commit: {:?}",
                    refreshed_table.metadata().current_snapshot_id());
            }
        };

        Ok(data_files)
    })
}
```

I get the following output.

```
Writer closed, got 1 data files
Transaction created for table test_table_name at s3://chalk-staging-data/hugo-test/iceberg_warehouse/test_glue_namespace/test_table_name
Table current snapshot before commit: None
Added 1 data files to transaction
Applying transaction...
Transaction applied successfully
Table snapshot after commit: None
```

In summary: with Glue/S3, I see all of the files I need, but the catalog doesn't update, the table has no snapshots, and subsequent writes have no parent snapshot ID, i.e. they think they are the first write.

GitHub link: https://github.com/apache/iceberg-rust/discussions/1168

----
This is an automatically sent email for issues@iceberg.apache.org.
To unsubscribe, please send an email to: issues-unsubscr...@iceberg.apache.org