GitHub user hugokitano edited a discussion: Glue catalog updating

I'm able to write Parquet data files and metadata/manifest files using the
**Glue** catalog, but the catalog never reflects the new snapshots. Subsequent
writes to the same table make it clear that the catalog always thinks it's the
first write. I'm confused about how to tell the catalog that I've committed.

```

// instantiating catalog like so....
// let catalog_config = iceberg_catalog_glue::GlueCatalogConfig::builder()
//     .warehouse(format!("s3://{}/{}", bucket_name, path_prefix).to_string())
//     .build();
//
// let glue_catalog = iceberg_catalog_glue::GlueCatalog::new(catalog_config).await
//     .map_err(|e| iceberg::Error::new(
//         iceberg::ErrorKind::Unexpected,
//         format!("Failed to create Glue catalog: {}", e)
//     ))?;
//
// IcebergCatalog::Glue(glue_catalog)
    
pub struct IcebergParquetWriteSession {
    writer: Box<dyn IcebergWriter>,
    table: Table,
    tokio_runtime: Arc<tokio::runtime::Runtime>,
    catalog: IcebergCatalog
}

pub fn commit_parquet_writes(
    mut session: IcebergParquetWriteSession
) -> Result<Vec<IcebergDataFile>, iceberg::Error> {
    session.tokio_runtime.block_on(async {
        let data_files = session.writer.close().await?;
        println!("Writer closed, got {} data files", data_files.len());

        let tx = Transaction::new(&session.table);
        println!("Transaction created for table {} at {}",
                 session.table.identifier().name(),
                 session.table.metadata().location());

        println!("Table current snapshot before commit: {:?}",
                 session.table.metadata().current_snapshot_id());

        let mut fast_append = tx.fast_append(None, vec![])?;

        fast_append.add_data_files(data_files.clone())?;
        println!("Added {} data files to transaction", data_files.len());

        println!("Applying transaction...");
        let updated_tx = fast_append.apply().await?;
        println!("Transaction applied successfully");
        
        // I tried adding the below to see if it works, but I get
        // `Error: FeatureUnsupported => Updating a table is not supported yet`
        // println!("Committing transaction to catalog...");
        // match &session.catalog {
        //     IcebergCatalog::Memory(mem_catalog) => {
        //         updated_tx.commit(mem_catalog).await?;
        //     },
        //     IcebergCatalog::Glue(glue_catalog) => {
        //         updated_tx.commit(glue_catalog).await?;
        //     }
        // }
        // println!("Transaction committed successfully");
        
        let table_ident = session.table.identifier().clone();
        match &session.catalog {
            IcebergCatalog::Memory(mem_catalog) => {
                let refreshed_table = mem_catalog.load_table(&table_ident).await?;
                println!("Table snapshot after commit: {:?}",
                         refreshed_table.metadata().current_snapshot_id());
            },
            IcebergCatalog::Glue(glue_catalog) => {
                let refreshed_table = glue_catalog.load_table(&table_ident).await?;
                println!("Table snapshot after commit: {:?}",
                         refreshed_table.metadata().current_snapshot_id());
            }
        };

        Ok(data_files)
    })
}
```

I get the following output.
```
Writer closed, got 1 data files
Transaction created for table test_table_name at s3://chalk-staging-data/hugo-test/iceberg_warehouse/test_glue_namespace/test_table_name
Table current snapshot before commit: None
Added 1 data files to transaction
Applying transaction...
Transaction applied successfully
Table snapshot after commit: None
```

In summary: with Glue/S3, I see all of the files I need, but the catalog
doesn't update, the table has no snapshots, and subsequent writes have no
parent snapshot ID (i.e., each one behaves as if it were the first write).
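
My working assumption is that applying the append only stages changes on the transaction, and that only the commit step moves the catalog's snapshot pointer. Here is a toy model of that assumed behavior in plain Rust. It uses no iceberg crates, and `ToyCatalog` and `ToyTransaction` are hypothetical names for illustration, not the iceberg-rust API.

```rust
// Toy model only: these types are hypothetical and are NOT the iceberg-rust API.
use std::collections::HashMap;

/// Stand-in for a catalog: maps a table name to its current snapshot id.
#[derive(Default)]
struct ToyCatalog {
    current_snapshot: HashMap<String, Option<i64>>,
}

impl ToyCatalog {
    /// Register a table that has no snapshot yet.
    fn create_table(&mut self, name: &str) {
        self.current_snapshot.insert(name.to_string(), None);
    }

    /// Return the snapshot id the catalog currently points at, if any.
    fn load_snapshot(&self, name: &str) -> Option<i64> {
        self.current_snapshot.get(name).copied().flatten()
    }
}

/// Stand-in for a transaction: staged changes live in memory only.
struct ToyTransaction {
    table: String,
    staged_snapshot: Option<i64>,
}

impl ToyTransaction {
    fn new(table: &str) -> Self {
        Self { table: table.to_string(), staged_snapshot: None }
    }

    /// Analogous to "apply": the transaction changes, the catalog does not.
    fn apply_append(mut self, snapshot_id: i64) -> Self {
        self.staged_snapshot = Some(snapshot_id);
        self
    }

    /// Analogous to "commit": the only step that touches the catalog.
    fn commit(self, catalog: &mut ToyCatalog) {
        if let Some(id) = self.staged_snapshot {
            catalog.current_snapshot.insert(self.table, Some(id));
        }
    }
}
```

In this model, reloading the table after `apply_append` but before `commit` still yields `None`, which matches the "Table snapshot after commit: None" line in my output, since the commit call is commented out in my code.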



GitHub link: https://github.com/apache/iceberg-rust/discussions/1168

----
This is an automatically sent email for issues@iceberg.apache.org.
To unsubscribe, please send an email to: issues-unsubscr...@iceberg.apache.org