PetrasTYR opened a new issue, #1105: URL: https://github.com/apache/iceberg-python/issues/1105
### Question

Hello, I have a question regarding Iceberg table snapshots. I used pyiceberg to create a namespace and a table, then inserted a DataFrame like so:

```
from pyiceberg.catalog import load_rest
from pyiceberg.schema import Schema
from pyiceberg.types import StringType, NestedField, DoubleType
from pyiceberg.partitioning import PartitionSpec, PartitionField
import numpy as np
import pandas as pd

rows = 10**1
ncols = 10
countries = ["US", "CA", "UK"]
attr2 = ["OPEN", "CLOSE", "LOW", "HIGH"]
dates = pd.date_range("2020-01-01", "2020-12-31")

data_orig = pd.DataFrame(
    [
        {
            "countries": countries[i % len(countries)],
            "status": attr2[i % len(attr2)],
            "return_index": np.random.rand(),
        }
        for i in range(rows)
    ]
)

schema = Schema(
    NestedField(field_id=1, name="countries", field_type=StringType(), required=False),
    NestedField(field_id=2, name="status", field_type=StringType(), required=False),
    NestedField(
        field_id=3, name="return_index", field_type=DoubleType(), required=False
    ),
)

partition_spec = PartitionSpec(
    fields=[
        PartitionField(
            source_id=1, field_id=1000, name="countries", transform="identity"
        ),
        PartitionField(source_id=2, field_id=1001, name="status", transform="identity"),
    ]
)

catalog = load_rest(
    "rest",
    conf={
        "uri": "http://localhost:19120/iceberg",
    },
)
catalog.create_namespace_if_not_exists("rpmd")
tables = catalog.list_tables(namespace="rpmd")
table = catalog.create_table_if_not_exists(
    identifier="rpmd.performance", schema=schema, partition_spec=partition_spec
)

import pyarrow as pa
import pyarrow.parquet as pq

tbl = pa.Table.from_pandas(data_orig)
table.append(tbl)
```

When I run this script a second time, my understanding is that the `append` method creates a new commit and, in turn, a new snapshot of the rpmd.performance Iceberg table, so I should be able to see a list of snapshots in the latest metadata.json file.
However, I see only the latest snapshot in the array, and running `table.inspect.snapshots()` shows only one snapshot_id, even though I can see all the relevant .avro files. Is there some configuration I need to set to ensure that I can see all snapshots?
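
For reference, one way to check what a given metadata.json actually records, independently of pyiceberg, is to read the file directly. The following is a minimal sketch, assuming the file follows the Iceberg table-metadata layout (`snapshots`, `current-snapshot-id`, `snapshot-log`). The helper names `load_metadata` and `summarize_snapshots` and the sample document are hypothetical and are not taken from the table above.

```python
import json
from pathlib import Path


def load_metadata(path: str) -> dict:
    """Read a table metadata JSON file from local disk (hypothetical helper)."""
    return json.loads(Path(path).read_text())


def summarize_snapshots(metadata: dict) -> dict:
    """Collect the snapshot IDs recorded in an Iceberg table metadata document."""
    snapshots = metadata.get("snapshots", [])
    return {
        # ID of the snapshot the table currently points at
        "current": metadata.get("current-snapshot-id"),
        # Every snapshot still listed in this metadata file
        "snapshot_ids": [s["snapshot-id"] for s in snapshots],
        # Parent links; two appends on one table would normally form a chain
        "parents": {s["snapshot-id"]: s.get("parent-snapshot-id") for s in snapshots},
        # History of which snapshot was current over time
        "log_ids": [e["snapshot-id"] for e in metadata.get("snapshot-log", [])],
    }


# Hand-written sample standing in for a real metadata.json (assumed values).
example = {
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "timestamp-ms": 1000},
        {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 2000},
    ],
    "snapshot-log": [
        {"snapshot-id": 1, "timestamp-ms": 1000},
        {"snapshot-id": 2, "timestamp-ms": 2000},
    ],
}

summary = summarize_snapshots(example)
print(summary)
```

If the second run's snapshot has no `parent-snapshot-id`, or `snapshot_ids` contains a single entry, that would suggest the second commit did not build on the first snapshot, which is one thing worth comparing against the `table.inspect.snapshots()` output.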