myz540 commented on issue #1790: URL: https://github.com/apache/iceberg-python/issues/1790#issuecomment-2733949029
I am able to create the partition spec with `TruncateTransform` and to write. However, when looping over chunks of 10k records and writing, like so:

```python
for i, _df in tqdm(enumerate(chunk_dataframe(df)), desc="Processing chunk"):
    catalog = get_rest_catalog()
    table = catalog.load_table((DATABASE, table_name))
    smol_table = pa.Table.from_pandas(_df, schema=create_lems_pa_schema())
    with table.transaction() as transaction:
        transaction.append(smol_table)
        print(f"✅ Successfully appended data for {i}")
    print(f"✅ Successfully committed data for {i}")
print("✅ Successfully committed all data")
```

I eventually hit the error below. It happens any time I need to write a lot of chunks, usually about an hour and a half in. I am refreshing my catalog connection on each iteration, so I'm not sure what the problem is. Any help would be appreciated.

```
OSError: When initiating multiple part upload for key 'data/sid=HCC1008/gene=R/00000-0-b4aef6a1-d6c0-4c09-9c08-8cd2e91957e8.parquet' in bucket 'a5fc81c2-ccf3-4f36-wbi7imrt75cabpxguuo6i8f1a7n9quse1b--table-s3': AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: curlCode: 28, Timeout was reached
```
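For context, `chunk_dataframe` is assumed here to be a plain generator yielding ~10k-row slices of the dataframe. A minimal sketch of such a helper (the actual implementation in the reporter's code may differ):

```python
from typing import Iterator

import pandas as pd


def chunk_dataframe(df: pd.DataFrame, chunk_size: int = 10_000) -> Iterator[pd.DataFrame]:
    """Yield consecutive chunk_size-row slices of df (hypothetical sketch)."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]
```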