myz540 commented on issue #1790:
URL: https://github.com/apache/iceberg-python/issues/1790#issuecomment-2733949029

   I am able to create the partition spec with `TruncateTransform` and write successfully. However, when looping over chunks of 10k records and writing like so:
   
   ```python
   import pyarrow as pa
   from tqdm import tqdm

   for i, _df in tqdm(enumerate(chunk_dataframe(df)), desc="Processing chunk"):
       # Refresh the catalog connection and reload the table on every iteration
       catalog = get_rest_catalog()
       table = catalog.load_table((DATABASE, table_name))
       smol_table = pa.Table.from_pandas(_df, schema=create_lems_pa_schema())
       with table.transaction() as transaction:
           transaction.append(smol_table)
           print(f"✅ Successfully appended data for {i}")
       print(f"✅ Successfully committed data for {i}")

   print("✅ Successfully committed all data")
   ```
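
   For reference, the helpers used there are roughly along these lines (simplified sketches rather than the exact code; the catalog name and URI are placeholders, and `create_lems_pa_schema` just builds the PyArrow schema for the table):

   ```python
   from typing import Iterator

   import pandas as pd
   from pyiceberg.catalog import load_catalog

   def chunk_dataframe(df: pd.DataFrame, chunk_size: int = 10_000) -> Iterator[pd.DataFrame]:
       # Yield consecutive slices of roughly 10k rows each
       for start in range(0, len(df), chunk_size):
           yield df.iloc[start:start + chunk_size]

   def get_rest_catalog():
       # Reconnect to the REST catalog; name and URI are placeholders
       return load_catalog("default", **{"type": "rest", "uri": "http://localhost:8181"})
   ```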
   
   I eventually encounter the error below. I've hit it any time I need to write a large number of chunks, and it usually happens about an hour and a half in. I am refreshing my catalog connection on each iteration, so I'm not sure what the problem is. Any help would be appreciated.
   
   ```
   OSError: When initiating multiple part upload for key 'data/sid=HCC1008/gene=R/00000-0-b4aef6a1-d6c0-4c09-9c08-8cd2e91957e8.parquet'
   in bucket 'a5fc81c2-ccf3-4f36-wbi7imrt75cabpxguuo6i8f1a7n9quse1b--table-s3':
   AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: curlCode: 28, Timeout was reached
   ```
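
   One thing I may try next, in case the timeout is on the S3 client side, is raising the FileIO timeouts when loading the catalog. A rough sketch, assuming the `s3.connect-timeout` / `s3.request-timeout` properties are supported by the installed PyIceberg version (catalog name and URI are again placeholders):

   ```python
   from pyiceberg.catalog import load_catalog

   # Hypothetical sketch, not a confirmed fix: bump the S3 client timeouts so long
   # multipart uploads are less likely to hit curl's "Timeout was reached" (code 28).
   catalog = load_catalog(
       "default",
       **{
           "type": "rest",
           "uri": "http://localhost:8181",
           "s3.connect-timeout": "60.0",   # seconds
           "s3.request-timeout": "300.0",  # seconds
       },
   )
   ```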


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

