davlee1972 opened a new issue, #2094: URL: https://github.com/apache/arrow-adbc/issues/2094
### What happened?

To work around the memory-limitation bug (https://github.com/apache/arrow-adbc/issues/1997) with adbc_ingest (version 1.1.0), I started running adbc_ingest(data=@recordbatchreader) on one parquet file at a time instead of on a dataset of parquet files:

```
adbc_ingest(data=my_dataset.scanner().to_reader())

vs:

for file in my_dataset.files:
    file_dataset = ds.dataset(file)
    adbc_ingest(data=file_dataset.scanner().to_reader())
```

The final row counts are coming up 5% short. I think there might be some sort of issue where each adbc_ingest() call starts a fresh temporary staging area and runs PUTs with the same file names, which then clash with the prior adbc_ingest() operation. I'm going to do some further testing by adding 1-minute delays between adbc_ingest() calls.

I've got 120 GB of parquet files organized in partitioned directories, with file sizes of ~3 GB each and 10 row groups per file:

```
drwxrwsr-x 2 4096 Aug 20 20:10 risk_date_yyyymmdd=20240806
drwxrwsr-x 2 4096 Aug 20 20:10 risk_date_yyyymmdd=20240807
drwxrwsr-x 2 4096 Aug 20 20:10 risk_date_yyyymmdd=20240808
etc. etc. etc.
```

### Stack Trace

_No response_

### How can we reproduce the bug?

_No response_

### Environment/Setup

_No response_
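For reference, here is a minimal sketch of the per-file ingestion loop described above, assuming the Snowflake ADBC driver through the DBAPI layer (`adbc_driver_snowflake.dbapi`). The connection URI, dataset path, and table name are placeholders, not the actual values from my setup:

```
import pyarrow.dataset as ds
import adbc_driver_snowflake.dbapi

# Placeholders -- substitute real connection details, paths, and table name.
SNOWFLAKE_URI = "user:password@account/database/schema"
DATASET_PATH = "/data/risk_parquet"
TABLE_NAME = "RISK_TABLE"

# Hive-partitioned dataset (risk_date_yyyymmdd=... directories).
my_dataset = ds.dataset(DATASET_PATH, format="parquet", partitioning="hive")

with adbc_driver_snowflake.dbapi.connect(SNOWFLAKE_URI) as conn:
    with conn.cursor() as cursor:
        for i, file in enumerate(my_dataset.files):
            file_dataset = ds.dataset(file, format="parquet")
            reader = file_dataset.scanner().to_reader()
            # First file creates the table; subsequent files append to it.
            mode = "create" if i == 0 else "append"
            cursor.adbc_ingest(TABLE_NAME, reader, mode=mode)
        conn.commit()

        # Compare the ingested row count against the source dataset.
        cursor.execute(f"SELECT COUNT(*) FROM {TABLE_NAME}")
        print("rows in table:", cursor.fetchone()[0])
        print("rows in source:", my_dataset.count_rows())
```

The experimental 1-minute delay mentioned above would go at the end of the loop body (e.g. time.sleep(60)) to see whether spacing out the staging/PUT phases avoids the shortfall.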