gli-chris-hao commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2566622940
We have the same use case and the same concern about loading too much data into memory just to count rows. The way I'm doing it is to use `DataScan.to_arrow_batch_reader` and count the rows by iterating over the batches, which should avoid memory issues for a large data scan:

```
# Stream record batches one at a time instead of materializing the whole scan
count = 0
for batch in datascan.to_arrow_batch_reader():
    count += batch.num_rows
```