kevinjqliu commented on code in PR #1057: URL: https://github.com/apache/iceberg-python/pull/1057#discussion_r1716133239
########## pyiceberg/io/pyarrow.py: ########## @@ -1194,6 +1194,7 @@ def _task_to_record_batches( case_sensitive: bool, name_mapping: Optional[NameMapping] = None, use_large_types: bool = True, + limit: Optional[int] = None, Review Comment: I like the #1043 way where `_task_to_record_batches` is kept as is; its just generating more batches. The function which calls `_task_to_record_batches` can keep track of the `limit` ########## pyiceberg/io/pyarrow.py: ########## @@ -1366,6 +1373,7 @@ def project_table( case_sensitive, table_metadata.name_mapping(), use_large_types, + limit, Review Comment: I think this change is enough to "pushdown the limit". `_task_to_table` will get batches from `_task_to_record_batches` and check the limit, returning a pyarrow Table when the limit is met. If the limit is not met, `_task_to_table` will just return all the batches as a pyarrow Table ########## pyiceberg/io/pyarrow.py: ########## @@ -1265,6 +1270,7 @@ def _task_to_table( case_sensitive: bool, name_mapping: Optional[NameMapping] = None, use_large_types: bool = True, + limit: Optional[int] = None, ) -> Optional[pa.Table]: batches = list( Review Comment: similar to the comment above, i think we can process the `limit` here instead of passing it down to `_task_to_record_batches` ########## pyiceberg/io/pyarrow.py: ########## @@ -1194,6 +1194,7 @@ def _task_to_record_batches( case_sensitive: bool, name_mapping: Optional[NameMapping] = None, use_large_types: bool = True, + limit: Optional[int] = None, Review Comment: "technically" we can also push down the limit to the `ds.Scanner.from_fragment`'s `batch_size` parameter, but that's currently out of scope https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html#pyarrow.dataset.Scanner.from_fragment -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org