Fokko commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886483368
##########
pyiceberg/table/__init__.py:
##########
@@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]:
             for data_entry in data_entries
         ]
 
+    def _target_split_size(self) -> int:
+        table_value = property_as_int(
+            self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT
+        )
+        return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value)  # type: ignore
+
+    def _loop_back(self) -> int:
+        table_value = property_as_int(
+            self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT
+        )
+        return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value)  # type: ignore
+
+    def _split_open_file_cost(self) -> int:
+        table_value = property_as_int(
+            self.table_metadata.properties,
+            TableProperties.READ_SPLIT_OPEN_FILE_COST,
+            TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT,
+        )
+        return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value)  # type: ignore
+
+    def plan_task(self) -> Iterable[CombinedFileScanTask]:

Review Comment:
   I have the same concern as @corleyma. For example, Java splits on row-groups to make full use of the parallelism of Spark, Trino, Hive etc. On a local machine, it makes more sense to just plow through the file itself, preferably using a native reader like PyArrow or Iceberg-Rust in the future.

   There are a lot of details around this API, for example, a task might point to a row-group that doesn't contain any relevant information, and we don't know until we open the file itself.
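
   To make the whole-file alternative concrete, here is a rough sketch of what I have in mind. It is not the code in this PR and not the Java lookback bin-packing: it resolves each setting the same way the diff does (scan option first, then table property, then default) and then greedily groups whole files into tasks, so nothing ever points at a row-group that we have not inspected. The helper names `resolve_int` and `pack_whole_files` are hypothetical; the property keys and defaults are the ones documented for Iceberg tables. Using `max(size, open_file_cost)` as the weight is my assumption about how small files should be counted.

```python
from typing import Dict, List

# Property keys and defaults as documented for Iceberg table read properties.
READ_SPLIT_SIZE = "read.split.target-size"
READ_SPLIT_SIZE_DEFAULT = 128 * 1024 * 1024
READ_SPLIT_OPEN_FILE_COST = "read.split.open-file-cost"
READ_SPLIT_OPEN_FILE_COST_DEFAULT = 4 * 1024 * 1024


def resolve_int(options: Dict[str, str], table_properties: Dict[str, str], key: str, default: int) -> int:
    """Hypothetical helper: scan option wins over table property, which wins over the default."""
    for source in (options, table_properties):
        if key in source:
            return int(source[key])
    return default


def pack_whole_files(file_sizes: List[int], target_size: int, open_file_cost: int) -> List[List[int]]:
    """Hypothetical helper: greedily group whole files (by index) into tasks of about target_size bytes.

    Files are never split. A file larger than target_size ends up in a task of its own.
    """
    tasks: List[List[int]] = []
    current: List[int] = []
    current_weight = 0
    for index, size in enumerate(file_sizes):
        # Assumption: a tiny file still costs at least open_file_cost to read.
        weight = max(size, open_file_cost)
        if current and current_weight + weight > target_size:
            tasks.append(current)
            current, current_weight = [], 0
        current.append(index)
        current_weight += weight
    if current:
        tasks.append(current)
    return tasks


if __name__ == "__main__":
    options = {READ_SPLIT_SIZE: str(64 * 1024 * 1024)}
    table_properties = {READ_SPLIT_SIZE: str(256 * 1024 * 1024)}
    target = resolve_int(options, table_properties, READ_SPLIT_SIZE, READ_SPLIT_SIZE_DEFAULT)
    cost = resolve_int(options, table_properties, READ_SPLIT_OPEN_FILE_COST, READ_SPLIT_OPEN_FILE_COST_DEFAULT)
    sizes = [40 * 1024 * 1024, 1024, 30 * 1024 * 1024, 200 * 1024 * 1024]
    print(pack_whole_files(sizes, target, cost))
```

   A local reader could then hand each task to PyArrow as a list of complete files and let the native reader decide which row-groups to skip once the footer is open.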
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org