glesperance commented on issue #240: URL: https://github.com/apache/iceberg-python/issues/240#issuecomment-2248323987
This would be great. In the meantime I naively hacked this together to get newly appended rows; it seems to work for my use case.

Looking at the code, wouldn't this feature be easier to implement if plan_files accepted an optional snapshot_id argument?

https://github.com/apache/iceberg-python/blob/861c5631587f0d54e2550733d0f8557d57f5060a/pyiceberg/table/__init__.py#L1929-L1937

```
from typing import Iterable, Optional, Tuple, Union

from pyiceberg.table import (
    DataScan,
    FileScanTask,
    Table,
    Properties,
    ALWAYS_TRUE,
    EMPTY_DICT,
    BooleanExpression,
)


class AppendScan(DataScan):
    # Snapshot whose files are excluded from the plan (None = plain scan).
    start_snapshot_id: Optional[int] = None

    @classmethod
    def from_table(
        cls,
        table: Table,
        row_filter: Union[str, BooleanExpression] = ALWAYS_TRUE,
        selected_fields: Tuple[str, ...] = ("*",),
        case_sensitive: bool = True,
        start_snapshot_id: Optional[int] = None,
        snapshot_id: Optional[int] = None,
        options: Properties = EMPTY_DICT,
        limit: Optional[int] = None,
    ) -> DataScan:
        instance = cls(
            table_metadata=table.metadata,
            io=table.io,
            row_filter=row_filter,
            selected_fields=selected_fields,
            case_sensitive=case_sensitive,
            snapshot_id=snapshot_id,
            options=options,
            limit=limit,
        )
        instance.start_snapshot_id = start_snapshot_id
        return instance

    def plan_files(self) -> Iterable[FileScanTask]:
        current_plan = super().plan_files()
        if self.start_snapshot_id is None:
            return current_plan

        # We need to filter out the files that were already in the old snapshot
        orig_snapshot_id = self.snapshot_id
        try:
            self.snapshot_id = self.start_snapshot_id
            # Materialize so the membership test below works for any iterable
            prev_plan = list(super().plan_files())
            return [task for task in current_plan if task not in prev_plan]
        finally:
            # Restore the snapshot id
            self.snapshot_id = orig_snapshot_id


# `product` is an already-loaded Table; diff against the previous snapshot
append_scan = AppendScan.from_table(
    product, start_snapshot_id=product.history()[-2].snapshot_id
)
append_scan.to_pandas()
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org