Re: [I] Count rows as a metadata-only operation [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2650625059 Closing this issue, https://github.com/apache/iceberg-python/pull/1388 has been merged. Thanks everyone! -- This is an automated message from the Apache Git Service. To respo

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2025-02-11 Thread via GitHub
Fokko closed issue #1223: Count rows as a metadata-only operation URL: https://github.com/apache/iceberg-python/issues/1223

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-12-31 Thread via GitHub
gli-chris-hao commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2566622940 We have the same use case and the same concerns about loading too much data into memory for counting. The way I'm doing it is to use `DataScan.to_arrow_batch_reader`, and then coun
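
A minimal sketch of the streaming approach described in the comment above: consume record batches one at a time and add up their sizes, so only one batch is held in memory at once. It assumes each batch exposes a `num_rows` attribute, as a `pyarrow.RecordBatch` does. The names `count_rows` and `FakeBatch` are hypothetical and used only for illustration; with pyiceberg, the iterable would presumably be the reader returned by `DataScan.to_arrow_batch_reader`.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasNumRows(Protocol):
    """Anything with a num_rows attribute, e.g. a pyarrow.RecordBatch."""

    num_rows: int


def count_rows(batches: Iterable[HasNumRows]) -> int:
    """Sum batch sizes while iterating, never materializing the full table."""
    total = 0
    for batch in batches:
        total += batch.num_rows
    return total


@dataclass
class FakeBatch:
    """Hypothetical stand-in for a record batch, for demonstration only."""

    num_rows: int


if __name__ == "__main__":
    # Assumed usage with pyiceberg (not executed here):
    #   count_rows(table.scan(row_filter=...).to_arrow_batch_reader())
    print(count_rows([FakeBatch(3), FakeBatch(0), FakeBatch(7)]))
```

This still reads the data files, so it bounds memory use but does not make the count a metadata-only operation.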

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-11-28 Thread via GitHub
tusharchou commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2506625672 Hi @Fokko, Thank you for helping. I attempted to implement `.count()` in `DataScan`. I can test it using the `SqlCatalog` in `catalog/test_sql`; however, when I try

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2501946144 @tusharchou Thanks. I was noodling on this, and instead of having a `.to_arrow()`, we could also have a `.count()` that will return the number of rows that match the predicate.
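
A rough sketch of how a `.count()` like the one proposed above could lean on metadata, under stated assumptions rather than as a description of the merged implementation. Each planned scan task is assumed to carry a data file with a `record_count` and a collection of `delete_files`. A file's recorded count can be trusted only when no delete files apply to it and the predicate is known to match every row in the file; any other file still has to be read. All names here (`DataFileStub`, `ScanTaskStub`, `residual_is_always_true`, `split_count`) are hypothetical stand-ins, not pyiceberg API.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class DataFileStub:
    """Hypothetical stand-in for a data file entry in a manifest."""

    record_count: int


@dataclass
class ScanTaskStub:
    """Hypothetical stand-in for one planned file scan task."""

    file: DataFileStub
    delete_files: List[str] = field(default_factory=list)
    # Assumption: True when the predicate is known to match every row in the file.
    residual_is_always_true: bool = True


def split_count(tasks: Iterable[ScanTaskStub]) -> Tuple[int, List[ScanTaskStub]]:
    """Return (rows counted from metadata, tasks that still need a real read)."""
    known = 0
    needs_read: List[ScanTaskStub] = []
    for task in tasks:
        if not task.delete_files and task.residual_is_always_true:
            # Safe to trust the count recorded in the metadata.
            known += task.file.record_count
        else:
            # Deletes or a partial predicate match: the file must be opened.
            needs_read.append(task)
    return known, needs_read


if __name__ == "__main__":
    tasks = [
        ScanTaskStub(DataFileStub(100)),
        ScanTaskStub(DataFileStub(50), delete_files=["delete-file-1"]),
        ScanTaskStub(DataFileStub(25), residual_is_always_true=False),
    ]
    known, rest = split_count(tasks)
    print(known, len(rest))
```

The total is then `known` plus the filtered row counts of the remaining files, which can be obtained by reading only those files.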

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-11-26 Thread via GitHub
tusharchou commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2501709107 # RCA Hi @Visorgood, The behavior expected here is a simple partition push-down implementation in DuckDB, which this PR addresses: https://github.com/duckdb/duckd

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-10-29 Thread via GitHub
Fokko commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2445969277 Thanks @Visorgood for reaching out here, and that's an excellent idea. We actually already do this in a project like Datahub, see: https://github.com/datahub-project/datahub/bl

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2400362505 I think this is an optimization for the engine side. I want to balance "pyiceberg, the python library for iceberg" and "pyiceberg, the engines to run queries on iceberg

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2400356510 This is a great idea! We should leverage Iceberg's robust metadata whenever possible. As mentioned, this would be a specific optimization for querying Iceberg tabl