Fokko commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2650625059
Closing this issue, https://github.com/apache/iceberg-python/pull/1388 has
been merged. Thanks everyone!
Fokko closed issue #1223: Count rows as a metadata-only operation
URL: https://github.com/apache/iceberg-python/issues/1223
gli-chris-hao commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2566622940
We have the same use case and concerns about loading too much data into
memory for counting. The way I'm doing it is to use
`DataScan.to_arrow_batch_reader` and then count the rows batch by batch.
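A minimal sketch of that batch-based count; the catalog, table, and filter names below are illustrative, not from this thread:

```python
from pyiceberg.catalog import load_catalog

# Illustrative names; any catalog/table and row filter would do.
table = load_catalog("default").load_table("db.events")

# Stream record batches instead of materializing the whole table in memory,
# then sum the batch sizes to get the row count.
reader = table.scan(row_filter="status = 'active'").to_arrow_batch_reader()
row_count = sum(batch.num_rows for batch in reader)
print(row_count)
```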
tusharchou commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2506625672
Hi @Fokko,
Thank you for helping. I attempted to implement `.count()` in `DataScan`. I
can test it using the `SqlCatalog` in `catalog/test_sql`; however, when I try
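For reference, a minimal `SqlCatalog` setup of the kind used for such tests, following the pattern from the PyIceberg getting-started docs; the paths and schema below are illustrative, not taken from `catalog/test_sql`:

```python
import os

import pyarrow as pa

from pyiceberg.catalog.sql import SqlCatalog

# Illustrative local warehouse; the real tests use pytest fixtures instead.
warehouse_path = "/tmp/warehouse"
os.makedirs(warehouse_path, exist_ok=True)

catalog = SqlCatalog(
    "default",
    uri=f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
    warehouse=f"file://{warehouse_path}",
)

catalog.create_namespace("default")
df = pa.table({"id": [1, 2, 3]})
table = catalog.create_table("default.numbers", schema=df.schema)
table.append(df)

# A DataScan-level count would be asserted against something like this:
assert len(table.scan().to_arrow()) == 3
```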
Fokko commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2501946144
@tusharchou Thanks. I was noodling on this, and instead of having a
`.to_arrow()`, we could also have a `.count()` that will return the number of
rows that match the predicate.
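One possible shape for such a metadata-backed count, sketched purely for illustration (this is not the implementation that was eventually merged, and the helper name is made up):

```python
from pyiceberg.table import DataScan


def metadata_row_count(scan: DataScan) -> int:
    """Sum record counts from the manifest entries of the matched files.

    Caveat: this over-counts when a file only partially matches the
    predicate or is affected by delete files; a real implementation
    would have to fall back to reading those files.
    """
    return sum(task.file.record_count for task in scan.plan_files())


# Usage (assuming `table` is a loaded Iceberg table):
# metadata_row_count(table.scan(row_filter="id > 10"))
```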
tusharchou commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2501709107
# RCA
Hi @Visorgood,
The behavior expected here is a simple partition push-down in DuckDB, which
this PR addresses:
https://github.com/duckdb/duckd
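Until DuckDB handles that push-down itself, one way to get a similar effect from the PyIceberg side is to apply the partition predicate in the scan and hand only the matching data to DuckDB. A sketch; the names are illustrative and the `duckdb` package is assumed to be installed:

```python
from pyiceberg.catalog import load_catalog

# Illustrative names; the filter should target the table's partition column.
table = load_catalog("default").load_table("db.events")

# The row_filter is applied by PyIceberg while planning and reading files,
# so DuckDB only ever sees the matching rows.
con = table.scan(row_filter="event_date = '2024-10-01'").to_duckdb(table_name="events")
print(con.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```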
Fokko commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2445969277
Thanks @Visorgood for reaching out here, and that's an excellent idea. We
actually already do this in a project like Datahub, see:
https://github.com/datahub-project/datahub/bl
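For an unfiltered, table-level count, this kind of metadata shortcut can be as simple as reading the standard `total-records` property from the current snapshot summary. A sketch (names are illustrative; the summary count does not account for a predicate or row-level deletes):

```python
from pyiceberg.catalog import load_catalog

# Illustrative names; any catalog/table would do.
table = load_catalog("default").load_table("db.events")

snapshot = table.current_snapshot()
if snapshot is not None and snapshot.summary is not None:
    # "total-records" is a standard Iceberg snapshot summary property.
    total_records = snapshot.summary["total-records"]
    if total_records is not None:
        print(int(total_records))
```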
kevinjqliu commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2400362505
I think this is an optimization for the engine side.
I want to balance "pyiceberg, the python library for iceberg" and
"pyiceberg, the engines to run queries on iceberg
kevinjqliu commented on issue #1223:
URL:
https://github.com/apache/iceberg-python/issues/1223#issuecomment-2400356510
This is a great idea! We should leverage Iceberg's robust metadata whenever
possible.
As mentioned, this would be a specific optimization for querying Iceberg
tables