Fokko commented on PR #8075:
URL: https://github.com/apache/iceberg/pull/8075#issuecomment-1637141671
Nice, I'm also seeing some speedup:
```
➜ python git:(master) hyperfine --warmup 1 "python3 /tmp/benchmark.py"
Benchmark 1: python3 /tmp/vo.py
Time (mean ± σ): 2.689 s ± 0.028 s [User: 2.799 s, System: 1.945 s]
Range (min … max): 2.659 s … 2.756 s 10 runs
➜ python git:(master) git checkout fix_avro_use_slots
Switched to branch 'fix_avro_use_slots'
Your branch is up to date with 'rustyconover/fix_avro_use_slots'.
➜ python git:(fix_avro_use_slots) hyperfine --warmup 1 "python3 /tmp/benchmark.py"
Benchmark 1: python3 /tmp/vo.py
Time (mean ± σ): 2.194 s ± 0.013 s [User: 2.370 s, System: 1.937 s]
Range (min … max): 2.175 s … 2.217 s 10 runs
```
Where `benchmark.py` is:
```python
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import GreaterThanOrEqual, LessThan, And

# Load the table from the catalog named 'local'
catalog = load_catalog('local')
table = catalog.load_table('nyc.taxis')

expected_rows = 63454

# Scan a single day of pickups and materialize the result as an Arrow table
df = table.scan(row_filter=And(
    GreaterThanOrEqual("tpep_pickup_datetime", "2022-01-01T00:00:00.000000+00:00"),
    LessThan("tpep_pickup_datetime", "2022-01-02T00:00:00.000000+00:00"),
)).to_arrow()

assert len(df) == expected_rows, f"Got {len(df)} rows, instead of {expected_rows}"
```
This is the NYC taxi dataset with one year of data, partitioned hourly
(~160 KB manifest files).
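For reference, here is a quick sketch of the relative speedup implied by the two hyperfine means above. The helper names `relative_speedup` and `time_reduction_pct` are made up for illustration, and the figures only reflect this one run on my machine:

```python
# Mean wall-clock times reported by hyperfine above, in seconds.
MASTER_MEAN_S = 2.689  # master
SLOTS_MEAN_S = 2.194   # fix_avro_use_slots


def relative_speedup(before: float, after: float) -> float:
    """How many times faster `after` is compared to `before`."""
    return before / after


def time_reduction_pct(before: float, after: float) -> float:
    """Percentage of wall-clock time saved going from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    print(f"{relative_speedup(MASTER_MEAN_S, SLOTS_MEAN_S):.2f}x faster")
    print(f"{time_reduction_pct(MASTER_MEAN_S, SLOTS_MEAN_S):.1f}% less wall-clock time")
```

With the means above this works out to roughly 1.23x, or about 18% less wall-clock time. System time is nearly identical in both runs, so the difference shows up in user time.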
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]