Fokko commented on PR #8075:
URL: https://github.com/apache/iceberg/pull/8075#issuecomment-1637141671

   Nice, I'm also seeing a speedup (mean 2.689 s → 2.194 s, roughly 18% less wall-clock time):
   ```
   ➜  python git:(master) hyperfine --warmup 1 "python3 /tmp/benchmark.py"
   Benchmark 1: python3 /tmp/vo.py
     Time (mean ± σ):      2.689 s ±  0.028 s    [User: 2.799 s, System: 1.945 s]
     Range (min … max):    2.659 s …  2.756 s    10 runs
    
   ➜  python git:(master) git checkout fix_avro_use_slots
   Switched to branch 'fix_avro_use_slots'
   Your branch is up to date with 'rustyconover/fix_avro_use_slots'.
   ➜  python git:(fix_avro_use_slots) hyperfine --warmup 1 "python3 /tmp/benchmark.py"
   Benchmark 1: python3 /tmp/vo.py
     Time (mean ± σ):      2.194 s ±  0.013 s    [User: 2.370 s, System: 1.937 s]
     Range (min … max):    2.175 s …  2.217 s    10 runs
   ```
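   As a quick check on the numbers above, here is a minimal Python sketch. It derives the speedup and its uncertainty from the two hyperfine means and standard deviations. The first-order error-propagation formula for a ratio assumes the two runs are independent. The variable names are only for illustration.

   ```python
   import math

   # Mean and standard deviation (seconds) copied from the two hyperfine runs.
   baseline_mean, baseline_sd = 2.689, 0.028  # master
   patched_mean, patched_sd = 2.194, 0.013    # fix_avro_use_slots

   # Ratio of means: how many times faster the patched branch is.
   speedup = baseline_mean / patched_mean

   # First-order error propagation for a ratio of independent measurements:
   # sigma_r / r = sqrt((sigma_a / a)^2 + (sigma_b / b)^2)
   speedup_sd = speedup * math.sqrt(
       (baseline_sd / baseline_mean) ** 2 + (patched_sd / patched_mean) ** 2
   )

   # Relative reduction in mean wall-clock time, in percent.
   reduction_pct = (1 - patched_mean / baseline_mean) * 100

   print(f"speedup: {speedup:.2f} ± {speedup_sd:.2f}x")
   print(f"wall-clock reduction: {reduction_pct:.1f}%")
   ```

   With these inputs the result is a speedup of about 1.23× ± 0.01, or roughly 18% less wall-clock time per run.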
   
   Where `benchmark.py` is:
   
   ```python
   from pyiceberg.catalog import load_catalog
   from pyiceberg.expressions import And, GreaterThanOrEqual, LessThan
   
   # Load the catalog configured under the name 'local' (e.g. in ~/.pyiceberg.yaml)
   catalog = load_catalog('local')
   
   table = catalog.load_table('nyc.taxis')
   
   # Expected row count for the one-day filter below
   expected_rows = 63454
   
   # Scan a single day of pickups; the row filter lets PyIceberg prune manifests and data files
   df = table.scan(row_filter=And(
       GreaterThanOrEqual("tpep_pickup_datetime", "2022-01-01T00:00:00.000000+00:00"),
       LessThan("tpep_pickup_datetime", "2022-01-02T00:00:00.000000+00:00"),
   )).to_arrow()
   
   assert len(df) == expected_rows, f"Got {len(df)} rows, instead of {expected_rows}"
   ```
   
   This is the NYC taxi dataset with one year of data, partitioned by hour (~160 KB manifest files).
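   The branch name `fix_avro_use_slots` suggests that the change adds `__slots__` to classes on the Avro decoding path. That is an inference from the name, not something stated in this comment. Under that assumption, here is a generic sketch of what `__slots__` changes. The hypothetical `PlainRecord` and `SlottedRecord` classes are unrelated to PyIceberg's actual code. Slotted instances drop the per-instance `__dict__`, which saves memory and usually makes construction and attribute access a bit cheaper. That matters when millions of small objects are decoded from manifest files. The timings are illustrative only and vary by Python version and machine.

   ```python
   import timeit


   class PlainRecord:
       # Hypothetical class: attributes are stored in a per-instance __dict__.
       def __init__(self, a, b):
           self.a = a
           self.b = b


   class SlottedRecord:
       # Hypothetical class: attributes are stored in fixed slots, no __dict__.
       __slots__ = ("a", "b")

       def __init__(self, a, b):
           self.a = a
           self.b = b


   plain = PlainRecord(1, 2)
   slotted = SlottedRecord(1, 2)

   # Slotted instances have no per-instance dictionary.
   has_dict_plain = hasattr(plain, "__dict__")
   has_dict_slotted = hasattr(slotted, "__dict__")

   # Slotted instances also reject attributes that were not declared.
   try:
       slotted.extra = 3
       rejects_new_attrs = False
   except AttributeError:
       rejects_new_attrs = True

   # Rough timing of object construction, the hot path when decoding many records.
   t_plain = timeit.timeit(lambda: PlainRecord(1, 2), number=200_000)
   t_slotted = timeit.timeit(lambda: SlottedRecord(1, 2), number=200_000)
   print(f"plain:   {t_plain:.3f} s")
   print(f"slotted: {t_slotted:.3f} s")
   ```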

