ryzhyk opened a new issue, #811:
URL: https://github.com/apache/iceberg-rust/issues/811

   I ran into a performance issue when querying an Iceberg table in S3 via the 
DataFusion table provider.  The table was created using pyiceberg with the 
following schema:
   
   ```python
   schema = Schema(
       NestedField(1, "id", LongType(), required=True),
       NestedField(2, "name", StringType(), required=False),
       NestedField(3, "b", BooleanType(), required=True),
       NestedField(4, "ts", TimestampType(), required=True),
       NestedField(5, "dt", DateType(), required=True),
   )
   ```
   
   The table is partitioned by date extracted from the `ts` column:
   
   ```python
   partition_spec = PartitionSpec(
       PartitionField(
           source_id=4, field_id=1000, transform=DayTransform(), name="date"
       )
   )
   ```
   
   There are 10,000,000 records in the table spread evenly across ~200 
partitions for dates between 2023-01-01 and 2023-08-02.
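
For scale, here is a rough sketch of the arithmetic behind those numbers. It assumes every calendar date in the range has data and that rows are spread perfectly evenly; the variable names are only illustrative.

```python
from datetime import date

first_day = date(2023, 1, 1)
last_day = date(2023, 8, 2)
total_rows = 10_000_000

# Inclusive count of calendar days, assuming each date has its own partition.
num_partitions = (last_day - first_day).days + 1

# With an even spread, each daily partition holds the same share of rows.
rows_per_partition = total_rows / num_partitions
fraction_per_partition = 1 / num_partitions

print(num_partitions)                    # 214
print(round(rows_per_partition))         # rows in one daily partition
print(f"{fraction_per_partition:.2%}")   # share of the table in one partition
```

Under these assumptions a single-day query should only need to touch well under 1% of the table.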
   
   I query the table through the `iceberg-rust` DataFusion table provider, 
using range queries of the form:
   
   ```sql
   select * from my_table
   where ts >= timestamp '2023-01-05T00:00:00'
     and ts < timestamp '2023-01-06T00:00:00'
   ```
   
   I expected this query to be very efficient, since it only needs to read one 
partition. In practice, however, it takes about as long as scanning the entire 
table with `select * from my_table` (approximately 10 seconds). It looks like 
the predicate is not being pushed down to prune partitions for some reason.
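
To spell out why only one partition should be needed, here is a minimal sketch of how the `ts` range maps onto the `day` partition values. It relies on the Iceberg spec's definition of the `day` transform (days since 1970-01-01) and assumes microsecond timestamp precision. The helper names `day_transform` and `matching_day_partitions` are hypothetical and are not part of `iceberg-rust` or pyiceberg.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01, following the Iceberg spec's `day` transform."""
    return (ts - EPOCH).days

def matching_day_partitions(lower_inclusive: datetime, upper_exclusive: datetime) -> list[int]:
    """Partition values that can hold rows with lower_inclusive <= ts < upper_exclusive."""
    # Assumption: microsecond precision, so the largest matching ts is upper - 1us.
    last_ts = upper_exclusive - timedelta(microseconds=1)
    return list(range(day_transform(lower_inclusive), day_transform(last_ts) + 1))

days = matching_day_partitions(datetime(2023, 1, 5), datetime(2023, 1, 6))
print(days)  # [19362]
```

The range covers exactly one partition value, so a scan that prunes on the partition spec would only need to open the data files for 2023-01-05.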
   
   Questions:
   * Is this a performance issue in `iceberg-rust` or am I doing something 
wrong?
   * Is there a better way to perform this query efficiently?
   
   I am using the latest `main` branch of this repo.
   
   Thanks in advance!



