Re: [I] table.scan(row_filter="x IN (0, 1)") does not include the values for which x=0 when x is a DoubleType and a partition column [iceberg-python]

via GitHub Sat, 19 Apr 2025 19:27:22 -0700


jayceslesar commented on issue #1937:
URL: https://github.com/apache/iceberg-python/issues/1937#issuecomment-2816945275


   I did a little digging and, just to be safe, also tested `table.scan(row_filter=In("x", [0.0, 1.0, 2.0]))`, which shows the same issue. However, I believe this happens when the filter is pushed down into the Parquet read, because as far as I can tell PyIceberg builds the correct PyArrow schema inside `_task_to_record_batches`.
   
   I believe this is the case because when I print each batch from `batches = fragment_scanner.to_batches()`, I see the following output:
   
   ```
   Empty DataFrame
   Columns: [x, y]
   Index: []
        x    y
   0  1.0  0.0
        x    y
   0  2.0  0.0
   ```
   
   Note that we never see a row where `x` is `0.0`, which suggests that the `fragment_scanner = ds.Scanner.from_fragment` call, which pushes the query down to Arrow, is likely the culprit.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
