ypsah opened a new issue, #46183:
URL: https://github.com/apache/arrow/issues/46183

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi,
   
   In some (rather specific) situations, the following filter expression: `pyarrow.compute.field("x").isin([0.0])` incorrectly filters out matching rows.
   
   Here is a minimal reproducer:
   
   ```python
   from tempfile import TemporaryDirectory
   
   import pyarrow.compute
   import pyarrow.dataset
   import pyarrow.parquet
   
   
   data = pyarrow.table({"x": [0.0]})
   
   f0 = pyarrow.compute.field("x") == 0.0
   f1 = pyarrow.compute.field("x").isin([0.0])
   
   with TemporaryDirectory() as tmpdir:
       pyarrow.parquet.write_to_dataset(data, tmpdir)
   
       assert data == pyarrow.dataset.dataset(tmpdir).to_table()
       assert data == pyarrow.dataset.dataset(tmpdir).filter(f0).to_table()
       assert data == (actual := pyarrow.dataset.dataset(tmpdir).filter(f1).to_table()), actual
   ```
   
   Output:
   
   ```
   Traceback (most recent call last):
     File "/tmp/tmp.iectDMOJmW/test.py", line 18, in <module>
       assert data == (actual := pyarrow.dataset.dataset(tmpdir).filter(f1).to_table()), actual
   AssertionError: pyarrow.Table
   x: double
   ----
   x: [[]]
   ```
   
   The bug does not occur if `x` also contains values other than `0.0`: `pyarrow.table({"x": [0.0, 1.0]})` --> ✅ | `pyarrow.table({"x": [0.0, 0.0, 0.0]})` --> 💥.
   
   ### Component(s)
   
   Python, C++


