mlondschien opened a new issue, #45161:
URL: https://github.com/apache/arrow/issues/45161

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Upon trying to reproduce a different error that occurs when using filters 
(predicates) which select none of the data, I came across this error:
   
   ```python
   import numpy as np
   import polars as pl
   import pyarrow.dataset as ds
   from pyarrow.parquet import ParquetDataset
   
   n = 1_000_000
   rng = np.random.default_rng(seed=42)
   
   data = pl.DataFrame(
       {
           "a": rng.uniform(low=0, high=2, size=n),
           "b": rng.choice(["a", "b"], n),
           "c": rng.normal(size=n),
       }
   )
   
   data.write_parquet("data.parquet", row_group_size=500_000)
   
   df = pl.from_arrow(
       ParquetDataset(
           ["data.parquet"],
           filters=~ds.field("c").is_null() & ds.field("a") >= 3,
       ).read(columns=["b"])
   )
   print(df)
   ```
   This yields
   ```
   Traceback (most recent call last):
     File "test_arrow.py", line 24, in <module>
       ).read(columns=["b"])
         ^^^^^^^^^^^^^^^^^^^
     File "/cluster/home/lmalte/nobackup/micromamba/envs/psutil/lib/python3.11/site-packages/pyarrow/parquet/core.py", line 1485, in read
       table = self._dataset.to_table(
               ^^^^^^^^^^^^^^^^^^^^^^^
     File "pyarrow/_dataset.pyx", line 553, in pyarrow._dataset.Dataset.to_table
     File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.scanner
     File "pyarrow/_dataset.pyx", line 3557, in pyarrow._dataset.Scanner.from_dataset
     File "pyarrow/_dataset.pyx", line 3475, in pyarrow._dataset.Scanner._make_scan_options
     File "pyarrow/_dataset.pyx", line 3409, in pyarrow._dataset._populate_builder
     File "pyarrow/_compute.pyx", line 2724, in pyarrow._compute._bind
     File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
   pyarrow.lib.ArrowNotImplementedError: Function 'and_kleene' has no kernel matching input types (bool, double)
   ```
   This is using
   ```
   polars  1.14.0   py311hcc3b33b_1  conda-forge
   pyarrow       18.1.0   py311h38be061_0      conda-forge
   pyarrow-core  18.1.0   py311h4854187_0_cpu  conda-forge
   ```
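
A possible reading of the `(bool, double)` message, offered as a sketch and not a confirmed diagnosis: in Python, `&` binds more tightly than `>=`, so the filter in the reproduction is parsed as `(~is_null(c) & a) >= 3`. That would ask `and_kleene` to combine a boolean with the double column `a`. The snippet below uses only the standard-library `ast` module to show how Python groups the two spellings; the helper name `top_level_op` is made up for this illustration, and nothing here calls pyarrow.

```python
import ast


def top_level_op(source: str) -> str:
    """Return the name of the outermost operator Python parses from `source`."""
    node = ast.parse(source, mode="eval").body
    if isinstance(node, ast.Compare):
        # Comparison such as `x >= 3`; report the first comparison operator.
        return type(node.ops[0]).__name__
    if isinstance(node, ast.BinOp):
        # Binary operator such as `x & y`.
        return type(node.op).__name__
    return type(node).__name__


# The filter as written in the reproduction above.
unparenthesized = '~ds.field("c").is_null() & ds.field("a") >= 3'
# The same filter with the comparison grouped explicitly.
parenthesized = '~ds.field("c").is_null() & (ds.field("a") >= 3)'

print(top_level_op(unparenthesized))  # GtE: the `&` is evaluated first
print(top_level_op(parenthesized))    # BitAnd: the comparison is evaluated first
```

If this reading is right, writing the filter as `~ds.field("c").is_null() & (ds.field("a") >= 3)` should avoid this particular error. Whether it also resolves the different error mentioned at the top of the report is not established here.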
   
   ### Component(s)
   
   Python

