adamreeve opened a new issue, #43146:
URL: https://github.com/apache/arrow/issues/43146

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   PyArrow correctly raises an error when casting an array that contains nulls to a non-nullable field:
   
   ```python
   import pyarrow as pa
   
   schema = pa.schema([
       pa.field("x", pa.int64(), nullable=True)])
   
   table = pa.Table.from_pydict({
           "x": [1, None, 3],
       }, schema=schema)
   
   new_schema = pa.schema([
       pa.field("x", pa.int64(), nullable=False)])
   
   new_table = table.cast(new_schema)
   ```
   
   This raises:
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/table.pxi", line 4455, in pyarrow.lib.Table.cast
   ValueError: Casting field 'x' with null values to non-nullable
   ```
   
   But no error is raised when the nulls are inside a list whose child field is cast to non-nullable:
   ```python
   schema = pa.schema([
       pa.field("x", pa.list_(pa.field("", pa.int64(), nullable=True)), nullable=False)])
   
   table = pa.Table.from_pydict({
           "x": [[1, None, 3], [], [4, 5]]
       }, schema=schema)
   
   new_schema = pa.schema([
       pa.field("x", pa.list_(pa.field("", pa.int64(), nullable=False)), nullable=False)])
   
   new_table = table.cast(new_schema)
   
   print(new_table)
   ```
   
   ```
   pyarrow.Table
   x: list<: int64 not null> not null
     child 0, : int64 not null
   ----
   x: [[[1,null,3],[],[4,5]]]
   ```
   
   I can also write this table to Parquet without any error, but reading it back then fails:
   ```python
   import pyarrow.parquet as pq
   
   pq.write_table(new_table, 'data.parquet')
   read = pq.read_table('data.parquet')
   ```
   
   This raises:
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/adam/dev/virtualenvs/ml/lib64/python3.12/site-packages/pyarrow/parquet/core.py", line 1811, in read_table
       return dataset.read(columns=columns, use_threads=use_threads,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/adam/dev/virtualenvs/ml/lib64/python3.12/site-packages/pyarrow/parquet/core.py", line 1454, in read
       table = self._dataset.to_table(
               ^^^^^^^^^^^^^^^^^^^^^^^
     File "pyarrow/_dataset.pyx", line 562, in pyarrow._dataset.Dataset.to_table
     File "pyarrow/_dataset.pyx", line 3804, in pyarrow._dataset.Scanner.to_table
     File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Length spanned by list offsets (5) larger than values array (length 4)
   ```
   
   This is with PyArrow 16.1.0 and Python 3.12 on Fedora 39 Linux.
   
   ### Component(s)
   
   Python

