adamreeve opened a new issue, #43146: URL: https://github.com/apache/arrow/issues/43146
### Describe the bug, including details regarding any error messages, version, and platform.

PyArrow correctly raises an error if trying to cast an array containing nulls to a non-nullable field:

```python
import pyarrow as pa

schema = pa.schema([
    pa.field("x", pa.int64(), nullable=True)])
table = pa.Table.from_pydict({
    "x": [1, None, 3],
}, schema=schema)

new_schema = pa.schema([
    pa.field("x", pa.int64(), nullable=False)])
new_table = table.cast(new_schema)
```

This raises:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 4455, in pyarrow.lib.Table.cast
ValueError: Casting field 'x' with null values to non-nullable
```

But this doesn't raise any error:

```python
schema = pa.schema([
    pa.field("x", pa.list_(pa.field("", pa.int64(), nullable=True)), nullable=False)])
table = pa.Table.from_pydict({
    "x": [[1, None, 3], [], [4, 5]]
}, schema=schema)

new_schema = pa.schema([
    pa.field("x", pa.list_(pa.field("", pa.int64(), nullable=False)), nullable=False)])
new_table = table.cast(new_schema)
print(new_table)
```

```
pyarrow.Table
x: list<: int64 not null> not null
  child 0, : int64 not null
----
x: [[[1,null,3],[],[4,5]]]
```

I can also write this table to Parquet without any error, but then reading it fails:

```python
import pyarrow.parquet as pq

pq.write_table(new_table, 'data.parquet')
read = pq.read_table('data.parquet')
```

This raises:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adam/dev/virtualenvs/ml/lib64/python3.12/site-packages/pyarrow/parquet/core.py", line 1811, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adam/dev/virtualenvs/ml/lib64/python3.12/site-packages/pyarrow/parquet/core.py", line 1454, in read
    table = self._dataset.to_table(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 562, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3804, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Length spanned by list offsets (5) larger than values array (length 4)
```

This is with PyArrow 16.1.0 and Python 3.12 on Fedora 39 Linux.

### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org