adamreeve opened a new issue, #43146:
URL: https://github.com/apache/arrow/issues/43146
### Describe the bug, including details regarding any error messages, version, and platform.
PyArrow correctly raises an error when casting an array that contains
nulls to a non-nullable field:
```python
import pyarrow as pa

schema = pa.schema([
    pa.field("x", pa.int64(), nullable=True)])
table = pa.Table.from_pydict({
    "x": [1, None, 3],
}, schema=schema)
new_schema = pa.schema([
    pa.field("x", pa.int64(), nullable=False)])
new_table = table.cast(new_schema)
```
This raises:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 4455, in pyarrow.lib.Table.cast
ValueError: Casting field 'x' with null values to non-nullable
```
But this doesn't raise any error:
```python
schema = pa.schema([
    pa.field("x", pa.list_(pa.field("", pa.int64(), nullable=True)),
             nullable=False)])
table = pa.Table.from_pydict({
    "x": [[1, None, 3], [], [4, 5]]
}, schema=schema)
new_schema = pa.schema([
    pa.field("x", pa.list_(pa.field("", pa.int64(), nullable=False)),
             nullable=False)])
new_table = table.cast(new_schema)
print(new_table)
```
```
pyarrow.Table
x: list<: int64 not null> not null
child 0, : int64 not null
----
x: [[[1,null,3],[],[4,5]]]
```
I can also write this table to Parquet without any error, but then reading
it fails:
```python
import pyarrow.parquet as pq

pq.write_table(new_table, 'data.parquet')
read = pq.read_table('data.parquet')
```
This raises:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adam/dev/virtualenvs/ml/lib64/python3.12/site-packages/pyarrow/parquet/core.py", line 1811, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adam/dev/virtualenvs/ml/lib64/python3.12/site-packages/pyarrow/parquet/core.py", line 1454, in read
    table = self._dataset.to_table(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 562, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3804, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Length spanned by list offsets (5) larger than values array (length 4)
```
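The numbers in this error are consistent with the null element being left out of the written values while the list offsets still count it. The sketch below is only arithmetic on the example's data, under the assumption (not confirmed from the Parquet writer source) that declaring the child non-nullable causes the null to be skipped on write.

```python
# Rows from the example above; None marks the null in the first list.
rows = [[1, None, 3], [], [4, 5]]

# Arrow list offsets: offsets[i]..offsets[i + 1] delimit row i in the values array.
offsets = [0]
for row in rows:
    offsets.append(offsets[-1] + len(row))

# Assumption: only the non-null values end up in the file for a non-nullable child.
non_null_values = [v for row in rows for v in row if v is not None]

span = offsets[-1] - offsets[0]
print(offsets, span, len(non_null_values))  # [0, 3, 3, 5] 5 4
```

A span of 5 against 4 stored values matches "list offsets (5) larger than values array (length 4)".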
This is with PyArrow 16.1.0 and Python 3.12 on Fedora 39 Linux.
### Component(s)
Python