rishav394 opened a new issue, #4363:
URL: https://github.com/apache/arrow-adbc/issues/4363
## What happened?
`fetchallarrow()` segfaults (exit 139) when the driver produces an
`ArrowArrayStream` containing types the installed PyArrow doesn't recognize. No
exception, no error message - just a process kill.
This isn't specific to one type. Any time the Arrow spec adds a new type and
a driver (built on a newer arrow-go/arrow-rs) exports it, older PyArrow
consumers will crash at `_import_from_c`. The C Data Interface is meant for
cross-version interop, so an unrecognized type should be a recoverable error,
not a segfault.
Concrete trigger: any `driverbase-go` driver (Trino, BigQuery, Redshift,
MySQL, etc.) returning a DECIMAL column. `driverbase-go` uses
`NarrowestDecimalType()`, which picks Decimal64 (format `d:10,4,64`) for this
column. PyArrow < 15 doesn't recognize this format and crashes.
Expected: `NotImplementedError` with a message like "Unsupported format
string 'd:10,4,64'. Upgrade PyArrow to >= 15.0.0."
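To illustrate the expected behavior, here is a minimal sketch of a format-string check that raises instead of crashing. It is not the actual PyArrow or driver manager code path. The helper names (`parse_decimal_format`, `check_decimal_format`) and the set of supported bit widths are assumptions made for illustration; the only thing taken from the C Data Interface is the decimal format layout `d:precision,scale[,bitwidth]`, where the bit width defaults to 128 when omitted.

```python
from typing import Tuple

# Assumption for illustration: the bit widths this hypothetical consumer can import.
SUPPORTED_DECIMAL_BITWIDTHS = frozenset({128, 256})


def parse_decimal_format(fmt: str) -> Tuple[int, int, int]:
    """Parse 'd:precision,scale[,bitwidth]' into (precision, scale, bitwidth)."""
    if not fmt.startswith("d:"):
        raise ValueError(f"Not a decimal format string: {fmt!r}")
    parts = fmt[2:].split(",")
    if len(parts) not in (2, 3):
        raise ValueError(f"Malformed decimal format string: {fmt!r}")
    precision, scale = int(parts[0]), int(parts[1])
    # The bit width is optional and defaults to 128 when it is omitted.
    bitwidth = int(parts[2]) if len(parts) == 3 else 128
    return precision, scale, bitwidth


def check_decimal_format(fmt: str) -> None:
    """Hypothetical guard: raise a recoverable error for unknown bit widths."""
    _, _, bitwidth = parse_decimal_format(fmt)
    if bitwidth not in SUPPORTED_DECIMAL_BITWIDTHS:
        raise NotImplementedError(
            f"Unsupported format string {fmt!r}. Upgrade PyArrow to >= 15.0.0."
        )
```

With a guard like this, `check_decimal_format("d:10,4,64")` would surface a `NotImplementedError` that callers can catch, while `check_decimal_format("d:19,4")` passes silently.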
## Stack Trace
```
Fatal Python error: Segmentation fault
Current thread 0x00000001f064df00 (most recent call first):
  File ".../adbc_driver_manager/_reader.pyx", line 65 in _import_from_c
  File ".../adbc_driver_manager/dbapi.py", line 1346 in fetch_arrow_table
  File ".../adbc_driver_manager/dbapi.py", line 1179 in fetch_arrow_table
  File ".../adbc_driver_manager/dbapi.py", line 1162 in fetchallarrow
```
Crash at `_reader.pyx:65`:
```python
reader = pyarrow.RecordBatchReader._import_from_c(int(address))
```
## How can we reproduce the bug?
No tables or data needed. Any Trino instance (or any driverbase-go driver):
```python
import faulthandler

import adbc_driver_manager.dbapi as adbc_manager

# Dump a Python traceback if the process receives a fatal signal such as SIGSEGV.
faulthandler.enable()

conn = adbc_manager.connect(
    driver="trino",
    db_kwargs={"uri": "https://user:pass@trino-host:443/catalog/schema"},
)
cur = conn.cursor()
# A literal DECIMAL(10,4) is enough; no tables or data are required.
cur.execute("SELECT CAST(10.1 AS DECIMAL(10,4)) AS val")
cur.fetchallarrow()  # SIGSEGV
```
This is not Decimal-specific: any unknown format string in the ArrowSchema
will trigger the same crash. Decimal32/64 is just the most common real-world
trigger today.
Workarounds:
- Upgrade PyArrow to >= 15.0.0
- `CAST(col AS DECIMAL(19, scale))` - forces Decimal128 (universally
supported)
- `CAST(col AS DOUBLE)`
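As a rough sketch of how the Decimal128 workaround could be applied conditionally, the helper below widens a column only when the installed PyArrow is older than the version reported to work. The function names are hypothetical, and the threshold of 15 comes from this report rather than from any documented compatibility matrix.

```python
import re

# Assumption: threshold taken from this report (PyArrow 15.0.0+ works).
MIN_SAFE_PYARROW_MAJOR = 15


def pyarrow_needs_decimal_workaround(version: str) -> bool:
    """Return True when the given PyArrow version string is below the threshold."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)) < MIN_SAFE_PYARROW_MAJOR


def decimal_select_expr(column: str, scale: int, version: str) -> str:
    """Hypothetical helper: wrap a column in a Decimal128-forcing CAST if needed."""
    if pyarrow_needs_decimal_workaround(version):
        # Precision 19 no longer fits in Decimal64, so the driver picks Decimal128.
        return f"CAST({column} AS DECIMAL(19, {scale}))"
    return column
```

In practice the `version` argument would be `pyarrow.__version__`, and the returned expression would be spliced into the `SELECT` list before calling `cur.execute(...)`.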
## Environment/Setup
- adbc-driver-manager: 1.8.0 (pip)
- PyArrow: 11.0.0 through 14.0.2 (crashes), 15.0.0+ (works)
- Driver: adbc-driver-trino 0.3.1 via `dbc install trino`
- driverbase-go: v0.0.0-20260423045143 (uses arrow-go v18.6.0)
- Platform: macOS arm64, Python 3.9
- Package manager: pip