mwinters0 opened a new issue, #3611:
URL: https://github.com/apache/arrow-adbc/issues/3611

   ### What happened?
   
   When selecting a specific 30-second time range from my database, I get the pyarrow error below from `cursor.fetch_arrow_table()`. However, if I bisect the range and query the first 15 seconds and the last 15 seconds separately, each half works. Additionally, if I export the full 30-second range to CSV with `psql`, I can import it with `pyarrow.csv.read_csv()` without issue.
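
   For context, here is a minimal sketch of the failing call pattern. The connection URI, table, and column names are placeholders, not my real schema:

   ```python
   import adbc_driver_postgresql.dbapi

   # Placeholder URI and query; the real query LEFT JOINs several tables
   # over a 30-second time window.
   uri = "postgresql://user:pass@localhost:5432/mydb"
   with adbc_driver_postgresql.dbapi.connect(uri) as conn:
       with conn.cursor() as cur:
           cur.execute(
               "SELECT * FROM readings "
               "WHERE ts >= '2025-06-01 00:00:00' AND ts < '2025-06-01 00:00:30'"
           )
           table = cur.fetch_arrow_table()  # raises pyarrow.lib.ArrowInvalid here
   ```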
   
   I've examined the time range in DuckDB and can't find anything unusual. The rows contain many near-duplicates because they are the result of several LEFT JOINs; that is intended.
   
   Zstd was able to compress the 3.0 GB `psql`-exported time ranges down to 1.3 MB (!!), so I've attached them here. I had to add a gzip layer because GitHub doesn't accept `.zst` attachments.
   
   ```
   gunzip bad.csv.zst.gz && zstd -d bad.csv.zst
   ```
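
   Once decompressed, the same data loads cleanly with pyarrow's CSV reader (assuming the decompressed file is named `bad.csv`):

   ```python
   import pyarrow.csv

   # The psql-exported CSV of the same 30-second range imports without error.
   table = pyarrow.csv.read_csv("bad.csv")
   print(table.num_rows, table.num_columns)
   ```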
   
   
[bad.csv.zst.gz](https://github.com/user-attachments/files/23078531/bad.csv.zst.gz)
   
   
[bad.text.zst.gz](https://github.com/user-attachments/files/23080501/bad.text.zst.gz)
   
   
[bad.bin.zst.gz](https://github.com/user-attachments/files/23080527/bad.bin.zst.gz)
   
   ### Stack Trace
   
   ```
     [...]
     File "/mnt/ssd/fedora/nomaste/nomaste/workflow/db.py", line 272, in _generate_normalized_time_chunks
       raw_chunk_table = fetch_raw_time_chunk(
           params.conn, chunk_start_date, chunk_end_date, params.topic
       )
     File "/mnt/ssd/fedora/nomaste/nomaste/workflow/db.py", line 248, in fetch_raw_time_chunk
       t = cur.fetch_arrow_table()
     File "/mnt/ssd/fedora/nomaste/.venv/lib/python3.13/site-packages/adbc_driver_manager/dbapi.py", line 1179, in fetch_arrow_table
       return self._results.fetch_arrow_table()
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
     File "/mnt/ssd/fedora/nomaste/.venv/lib/python3.13/site-packages/adbc_driver_manager/dbapi.py", line 1346, in fetch_arrow_table
       return _blocking_call(self.reader.read_all, (), {}, self._stmt.cancel)
     File "adbc_driver_manager/_lib.pyx", line 1749, in adbc_driver_manager._lib._blocking_call_impl
     File "adbc_driver_manager/_lib.pyx", line 1742, in adbc_driver_manager._lib._blocking_call_impl
     File "adbc_driver_manager/_reader.pyx", line 91, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
     File "adbc_driver_manager/_reader.pyx", line 43, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
     File "adbc_driver_manager/_reader.pyx", line 89, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
     File "pyarrow/ipc.pxi", line 794, in pyarrow.lib.RecordBatchReader.read_all
     File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Expected last offset >= 0 but found -1976420846
   ```
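
   In case it helps triage: a hypothetical batch-by-batch read (reusing `cur` from the sketch above, untested against this dataset) that could localize the failing batch; `fetch_record_batch()` returns a `pyarrow.RecordBatchReader`:

   ```python
   # Hypothetical debugging aid: iterate the reader instead of read_all()
   # to count how many rows arrive before the invalid offset appears.
   reader = cur.fetch_record_batch()
   rows = 0
   try:
       for batch in reader:
           rows += batch.num_rows
   except Exception as exc:
       print(f"failed after {rows} rows: {exc}")
   ```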
   
   ### How can we reproduce the bug?
   
   _No response_
   
   ### Environment/Setup
   
   - Python 3.13.7 on x86_64. I've tried both the `uv`-managed version and the system versions from Fedora and Arch.
   - Postgres is running the Docker image `timescale/timescaledb-ha:pg13.22-ts2.15.3-oss`, built from [this Dockerfile](https://github.com/timescale/timescaledb-docker-ha/blob/master/Dockerfile).
   
   ```console
   % uv tree
   Resolved 14 packages in 0.50ms
   bus2parq v0.1.0
   ├── adbc-driver-postgresql v1.8.0
   │   ├── adbc-driver-manager v1.8.0
   │   │   └── typing-extensions v4.15.0
   │   └── importlib-resources v6.5.2
   ├── backports-zstd v0.5.0
   ├── click v8.3.0
   ├── pyarrow v21.0.0
   └── pytest v8.4.2 (group: dev)
       ├── iniconfig v2.1.0
       ├── packaging v25.0
       ├── pluggy v1.6.0
       └── pygments v2.19.2
   ```
   

