[I] Expose postgres binary to Arrow conversion functions to Python [arrow-adbc]

via GitHub Fri, 25 Jul 2025 03:43:19 -0700


jonnor opened a new issue, #3201:
URL: https://github.com/apache/arrow-adbc/issues/3201


   ### What feature or improvement would you like to see?
   
   Hi,
   thank you for the work on the ADBC library, and specifically the Python 
support for Postgres. I have tested it over the last few weeks, and for 
fetching large dataframes (time series in our case) it is much faster than 
psycopg with pandas: 5-10x the throughput, with lower CPU usage on both the 
client and the database side. So that is very promising.
   I am now trying to integrate it into an existing Python web application 
that uses psycopg2 as the database driver, SQLAlchemy for connection 
management, and gevent for concurrency. I need the efficient ADBC queries to 
work within that system somehow. Right now proper integration is not really 
possible, since the driver does its own IO with blocking reads. The parts 
using ADBC would have to duplicate connection management (annoying but 
manageable), and the blocking IO prevents concurrency with gevent (which 
severely reduces performance, voiding the main motivation for using the 
project).
   
   So I was wondering if it would be possible to expose functions/classes that 
take data in the PostgreSQL binary format and convert it into Arrow tables. 
That way, one could use the existing database driver for IO and avoid 
conflicts with connection management and concurrency. Most PostgreSQL drivers 
support fetching data in this format now: for example, there is 
`copy_expert()` in psycopg2, `cursor.copy()` in psycopg3, and 
`copy_from_query()` in asyncpg.
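
   To make the input side of this request concrete, here is a minimal sketch 
of reading a PostgreSQL binary COPY stream in pure Python. It is not part of 
ADBC: the names `iter_pgcopy_rows` and `build_example_stream` are hypothetical, 
and the sketch only splits the stream into raw field bytes. Decoding each field 
by column type and building Arrow arrays is the part the requested functions 
would provide.

```python
import io
import struct

# Fixed 11-byte signature that starts every PostgreSQL binary COPY stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def iter_pgcopy_rows(stream):
    """Yield each row as a list of raw field bytes (None for SQL NULL).

    Hypothetical helper for illustration; it does not interpret field types.
    """
    if stream.read(11) != PGCOPY_SIGNATURE:
        raise ValueError("not a PostgreSQL binary COPY stream")
    # Header: 32-bit flags field, then 32-bit header extension length.
    _flags, ext_len = struct.unpack("!iI", stream.read(8))
    stream.read(ext_len)  # skip any header extension data
    while True:
        # Each tuple starts with a 16-bit field count; -1 marks the trailer.
        (nfields,) = struct.unpack("!h", stream.read(2))
        if nfields == -1:
            return
        row = []
        for _ in range(nfields):
            # Each field is a 32-bit byte length (-1 for NULL) plus the data.
            (length,) = struct.unpack("!i", stream.read(4))
            row.append(None if length == -1 else stream.read(length))
        yield row


def build_example_stream():
    """Build a tiny in-memory stream: one row holding int4 42 and a NULL."""
    header = PGCOPY_SIGNATURE + struct.pack("!iI", 0, 0)
    row = (
        struct.pack("!h", 2)          # two fields
        + struct.pack("!i", 4)        # first field is 4 bytes long
        + struct.pack("!i", 42)       # int4 value in network byte order
        + struct.pack("!i", -1)       # second field is NULL
    )
    trailer = struct.pack("!h", -1)
    return io.BytesIO(header + row + trailer)


rows = list(iter_pgcopy_rows(build_example_stream()))
```

   With psycopg2, a real stream could be collected into an `io.BytesIO` 
buffer through `cursor.copy_expert()` with a 
`COPY (...) TO STDOUT WITH (FORMAT BINARY)` statement, so the IO stays under 
the control of the existing driver.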
   
   I believe this would greatly ease the integration of ADBC into existing 
codebases, which might in turn increase adoption. I am aware that this approach 
_might_ leave some performance on the table, and that is acceptable in my case: 
I am pretty sure it will still be much faster than approaches that use 
serialization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
