danielballan opened a new issue, #2066: URL: https://github.com/apache/arrow-adbc/issues/2066
### What would you like help with?

Hello @paleolimbot, we briefly spoke at SciPy a month ago. @cpcloud and @gforsyth sent me your way.

My colleague @skarakuzu and I have the following application:

```
Science Experiment ---> ~10Hz rows of data over HTTP ---> Database
```

In our use case, we are interested in reading partial datasets while an experiment is still ongoing, but we are not especially sensitive to real-time access; a little lag is acceptable. We would like to use Postgres for facility-scale "production" and SQLite for small deployments and dev/test. We think it makes sense to use ADBC to get the data into Postgres and SQLite.

Most of our data is simple tabular data with basic types. This is a simple sketch that works:

```python
import pandas
import pyarrow
import adbc_driver_sqlite.dbapi

conn = adbc_driver_sqlite.dbapi.connect("test.sqlite")

with conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS example")

dfs = [
    pandas.DataFrame({"A": [1, 2, 3], "B": [10., 20., 30.]}),
    pandas.DataFrame({"A": [4, 5, 6], "B": [40., 50., 60.]}),
]

# Create a table and append to it.
with conn.cursor() as cur:
    for df in dfs:
        table = pyarrow.Table.from_pandas(df)
        cur.adbc_ingest("example", table, mode="create_append")
    cur.execute("SELECT * FROM example")
    print(cur.fetchall())

conn.close()
```

In rare cases, we have detectors that produce a small "waveform" of data, a small variable-length list, alongside some readings, like this:

```python
dfs = [
    pandas.DataFrame({"A": [[1, 2, 3], [1, 2], [1, 2, 3, 4]], "B": [10., 20., 30.]}),
    pandas.DataFrame({"A": [[4, 5, 6], [4, 5, 6, 7], [4]], "B": [40., 50., 60.]}),
]
```

Expand for a complete runnable example with this data:

<details>

```python
import pandas
import pyarrow
import adbc_driver_sqlite.dbapi

conn = adbc_driver_sqlite.dbapi.connect("test.sqlite")

with conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS example")

dfs = [
    pandas.DataFrame({"A": [[1, 2, 3], [1, 2], [1, 2, 3, 4]], "B": [10., 20., 30.]}),
    pandas.DataFrame({"A": [[4, 5, 6], [4, 5, 6, 7], [4]], "B": [40., 50., 60.]}),
]

# Create a table and append to it.
with conn.cursor() as cur:
    for df in dfs:
        table = pyarrow.Table.from_pandas(df)
        cur.adbc_ingest("example", table, mode="create_append")
    cur.execute("SELECT * FROM example")
    print(cur.fetchall())

conn.close()
```

</details>

As you would expect, this fails with a clear error message:

```
adbc_driver_manager.NotSupportedError: NOT_IMPLEMENTED: Column 0 has unsupported type list
```

It looks like support for lists was recently added for Postgres in https://github.com/apache/arrow-adbc/pull/1962 but not yet released. We have tried without success to cobble together a C++ example inserting lists into PostgreSQL on `main`. I wonder if you could sketch a short runnable example to help us get started. (Let us know if you want to see what we've tried.) We would be happy to contribute to the C++ documentation for ADBC once we get something working.

Once #1962 is released, should we expect lists to work from Python as well, since the Python driver wraps the C/C++, or are more changes needed on the Python side to make that work?

Finally, we're interested in adding support for lists in SQLite. As SQLite is loose about types, perhaps one reasonable way to do this would be with a JSON column. I wonder what you think of that approach.
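To make the JSON idea concrete, here is a minimal sketch of the application-side workaround we have in mind in the meantime. It assumes nothing beyond the public `adbc_ingest` API: the variable-length list column is encoded as JSON text before ingesting and decoded again after reading. (Purely illustrative; a driver-level feature would presumably handle this more transparently.)

```python
import json

import pandas
import pyarrow
import adbc_driver_sqlite.dbapi

conn = adbc_driver_sqlite.dbapi.connect("test.sqlite")

with conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS example")

dfs = [
    pandas.DataFrame({"A": [[1, 2, 3], [1, 2], [1, 2, 3, 4]], "B": [10., 20., 30.]}),
    pandas.DataFrame({"A": [[4, 5, 6], [4, 5, 6, 7], [4]], "B": [40., 50., 60.]}),
]

with conn.cursor() as cur:
    for df in dfs:
        # Encode the list column as JSON text so the SQLite driver
        # only sees a plain string column.
        encoded = df.assign(A=df["A"].map(json.dumps))
        table = pyarrow.Table.from_pandas(encoded)
        cur.adbc_ingest("example", table, mode="create_append")
    cur.execute("SELECT * FROM example")
    # Decode the JSON text back into Python lists on the way out.
    print([(json.loads(a), b) for a, b in cur.fetchall()])

conn.close()
```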