[I] adbc_ingest no possibility of overwrite the exact same elements [arrow-adbc]

via GitHub Sat, 05 Jul 2025 03:30:00 -0700


nikfio opened a new issue, #3102:
URL: https://github.com/apache/arrow-adbc/issues/3102


   ### What happened?
   
   Hello everyone,
   
   I am happily experiencing adbc engine with polars using Python language, 
communicating to and from a DuckDB instance.
   I discovered that if I write two times the exact same dataframe, for any 
possible reason.
   There is no mode available that would let me write to the same table and 
overwrite the elements that are already in.
   Because with "append" or "create_append" modes, the same elements are 
written appending exactly new elments, so I end up having doubles rows with 
exact same elements.
   Meanwhile, "replace" deletes completely the existing table and create a new 
one to insert new fresh data: but, in this case, if the new dataframe has 
elements not present in the previous table. The deletion takes away any old 
element no more available in the new dataframe, and that i think it is a waste.
   
   Long story short, I think in my humble opinion that it should be available, 
when I write a new dataframe to an axisting table of a SQL compliant DB, there 
has to be the the possibility of overwriting automatically the rows with same 
elements if already present in the existing table.
   
   Double elements example generated with append mode
   ```
       import polars as pl
       import adbc_driver_sqlite.dbapi as litedbapi
       
       # Crate example dataframe
       df = pl.DataFrame({
           "id": [1, 2, 3],
           "name": ["Alice", "Bob", "Charlie"],
           "age": [25, 30, 35]
       })
   
       con = litedbapi.connect(uri=r"db2.duckdb")
       
       # First write
       df.write_database(
           table_name="people",
           connection=con,
           if_table_exists ="append",
           engine="adbc"
       )
       
       con.commit()
       
       # Second write
       df.write_database(
           table_name="people",
           connection=con,
           if_table_exists ="append",
           engine="adbc"
       )
       
       con.commit()
   
       # Verify
       query="SELECT * FROM people"
       result = pl.read_database(query=query,
                                 connection=con)
       
       print(result)
   ```
   
   The result is
   ```
   shape: (6, 3)
   ┌─────┬─────────┬─────┐
   │ id  ┆ name    ┆ age │
   │ --- ┆ ---     ┆ --- │
   │ i64 ┆ str     ┆ i64 │
   ╞═════╪═════════╪═════╡
   │ 1   ┆ Alice   ┆ 25  │
   │ 2   ┆ Bob     ┆ 30  │
   │ 3   ┆ Charlie ┆ 35  │
   │ 1   ┆ Alice   ┆ 25  │
   │ 2   ┆ Bob     ┆ 30  │
   │ 3   ┆ Charlie ┆ 35  │
   └─────┴─────────┴─────┘
   ```
   
   
   Old elements gone away example with replace mode
   ```
   df = pl.DataFrame({
           "id": [1, 2, 3],
           "name": ["Alice", "Bob", "Charlie"],
           "age": [25, 30, 35]
       })
   
   con = litedbapi.connect(uri=r"db.duckdb")
       
       # First write
       df.write_database(
           table_name="people",
           connection=con,
           if_table_exists ="replace",
           engine="adbc"
       )
       
       con.commit()
       
       # Verify
       query="SELECT * FROM people"
       result = pl.read_database(query=query,
                                 connection=con)
       
       print(result)
   ```
   An we have here:
   ```
   shape: (3, 3)
   ┌─────┬─────────┬─────┐
   │ id  ┆ name    ┆ age │
   │ --- ┆ ---     ┆ --- │
   │ i64 ┆ str     ┆ i64 │
   ╞═════╪═════════╪═════╡
   │ 1   ┆ Alice   ┆ 25  │
   │ 2   ┆ Bob     ┆ 30  │
   │ 3   ┆ Charlie ┆ 35  │
   └─────┴─────────┴─────┘
   ```
   
   Then let's proceed with the second write with replace mode    
   ```
       # Second write
       
       df = pl.DataFrame({
           "id": [2, 3, 4],
           "name": ["Bob", "Charlie", "Nathan"],
           "age": [30, 35, 40]
       })
   
       df.write_database(
           table_name="people",
           connection=con,
           if_table_exists ="replace",
           engine="adbc"
       )
   
    # Verify
       query="SELECT * FROM people"
       result = pl.read_database(query=query,
                                 connection=con)
       
       print(result)
   ```
   
   And we have: 
   ```
   shape: (3, 3)
   ┌─────┬─────────┬─────┐
   │ id  ┆ name    ┆ age │
   │ --- ┆ ---     ┆ --- │
   │ i64 ┆ str     ┆ i64 │
   ╞═════╪═════════╪═════╡
   │ 2   ┆ Bob     ┆ 30  │
   │ 3   ┆ Charlie ┆ 35  │
   │ 4   ┆ Nathan  ┆ 40  │
   └─────┴─────────┴─────┘
   ```
   
   And so you can see the first of the first write it has gone away.
   
   Or, does anyone have any suggestion on how I could achieve that with the 
present fucntions/implementation?
   
   
   Many thanks,
   Nick
   
   ### Stack Trace
   
   _No response_
   
   ### How can we reproduce the bug?
   
   _No response_
   
   ### Environment/Setup
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[I] adbc_ingest no possibility of overwrite the exact same elements [arrow-adbc]

Reply via email to