avm19 opened a new issue, #2410:
URL: https://github.com/apache/arrow-adbc/issues/2410

   ### What happened?
   
   Related: PR #868 
   Workaround: Do not use timezone-aware types.
   
   I discovered this by seeing unexpected warnings in my Postgres logs:
   ```
   WARNING: there is no transaction in progress
   ```
   
   Note that the warning is in the database logs, not in any output of the 
client.
   
   It took me a while to reproduce, because I was looking in all the wrong 
places, but eventually I realised that a timezone is to blame (I did not expect 
that `adbc_driver_postgresql` treats tz-aware and tz-naive types so 
differently).
   
   <details>
   <summary>Python code to reproduce</summary>
   
   ```python
   import datetime
   import adbc_driver_postgresql.dbapi
   
   dt_naive = datetime.datetime.now()  # TZ-naive works OK
   dt_aware = datetime.datetime.now(tz=datetime.UTC)  # TZ-aware causes trouble
   
   q = "INSERT INTO test123 VALUES ($1, $2);"
   with adbc_driver_postgresql.dbapi.connect(uri) as conn:
       with conn.cursor() as cursor:
           cursor.execute('DROP TABLE IF EXISTS test123;')
           cursor.execute('CREATE TABLE test123 AS (SELECT 123 AS id, NOW() AS 
ts);')
       conn.commit()  # OK
       with conn.cursor() as cursor:
           cursor.executemany(q, [(111, dt_naive)])
       conn.commit()  # OK
       with conn.cursor() as cursor:
           cursor.executemany(q, [(111, dt_aware)])
       conn.commit()  # WARNING: there is no transaction in progress
   ```
   </details>
   
   Here is the relevant part of the database log with my comments:
   <details>
   <summary>Postgres Log</summary>
   
   ```log
   2025-01-05 21:36:54 
UTC:172.12.123.123(40398):[unknown]@[unknown]:[5990]:LOG: connection received: 
host=172.12.123.123 port=40398
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
connection authenticated: identity="postgres" method=md5 
(/rdsdbdata/config/pg_hba.conf:13)
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
connection authorized: user=postgres database=postgres SSL enabled 
(protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256)
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: BEGIN TRANSACTION
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.117 ms
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.032 ms parse <unnamed>: DROP TABLE IF EXISTS test123
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.008 ms bind <unnamed>: DROP TABLE IF EXISTS test123
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
execute <unnamed>: DROP TABLE IF EXISTS test123
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 1.855 ms
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.119 ms parse <unnamed>: CREATE TABLE test123 AS (SELECT 123 AS id, 
NOW() AS ts)
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.009 ms bind <unnamed>: CREATE TABLE test123 AS (SELECT 123 AS id, 
NOW() AS ts)
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
execute <unnamed>: CREATE TABLE test123 AS (SELECT 123 AS id, NOW() AS ts)
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 2.439 ms
   
   
   ######    conn.commit()  starts here:
   
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: COMMIT; BEGIN TRANSACTION
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 1.288 ms
   
   
   ######         cursor.executemany(q, [(111, dt_naive)]) starts here:
   
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.113 ms parse <unnamed>: INSERT INTO test123 VALUES ($1, $2);
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.046 ms bind <unnamed>: INSERT INTO test123 VALUES ($1, $2);
   2025-01-05 21:36:54 
UTC:172.12.123.123(40398):postgres@postgres:[5990]:DETAIL: parameters: $1 = 
'111', $2 = '2025-01-05 21:36:53.957838'
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
execute <unnamed>: INSERT INTO test123 VALUES ($1, $2);
   2025-01-05 21:36:54 
UTC:172.12.123.123(40398):postgres@postgres:[5990]:DETAIL: parameters: $1 = 
'111', $2 = '2025-01-05 21:36:53.957838'
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.076 ms
   
   
   ######    conn.commit()  starts here:
   
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: COMMIT; BEGIN TRANSACTION
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 5.405 ms
   
   ######         cursor.executemany(q, [(111, dt_aware)]) starts here:
   
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: SELECT current_setting('TIMEZONE')
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.181 ms
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: SET TIME ZONE 'UTC'
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.059 ms
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.040 ms parse <unnamed>: INSERT INTO test123 VALUES ($1, $2);
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.033 ms bind <unnamed>: INSERT INTO test123 VALUES ($1, $2);
   2025-01-05 21:36:54 
UTC:172.12.123.123(40398):postgres@postgres:[5990]:DETAIL: parameters: $1 = 
'111', $2 = '2025-01-05 21:36:53.957806'
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
execute <unnamed>: INSERT INTO test123 VALUES ($1, $2);
   2025-01-05 21:36:54 
UTC:172.12.123.123(40398):postgres@postgres:[5990]:DETAIL: parameters: $1 = 
'111', $2 = '2025-01-05 21:36:53.957806'
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.041 ms
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: SET TIME ZONE 'UTC'
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.036 ms
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: COMMIT
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.480 ms
   
   ######    conn.commit()  starts here:
   
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
statement: COMMIT; BEGIN TRANSACTION
   2025-01-05 21:36:54 
UTC:172.12.123.123(40398):postgres@postgres:[5990]:WARNING: there is no 
transaction in progress
   2025-01-05 21:36:54 UTC:172.12.123.123(40398):postgres@postgres:[5990]:LOG: 
duration: 0.052 ms
   ```
   </details>
   
   From the log, and also from the C code (below), we see that tz-aware 
timestamps are dealt with by `SET TIME ZONE` queries under the hood. After the 
prepared statement is run, the timezone setting is restored: 
https://github.com/apache/arrow-adbc/blob/76a4d78c9556c84bf67714530397b7061d1e954a/c/driver/postgresql/bind_stream.h#L252-L262
   
   I don't write in C and I am not familiar with this code base, so forgive me, 
but I do see some **issues with this code**:
   
   1. Is it a typo in the second repetition `UNWRAP_STATUS(reset.Execute()); ` 
on line 258? Shouldn't it be `UNWRAP_STATUS(commit.Execute()); ` instead? It 
may seem confusing that the typo should prevent `COMMIT` from execution, while 
the logs show that it _was_ executed. This is because I am using an older 
version, before this refactor in PR #2187 , which introduced the typo. The typo 
should be corrected anyway.
   
   2. Why commit? My connection has `autocommit=False` (default), and I would 
prefer to stay in control of my commits and have them only when I call 
`conn.commit()`. Even if there are good reasons for a forced commit, the next 
problem remains.
   
   3. **Most importantly**, `conn.commit()` does `COMMIT; BEGIN TRANSACTION`, 
so that any following statement is a part of transaction 
([source](https://github.com/apache/arrow-adbc/blob/76a4d78c9556c84bf67714530397b7061d1e954a/c/driver/postgresql/connection.cc#L468-L483)).
 This is deliberate and fine. What is not fine, is the bare `COMMIT` that 
breaks this convention, whenever input contains a tz-aware value. As a 
consequence, rollbacks become impossible, unknowingly to the user.
   
   I think database transactions should be taken very seriously, and I wonder 
if when fixing this bug, one could also audit the code to ensure that there no 
lonely `COMMIT`s elsewhere? An extra `BEGIN TRANSACTION` might be not as 
dangerous, but it still generates a warning in the logs.
   
   ### Stack Trace
   
   _No response_
   
   ### How can we reproduce the bug?
   
   _No response_
   
   ### Environment/Setup
   
   ```Python 3.11.5
   adbc-driver-manager       1.1.0
   adbc-driver-postgresql    1.1.0
   adbc-driver-sqlite        1.1.0
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to