WeCodingNow opened a new issue, #755:
URL: https://github.com/apache/arrow-go/issues/755

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   # Problem
   
   **Versions:**
   arrow-go: `v18.5.2`
   python package `adbc-driver-flightsql`: `1.11.0`
   java dependency `flight-sql-jdbc-driver`: `18.3.0`
   
   I have an Arrow Flight SQL server that implements a `do_handshake` method in 
the following way:
   - do basic auth (via OAuth2 ROPC flow in an external OAuth2 provider), 
return some token in the response and attach an `Authorization: Bearer <token>` 
header to the response
   - also _create a session_, generate a session_id and attach it to the 
response as a `session_id` cookie, via a `set-cookie` header.
   
   Context to why I am returning two different identification things: It is not 
possible for me to use the session_id as the token or include it in the token, 
because the token is generated by an IAM and it does not know anything about my 
server's sessions; and it is used only for authenticating users. The session_id 
is used to group together resources allocated for actually executing queries of 
the user in a scope of one session.
   
   What I expect is that after do_handshake returns a token with headers like 
`{ "authorization": "Bearer <token>", "set-cookie": "session_id=<session_id>" 
}`, all following requests from the authenticated client to the server would 
include both the `authorization` header and the `cookie` header with the 
`session_id` cookie.
   
   The problem is that it seems that go's adbc client implementation (used by 
the python's flightsql driver) does not respect a set-cookie header returned by 
the server in response to a do_handshake call. Requests that follow the 
do_handshake authentication from the client to the server **do not include the 
cookies** set by response to a do_handshake call, but they do include the 
correct `authorization` header.
   
   I am testing this functionality by using a python script that uses 
`adbc_driver_flightsql` package (ver. 1.11.0) to establish a connection and 
execute some queries on the server. The script is something like this:
   <details>
   
   ```python
   #!/bin/env python3
   
   from adbc_driver_flightsql import DatabaseOptions
   import adbc_driver_flightsql.dbapi
   import readline
   import argparse
   import os
   import atexit
   import pandas as pd
   
   parser = argparse.ArgumentParser(
       prog="flight_client", description="Simple flight SQL client"
   )
   
   parser.add_argument("uri")
   parser.add_argument("-u", "--username")
   parser.add_argument("-p", "--password")
   parser.add_argument("-t", "--token")
   parser.add_argument("-tr", "--token-raw")
   
   args = parser.parse_args()
   
   db_kwargs = {
       # enable cookies for session management
       DatabaseOptions.WITH_COOKIE_MIDDLEWARE.value: "true",  # <----- use the 
cookies middleware
   }
   
   if args.username is not None and args.password is not None:
       db_kwargs["username"] = args.username
       db_kwargs["password"] = args.password
   
   if args.token is not None:
       db_kwargs["adbc.flight.sql.authorization_header"] = f"Bearer 
{args.token}"
   
   if args.token_raw is not None:
       db_kwargs["adbc.flight.sql.authorization_header"] = args.token_raw
   
   conn = adbc_driver_flightsql.dbapi.connect(
       f"grpc://{args.uri}",
       db_kwargs=db_kwargs,
   )
   
   # do stuff
   ```
   
   </details>
   
   The `set-cookie`  header is handled correctly if it is returned by other 
methods, e.g. `GetFlightInfoSqlInfo`.
   
   The JDBC arrow flight sql driver handles the `set-cookie` header correctly 
in all cases, including when it is returned by the call to a `do_handshake` 
endpoint.
   
   # Analysis (WARNING: LLM USAGE)
   
   I have briefly analyzed the code of the `arrow-adbc` golang package with an 
LLM (GLM 5) and it pointed out a possible race condition when handling 
streaming responses, such as one used in `do_handshake`.
   
   Full analysis:
   <details>
   
   # Cookie Handling in Arrow Flight Client: Handshake vs GetFlightInfo
   
   ## Root Cause
   
   The `set-cookie` header is handled correctly for `GetFlightInfo` but not for 
`Handshake` due to a **timing mismatch in the streaming middleware's 
`HeadersReceived` callback**:
   
   - **Unary RPCs** (`GetFlightInfo`): Headers are captured synchronously via 
`grpc.Header()` option during the call
   - **Streaming RPCs** (`Handshake`): `HeadersReceived` is only called in 
`finishFn`, which executes **after** the stream completes (when `Recv()` 
returns `io.EOF`)
   
   The authentication code reads headers **before** calling `Recv()`, but the 
middleware only calls `HeadersReceived` **after** `Recv()` completes - creating 
a race where cookies aren't in the jar when needed.
   
   ---
   
   ## Sequence Diagrams
   
   ### GetFlightInfo (Unary RPC) - ✅ Works Correctly
   
   ```mermaid
   sequenceDiagram
       participant App as Application
       participant CM as Cookie Middleware
       participant Auth as Auth Interceptor
       participant gRPC as gRPC Channel
   
       App->>CM: GetFlightInfo(ctx, req)
       
       Note over CM: StartCall(ctx) - Add cookies to outgoing context
       CM->>Auth: Invoke with modified ctx
       Note over Auth: Add auth token to context
       Auth->>gRPC: Invoke(ctx, method, req, reply)
       
       gRPC-->>Auth: Response + Headers
       Auth-->>CM: Response + Headers
       
       Note over CM: grpc.Header(&hdrmd) captured synchronously
       CM->>CM: HeadersReceived(ctx, md)
       CM->>CM: Store set-cookie in jar
       
       CM-->>App: Return response
   ```
   
   ### Handshake (Streaming RPC) - ❌ Race Condition
   
   ```mermaid
   sequenceDiagram
       participant App as Application
       participant CM as Cookie Middleware
       participant Auth as Auth Interceptor
       participant gRPC as gRPC Channel
       participant Stream as clientStream Wrapper
   
       App->>CM: Handshake(ctx)
       Note over CM: StartCall(ctx) - Add cookies to outgoing context
       CM->>Auth: Stream(ctx, desc, method)
       Note over Auth: Skips auth for /Handshake method
       Auth->>gRPC: Streamer(ctx, desc, method)
       
       gRPC-->>Auth: ClientStream
       Auth-->>CM: ClientStream
       Note over CM: Wrap stream in clientStream with deferred finishFn
       CM-->>App: Return wrapped stream
       
       Note over App: Authentication flow begins
       App->>Stream: stream.Header()
       Stream->>gRPC: cs.Header()
       gRPC-->>Stream: headers (contains set-cookie!)
       Stream-->>App: headers
       
       rect rgb(255, 200, 200)
           Note over App,CM: PROBLEM: HeadersReceived NOT called yet!
           Note over App: App processes headers directly
           Note over App: Cookie jar NOT updated
       end
       
       App->>Stream: stream.Recv()
       Stream->>gRPC: RecvMsg()
       gRPC-->>Stream: io.EOF
       Stream->>Stream: finishFn(nil)
       
       Note over Stream: NOW HeadersReceived is called
       Stream->>CM: HeadersReceived(ctx, md)
       Note over CM: Too late! App already processed response
   ```
   
   ---
   
   ## Key Code Locations
   
   | Component | File | Line |
   |-----------|------|------|
   | Unary interceptor header capture | 
[`client.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client.go:100)
 | 100-107 |
   | Streaming interceptor deferred header processing | 
[`client.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client.go:154)
 | 154-157 |
   | `finishFn` called on `Recv()` EOF | 
[`client.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client.go:212)
 | 212 |
   | Auth reads headers before `Recv()` | 
[`client_auth.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client_auth.go:310)
 | 310+ |
   | Auth interceptor bypasses Handshake | 
[`client_auth.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client_auth.go:80)
 | 80 |
   
   ---
   
   ## Summary Table
   
   | Aspect | Unary RPC (GetFlightInfo) | Streaming RPC (Handshake) |
   |--------|---------------------------|---------------------------|
   | gRPC method | `Invoke()` | `NewStream()` |
   | Header capture | Via `grpc.Header()` option | Via `stream.Header()` call |
   | When captured | During invoke, before return | In `finishFn`, after stream 
ends |
   | `HeadersReceived` timing | Synchronous with response | Deferred until 
stream completion |
   | Cookie jar updated | Before caller sees response | After caller has 
processed response |
   
   ---
   
   ## Potential Solutions
   
   1. **Modify `clientStream.Header()`** to trigger `HeadersReceived` 
immediately when headers are first retrieved
   2. **Eager header capture** in the streaming interceptor after stream 
creation
   3. **Application-level workaround** in authentication code to explicitly 
process cookies during Handshake
   
   </details>
   
   ### Component(s)
   
   Other


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to