WeCodingNow opened a new issue, #755:
URL: https://github.com/apache/arrow-go/issues/755
### Describe the bug, including details regarding any error messages,
version, and platform.
# Problem
**Versions:**
arrow-go: `v18.5.2`
python package `adbc-driver-flightsql`: `1.11.0`
java dependency `flight-sql-jdbc-driver`: `18.3.0`
I have an Arrow Flight SQL server that implements a `do_handshake` method in
the following way:
- do basic auth (via OAuth2 ROPC flow in an external OAuth2 provider),
return some token in the response and attach an `Authorization: Bearer <token>`
header to the response
- also _create a session_, generate a session_id and attach it to the
response as a `session_id` cookie, via a `set-cookie` header.
Context to why I am returning two different identification things: It is not
possible for me to use the session_id as the token or include it in the token,
because the token is generated by an IAM and it does not know anything about my
server's sessions; and it is used only for authenticating users. The session_id
is used to group together resources allocated for actually executing queries of
the user in a scope of one session.
What I expect is that after do_handshake returns a token with headers like
`{ "authorization": "Bearer <token>", "set-cookie": "session_id=<session_id>"
}`, all following requests from the authenticated client to the server would
include both the `authorization` header and the `cookie` header with the
`session_id` cookie.
The problem is that it seems that go's adbc client implementation (used by
the python's flightsql driver) does not respect a set-cookie header returned by
the server in response to a do_handshake call. Requests that follow the
do_handshake authentication from the client to the server **do not include the
cookies** set by response to a do_handshake call, but they do include the
correct `authorization` header.
I am testing this functionality by using a python script that uses
`adbc_driver_flightsql` package (ver. 1.11.0) to establish a connection and
execute some queries on the server. The script is something like this:
<details>
```python
#!/bin/env python3
from adbc_driver_flightsql import DatabaseOptions
import adbc_driver_flightsql.dbapi
import readline
import argparse
import os
import atexit
import pandas as pd
parser = argparse.ArgumentParser(
prog="flight_client", description="Simple flight SQL client"
)
parser.add_argument("uri")
parser.add_argument("-u", "--username")
parser.add_argument("-p", "--password")
parser.add_argument("-t", "--token")
parser.add_argument("-tr", "--token-raw")
args = parser.parse_args()
db_kwargs = {
# enable cookies for session management
DatabaseOptions.WITH_COOKIE_MIDDLEWARE.value: "true", # <----- use the
cookies middleware
}
if args.username is not None and args.password is not None:
db_kwargs["username"] = args.username
db_kwargs["password"] = args.password
if args.token is not None:
db_kwargs["adbc.flight.sql.authorization_header"] = f"Bearer
{args.token}"
if args.token_raw is not None:
db_kwargs["adbc.flight.sql.authorization_header"] = args.token_raw
conn = adbc_driver_flightsql.dbapi.connect(
f"grpc://{args.uri}",
db_kwargs=db_kwargs,
)
# do stuff
```
</details>
The `set-cookie` header is handled correctly if it is returned by other
methods, e.g. `GetFlightInfoSqlInfo`.
The JDBC arrow flight sql driver handles the `set-cookie` header correctly
in all cases, including when it is returned by the call to a `do_handshake`
endpoint.
# Analysis (WARNING: LLM USAGE)
I have briefly analyzed the code of the `arrow-adbc` golang package with an
LLM (GLM 5) and it pointed out a possible race condition when handling
streaming responses, such as one used in `do_handshake`.
Full analysis:
<details>
# Cookie Handling in Arrow Flight Client: Handshake vs GetFlightInfo
## Root Cause
The `set-cookie` header is handled correctly for `GetFlightInfo` but not for
`Handshake` due to a **timing mismatch in the streaming middleware's
`HeadersReceived` callback**:
- **Unary RPCs** (`GetFlightInfo`): Headers are captured synchronously via
`grpc.Header()` option during the call
- **Streaming RPCs** (`Handshake`): `HeadersReceived` is only called in
`finishFn`, which executes **after** the stream completes (when `Recv()`
returns `io.EOF`)
The authentication code reads headers **before** calling `Recv()`, but the
middleware only calls `HeadersReceived` **after** `Recv()` completes - creating
a race where cookies aren't in the jar when needed.
---
## Sequence Diagrams
### GetFlightInfo (Unary RPC) - ✅ Works Correctly
```mermaid
sequenceDiagram
participant App as Application
participant CM as Cookie Middleware
participant Auth as Auth Interceptor
participant gRPC as gRPC Channel
App->>CM: GetFlightInfo(ctx, req)
Note over CM: StartCall(ctx) - Add cookies to outgoing context
CM->>Auth: Invoke with modified ctx
Note over Auth: Add auth token to context
Auth->>gRPC: Invoke(ctx, method, req, reply)
gRPC-->>Auth: Response + Headers
Auth-->>CM: Response + Headers
Note over CM: grpc.Header(&hdrmd) captured synchronously
CM->>CM: HeadersReceived(ctx, md)
CM->>CM: Store set-cookie in jar
CM-->>App: Return response
```
### Handshake (Streaming RPC) - ❌ Race Condition
```mermaid
sequenceDiagram
participant App as Application
participant CM as Cookie Middleware
participant Auth as Auth Interceptor
participant gRPC as gRPC Channel
participant Stream as clientStream Wrapper
App->>CM: Handshake(ctx)
Note over CM: StartCall(ctx) - Add cookies to outgoing context
CM->>Auth: Stream(ctx, desc, method)
Note over Auth: Skips auth for /Handshake method
Auth->>gRPC: Streamer(ctx, desc, method)
gRPC-->>Auth: ClientStream
Auth-->>CM: ClientStream
Note over CM: Wrap stream in clientStream with deferred finishFn
CM-->>App: Return wrapped stream
Note over App: Authentication flow begins
App->>Stream: stream.Header()
Stream->>gRPC: cs.Header()
gRPC-->>Stream: headers (contains set-cookie!)
Stream-->>App: headers
rect rgb(255, 200, 200)
Note over App,CM: PROBLEM: HeadersReceived NOT called yet!
Note over App: App processes headers directly
Note over App: Cookie jar NOT updated
end
App->>Stream: stream.Recv()
Stream->>gRPC: RecvMsg()
gRPC-->>Stream: io.EOF
Stream->>Stream: finishFn(nil)
Note over Stream: NOW HeadersReceived is called
Stream->>CM: HeadersReceived(ctx, md)
Note over CM: Too late! App already processed response
```
---
## Key Code Locations
| Component | File | Line |
|-----------|------|------|
| Unary interceptor header capture |
[`client.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client.go:100)
| 100-107 |
| Streaming interceptor deferred header processing |
[`client.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client.go:154)
| 154-157 |
| `finishFn` called on `Recv()` EOF |
[`client.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client.go:212)
| 212 |
| Auth reads headers before `Recv()` |
[`client_auth.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client_auth.go:310)
| 310+ |
| Auth interceptor bypasses Handshake |
[`client_auth.go`](../../../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/flight/client_auth.go:80)
| 80 |
---
## Summary Table
| Aspect | Unary RPC (GetFlightInfo) | Streaming RPC (Handshake) |
|--------|---------------------------|---------------------------|
| gRPC method | `Invoke()` | `NewStream()` |
| Header capture | Via `grpc.Header()` option | Via `stream.Header()` call |
| When captured | During invoke, before return | In `finishFn`, after stream
ends |
| `HeadersReceived` timing | Synchronous with response | Deferred until
stream completion |
| Cookie jar updated | Before caller sees response | After caller has
processed response |
---
## Potential Solutions
1. **Modify `clientStream.Header()`** to trigger `HeadersReceived`
immediately when headers are first retrieved
2. **Eager header capture** in the streaming interceptor after stream
creation
3. **Application-level workaround** in authentication code to explicitly
process cookies during Handshake
</details>
### Component(s)
Other
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]