QYZing opened a new issue, #1481:
URL: https://github.com/apache/pulsar-client-go/issues/1481
# Issue: partitionConsumer reconnectToBroker reaches max retry without notifying the external user or closing the consumer
## Environment
- pulsar-go sdk version: 0.16
- Component: partition consumer reconnection logic
## Problem Description
In the `reconnectToBroker` method of `partitionConsumer`, when the internal
reconnection to the broker reaches the **max retry count** set by the
`maxReconnectToBroker` option, there is **no mechanism to notify the external
consumer user**, and the consumer is **not closed/stopped** either.
This causes critical usability issues:
1. We cannot listen/monitor when the internal reconnection logic exhausts
all retries
2. The consumer remains in a broken state (silent failure)
3. We cannot safely recreate consumer/producer from outside because we have
no event/callback for max reconnect failure
4. Our service frequently hits internal reconnection logic failures but has
no way to react
## Root Cause Analysis
In the retry logic:
1. When `maxRetry == 0` or `bo.IsMaxBackoffReached()`, it only increments a
metric (`ConsumersReconnectMaxRetry.Inc()`)
2. **No error reaches the application code that owns the consumer**
3. **No callback/event is triggered**
4. **The consumer is not closed or transitioned to a failed state**
5. The retry loop just exits silently, leaving the consumer in an invalid
state
The core logic that causes the issue:
```go
if maxRetry == 0 || bo.IsMaxBackoffReached() {
	pc.metrics.ConsumersReconnectMaxRetry.Inc()
}
return struct{}{}, err
```
Once the max retry count is reached, this code still returns `err`, but the
retry loop swallows it, so **no failure notification reaches the consumer user**.
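
To make the silent failure concrete, here is a minimal, self-contained model of the behaviour described above. It is not the library's code: `modelConsumer`, `reconnect`, `alwaysDown`, and the `maxRetryHits` counter are hypothetical stand-ins for `partitionConsumer`, its retry loop, a failing dial, and the `ConsumersReconnectMaxRetry` metric.

```go
package main

import (
	"errors"
	"fmt"
)

// errDown is a placeholder connection error (hypothetical).
var errDown = errors.New("broker unreachable")

// alwaysDown simulates a broker that never accepts the connection.
func alwaysDown() error { return errDown }

// modelConsumer is a hypothetical stand-in for partitionConsumer.
type modelConsumer struct {
	maxRetryHits int  // plays the role of the ConsumersReconnectMaxRetry metric
	closed       bool // stays false: nothing closes the consumer on failure
}

// reconnect tries dial up to maxRetry times. When every attempt fails it
// bumps the counter and returns nothing, so the caller cannot tell.
func (c *modelConsumer) reconnect(maxRetry int, dial func() error) {
	for attempt := 0; attempt < maxRetry; attempt++ {
		if err := dial(); err == nil {
			return
		}
	}
	c.maxRetryHits++ // the only trace of the failure
}

func main() {
	c := &modelConsumer{}
	c.reconnect(3, alwaysDown)
	fmt.Println("max-retry counter:", c.maxRetryHits, "closed:", c.closed)
}
```

In this model the only observable effect of exhausting the retries is the counter, which matches what we see from outside the SDK.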
## Expected Behavior
When internal broker reconnection reaches max retry limit:
1. **Trigger a failure notification/callback** to the external consumer
(e.g. error channel, connection listener, or state change)
2. **Close/stop the consumer** automatically (or provide an option to do so)
3. Expose consumer state change so external code can detect the failure
4. Allow users to react (recreate consumer/producer, alert, etc.) when max
reconnect fails
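
As an illustration of how application code could react if such a notification existed, here is a rough sketch. The `failureNotifier` interface and its `ReconnectFailed()` channel are assumptions made for this example; pulsar-client-go does not expose them today, and `stubConsumer` only exists to make the sketch runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// failureNotifier is a HYPOTHETICAL interface: a consumer that reports
// reconnect exhaustion on a channel. Not part of pulsar-client-go.
type failureNotifier interface {
	ReconnectFailed() <-chan error
	Close()
}

// stubConsumer is a test double used only to run this sketch.
type stubConsumer struct {
	failed chan error
	closed bool
}

func (s *stubConsumer) ReconnectFailed() <-chan error { return s.failed }
func (s *stubConsumer) Close()                        { s.closed = true }

// newFailingStub returns a consumer that reports a reconnect failure.
func newFailingStub() *stubConsumer {
	ch := make(chan error, 1)
	ch <- errors.New("max reconnect attempts reached")
	return &stubConsumer{failed: ch}
}

// newHealthyStub returns a consumer that shuts down without any failure.
func newHealthyStub() *stubConsumer {
	ch := make(chan error)
	close(ch)
	return &stubConsumer{failed: ch}
}

// supervise creates a consumer, waits for a failure notification, then
// closes and recreates it, at most maxRecreate times. It returns how
// many times the consumer was recreated.
func supervise(create func() (failureNotifier, error), maxRecreate int) (int, error) {
	recreated := 0
	for {
		c, err := create()
		if err != nil {
			return recreated, err
		}
		err, failed := <-c.ReconnectFailed()
		c.Close()
		if !failed { // channel closed without an error: clean shutdown
			return recreated, nil
		}
		if recreated == maxRecreate {
			return recreated, fmt.Errorf("giving up after %d recreations: %w", recreated, err)
		}
		recreated++
	}
}

func main() {
	calls := 0
	create := func() (failureNotifier, error) {
		calls++
		if calls <= 2 {
			return newFailingStub(), nil
		}
		return newHealthyStub(), nil
	}
	n, err := supervise(create, 5)
	fmt.Println("recreated:", n, "err:", err)
}
```

A channel is used here only because it composes well with `select`; a callback or a queryable state would serve the same purpose.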
## Actual Behavior
- Max reconnect retry reached → only metric incremented
- No notification to external user
- Consumer not closed
- No way to monitor this failure from outside
- Consumer stuck in broken state
## Code Snippet (Problematic Logic)
```go
if maxRetry == 0 || bo.IsMaxBackoffReached() {
	pc.metrics.ConsumersReconnectMaxRetry.Inc()
}
return struct{}{}, err
```
This is the key area that lacks failure handling and user notification.
## Impact
- Production services using pulsar-go 0.16 cannot reliably handle broker
connection failures
- Silent consumer failures stop message consumption without raising any alert
- Cannot implement automatic recovery (recreate consumer/producer) because
no failure event is exposed
- Poor observability into internal reconnection failures
## Suggested Fix
1. Add a **reconnection failure callback/listener** when max retry reached
2. Close the consumer and mark it as failed after max retries
3. Propagate the error to the consumer's error channel
4. Update consumer state to a terminal failed state
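
The four points above could fit together roughly as follows. This is only a sketch of the idea, not a patch against the SDK: `sketchConsumer`, `ErrMaxReconnect`, `onReconnectFailed`, `errCh`, and the state constants are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Hypothetical consumer states; stateFailed is terminal.
const (
	stateReady int32 = iota
	stateFailed
)

// ErrMaxReconnect is a hypothetical sentinel error for retry exhaustion.
var ErrMaxReconnect = errors.New("max reconnect attempts reached")

var errDown = errors.New("broker unreachable")

// alwaysDown simulates a broker that never accepts the connection.
func alwaysDown() error { return errDown }

// sketchConsumer is a hypothetical stand-in for partitionConsumer.
type sketchConsumer struct {
	state             atomic.Int32
	errCh             chan error  // buffered so notification never blocks
	onReconnectFailed func(error) // optional user callback
}

func newSketchConsumer(onFailed func(error)) *sketchConsumer {
	return &sketchConsumer{errCh: make(chan error, 1), onReconnectFailed: onFailed}
}

// Failed reports whether the consumer reached the terminal failed state.
func (c *sketchConsumer) Failed() bool { return c.state.Load() == stateFailed }

// reconnect tries dial up to maxRetry times. On exhaustion it marks the
// consumer failed, invokes the callback, publishes the error on errCh,
// and returns the error instead of dropping it.
func (c *sketchConsumer) reconnect(maxRetry int, dial func() error) error {
	lastErr := errors.New("no attempt allowed")
	for attempt := 0; attempt < maxRetry; attempt++ {
		if lastErr = dial(); lastErr == nil {
			return nil
		}
	}
	err := fmt.Errorf("%w: %v", ErrMaxReconnect, lastErr)
	c.state.Store(stateFailed)
	if c.onReconnectFailed != nil {
		c.onReconnectFailed(err)
	}
	select {
	case c.errCh <- err:
	default: // a failure is already pending; do not block
	}
	return err
}

func main() {
	c := newSketchConsumer(func(err error) { fmt.Println("callback:", err) })
	err := c.reconnect(3, alwaysDown)
	fmt.Println("is max-reconnect error:", errors.Is(err, ErrMaxReconnect))
	fmt.Println("failed state:", c.Failed())
}
```

Whether the consumer should also close itself at this point, or leave that to the user, could be controlled by an option.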