[
https://issues.apache.org/jira/browse/KAFKA-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949363#comment-17949363
]
Matthias J. Sax commented on KAFKA-17019:
-----------------------------------------
{quote}I take a look into producer code to contribute this issue.
{quote}
Nice! Thank you.
{quote}the {{TimeoutException}} *is* the root cause
{quote}
A timeout only happens when we try to do something but cannot complete it
within the allotted time. So there must be a root cause for why we could not
complete what we tried to do. I guess the only case where the timeout itself
is the root cause is when we send a request and don't get any response at all,
i.e., an actual request timeout.
{quote}But in that case, hasn’t the {{ProducerBatch}} already raised the real
exception, so nothing is actually “missing”?
{quote}
It depends. If the error is not retriable, yes: the error is re-thrown
directly to the application. However, if the error is retriable, the producer
retries internally (and only logs the error), and might eventually give up
when some higher-level timeout expires. For example, I believe `ProducerBatch`
could return a "not enough replicas" exception, which we would retry
internally until `max.block.ms` eventually expires.
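To make the retriable case concrete, here is a minimal sketch of the idea and not the actual producer code. The class names `RetriableException` and `TimeoutException` below are hypothetical stand-ins defined locally, and `retryUntil` is an illustrative helper, not a Kafka API. It remembers the last retriable error and attaches it as the cause once the deadline passes:

```java
import java.util.function.Supplier;

public class RetryWithRootCause {

    /** Hypothetical stand-in for a retriable error such as "not enough replicas". */
    static class RetriableException extends RuntimeException {
        RetriableException(String message) { super(message); }
    }

    /** Hypothetical stand-in for the producer's TimeoutException, carrying a cause. */
    static class TimeoutException extends RuntimeException {
        TimeoutException(String message, Throwable cause) { super(message, cause); }
    }

    /**
     * Runs the attempt at least once and retries until it succeeds or the
     * deadline passes. The last retriable error is kept and attached as the
     * cause of the resulting timeout instead of only being logged.
     */
    static <T> T retryUntil(long timeoutMs, Supplier<T> attempt) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        RetriableException lastError = null;
        do {
            try {
                return attempt.get();
            } catch (RetriableException e) {
                lastError = e; // remember the underlying problem
            }
        } while (System.currentTimeMillis() < deadline);
        throw new TimeoutException("Gave up after " + timeoutMs + " ms", lastError);
    }

    public static void main(String[] args) {
        try {
            retryUntil(20, () -> { throw new RetriableException("not enough replicas"); });
        } catch (TimeoutException e) {
            // The application now sees why the operation never completed.
            System.out.println(e.getMessage() + ", caused by: " + e.getCause().getMessage());
        }
    }
}
```

A real implementation would back off between attempts rather than spin; the loop is kept tight here only to keep the sketch short.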
{quote}Or do you think the exception raised inside {{ProducerBatch}} should
instead be set as the root cause of the subsequent {{TimeoutException}}?
{quote}
Yes, that is the idea.
I don't know all scenarios off the top of my head, and I guess we need to take
it on a case-by-case basis. But I believe most `TimeoutException`s should have
some actual root cause.
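From the application side, a populated cause chain is what makes the timeout actionable. The following is a small sketch using only plain `java.lang.Throwable`; the `rootCause` helper and the exception types in `main` are illustrative assumptions, not Kafka classes. A timeout without a cause resolves to itself, which matches the "actual request timeout" case:

```java
public class RootCauseLookup {

    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins: a broker-side error wrapped by a timeout.
        Throwable brokerError = new IllegalStateException("not enough replicas");
        Throwable timeout = new RuntimeException("timed out", brokerError);

        System.out.println(rootCause(timeout).getMessage());      // prints "not enough replicas"
        System.out.println(rootCause(brokerError) == brokerError); // prints "true"
    }
}
```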
> Producer TimeoutException should include root cause
> ---------------------------------------------------
>
> Key: KAFKA-17019
> URL: https://issues.apache.org/jira/browse/KAFKA-17019
> Project: Kafka
> Issue Type: Improvement
> Components: clients, producer
> Reporter: Matthias J. Sax
> Priority: Major
>
> With KAFKA-16965 we added a "root cause" to some `TimeoutException` thrown by
> the producer. However, it's only a partial solution to address a specific
> issue.
> We should consider adding the "root cause" for _all_ `TimeoutException` cases
> and unify/clean up the code to get a holistic solution to the problem.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)