mjsax commented on PR #16684:
URL: https://github.com/apache/kafka/pull/16684#issuecomment-2251545602

   I am just catching up more on the impl, and the caching mechanism and other
similar things seem to be a general issue for passing in the original raw
source data (this also applies to `Punctuators`, for which I did see a PR).
   
   Even if we ignore all DSL features (e.g. `suppress()`, emit-final
aggregations, caching/forward coupling), I can have a custom `Processor` with
a state store: I put a record into the store and don't forward anything.
Next, when a second input record comes in, I pull out the first record and
forward it, and processing downstream could lead to an error. For error
handling, it seems most useful to get the first record handed in, and the
source record (which is the second record) might not be too helpful in this
case?
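
   A minimal sketch of this scenario, assuming plain Java 16+ and hypothetical stand-ins (`BufferingProcessorDemo`, `SimpleRecord`, `HandlerInput`) rather than the real Kafka Streams `Processor`, `Record`, and state store API. As a simplification, the processor catches the downstream error itself; in Kafka Streams the runtime would invoke the handler. The point is only which record is "source" and which is "current" when the failure happens:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Kafka-free model: a stateful processor buffers the first
// record and only forwards it when the next input record arrives.
final class BufferingProcessorDemo {

    // Stand-in for a key/value record (NOT Kafka's Record class).
    record SimpleRecord(String key, String value) { }

    // What an error handler could receive: the record read from the source
    // topic vs. the record the failing downstream step was processing.
    record HandlerInput(SimpleRecord sourceRecord, SimpleRecord currentRecord) { }

    // Stand-in for a state store.
    private final Map<String, SimpleRecord> store = new HashMap<>();

    // Collected "handler invocations" so the outcome can be inspected.
    final List<HandlerInput> handlerCalls = new ArrayList<>();

    // Called once per input record read from the source topic.
    void process(SimpleRecord source) {
        SimpleRecord buffered = store.remove("pending");
        if (buffered == null) {
            // First record: put it into the store and forward nothing.
            store.put("pending", source);
            return;
        }
        try {
            // Second record: forward the *buffered first* record downstream.
            downstream(buffered);
        } catch (RuntimeException e) {
            // The source record is the second record, but the record that
            // actually failed downstream is the buffered first record.
            handlerCalls.add(new HandlerInput(source, buffered));
        }
    }

    // Downstream step that fails on values starting with "bad".
    private static void downstream(SimpleRecord r) {
        if (r.value().startsWith("bad")) {
            throw new IllegalStateException("cannot process " + r);
        }
    }
}
```

   Feeding `("k1", "bad-first")` and then `("k2", "second")` produces one handler call whose source record is `k2` while the record that actually broke is `k1`.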
   
   Thus, taking a step back, I am wondering why we don't just pass the current
key/value (or the full `Record`) into the handler? Of course, for a DLQ, which
is a follow-up feature we want to build on top, having only deserialized
objects at hand might not be ideal, but at the source-node level we should
always be able to pass in the raw (still serialized) source data. Should we
change the handler to pass in both the current input `Record` and the raw
source key/value (making both `sourceRawKey` and `sourceRawValue` of type
`Optional<byte[]>`)? In the end, messing with the store cache seems brittle
and does not solve the problem for all cases? Do we really think it would be
the right way forward?
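
   To make the proposed shape concrete, here is a hedged sketch in plain Java 16+. Every name in it (`ProposedHandlerSketch`, `CurrentRecord`, `HypotheticalProcessingHandler`, `Response`, `EXAMPLE_HANDLER`) is hypothetical and is not the actual `ProcessingExceptionHandler` API. The handler gets the current input record plus the raw source key/value as `Optional<byte[]>`, which is empty when no source record exists (e.g. for a punctuation). The decision policy (fail on a null current key) is arbitrary and only shows that the decision can be based on the current record:

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch of the proposed handler signature; NOT the Kafka API.
final class ProposedHandlerSketch {

    // Stand-in for the current input record of the failing Processor.
    record CurrentRecord<K, V>(K key, V value) { }

    enum Response { CONTINUE, FAIL }

    // Hypothetical handler: current (deserialized) record plus optional raw
    // source bytes, which may be absent (e.g. when called from a punctuation).
    interface HypotheticalProcessingHandler {
        Response handle(CurrentRecord<?, ?> current,
                        Optional<byte[]> sourceRawKey,
                        Optional<byte[]> sourceRawValue,
                        Exception exception);
    }

    // Example handler: decides based on the current record and uses the raw
    // source value (if present) only for logging. The policy is arbitrary.
    static final HypotheticalProcessingHandler EXAMPLE_HANDLER =
        (current, rawKey, rawValue, e) -> {
            String raw = rawValue
                .map(b -> new String(b, StandardCharsets.UTF_8))
                .orElse("<none>");
            System.out.println("failed on " + current
                + " (raw source value: " + raw + "): " + e.getMessage());
            return current.key() == null ? Response.FAIL : Response.CONTINUE;
        };
}
```

   A DLQ implementation could still forward the raw bytes when they are present, while other handlers (logging, metrics, skip/fail decisions) can work purely off the current record.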
   
   While we want to use this new handler to build a DLQ, that is not the only
way it can be used, so we should not blindly optimize for the DLQ case, but
try to make it as useful as we can for other cases, too? (And we could revisit
the question of what serialized data we can pass into a DLQ handler on the
DLQ KIP, and try to decouple the ProcessingExceptionHandler a bit more from
the DLQ KIP?)
   
   IIRC, we did have some discussion about this issue on the mailing list, but
considered it a DSL issue that we might want to address in a follow-up KIP.
But maybe this assessment was wrong, and it would be better to address it
right away (at least partially)? In the end, won't it be easier for the
handler to determine what to do if we pass in the current input record of the
called `Processor`, instead of some related (or maybe even unrelated) source
record?

