mjsax commented on PR #16684: URL: https://github.com/apache/kafka/pull/16684#issuecomment-2251545602
I am just catching up on the implementation, the caching mechanism, and related pieces. Passing in the original raw source data seems to be a general issue (it also applies to `Punctuators`, for which I did see a PR). Even if we ignore all DSL features (e.g. `suppress()`, emit-final aggregations, caching/forward coupling), I can have a custom `Processor` with a state store, put a record into the store, and not forward anything. Next, when a second input record comes in, I pull the first record out of the store and forward it, and downstream processing could then hit an error. For error handling, it seems most useful to have the first record handed to the handler; the source record (which is the second record) might not be too helpful in this case?

Thus, taking a step back, I am wondering why we don't just pass the current key/value (or the full `Record`) into the handler? Of course, for building a DLQ, which is a follow-up feature we want to build on top, having something unserialized at hand might not be ideal, but at the source-node level we should always be able to pass in the unserialized source data.

Should we change the handler to pass in both the current input `Record` and the source raw key/value (making both `sourceRawKey` and `sourceRawValue` of type `Optional<byte[]>`)? In the end, messing with the store cache seems brittle, and it does not solve the problem for all cases. Do we really think it is the right way forward? While we want to use this new handler to build a DLQ, that is not the only way it can be used, so we should not blindly optimize for the DLQ case, but try to make it as useful as we can for other cases, too. (We could then revisit the question of what serialized data we can pass into a DLQ handler on the DLQ KIP, and try to decouple the ProcessingExceptionHandler a little more from the DLQ KIP?)

IIRC, we did have some discussion about this issue on the mailing list, but considered it a DSL issue that we might want to address in a follow-up KIP.
But maybe this assessment was wrong, and it would be better to address it right away (at least partially)? In the end, won't it be easier for the handler to determine what to do if we pass in the current input record of the called `Processor`, instead of some related (or maybe even unrelated) source record?
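
To make the suggestion above a bit more concrete, here is a rough, self-contained sketch of a handler that receives both the current input record and optional raw source bytes. Everything in it (`HandlerSketch`, `CurrentRecord`, `SketchHandler`, `Response`, `describe`) is a hypothetical stand-in for illustration, not the actual Kafka Streams API, and treating the raw source key as absent in some cases is an assumption rather than an agreed design.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; none of these types are the Kafka Streams API.
class HandlerSketch {

    /** Simplified stand-in for the record the failing processor was handed. */
    static final class CurrentRecord {
        final String key;
        final String value;

        CurrentRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    enum Response { CONTINUE, FAIL }

    /** Hypothetical handler shape: current input record plus optional raw source bytes. */
    interface SketchHandler {
        Response handle(CurrentRecord current,
                        Optional<byte[]> sourceRawKey,
                        Optional<byte[]> sourceRawValue,
                        Exception error);
    }

    /** Renders what the handler knows; the raw source key may be absent (assumption). */
    static String describe(CurrentRecord current, Optional<byte[]> sourceRawKey) {
        String source = sourceRawKey
            .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
            .orElse("<unavailable>");
        return "current=" + current.key + ", source=" + source;
    }

    public static void main(String[] args) {
        // Example policy: log both views of the failure and keep processing.
        SketchHandler logAndContinue = (current, rawKey, rawValue, error) -> {
            System.out.println(describe(current, rawKey) + " -> " + error.getMessage());
            return Response.CONTINUE;
        };

        // Scenario from the comment: record "k1" was stored earlier and is forwarded
        // while source record "k2" is being processed; the failure happens downstream.
        CurrentRecord first = new CurrentRecord("k1", "v1");
        byte[] secondKey = "k2".getBytes(StandardCharsets.UTF_8);
        byte[] secondValue = "v2".getBytes(StandardCharsets.UTF_8);

        Response response = logAndContinue.handle(
            first,
            Optional.of(secondKey),
            Optional.of(secondValue),
            new IllegalStateException("downstream failure"));
        System.out.println("handler decided: " + response);
    }
}
```

With a shape like this, the handler can base its decision on the record that actually failed (`k1`), while the raw source bytes (`k2`) remain available for cases such as a DLQ.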
