mjsax commented on PR #16684:
URL: https://github.com/apache/kafka/pull/16684#issuecomment-2253080256

   Ah sorry. Mixed up the handler and the context... My bad.
   
   > > do we have to consider that the the first point of this comment  
https://github.com/apache/kafka/pull/16684#issuecomment-2250417070 is enough ?
   >
   > Checking in ProcessorNode#process if rawRecord != null before accessing 
the raw key and the raw value. It is the safest approach for me and will avoid 
crashing to NPE. The drawback is the processing exception handler can end up 
with no sourceRawKey nor sourceRawValue while the values are actually available 
in upstream
   
   Yes, that is what I meant here: 
   
   > Thus, making a step back, I am wondering why we not just pass in the 
current key/value (or full Record) into the handler? Of course, for doing a DQL 
which is a follow up feature we want to build on top, having something 
unserialized at hand might not be ideal, but at the source-node level we should 
always be able to pass in the unserialized source data. -- Should we change the 
handler to pass in both current input Record and source raw key/value (making 
both sourceRawKey and sourceRawValue type Optional<byte[]>)? In the end, 
messing with the store cache seems to be brittle, and not solve the problem for 
all cases? Do we really think it would be the right way forward? (Also, I think 
that caches are only an issue in the DSL, for which we couple cache flushing 
and forwarding -- for the PAPI, caching and forwarding is independent, and 
there won't be a reason to add any raw record, but it would just be a waste of 
memory.)
   
   I think there is too many corner cases right now, and we might just want to 
go the path of least resistance. So just doing the `null` check seems 
reasonable to me for now. But as said, I would change to `Optional` type to 
highlight was both `sourceRawKey` and `sourceRawValue` may be `null`.
   
   > The drawback is the processing exception handler can end up with no 
sourceRawKey nor sourceRawValue while the values are actually available in 
upstream
   
   Well, if we find good way later, to set both correctly, it would just be an 
improvement we can do anytime. The API contract just says, may or may not be 
there (if we use `Optional`) but we don't define when it will be there and when 
not, so we can change it IMHO w/o the need of a KIP or anything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to