[
https://issues.apache.org/jira/browse/KAFKA-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17909000#comment-17909000
]
Matthias J. Sax commented on KAFKA-17825:
-----------------------------------------
I dug into this, and I am actually not sure if it's indeed a bug – my
understanding is, that the KIP intentionally changes the contract, what means
it's not a backward compatible change, which the KIP does not clearly call out
though IMHO.
The sentence
{quote}If someone wants the deserializer to be compatible with older versions
of the kafka-clients library they should always implement the byte array-based
deserialize methods.
{quote}
{color:#172b4d}is rather cryptic.{color}
{color:#172b4d}Overall, the contract of `ByteBufferDeserializer` is _not_
clearly specified from my POV.{color}
{color:#172b4d}Prior to the KIP, we did make a deep copy, and thus using
`array()` was "safe" in the sense that one does not need to care about
arrayOffset,position,limit... Eg, a ByteBuffer could be converted into a String
via `new String(byteBuffer.array())`.{color}{color:#172b4d} With this KIP,
using `array()` is not "safe" any longer though and one would need to use
`{color}{color:#172b4d}new String(buffer.array(), buffer.arrayOffset() +
buffer.position(), buffer.remaining())` instead.{color}
{color:#172b4d}However, it's not really clear what the contract is (not in 3.5,
nor in 3.6). I assume that the statement in the KIP was supposed to mean, if
you want a deep-copy, use the old `deserialize(String topic, byte[] data)` but
that is not really practical, because if I use the `KafkaConsumer` I don't
control which method the consumer calls, and it will in fact call the new
`deserialize(String topic, Headers headers, ByteBuffer data)` so as a user I
cannot force a deep copy. (User code does usually not call `deserialize()`
themselves...){color}
{color:#172b4d}In the end, `ByterBufferDeserializer` is not an interface and
thus "should always implement the byte array-based deserialize methods" does no
apply – the user does not implement anything when using this class... They just
plug it into the KafkaConsumer.{color}
{color:#172b4d}In the end, the goal of the KIP was to avoid a deep-copy, but it
did introduce a breaking change... This ship has sailed so the only thing we
can do is to actually update the KIP, docs, and JavaDocs IMHO. Of course, we
could also fall back to the old behavior, but this seems to defeat the purpose,
and people can still force a deep-copy of the data by using
`ByteArrayDeserializer` instead of `ByteBufferDeserializer`. I agree to the
idea that `ByteBufferDeserializer` is more advanced and thus should avoid a
deep-copy, and users need to understand how to use it correctly.{color}
{color:#172b4d}Thoughts? \cc [~LSK] [~pnee] (also [~guozhang] [~ChrisEgerton]
[~showuon] who reviews and vote the KIP).{color}
> ByteBufferDeserializaer's array size can be inconsistent with the older
> version
> -------------------------------------------------------------------------------
>
> Key: KAFKA-17825
> URL: https://issues.apache.org/jira/browse/KAFKA-17825
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 3.6.0
> Reporter: Philip Nee
> Assignee: LinShunkang
> Priority: Blocker
> Fix For: 4.0.0, 3.9.1, 3.8.2
>
>
> We've noticed that using the ByteBufferDeserializer can yield a different
> byte array length compare to the deserializer from 3.5.2. This is attributed
> by KIP-863, in particular, the old deserializer truncated the byte array
> starting from
> `buffer.position() + buffer.arrayOffset() + offset` using `Utils.toArray`
>
> Whereas the current implementation is a passthrough.
>
> This can be reproduced using the
> {code:java}
> KafkaConsumerProducerDemo.java{code}
> by changing the type to <Integer, ByteBuffer> and perform a print after poll.
> For example, the producer produces a record [0, test0] (key is an int,
> "test0" is a 5 bytes long string, converted to byte buffer using
> {code:java}
> ByteBuffer.wrap(value.getBytes()){code}
> Prior to KIP-863 we see the following after polling the record from consumer:
> 3.5.2: test0
> 3.6.0:
> {code:java}
> ?$���y�NNޅ�-p�=�����NAc�����8D���8D���������������
> test0{code}
>
> And if you analyze the ByteBuffer post 3.6.0, we can see the current offset
> is at 140 with array length of 149.
>
> [~LSK] - since you wrote the kip and did the implementation, can you address
> this ?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)