[
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817412#comment-17817412
]
Omnia Ibrahim commented on KAFKA-16212:
---------------------------------------
The moment the cache in `ReplicaManager.allPartitions` is represented as
`Pool[TopicPartition, HostedPartition]` which is a wrapper around
`ConcurrentHashMap[opicPartition, HostedPartition]` update this to use
`TopicIdPartition` as a key turn out to be tricky as
# not all APIs that interact with `ReplicaManager` in order to fetch/update
partition cache are aware of topicId like consumer coordinator, handling some
requests in KafkaApi where the request schema doesn't have topicId, etc .
# TopicId is represented as Optional in many places which means we might endup
populate it with null or dummy uuid multiple times to construct
TopicIdPartition.
I have 3 proposals at the moment:
* *Proposal #1 :* Update TopicIdPartitions to have constructor with topicId as
optional. And change `ReplicaManager.allPartitions` to be
`LinkedBlockingQueue[TopicIdPartition, HostedPartition]`. _*This might be the
simplest one as far as I can see.*_
** any API that is not topic id aware will just get the last entry that match
topicIdPartition.topicPartition.
** The code will need to make sure that we don't have duplicates by
`TopicIdPartition` in the `LinkedBlockingQueue`.
** We will need to revert having topic Id as optional in TopicIdPartitions
once everywhere in Kafka is topic-id aware.
* *Proposal #2 :* change `ReplicaManager.allPartitions` to `new
Pool[TopicPartition, LinkedBlockingQueue[(Option[Uuid], HostedPartition)]]`
where `Option[Uuid]` represent topic id. This make the cache scheme bit
complex. The proposal will
** consider the last entry in `LinkedBlockingQueue` is the current value.
** The code will make sure that `LinkedBlockingQueue` has only entry for the
same topic id
** Topic Id aware APIs that need to fetch/update the partition will be updated
to use `TopicPartition` and topic Id
** Topic Id non-aware APIs will remain using topic partitions and the
replicaManager will assume that these APIs referring to the last entry in
`LinkedBlockingQueue`
* *Proposal#3:* The other option is to keep two separate caches one
`Pool[TopicIdPartition, HostedPartition]` for partitions and another one
`Pool[TopicPartition, Uuid]` for the last assigned topic id for each partition
in order to form `TopicIdPartition`. This is the least favourite as having 2
caches will risk that one of them can go out of data at any time.
[~jolshan] Do you have any strong preferences? I am leaning toward 1st as it is
less messy than the others. WDYT?
> Cache partitions by TopicIdPartition instead of TopicPartition
> --------------------------------------------------------------
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
> Issue Type: Improvement
> Affects Versions: 3.7.0
> Reporter: Gaurav Narula
> Assignee: Omnia Ibrahim
> Priority: Major
>
> From the discussion in [PR
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of
> {{TopicPartition}} to avoid ambiguity.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)