[ https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sophie Blee-Goldman updated KAFKA-9455:
---------------------------------------
Summary: Consider using TreeMap for in-memory stores of Streams (was: Consider using TreeMap for In-memory stores of Streams)
> Consider using TreeMap for in-memory stores of Streams
> ------------------------------------------------------
>
> Key: KAFKA-9455
> URL: https://issues.apache.org/jira/browse/KAFKA-9455
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Priority: Major
> Labels: newbie++
>
> From [~ableegoldman]: It's worth noting that it might be a good idea to
> switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap
> allows us to safely perform range queries without copying over the entire
> keyset, but the performance of point queries seems to scale noticeably worse
> with the number of unique keys. Point queries are used by aggregations, while
> range queries are used by windowed joins, but of course both are available
> within the Processor API (PAPI) and for interactive queries, so it's hard to
> say which we should prefer. Rather than make that tradeoff, maybe we should
> have one version for efficient range queries (a "JoinWindowStore") and one for
> efficient point queries (an "AggWindowStore"), or something similar. I know
> we've had similar thoughts about a different RocksDB store layout for joins
> (although I can't find that ticket anywhere), and it seems like the in-memory
> stores could benefit from a special "Join" version as well. cc Guozhang Wang
> Here are some random thoughts:
> 1. For Kafka Streams processing logic (i.e. without IQ), it's better to make
> all processing logic rely on point queries rather than range queries.
> Right now the only processors that use range queries are, as mentioned above,
> the windowed stream-stream joins. I think we should refactor the windowed
> stream-stream join operation to use a different window store implementation
> (and, as a result, also get rid of the retainDuplicates flag).
> 2. With 1), range queries would only be exposed through IQ. Depending on how
> often they are used there, I think it makes a lot of sense to optimize for
> single-point queries.
> Of course, even without step 1) we should still consider using TreeMap for
> windowed in-memory stores so that they scale better with the number of keys.
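The split between a join-oriented store and an aggregation-oriented store could look roughly like the following sketch. AggWindowStore and JoinWindowStore are the placeholder names from the ticket; their methods, the HashMap used for point lookups, and the list-per-timestamp layout for duplicates are illustrative assumptions, not existing Kafka Streams classes. The ticket itself only proposes TreeMap, so the HashMap is a swapped-in choice, and thread safety for interactive queries is ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SplitWindowStores {

    /** Hypothetical "AggWindowStore": point lookups only, so no ordering is kept. */
    static final class AggWindowStore<K, V> {
        private final Map<K, Map<Long, V>> byKey = new HashMap<>();

        void put(K key, long windowStart, V value) {
            byKey.computeIfAbsent(key, k -> new HashMap<>()).put(windowStart, value);
        }

        // Point query: returns the value for exactly (key, windowStart), or null.
        V fetch(K key, long windowStart) {
            Map<Long, V> windows = byKey.get(key);
            return windows == null ? null : windows.get(windowStart);
        }
    }

    /** Hypothetical "JoinWindowStore": per-key time-range fetches, duplicates kept. */
    static final class JoinWindowStore<K, V> {
        private final Map<K, NavigableMap<Long, List<V>>> byKey = new HashMap<>();

        // Every record is appended to the list for its timestamp, so duplicates are
        // always retained and no retainDuplicates-style flag is needed.
        void put(K key, long timestamp, V value) {
            byKey.computeIfAbsent(key, k -> new TreeMap<>())
                 .computeIfAbsent(timestamp, t -> new ArrayList<>())
                 .add(value);
        }

        // Range query: all values for key with timestamps in [timeFrom, timeTo],
        // in timestamp order (insertion order within a single timestamp).
        List<V> fetch(K key, long timeFrom, long timeTo) {
            List<V> result = new ArrayList<>();
            NavigableMap<Long, List<V>> series = byKey.get(key);
            if (series != null) {
                for (List<V> values : series.subMap(timeFrom, true, timeTo, true).values()) {
                    result.addAll(values);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        AggWindowStore<String, Long> counts = new AggWindowStore<>();
        counts.put("a", 0L, 3L);
        counts.put("a", 100L, 5L);
        System.out.println(counts.fetch("a", 100L)); // 5

        JoinWindowStore<String, String> left = new JoinWindowStore<>();
        left.put("a", 10L, "x");
        left.put("a", 10L, "y"); // same key and timestamp: both retained
        left.put("a", 50L, "z");
        left.put("a", 200L, "w");
        System.out.println(left.fetch("a", 0L, 60L)); // [x, y, z]
    }
}
```

Because the aggregation store never iterates in key or time order, it can trade the sorted structure for hashing, while only the join store pays for ordered range access.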