[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852127#comment-16852127 ]
Guozhang Wang commented on KAFKA-8367:
--------------------------------------
[~pavelsavov] Could you share the code of the transformer / processor that
accesses the store (`performance_windowed_store`) as well?
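For example, something along the lines of the following hypothetical sketch
(the topology, key/value types, and logic here are assumptions, not your
actual code), so we can see how the store is opened and iterated:
{code:java}
import java.time.Duration;
import java.time.Instant;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class PerformanceTransformer
        implements Transformer<String, Long, KeyValue<String, Long>> {

    private WindowStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        store = (WindowStore<String, Long>) context.getStateStore("performance_windowed_store");
    }

    @Override
    public KeyValue<String, Long> transform(final String key, final Long value) {
        final Instant now = Instant.now();
        store.put(key, value);
        // Iterators over RocksDB-backed stores pin native (non-heap) memory;
        // closing them, e.g. via try-with-resources, is essential.
        try (final WindowStoreIterator<Long> iter =
                 store.fetch(key, now.minus(Duration.ofMinutes(2)), now)) {
            long sum = 0;
            while (iter.hasNext()) {
                sum += iter.next().value;
            }
            return KeyValue.pair(key, sum);
        }
    }

    @Override
    public void close() { }
}
{code}
In particular, it would help to know whether any iterators over the store are
left unclosed, since those hold RocksDB resources outside the heap.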
> Non-heap memory leak in Kafka Streams
> -------------------------------------
>
> Key: KAFKA-8367
> URL: https://issues.apache.org/jira/browse/KAFKA-8367
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.2.0
> Reporter: Pavel Savov
> Priority: Major
> Attachments: memory-prod.png, memory-test.png
>
>
> We have been observing a non-heap memory leak after upgrading to Kafka
> Streams 2.2.0 from 2.0.1. We suspect RocksDB to be the source, as the leak
> only happens when we enable stateful stream operations (those utilizing
> stores). We are aware of *KAFKA-8323*; we created our own fork of 2.2.0 and
> ported to it the fix scheduled for release in 2.2.1. That did not stop the
> leak, however.
> We are seeing this memory leak both in our production environment, where
> the consumer group is auto-scaled in and out in response to changes in
> traffic volume, and in our test environment, where we have two consumers,
> no autoscaling, and relatively constant traffic.
> Below is some information I hope will be of help:
> * RocksDB config (see the sketch after this list):
> Block cache size: 4 MiB
> Write buffer size: 2 MiB
> Block size: 16 KiB
> Cache index and filter blocks: true
> Manifest preallocation size: 64 KiB
> Max write buffer number: 3
> Max open files: 6144
>
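> For clarity, a sketch of how we apply these settings via a
> `RocksDBConfigSetter` (the class name is hypothetical; the values match the
> list above):
> {code:java}
> import java.util.Map;
> import org.apache.kafka.streams.state.RocksDBConfigSetter;
> import org.rocksdb.BlockBasedTableConfig;
> import org.rocksdb.Options;
>
> public class LeakReproRocksDBConfig implements RocksDBConfigSetter {
>     @Override
>     public void setConfig(final String storeName, final Options options,
>                           final Map<String, Object> configs) {
>         final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
>         tableConfig.setBlockCacheSize(4 * 1024 * 1024L);   // 4 MiB block cache
>         tableConfig.setBlockSize(16 * 1024L);              // 16 KiB block size
>         tableConfig.setCacheIndexAndFilterBlocks(true);
>         options.setTableFormatConfig(tableConfig);
>         options.setWriteBufferSize(2 * 1024 * 1024L);      // 2 MiB write buffer
>         options.setMaxWriteBufferNumber(3);
>         options.setManifestPreallocationSize(64 * 1024L);  // 64 KiB
>         options.setMaxOpenFiles(6144);
>     }
> }
> {code}
> The class is registered via the `rocksdb.config.setter` property
> (`StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG`).
>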
> * Memory usage in production
> The attached graph (memory-prod.png) shows memory consumption for each
> instance as a separate line. The horizontal red line at 6 GiB is the memory
> limit.
> As illustrated in the attached graph from production, memory consumption in
> running instances goes up around autoscaling events (scaling the consumer
> group either in or out) and the associated rebalancing. It stabilizes until
> the next autoscaling event, but it never goes back down.
> An example of scaling out can be seen at around 21:00, when three new
> instances are started in response to a traffic spike.
> Just after midnight, traffic drops and some instances are shut down. Memory
> consumption in the remaining running instances goes up.
> Memory consumption climbs again from around 6:00 AM due to increased
> traffic, and new instances are started until around 10:30 AM. Memory
> consumption does not drop until the cluster is restarted around 12:30.
>
> * Memory usage in test
> As illustrated by the attached graph (memory-test.png), we have a fixed
> number of two instances in our test environment and no autoscaling. Memory
> consumption rises linearly until it reaches the limit (around 2:00 AM on
> 5/13), at which point Mesos restarts the offending instances, or we restart
> the cluster manually.
>
> * No heap leaks observed
> * Window retention: 2 or 11 minutes, depending on operation type (see the
> sketch after this list)
> * Issue not present in Kafka Streams 2.0.1
> * No memory leak for stateless stream operations (when no RocksDB stores are
> used)
>
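> For reference, a sketch of how such a windowed store is materialized with
> this retention (the topic name, window size, and grace period are
> assumptions; only the store name and the 2-minute retention come from this
> report):
> {code:java}
> import java.time.Duration;
> import org.apache.kafka.common.utils.Bytes;
> import org.apache.kafka.streams.StreamsBuilder;
> import org.apache.kafka.streams.kstream.Materialized;
> import org.apache.kafka.streams.kstream.TimeWindows;
> import org.apache.kafka.streams.state.WindowStore;
>
> public class WindowedTopologySketch {
>     public static StreamsBuilder build() {
>         final StreamsBuilder builder = new StreamsBuilder();
>         builder.<String, String>stream("events")  // topic name assumed
>                .groupByKey()
>                // 1-minute windows with 1 minute of grace, so window size
>                // plus grace fits within the 2-minute retention
>                .windowedBy(TimeWindows.of(Duration.ofMinutes(1))
>                                       .grace(Duration.ofMinutes(1)))
>                .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as(
>                            "performance_windowed_store")
>                        .withRetention(Duration.ofMinutes(2)));
>         return builder;
>     }
> }
> {code}
>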
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)