[
https://issues.apache.org/jira/browse/KAFKA-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876766#comment-17876766
]
Greg Harris commented on KAFKA-17424:
-------------------------------------
Hi [~ajit97], thanks for the ticket!
Can you provide some more supporting documentation? Perhaps some profiles with
evidence that this array-copy is the source of the problem?
As far as I can tell from reading the code, this change would avoid an extra
reservation of roughly 8*max.poll.records bytes on each batch. For example, for
that reservation to require an additional 1GB of heap, max.poll.records would
have to be >134217728. At that scale, the size of the SinkRecords themselves
becomes significant, and I would expect it to drown out any memory used by the
ArrayList itself.
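For concreteness, the copy under discussion duplicates only the element references (about 8 bytes each on a typical 64-bit JVM without compressed oops, hence roughly 2^27 records per extra GiB). A minimal sketch of the two hand-off strategies being compared, with hypothetical class and method names rather than the actual WorkerSinkTask code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: two ways of handing an accumulated batch to a sink.
class BatchDelivery<T> {
    private List<T> messageBatch = new ArrayList<>();

    void add(T record) {
        messageBatch.add(record);
    }

    // Current approach described in the ticket: hand the sink a defensive
    // copy, then clear the original. While both lists exist, the copy costs
    // ~8 bytes per element reference (e.g. 2^27 elements ~= 1 GiB extra).
    List<T> deliverByCopy() {
        List<T> copy = new ArrayList<>(messageBatch);
        messageBatch.clear();
        return copy;
    }

    // Proposed approach: hand over the original list and start a fresh one,
    // so no per-element reference copy is ever made. Assigning a new list
    // (rather than calling clear()) also stays safe if the sink nulls or
    // mutates the list it was given.
    List<T> deliverByHandoff() {
        List<T> batch = messageBatch;
        messageBatch = new ArrayList<>();
        return batch;
    }
}
```

Either way the sink receives a list it may freely mutate; the difference is only whether the element references are copied before the hand-off.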
> Memory optimisation for Kafka-connect
> -------------------------------------
>
> Key: KAFKA-17424
> URL: https://issues.apache.org/jira/browse/KAFKA-17424
> Project: Kafka
> Issue Type: Improvement
> Components: connect
> Affects Versions: 3.8.0
> Reporter: Ajit Singh
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When Kafka Connect gives the sink task its own copy of the List<SinkRecord>,
> RAM utilisation shoots up: at that moment two lists exist in memory, and the
> original list is only cleared after the sink worker finishes the current
> batch.
>
> Currently the list is declared final and a copy of it is provided to the sink
> task, since sink tasks can be custom and the user is free to process the copy
> however they want without any risk. But one of the most popular uses of Kafka
> Connect is OLTP-to-OLAP replication, and during initial copying/snapshots a
> lot of data is generated rapidly, filling the list to its maximum batch size,
> which makes us prone to "Out of Memory" exceptions. The list's only lifecycle
> is: get filled -> cloned for the sink -> size read -> cleared -> repeat. So
> instead I record the size of the list before handing the original list to the
> sink task, and after the sink has performed its operations I set
> list = new ArrayList<>(). I did not use clear(), just in case the sink task
> has set our list to null.
> There is a time-vs-memory trade-off:
> in the original approach the JVM does not have to spend time allocating a new
> list, while in the new approach the JVM must allocate a fresh list each
> batch, but this leaves more memory free.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)