Re: [I] Avro reader memory leak [iceberg-python]

via GitHub Mon, 08 Dec 2025 03:08:04 -0800


thomas-pfeiffer commented on issue #2325:
URL: 
https://github.com/apache/iceberg-python/issues/2325#issuecomment-3626362611


   @andormarkus reg. your questions:
   
   > What led you to disable the cache entirely rather than clearing at 
invocation boundaries?
   
   I used the repro script from 
https://github.com/apache/iceberg-python/issues/2325#issuecomment-3221265037 
and when I disabled the cache completely, memory stayed fully stable even 
after 2,000 executions. Stability was the bigger concern for us: our 
individual Lambda executions run 2-3 min, so repeating executions that failed 
with out-of-memory errors is worse for us than losing a few seconds on an 
uncached manifest. (Not taking the remark from 
https://github.com/apache/iceberg-python/issues/2325#issuecomment-3625428629 
into account.)
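
To illustrate the trade-off described above, here is a minimal sketch. It is not the repro script from the linked comment and does not use PyIceberg. `read_manifest` is a hypothetical stand-in for an expensive manifest read, and the unbounded `functools.lru_cache` is an assumption chosen to make the growth visible. It compares how much memory is still held after 2,000 calls with the cache and with the cache bypassed.

```python
import functools
import tracemalloc


@functools.lru_cache(maxsize=None)
def read_manifest(path: str) -> bytes:
    # Hypothetical stand-in for an expensive manifest read; not a PyIceberg function.
    return path.encode() * 100


def retained_bytes(reader, executions: int = 2000) -> int:
    """Return how much traced memory is still held after `executions` calls."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for i in range(executions):
        # A new path on every call, so the cache never hits and only grows.
        reader(f"manifest-{i}.avro")
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


with_cache = retained_bytes(read_manifest)
read_manifest.cache_clear()
# __wrapped__ is the undecorated function, i.e. the cache is bypassed entirely.
without_cache = retained_bytes(read_manifest.__wrapped__)

print(f"retained with cache:    {with_cache} bytes")
print(f"retained without cache: {without_cache} bytes")
```

In this toy setup the cached variant keeps every result alive while the uncached one retains almost nothing, at the cost of redoing the read on every call.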
   
   > Did you try clearing at execution start/end and find it insufficient?
   
   Not really. Disabling the cache was the simpler solution, and since we 
leverage some other things as well, eradicating this source of memory leaks 
completely seemed the better choice for us.
   
   > Are your Lambda executions particularly long-running or processing very 
large manifest lists?
   
   The average duration was 2-3 min over the last few days; it depends a bit 
on the incoming data in our use case. I haven't checked the manifest list, tbh.
   
   > What memory allocation are you working with?
   
   We allocated `2048MB` to the AWS Lambda. On some days the usage goes up to 
`2022MB`, but it's not very consistent, e.g. yesterday it was mostly around 
`1400MB`. It depends highly on the incoming data for us, but we have had no 
out-of-memory errors since the workaround so far.
   
   Small remark / question reg. your approach:
   
   > Init step - Clear cache at the beginning of Lambda execution
   > Post-execution step - Clear cache after completing the operation
   
   Doesn't that mean you clear your cache twice, one immediately after the 
other? I would have assumed the post-execution cache clearing to be enough.
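
A minimal sketch of the post-execution-only variant, assuming a cache that exposes a clear operation. `load_manifest` and `handler` are hypothetical names, and `functools.lru_cache` stands in for whatever cache is actually being cleared; this is not PyIceberg's API.

```python
import functools


@functools.lru_cache(maxsize=128)
def load_manifest(path: str) -> tuple:
    # Hypothetical cached loader; stands in for the cache under discussion.
    return (path, "entries")


def handler(event: dict, context: object = None) -> int:
    try:
        for path in event["manifests"]:
            load_manifest(path)
        # Number of distinct manifests cached during this invocation.
        return load_manifest.cache_info().currsize
    finally:
        # Post-execution clear: runs on success and when an exception is raised.
        load_manifest.cache_clear()


used = handler({"manifests": ["a.avro", "b.avro", "a.avro"]})
left_over = load_manifest.cache_info().currsize
print(used, left_over)
```

With the clear in a `finally` block, the cache is already empty when the next invocation starts in the same process, so in this sketch an additional clear at the start would find nothing to remove.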
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

