RussellSpitzer commented on PR #13976:
URL: https://github.com/apache/iceberg/pull/13976#issuecomment-3271447046
> @kevinjqliu @RussellSpitzer while working on this I discovered that we
> might still have a memory leak in the Arrow reader. The leak is in the last
> update sequence vectorized reader and the rowId reader. The problem is that
> these don't implement the reuse logic that the other readers do. When
> `ArrowBatchReader` populates the `vectorHolders`:
>
> ```
> for (int i = 0; i < readers.length; i += 1) {
>   vectorHolders[i] = readers[i].read(vectorHolders[i], numRowsToRead);
>   int numRowsInVector = vectorHolders[i].numValues();
> ```
>
> these two readers always allocate a new vector, and we lose the reference to
> the old value vector (hence nobody closes it later on). It is not yet clear
> to me why the tests are not failing. If I recall correctly, @RussellSpitzer,
> you recently added tests to catch similar memory leaks.
>
> If you think that this is indeed a potential memory leak, should we
> address it in a separate item, or in this one?
The tests probably aren't hitting it because they don't populate these
non-table columns (I think?). These columns are populated based on id, not on
the table schema.
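
For illustration, the failure mode described in the quoted comment can be
reproduced in plain Java with no Arrow dependency. This is only a rough sketch:
`TrackingAllocator`, `Buffer`, `LeakyReader`, and `ReusingReader` are
hypothetical stand-ins, not Iceberg or Arrow classes. It assumes the caller
closes only the last holder it still references, as in the loop quoted above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VectorReuseSketch {
  // Hypothetical stand-in for an allocator: counts buffers that are still open.
  static final class TrackingAllocator {
    private final AtomicInteger open = new AtomicInteger();

    Buffer allocate(int capacity) {
      open.incrementAndGet();
      return new Buffer(this, capacity);
    }

    int openBuffers() {
      return open.get();
    }
  }

  // Hypothetical stand-in for a value vector that must be closed explicitly.
  static final class Buffer implements AutoCloseable {
    private final TrackingAllocator owner;
    final long[] values;
    private boolean closed;

    Buffer(TrackingAllocator owner, int capacity) {
      this.owner = owner;
      this.values = new long[capacity];
    }

    @Override
    public void close() {
      if (!closed) {
        closed = true;
        owner.open.decrementAndGet();
      }
    }
  }

  interface BatchReader {
    Buffer read(Buffer reuse, int numRows);
  }

  // Ignores the reuse argument: every call allocates, and the old buffer is orphaned.
  static final class LeakyReader implements BatchReader {
    private final TrackingAllocator allocator;

    LeakyReader(TrackingAllocator allocator) {
      this.allocator = allocator;
    }

    @Override
    public Buffer read(Buffer reuse, int numRows) {
      Buffer fresh = allocator.allocate(numRows);
      fill(fresh, numRows);
      return fresh;
    }
  }

  // Refills the previous buffer when it is large enough; otherwise closes it first.
  static final class ReusingReader implements BatchReader {
    private final TrackingAllocator allocator;

    ReusingReader(TrackingAllocator allocator) {
      this.allocator = allocator;
    }

    @Override
    public Buffer read(Buffer reuse, int numRows) {
      Buffer target = reuse;
      if (target == null || target.values.length < numRows) {
        if (target != null) {
          target.close();
        }
        target = allocator.allocate(numRows);
      }
      fill(target, numRows);
      return target;
    }
  }

  private static void fill(Buffer buffer, int numRows) {
    for (int i = 0; i < numRows; i += 1) {
      buffer.values[i] = i;
    }
  }

  // Mimics the batch loop: overwrite the holder each batch, close only the last one.
  static int openAfterBatches(BatchReader reader, TrackingAllocator allocator, int batches) {
    Buffer holder = null;
    for (int i = 0; i < batches; i += 1) {
      holder = reader.read(holder, 4);
    }
    if (holder != null) {
      holder.close();
    }
    return allocator.openBuffers();
  }

  public static void main(String[] args) {
    TrackingAllocator leaky = new TrackingAllocator();
    System.out.println("leaky reader, open buffers: "
        + openAfterBatches(new LeakyReader(leaky), leaky, 3)); // prints 2
    TrackingAllocator reusing = new TrackingAllocator();
    System.out.println("reusing reader, open buffers: "
        + openAfterBatches(new ReusingReader(reusing), reusing, 3)); // prints 0
  }
}
```

A leak check of this kind only fires if the reader under test is actually
exercised, which would be consistent with tests passing when these columns are
never populated.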
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]