karsov opened a new issue, #371: URL: https://github.com/apache/arrow-go/issues/371
### Describe the bug, including details regarding any error messages, version, and platform. Hi team, I've been playing recently with the [Go SQL Driver For DuckDB](https://github.com/marcboeker/go-duckdb) which uses the arrow code here for some functionalities. Specifically the [RegisterView method here](https://github.com/marcboeker/go-duckdb/blob/main/arrow.go#L251) is making use of the `arrow.cdata.ExportRecordReader()` function. The issue is that in my testing I found that this method is causing a pretty big memory leak that cannot be prevented without changes in `arrow-go`. I was able to fix it by forking and doing changes both in this repo and `go-duckdb`, however as I'm new to `arrow-go` could you please help me understand if I'm missing something and my fixes (described below) are actually unnecessary or they are OK and I should contribute them! Here is the sequence of events I observed when the memory leak is occurring: - In a function... - I create a new `array.RecordReader` (it is created with `refCount = 1`) - I pass that record reader to the `go-duckdb` `RegisterView()` method and therefore to `cdata.ExportRecordReader()` - `cdata.ExportRecordReader()` internally does a `Retain()` on the `array.RecordReader` (thus making `refCount = 2`) - some work is done - The `release()` function returned by the DuckDB method is called. It only frees the memory for `cdata.CArrowArrayStream` (I don't observe the `refCount` going down, so `Release()` is not called to compensate for the internal `Retain()`) - I call `Release()` on the `array.RecordReader` to release its memory, but because the `refCount` was internally increased to 2, now it is 1 and its memory is not released - then the function is called again and again and on every call a new `array.RecordReader` is created whose memory is not released ... and so on and so on until I have millions of record readers in memory. My initial solution was just to call `Release()` on the `array.RecordReader` one more time and that fixed the memory issue right away. However, this doesn't seem like a proper solution given that not my code did the extra `Retain()`. Therefore I attempted to do the change in `go-duckdb` but initially wasn't able to figure out how. The only example of using the `cdata.ExportRecordReader()` where the memory is explicitly released is this test [TestExportRecordReaderStreamLifetime here](https://github.com/apache/arrow-go/blob/main/arrow/cdata/cdata_test.go#L830). It is using the non-exported `C.ArrowArrayStreamRelease()` function which I guess is somehow calling the [streamRelease function](https://github.com/apache/arrow-go/blob/main/arrow/cdata/exports.go#L175) . Therefore I went on to see what would happen if I added in to `cdata` a new `ReleaseCArrowArrayStream()` function only calling `C.ArrowArrayStreamRelease()` (similarly to the [ReleaseCArrowArray here](https://github.com/apache/arrow-go/blob/main/arrow/cdata/interface.go#L282)) and then calling that in the `release()` function for `RegisterView()` in `go-duckdb`. This worked as I expected (no visible memory leak) however I'm still not sure if I and/or `go-duckdb` is using the `cdata.ExportRecordReader()` function incorrectly and all those code changes are unnecessary. Please advise! Thanks, Kalin ### Component(s) Other -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org