RussellSpitzer commented on PR #11781:
URL: https://github.com/apache/iceberg/pull/11781#issuecomment-2569914295

   From a discussion I had with @sopel39 today:
   
   I think we can go forward with this solution, but I think it will basically re-introduce the memory usage issue that we saw previously (for some use cases).
   
   From our discussion, I believe we have been approaching this from the wrong direction.
   
   ## Recap - 
   
   The original implementation basically assumed that we could read as many iterators as parallelism allowed. This led to essentially unbounded memory usage, which became an issue for systems using a lightweight coordinator, since they now required a huge memory footprint.
   
   Next, to solve the memory footprint issue, we added what is essentially a buffered read-ahead. A queue is filled with elements from the files we are reading in parallel, and we check the queue depth every time we add an element in order to bound its size. Since the max queue size is checked for every element, we can never go more than parallelism items over it. Unfortunately, this leads to the current deadlock issue: we can potentially yield an iterator in the middle of a file and be left with only iterators for files which cannot yet be opened, because all file handles are owned by yielded iterators.
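   
   For reference, here is a rough sketch of the shape of that per-element bound (hypothetical names, not the actual ParallelIterable code). The mid-file yield marked in the comments is what sets up the deadlock, because a yielded task keeps its open file handle until it is resumed.
   
   ```java
   // Hypothetical illustration of a per-element bounded read-ahead: the worker
   // checks the shared queue depth before adding each element and yields in the
   // middle of a file when the queue is full.
   import java.util.Iterator;
   import java.util.Queue;
   
   class YieldingFileTask<T> {
     private final Queue<T> sharedQueue;  // read-ahead queue shared by all tasks
     private final int maxQueueSize;
     private Iterator<T> openFile;        // non-null once the file has been opened
   
     YieldingFileTask(Queue<T> sharedQueue, int maxQueueSize) {
       this.sharedQueue = sharedQueue;
       this.maxQueueSize = maxQueueSize;
     }
   
     // Returns true if the file was fully read, false if the task yielded.
     boolean run(Iterable<T> file) {
       if (openFile == null) {
         openFile = file.iterator();      // opens the underlying file
       }
       while (openFile.hasNext()) {
         if (sharedQueue.size() >= maxQueueSize) {
           return false;                  // yield mid-file; the handle stays open
         }
         sharedQueue.add(openFile.next());
       }
       openFile = null;                   // file fully drained, handle released
       return true;
     }
   }
   ```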
   
   The current proposed solution is to switch from checking the queue size per element to checking it only before opening a new file for the first time. This means that any file that is opened is read completely into the read-ahead queue. This fixes the deadlock issue, since we will never yield in the middle of a file, but it possibly reintroduces the memory issue: we could open up to "parallelism" files and load them all into the queue before the consumer has a chance to read them, so we may be back where we started.
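   
   The currently proposed per-file check would look roughly like this (again hypothetical names, just to make the trade-off concrete): no handle is ever held across a yield, but an admitted file is drained into the queue in full.
   
   ```java
   // Hypothetical illustration of the proposed variant: the queue depth is checked
   // only before a file is opened; once open, the whole file goes into the queue.
   import java.util.Iterator;
   import java.util.Queue;
   
   class WholeFileTask<T> {
     private final Queue<T> sharedQueue;
     private final int maxQueueSize;
   
     WholeFileTask(Queue<T> sharedQueue, int maxQueueSize) {
       this.sharedQueue = sharedQueue;
       this.maxQueueSize = maxQueueSize;
     }
   
     // Returns true if the file was read, false if the task yielded before opening it.
     boolean run(Iterable<T> file) {
       if (sharedQueue.size() >= maxQueueSize) {
         return false;                          // yield before opening; no handle is held
       }
       Iterator<T> elements = file.iterator();  // opens the underlying file
       while (elements.hasNext()) {
         sharedQueue.add(elements.next());      // no per-element check for this file
       }
       return true;
     }
   }
   ```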
   
   ## Where do we go from here -
   
   The current implementation is basically trying to solve the issue of:
   *How do we read a bunch of unbounded streams in parallel without consuming too much memory?*
   
   But this is actually more general than what we are trying to do. Our actual problem is:
   *How do we read from a bunch of files without reading too many records at a time?*
   
   The key difference here is that we know exactly how long each file is *before* we open it. Instead of simply opening every file and checking how much we've actually read in parallel, we can open only as many files as are required to fill our read-ahead buffer and yield on the others.
   
   I think what we should do is something like this (I haven't thought this part through too much):
   ```
   AtomicLong usedBuffer
   
   if file not opened {
     synchronized {
       if (usedBuffer + file.length < MAX_BUFFER) {
         usedBuffer += file.length   // reserve the whole file's budget up front
       } else {
         yield, do not open this file yet
       }
     }
   }
   
   read the admitted file into the read-ahead queue
   
   onNext {
     usedBuffer--   // release one element's worth of budget as it is consumed
   }
   ```
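   
   To make that slightly more concrete, here is a hedged Java sketch of the same idea (hypothetical names; not a worked-out design). It assumes the "length" used for admission is the per-file record count we already know before opening the file, so one unit of budget can be released per element as the consumer drains the queue; a CAS loop stands in for the synchronized block above.
   
   ```java
   import java.util.concurrent.ConcurrentLinkedQueue;
   import java.util.concurrent.atomic.AtomicLong;
   
   class LengthBoundedReadAhead<T> {
     private final long maxBufferedRecords;
     private final AtomicLong usedBuffer = new AtomicLong();
     private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
   
     LengthBoundedReadAhead(long maxBufferedRecords) {
       this.maxBufferedRecords = maxBufferedRecords;
     }
   
     // Called once per file, before opening it; returns false to yield the task.
     boolean tryAdmit(long fileRecordCount) {
       while (true) {
         long used = usedBuffer.get();
         if (used + fileRecordCount > maxBufferedRecords) {
           return false;                        // yield, do not open this file yet
         }
         if (usedBuffer.compareAndSet(used, used + fileRecordCount)) {
           return true;                         // budget reserved, safe to open
         }
       }
     }
   
     // Producers add every record of an admitted file, so no mid-file yield is needed.
     void emit(T record) {
       queue.add(record);
     }
   
     // The consumer releases one record's worth of budget per element it takes.
     T next() {
       T record = queue.poll();
       if (record != null) {
         usedBuffer.decrementAndGet();
       }
       return record;
     }
   }
   ```
   
   One detail that would still need working out: a single file larger than the whole budget would never be admitted under this check, so a real implementation would probably need to always allow at least one file to be open.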



