Fokko commented on PR #6775:
URL: https://github.com/apache/iceberg/pull/6775#issuecomment-1460088078

   Did another pass:
   
   - Added `_OrderedChunkedArrayConsumer` so we don't have to reallocate and
sort everything into a single array. We could optimize this further with a
tree structure of iterators, but the [ChunkedArray is currently not
iterable](https://github.com/apache/arrow/issues/34495), so we would need to
wrap it. Python iterators also have no built-in `.peek`, so we would have to add
[another wrapper for
that](https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.peekable).
I expect the number of delete files that affect a data file to be fairly
small, so I think we're good here.
   - Fixed a rather nasty bug where we first filtered the table and then applied
the positional deletes. Filtering shifts the row positions, so the deletes hit
the wrong rows. Instead, when there are positional deletes, I now read all the
data first and then filter on the positions. Of course, this breaks predicate
pushdown in that case, and we'll read everything into Arrow buffers.
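
To illustrate both points, here is a minimal sketch in plain Python. It is not the code in this PR: lists stand in for Arrow `ChunkedArray`s, and `merge_sorted_positions` and `apply_positional_deletes` are hypothetical names. It uses the standard library's `heapq.merge` as one possible way to consume several sorted chunks in order without a hand-written peekable wrapper, and it shows why the deletes have to be applied before the row filter.

```python
import heapq
from typing import Iterable, Iterator, List, Sequence


def merge_sorted_positions(chunks: Iterable[Sequence[int]]) -> Iterator[int]:
    """Lazily merge already-sorted position chunks into one ascending stream.

    Hypothetical helper: heapq.merge keeps one pending value per input,
    so no explicit peek wrapper and no full re-sort is needed.
    """
    return heapq.merge(*chunks)


def apply_positional_deletes(rows: List[dict], deleted: Iterable[int]) -> List[dict]:
    """Drop rows whose position in `rows` appears in the ascending `deleted` stream."""
    deletes = iter(deleted)
    next_delete = next(deletes, None)
    kept = []
    for pos, row in enumerate(rows):
        # Advance past delete positions that are behind us (e.g. duplicates).
        while next_delete is not None and next_delete < pos:
            next_delete = next(deletes, None)
        if next_delete == pos:
            continue  # this row is deleted
        kept.append(row)
    return kept


# Six rows in a data file; positions are the original row offsets 0..5.
rows = [{"id": i, "even": i % 2 == 0} for i in range(6)]
# Two delete files, each sorted by position (assumed for this sketch).
delete_chunks = [[1, 4], [0, 4]]

# Correct order: apply the deletes on the unfiltered rows, then filter.
survivors = apply_positional_deletes(rows, merge_sorted_positions(delete_chunks))
correct = [r["id"] for r in survivors if r["even"]]

# Buggy order: filtering first renumbers the rows, so the deletes miss.
filtered = [r for r in rows if r["even"]]
buggy = [
    r["id"]
    for r in apply_positional_deletes(filtered, merge_sorted_positions(delete_chunks))
]

print(correct)  # [2]
print(buggy)    # [4]
```

With the deletes applied first, only row 2 survives the `even` filter. In the buggy order, positions 0 and 1 of the filtered table are rows 0 and 2, so row 2 is wrongly dropped and the deleted row 4 is returned.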
   
   I think it would be great to merge 
https://github.com/apache/iceberg/pull/6398 first so we can add some 
integration tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


