dojiong opened a new pull request, #1915:
URL: https://github.com/apache/iceberg-rust/pull/1915

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## What changes are included in this PR?
   
   A stack overflow occurs when processing data files that contain a large number of equality deletes (e.g., more than 6000 rows).
   This happens because `parse_equality_deletes_record_batch_stream` previously built the final predicate by calling `.and()` once per row in a loop:
   ```rust
   result_predicate = result_predicate.and(row_predicate.not());
   ```
   This produced a deeply nested, left-skewed tree whose depth equals the number of rows (N). When `rewrite_not()` (which uses a recursive visitor pattern) was subsequently called on this tree, or when the tree was dropped, the recursion exceeded the call stack limit.
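To illustrate why the depth grows linearly, here is a minimal sketch using a toy `Expr` enum. It is a hypothetical stand-in, not the iceberg-rust `Predicate` type, and `fold_left` and `depth` are names invented for this example. It only mirrors the shape of the old loop.

```rust
// Toy stand-in for a predicate tree (an assumption for illustration,
// not the real iceberg-rust `Predicate`).
enum Expr {
    Leaf, // one row predicate
    And(Box<Expr>, Box<Expr>),
}

// Measures tree depth with an explicit stack, so measuring itself
// never recurses.
fn depth(root: &Expr) -> usize {
    let mut max_depth = 0;
    let mut stack = vec![(root, 1usize)];
    while let Some((node, d)) = stack.pop() {
        max_depth = max_depth.max(d);
        if let Expr::And(left, right) = node {
            stack.push((&**left, d + 1));
            stack.push((&**right, d + 1));
        }
    }
    max_depth
}

// Mirrors the old accumulation: acc = acc.and(next) once per row.
// Each iteration wraps the previous result, adding one level.
fn fold_left(rows: usize) -> Expr {
    let mut acc = Expr::Leaf;
    for _ in 1..rows {
        acc = Expr::And(Box::new(acc), Box::new(Expr::Leaf));
    }
    acc
}
```

With this sketch, `depth(&fold_left(1000))` is 1000: every row adds one level, so any recursive traversal (or the default recursive drop) needs one stack frame per row.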
   
   Changes
      1. Balanced Tree Construction: Refactored the predicate combination logic. Instead of linear accumulation, row predicates are collected and combined pairwise to build a balanced tree. This reduces the tree depth from O(N) to O(log N).
      2. Early Rewrite: `rewrite_not()` is now called immediately on each individual row predicate before the predicates are combined. This ensures we combine simplified predicates and avoids traversing a massive unoptimized tree later.
      3. Regression Test: Added `test_large_equality_delete_batch_stack_overflow`, which processes 20,000 equality delete rows to verify the fix.
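The pairwise combination in item 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code in this PR: `Expr` is a toy stand-in for the real predicate type, `combine_balanced` and `depth` are hypothetical names, and the early `rewrite_not()` step from item 2 is not modeled.

```rust
// Toy stand-in for a predicate tree (an assumption for illustration,
// not the real iceberg-rust `Predicate`).
enum Expr {
    Leaf, // one already-simplified row predicate
    And(Box<Expr>, Box<Expr>),
}

// Measures tree depth iteratively with an explicit stack.
fn depth(root: &Expr) -> usize {
    let mut max_depth = 0;
    let mut stack = vec![(root, 1usize)];
    while let Some((node, d)) = stack.pop() {
        max_depth = max_depth.max(d);
        if let Expr::And(left, right) = node {
            stack.push((&**left, d + 1));
            stack.push((&**right, d + 1));
        }
    }
    max_depth
}

// Repeatedly ANDs adjacent pairs until a single node remains.
// Each round halves the number of nodes, so the result has
// O(log N) depth. Returns None for an empty input.
fn combine_balanced(mut level: Vec<Expr>) -> Option<Expr> {
    while level.len() > 1 {
        let mut next = Vec::with_capacity((level.len() + 1) / 2);
        let mut iter = level.into_iter();
        while let Some(left) = iter.next() {
            match iter.next() {
                Some(right) => next.push(Expr::And(Box::new(left), Box::new(right))),
                // An odd element out is carried to the next round unchanged.
                None => next.push(left),
            }
        }
        level = next;
    }
    level.pop()
}
```

For 20,000 leaves this takes 15 halving rounds, so the resulting tree is 16 nodes deep instead of 20,000, which keeps recursive visitors and drops well within the stack limit.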
   
   ## Are these changes tested?
      - [x] New regression test `test_large_equality_delete_batch_stack_overflow` passed.
      - [x] All existing tests in arrow::caching_delete_file_loader passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

