SubhamSinghal opened a new pull request, #21851:
URL: https://github.com/apache/datafusion/pull/21851

## Which issue does this PR close?

- Closes [#16973](https://github.com/apache/datafusion/issues/16973) (partial: adds dynamic filter support for NestedLoopJoinExec)
                                                              
## Rationale for this change

NestedLoopJoinExec handles non-equi joins (range, temporal, inequality) but currently reads all probe-side data even when the build side covers only a narrow range. For example:
                                                                            
```sql
SELECT * FROM events e JOIN windows w ON e.ts BETWEEN w.start AND w.end
```
                                                                                
                                            
If the build side (windows) has start values in [100, 300] and end values in [150, 400], the probe scan reads all events even though only events with ts in [100, 400] can possibly match. With dynamic filters, the probe scan can skip row groups outside this range.
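
The arithmetic behind that range can be sketched in plain Rust. This is a toy illustration, not code from the PR: `RowGroupStats`, `probe_range`, and `can_skip` are hypothetical names, and the sketch assumes integer timestamps with no nulls. For `ts BETWEEN start AND end`, a probe row can match only if `ts >= min(start)` and `ts <= max(end)`, so any row group whose min/max statistics fall entirely outside that interval can be skipped.

```rust
/// Hypothetical row-group statistics: min and max of `ts` within the group.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RowGroupStats {
    min_ts: i64,
    max_ts: i64,
}

/// For `ts BETWEEN start AND end`, a probe row can match only if
/// min(start) <= ts <= max(end). Returns None when the build side is empty.
fn probe_range(windows: &[(i64, i64)]) -> Option<(i64, i64)> {
    let lo = windows.iter().map(|w| w.0).min()?;
    let hi = windows.iter().map(|w| w.1).max()?;
    Some((lo, hi))
}

/// A row group can be skipped when its [min, max] does not overlap [lo, hi].
fn can_skip(stats: RowGroupStats, range: (i64, i64)) -> bool {
    stats.max_ts < range.0 || stats.min_ts > range.1
}
```

With the windows from the example, `probe_range` yields `(100, 400)`; a row group covering ts in [0, 99] is skipped, while one covering [90, 120] must still be read.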
      
                                                                                
                                            
HashJoinExec already supports dynamic filter pushdown for equi-joins. This PR extends the same mechanism to NestedLoopJoinExec (NLJ) for non-equi joins by analyzing the JoinFilter expression to derive probe-side bounds from build-side data.
                                                                                
                                            
## What changes are included in this PR?

New: nlj_filter_analysis.rs, an expression analysis module that walks the JoinFilter expression tree to extract (probe_col, operator, build_col) pairs and derive probe-side bounds:

- extract_bound_pairs(): finds BinaryExpr comparisons between build and probe columns
- compute_build_bounds(): computes min/max from the merged build batch
- build_probe_predicate(): converts bounds into a probe-side filter (e.g., ts >= 100 AND ts <= 400)
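
The three steps above can be sketched on a toy expression tree. Everything below is an assumption made for illustration: the `Expr`, `Side`, `Op`, and `BoundPair` types are stand-ins rather than DataFusion types, the function signatures are invented and only borrow the names from the list, columns are plain `i64` vectors without nulls, and only AND conjunctions are walked.

```rust
// Toy stand-ins for illustration; none of these types come from DataFusion.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Side { Build, Probe }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Lt, LtEq, Gt, GtEq }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(Side, usize),
    Compare(Box<Expr>, Op, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Other, // anything the analysis does not understand
}

/// A comparison normalized to the form `probe_col <op> build_col`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BoundPair { probe_col: usize, op: Op, build_col: usize }

/// Mirror an operator so that `build <op> probe` becomes `probe <flipped> build`.
fn flip(op: Op) -> Op {
    match op {
        Op::Lt => Op::Gt,
        Op::LtEq => Op::GtEq,
        Op::Gt => Op::Lt,
        Op::GtEq => Op::LtEq,
    }
}

/// Walk AND conjunctions and collect column-to-column comparisons
/// that relate a probe column to a build column.
fn extract_bound_pairs(expr: &Expr, out: &mut Vec<BoundPair>) {
    match expr {
        Expr::And(l, r) => {
            extract_bound_pairs(l, out);
            extract_bound_pairs(r, out);
        }
        Expr::Compare(l, op, r) => match (&**l, &**r) {
            (Expr::Column(Side::Probe, p), Expr::Column(Side::Build, b)) => {
                out.push(BoundPair { probe_col: *p, op: *op, build_col: *b })
            }
            (Expr::Column(Side::Build, b), Expr::Column(Side::Probe, p)) => {
                out.push(BoundPair { probe_col: *p, op: flip(*op), build_col: *b })
            }
            _ => {}
        },
        _ => {}
    }
}

/// Min and max of one build column; None if the column is empty.
fn compute_build_bounds(column: &[i64]) -> Option<(i64, i64)> {
    Some((*column.iter().min()?, *column.iter().max()?))
}

/// Turn each pair into a `(probe_col, op, constant)` conjunct:
/// `probe >= build` relaxes to `probe >= min(build)`,
/// `probe <= build` relaxes to `probe <= max(build)`.
fn build_probe_predicate(pairs: &[BoundPair], build: &[Vec<i64>]) -> Vec<(usize, Op, i64)> {
    pairs
        .iter()
        .filter_map(|p| {
            let (min, max) = compute_build_bounds(build.get(p.build_col)?)?;
            let value = match p.op {
                Op::Gt | Op::GtEq => min,
                Op::Lt | Op::LtEq => max,
            };
            Some((p.probe_col, p.op, value))
        })
        .collect()
}
```

The sketch stops at AND because a bound found under an OR branch does not have to hold for every matching row, so it cannot be added as a conjunct. Pairs whose build column is empty or missing produce no conjunct at all.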
                                                                                
                       
Modified: nested_loop_join.rs:

- NLJDynamicFilter struct holding the DynamicFilterPhysicalExpr and the extracted bound pairs
- gather_filters_for_pushdown(): creates the dynamic filter in the Post phase and routes parent filters
- handle_child_pushdown_result(): captures the filter Arc reference
- handle_buffering_left(): after the build data is ready, computes bounds and pushes the filter before the probe starts
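
The lifecycle described in this list (create a placeholder filter at planning time, share it with the probe scan, fill it in once the build side is buffered) can be modeled with a small shared-state sketch. This is not DataFusion's DynamicFilterPhysicalExpr: `DynamicFilter`, `Predicate`, `update`, and `keeps` are hypothetical names chosen for the illustration, and the predicate is reduced to a single inclusive range.

```rust
use std::sync::{Arc, RwLock};

/// Toy predicate: either a pass-through placeholder or an inclusive range.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Predicate {
    AlwaysTrue,
    Range { lo: i64, hi: i64 },
}

/// Hypothetical stand-in for a dynamic filter shared between join and scan.
#[derive(Debug)]
struct DynamicFilter {
    current: RwLock<Predicate>,
}

impl DynamicFilter {
    /// Created during planning, before any build-side data exists.
    fn new() -> Arc<Self> {
        Arc::new(Self { current: RwLock::new(Predicate::AlwaysTrue) })
    }

    /// Called by the join once build-side bounds are known.
    fn update(&self, predicate: Predicate) {
        *self.current.write().unwrap() = predicate;
    }

    /// Called by the probe scan for each value it considers reading.
    fn keeps(&self, ts: i64) -> bool {
        match *self.current.read().unwrap() {
            Predicate::AlwaysTrue => true,
            Predicate::Range { lo, hi } => lo <= ts && ts <= hi,
        }
    }
}
```

The join and the scan each hold a clone of the same `Arc`, so an `update` made after buffering the build side is visible to the scan before it reads probe data. Until then the placeholder keeps every row, which preserves correctness if the filter is never filled in.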
   
## Are these changes tested?

Yes, with unit tests.
   
## Are there any user-facing changes?

No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

