sgup432 opened a new pull request, #15793:
URL: https://github.com/apache/lucene/pull/15793

   ### Description
   
   <!--
   If this is your first contribution to Lucene, please make sure you have 
reviewed the contribution guide.
   https://github.com/apache/lucene/blob/main/CONTRIBUTING.md
   -->
   
   Related issue with more details: 
https://github.com/apache/lucene/issues/15770
   
    - This PR adds MultiFieldDocValuesRangeQuery, which coordinates 
DocValuesSkipper evaluation across fields. BooleanQuery.rewrite() detects the 
pattern (2+ required NumericDocValuesRangeQuery clauses on distinct fields) and 
replaces them with a single coordinated query.
   
    - MultiFieldDocValuesRangeQuery contains a concatenated iterator where the 
main logic lives: it drives the DocValuesSkipper of every requested field and 
advances them in lockstep.
   
   - Also includes a JMH benchmark to validate the optimization.
   
    - Tested across different data patterns, document counts, and numbers of 
concurrent range fields.
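   The coordinated advancing described above is essentially a leapfrog conjunction across per-field skippers: any field may skip the shared cursor forward, and a doc is emitted only when every field agrees on it. Below is a minimal, self-contained sketch of that idea in plain Java — `FieldSkipper`, `SortedDocsSkipper`, and `ConjunctionSkipper` are hypothetical names for illustration, not Lucene's actual `DocValuesSkipper` API:

   ```java
   import java.util.List;

   /** Illustrative stand-in for a per-field range check backed by skip data. */
   interface FieldSkipper {
       int NO_MORE_DOCS = Integer.MAX_VALUE;
       /** Returns the first docID >= target whose value is in range, or NO_MORE_DOCS. */
       int advance(int target);
   }

   /** Simple skipper over a sorted array of matching docIDs (models one field's matches). */
   final class SortedDocsSkipper implements FieldSkipper {
       private final int[] docs;
       SortedDocsSkipper(int... docs) { this.docs = docs; }
       @Override public int advance(int target) {
           for (int d : docs) if (d >= target) return d;
           return NO_MORE_DOCS;
       }
   }

   /** Coordinates several skippers: a doc matches only if every field matches it. */
   final class ConjunctionSkipper {
       private final List<FieldSkipper> fields;
       ConjunctionSkipper(List<FieldSkipper> fields) { this.fields = fields; }

       /** Leapfrog: advance all fields to the same doc; any field may push the rest ahead. */
       int nextMatch(int target) {
           int doc = target;
           boolean agreed = false;
           while (!agreed) {
               agreed = true;
               for (FieldSkipper f : fields) {
                   int next = f.advance(doc);
                   if (next == FieldSkipper.NO_MORE_DOCS) return FieldSkipper.NO_MORE_DOCS;
                   if (next > doc) { doc = next; agreed = false; }
               }
           }
           return doc;
       }
   }
   ```

   One selective field can skip the conjunction past blocks the other fields would otherwise have to check individually, which is the effect the clustered benchmark rows show most strongly.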
   
   
   
   ## JMH Benchmark Results
   
   Throughput figures (higher is better):
   
   | Pattern   | Docs | Fields | Without Optimization | With (Run 1) | With (Run 2) | Speedup |
   |-----------|------|--------|----------------------|--------------|--------------|---------|
   | clustered | 1M   | 3      | 16,417               | 60,758       | 61,342       | **3.7x** |
   | clustered | 1M   | 5      | 11,523               | 55,922       | 57,487       | **5.0x** |
   | clustered | 10M  | 3      | 16,148               | 54,827       | 55,677       | **3.4x** |
   | clustered | 10M  | 5      | 13,128               | 40,920       | 42,154       | **3.2x** |
   | mixed     | 1M   | 3      | 859                  | 836          | 1,001        | **1.17x** |
   | mixed     | 1M   | 5      | 514                  | 706          | 873          | **1.70x** |
   | mixed     | 10M  | 3      | 76                   | 79           | 79           | **1.03x** |
   | mixed     | 10M  | 5      | 50                   | 65           | 69           | **1.38x** |
   | random    | 1M   | 3      | 62                   | 65           | 68           | **1.10x** |
   | random    | 1M   | 5      | 45                   | 65           | 64           | **1.42x** |
   | random    | 10M  | 3      | 4.3                  | 6.4          | 6.5          | **1.51x** |
   | random    | 10M  | 5      | 3.5                  | 5.7          | 5.8          | **1.65x** |
   | sorted    | 1M   | 3      | 920                  | 881          | 841          | 0.91x |
   | sorted    | 1M   | 5      | 611                  | 711          | 882          | **1.44x** |
   | sorted    | 10M  | 3      | 69                   | 75           | 78           | **1.14x** |
   | sorted    | 10M  | 5      | 55                   | 67           | 68           | **1.22x** |
   
   
   **Data Patterns:**
   
   - **clustered**: All field values increase with docID (e.g., time-series 
data where timestamp, sequence number, and sensor readings grow together). 
Narrow query ranges eliminate most blocks. Best case for coordination 
(3.2–5.0x).
   
   - **mixed**: Combination of monotonic (timestamp), low-cardinality (20 
values, like order status), and random fields (price). Resembles e-commerce 
order filtering. Moderate gains (1.2–1.7x).
   
   - **sorted**: Index sorted by one field (timestamp), other fields random. 
Resembles time-series indexed by ingestion time but queried on unsorted metric 
fields. Similar to mixed (1.1–1.4x).
   
   - **random**: All fields uniformly random with wide query ranges. Worst 
case, but still gains (1.1–1.7x) — when one field eliminates a block, it saves 
checking all others.
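   For reference, the four value distributions can be sketched roughly as follows; `PatternGen` and its methods are hypothetical illustrations of the shapes described above, not the actual benchmark code:

   ```java
   import java.util.Random;

   /** Illustrative value generators for the benchmark data shapes (hypothetical). */
   final class PatternGen {
       private final Random rnd;
       PatternGen(long seed) { this.rnd = new Random(seed); }

       /** clustered: every field grows with docID, like co-advancing timestamps/sequence numbers. */
       long clustered(int docId, int field) { return (long) docId * 10 + field; }

       /** mixed: field 0 monotonic (timestamp-like), field 1 low-cardinality (~20 statuses),
        *  remaining fields uniform random (price-like). */
       long mixed(int docId, int field) {
           if (field == 0) return docId;
           if (field == 1) return rnd.nextInt(20);
           return rnd.nextLong();
       }

       /** random: every field uniformly random (worst case for block elimination). */
       long random() { return rnd.nextLong(); }
   }
   ```

   The "sorted" pattern corresponds to indexing mixed/random values but applying an index sort on the monotonic field, so only that one field's blocks stay value-clustered on disk.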


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

