xiaoxuandev opened a new pull request, #13451:
URL: https://github.com/apache/iceberg/pull/13451

   This PR implements limit pushdown optimization for Iceberg on Spark 3.5 and 4.0, enabling early termination during scan task planning to improve performance for `LIMIT` queries. Resolves: #13383
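Below is a minimal sketch of limit-aware task planning. It assumes tasks arrive in planning order and expose a per-file record count. The `Task` record and `planWithLimit` method are hypothetical. They are loosely modeled on what Iceberg's `FileScanTask` exposes (`file().recordCount()`, `deletes()`, `residual()`), and this is not the PR's implementation. Only tasks with no delete files and no residual filter count toward the limit, because only those are guaranteed to return every row in the file.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitAwarePlanning {

  // Hypothetical stand-in for a file scan task (NOT the Iceberg API).
  // hasDeletes: delete files apply to this data file.
  // hasResidual: a filter must still be evaluated on rows after the scan.
  record Task(String path, long recordCount, boolean hasDeletes, boolean hasResidual) {}

  // Plans tasks in order and stops as soon as the tasks whose row counts
  // are exact cover at least `limit` rows.
  static List<Task> planWithLimit(Iterable<Task> tasks, long limit) {
    List<Task> planned = new ArrayList<>();
    long guaranteedRows = 0;
    for (Task task : tasks) {
      if (guaranteedRows >= limit) {
        break; // early termination: remaining tasks are never planned
      }
      planned.add(task);
      // Deletes or a residual filter may drop rows, so such tasks are
      // planned but do not count toward the limit.
      if (!task.hasDeletes() && !task.hasResidual()) {
        guaranteedRows += task.recordCount();
      }
    }
    return planned;
  }

  public static void main(String[] args) {
    List<Task> tasks = List.of(
        new Task("a.parquet", 40, false, false),
        new Task("b.parquet", 40, true, false),
        new Task("c.parquet", 40, false, false),
        new Task("d.parquet", 40, false, false));
    List<Task> planned = planWithLimit(tasks, 70);
    System.out.println(planned.stream().map(Task::path).toList());
    // prints [a.parquet, b.parquet, c.parquet]
  }
}
```

With a limit of 70, planning stops before `d.parquet`. `b.parquet` is still planned but not counted, because its delete files may remove rows.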
   
   ### Notes
   
   Since Spark's native limit pushdown has limitations when filters are present, this implementation:
   
   1. Leverages Spark's native partial limit pushdown when available  
      _(e.g., `SELECT * FROM table LIMIT n` or queries with partition pruning)_
   
   2. Implements Iceberg-level early termination during task group planning once the required number of records is reached.
   
   3. Disables limit pushdown when `preserve-data-grouping` is enabled.
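The pushdown decision in items 1 and 3 can be sketched as a scan builder that accepts a limit only when data grouping does not need to be preserved. The class below is hypothetical. Its method names mirror Spark's `SupportsPushDownLimit` (`pushLimit(int)` and `isPartiallyPushed()`), but it does not implement that interface. The `preserveDataGrouping` flag is an assumed stand-in for however the read configuration is resolved.

```java
import java.util.OptionalInt;

public class LimitPushdownGate {

  // Hypothetical scan builder; NOT Spark's or Iceberg's actual class.
  static final class ScanBuilderSketch {
    private final boolean preserveDataGrouping; // assumed to come from the read conf
    private Integer pushedLimit = null;

    ScanBuilderSketch(boolean preserveDataGrouping) {
      this.preserveDataGrouping = preserveDataGrouping;
    }

    // Returns true if the limit was accepted for use during scan planning.
    boolean pushLimit(int limit) {
      if (preserveDataGrouping) {
        return false; // per the PR notes, pushdown is disabled in this mode
      }
      this.pushedLimit = limit;
      return true;
    }

    // Partial pushdown: Spark still applies its own LIMIT on top of the scan,
    // so the source may safely return more rows than requested.
    boolean isPartiallyPushed() {
      return true;
    }

    OptionalInt pushedLimit() {
      return pushedLimit == null ? OptionalInt.empty() : OptionalInt.of(pushedLimit);
    }
  }

  public static void main(String[] args) {
    ScanBuilderSketch plain = new ScanBuilderSketch(false);
    System.out.println(plain.pushLimit(100) + " " + plain.pushedLimit());
    // prints "true OptionalInt[100]"

    ScanBuilderSketch grouped = new ScanBuilderSketch(true);
    System.out.println(grouped.pushLimit(100) + " " + grouped.pushedLimit());
    // prints "false OptionalInt.empty"
  }
}
```

Returning `true` from `isPartiallyPushed()` is what makes early termination safe to over-approximate. The source only needs to return *at least* the requested rows, and Spark trims the rest.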
   
   ### Testing
   
   - Unit Tests
   - Performance Benchmarks
   
   #### Benchmark Results
   (These results are illustrative; tables with a large number of data files generally see longer execution times when limit pushdown is disabled.)
   #### 1 row per data file
   
   | Query Type  | Limit | Pushdown Enabled (s) | Pushdown Disabled (s) | Improvement       |
   |-------------|-------|----------------------|-----------------------|-------------------|
   | Limit Query | 100   | 0.093                | 37.96                 | **99.75% faster** |
   | Limit Query | 1000  | 0.484                | 41.04                 | **98.82% faster** |
   | Limit Query | 10000 | 7.023                | 38.99                 | **81.99% faster** |
   
   #### 5000 rows per data file
   
   | Query Type  | Limit | Pushdown Enabled (s) | Pushdown Disabled (s) | Improvement      |
   |-------------|-------|----------------------|-----------------------|------------------|
   | Limit Query | 100   | 0.0163               | 0.0488                | **66.5% faster** |
   | Limit Query | 1000  | 0.0170               | 0.0499                | **66.0% faster** |
   | Limit Query | 10000 | 0.0177               | 0.0632                | **71.9% faster** |
   
   #### 20000 rows per data file
   
   | Query Type  | Limit | Pushdown Enabled (s) | Pushdown Disabled (s) | Improvement      |
   |-------------|-------|----------------------|-----------------------|------------------|
   | Limit Query | 100   | 0.0416               | 0.0529                | **21.4% faster** |
   | Limit Query | 1000  | 0.0421               | 0.0524                | **19.7% faster** |
   | Limit Query | 10000 | 0.0422               | 0.0576                | **26.7% faster** |
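The Improvement columns are consistent with the relative reduction in query time. This formula is inferred from the reported numbers rather than stated in the PR, and it matches them to within rounding of the displayed times:

```math
\text{Improvement} = \frac{T_{\text{disabled}} - T_{\text{enabled}}}{T_{\text{disabled}}} \times 100\%
```

For example, with 1 row per data file and `LIMIT 1000`: (41.04 − 0.484) / 41.04 = 40.556 / 41.04 ≈ 0.9882, i.e. about 98.82% faster.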
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

