GPX99 opened a new issue, #13383: URL: https://github.com/apache/iceberg/issues/13383
### Feature Request / Improvement

Request to add `limit` pushdown to improve the performance of reading a big table by avoiding a full batch scan; the batch scan is implemented [here](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.1/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java#L755-L762). A hedged sketch of the Spark hook this could use appears after the checklist below.

**How is this observed?**

When running `select * from table_name limit 1`, Spark will actually scan all the data in the table; the bigger the table, the longer the query takes. For example:

```
(1) BatchScan glue_catalog.lakehouse_bronze.table_name
Output [51]: [ISTEST#69, LEADUUID#70, UPDATEDAT#71, ...etc]
glue_catalog.lakehouse_bronze.table_name (branch=null) [filters=, groupedBy=]  <-- no limit pushdown
```

Hence, the input size is large:

<img width="687" alt="Image" src="https://github.com/user-attachments/assets/864d9349-6280-439f-8689-4a66541a6e4c" />

### Query engine

Spark

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
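For context, Spark's DataSource V2 API already defines `org.apache.spark.sql.connector.read.SupportsPushDownLimit` (Spark 3.3+), which a scan builder can implement so the optimizer hands the `LIMIT` value to the source. The sketch below only illustrates that hook under stated assumptions: the class name `LimitAwareScanBuilder` and the `pushedLimit` field are hypothetical, and this is not Iceberg's actual `SparkScanBuilder` code.

```java
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.SupportsPushDownLimit;

// Hypothetical class name, for illustration only; not Iceberg's SparkScanBuilder.
public class LimitAwareScanBuilder implements SupportsPushDownLimit {

  private int pushedLimit = -1; // hypothetical field; -1 means no limit was pushed

  @Override
  public boolean pushLimit(int limit) {
    this.pushedLimit = limit;
    // Returning false requests partial pushdown: Spark still applies LIMIT on
    // top, so the source only has to avoid reading far more than `limit` rows.
    return false;
  }

  @Override
  public Scan build() {
    // Hypothetical: a real implementation would thread pushedLimit into task
    // planning, e.g. stop accepting FileScanTasks once their accumulated
    // record counts cover the limit.
    throw new UnsupportedOperationException("illustrative sketch only");
  }
}
```

Returning `false` from `pushLimit` is the conservative choice here, since a data file's record count can overestimate the number of live rows once delete files apply; Spark then re-enforces the limit while the source still gets to prune most of the scan.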