mbutrovich commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-4801629397
So for kicks I turned it on in Comet with TPC-DS SF1000. We still see pretty large regressions:

<img width="3500" height="600" alt="Image" src="https://github.com/user-attachments/assets/eba54a2b-25f1-41e9-8a25-963ea1df42b1" />

<img width="1000" height="600" alt="Image" src="https://github.com/user-attachments/assets/6d127cfc-45e9-434a-b743-75e8dbb03a7d" />

If I dig into the biggest regression, Q88, with the Spark UI:

No row-level filtering/filter pushdown/late materialization:

<img width="756" height="675" alt="Image" src="https://github.com/user-attachments/assets/0184c0b0-aae8-494f-ac10-d6385962eff1" />

Row-level filtering/filter pushdown/late materialization:

<img width="756" height="712" alt="Image" src="https://github.com/user-attachments/assets/b3143fe0-da2d-46ae-830d-1ca4215ff986" />

Even if we add an optimization to omit the CometFilter node when everything is pushed into the scan, the extra time spent in the scan outweighs the time saved by eliding the CometFilter, so it's strictly slower.

I'm not sure what the next steps would be to help optimize this, and I don't have cycles for it in the immediate future.
