Dandandan commented on issue #21598: URL: https://github.com/apache/datafusion/issues/21598#issuecomment-4295379544
Cool @alamb. I have a PoC / bench showing the impact of going from file-level to row-group-level morsels, using an approach similar to the earlier one with 2 queues (files / morsels): https://github.com/apache/datafusion/pull/21766

* On partitioned clickbench it looks slightly better overall, but the current approach seems to add a small bit of overhead on some queries (not sure yet if it is due to doing things multiple times or...)
* On clickbench_1 (single file) it yields big gains!
* On TPC-DS it is still the same, as everything is a single row group.

Using `try_next_reader` looks great; I think it might simplify the implementation a bit.

I didn't try to delay morsel splitting to the query tail; I just always split into row groups.
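To make the two-queue idea concrete, here is a minimal single-threaded sketch. It is not the code from the PR and does not use any DataFusion API: `FileTask`, `Morsel`, `WorkQueues` and `next_morsel` are hypothetical names, and "opening" a file is reduced to reading a row group count that is assumed to be known up front.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an unopened file: a path and its row group count.
#[derive(Debug, Clone, PartialEq)]
struct FileTask {
    path: String,
    num_row_groups: usize,
}

/// Hypothetical morsel: one row group of one file.
#[derive(Debug, Clone, PartialEq)]
struct Morsel {
    path: String,
    row_group: usize,
}

/// Two queues: files that have not been opened yet, and row group
/// morsels that are ready to be scanned.
struct WorkQueues {
    files: VecDeque<FileTask>,
    morsels: VecDeque<Morsel>,
}

impl WorkQueues {
    fn new(files: Vec<FileTask>) -> Self {
        Self {
            files: files.into(),
            morsels: VecDeque::new(),
        }
    }

    /// Hand out the next unit of work. Ready morsels are preferred; when
    /// none are left, the next file is taken and always split into one
    /// morsel per row group (no delayed splitting at the query tail).
    fn next_morsel(&mut self) -> Option<Morsel> {
        loop {
            if let Some(morsel) = self.morsels.pop_front() {
                return Some(morsel);
            }
            // Returns None once both queues are empty.
            let file = self.files.pop_front()?;
            for row_group in 0..file.num_row_groups {
                self.morsels.push_back(Morsel {
                    path: file.path.clone(),
                    row_group,
                });
            }
            // A file with zero row groups adds nothing, so loop again.
        }
    }
}
```

In this sketch a single file with many row groups turns into many independent morsels, while a file with one row group yields exactly one morsel, the same granularity as file-level scheduling. A real implementation would share the queues between partitions (for example behind a lock) and read the row group count from the Parquet metadata.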
