Dandandan commented on issue #21598:
URL: https://github.com/apache/datafusion/issues/21598#issuecomment-4295379544

   Cool, @alamb.
   
   I have a PoC / benchmark showing the impact of going from file-level to 
row-group-level morsels, using an approach similar to the earlier one with 
2 queues (files / morsels).
   
   https://github.com/apache/datafusion/pull/21766
   
   * On partitioned clickbench it looks slightly better overall, but the current 
approach seems to add a small amount of overhead on some queries (not sure yet 
whether that is due to doing things multiple times or something else).
   * On clickbench_1 (single file) it yields big gains!
   * On TPC-DS it is still the same, as everything is a single row group.
   
   Using `try_next_reader` looks great. I think it might simplify the 
implementation a bit.
   
   I didn't try to delay morsel splitting to the query tail; it just always 
splits into row groups.
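
   For readers unfamiliar with the two-queue idea, here is a minimal, 
single-threaded sketch of how a files queue and a morsels queue could 
interact. It is not the code from the PoC: `WorkQueues`, `FileWork`, `Morsel` 
and `next_morsel` are hypothetical names invented for illustration, and a real 
implementation would share the queues between worker threads and read row 
group counts from Parquet metadata.

```rust
use std::collections::VecDeque;

/// Hypothetical unit of work: one row group of one file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Morsel {
    pub file: String,
    pub row_group: usize,
}

/// Hypothetical file entry: a name plus its (assumed known) row group count.
#[derive(Debug, Clone)]
pub struct FileWork {
    pub name: String,
    pub row_groups: usize,
}

/// Two queues: files not yet split, and row-group morsels ready to scan.
#[derive(Default)]
pub struct WorkQueues {
    files: VecDeque<FileWork>,
    morsels: VecDeque<Morsel>,
}

impl WorkQueues {
    pub fn new(files: Vec<FileWork>) -> Self {
        Self {
            files: files.into(),
            morsels: VecDeque::new(),
        }
    }

    /// Hand out an already-split morsel if one exists; otherwise take the
    /// next file and always split it into one morsel per row group.
    pub fn next_morsel(&mut self) -> Option<Morsel> {
        loop {
            if let Some(morsel) = self.morsels.pop_front() {
                return Some(morsel);
            }
            // Returns None once both queues are empty.
            let file = self.files.pop_front()?;
            for row_group in 0..file.row_groups {
                self.morsels.push_back(Morsel {
                    file: file.name.clone(),
                    row_group,
                });
            }
            // Loop again: a file with zero row groups adds nothing, so we
            // move on to the next file instead of returning early.
        }
    }
}
```

   Under these assumptions, a single file with many row groups produces many 
morsels that separate workers could pick up, while a file with exactly one 
row group produces the same single unit of work as file-level scheduling.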


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

