asolimando commented on PR #21828: URL: https://github.com/apache/datafusion/pull/21828#issuecomment-4341946315
I understand where the need comes from, but there is a good reason why databases treat scans without ORDER BY as unordered: many logical and physical planning optimizations depend on this assumption, and they can only rely on metadata to tell whether the plan changes they want to make are safe. If the underlying data is truly sorted on something that can be encoded like an ORDER BY clause (or that at least produces the same ordering metadata DataFusion uses), that could be fine. But if the order is just the order rows happen to have in the files, and we can't encode this promise anywhere, then it gets complex. At that point, the planner would need a mode that disables every optimization that can change the result-set order in the absence of an ORDER BY. That is a non-trivial level of scrutiny that every future contribution to the planner would have to go through, and since it is a major deviation from the SQL standard, everything would have to be re-checked for safety, which is a major task.
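To make the metadata point concrete, here is a toy sketch in Python. It is not DataFusion code: `Scan`, `round_robin_repartition`, and `execute` are hypothetical names invented for illustration. It only shows the general idea that a planner looks at declared ordering metadata, not at how the rows happen to sit in the file, when deciding whether a reordering rewrite is allowed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scan:
    """Hypothetical scan node; not a DataFusion type."""
    rows: List[int]
    # Declared ordering metadata. None means "no promise",
    # which is what a scan without ORDER BY gives the planner.
    ordering: Optional[str] = None


def round_robin_repartition(rows: List[int], n: int) -> List[int]:
    """Deal rows across n partitions, then concatenate the partitions.

    The same rows come back, but generally in a different order.
    """
    parts = [rows[i::n] for i in range(n)]
    return [r for part in parts for r in part]


def execute(scan: Scan, partitions: int) -> List[int]:
    """Apply the reordering rewrite only when the metadata allows it."""
    if scan.ordering is None:
        # No declared ordering: the rewrite is considered safe,
        # even if the file happened to be sorted.
        return round_robin_repartition(scan.rows, partitions)
    # Ordering is declared: this toy planner skips the rewrite.
    return list(scan.rows)


rows = [1, 2, 3, 4, 5, 6]
undeclared = execute(Scan(rows), partitions=2)
declared = execute(Scan(rows, ordering="value ASC"), partitions=2)
```

In this sketch the undeclared scan returns the same rows in a different order, while the scan that declares its ordering keeps it. A "preserve file order" mode without metadata would mean auditing every rewrite like `round_robin_repartition`, which is the concern raised above.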
