asolimando commented on PR #21828:
URL: https://github.com/apache/datafusion/pull/21828#issuecomment-4341946315

   I understand where the need comes from, but there is a good reason why 
databases treat scans without ORDER BY as unordered: many logical and physical 
planning optimizations depend on this assumption, and they can rely only on 
metadata to tell whether the plan changes they want to make are safe.
   
   If the underlying data is truly sorted on something that can be encoded 
similarly to what you can write with an ORDER BY (or that at least produces the 
same metadata DataFusion uses), that could be fine. But if the order is just 
the order rows happen to have in the files, and we can't encode this promise 
anywhere, then it gets complex.
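
   To make the distinction concrete, here is a toy Rust sketch. It is not 
DataFusion's API: `DeclaredOrdering`, `Rewrite`, `choose_rewrite`, and 
`run_scan` are hypothetical names invented for illustration, and the two 
partitions stand in for two files. The point is only that the planner sees the 
declared metadata, never the rows, so an order that merely "happens to be 
there" is invisible to it.

```rust
/// Hypothetical stand-in for the ordering metadata a planner consults.
#[derive(Debug, Clone, PartialEq)]
enum DeclaredOrdering {
    /// Nothing is promised about row order (a scan without ORDER BY).
    Unordered,
    /// Rows are promised to be sorted ascending on these columns.
    SortedBy(Vec<String>),
}

/// Hypothetical rewrites a planner might apply to parallelize a scan.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rewrite {
    /// Read partitions in parallel and interleave their rows.
    InterleavePartitions,
    /// Read partitions in parallel, then merge them on the sort key.
    SortPreservingMerge,
}

/// The planner decides from metadata alone; it never inspects the data.
fn choose_rewrite(ordering: &DeclaredOrdering) -> Rewrite {
    match ordering {
        DeclaredOrdering::Unordered => Rewrite::InterleavePartitions,
        DeclaredOrdering::SortedBy(_) => Rewrite::SortPreservingMerge,
    }
}

/// Round-robin interleaving: legal when no order is promised.
fn interleave(parts: &[Vec<i32>]) -> Vec<i32> {
    let longest = parts.iter().map(|p| p.len()).max().unwrap_or(0);
    let mut out = Vec::new();
    for i in 0..longest {
        for p in parts {
            if let Some(v) = p.get(i) {
                out.push(*v);
            }
        }
    }
    out
}

/// K-way merge of partitions that are each sorted ascending.
fn merge_sorted(parts: &[Vec<i32>]) -> Vec<i32> {
    let mut cursors = vec![0usize; parts.len()];
    let mut out = Vec::new();
    loop {
        // Find the partition whose next unread row is smallest.
        let mut best: Option<(usize, i32)> = None;
        for (idx, p) in parts.iter().enumerate() {
            if let Some(&v) = p.get(cursors[idx]) {
                if best.map_or(true, |(_, b)| v < b) {
                    best = Some((idx, v));
                }
            }
        }
        match best {
            Some((idx, v)) => {
                out.push(v);
                cursors[idx] += 1;
            }
            None => return out,
        }
    }
}

/// Apply whichever rewrite the metadata allows and produce the rows.
fn run_scan(ordering: &DeclaredOrdering, parts: &[Vec<i32>]) -> Vec<i32> {
    match choose_rewrite(ordering) {
        Rewrite::InterleavePartitions => interleave(parts),
        Rewrite::SortPreservingMerge => merge_sorted(parts),
    }
}
```

   With partitions `[1, 2, 3]` and `[4, 5, 6]`, `run_scan` under `Unordered` 
returns `[1, 4, 2, 5, 3, 6]`, while under `SortedBy(["id"])` it returns 
`[1, 2, 3, 4, 5, 6]`. The data is identical in both cases; only the declared 
promise differs.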
   
   At that point, the planner would need a mode that disables every 
optimization that can change the result set order when there is no ORDER BY. 
That is definitely a non-trivial level of scrutiny, one that every future 
contribution to the planner would have to go through. And since it's a major 
deviation from the SQL standard, everything would have to be re-checked for 
safety, which is a major task.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

