asolimando commented on PR #21388: URL: https://github.com/apache/datafusion/pull/21388#issuecomment-4221924837
I don't think this is necessarily a bug. There are two kinds of semantics for approximate UDFs like this one: discrete, where the output is guaranteed to be one of the input values, and continuous, where any intermediate value can be returned and should be expected. Switching between the two semantics based on the number of values doesn't seem desirable, especially because compression is usually a sketch parameter that is not necessarily visible or relevant to the end user.

You cite determinism, and I agree that's a property we must preserve, but it is orthogonal to discrete vs. continuous. I believe results are still deterministic even when compression kicks in, so unless you can reproduce a non-deterministic example, I'd remove that from the current discussion.
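
To make the discrete vs. continuous distinction concrete, here is a minimal sketch over exact, already-sorted data. The function names `percentile_disc` and `percentile_cont` are only illustrative (borrowed from the usual SQL naming); this is not DataFusion's implementation and does not involve a sketch or compression at all. It assumes a non-empty slice and `p` in `[0, 1]`.

```rust
/// Discrete percentile (nearest rank): always returns one of the input values.
/// Assumes `sorted` is non-empty, sorted ascending, and 0.0 <= p <= 1.0.
fn percentile_disc(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty());
    let n = sorted.len();
    // Smallest 1-based rank whose cumulative fraction is >= p, clamped to [1, n].
    let rank = ((p * n as f64).ceil() as usize).clamp(1, n);
    sorted[rank - 1]
}

/// Continuous percentile: linearly interpolates between the two neighbouring
/// values, so the result may be a value that never appeared in the input.
/// Same assumptions as `percentile_disc`.
fn percentile_cont(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty());
    let n = sorted.len();
    // Fractional position in the 0-based index range [0, n - 1].
    let pos = p * (n - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    sorted[lo] + (sorted[hi] - sorted[lo]) * frac
}
```

For the input `[1.0, 2.0, 3.0, 4.0]` and `p = 0.5`, the discrete version returns `2.0` (an input value), while the continuous version returns `2.5` (an interpolated value). Both are fully deterministic, which is why I see determinism as a separate question from which semantics the function should follow.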
