adriangb commented on PR #22000: URL: https://github.com/apache/datafusion/pull/22000#issuecomment-4375608962
Regarding how other systems implement this: it seems most columnar, analytical DBs are approximate for the same reasons we are.

Databricks: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-select-sampling

<img width="896" height="237" alt="image" src="https://github.com/user-attachments/assets/e8974f3b-59cd-4c25-9003-355163ef1638" />

BigQuery: https://docs.cloud.google.com/bigquery/docs/table-sampling

<img width="877" height="294" alt="Screenshot 2026-05-04 at 7 19 59 PM" src="https://github.com/user-attachments/assets/83e4d314-d7b0-419c-85d1-3e6695f818b8" />

DuckDB: https://duckdb.org/docs/current/sql/samples

<img width="916" height="210" alt="image" src="https://github.com/user-attachments/assets/f422938f-24d4-45da-a222-1d84b9a83c31" />

Similar for [Spark](https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-sampling.html) and [Trino](https://trino.io/docs/current/sql/select.html#tablesample). I think it's also worth highlighting that DuckDB has the behavior built in.

In other words, @alamb: if we merged 202641277d4ccc6bf735a2752f38821c508dc20e, that's at least enough for someone like me to make it work in my system / in future DataFusion use cases. But I think it's valuable to take it one step further and expose this to users of `datafusion-cli`, etc.
