GitHub user yjhjstz added a comment to the discussion: Extend the gpfdist tool to support SFTP/HDFS protocols for high-performance multi-source data ingestion
I'd like to propose enhancing CloudBerry by integrating DuckDB, leveraging its [Data Sources](https://duckdb.org/docs/stable/data/data_sources.html) capability. DuckDB supports querying a wide variety of sources directly via SQL: CSV, Parquet, JSON, SQLite, PostgreSQL, MySQL, and cloud storage (S3, Azure Blob, Iceberg, etc.).

Why this matters for CloudBerry

- Direct data source access: DuckDB can read and filter data from diverse formats and storage systems without preprocessing.
- In-process analytics engine: Embedding DuckDB allows CloudBerry to push down complex SQL operations into a fast, vectorized OLAP engine.

GitHub link: https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13698874

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org