GitHub user yjhjstz added a comment to the discussion: Extend the gpfdist tool 
to support SFTP/HDFS protocols for high-performance multi-source data ingestion

I'd like to propose enhancing Cloudberry by integrating DuckDB and leveraging its 
[Data Sources](https://duckdb.org/docs/stable/data/data_sources.html) 
capability. DuckDB can query a wide variety of sources directly via SQL: CSV, 
Parquet, JSON, SQLite, PostgreSQL, MySQL, and cloud storage (S3, Azure Blob, 
Iceberg, etc.).

Why this matters for Cloudberry:
Direct data source access: DuckDB can read and filter data from diverse formats 
and storage systems without preprocessing.

In-process analytics engine: Embedding DuckDB would allow Cloudberry to push down 
complex SQL operations into a fast, vectorized OLAP engine.
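
To make the direct-access point concrete, here is a minimal sketch using DuckDB's standalone Python package (`pip install duckdb`). It only shows DuckDB reading and filtering a CSV file in place; it says nothing about how a Cloudberry integration would actually be wired up. The file name `sales.csv`, the `region` and `amount` columns, and the helper names are all hypothetical and chosen purely for illustration.

```python
import csv
import tempfile
from pathlib import Path


def build_filter_query(csv_path: str) -> str:
    """Return SQL that scans a CSV file in place and aggregates the filtered rows."""
    # Escape single quotes so the path is a valid SQL string literal.
    escaped = csv_path.replace("'", "''")
    # read_csv_auto is DuckDB's CSV table function with schema auto-detection.
    return (
        "SELECT region, SUM(amount) AS total "
        f"FROM read_csv_auto('{escaped}') "
        "WHERE amount > ? "
        "GROUP BY region ORDER BY region"
    )


def write_sample_csv(directory: str) -> str:
    """Write a tiny, made-up CSV file and return its path."""
    path = Path(directory) / "sales.csv"  # hypothetical file name
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows([["east", 5], ["east", 30], ["west", 12]])
    return str(path)


def run_demo(min_amount: float = 10.0):
    """Query the sample CSV with an in-memory DuckDB database."""
    import duckdb  # third-party dependency, imported lazily

    with tempfile.TemporaryDirectory() as tmp:
        path = write_sample_csv(tmp)
        con = duckdb.connect()  # no argument: in-memory database
        try:
            # The filter threshold is passed as a prepared-statement parameter.
            return con.execute(build_filter_query(path), [min_amount]).fetchall()
        finally:
            con.close()
```

Calling `run_demo()` returns one row per region, with no load step before the query. The same pattern should apply to other sources through DuckDB's corresponding table functions and extensions, though each has its own setup requirements.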


GitHub link: 
https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13698874

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org

