GitHub user jianlirong added a comment to the discussion: Extend the gpfdist 
tool to support SFTP/HDFS protocols for high-performance multi-source data 
ingestion

Data import and export is clearly a frequently used and frequently discussed 
feature in MPP databases. Based on previous discussions and this topic, 
Cloudberry currently has three main frameworks for parallel data import and 
export: (1) PXF; (2) FDW; (3) gpfdist, which is discussed here. Each framework 
aims to reach more data sources by adding support for more protocols and file 
formats. Apart from the differences between the frameworks themselves, the 
logic for protocol support and file format support is basically the same. 
Different technical teams have chosen different frameworks for their own 
historical reasons.

To avoid duplicated development work (even though Apache Cloudberry is an 
open-source project, development resources are still very valuable), I 
personally think that each team could first submit all the code it is willing 
to open-source to GitHub, and then create a dedicated discussion topic. Through 
public discussion, we can determine which framework the Cloudberry project 
should focus on in order to support more data sources.

GitHub link: 
https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13662136

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org

