GitHub user jianlirong added a comment to the discussion: Extend the gpfdist tool to support SFTP/HDFS protocols for high-performance multi-source data ingestion
It appears that data import and export are indeed frequently used and frequently discussed features in MPP databases. Based on previous discussions and this topic, Cloudberry currently has three main frameworks for parallel data import and export: (1) PXF; (2) FDW; (3) gpfdist, which is discussed here. Each framework aims to reach more data sources by adding support for more protocols and file formats. Apart from the differences in the frameworks themselves, the logic for protocol support and file format support is basically the same. Different technical teams have chosen different frameworks for their own historical reasons.

To avoid duplicated development work (even though Apache Cloudberry is an open-source project, development resources are still very valuable), I personally think each team could first submit all the code it is willing to open-source to GitHub and then create a dedicated discussion topic. Through public discussion, we can determine which framework the Cloudberry project should focus on in order to support more data sources.

GitHub link: https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13662136

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org
For additional commands, e-mail: dev-h...@cloudberry.apache.org