GitHub user ZTE-EBASE added a comment to the discussion: Extend the gpfdist tool to support SFTP/HDFS protocols for high-performance multi-source data ingestion
Minimize changes to the database kernel code by reusing the gpfdist protocol: add an sftp/hdfs protocol marker to the gpfdist location path and use it to call the corresponding functions for reading the data. This also addresses the case where the data files are not on the same machine as gpfdist. For example:

    CREATE EXTERNAL TABLE ext1 (d varchar(20)) LOCATION ('gpfdist://ip:port/<sftp://sftp-user:passwd@sftp-hostip:sftp-port/file.csv>') FORMAT 'csv' (DELIMITER '|');
    CREATE EXTERNAL TABLE ext2 (d varchar(20)) LOCATION ('gpfdist://ip:port/<hdfs://namenode:port/file-path.parquet>') FORMAT 'csv' (DELIMITER '|');

GitHub link: https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13636999

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org
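The dispatch step in this proposal could look roughly like the following minimal sketch. It is written in Python for brevity, not in gpfdist's own C code. It assumes the request path has already been URL-decoded and that the remote URL is wrapped in '<' and '>' as in the examples. The names SourceSpec, parse_source, READERS and open_source are hypothetical, and the reader stubs only describe what a real SFTP or HDFS client would open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional
from urllib.parse import unquote, urlsplit


@dataclass
class SourceSpec:
    """Hypothetical description of where gpfdist should read data from."""
    scheme: str              # "file", "sftp" or "hdfs"
    host: Optional[str]
    port: Optional[int]
    user: Optional[str]
    password: Optional[str]
    path: str


def parse_source(request_path: str) -> SourceSpec:
    """Split a gpfdist request path into a local file or an embedded remote URL.

    Assumption: the remote URL is wrapped in '<' and '>' as in the examples;
    a bare 'sftp://' or 'hdfs://' prefix is accepted as well.
    """
    inner = request_path.lstrip("/")
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    parts = urlsplit(inner)
    if parts.scheme in ("sftp", "hdfs"):
        return SourceSpec(
            scheme=parts.scheme,
            host=parts.hostname,
            port=parts.port,
            # urlsplit does not percent-decode credentials, so do it here.
            user=unquote(parts.username) if parts.username else None,
            password=unquote(parts.password) if parts.password else None,
            path=parts.path,
        )
    # No protocol marker: fall back to gpfdist's existing local-file behaviour.
    return SourceSpec("file", None, None, None, None, request_path)


# Hypothetical reader registry. Real readers would wrap an SFTP client or an
# HDFS client library; these stubs only report what they would open.
Reader = Callable[[SourceSpec], str]

READERS: Dict[str, Reader] = {
    "file": lambda s: f"open local file {s.path}",
    "sftp": lambda s: f"sftp read {s.path} from {s.host}:{s.port} as {s.user}",
    "hdfs": lambda s: f"hdfs read {s.path} from namenode {s.host}:{s.port}",
}


def open_source(request_path: str) -> str:
    """Pick the reader that matches the protocol marker in the request path."""
    spec = parse_source(request_path)
    return READERS[spec.scheme](spec)
```

Keeping "file" as the fallback means existing gpfdist locations without a protocol marker behave as before. Only gpfdist needs to learn the new markers, which fits the goal of leaving the kernel's gpfdist protocol handling untouched.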