GitHub user ZTE-EBASE added a comment to the discussion: Extend the gpfdist 
tool to support SFTP/HDFS protocols for high-performance multi-source data 
ingestion

Minimize kernel code changes by reusing the gpfdist protocol. Add an sftp/hdfs 
protocol marker to the gpfdist URL and use it to call the corresponding 
functions for reading data. This also addresses the case where the data files 
are not on the same machine as gpfdist.
For example:
CREATE EXTERNAL TABLE ext1 (d varchar(20))
LOCATION ('gpfdist://ip:port/<sftp://sftp-user:passwd@sftp-hostip:sftp-port/file.csv>')
FORMAT 'csv' (DELIMITER '|');

CREATE EXTERNAL TABLE ext2 (d varchar(20))
LOCATION ('gpfdist://ip:port/<hdfs://namenode:port/file-path.parquet>')
FORMAT 'csv' (DELIMITER '|');
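
To make the "protocol marker" idea concrete, here is a minimal sketch of how the path part of a gpfdist request could be classified before choosing a reader. It is an illustration only, not gpfdist code: the function name classify_gpfdist_path, the REMOTE_SCHEMES set, and the "file" fallback label are all hypothetical, and the angle-bracket wrapping is taken from the examples above. A real implementation would live in gpfdist itself (C) and would also need to handle URL escaping.

```python
from urllib.parse import urlsplit

# Assumed set of protocol markers, taken from the proposal (sftp, hdfs).
REMOTE_SCHEMES = {"sftp", "hdfs"}


def classify_gpfdist_path(request_path):
    """Classify the path portion of a gpfdist URL (hypothetical helper).

    Returns (protocol, target):
      - ("sftp" | "hdfs", SplitResult) when the path wraps a remote URL
        in angle brackets, as in the examples above;
      - ("file", request_path) otherwise, i.e. the existing local-file case.
    """
    inner = request_path.lstrip("/")
    if inner.startswith("<") and inner.endswith(">"):
        parts = urlsplit(inner[1:-1])
        if parts.scheme in REMOTE_SCHEMES:
            return parts.scheme, parts
    return "file", request_path


if __name__ == "__main__":
    proto, target = classify_gpfdist_path(
        "/<sftp://sftp-user:passwd@sftp-hostip:22/file.csv>"
    )
    # The reader for `proto` would then connect using these fields.
    print(proto, target.hostname, target.port, target.path)
```

Keeping the local-file case as the fallback means existing gpfdist locations would behave as before, and only paths carrying a recognized marker take the new code path.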


GitHub link: 
https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13636999

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org

