GitHub user gfphoenix78 added a comment to the discussion: Extend the gpfdist 
tool to support SFTP/HDFS protocols for high-performance multi-source data 
ingestion

The discussion seems to be about supporting more protocols for external
tables, not multiple data sources for a single external table. To be clear,
a single external table already supports multiple data sources.
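
For illustration, a minimal sketch of one external table that reads from two
gpfdist instances. The host names, ports, file pattern, and columns are
hypothetical:

```sql
-- Hypothetical hosts (etlhost-1, etlhost-2), ports, and columns.
-- Each URI in LOCATION is a separate data source for the same table.
CREATE READABLE EXTERNAL TABLE ext_expenses (
    name     text,
    spent_on date,
    amount   float4,
    category text
)
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
          'gpfdist://etlhost-2:8082/*.txt')
FORMAT 'TEXT' (DELIMITER '|');
```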

The topic has two goals:
1. support more transfer protocols
2. support additional file formats

Let's discuss them one by one.

## Transfer Protocol
This looks like a matter of supporting more clients for fetching files. BTW,
gpfdist already supports transforming data on the server side, for example:
https://github.com/apache/cloudberry/blob/main/src/bin/gpfdist/regress/input/exttab1.source#L541
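
As a rough sketch of how a server-side transform is referenced from SQL: the
`#transform` suffix on the gpfdist URI names a transformation defined on the
gpfdist side. The transform name `to_text`, the host, the file, and the
columns here are hypothetical, and the sketch assumes gpfdist was started
with a configuration file that defines that transform:

```sql
-- Assumes gpfdist was started with a config file (gpfdist -c config.yaml)
-- that defines an input transformation named to_text (hypothetical name).
-- gpfdist runs the transform command on the server and streams its
-- output to the segments, here as '|'-delimited text.
CREATE READABLE EXTERNAL TABLE ext_prices (
    item_id int,
    price   numeric
)
LOCATION ('gpfdist://etlhost-1:8080/prices.xml#transform=to_text')
FORMAT 'TEXT' (DELIMITER '|');
```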

## File Format
External tables support the **CUSTOM** format:
```
Syntax:
CREATE [READABLE] EXTERNAL [TEMPORARY | TEMP] TABLE table_name
     ( column_name data_type [, ...] | LIKE other_table )
      LOCATION ('file://seghost[:port]/path/file' [, ...])
        | ('gpfdist://filehost[:port]/file_pattern[#transform]'
        | ('gpfdists://filehost[:port]/file_pattern[#transform]'
            [, ...])
      FORMAT 'TEXT'
            [( [HEADER]
               [DELIMITER [AS] 'delimiter' | 'OFF']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
           | 'CSV'
            [( [HEADER]
               [QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE NOT NULL column [, ...]]
               [ESCAPE [AS] 'escape']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
           | 'CUSTOM' (Formatter=<formatter specifications>)
     [ OPTIONS ( key 'value' [, ...] ) ]
     [ ENCODING 'encoding' ]
     [ [LOG ERRORS] SEGMENT REJECT LIMIT count
       [ROWS | PERCENT] ]
```
You could consider implementing a new file format.
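
As a sketch of what a CUSTOM format looks like in use, assuming the
`fixedwidth_in` formatter inherited from Greenplum is available in your
build. The host, port, path, and columns are hypothetical:

```sql
-- fixedwidth_in reads fixed-width records; each key names a column and
-- each value gives that field's width in characters.
CREATE READABLE EXTERNAL TABLE ext_students (
    name    varchar(20),
    address varchar(30),
    age     int
)
LOCATION ('gpfdist://etlhost-1:8081/students/*.dat')
FORMAT 'CUSTOM' (formatter=fixedwidth_in,
                 name='20', address='30', age='5');
```

A new file format would presumably plug in the same way, with `formatter=`
naming a user-defined formatter function instead of `fixedwidth_in`.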

GitHub link: 
https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13646968

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org

