New Copy Formats - avro/orc/parquet

2018-02-10 Thread Nicolas Paris
Hello

I d'found useful to be able to import/export from postgres to those modern data
formats:
- avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
- parquet (c++ writer=https://github.com/apache/parquet-cpp)
- orc (all writers=https://github.com/apache/orc)

Something like :
COPY table TO STDOUT ORC;

Would be lovely.

This would greatly enhance how postgres integrates in big-data ecosystem.

Any thought ?

Thanks



Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Adrian Klaver

On 02/10/2018 07:13 AM, Nicolas Paris wrote:

Hello

I d'found useful to be able to import/export from postgres to those modern data
formats:
- avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
- parquet (c++ writer=https://github.com/apache/parquet-cpp)
- orc (all writers=https://github.com/apache/orc)

Something like :
COPY table TO STDOUT ORC;

Would be lovely.

This would greatly enhance how postgres integrates in big-data ecosystem.

Any thought ?


https://www.postgresql.org/docs/10/static/sql-copy.html

"PROGRAM

A command to execute. In COPY FROM, the input is read from standard 
output of the command, and in COPY TO, the output is written to the 
standard input of the command.


Note that the command is invoked by the shell, so if you need to 
pass any arguments to shell command that come from an untrusted source, 
you must be careful to strip or escape any special characters that might 
have a special meaning for the shell. For security reasons, it is best 
to use a fixed command string, or at least avoid passing any user input 
in it.

"



Thanks




--
Adrian Klaver
adrian.kla...@aklaver.com



Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Nicolas Paris
> > I d'found useful to be able to import/export from postgres to those modern 
> > data
> > formats:
> > - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
> > - parquet (c++ writer=https://github.com/apache/parquet-cpp)
> > - orc (all writers=https://github.com/apache/orc)
> > 
> > Something like :
> > COPY table TO STDOUT ORC;
> > 
> > Would be lovely.
> > 
> > This would greatly enhance how postgres integrates in big-data ecosystem.
> > 
> > Any thought ?
> 
> https://www.postgresql.org/docs/10/static/sql-copy.html
> 
> "PROGRAM
> 
> A command to execute. In COPY FROM, the input is read from standard
> output of the command, and in COPY TO, the output is written to the standard
> input of the command.
> 
> Note that the command is invoked by the shell, so if you need to pass
> any arguments to shell command that come from an untrusted source, you must
> be careful to strip or escape any special characters that might have a
> special meaning for the shell. For security reasons, it is best to use a
> fixed command string, or at least avoid passing any user input in it.
> "
>

PROGRAM would involve overhead of transforming data from CSV or BINARY
to AVRO for example. 

Here, I am talking about native format exports/imports for performance
considerations.



Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread David G. Johnston
On Saturday, February 10, 2018, Nicolas Paris  wrote:

> Hello
>
> I d'found useful to be able to import/export from postgres to those modern
> data
> formats:
> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
> - parquet (c++ writer=https://github.com/apache/parquet-cpp)
> - orc (all writers=https://github.com/apache/orc)
>
> Something like :
> COPY table TO STDOUT ORC;
>
> Would be lovely.
>
> This would greatly enhance how postgres integrates in big-data ecosystem.
>
>
It would be written "... with (format 'orc')" and your best bet would be to
create an extension.  I don't think that having such code in core (or
contrib) is desirable.

David J.


Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Tomas Vondra

On 02/10/2018 04:38 PM, David G. Johnston wrote:
> On Saturday, February 10, 2018, Nicolas Paris  > wrote:
> 
> Hello
> 
> I d'found useful to be able to import/export from postgres to those
> modern data
> formats:
> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html
> )
> - parquet (c++ writer=https://github.com/apache/parquet-cpp
> )
> - orc (all writers=https://github.com/apache/orc
> )
> 
> Something like :
> COPY table TO STDOUT ORC;
> 
> Would be lovely.
> 
> This would greatly enhance how postgres integrates in big-data
> ecosystem.
> 
>  
> It would be written "... with (format 'orc')" and your best bet would be
> to create an extension.  I don't think that having such code in core (or
> contrib) is desirable.
> 

I don't think make this extensible by an extension (i.e. the formats
supported by COPY are hard-coded in core). But I agree that if we are to
add multiple new formats, it'd be nice to allow doing that in extension.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Tomas Vondra


On 02/10/2018 04:30 PM, Nicolas Paris wrote:
>>> I d'found useful to be able to import/export from postgres to those modern 
>>> data
>>> formats:
>>> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
>>> - parquet (c++ writer=https://github.com/apache/parquet-cpp)
>>> - orc (all writers=https://github.com/apache/orc)
>>>
>>> Something like :
>>> COPY table TO STDOUT ORC;
>>>
>>> Would be lovely.
>>>
>>> This would greatly enhance how postgres integrates in big-data ecosystem.
>>>
>>> Any thought ?
>>
>> https://www.postgresql.org/docs/10/static/sql-copy.html
>>
>> "PROGRAM
>>
>> A command to execute. In COPY FROM, the input is read from standard
>> output of the command, and in COPY TO, the output is written to the standard
>> input of the command.
>>
>> Note that the command is invoked by the shell, so if you need to pass
>> any arguments to shell command that come from an untrusted source, you must
>> be careful to strip or escape any special characters that might have a
>> special meaning for the shell. For security reasons, it is best to use a
>> fixed command string, or at least avoid passing any user input in it.
>> "
>>
> 
> PROGRAM would involve overhead of transforming data from CSV or
> BINARY to AVRO for example.
> 
> Here, I am talking about native format exports/imports for
> performance considerations.
> 

That is true, but the question is how significant the overhead is. If
it's 50% then reducing it would make perfect sense. If it's 1% then no
one if going to be bothered by it.

Without these numbers it's hard to make any judgments.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



execute block like Firebird does

2018-02-10 Thread PegoraroF10
We are migrating our databases from Firebird to PostGres. A useful feature
Firebird has is Execute Block.
What it does is just return a record set from that dynamic SQL, just like a
PostGres function, but without creating it.
It sound like ...
execute block returns(ID Integer, Name varchar(50), LastInvoice Date, ...)
as 
begin
  select bla, bla, bla into ...;
  select bla, bla into ...;
  suspend;
end
I know we could create a function but we have several hundred of these
blocks running, so ... it would be a huge work to do. 
So, there is a way to run a dynamic sql which returns a set of records ?



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html