event trigger should provide more details

2023-05-30 Thread Lian Jiang
hi,

I plan to create an event trigger to detect schema changes (e.g. adding/removing
a column, changing a column type) and write them into a separate table (e.g.
EVENTS). A process would then periodically read this table and send schema change
notifications. However, the event trigger
(https://www.postgresql.org/docs/current/plpgsql-trigger.html, 43.10.2.
Triggers on Events) does not provide info such as which table was
altered, or the old and new schema. Am I missing something? Thanks very much for
any hints.

Regards
Lian


Re: event trigger should provide more details

2023-05-30 Thread Lian Jiang
Thanks. This is helpful. Below is the result when I add a column to
public.accounts.

obj.classid, -- 1259
obj.objid, -- 16409
obj.objsubid, -- 0
obj.command_tag, -- ALTER TABLE
obj.object_type, -- table
obj.schema_name, -- public
obj.object_identity, -- public.accounts
obj.in_extension; -- f

The info useful to me is command_tag, object_type, and object_identity.
classid and objid are not useful since object_identity is more explicit.
objsubid is not useful because I don't need comment
<https://www.postgresql.org/message-id/pine.lnx.4.33.0212091822050.15095-100...@leary.csoft.net>
information for schema changes.
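
For reference, a minimal sketch of a ddl_command_end trigger that writes these
fields into the EVENTS table (log_ddl_event, log_ddl, and the events table
definition are just placeholder names I'm assuming, not anything prescribed):

CREATE TABLE IF NOT EXISTS events (
    event_time      timestamptz NOT NULL DEFAULT now(),
    command_tag     text,
    object_type     text,
    object_identity text
);

CREATE OR REPLACE FUNCTION log_ddl_event()
RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    -- pg_event_trigger_ddl_commands() is only callable from a
    -- ddl_command_end event trigger.
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        INSERT INTO events (command_tag, object_type, object_identity)
        VALUES (obj.command_tag, obj.object_type, obj.object_identity);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER log_ddl
    ON ddl_command_end
    EXECUTE FUNCTION log_ddl_event();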

Besides the table name, I still need:
* which columns were added, and their types.
* which columns changed type, and their old and new types.
* which columns were dropped.

Will the command field provide this info? I don't have an example, and decoding
it needs C code
<https://www.postgresql.org/message-id/2019071343.GA26924%40alvherre.pgsql>.
If I cannot get such info from pg_event_trigger_ddl_commands, I may need
to maintain schema snapshots myself and diff the old and new snapshots upon
an ALTER TABLE/VIEW event. Which way should I go? Thanks a lot.
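
To make the snapshot idea concrete, here is a rough sketch of what I have in
mind (column_snapshots is just an assumed name; the queries for dropped columns
and type changes would be analogous to the one for added columns):

CREATE TABLE IF NOT EXISTS column_snapshots (
    snapshot_time  timestamptz NOT NULL DEFAULT now(),
    table_schema   text,
    table_name     text,
    column_name    text,
    data_type      text
);

-- Take a snapshot of the current column layout; this could run from the
-- ddl_command_end trigger or from the periodic notification process.
INSERT INTO column_snapshots (table_schema, table_name, column_name, data_type)
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema');

-- Columns that appear in the latest snapshot but not in the previous one,
-- i.e. added columns.
WITH latest AS (
    SELECT max(snapshot_time) AS t FROM column_snapshots
), previous AS (
    SELECT max(snapshot_time) AS t FROM column_snapshots
    WHERE snapshot_time < (SELECT t FROM latest)
)
SELECT cur.table_schema, cur.table_name, cur.column_name, cur.data_type
FROM column_snapshots cur
LEFT JOIN column_snapshots prev
       ON  prev.snapshot_time = (SELECT t FROM previous)
       AND prev.table_schema  = cur.table_schema
       AND prev.table_name    = cur.table_name
       AND prev.column_name   = cur.column_name
WHERE cur.snapshot_time = (SELECT t FROM latest)
  AND prev.column_name IS NULL;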




On Tue, May 30, 2023 at 2:42 PM Erik Wienhold  wrote:

> > On 30/05/2023 22:23 CEST Lian Jiang  wrote:
> >
> > I plan to create an event trigger to detect schema change (e.g.
> add/remove
> > a column, change column type), and write it into a separate table (e.g.
> > EVENTS). Then a process periodically reads this table to send schema
> change
> > notification. However, the event trigger (
> https://www.postgresql.org/docs/current/plpgsql-trigger.html)
> > (43.10.2. Triggers on Events) does not provide me info such as which
> table
> > is altered, old and new schema. Am I missing something? Thanks very much
> for
> > any hints.
>
> You must use ddl_command_end event triggers[0] and call function
> pg_event_trigger_ddl_commands[1] to get info such as altered table and
> column.
>
> [0] https://www.postgresql.org/docs/current/event-trigger-definition.html
> [1] https://www.postgresql.org/docs/current/functions-event-triggers.html
>
> --
> Erik
>
>
>



speed up full table scan using psql

2023-05-30 Thread Lian Jiang
hi,

I am using psql to periodically dump Postgres tables into JSON files, which are
then imported into Snowflake. For large tables (e.g. 70M rows), it takes hours
for psql to complete. Using Spark to read the Postgres table does not seem to
work either: the Postgres read-only replica is the bottleneck, so the Spark
cluster never uses more than one worker node, and that node times out or runs
out of memory.

Will vertically scaling the Postgres DB speed up psql? Or can any thread-related
psql parameter help? Thanks for any hints.

Regards
Lian


Re: speed up full table scan using psql

2023-05-30 Thread Lian Jiang
Thanks! Will try.

Is there a way to speed up Spark ingestion (postgres -> spark -> snowflake) that
does not have issues such as alignment or an inefficient JSON file format? I
favor Spark since it simplifies the ingest path of postgres -> local json.gz ->
s3 -> snowflake stage -> snowflake table. Hope this makes sense.

On Tue, May 30, 2023 at 10:17 PM David Rowley  wrote:

> On Wed, 31 May 2023 at 16:26, Lian Jiang  wrote:
> > I am using psql to periodically dump the postgres tables into json files
> which are imported into snowflake. For large tables (e.g. 70M rows), it
> takes hours for psql to complete. Using spark to read the postgres table
> seems not to work as the postgres read only replication is the bottleneck
> so spark cluster never uses >1 worker node and the working node timeout or
> out of memory.
> >
> > Will vertical scaling the postgres db speed up psql? Or any thread
> related parameter of psql can help? Thanks for any hints.
>
> This is probably slow due to psql's alignment.  It needs to read the
> entire result to know how much whitespace to pad columns with
> before it can output anything. Naturally, that requires quite a bit of
> memory when large tables are queried and also lots of length checking.
>
> As mentioned by Rob, you'll probably want to use COPY, or you could
> look at using pg_dump.  Something like:
>
> pg_dump --inserts --rows-per-insert=100 --table=<table to export> --data-only <dbname>
>
> should give you something that you might have some luck directly
> importing to Snowflake without any transformation tool required.
>
> However, if you do need to do some transformation before loading, then
> you might want to do that in PostgreSQL.  For that, something like:
>
> psql -c "copy (select col1,col2,col3+col4 from your_table) to stdout"
> 
>
> would allow you to run a query, which you could maybe do your
> transformations in before importing into Snowflake
>
> David
>
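
For the JSON dump discussed here, I read David's psql suggestion as something
roughly like the following (public.mytable stands in for the real table; this
is my interpretation, not his exact command):

-- e.g. invoked as: psql "$db_url" -c '<the statement below>' > mytable.jsonl
-- COPY ... TO STDOUT streams rows as they are produced, so psql does not
-- buffer the whole result for column alignment.
COPY (SELECT row_to_json(t) FROM public.mytable t) TO STDOUT;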




Re: speed up full table scan using psql

2023-05-31 Thread Lian Jiang
The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq) from
public.mytable x_tmp_uniq) to stdout"
Postgres version: 14.7
Does this mean COPY and Java's CopyManager may not help, since my psql command
already uses COPY?

Regarding pg_dump, it does not support a JSON output format, which means extra
work is needed to convert one of the supported formats to jsonl (or parquet) so
that the data can be imported into Snowflake. Still exploring, but I want to
call it out early. Maybe the 'custom' format can be parquet?


Thanks
Lian


Re: speed up full table scan using psql

2023-05-31 Thread Lian Jiang
The whole command is:

psql %(pg_uri)s -c %(sql)s | %(sed)s | %(pv)s | %(split)s) 2>&1 | %(tr)s

where:
sql is "copy (select row_to_json(x_tmp_uniq) from public.mytable
x_tmp_uniq) to stdout"
sed, pv, split, tr together format and split the stdout into jsonl files.

Hope this helps.


On Wed, May 31, 2023 at 9:16 PM Adrian Klaver 
wrote:

> On 5/31/23 13:57, Lian Jiang wrote:
> > The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq)
> > from public.mytable x_tmp_uniq) to stdout"
>
> What is taking the stdout and what is it doing?
>
> > postgres version:  14.7
> > Does this mean COPY and java CopyManager may not help since my psql
> > command already uses copy?
> >
> > Regarding pg_dump, it does not support json format which means extra
> > work is needed to convert the supported format to jsonl (or parquet) so
> > that they can be imported into snowflake. Still exploring but want to
> > call it out early. Maybe 'custom' format can be parquet?
> >
> >
> > Thanks
> > Lian
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
