Insertion time is very high for inserting data in postgres

2021-02-10 Thread prachi surangalikar
Hello Team,
Greetings!

We are using Postgres 12.2.1 to fetch per-minute data for about 25
machines, running in parallel via a single thread in Python.
But suddenly the insertion time has increased to a very high level, about
30 seconds for one machine.
This is a serious problem for us, as data fetching is becoming slow.

If anyone could help us solve this problem, it would be of great help to
us.


Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased

2021-02-10 Thread Rajnish Vishwakarma
Hi Postgres Team,

Below are the scenarios we are dealing with.

1) There are 20 tables - on average, each has about 150 columns.

2) There are 20 threads handled by a thread pool executor (we are using
Python's psycopg2 module / library to fetch the data).

3) I am using the below statement to insert the data using the Python
psycopg2 module - using the execute(...) command, as follows:

sql_stmt = "INSERT INTO " + name_Table + final_col_string + " VALUES " + str(tuple(array_of_curly_values))
print('Sql statement', sql_stmt)
col_cursor_db = db_conn.cursor()
v = col_cursor_db.execute(sql_stmt)

But earlier, the same 22 threads were running and the insertion time
gradually increased from 1 second to 30-35 seconds.

Requesting and urging the Postgres general support team to help me out on
this.

How can I increase the INSERT speed to minimize the insertion time taken
by each thread in the thread pool?

Or are there any different Python libraries other than psycopg2?

Are there any different functions in psycopg2?

Or what performance tuning has to be done to increase the insertion speed?


Re: Insertion time is very high for inserting data in postgres

2021-02-10 Thread Ganesh Korde
On Wed, 10 Feb 2021, 1:56 pm prachi surangalikar, <
surangalikarprachi...@gmail.com> wrote:

> Hello Team,
> Greetings!
>
> We are using Postgres 12.2.1 to fetch per-minute data for about 25
> machines, running in parallel via a single thread in Python.
> But suddenly the insertion time has increased to a very high level, about
> 30 seconds for one machine.
> This is a serious problem for us, as data fetching is becoming slow.
>
> If anyone could help us solve this problem, it would be of great help to
> us.
>
Are you running VACUUM ANALYZE on the table regularly?  If not, then that might
delay insertion.
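
For reference, a minimal sketch of running that from Python with psycopg2
(the table name and connection string are placeholders, not from this thread).
Note that VACUUM cannot run inside a transaction block, so autocommit is
needed:

import psycopg2

conn = psycopg2.connect("dbname=mydb")   # connection details are an assumption
conn.autocommit = True                   # VACUUM must run outside a transaction
with conn.cursor() as cur:
    cur.execute("VACUUM (ANALYZE) my_table;")
conn.close()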


Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Thorsten Schöning
Hello Thorsten Schöning,
on Tuesday, 9 February 2021 at 10:56 you wrote:

> Any explanation for the increased dump size? Thanks!

While I don't have a detailed explanation myself, there's the
following hint [in the docs][1]: 

> For the custom archive format, this specifies compression of
> individual table-data segments[...]

[1]: https://www.postgresql.org/docs/10/app-pgdump.html

I've changed the layout of my tables entirely and most likely have
changed the "individual table-data segments" as well. So I did some
further testing and got the following numbers. It's always OLD vs. NEW
schema for the schema documented earlier:

> --compress=0: 25  vs. 20  GiB
> --compress=?: 5.6 vs. 6.5 GiB
> --compress=9: 5.5 vs. 6.4 GiB

"?" means default settings, i.e. "--compress" has been removed from
the shell command.

As can be seen, pretty much as soon as compression comes into play,
the new table layout is less efficient, even though OLD and NEW
schema use exactly the same "bytea" columns, placed at the end
of their individual tables and so on. But things become more
interesting with another version of my schema, which likewise indicates
a strong relationship between dump size, compression and table layout/size.

I've changed the new table "datagram" to be a partitioned one with
partitions containing rows per year and per half-year. Each partition
contains far fewer rows than before this way, and while I only tested
with "--compress=9" this time, the numbers are quite interesting:

> unpartitioned: 6.4 GiB
> half-yearly parts: 4.8 GiB
> yearly parts:  4.8 GiB

The interesting part this time is that the table layout for all
partitions is the same as before, only the number of rows per table
is different. The overall number of rows is the same as before, the
same data, IDs etc. Yet this time the dump really is smaller than with
the OLD schema, which contained far more data because of duplicate IDs
and the like.

I wouldn't have expected table layout to be that important.

Kind regards

Thorsten Schöning


Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased

2021-02-10 Thread Daniele Varrazzo
On Wed, 10 Feb 2021, 09:26 Rajnish Vishwakarma, <
rajnish.nationfi...@gmail.com> wrote:

> Hi Postgres Team,
>
> Below are the scenarios we are dealing with.
>
> 1) There are 20 tables - on average, each has about 150 columns.
>
> 2) There are 20 threads handled by a thread pool executor (we are
> using Python's psycopg2 module / library to fetch the data).
>
> 3) I am using the below statement to insert the data using the Python
> psycopg2 module - using the execute(...) command, as follows:
>
> sql_stmt = "INSERT INTO " + name_Table + final_col_string + " VALUES " +
> str(tuple(array_of_curly_values))
> print('Sql statement', sql_stmt)
> col_cursor_db = db_conn.cursor()
> v = col_cursor_db.execute(sql_stmt)
>

This is an insecure way to do it, but that's beside the point.
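
For comparison, a hedged sketch of the parameterized form (placeholders
instead of string concatenation); the table and column names here are made-up
examples, not the poster's schema:

import psycopg2

conn = psycopg2.connect("dbname=mydb")            # connection details are assumptions
row = (1, "machine-01", 42.5)                     # example data

with conn, conn.cursor() as cur:
    # psycopg2 quotes and escapes the values itself, so the statement is safe.
    cur.execute(
        "INSERT INTO readings (id, machine, value) VALUES (%s, %s, %s)",
        row,
    )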


> But earlier, the same 22 threads were running and the insertion time
> gradually increased from 1 second to 30-35 seconds.
>
> Requesting and urging the Postgres general support team to help me out on
> this.
>
> How can I increase the INSERT speed to minimize the insertion time
> taken by each thread in the thread pool?
>

Use a COPY statement instead of INSERT. For a more moderate change to
your code, but a smaller speed increase, you can look at the
batching helpers (
https://www.psycopg.org/docs/extras.html#fast-execution-helpers).
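
A hedged sketch of those batching helpers (execute_values from
psycopg2.extras); the table and column names are placeholders, not the
poster's schema:

import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=mydb")            # connection details are assumptions
rows = [(1, "machine-01", 42.5), (2, "machine-02", 17.0)]   # example data

with conn, conn.cursor() as cur:
    # One round trip inserts many rows; page_size controls the batch size.
    execute_values(
        cur,
        "INSERT INTO readings (id, machine, value) VALUES %s",
        rows,
        page_size=1000,
    )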


> Or are there any different Python libraries other than psycopg2?
>

Psycopg3 hasn't been released yet, so using it is on the experimental side.
However, it provides better support for using COPY, which would be perfect
for your use case (
https://www.psycopg.org/psycopg3/docs/copy.html#writing-data-row-by-row).
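
A hedged sketch of that row-by-row COPY with psycopg3 (still experimental at
the time of this thread); the table and column names are placeholders:

import psycopg  # psycopg 3

rows = [(1, "machine-01", 42.5), (2, "machine-02", 17.0)]   # example data

with psycopg.connect("dbname=mydb") as conn:      # connection details are assumptions
    with conn.cursor() as cur:
        with cur.copy("COPY readings (id, machine, value) FROM STDIN") as copy:
            for r in rows:
                copy.write_row(r)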

-- Daniele


Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Francisco Olarte
Thorsten:

On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning  wrote:
...
> I've changed the new table "datagram" to be a partitioned one with
> partitions containing rows per year and per half-year. Each partition
> contains far fewer rows than before this way, and while I only tested
> with "--compress=9" this time, the numbers are quite interesting:
>
> > unpartitioned: 6.4 GiB
> > half-yearly parts: 4.8 GiB
> > yearly parts:  4.8 GiB
>
> The interesting part this time is that the table layout for all
> partitions is the same as before, only the number of rows per table
> is different. The overall number of rows is the same as before, the
> same data, IDs etc. Yet this time the dump really is smaller than with
> the OLD schema, which contained far more data because of duplicate IDs
> and the like.
> I wouldn't have expected table layout to be that important.

Compression depends on detectable redundancy in the input. pg_dump
more or less gzips per-table "copy to stdout" dumps. If your data
evolves in a determined way, having it sorted by ingestion time may
increase detectable redundancy a lot, and partitioning sorts partially
by date ( or fully if you have made the partitions by range-querying
via index scan ). In this case it may not be the layout, but the
order.

Given that you seem to be able to test, you may try sorting the full table
by the column you use for partitioning. IIRC CLUSTER will do the trick
if it is indexed.

( This has happened to me when compressing document dumps: presorting by
some chosen fields improved my compression ratio a bit. IIRC it was
due to the compressor searching for duplicates within a limited window
only, which is why I say "detectable" redundancy. )
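
To make the window effect concrete, here is a small self-contained Python
sketch with made-up data (not the schema from this thread): identical
per-device payloads compress far better when rows of the same device sit
next to each other than when they are interleaved, because zlib/gzip only
look for repeats within a window of roughly 32 KB:

import random
import zlib

random.seed(0)
# 2000 "devices", each with its own 200-char payload repeated in 20 "rows".
devices = [("dev%04d" % d, "%0200x" % random.getrandbits(800)) for d in range(2000)]
rows = ["%s;%s" % (name, payload) for name, payload in devices for _ in range(20)]

interleaved = rows[:]
random.shuffle(interleaved)              # roughly "ingestion order"
presorted = sorted(rows)                 # roughly what CLUSTER / a sorted dump gives

def compressed_size(lines):
    return len(zlib.compress("\n".join(lines).encode(), 9))

print("interleaved:", compressed_size(interleaved))
print("presorted:  ", compressed_size(presorted))   # much smaller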

Francisco Olarte.




Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased

2021-02-10 Thread cen



On 10. 02. 21 09:14, Rajnish Vishwakarma wrote:

Hi Postgres Team,

Below are the scenarios we are dealing with.

1) There are 20 tables - on average, each has about 150 columns.

2) There are 20 threads handled by a thread pool executor (we are
using Python's psycopg2 module / library to fetch the data).

3) I am using the below statement to insert the data using the Python
psycopg2 module - using the execute(...) command, as follows:

sql_stmt = "INSERT INTO " + name_Table + final_col_string + " VALUES " + str(tuple(array_of_curly_values))
print('Sql statement', sql_stmt)
col_cursor_db = db_conn.cursor()
v = col_cursor_db.execute(sql_stmt)

But earlier, the same 22 threads were running and the insertion time
gradually increased from 1 second to 30-35 seconds.

Requesting and urging the Postgres general support team to help me out
on this.

How can I increase the INSERT speed to minimize the insertion time
taken by each thread in the thread pool?

Or are there any different Python libraries other than psycopg2?

Are there any different functions in psycopg2?

Or what performance tuning has to be done to increase the insertion
speed?





Is a single insert taking 30 seconds or do you have such a large number 
of inserts that your thread pool can't handle it and you are waiting for 
a free connection?


For a single insert, one reason for slowness on large databases could be
indexes, which need to be updated for each insert. For the latter case, you
should increase the thread pool size.






SV: Insertion time is very high for inserting data in postgres

2021-02-10 Thread Niels Jespersen
>Fra: prachi surangalikar 
>
>Hello Team,
>Greetings!
>
>We are using Postgres 12.2.1 to fetch per-minute data for about 25
>machines, running in parallel via a single thread in Python.
>But suddenly the insertion time has increased to a very high level, about 30
>seconds for one machine.
>This is a serious problem for us, as data fetching is becoming slow.
>
>If anyone could help us solve this problem, it would be of great help to us.

Get your data into an in-memory text structure (e.g. io.StringIO) and then use COPY:
https://www.psycopg.org/docs/usage.html#using-copy-to-and-copy-from

This is THE way to do high-performance inserts with Postgres.
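
A hedged sketch of that approach with psycopg2's copy_from; the table and
column names are placeholders, not the poster's schema:

import io
import psycopg2

conn = psycopg2.connect("dbname=mydb")            # connection details are assumptions
rows = [(1, "machine-01", 42.5), (2, "machine-02", 17.0)]   # example data

buf = io.StringIO()
for r in rows:
    buf.write("\t".join(map(str, r)) + "\n")      # tab-separated, one line per row
buf.seek(0)

with conn, conn.cursor() as cur:
    cur.copy_from(buf, "readings", columns=("id", "machine", "value"))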

Regards Niels Jespersen


Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Thorsten Schöning
Hello Francisco Olarte,
on Wednesday, 10 February 2021 at 10:38 you wrote:

> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my
dump size, but because I'm running into the problem that queries spanning
more than one partition seem to prefer a sequential scan over using indexes.
My index only seems to be used when querying the rows of one
partition.

So, the following is my definition; would that be "range-queried via
index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id             bigserial                 NOT NULL,
>   src_re         integer                   NOT NULL,
>   src_clt        integer                   NOT NULL,
>   src_meter      integer                   NOT NULL,
>   captured_at    timestamp with time zone  NOT NULL,
>   captured_rssi  smallint                  NOT NULL,
>   oms_status     smallint                  NOT NULL,
>   oms_enc        bytea,
>   oms_dec        bytea
> ) PARTITION BY RANGE (captured_at);

> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

Kind regards

Thorsten Schöning

Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Thorsten Schöning
Hello Francisco Olarte,
on Wednesday, 10 February 2021 at 10:38 you wrote:

> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my
dump size, but because I'm running into the problem that queries spanning
more than one partition seem to prefer a sequential scan over using indexes.
My index only seems to be used when querying the rows of one
partition.

So, the following is my definition; would that be "range-queried via
index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id             bigserial                 NOT NULL,
>   src_re         integer                   NOT NULL,
>   src_clt        integer                   NOT NULL,
>   src_meter      integer                   NOT NULL,
>   captured_at    timestamp with time zone  NOT NULL,
>   captured_rssi  smallint                  NOT NULL,
>   oms_status     smallint                  NOT NULL,
>   oms_enc        bytea,
>   oms_dec        bytea
> ) PARTITION BY RANGE (captured_at);

> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;
> CREATE INDEX idx_datagram_for_time_window ON datagram USING btree (src_meter, captured_at DESC);

Kind regards

Thorsten Schöning

Re: Insertion time is very high for inserting data in postgres

2021-02-10 Thread Dave Cramer
On Wed, 10 Feb 2021 at 06:11, Niels Jespersen  wrote:

> >Fra: prachi surangalikar 
>
> >
>
> >Hello Team,
>
> >Greetings!
>
> >
>
> >We are using Postgres 12.2.1 to fetch per-minute data for about 25
> machines, running in parallel via a single thread in Python.
>
> >But suddenly the insertion time has increased to a very high level, about
> 30 seconds for one machine.
>
> >This is a serious problem for us, as data fetching is becoming slow.
>


Before anyone can help you, you will have to provide much more information.

Schema, the data you are inserting, size of the machine, configuration
settings, etc.

Dave

> >
>
> >if anyone could help us to solve this problem it would be of great help
> to us.
>
>
>
> Get your data into a Text.IO memory structure and then use copy
> https://www.psycopg.org/docs/usage.html#using-copy-to-and-copy-from
>
>
>
> This is THE way of high-performant inserts using Postgres.
>
>
>
> Regards Niels Jespersen
>


Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Francisco Olarte
Thorsten:

On Wed, Feb 10, 2021 at 12:36 PM Thorsten Schöning
 wrote:
> Hello Francisco Olarte,
> on Wednesday, 10 February 2021 at 10:38 you wrote:
> > [...]and partitioning sorts partially
> > by date ( or fully if you have made the partitions by range-querying
> > via index scan ).[...]
> That statement is especially interesting not only because of my
> dump size, but because I'm running into the problem that queries spanning
> more than one partition seem to prefer a sequential scan over using indexes.
> My index only seems to be used when querying the rows of one
> partition.

You know the drill: show your indexes, post some EXPLAIN output for these;
those statements are too fuzzy to infer any useful advice from.

> So, the following is my definition, should that be "range-queried via
> index scan" properly? :-)
> > CREATE TABLE datagram
> > (
> >   id             bigserial                 NOT NULL,
> >   src_re         integer                   NOT NULL,
> >   src_clt        integer                   NOT NULL,
> >   src_meter      integer                   NOT NULL,
> >   captured_at    timestamp with time zone  NOT NULL,
> >   captured_rssi  smallint                  NOT NULL,
> >   oms_status     smallint                  NOT NULL,
> >   oms_enc        bytea,
> >   oms_dec        bytea
> > ) PARTITION BY RANGE (captured_at);
> > CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> > CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> > [...]

Given that definition there is no possible index scan, as there are no indexes.

> > INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

But given this, what I was suggesting is not true. What I meant is that
if you had a big datagram_unpart table indexed by captured_at with good
index correlation, which given the name of the field would be
expected, and populated the partitions one by one by doing range
queries over datagram_unpart, you might have hit index scans on each
population query and ended up with perfectly sorted partitions, and that
could help compression.

Given your query, unless the optimizer is performing some supernatural
( for me ) trickery, my bet would be on a full table scan plus
insertion, which would leave you with whatever order you had originally
for each partition.

But if your data, or your "data shape", has some natural correlation
with captured timestamps ( i.e., I know that my CDR data has it ),
putting similar data together could have improved your compression
ratios. Bear in mind I do not know it.

Anyway, I would not worry too much about the backup ratios with the
sizes you have. I would worry more about doing things like raising
fill factors and similar things, if your data processing is similar to
mine ( data enters at time captured_at, sometimes it is fixed due to
errors but normally never moves after being in the tables for a couple
of months, and when it does it is a special problem which can be
manually tuned ). It will not improve the backup too much, but can
improve other things. E.g., in my big CDR ( that means call detail
record, telephony ) tables I let them insert into "normal" monthly
partitions, but when they are some months old I coalesce them into
fully sorted, unindexed, 100% filled yearly partitions. Those
partitions are then put into an "historic" schema which I do NOT
back up automatically; I only do it after the ( manually fired )
coalescing and the ( manually done ) very occasional long-term fixups (
I rarely have to touch anything older than a month ).

Regards.
Francisco Olarte.




Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Francisco Olarte
Thorsten:

On Wed, Feb 10, 2021 at 12:46 PM Thorsten Schöning
 wrote:

Oops, completely missed the second message.

> > CREATE INDEX idx_datagram_for_time_window ON datagram USING btree 
> > (src_meter, captured_at DESC);

Depending on the query and data "shape", this index may or may not be chosen
for queries. You are the one best placed to know.

Francisco Olarte.




Re: Increased size of database dump even though LESS consumed storage

2021-02-10 Thread Thorsten Schöning
Hello Thorsten Schöning,
on Wednesday, 10 February 2021 at 09:58 you wrote:

>> unpartitioned: 6.4 GiB
>> half-yearly parts: 4.8 GiB
>> yearly parts:  4.8 GiB

The above number for `unpartitioned` might be wrong: I've re-created
the same database, applied the same data migration as in the past
and created a dump which was 4.8 GiB in size. Maybe I simply did
something wrong and left some renamed table with data in the formerly
used database, while NOT doing so for the other test databases. So my
entire question might simply be based on user error. :-)

Sadly I don't seem to have the former dumps anymore to have a look at
their content.

Kind regards

Thorsten Schöning

Re: How does Postgres decide if to use additional workers?

2021-02-10 Thread Philip Semanchuk



> On Feb 9, 2021, at 10:52 AM, Thorsten Schöning  wrote:
> 
> So, based on which facts does Postgres decide whether to use additional
> workers or not? Can I see those decisions explained somewhere? I don't
> see anything in the query plan. Thanks!

Hi Thorsten,
This is an interesting topic for me too. There's a formula for the max number of
workers that Postgres will consider for a table. Of course, most queries use
more than just one table, and I don't yet understand how Postgres handles the
situation where the formula suggests multiple workers for some tables and a
single worker for others.

There was some conversation about this on the performance mailing list in June.
Here's a link to the message that contains the formula; there are more items of
interest in the whole thread:
https://www.postgresql.org/message-id/89423FD3-0F13-447D-8D9E-EB1722150F94%40americanefficient.com

Hope this helps
Philip 



Which partition scheme makes sense for my time based IoT-datagrams?

2021-02-10 Thread Thorsten Schöning
Hi all,

I have a table storing datagrams from some IoT-devices, with one
datagram per device per day most likely for around 75'000 devices
currently. I want to test query performance with a partitioned table
and am interested in the following queries mostly: 

* querying arbitrary datagrams by their ID
* querying datagrams being X old based on some timestamp
  * 15 minutes
  * 15 days
  * 15 months

My app isn't prepared to e.g. migrate things in the background;
instead I would like Postgres to deal with all aspects as
transparently as possible, at least for the next few years. Things
worked without partitions in the past as well.

Therefore I thought of simply partitioning by RANGE on the timestamp
at which the datagram has been received and creating individual partitions
per half year. That means that in most cases only the most current
partition needs to be queried, or the last 3 in many other cases.

Other approaches I've read on this list were e.g. partitioning the
most current months individually and afterwards moving rows to some
other "archive"-like table. Besides lacking the necessary
infrastructure, in my use case users in theory need to be able to ask
for the last 15 months at some arbitrary point in history at any time.
I wouldn't like to deal with different tables or the like in my app.

My approach would result in 24 tables already, with 2 more per year.
Does that by itself sound bad already? Is that a number the planner
can most likely deal with, or do I not even need to worry until there
are hundreds of partitions?

If partitioned by timestamp, how are lookups by ID performed? Is that
a sequential scan on all partitions, e.g. using available indexes
per partition?

Is there some way to RANGE by timestamp and ID, while keeping the
half-year tables only? So that the planner can more easily tell which
partition to look at when only IDs are given? Or is that simply like
querying an ID index of each partition?

The following is what I have currently, extra verbose to support
Postgres 10:

> CREATE TABLE datagram
> (
>   id             bigserial                 NOT NULL,
>   src_re         integer                   NOT NULL,
>   src_clt        integer                   NOT NULL,
>   src_meter      integer                   NOT NULL,
>   captured_at    timestamp with time zone  NOT NULL,
>   captured_rssi  smallint                  NOT NULL,
>   oms_status     smallint                  NOT NULL,
>   oms_enc        bytea,
>   oms_dec        bytea
> ) PARTITION BY RANGE (captured_at);

> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]

> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT pk_datagram_y1970_h1 PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT ck_datagram_y1970_h1_oms_data_avail CHECK (oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT pk_datagram_y1970_h2 PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT ck_datagram_y1970_h2_oms_data_avail CHECK (oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
> [...]

> CREATE INDEX idx_datagram_y1970_h1_for_time_window ON datagram_y1970_h1 USING btree (src_meter, captured_at DESC);
> CREATE INDEX idx_datagram_y1970_h2_for_time_window ON datagram_y1970_h2 USING btree (src_meter, captured_at DESC);
> [...]

An example query condition is the following, where "P5D" can simply be
"P15M" or the like instead.

> WHERE (
>   "real_estate"."id" IN ([...])
>   AND "meter"."id" IN ([...])
>   AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' 
> AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND 
> (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + 
> CAST('P0D' AS INTERVAL))
> )

Thanks for sharing your opinions!

Kind regards

Thorsten Schöning


Re: Which partition scheme makes sense for my time based IoT-datagrams?

2021-02-10 Thread Laurenz Albe
On Wed, 2021-02-10 at 16:09 +0100, Thorsten Schöning wrote:
> I have a table storing datagrams from some IoT-devices, with one
> datagram per device per day most likely for around 75'000 devices
> currently. I want to test query performance with a partitioned table
> and am interested in the following queries mostly: 
> 
> * querying arbitrary datagrams by their ID
> * querying datagrams being X old based on some timestamp
>   * 15 minutes
>   * 15 days
>   * 15 months

You want to query ALL individual datagrams from the past 15 months??

I guess not, and that you want some aggregation.  For that, materialized
views might be interesting.

> Therefore I thought of simply partitioning by RANGE of the timestamp
> when the datagram has been received and create individual partitions
> per half a year. That means that in most cases only the most current
> 1 partition needs to be queried, with the last 3 in many other cases.
> 
> My approach would result in 24 tables already, with 2 more per year.
> Does that for itself sound bad already? Is that a limit the planner
> can deal with most likely or do I don't even need to care for hundreds
> or partitions?

24 partitions is harmless.  I think that a partition per half year might
not be enough, but it depends on the queries.

You should use PostgreSQL v12 or better.

One point you didn't touch, but should consider, is getting rid of old data.
You will want to do that at some point, or at least you want to keep only
aggregates of the data for historical statistical data.

That would be very easy if you partition correctly, and it would keep the
number of partitions in a manageable range.
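
For illustration, a minimal sketch (my own, not from the thread) of why retention
gets cheap with range partitions: dropping one whole partition replaces millions of
DELETEs. The partition name follows the naming used earlier in the thread; the
connection string is an assumption.

import psycopg2

with psycopg2.connect("dbname=mydb") as conn:       # connection details assumed
    with conn.cursor() as cur:
        # Detach first so the parent stops routing to it, then drop the table.
        cur.execute("ALTER TABLE datagram DETACH PARTITION datagram_y1970_h1;")
        cur.execute("DROP TABLE datagram_y1970_h1;")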

> If partitioned by timestamp, how are lookups by ID performed? Is that
> a sequential scan on all partitions, e.g. with using available indexes
> per partition?

They will scan all partitions.  You have to be prepared that most queries
will become at least slightly slower with partitioning.  That is
expected.

> Is there some way to RANGE by timestamp and ID, by keeping the
> half-year tables only? So that the planner knows easier which
> partition to look at in case of IDs only? Or is that simply like
> querying an ID-index of each partition?

Sure, you can partition by ID (ranges; or perhaps hash partitioning,
if you only ever query for a single ID) and subpartition by timestamp.

This will speed up the queries by ID, since only a single partition
will be searched.  You could then be almost as fast as with a single
monolithic table.  Don't forget that the speed of an index scan does
not (or almost not) depend on the size of the table.
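
A hedged sketch of what that could look like with the schema posted earlier
(column list trimmed; the number of hash partitions and the range bounds are
made-up examples, not a recommendation), issued from Python to stay close to
the thread's tooling:

import psycopg2

DDL = """
CREATE TABLE datagram (
    id           bigint                    NOT NULL,
    src_meter    integer                   NOT NULL,
    captured_at  timestamp with time zone  NOT NULL
) PARTITION BY HASH (id);

-- one of e.g. 4 hash partitions, itself range-partitioned by timestamp
CREATE TABLE datagram_p0 PARTITION OF datagram
    FOR VALUES WITH (MODULUS 4, REMAINDER 0)
    PARTITION BY RANGE (captured_at);

CREATE TABLE datagram_p0_y2020_h1 PARTITION OF datagram_p0
    FOR VALUES FROM ('2020-01-01') TO ('2020-07-01');
-- ...and so on for the remaining hash partitions and half-year ranges.
"""

with psycopg2.connect("dbname=mydb") as conn:       # connection details assumed
    with conn.cursor() as cur:
        cur.execute(DDL)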

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Connecting to database through username,passphrase and private key on PgAdmin

2021-02-10 Thread Yambu
  Hello


May I please know how I can connect to a DB server using pgAdmin? How do I
use the combination of username, passphrase and private key?



Below is what I do when I connect using PuTTY. I then log in as the postgres
user and connect to the database.

login as: user1
Authenticating with public key "***"
Passphrase for key "*":

regards


Re: Which partition scheme makes sense for my time based IoT-datagrams?

2021-02-10 Thread Thorsten Schöning
Hello Laurenz Albe,
on Wednesday, 10 February 2021 at 17:45 you wrote:

> They will scan all partitions.  You have to be prepared that most queries
> will become at least slightly slower with partitioning.  That is
> expected.

Does "most queries" address thos efor IDs scanning all partitions or
those time-based as well? In the end, I'm trying to improve query
performance by reducing the size of indexes, number of rows etc. per
table using partitions. :-)

The docs read like my use case would fit to partitioning as well and
I've already tried all sorts of indexing on the unpartitioned table to
improve some of my example queries with not much luck.

Kind regards

Thorsten Schöning

Compiler warnings on Debian 10

2021-02-10 Thread Daniel Westermann (DWE)
Hi,

I was wondering if someone has already seen these warnings on Debian 10 
(PostgreSQL 13.1):

../../../../src/include/port.h:176:70: warning: 'format' attribute argument not 
supported: gnu_printf
  [-Wignored-attributes]
extern int  pg_snprintf(char *str, size_t count, const char *fmt,...) 
pg_attribute_printf(3, 4);
  ^
../../../../src/include/c.h:127:49: note: expanded from macro 
'pg_attribute_printf'
#define pg_attribute_printf(f,a) __attribute__((format(PG_PRINTF_ATTRIBUTE, f, 
a)))
^
In file included from llvmjit_error.cpp:19:
In file included from ../../../../src/include/postgres.h:46:
In file included from ../../../../src/include/c.h:1318:
../../../../src/include/port.h:178:55: warning: 'format' attribute argument not 
supported: gnu_printf
  [-Wignored-attributes]
extern int  pg_sprintf(char *str, const char *fmt,...) 
pg_attribute_printf(2, 3);
   ^
../../../../src/include/c.h:127:49: note: expanded from macro 
'pg_attribute_printf'
#define pg_attribute_printf(f,a) __attribute__((format(PG_PRINTF_ATTRIBUTE, f, 
a)))
^
In file included from llvmjit_error.cpp:19:
In file included from ../../../../src/include/postgres.h:46:
In file included from ../../../../src/include/c.h:1318:
../../../../src/include/port.h:180:58: warning: 'format' attribute argument not 
supported: gnu_printf
  [-Wignored-attributes]
extern int  pg_fprintf(FILE *stream, const char *fmt,...) 
pg_attribute_printf(2, 3);
  ^

Configure was successful:
...
configure: creating ./config.status
config.status: creating GNUmakefile
config.status: creating src/Makefile.global
config.status: creating src/include/pg_config.h
config.status: creating src/include/pg_config_ext.h
config.status: creating src/interfaces/ecpg/include/ecpg_config.h
config.status: linking src/backend/port/tas/dummy.s to src/backend/port/tas.s
config.status: linking src/backend/port/posix_sema.c to 
src/backend/port/pg_sema.c
config.status: linking src/backend/port/sysv_shmem.c to 
src/backend/port/pg_shmem.c
config.status: linking src/include/port/linux.h to src/include/pg_config_os.h
config.status: linking src/makefiles/Makefile.linux to src/Makefile.port


This is the official Debian AMI on AWS:
postgres@pg13:~/postgresql-13.1$ cat /etc/debian_version 
10.8

Regards
Daniel



Unable to execute Query in parallel for partitioned table

2021-02-10 Thread Brajendra Pratap
Hi,

I am unable to execute the query below in parallel. Please suggest how I can
achieve parallelism here.

select count(*) over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER()  OVER (order by
trn_transaction_date desc ) AS RowNumber from (
select * from transactions where trn_store_date_id=20201202) abc;

Query plan is as mentioned below :-

explain analyze select count(*) over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER()
OVER (order by trn_transaction_date desc ) AS RowNumber from (
select * from transactions where trn_store_date_id=20201218) abc;
LOG:  duration: 25820.176 ms  statement: explain analyze select count(*)
over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER()  OVER (order by
trn_transaction_date desc ) AS RowNumber from (
select * from transactions where trn_store_date_id=20201218) abc;

  QUERY PLAN
--
 WindowAgg  (cost=4474843.51..4498066.81 rows=774110 width=21297) (actual
time=21455.495..25241.738 rows=795190 loops=1)
   ->  WindowAgg  (cost=4474843.51..4488390.44 rows=774110 width=21289)
(actual time=10588.494..15311.865 rows=795190 loops=1)
 ->  Sort  (cost=4474843.51..4476778.79 rows=774110 width=21281)
(actual time=10588.422..11771.300 rows=795190 loops=1)
   Sort Key: transactions.trn_transaction_date DESC
   Sort Method: external merge  Disk: 1496856kB
   ->  Result  (cost=0.00..270640.32 rows=774110 width=21281)
(actual time=0.117..4504.159 rows=795190 loops=1)
 ->  Append  (cost=0.00..262899.22 rows=774110
width=21281) (actual time=0.094..1449.532 rows=795190 loops=1)
   ->  Seq Scan on transactions  (cost=0.00..0.00
rows=1 width=47554) (actual time=0.019..0.019 rows=0 loops=1)
 Filter: (trn_store_date_id = 20201218)
   ->  Index Scan using
idx_202012_trn_store_date_id on transactions_202012  (cost=0.56..259028.67
rows=774109 width=21281) (actual time=0.074..1357.764 rows=795190 loops=1)
 Index Cond: (trn_store_date_id = 20201218)
 Planning Time: 116.472 ms
 Execution Time: 25676.098 ms

Note: We have tried different options like max_worker_processes,
max_parallel_workers, max_parallel_workers_per_gather and
max_parallel_maintenance_workers to get it to execute in parallel, but no
luck.

Please suggest.

Thanks


PostgreSQL occasionally unable to rename WAL files (NTFS)

2021-02-10 Thread Guy Burgess

Hello,

Running 13.1 on Windows Server 2019, I am getting the following log 
entries occasionally:


    2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file 
"pg_wal/0001009900D3": Permission denied
    2021-02-11 12:40:31.377 NZDT [6072] LOG:  could not rename file 
"pg_wal/0001009900D3": Permission denied
    2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file 
"pg_wal/0001009900D3": Permission denied
    2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file 
"pg_wal/0001009900DA": Permission denied
    2021-02-11 12:50:20.917 NZDT [6072] LOG:  could not rename file 
"pg_wal/0001009900D3": Permission denied
    2021-02-11 12:50:31.098 NZDT [6072] LOG:  could not rename file 
"pg_wal/0001009900DA": Permission denied


What appears to be happening is that the affected WAL files (usually 
only 2 or 3 WAL files at a time) are somehow "losing" their NTFS 
permissions, so the PG process can't rename them - though of course the 
PG process created them. Even running icacls as admin gives "Access is 
denied" on those files. A further oddity is that the affected files do end 
up disappearing after a while.


The NTFS permissions on the pg_wal directory are correct, and most WAL 
files are unaffected. Chkdsk reports no problems, and the database is 
working fine otherwise. I have tried disabling antivirus software in case 
that was doing something, but it made no difference.


I found another recent report of similar behaviour here: 
https://stackoverflow.com/questions/65405479/postgresql-13-log-could-not-rename-file-pg-wal-0001000100c6


WAL config as follows:

   wal_level = replica
   fsync = on
   synchronous_commit = on
   wal_sync_method = fsync
   full_page_writes = on
   wal_compression = off
   wal_log_hints = off
   wal_init_zero = on
   wal_recycle = on
   wal_buffers = -1
   wal_writer_delay = 200ms
   wal_writer_flush_after = 1MB
   wal_skip_threshold = 2MB
   commit_delay = 0
   commit_siblings = 5
   checkpoint_timeout = 5min
   max_wal_size = 2GB
   min_wal_size = 256MB
   checkpoint_completion_target = 0.7
   checkpoint_flush_after = 0
   checkpoint_warning = 30s
   archive_mode = off

I'm thinking of disabling wal_recycle as a first step to see if that 
makes any difference, but thought I'd seek some comments first.


Not sure how much of a problem this is - the database is running fine 
otherwise - but any thoughts would be appreciated.


Thanks & regards,

Guy



Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea

2021-02-10 Thread Jagmohan Kaintura
Hi All,

For a PostgreSQL database that stores data for multiple tenants, the approach
decided on was to have a
Shared Database (holding data for all tenants)
  => Data would be segregated on the basis of some additional column
(tenantid, different tenants having different tenantId values)
   => Data would be accessed through views on the basis of the tenantId
value.

This is the basic approach of most customers who are trying to
implement multiple tenants in PostgreSQL, rather than choosing
separate databases for each tenant.

Now we need to encrypt the data related to a tenantId, so that no one
knows which tenant this data belongs to, even in the Operations group.
Is there a method in PostgreSQL for encrypting data with different keys
with respect to different values in a single column? Moreover, pgcrypto
would impose a single key on the column.

Please share your thoughts on which direction I can take to start analysing
encryption of data specific to a tenant.

Thanks in advance.
*Best Regards,*
Jagmohan
Senior Consultant, TecoreLabs.


Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea

2021-02-10 Thread Tim Cross


Jagmohan Kaintura  writes:

> HI All,
>
> For POstgreSQL database to store data for multiple tenants, the approach
> decided was to have
> Shared Database (Holding data for all tenants)
>   => Data would be segregated on basis of some additional column
> (tennatid,different tenants having different tenantId)
>=> Data would be accessed through Views on the basis of tenantId
> value.
>
> This is the basic process of most of the customers who are trying to
> implement multiple tenants in PostgreSQL, rather than choosing
> separate databases for each tenant.
>
> Now we need to encrypt the data related to a tenantId, so that now one
> knows this data belongs to which tenant even from Operations group.
> Is there a method in POstgreSQL for encrypting data with different keys
> with respect to different values in a single column.  Moreover pg_crypto
> will impose a single key on the column.
>
> Please share your thoughts in which direction i can start analysing this
> area for encryption of data specific to a tenant.
>

The decision to have all tenants in a single database seems rather
unusual to me. Isolating one tenant from adversely impacting another
would seem complicated and I'm not sure how you would implement a clear
security model. Your model has effectively bypassed all the provided PG
facilities for isolation of data. Disaster recovery and business
continuity planning under this model must be a nightmare!

I doubt you can adopt a solution which is solely within the database.
How would the database know which key to use for which rows of data? How
would you select the data for your tenant views if all that data is
encrypted with different keys? How would you manage these keys in a
secure manner?

With the model you have adopted, I would be looking at performing
encryption/decryption at the client level. However, depending on your
data types, this could be challenging. This is really a requirement
which should have been factored into the initial architecture design.
Anything you try to bolt on now is likely to be complex and have a
significant performance impact, and that is assuming you can re-interpret
the requirement to make the objective feasible.

--
Tim Cross




Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea

2021-02-10 Thread Rob Sargent


> On Feb 10, 2021, at 6:45 PM, Tim Cross  wrote:
> 
> 
> Jagmohan Kaintura  writes:
> 
>> HI All,
>> 
>> For POstgreSQL database to store data for multiple tenants, the approach
>> decided was to have
>> Shared Database (Holding data for all tenants)
>>  => Data would be segregated on basis of some additional column
>> (tennatid,different tenants having different tenantId)
>>   => Data would be accessed through Views on the basis of tenantId
>> value.
>> 
>> This is the basic process of most of the customers who are trying to
>> implement multiple tenants in PostgreSQL, rather than choosing
>> separate databases for each tenant.
>> 
>> Now we need to encrypt the data related to a tenantId, so that now one
>> knows this data belongs to which tenant even from Operations group.
>> Is there a method in POstgreSQL for encrypting data with different keys
>> with respect to different values in a single column.  Moreover pg_crypto
>> will impose a single key on the column.
>> 
>> Please share your thoughts in which direction i can start analysing this
>> area for encryption of data specific to a tenant.
>> 
> 
> The decision to have all tenants in a single database seems rather
> unusual to me. Isolating one tenant from adversely impacting another
> would seem complicated and I'm not sure how you would implement a clear
> security model. Your model has effectively bypassed all the provided PG
> facilities for isolation of data. Disaster recovery and business
> continuity planning under this model must be a nightmare!
> 
> I doubt you can adopt a solution which is solely within the database.
> How would the database know which key to use for which rows of data? How
> would you select the data for your tenant views if all that data is
> encrypted with different keys? How would you manage these keys in a
> secure manner?
> 
> With the model you have adopted, I would be looking at performing
> encryption/decryption at the client level. However, depending on your
> data types, this could be challenging. this is really a requirement
> which should have been factored into the initial architecture design.
> Anything you try to bolt on now is likely to be complex and have
> significant performance impact and that is assuming you can re-interpret
> the requirement to make the objective feasible.
> 
Yeah, I lost that same argument in ~2007, when my push for separation was 
shouted down with rants on schema maintenance (divergence) and multiple 
rollouts per update.  I hadn’t had any coffee before the 9:00am meeting, so 
the hotshot from Amazon got his way.  Then we tried “veils” (a concoction of 
view and rule re-writing) and we all know how that went.  The company folded 
before our “next gen” software saw the light of day.

I get the feeling multi-tenancy is, if not the rule these days, at least quite 
common (on the last of “big iron”?), but it still doesn’t sit well with me.
 



Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea

2021-02-10 Thread Guyren Howe
An interesting option is to make your tenants work through views, and have the 
views work through a variable that contains the tenant’s id. There would be a 
bit of coding, but it would be the same for every table, so you could automate 
it easily enough.

When you’re done, client software just sets the id at the beginning of the 
session and the database looks like it only contains the tenant’s data.

This is a particularly nice security guarantee: if you make the id a UUID (i.e. 
unguessable), then someone can entirely compromise the client application and 
still only get at the data for one tenant, and then only given their ID.
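
For illustration, a minimal sketch of that pattern, assuming a custom setting
("app.tenant_id") read by the view; the table, column and setting names are
made up, not from this thread:

import psycopg2

SETUP = """
CREATE TABLE IF NOT EXISTS orders (
    id        bigserial PRIMARY KEY,
    tenant_id uuid      NOT NULL,
    amount    numeric   NOT NULL
);
CREATE OR REPLACE VIEW orders_v AS
    SELECT id, amount FROM orders
    WHERE tenant_id = current_setting('app.tenant_id')::uuid;
"""

with psycopg2.connect("dbname=mydb") as conn:     # connection details assumed
    with conn.cursor() as cur:
        cur.execute(SETUP)
        # Client software sets the tenant id once at the start of the session...
        cur.execute("SELECT set_config('app.tenant_id', %s, false)",
                    ("7d9f7a6e-8a38-4b55-9a57-0d9e2a4d3a10",))
        # ...and from then on queries only the views, which expose one tenant.
        cur.execute("SELECT count(*) FROM orders_v")
        print(cur.fetchone()[0])
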
On Feb 10, 2021, 18:44 -0800, Rob Sargent , wrote:
>
>
> > On Feb 10, 2021, at 6:45 PM, Tim Cross  wrote:
> >
> >
> > Jagmohan Kaintura  writes:
> >
> > > HI All,
> > >
> > > For POstgreSQL database to store data for multiple tenants, the approach
> > > decided was to have
> > > Shared Database (Holding data for all tenants)
> > >  => Data would be segregated on basis of some additional column
> > > (tennatid,different tenants having different tenantId)
> > >   => Data would be accessed through Views on the basis of tenantId
> > > value.
> > >
> > > This is the basic process of most of the customers who are trying to
> > > implement multiple tenants in PostgreSQL, rather than choosing
> > > separate databases for each tenant.
> > >
> > > Now we need to encrypt the data related to a tenantId, so that now one
> > > knows this data belongs to which tenant even from Operations group.
> > > Is there a method in POstgreSQL for encrypting data with different keys
> > > with respect to different values in a single column.  Moreover pg_crypto
> > > will impose a single key on the column.
> > >
> > > Please share your thoughts in which direction i can start analysing this
> > > area for encryption of data specific to a tenant.
> > >
> >
> > The decision to have all tenants in a single database seems rather
> > unusual to me. Isolating one tenant from adversely impacting another
> > would seem complicated and I'm not sure how you would implement a clear
> > security model. Your model has effectively bypassed all the provided PG
> > facilities for isolation of data. Disaster recovery and business
> > continuity planning under this model must be a nightmare!
> >
> > I doubt you can adopt a solution which is solely within the database.
> > How would the database know which key to use for which rows of data? How
> > would you select the data for your tenant views if all that data is
> > encrypted with different keys? How would you manage these keys in a
> > secure manner?
> >
> > With the model you have adopted, I would be looking at performing
> > encryption/decryption at the client level. However, depending on your
> > data types, this could be challenging. this is really a requirement
> > which should have been factored into the initial architecture design.
> > Anything you try to bolt on now is likely to be complex and have
> > significant performance impact and that is assuming you can re-interpret
> > the requirement to make the objective feasible.
> >
> Yeah, I lost that same arguement in ~2007, where the forces against my push 
> for separation was shouted down with rants on scheme maintenance (divergence) 
> and multiple rollouts per update.  I hadn’t had any coffee before the 9:00am 
> meeting so the hotshot from Amazon got his way.  Then we tried “veils” (a 
> concoction of view and rule re-writing) and we all know how that went.  The 
> company folded before our “next gen” software saw the light of day.
>
> I get the feeling multi-tenancy is, if not the rule these days, at least 
> quite common (on the last of “big iron”?) but it still doesn’t sit well with 
> me.
>
>


Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea

2021-02-10 Thread Jagmohan Kaintura
Hi Guyren,

Yes, right now the data is being accessed in exactly this manner.
The application connects as a tenant-specific user whose tenantId is set in
the session, and it can only see its own data. It doesn't know about anyone
else's data and cannot fetch it.

So isolation is 100% guaranteed right now.

But isolation is not enough from an operations perspective, so I also need
encryption in some form, whatever PostgreSQL supports, and the encryption
key should differ per tenant.
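
One possible direction, sketched under the assumption that pgcrypto is available and that a per-tenant key can be looked up somewhere; the tenant_keys and customer_data names below are purely illustrative:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- purely illustrative: one symmetric key per tenant
CREATE TABLE tenant_keys (
    tenant_id numeric PRIMARY KEY,
    enc_key   text NOT NULL
);

CREATE TABLE customer_data (
    tenant_id  numeric NOT NULL,
    secret_col bytea               -- encrypted payload
);

-- encrypt with the key of the session's tenant
INSERT INTO customer_data (tenant_id, secret_col)
SELECT k.tenant_id,
       pgp_sym_encrypt('some sensitive value', k.enc_key)
FROM   tenant_keys k
WHERE  k.tenant_id = current_setting('app.tenant_id')::numeric;

-- decrypt inside the tenant view, so a session only ever uses its own key
CREATE VIEW customer_data_v AS
SELECT d.tenant_id,
       pgp_sym_decrypt(d.secret_col, k.enc_key) AS secret_col
FROM   customer_data d
JOIN   tenant_keys k ON k.tenant_id = d.tenant_id
WHERE  d.tenant_id = current_setting('app.tenant_id')::numeric;

Keeping the keys in a table in the same database would of course hide nothing from the operations group, so in practice the key would have to be supplied by the application or an external key management service at session time.
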


On Thu, Feb 11, 2021 at 8:56 AM Guyren Howe  wrote:

> An interesting option is to make your tenants work through views, and have
> the views work through a variable that contains the tenant’s id. There
> would be a bit of coding, but it would be the same for every table, so you
> could automate it easy enough.
>
> When you’re done, client software just sets the id at the beginning of the
> session and the database looks like it only contains the tenant’s data.
>
> This is a particularly nice security guarantee: if you make the id a UUID
> (ie unguessable) then someone can entirely compromise the client
> application, and can still only get at the data for one tenant, and then
> only given their ID.
> On Feb 10, 2021, 18:44 -0800, Rob Sargent , wrote:
>
>
>
> On Feb 10, 2021, at 6:45 PM, Tim Cross  wrote:
>
>
> Jagmohan Kaintura  writes:
>
> HI All,
>
> For PostgreSQL database to store data for multiple tenants, the approach
> decided was to have
> Shared Database (Holding data for all tenants)
>  => Data would be segregated on basis of some additional column
> (tenantid, different tenants having different tenantId)
>   => Data would be accessed through Views on the basis of tenantId
> value.
>
> This is the basic process of most of the customers who are trying to
> implement multiple tenants in PostgreSQL, rather than choosing
> separate databases for each tenant.
>
> Now we need to encrypt the data related to a tenantId, so that no one
> knows which tenant this data belongs to, not even the Operations group.
> Is there a method in PostgreSQL for encrypting data with different keys
> with respect to different values in a single column?  Moreover, pgcrypto
> will impose a single key on the column.
>
> Please share your thoughts in which direction i can start analysing this
> area for encryption of data specific to a tenant.
>
>
> The decision to have all tenants in a single database seems rather
> unusual to me. Isolating one tenant from adversely impacting another
> would seem complicated and I'm not sure how you would implement a clear
> security model. Your model has effectively bypassed all the provided PG
> facilities for isolation of data. Disaster recovery and business
> continuity planning under this model must be a nightmare!
>
> I doubt you can adopt a solution which is solely within the database.
> How would the database know which key to use for which rows of data? How
> would you select the data for your tenant views if all that data is
> encrypted with different keys? How would you manage these keys in a
> secure manner?
>
> With the model you have adopted, I would be looking at performing
> encryption/decryption at the client level. However, depending on your
> data types, this could be challenging. This is really a requirement
> which should have been factored into the initial architecture design.
> Anything you try to bolt on now is likely to be complex and have
> significant performance impact and that is assuming you can re-interpret
> the requirement to make the objective feasible.
>
> Yeah, I lost that same argument in ~2007, when the forces against my
> push for separation were shouted down with rants on schema maintenance
> (divergence) and multiple rollouts per update.  I hadn’t had any coffee
> before the 9:00am meeting so the hotshot from Amazon got his way.  Then we
> tried “veils” (a concoction of view and rule re-writing) and we all
> know how that went.  The company folded before our “next gen” software saw
> the light of day.
>
> I get the feeling multi-tenancy is, if not the rule these days, at least
> quite common (on the last of “big iron”?) but it still doesn’t sit well
> with me.
>
>
>

-- 
*Best Regards,*
Jagmohan
Senior Consultant, TecoreLabs.


Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea

2021-02-10 Thread Jagmohan Kaintura
Hi Tim,

Yes, the data type isn't even supported directly right now. Since tenantId is
numeric and encryption/decryption goes through text/bytea, it is even harder
to encrypt that column's data.
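
If the only obstacle is pgcrypto's text/bytea interface, a cast in both directions gets a numeric value through it. A sketch with an assumed literal key and illustrative table names:

-- encrypt: numeric -> text -> bytea
SELECT pgp_sym_encrypt(tenant_id::text, 'per-tenant-key') AS tenant_id_enc
FROM   machine_data;           -- illustrative

-- decrypt: bytea -> text -> numeric
SELECT pgp_sym_decrypt(tenant_id_enc, 'per-tenant-key')::numeric AS tenant_id
FROM   machine_data_enc;       -- illustrative
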


On Thu, Feb 11, 2021 at 7:38 AM Tim Cross  wrote:

>
> Jagmohan Kaintura  writes:
>
> > HI All,
> >
> > For PostgreSQL database to store data for multiple tenants, the approach
> > decided was to have
> > Shared Database (Holding data for all tenants)
> >   => Data would be segregated on basis of some additional column
> > (tenantid, different tenants having different tenantId)
> >=> Data would be accessed through Views on the basis of
> tenantId
> > value.
> >
> > This is the basic process of most of the customers who are trying to
> > implement multiple tenants in PostgreSQL, rather than choosing
> > separate databases for each tenant.
> >
> > Now we need to encrypt the data related to a tenantId, so that no one
> > knows which tenant this data belongs to, not even the Operations group.
> > Is there a method in PostgreSQL for encrypting data with different keys
> > with respect to different values in a single column?  Moreover, pgcrypto
> > will impose a single key on the column.
> >
> > Please share your thoughts in which direction i can start analysing this
> > area for encryption of data specific to a tenant.
> >
>
> The decision to have all tenants in a single database seems rather
> unusual to me. Isolating one tenant from adversely impacting another
> would seem complicated and I'm not sure how you would implement a clear
> security model. Your model has effectively bypassed all the provided PG
> facilities for isolation of data. Disaster recovery and business
> continuity planning under this model must be a nightmare!
>
> I doubt you can adopt a solution which is solely within the database.
> How would the database know which key to use for which rows of data? How
> would you select the data for your tenant views if all that data is
> encrypted with different keys? How would you manage these keys in a
> secure manner?
>
> With the model you have adopted, I would be looking at performing
> encryption/decryption at the client level. However, depending on your
> data types, this could be challenging. This is really a requirement
> which should have been factored into the initial architecture design.
> Anything you try to bolt on now is likely to be complex and have
> significant performance impact and that is assuming you can re-interpret
> the requirement to make the objective feasible.
>
> --
> Tim Cross
>
>
>

-- 
*Best Regards,*
Jagmohan
Senior Consultant, TecoreLabs.


Re: Unable to execute Query in parallel for partitioned table

2021-02-10 Thread Laurenz Albe
On Thu, 2021-02-11 at 05:09 +0530, Brajendra Pratap wrote:
> I am unable to execute the query below in parallel. Please suggest how I can
> achieve parallelism here.
> 
> select count(*) over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER()  OVER (order by 
> trn_transaction_date desc ) AS RowNumber from (
> select * from transactions where trn_store_date_id=20201202) abc;
> 
> Query plan is as mentioned below :-
> 
> explain analyze select count(*) over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER()  
> OVER (order by trn_transaction_date desc ) AS RowNumber from (
> select * from transactions where trn_store_date_id=20201218) abc;
> LOG:  duration: 25820.176 ms  statement: explain analyze select count(*) over 
> () VIEWALLROWCOUNT,abc.*,ROW_NUMBER()  OVER (order by trn_transaction_date 
> desc ) AS RowNumber from (
> select * from transactions where trn_store_date_id=20201218) abc;
>   
> QUERY PLAN
> --
>  WindowAgg  (cost=4474843.51..4498066.81 rows=774110 width=21297) (actual 
> time=21455.495..25241.738 rows=795190 loops=1)
>->  WindowAgg  (cost=4474843.51..4488390.44 rows=774110 width=21289) 
> (actual time=10588.494..15311.865 rows=795190 loops=1)
>  ->  Sort  (cost=4474843.51..4476778.79 rows=774110 width=21281) 
> (actual time=10588.422..11771.300 rows=795190 loops=1)
>Sort Key: transactions.trn_transaction_date DESC
>Sort Method: external merge  Disk: 1496856kB
>->  Result  (cost=0.00..270640.32 rows=774110 width=21281) 
> (actual time=0.117..4504.159 rows=795190 loops=1)
>  ->  Append  (cost=0.00..262899.22 rows=774110 
> width=21281) (actual time=0.094..1449.532 rows=795190 loops=1)
>->  Seq Scan on transactions  (cost=0.00..0.00 
> rows=1 width=47554) (actual time=0.019..0.019 rows=0 loops=1)
>  Filter: (trn_store_date_id = 20201218)
>->  Index Scan using idx_202012_trn_store_date_id 
> on transactions_202012  (cost=0.56..259028.67 rows=774109 width=21281) 
> (actual time=0.074..1357.764 rows=795190 loops=1)
>  Index Cond: (trn_store_date_id = 20201218)
>  Planning Time: 116.472 ms
>  Execution Time: 25676.098 ms
> 
> Note :- We had tried different options like max_worker_processes, 
> max_parallel_workers, 
> max_parallel_workers_per_gather, max_parallel_maintenance_workers to execute it
> in parallel but no luck.

I don't think parallelization will help you here.

Your problem is probably the "abc.*" in the SELECT list.

The rows in this table must be very wide (width=21281 in the plan), so it takes
a long time to fetch and sort them.  Try selecting only the columns you need.

Alternatively, add a LIMIT clause.  Do you really need all 795190 rows?
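
For example, a sketch only, with trn_id standing in for whichever columns are actually needed (the real column list is not known here):

SELECT count(*) OVER ()                                       AS viewallrowcount,
       trn_id,
       trn_transaction_date,
       ROW_NUMBER() OVER (ORDER BY trn_transaction_date DESC) AS rownumber
FROM   transactions
WHERE  trn_store_date_id = 20201218
ORDER  BY trn_transaction_date DESC
LIMIT  100;

A narrower SELECT list shrinks the sort, and the LIMIT lets PostgreSQL stop early instead of sorting 1.5 GB on disk.
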

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Which partition scheme makes sense for my time based IoT-datagrams?

2021-02-10 Thread Laurenz Albe
On Wed, 2021-02-10 at 18:59 +0100, Thorsten Schöning wrote:
> > They will scan all partitions.  You have to be prepared that most queries
> > will become at least slightly slower with partitioning.  That is
> > expected.
> 
> Does "most queries" address thos efor IDs scanning all partitions or
> those time-based as well? In the end, I'm trying to improve query
> performance by reducing the size of indexes, number of rows etc. per
> table using partitions. :-)

The size of the index will barely affect its performance.

But having to figure out which partitions to scan takes planning time.

Very few queries become notably faster through partitioning.
Mostly those are aggregates and joins that match the partitioning scheme,
and sequential scans that include the partitioning key.
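
A quick way to see whether a given query can prune at all is to look at its plan. A sketch with an assumed range-partitioned table named datagram (illustrative, not the poster's actual schema):

-- illustrative range-partitioned table
CREATE TABLE datagram (
    device_id   bigint      NOT NULL,
    received_at timestamptz NOT NULL,
    payload     bytea
) PARTITION BY RANGE (received_at);

CREATE TABLE datagram_2021_01 PARTITION OF datagram
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
CREATE TABLE datagram_2021_02 PARTITION OF datagram
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

-- includes the partitioning key: the plan should show only datagram_2021_02
EXPLAIN SELECT * FROM datagram
WHERE  received_at >= '2021-02-01' AND received_at < '2021-02-08';

-- does not include the key: every partition is scanned
EXPLAIN SELECT * FROM datagram WHERE device_id = 42;
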

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com