Insertion time is very high for inserting data in postgres
Hello Team, Greetings! We are using Postgres 12.2.1 to fetch per-minute data for about 25 machines, run in parallel but via a single thread in Python. But suddenly the insertion time has increased to a very high level, about 30 seconds for one machine. This is a big problem for us, as data fetching is becoming slow. If anyone could help us solve this problem, it would be of great help to us.
Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased
Hi Postgres Team, Below are the scenarios we are dealing with.

1) There are 20 tables, each having about 150 columns on average.

2) There are 20 threads handled by a thread pool executor (here we are using Python's psycopg2 module/library to fetch the data).

3) I am using the statement below to insert the data with psycopg2's execute(...) command:

sql_stmt = "INSERT INTO " + name_Table + final_col_string + " VALUES " + str(tuple(array_of_curly_values))
print('Sql statement', sql_stmt)
col_cursor_db = db_conn.cursor()
v = col_cursor_db.execute(sql_stmt)

Earlier the same 22 threads were running and the insertion time gradually increased from 1 second to 30-35 seconds. Requesting and urging the postgres general support team to help me out on this. How can I increase the INSERT speed to minimize the insertion time taken by each thread in the thread pool? Are there any Python libraries other than psycopg2? Are there any different functions in psycopg2? Or what performance tuning has to be done to increase the insertion speed?
Re: Insertion time is very high for inserting data in postgres
On Wed, 10 Feb 2021, 1:56 pm prachi surangalikar, < surangalikarprachi...@gmail.com> wrote: > Hello Team, > Greetings! > > We are using Postgres 12.2.1 to fetch per-minute data for about 25 > machines, run in parallel but via a single thread in Python. > But suddenly the insertion time has increased to a very high level, about > 30 seconds for one machine. > This is a big problem for us, as data fetching is becoming slow. > > If anyone could help us solve this problem, it would be of great help to > us. > Are you running VACUUM ANALYZE on the table regularly? If not, that might delay insertion.
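For reference, a minimal psycopg2 sketch of running such a regular VACUUM ANALYZE from Python; the connection details and the table name "machine_data" are assumptions for illustration, not taken from this thread:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed connection details
conn.autocommit = True  # VACUUM cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("VACUUM (ANALYZE) machine_data")  # hypothetical table name
conn.close()

Scheduling this is usually better left to autovacuum tuning, but an explicit run like the above is an easy first check.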
Re: Increased size of database dump even though LESS consumed storage
Hello Thorsten Schöning, on Tuesday, 9 February 2021 at 10:56 you wrote:
> Any explanation for the increased dump size? Thanks!

While I don't have a detailed explanation myself, there's the following hint [in the docs][1]:
> For the custom archive format, this specifies compression of
> individual table-data segments[...]

[1] https://www.postgresql.org/docs/10/app-pgdump.html

I've changed the layout of my tables entirely and most likely have changed the "individual table-data segments" as well. So I did some further testing and got the following numbers. It's always OLD vs. NEW schema for the schema documented earlier:

> --compress=0: 25 vs. 20 GiB
> --compress=?: 5.6 vs. 6.5 GiB
> --compress=9: 5.5 vs. 6.4 GiB

"?" means default settings, i.e. "--compress" has been REMOVED from the shell command. As can be seen, pretty much as soon as compression comes into play, the new table layout is less efficient, even though in the OLD and NEW schema exactly the same "bytea" columns were used and were placed at the end of their respective tables.

But things become more interesting with another version of my schema, which likewise indicates a strong relationship between dump size, compression and table layout/size. I've changed the new table "datagram" to be a partitioned one, with partitions containing rows per year and per half-year. Each partition contains far fewer rows than before this way, and while I only tested with "--compress=9" this time, the numbers are quite interesting:

> unpartitioned: 6.4 GiB
> half-yearly parts: 4.8 GiB
> yearly parts: 4.8 GiB

The interesting part this time is that the table layout for all partitions is the same as before; only the number of rows per table is different. The overall number of rows is the same as before, the same data, IDs etc. Yet this time the dump really is smaller than with the OLD schema, which contained far more data because of duplicate IDs and the like. I wouldn't have expected table layout to be that important.

Kind regards, Thorsten Schöning
Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased
On Wed, 10 Feb 2021, 09:26 Rajnish Vishwakarma, < rajnish.nationfi...@gmail.com> wrote: > Hi Postgres Team, > > Below are the scenarios we are dealing with. > > 1) There are 20 tables, each having about 150 columns on average. > > 2) There are 20 threads handled by a thread pool executor (here we are > using Python's psycopg2 module/library to fetch the data). > > 3) I am using the statement below to insert the data with psycopg2's > execute(...) command: > > sql_stmt = "INSERT INTO " + name_Table + final_col_string + " VALUES " + > str(tuple(array_of_curly_values)) > print('Sql statement', sql_stmt) > col_cursor_db = db_conn.cursor() > v = col_cursor_db.execute(sql_stmt) > This is an insecure way to do it, but that's beside the point. > Earlier the same 22 threads were running and the insertion time > gradually increased from 1 second to 30-35 seconds. > > Requesting and urging the postgres general support team to help me out on > this. > > How can I increase the INSERT speed to minimize the insertion time > taken by each thread in the thread pool? > Use a COPY statement instead of INSERT. For a more moderate change to your code, but a smaller increase in speed, you can look at the batching helpers ( https://www.psycopg.org/docs/extras.html#fast-execution-helpers ). > Are there any Python libraries other than psycopg2? > Psycopg3 hasn't been released yet, so using it is on the experimental side. However, it provides better support for COPY, which would be perfect for your use case ( https://www.psycopg.org/psycopg3/docs/copy.html#writing-data-row-by-row ). -- Daniele
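To make the batching suggestion concrete, here is a minimal sketch using psycopg2's execute_values helper; the table and column names are assumptions for the example, not the poster's actual schema:

import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed connection details
rows = [(1, 20.5, 'ok'), (2, 21.1, 'ok')]  # one tuple per row to insert

with conn, conn.cursor() as cur:
    # A single call sends many rows; %s is expanded into a multi-row VALUES list.
    execute_values(
        cur,
        "INSERT INTO machine_data (machine_id, reading, status) VALUES %s",
        rows,
        page_size=1000,
    )
conn.close()

Passing the values as parameters instead of concatenating them into the SQL string also removes the injection risk mentioned above.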
Re: Increased size of database dump even though LESS consumed storage
Thorsten: On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning wrote: ... > I've changed the new table "datagram" to be a partitioned one with > partitions containing rows per year and per half-year. Each partition > contains far fewer rows than before this way, and while I only tested > with "--compress=9" this time, the numbers are quite interesting: > > > unpartitioned: 6.4 GiB > > half-yearly parts: 4.8 GiB > > yearly parts: 4.8 GiB > > The interesting part this time is that the table layout for all > partitions is the same as before, only the number of rows per table > is different. The overall number of rows is the same as before, the same data, IDs etc. > Yet this time the dump really is smaller than with the OLD schema containing far more data because of > duplicate IDs and the like. > I wouldn't have expected table layout to be that important. Compression depends on detectable redundancy in the input. pg_dump more or less gzips per-table "copy to stdout" dumps. If your data evolves in a determined way, having it sorted by ingestion time may increase detectable redundancy a lot, and partitioning sorts partially by date ( or fully if you have populated the partitions by range-querying via index scan ). In this case it may not be the layout, but the order. Given that you seem to be able to test, you may try sorting the full table by the column you use for partitioning. IIRC CLUSTER will do the trick if it is indexed. ( This has happened to me when compressing document dumps: presorting by some chosen fields improved my compression ratio a bit. IIRC it was due to the compressor searching for duplicates in a limited window only, which is why I say "detectable" redundancy. ) Francisco Olarte.
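As a sketch of the presorting idea, assuming the unpartitioned table and column names discussed later in this thread (the index name is hypothetical):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed connection details
conn.autocommit = True
with conn.cursor() as cur:
    # Physically rewrite the table in index order so similar rows sit next to
    # each other, which tends to improve pg_dump's per-table compression.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_unpart_captured_at "
                "ON datagram_unpart (captured_at)")  # hypothetical index name
    cur.execute("CLUSTER datagram_unpart USING idx_unpart_captured_at")
conn.close()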
Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased
On 10. 02. 21 09:14, Rajnish Vishwakarma wrote: Hi Postgres Team, Below are the scenarios we are dealing with. 1) There are 20 tables, each having about 150 columns on average. 2) There are 20 threads handled by a thread pool executor (here we are using Python's psycopg2 module/library to fetch the data). 3) I am using the statement below to insert the data with psycopg2's execute(...) command: sql_stmt = "INSERT INTO " + name_Table + final_col_string + " VALUES " + str(tuple(array_of_curly_values)) print('Sql statement', sql_stmt) col_cursor_db = db_conn.cursor() v = col_cursor_db.execute(sql_stmt) Earlier the same 22 threads were running and the insertion time gradually increased from 1 second to 30-35 seconds. Requesting and urging the postgres general support team to help me out on this. How can I increase the INSERT speed to minimize the insertion time taken by each thread in the thread pool? Are there any Python libraries other than psycopg2? Are there any different functions in psycopg2? Or what performance tuning has to be done to increase the insertion speed? Is a single insert taking 30 seconds, or do you have such a large number of inserts that your thread pool can't handle them and you are waiting for a free connection? For a single insert, one reason for slowness in large databases could be indexes which need to be updated for each insert. For the latter, you should increase the thread pool size.
SV: Insertion time is very high for inserting data in postgres
>From: prachi surangalikar > >Hello Team, >Greetings! > >We are using Postgres 12.2.1 to fetch per-minute data for about 25 >machines, run in parallel but via a single thread in Python. >But suddenly the insertion time has increased to a very high level, about 30 >seconds for one machine. >This is a big problem for us, as data fetching is becoming slow. > >If anyone could help us solve this problem, it would be of great help to us. Get your data into an in-memory text I/O structure (e.g. io.StringIO) and then use COPY: https://www.psycopg.org/docs/usage.html#using-copy-to-and-copy-from This is THE way to do high-performance inserts with Postgres. Regards Niels Jespersen
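A minimal sketch of that approach with psycopg2's copy_from; the table and column names are assumptions for illustration:

import io
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed connection details

# Build the rows as tab-separated text in memory, then stream them via COPY.
buf = io.StringIO()
for machine_id, reading, status in [(1, 20.5, 'ok'), (2, 21.1, 'ok')]:
    buf.write(f"{machine_id}\t{reading}\t{status}\n")
buf.seek(0)

with conn, conn.cursor() as cur:
    cur.copy_from(buf, 'machine_data',
                  columns=('machine_id', 'reading', 'status'))
conn.close()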
Re: Increased size of database dump even though LESS consumed storage
Hello Francisco Olarte, on Wednesday, 10 February 2021 at 10:38 you wrote:
> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my dump size, but because I'm running into the problem that queries spanning more than one partition seem to prefer a sequential scan over using indexes. My index only seems to be used when querying the rows of one partition. So, the following is my definition; would that be "range-queried via index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id            bigserial                NOT NULL,
>   src_re        integer                  NOT NULL,
>   src_clt       integer                  NOT NULL,
>   src_meter     integer                  NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint                 NOT NULL,
>   oms_status    smallint                 NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> ) PARTITION BY RANGE (captured_at);
> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM
> ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM
> ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

Kind regards, Thorsten Schöning
Re: Increased size of database dump even though LESS consumed storage
Hello Francisco Olarte, on Wednesday, 10 February 2021 at 10:38 you wrote:
> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my dump size, but because I'm running into the problem that queries spanning more than one partition seem to prefer a sequential scan over using indexes. My index only seems to be used when querying the rows of one partition. So, the following is my definition; would that be "range-queried via index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id            bigserial                NOT NULL,
>   src_re        integer                  NOT NULL,
>   src_clt       integer                  NOT NULL,
>   src_meter     integer                  NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint                 NOT NULL,
>   oms_status    smallint                 NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> ) PARTITION BY RANGE (captured_at);
> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM
> ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM
> ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;
> CREATE INDEX idx_datagram_for_time_window ON datagram USING btree
> (src_meter, captured_at DESC);

Kind regards, Thorsten Schöning
Re: Insertion time is very high for inserting data in postgres
On Wed, 10 Feb 2021 at 06:11, Niels Jespersen wrote: > >From: prachi surangalikar > > > >Hello Team, > >Greetings! > > > >We are using Postgres 12.2.1 to fetch per-minute data for about 25 > machines, run in parallel but via a single thread in Python. > >But suddenly the insertion time has increased to a very high level, about > 30 seconds for one machine. > >This is a big problem for us, as data fetching is becoming slow. > Before anyone can help you, you will have to provide much more information: schema, the data you are inserting, size of the machine, configuration settings, etc. Dave > > >If anyone could help us solve this problem, it would be of great help > to us. > > Get your data into an in-memory text I/O structure (e.g. io.StringIO) and then use COPY: > https://www.psycopg.org/docs/usage.html#using-copy-to-and-copy-from > > This is THE way to do high-performance inserts with Postgres. > > Regards Niels Jespersen >
Re: Increased size of database dump even though LESS consumed storage
Thorsten: On Wed, Feb 10, 2021 at 12:36 PM Thorsten Schöning wrote:
> Hello Francisco Olarte,
> on Wednesday, 10 February 2021 at 10:38 you wrote:
> > [...]and partitioning sorts partially
> > by date ( or fully if you have made the partitions by range-querying
> > via index scan ).[...]
> That statement is especially interesting not only because of my
> dump size, but because I'm running into the problem that queries spanning more
> than one partition seem to prefer a sequential scan over using indexes.
> My index only seems to be used when querying the rows of one
> partition.

You know the drill: show your indexes and post some EXPLAIN output for those queries; the statements above are too fuzzy to infer any useful advice from.

> So, the following is my definition; would that be "range-queried via
> index scan" properly? :-)
> > CREATE TABLE datagram
> > (
> >   id            bigserial                NOT NULL,
> >   src_re        integer                  NOT NULL,
> >   src_clt       integer                  NOT NULL,
> >   src_meter     integer                  NOT NULL,
> >   captured_at   timestamp with time zone NOT NULL,
> >   captured_rssi smallint                 NOT NULL,
> >   oms_status    smallint                 NOT NULL,
> >   oms_enc       bytea,
> >   oms_dec       bytea
> > ) PARTITION BY RANGE (captured_at);
> > CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM
> > ('1970-01-01') TO ('1970-07-01');
> > CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM
> > ('1970-07-01') TO ('1971-01-01');
> > [...]

Given that definition there is no possible index scan, as there are no indexes.

> > INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

But given this, what I was suggesting is not true. What I meant is that if you had a big datagram_unpart table indexed by captured_at with good index correlation, which given the name of the field would be expected, and populated the partitions one by one by doing range queries over datagram_unpart, you may have hit index scans on each population query and ended up with perfectly sorted partitions, and that could help compression. Given your query, unless the optimizer is performing some supernatural ( for me ) trickery, my bet would be on a full table scan plus insertion, which would leave you with whatever order you originally had for each partition. But if your data, or your "data shape", has some natural correlation with captured timestamps ( i.e., I know that my CDR data has it ), putting similar data together could have improved your compression ratios. Bear in mind I do not know it.

Anyway, I would not worry too much about the backup ratios with the sizes you have. I would worry more about things like increasing fill factors and similar measures, if your data processing is similar to mine ( data enters at time captured_at, sometimes is fixed due to errors but normally never moves after being in the tables for a couple of months, and when it does it is a special problem which can be manually tuned ). It will not improve the backup too much, but it can improve other things. E.g., in my big CDR ( that means call detail record, telephony ) tables I let rows go into "normal" monthly partitions, but when they are some months old I coalesce them into fully sorted, unindexed, 100% filled yearly partitions. Those partitions are then put into an "historic" schema which I do NOT back up automatically; I only do that after the ( manually fired ) coalescing and the ( manually done ) very occasional long-term fixups ( I rarely have to touch anything older than a month ).

Regards. Francisco Olarte.
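A rough sketch of the yearly coalescing Francisco describes; the table and column names (cdr, call_start) are assumptions loosely modeled on his CDR example, and the attach/detach of the resulting partition is omitted:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed connection details
with conn, conn.cursor() as cur:
    # Build a fully packed, pre-sorted yearly table from one year of data.
    cur.execute("""
        CREATE TABLE cdr_2020 (LIKE cdr INCLUDING DEFAULTS)
            WITH (fillfactor = 100)
    """)
    cur.execute("""
        INSERT INTO cdr_2020
        SELECT * FROM cdr
        WHERE call_start >= '2020-01-01' AND call_start < '2021-01-01'
        ORDER BY call_start
    """)
conn.close()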
Re: Increased size of database dump even though LESS consumed storage
Thorsten: On Wed, Feb 10, 2021 at 12:46 PM Thorsten Schöning wrote: UUppzz, completely missed the second message. > > CREATE INDEX idx_datagram_for_time_window ON datagram USING btree > > (src_meter, captured_at DESC); Depending on the query and data "shape", this may be chosen or not for queries. You should be the one more probable to know it. Francisco Olarte.
Re: Increased size of database dump even though LESS consumed storage
Hello Thorsten Schöning, on Wednesday, 10 February 2021 at 09:58 you wrote:
>> unpartitioned: 6.4 GiB
>> half-yearly parts: 4.8 GiB
>> yearly parts: 4.8 GiB

The above number for `unpartitioned` might be wrong: I've re-created the same database, applied the same data migration as in the past and created a dump which was 4.8 GiB in size. Maybe I simply did something wrong and left some renamed table with data in the formerly used database, while NOT doing so for the other test databases. So my entire question might simply be based on user error. :-) Sadly, I don't seem to have the former dumps anymore to have a look at their content.

Kind regards, Thorsten Schöning
Re: How does Postgres decide if to use additional workers?
> On Feb 9, 2021, at 10:52 AM, Thorsten Schöning wrote: > > So, based on which facts does Postgres decide if to use additional > workers or not? Can I see those decisions explained somewhere? I don't > see anything in the query plan. Thanks! Hi Thorsten, This is an interesting topic for me too. Here's a formula for the maximum number of workers that Postgres will consider for a table. Of course, most queries use more than just one table, and I don't yet understand how Postgres handles the situation where the formula suggests multiple workers for some tables and a single worker for others. There was some conversation about this on the performance mailing list in June. Here's a link to the message that contains the formula; there are more items of interest in the whole thread: https://www.postgresql.org/message-id/89423FD3-0F13-447D-8D9E-EB1722150F94%40americanefficient.com Hope this helps Philip
Which partition scheme makes sense for my time based IoT-datagrams?
Hi all, I have a table storing datagrams from some IoT devices, with one datagram per device per day most likely, for around 75'000 devices currently. I want to test query performance with a partitioned table and am interested in the following queries mostly:

* querying arbitrary datagrams by their ID
* querying datagrams being X old based on some timestamp
  * 15 minutes
  * 15 days
  * 15 months

My app isn't prepared to e.g. migrate things in the background; instead I would like to have Postgres deal with all aspects as transparently as possible, at least for the next few years. Things worked without partitions in the past as well. Therefore I thought of simply partitioning by RANGE of the timestamp at which the datagram has been received and creating individual partitions per half year. That means that in most cases only the most current partition needs to be queried, with the last 3 in many other cases.

Other approaches I've read on this list were e.g. partitioning the most current months individually and afterwards moving rows to some other "archive"-like table. Besides lacking the necessary infrastructure for that, in my use case users in theory need to be able to ask for the last 15 months at some arbitrary point in history at any time. I wouldn't like to deal with different tables or the like in my app.

My approach would result in 24 tables already, with 2 more per year. Does that in itself sound bad already? Is that a number the planner can most likely deal with, or do I not even need to care about hundreds of partitions?

If partitioned by timestamp, how are lookups by ID performed? Is that a sequential scan on all partitions, e.g. using available indexes per partition?

Is there some way to RANGE by timestamp and ID while keeping the half-year tables only, so that the planner knows more easily which partition to look at in case of IDs only? Or is that simply like querying an ID index of each partition?

The following is what I have currently, extra verbose to support Postgres 10:

> CREATE TABLE datagram
> (
>   id            bigserial                NOT NULL,
>   src_re        integer                  NOT NULL,
>   src_clt       integer                  NOT NULL,
>   src_meter     integer                  NOT NULL,
>   captured_at   timestamp with time zone NOT NULL,
>   captured_rssi smallint                 NOT NULL,
>   oms_status    smallint                 NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> ) PARTITION BY RANGE (captured_at);
> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM
> ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM
> ('1970-07-01') TO ('1971-01-01');
> [...]
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT pk_datagram_y1970_h1
>   PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT
>   ck_datagram_y1970_h1_oms_data_avail CHECK(oms_enc IS NOT NULL OR
>   oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_re
>   FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_clt
>   FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_meter
>   FOREIGN KEY (src_meter) REFERENCES meter (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT pk_datagram_y1970_h2
>   PRIMARY KEY (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT
>   ck_datagram_y1970_h2_oms_data_avail CHECK(oms_enc IS NOT NULL OR
>   oms_dec IS NOT NULL);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_re
>   FOREIGN KEY (src_re) REFERENCES real_estate (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_clt
>   FOREIGN KEY (src_clt) REFERENCES collector (id);
> ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_meter
>   FOREIGN KEY (src_meter) REFERENCES meter (id);
> [...]
> CREATE INDEX idx_datagram_y1970_h1_for_time_window ON datagram_y1970_h1 USING
>   btree (src_meter, captured_at DESC);
> CREATE INDEX idx_datagram_y1970_h2_for_time_window ON datagram_y1970_h2 USING
>   btree (src_meter, captured_at DESC);
> [...]

An example query condition is the following, while "P5D" can simply be "P15M" or the like instead:

> WHERE (
>   "real_estate"."id" IN ([...])
>   AND "meter"."id" IN ([...])
>   AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00'
>     AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND
>     (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) +
>     CAST('P0D' AS INTERVAL))
> )

Thanks for sharing your opinions! Kind regards, Thorsten Schöning
Re: Which partition scheme makes sense for my time based IoT-datagrams?
On Wed, 2021-02-10 at 16:09 +0100, Thorsten Schöning wrote:
> I have a table storing datagrams from some IoT devices, with one
> datagram per device per day most likely, for around 75'000 devices
> currently. I want to test query performance with a partitioned table
> and am interested in the following queries mostly:
>
> * querying arbitrary datagrams by their ID
> * querying datagrams being X old based on some timestamp
>   * 15 minutes
>   * 15 days
>   * 15 months

You want to query ALL individual datagrams from the past 15 months?? I guess not, and that you want some aggregation. For that, materialized views might be interesting.

> Therefore I thought of simply partitioning by RANGE of the timestamp
> when the datagram has been received and creating individual partitions
> per half year. That means that in most cases only the most current
> partition needs to be queried, with the last 3 in many other cases.
>
> My approach would result in 24 tables already, with 2 more per year.
> Does that in itself sound bad already? Is that a number the planner
> can most likely deal with, or do I not even need to care about hundreds
> of partitions?

24 partitions is harmless. I think that a partition per half year might not be enough, but it depends on the queries. You should use PostgreSQL v12 or better.

One point you didn't touch, but should consider, is getting rid of old data. You will want to do that at some point, or at least you will want to keep only aggregates for historical statistics. That would be very easy if you partition correctly, and it would keep the number of partitions in a manageable range.

> If partitioned by timestamp, how are lookups by ID performed? Is that
> a sequential scan on all partitions, e.g. using available indexes
> per partition?

They will scan all partitions. You have to be prepared that most queries will become at least slightly slower with partitioning. That is expected.

> Is there some way to RANGE by timestamp and ID, by keeping the
> half-year tables only? So that the planner knows more easily which
> partition to look at in case of IDs only? Or is that simply like
> querying an ID index of each partition?

Sure, you can partition by ID (ranges; or perhaps hash partitioning, if you only ever query for a single ID) and subpartition by timestamp. This will speed up the queries by ID, since only a single partition will be searched. You could then be almost as fast as with a single monolithic table. Don't forget that the speed of an index scan does not (or almost not) depend on the size of the table.

Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
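A minimal sketch of that sub-partitioning idea (hash on the id, range on the timestamp), executed from Python for consistency with the rest of this digest; the partition count, names and the trimmed column list are illustrative assumptions, not a recommendation for the actual schema:

import psycopg2

ddl = """
CREATE TABLE datagram (
    id          bigserial                NOT NULL,
    src_meter   integer                  NOT NULL,
    captured_at timestamp with time zone NOT NULL,
    oms_enc     bytea
) PARTITION BY HASH (id);

-- Four hash partitions on id; each is itself range-partitioned on captured_at,
-- so a lookup by id prunes to one hash partition, and a time-window query
-- prunes to the matching half-year sub-partitions.
CREATE TABLE datagram_h0 PARTITION OF datagram
    FOR VALUES WITH (MODULUS 4, REMAINDER 0)
    PARTITION BY RANGE (captured_at);
CREATE TABLE datagram_h0_y2021_h1 PARTITION OF datagram_h0
    FOR VALUES FROM ('2021-01-01') TO ('2021-07-01');
-- ... remaining hash partitions and half-year ranges created the same way
"""

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed connection details
with conn, conn.cursor() as cur:
    cur.execute(ddl)
conn.close()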
Connecting to database through username,passphrase and private key on PgAdmin
Hello, may I please know how I can connect to a DB server using pgAdmin? How do I use the combination of username, passphrase and private key? Below is what I see when I connect using PuTTY. I then log in as the postgres user and connect to the database.

login as: user1
Authenticating with public key "***"
Passphrase for key "*":

regards
Re: Which partition scheme makes sense for my time based IoT-datagrams?
Hello Laurenz Albe, on Wednesday, 10 February 2021 at 17:45 you wrote:
> They will scan all partitions. You have to be prepared that most queries
> will become at least slightly slower with partitioning. That is
> expected.

Does "most queries" refer to those looking up IDs across all partitions, or to the time-based ones as well? In the end, I'm trying to improve query performance by reducing the size of indexes, the number of rows etc. per table using partitions. :-) The docs read like my use case would fit partitioning as well, and I've already tried all sorts of indexing on the unpartitioned table to improve some of my example queries, without much luck.

Kind regards, Thorsten Schöning
Compiler warnings on Debian 10
Hi, I was wondering if someone has already seen these warnings on Debian 10 (PostgreSQL 13.1):

../../../../src/include/port.h:176:70: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]
extern int pg_snprintf(char *str, size_t count, const char *fmt,...) pg_attribute_printf(3, 4);
../../../../src/include/c.h:127:49: note: expanded from macro 'pg_attribute_printf'
#define pg_attribute_printf(f,a) __attribute__((format(PG_PRINTF_ATTRIBUTE, f, a)))
In file included from llvmjit_error.cpp:19:
In file included from ../../../../src/include/postgres.h:46:
In file included from ../../../../src/include/c.h:1318:
../../../../src/include/port.h:178:55: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]
extern int pg_sprintf(char *str, const char *fmt,...) pg_attribute_printf(2, 3);
../../../../src/include/c.h:127:49: note: expanded from macro 'pg_attribute_printf'
#define pg_attribute_printf(f,a) __attribute__((format(PG_PRINTF_ATTRIBUTE, f, a)))
In file included from llvmjit_error.cpp:19:
In file included from ../../../../src/include/postgres.h:46:
In file included from ../../../../src/include/c.h:1318:
../../../../src/include/port.h:180:58: warning: 'format' attribute argument not supported: gnu_printf [-Wignored-attributes]
extern int pg_fprintf(FILE *stream, const char *fmt,...) pg_attribute_printf(2, 3);

Configure was successful:

...
configure: creating ./config.status
config.status: creating GNUmakefile
config.status: creating src/Makefile.global
config.status: creating src/include/pg_config.h
config.status: creating src/include/pg_config_ext.h
config.status: creating src/interfaces/ecpg/include/ecpg_config.h
config.status: linking src/backend/port/tas/dummy.s to src/backend/port/tas.s
config.status: linking src/backend/port/posix_sema.c to src/backend/port/pg_sema.c
config.status: linking src/backend/port/sysv_shmem.c to src/backend/port/pg_shmem.c
config.status: linking src/include/port/linux.h to src/include/pg_config_os.h
config.status: linking src/makefiles/Makefile.linux to src/Makefile.port

This is the official Debian AMI on AWS:

postgres@pg13:~/postgresql-13.1$ cat /etc/debian_version
10.8

Regards Daniel
Unable to execute Query in parallel for partitioned table
Hi, I am unable to execute the below in parallel; please suggest how I can achieve parallelism here.

select count(*) over () VIEWALLROWCOUNT, abc.*, ROW_NUMBER() OVER (order by trn_transaction_date desc) AS RowNumber from (select * from transactions where trn_store_date_id=20201202) abc;

The query plan is as below:

explain analyze select count(*) over () VIEWALLROWCOUNT, abc.*, ROW_NUMBER() OVER (order by trn_transaction_date desc) AS RowNumber from (select * from transactions where trn_store_date_id=20201218) abc;
LOG: duration: 25820.176 ms statement: explain analyze select count(*) over () VIEWALLROWCOUNT, abc.*, ROW_NUMBER() OVER (order by trn_transaction_date desc) AS RowNumber from (select * from transactions where trn_store_date_id=20201218) abc;

QUERY PLAN
--------------------------------------------------------------------------
WindowAgg (cost=4474843.51..4498066.81 rows=774110 width=21297) (actual time=21455.495..25241.738 rows=795190 loops=1)
  -> WindowAgg (cost=4474843.51..4488390.44 rows=774110 width=21289) (actual time=10588.494..15311.865 rows=795190 loops=1)
    -> Sort (cost=4474843.51..4476778.79 rows=774110 width=21281) (actual time=10588.422..11771.300 rows=795190 loops=1)
      Sort Key: transactions.trn_transaction_date DESC
      Sort Method: external merge Disk: 1496856kB
      -> Result (cost=0.00..270640.32 rows=774110 width=21281) (actual time=0.117..4504.159 rows=795190 loops=1)
        -> Append (cost=0.00..262899.22 rows=774110 width=21281) (actual time=0.094..1449.532 rows=795190 loops=1)
          -> Seq Scan on transactions (cost=0.00..0.00 rows=1 width=47554) (actual time=0.019..0.019 rows=0 loops=1)
            Filter: (trn_store_date_id = 20201218)
          -> Index Scan using idx_202012_trn_store_date_id on transactions_202012 (cost=0.56..259028.67 rows=774109 width=21281) (actual time=0.074..1357.764 rows=795190 loops=1)
            Index Cond: (trn_store_date_id = 20201218)
Planning Time: 116.472 ms
Execution Time: 25676.098 ms

Note: We have tried different settings such as max_worker_processes, max_parallel_workers, max_parallel_workers_per_gather and max_parallel_maintenance_workers to get it to execute in parallel, but no luck. Please suggest. Thanks
PostgreSQL occasionally unable to rename WAL files (NTFS)
Hello, Running 13.1 on Windows Server 2019, I am getting the following log entries occasionally: 2021-02-11 12:34:10.149 NZDT [6072] LOG: could not rename file "pg_wal/0001009900D3": Permission denied 2021-02-11 12:40:31.377 NZDT [6072] LOG: could not rename file "pg_wal/0001009900D3": Permission denied 2021-02-11 12:46:06.294 NZDT [6072] LOG: could not rename file "pg_wal/0001009900D3": Permission denied 2021-02-11 12:46:16.502 NZDT [6072] LOG: could not rename file "pg_wal/0001009900DA": Permission denied 2021-02-11 12:50:20.917 NZDT [6072] LOG: could not rename file "pg_wal/0001009900D3": Permission denied 2021-02-11 12:50:31.098 NZDT [6072] LOG: could not rename file "pg_wal/0001009900DA": Permission denied What appears to be happening is the affected WAL files (which is usually only 2 or 3 WAL files at a time) are somehow "losing" their NTFS permissions, so the PG process can't rename them - though of course the PG process created them. Even running icacls as admin gives "Access is denied" on those files. A further oddity is the affected files do end up disappearing after a while. The NTFS permissions on the pg_wal directory are correct, and most WAL files are unaffected. Chkdsk reports no problems, and the database is working fine otherwise. Have tried disabling antivirus software in case that was doing something but no difference. I found another recent report of similar behaviour here: https://stackoverflow.com/questions/65405479/postgresql-13-log-could-not-rename-file-pg-wal-0001000100c6 WAL config as follows: wal_level = replica fsync = on synchronous_commit = on wal_sync_method = fsync full_page_writes = on wal_compression = off wal_log_hints = off wal_init_zero = on wal_recycle = on wal_buffers = -1 wal_writer_delay = 200ms wal_writer_flush_after = 1MB wal_skip_threshold = 2MB commit_delay = 0 commit_siblings = 5 checkpoint_timeout = 5min max_wal_size = 2GB min_wal_size = 256MB checkpoint_completion_target = 0.7 checkpoint_flush_after = 0 checkpoint_warning = 30s archive_mode = off I'm thinking of disabling wal_recycle as a first step to see if that makes any difference, but thought I'd seek some comments first. Not sure how much of a problem this is - the database is running fine otherwise - but any thoughts would be appreciated. Thanks & regards, Guy
Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea
Hi All, For a PostgreSQL database storing data for multiple tenants, the approach decided on was to have a shared database (holding data for all tenants) => data segregated on the basis of an additional column (tenantid, with different tenants having different tenantId values) => data accessed through views on the basis of the tenantId value. This is the basic approach of most customers who are trying to implement multiple tenants in PostgreSQL, rather than choosing separate databases for each tenant. Now we need to encrypt the data related to a tenantId, so that no one knows which tenant the data belongs to, not even the operations group. Is there a method in PostgreSQL for encrypting data with different keys with respect to different values in a single column? Moreover, pgcrypto would impose a single key on the column. Please share your thoughts on which direction I can start analysing this area for encryption of data specific to a tenant. Thanks in advance. *Best Regards,* Jagmohan Senior Consultant, TecoreLabs.
Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea
Jagmohan Kaintura writes: > Hi All, > > For a PostgreSQL database storing data for multiple tenants, the approach > decided on was to have > a shared database (holding data for all tenants) > => data segregated on the basis of an additional column > (tenantid, with different tenants having different tenantId values) > => data accessed through views on the basis of the tenantId > value. > > This is the basic approach of most customers who are trying to > implement multiple tenants in PostgreSQL, rather than choosing > separate databases for each tenant. > > Now we need to encrypt the data related to a tenantId, so that no one > knows which tenant the data belongs to, not even the operations group. > Is there a method in PostgreSQL for encrypting data with different keys > with respect to different values in a single column? Moreover, pgcrypto > would impose a single key on the column. > > Please share your thoughts on which direction I can start analysing this > area for encryption of data specific to a tenant. > The decision to have all tenants in a single database seems rather unusual to me. Isolating one tenant from adversely impacting another would seem complicated, and I'm not sure how you would implement a clear security model. Your model has effectively bypassed all the PG facilities provided for isolation of data. Disaster recovery and business continuity planning under this model must be a nightmare! I doubt you can adopt a solution which is solely within the database. How would the database know which key to use for which rows of data? How would you select the data for your tenant views if all that data is encrypted with different keys? How would you manage these keys in a secure manner? With the model you have adopted, I would be looking at performing encryption/decryption at the client level. However, depending on your data types, this could be challenging. This is really a requirement which should have been factored into the initial architecture design. Anything you try to bolt on now is likely to be complex and have a significant performance impact, and that is assuming you can re-interpret the requirement to make the objective feasible. -- Tim Cross
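To make the client-side option concrete, a minimal sketch using the Python cryptography package with one Fernet key per tenant; the table, columns and the in-code key dictionary are assumptions for illustration (real keys would come from a key-management service, not application code):

import psycopg2
from cryptography.fernet import Fernet

# Hypothetical per-tenant keys; a real deployment would fetch these from a KMS.
tenant_keys = {"tenant_a": Fernet.generate_key(), "tenant_b": Fernet.generate_key()}

def insert_secret(conn, tenant_id, plaintext):
    f = Fernet(tenant_keys[tenant_id])
    ciphertext = f.encrypt(plaintext.encode())  # bytes, stored as bytea
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO tenant_data (tenant_id, payload) VALUES (%s, %s)",
            (tenant_id, psycopg2.Binary(ciphertext)),
        )

def read_secret(conn, tenant_id, row_id):
    f = Fernet(tenant_keys[tenant_id])
    with conn, conn.cursor() as cur:
        cur.execute("SELECT payload FROM tenant_data WHERE id = %s", (row_id,))
        (payload,) = cur.fetchone()
        return f.decrypt(bytes(payload)).decode()  # bytea arrives as a memoryview

The database only ever sees ciphertext, so the key question raised above is pushed entirely to the application and its key management.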
Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea
> On Feb 10, 2021, at 6:45 PM, Tim Cross wrote: > > > Jagmohan Kaintura mailto:jagmo...@tecorelabs.com>> > writes: > >> HI All, >> >> For POstgreSQL database to store data for multiple tenants, the approach >> decided was to have >> Shared Database (Holding data for all tenants) >> => Data would be segregated on basis of some additional column >> (tennatid,different tenants having different tenantId) >> => Data would be accessed through Views on the basis of tenantId >> value. >> >> This is the basic process of most of the customers who are trying to >> implement multiple tenants in PostgreSQL, rather than choosing >> separate databases for each tenant. >> >> Now we need to encrypt the data related to a tenantId, so that now one >> knows this data belongs to which tenant even from Operations group. >> Is there a method in POstgreSQL for encrypting data with different keys >> with respect to different values in a single column. Moreover pg_crypto >> will impose a single key on the column. >> >> Please share your thoughts in which direction i can start analysing this >> area for encryption of data specific to a tenant. >> > > The decision to have all tenants in a single database seems rather > unusual to me. Isolating one tenant from adversely impacting another > would seem complicated and I'm not sure how you would implement a clear > security model. Your model has effectively bypassed all the provided PG > facilities for isolation of data. Disaster recovery and business > continuity planning under this model must be a nightmare! > > I doubt you can adopt a solution which is solely within the database. > How would the database know which key to use for which rows of data? How > would you select the data for your tenant views if all that data is > encrypted with different keys? How would you manage these keys in a > secure manner? > > With the model you have adopted, I would be looking at performing > encryption/decryption at the client level. However, depending on your > data types, this could be challenging. this is really a requirement > which should have been factored into the initial architecture design. > Anything you try to bolt on now is likely to be complex and have > significant performance impact and that is assuming you can re-interpret > the requirement to make the objective feasible. > Yeah, I lost that same arguement in ~2007, where the forces against my push for separation was shouted down with rants on scheme maintenance (divergence) and multiple rollouts per update. I hadn’t had any coffee before the 9:00am meeting so the hotshot from Amazon got his way. Then we tried “veils” (a concoction of view and rule re-writing) and we all know how that went. The company folded before our “next gen” software saw the light of day. I get the feeling multi-tenancy is, if not the rule these days, at least quite common (on the last of “big iron”?) but it still doesn’t sit well with me.
Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea
An interesting option is to make your tenants work through views, and have the views work through a variable that contains the tenant’s id. There would be a bit of coding, but it would be the same for every table, so you could automate it easy enough. When you’re done, client software just sets the id at the beginning of the session and the database looks like it only contains the tenant’s data. This is a particularly nice security guarantee: if you make the id a UUID (ie unguessable) then someone can entirely compromise the client application, and can still only get at the data for one tenant, and then only given their ID. On Feb 10, 2021, 18:44 -0800, Rob Sargent , wrote: > > > > On Feb 10, 2021, at 6:45 PM, Tim Cross wrote: > > > > > > Jagmohan Kaintura writes: > > > > > HI All, > > > > > > For POstgreSQL database to store data for multiple tenants, the approach > > > decided was to have > > > Shared Database (Holding data for all tenants) > > > => Data would be segregated on basis of some additional column > > > (tennatid,different tenants having different tenantId) > > > => Data would be accessed through Views on the basis of tenantId > > > value. > > > > > > This is the basic process of most of the customers who are trying to > > > implement multiple tenants in PostgreSQL, rather than choosing > > > separate databases for each tenant. > > > > > > Now we need to encrypt the data related to a tenantId, so that now one > > > knows this data belongs to which tenant even from Operations group. > > > Is there a method in POstgreSQL for encrypting data with different keys > > > with respect to different values in a single column. Moreover pg_crypto > > > will impose a single key on the column. > > > > > > Please share your thoughts in which direction i can start analysing this > > > area for encryption of data specific to a tenant. > > > > > > > The decision to have all tenants in a single database seems rather > > unusual to me. Isolating one tenant from adversely impacting another > > would seem complicated and I'm not sure how you would implement a clear > > security model. Your model has effectively bypassed all the provided PG > > facilities for isolation of data. Disaster recovery and business > > continuity planning under this model must be a nightmare! > > > > I doubt you can adopt a solution which is solely within the database. > > How would the database know which key to use for which rows of data? How > > would you select the data for your tenant views if all that data is > > encrypted with different keys? How would you manage these keys in a > > secure manner? > > > > With the model you have adopted, I would be looking at performing > > encryption/decryption at the client level. However, depending on your > > data types, this could be challenging. this is really a requirement > > which should have been factored into the initial architecture design. > > Anything you try to bolt on now is likely to be complex and have > > significant performance impact and that is assuming you can re-interpret > > the requirement to make the objective feasible. > > > Yeah, I lost that same arguement in ~2007, where the forces against my push > for separation was shouted down with rants on scheme maintenance (divergence) > and multiple rollouts per update. I hadn’t had any coffee before the 9:00am > meeting so the hotshot from Amazon got his way. Then we tried “veils” (a > concoction of view and rule re-writing) and we all know how that went. 
The > company folded before our “next gen” software saw the light of day. > > I get the feeling multi-tenancy is, if not the rule these days, at least > quite common (on the last of “big iron”?) but it still doesn’t sit well with > me. > >
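A minimal sketch of the view-plus-session-variable pattern described above; the table, column and UUID below are illustrative assumptions, and generating one such view per table is left out:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=app_user")  # assumed connection details
with conn, conn.cursor() as cur:
    # One-time setup: a view that only ever exposes the current tenant's rows.
    cur.execute("""
        CREATE OR REPLACE VIEW tenant_orders AS
            SELECT * FROM orders
            WHERE tenant_id = current_setting('app.tenant_id')::uuid
    """)
    # At session start the client sets its tenant id; from then on the database
    # looks as if it contained only that tenant's data.
    cur.execute("SELECT set_config('app.tenant_id', %s, false)",
                ("2b0c8f1e-6f4a-4c1d-9a53-0d6a4f4f7f10",))  # hypothetical tenant UUID
    cur.execute("SELECT count(*) FROM tenant_orders")
    print(cur.fetchone()[0])
conn.close()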
Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea
Hi Guyren, Yes, right now data is being accessed in exactly this manner. The application accesses the database using a tenant user only, which has the specific tenantId in its session and can see only its own data. It doesn't know about anyone else's data and can neither get nor fetch it. So isolation is 100% guaranteed right now. But isolation is not enough from an operations perspective, so I need encryption too, in whatever way PostgreSQL supports, and the encryption key should differ per tenant. On Thu, Feb 11, 2021 at 8:56 AM Guyren Howe wrote: > An interesting option is to make your tenants work through views, and have > the views work through a variable that contains the tenant's id. There > would be a bit of coding, but it would be the same for every table, so you > could automate it easily enough. > > When you're done, client software just sets the id at the beginning of the > session and the database looks like it only contains the tenant's data. > > This is a particularly nice security guarantee: if you make the id a UUID > (ie unguessable) then someone can entirely compromise the client > application, and can still only get at the data for one tenant, and then > only given their ID. > On Feb 10, 2021, 18:44 -0800, Rob Sargent , wrote: > > > > On Feb 10, 2021, at 6:45 PM, Tim Cross wrote: > > > Jagmohan Kaintura writes: > > Hi All, > > For a PostgreSQL database storing data for multiple tenants, the approach > decided on was to have > a shared database (holding data for all tenants) > => data segregated on the basis of an additional column > (tenantid, with different tenants having different tenantId values) > => data accessed through views on the basis of the tenantId > value. > > This is the basic approach of most customers who are trying to > implement multiple tenants in PostgreSQL, rather than choosing > separate databases for each tenant. > > Now we need to encrypt the data related to a tenantId, so that no one > knows which tenant the data belongs to, not even the operations group. > Is there a method in PostgreSQL for encrypting data with different keys > with respect to different values in a single column? Moreover, pgcrypto > would impose a single key on the column. > > Please share your thoughts on which direction I can start analysing this > area for encryption of data specific to a tenant. > > > The decision to have all tenants in a single database seems rather > unusual to me. Isolating one tenant from adversely impacting another > would seem complicated and I'm not sure how you would implement a clear > security model. Your model has effectively bypassed all the PG > facilities provided for isolation of data. Disaster recovery and business > continuity planning under this model must be a nightmare! > > I doubt you can adopt a solution which is solely within the database. > How would the database know which key to use for which rows of data? How > would you select the data for your tenant views if all that data is > encrypted with different keys? How would you manage these keys in a > secure manner? > > With the model you have adopted, I would be looking at performing > encryption/decryption at the client level. However, depending on your > data types, this could be challenging. This is really a requirement > which should have been factored into the initial architecture design. > Anything you try to bolt on now is likely to be complex and have > significant performance impact and that is assuming you can re-interpret > the requirement to make the objective feasible.
> > Yeah, I lost that same argument in ~2007, where the forces against my push for separation were shouted down with rants on schema maintenance (divergence) and multiple rollouts per update. I hadn't had any coffee before the 9:00am meeting, so the hotshot from Amazon got his way. Then we tried "veils" (a concoction of view and rule re-writing) and we all know how that went. The company folded before our "next gen" software saw the light of day.
> >
> > I get the feeling multi-tenancy is, if not the rule these days, at least quite common (on the last of "big iron"?), but it still doesn't sit well with me.

-- *Best Regards,* Jagmohan Senior Consultant, TecoreLabs.
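A minimal sketch of the view-plus-session-variable approach Guyren describes; the table, column, and setting names (orders_base, app.tenant_id) are illustrative assumptions, not from the original posts:

-- Base table holds every tenant's rows; tenant_id segregates them.
CREATE TABLE orders_base (
    id        bigserial PRIMARY KEY,
    tenant_id uuid NOT NULL,
    payload   text
);

-- The view exposes only the rows of the tenant whose id was set for this session.
CREATE VIEW orders AS
SELECT id, payload
FROM   orders_base
WHERE  tenant_id = current_setting('app.tenant_id')::uuid;

-- The client sets the (unguessable) id once, right after connecting:
SET app.tenant_id = '8f3c1a2e-6b4d-4e9a-9c2d-0f1e2d3c4b5a';
SELECT * FROM orders;  -- behaves as if the database held only this tenant's data

Repeating the CREATE VIEW for every table from a small script is the bit of automatable coding Guyren mentions.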
Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea
Hi Tim, Yes, even the data type is a problem right now: tenantId is numeric, while encryption/decryption has to go through text/bytea, so encrypting that column's data is even tougher.

On Thu, Feb 11, 2021 at 7:38 AM Tim Cross wrote:
>
> Jagmohan Kaintura writes:
>
> > Hi All,
> >
> > For a PostgreSQL database storing data for multiple tenants, the approach decided was to have a shared database (holding data for all tenants)
> > => Data would be segregated on the basis of an additional column (tenantid; different tenants having different tenantId values).
> > => Data would be accessed through views on the basis of the tenantId value.
> >
> > This is the basic approach of most customers who implement multiple tenants in PostgreSQL rather than choosing a separate database per tenant.
> >
> > Now we need to encrypt the data related to a tenantId, so that no one knows which tenant the data belongs to, not even the operations group. Is there a method in PostgreSQL for encrypting data with different keys depending on the values in a single column? Moreover, pgcrypto will impose a single key on the column.
> >
> > Please share your thoughts on which direction I can start analysing this area for encryption of data specific to a tenant.
>
> The decision to have all tenants in a single database seems rather unusual to me. Isolating one tenant from adversely impacting another would seem complicated and I'm not sure how you would implement a clear security model. Your model has effectively bypassed all the provided PG facilities for isolation of data. Disaster recovery and business continuity planning under this model must be a nightmare!
>
> I doubt you can adopt a solution which is solely within the database. How would the database know which key to use for which rows of data? How would you select the data for your tenant views if all that data is encrypted with different keys? How would you manage these keys in a secure manner?
>
> With the model you have adopted, I would be looking at performing encryption/decryption at the client level. However, depending on your data types, this could be challenging. This is really a requirement which should have been factored into the initial architecture design. Anything you try to bolt on now is likely to be complex, have a significant performance impact, and that is assuming you can re-interpret the requirement to make the objective feasible.
>
> --
> Tim Cross

-- *Best Regards,* Jagmohan Senior Consultant, TecoreLabs.
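For what it's worth, pgcrypto does not tie a key to a column: the key is an argument of each call, so a different key per tenant is possible if the caller can supply it. A rough sketch under that assumption, with hypothetical table and column names (tenant_keys, payments, payments_enc, amount) and the numeric value cast through text as discussed above:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Hypothetical per-tenant key lookup; real keys belong with the application or
-- an external key-management service, not in a table sitting next to the data.
CREATE TABLE tenant_keys (tenant_id bigint PRIMARY KEY, enc_key text NOT NULL);

-- Encrypt: numeric -> text -> bytea, using that tenant's key.
SELECT pgp_sym_encrypt(p.amount::text, k.enc_key) AS amount_enc
FROM   payments p
JOIN   tenant_keys k USING (tenant_id);

-- Decrypt with the same tenant's key and cast back to numeric.
SELECT pgp_sym_decrypt(p.amount_enc, k.enc_key)::numeric AS amount
FROM   payments_enc p
JOIN   tenant_keys k USING (tenant_id);

This still leaves Tim's key-management and querying concerns open: an encrypted bytea column cannot be filtered or indexed on its plaintext value inside the database.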
Re: Unable to execute Query in parallel for partitioned table
On Thu, 2021-02-11 at 05:09 +0530, Brajendra Pratap wrote:
> I am unable to execute the below in parallel; please suggest how I can achieve parallelism here.
>
> select count(*) over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER() OVER (order by trn_transaction_date desc ) AS RowNumber from (select * from transactions where trn_store_date_id=20201202) abc;
>
> The query plan is as mentioned below:
>
> explain analyze select count(*) over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER() OVER (order by trn_transaction_date desc ) AS RowNumber from (select * from transactions where trn_store_date_id=20201218) abc;
> LOG: duration: 25820.176 ms statement: explain analyze select count(*) over () VIEWALLROWCOUNT,abc.*,ROW_NUMBER() OVER (order by trn_transaction_date desc ) AS RowNumber from (select * from transactions where trn_store_date_id=20201218) abc;
>
> QUERY PLAN
> --
> WindowAgg (cost=4474843.51..4498066.81 rows=774110 width=21297) (actual time=21455.495..25241.738 rows=795190 loops=1)
>   -> WindowAgg (cost=4474843.51..4488390.44 rows=774110 width=21289) (actual time=10588.494..15311.865 rows=795190 loops=1)
>     -> Sort (cost=4474843.51..4476778.79 rows=774110 width=21281) (actual time=10588.422..11771.300 rows=795190 loops=1)
>          Sort Key: transactions.trn_transaction_date DESC
>          Sort Method: external merge Disk: 1496856kB
>          -> Result (cost=0.00..270640.32 rows=774110 width=21281) (actual time=0.117..4504.159 rows=795190 loops=1)
>            -> Append (cost=0.00..262899.22 rows=774110 width=21281) (actual time=0.094..1449.532 rows=795190 loops=1)
>              -> Seq Scan on transactions (cost=0.00..0.00 rows=1 width=47554) (actual time=0.019..0.019 rows=0 loops=1)
>                   Filter: (trn_store_date_id = 20201218)
>              -> Index Scan using idx_202012_trn_store_date_id on transactions_202012 (cost=0.56..259028.67 rows=774109 width=21281) (actual time=0.074..1357.764 rows=795190 loops=1)
>                   Index Cond: (trn_store_date_id = 20201218)
> Planning Time: 116.472 ms
> Execution Time: 25676.098 ms
>
> Note: We had tried different options like max_worker_processes, max_parallel_workers, max_parallel_workers_per_gather and max_parallel_maintenance_workers to execute it in parallel, but no luck.

I don't think parallelization will help you here. Your problem is probably the "abc.*" in the SELECT list. There must be really large data in this table, so it takes a long time to fetch and sort the rows.

Try selecting only the columns you need. Alternatively, add a LIMIT clause. Do you really need all 795190 rows?

Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
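A sketch of those two suggestions combined, keeping the window functions but naming columns explicitly; trn_id and trn_amount are placeholders, since only trn_transaction_date and trn_store_date_id are known from the plan:

SELECT count(*) OVER ()                                       AS viewallrowcount,
       trn_id,
       trn_transaction_date,
       trn_amount,
       row_number() OVER (ORDER BY trn_transaction_date DESC) AS rownumber
FROM   transactions
WHERE  trn_store_date_id = 20201218
ORDER  BY trn_transaction_date DESC
LIMIT  100;

The total count is still computed over all matching rows (window functions run before LIMIT), but with the ~21 kB-wide rows from the plan replaced by a handful of columns, the external merge sort on disk should shrink dramatically, even without any parallel workers.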
Re: Which partition scheme makes sense for my time based IoT-datagrams?
On Wed, 2021-02-10 at 18:59 +0100, Thorsten Schöning wrote:
> > They will scan all partitions. You have to be prepared that most queries will become at least slightly slower with partitioning. That is expected.
>
> Does "most queries" refer to those for IDs scanning all partitions, or to the time-based ones as well? In the end, I'm trying to improve query performance by reducing the size of indexes, number of rows etc. per table using partitions. :-)

The size of the index will barely affect its performance. But having to figure out which partitions to scan takes planning time.

Very few queries become notably faster through partitioning. Mostly those are aggregates and joins that match the partitioning scheme, and sequential scans that include the partitioning key.

Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
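A small sketch of that trade-off on a hypothetical time-partitioned table (table and column names are assumptions, not the schema discussed in this thread): a predicate on the partitioning key lets the planner prune partitions, while a lookup by id alone has to consider every partition.

CREATE TABLE readings (
    id      bigint,
    ts      timestamptz NOT NULL,
    payload bytea
) PARTITION BY RANGE (ts);

CREATE TABLE readings_2020 PARTITION OF readings
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE readings_2021 PARTITION OF readings
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

-- Partitioning key in the WHERE clause: only readings_2021 appears in the plan.
EXPLAIN SELECT * FROM readings
WHERE  ts >= '2021-02-01' AND ts < '2021-03-01';

-- No partitioning key: every partition appears in the plan, and the planner
-- had to spend time considering each of them.
EXPLAIN SELECT * FROM readings WHERE id = 12345;

More partitions therefore mean more planning work for the ID-only queries, which is the "slightly slower" effect described above.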