Re: Performance degrade in Planning Time to find appropriate Partial Index
Hi,

https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production

From the link shared above, it looks like what Meenatchi has done should work.

Do the conditions on the partial index and the query match exactly (a greater-than / greater-than-or-equals mismatch, maybe)?

If the conditions for those partial indexes are mutually exclusive and the query has a matching condition, then Postgres can use that index alone. Are we missing something here?

Regards,
Nanda

On Thu, Mar 1, 2018 at 6:33 PM, Laurenz Albe wrote:
> Meenatchi Sandanam wrote:
>> I have created a table with 301 columns (ID, 150 BIGINT, 150 TEXT). The table contains
>> data for multiple forms, differentiated by ID range, so a single column holds data for more
>> than one form. To achieve a unique constraint and indexing per form, I chose PostgreSQL
>> partial indexes, which suit my requirement. I have created partial indexes with the ID range
>> as the criterion, and they provide uniqueness and indexing per form as expected. But DML
>> operations on a particular form scan all the indexes created for the entire table instead of
>> only the indexes created for that form's ID range. This degrades planner performance and
>> query time more than 10 times, as shown below.
>>
>> Query result for the table with 3000 partial indexes (15 indexes per form):
>
> It is crazy to create 3000 partial indexes on one table.
>
> No wonder planning and DML statements take very long; they have to consider all the indexes.
>
>> explain analyse select id from form_data_copy where id between 3001 and 4000 and bigint50=789;
>
> Use a single index on (bigint50, id) for best performance.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
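To make the two suggestions in this exchange concrete, here is a minimal sketch using the table and column names that appear in the thread (form_data_copy, id, bigint50). The index names and the 3001-4000 range are illustrative assumptions, not Meenatchi's actual schema.

-- One partial unique index per form, whose predicate the planner must be able
-- to prove from the query's WHERE clause before it will use the index.
-- Index name and range bounds are hypothetical.
CREATE UNIQUE INDEX form_3001_4000_bigint50_key
    ON form_data_copy (bigint50)
    WHERE id BETWEEN 3001 AND 4000;

-- A query whose range condition matches the predicate can use that index alone:
EXPLAIN ANALYSE
SELECT id
FROM form_data_copy
WHERE id BETWEEN 3001 AND 4000
  AND bigint50 = 789;

-- Laurenz's alternative: one ordinary index covering all forms, so the planner
-- has a single index to consider instead of thousands.
CREATE INDEX form_data_copy_bigint50_id_idx
    ON form_data_copy (bigint50, id);

Note that the composite index alone does not enforce per-form uniqueness; if a unique constraint per ID range is a hard requirement, the partial unique indexes (or a redesign of the schema) are still needed for that.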
Re: Performance degrade in Planning Time to find appropriate Partial Index
2018-03-02 14:49 GMT+01:00 Nandakumar M :
> Hi,
>
> https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production
>
> From the link shared above, it looks like what Meenatchi has done should
> work.

That can be a different situation; the article does not say how many indexes
they have per table. And the fact that a project works does not mean it is
well designed.

PostgreSQL does not have column storage. Look at column-oriented databases;
they are designed for extra-wide tables.

Regards

Pavel
Re: Performance degrade in Planning Time to find appropriate Partial Index
2018-03-02 15:29 GMT+01:00 Pavel Stehule :
>
> 2018-03-02 14:49 GMT+01:00 Nandakumar M :
>
>> Hi,
>>
>> https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production
>>
>> From the link shared above, it looks like what Meenatchi has done should
>> work.
>
> That can be a different situation; the article does not say how many indexes
> they have per table. And the fact that a project works does not mean it is
> well designed.
>
> PostgreSQL does not have column storage. Look at column-oriented databases;
> they are designed for extra-wide tables.

I read the article:

1. They probably use Citus.

2. "Since partial indexes are so easy to create and work with, we’ve wound up with over 10 million partial indexes across our entire cluster."

> Regards
>
> Pavel
why does this query not use a parallel query
I have a query:

explain analyze SELECT minion_id FROM mob_player_mob_118 WHERE player_id = 55351078;

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using mob_player_mob_118_pkey on mob_player_mob_118  (cost=0.44..117887.06 rows=4623076 width=4) (actual time=0.062..3716.105 rows=4625123 loops=1)
   Index Cond: (player_id = 55351078)
   Heap Fetches: 1152408
 Planning time: 0.241 ms
 Execution time: 5272.171 ms

If I just get the count, it will use a parallel query:

explain analyze SELECT count(minion_id) FROM mob_player_mob_118 WHERE player_id = 55351078;

Thanks,
Dave Cramer
Re: why does this query not use a parallel query
Dave Cramer writes:
> Have a query:
> explain analyze SELECT minion_id FROM mob_player_mob_118 WHERE player_id = 55351078;
> Index Only Scan using mob_player_mob_118_pkey on mob_player_mob_118
>   (cost=0.44..117887.06 rows=4623076 width=4) (actual time=0.062..3716.105 rows=4625123 loops=1)

I don't think we have parallel IOS (index-only scan) yet (I might be wrong).
If so, it probably thinks this is cheaper than the best available parallel plan.

> If I just get the count it will use a parallel query

Likely a parallelized aggregation.

			regards, tom lane
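For anyone who wants to check which plans the planner is willing to consider here, a rough session-local experiment (not a production recommendation) is to make parallelism look cheaper and rerun both queries. The settings below are standard PostgreSQL planner GUCs; the values are arbitrary and only meant to provoke a parallel plan for comparison.

-- Session-local experiment: make parallel plans look as cheap as possible.
SET max_parallel_workers_per_gather = 4;  -- allow up to 4 workers per Gather
SET parallel_setup_cost = 0;              -- ignore the cost of launching workers
SET parallel_tuple_cost = 0;              -- ignore the cost of passing tuples to the leader
SET min_parallel_index_scan_size = 0;     -- let even small indexes qualify for parallel scans

-- The original query, for comparison with the serial index-only scan above:
EXPLAIN ANALYZE
SELECT minion_id FROM mob_player_mob_118 WHERE player_id = 55351078;

-- The aggregate form that already produces a parallel plan:
EXPLAIN ANALYZE
SELECT count(minion_id) FROM mob_player_mob_118 WHERE player_id = 55351078;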
Re: Updating large tables without dead tuples
On 02/24/2018 12:27 AM, [email protected] wrote:
> Hello
>
> I work with a large and wide table (about 300 million rows, about 50
> columns), and from time to time we get business requirements to make
> some modifications. But sometimes it's just a plain mistake. This
> happened to us a few weeks ago, when someone made a mistake and we
> had to update a single column of a large and wide table. Literally, the
> source data screwed up a zip code and we had to patch it on our end.
>
> Anyways… the query run was:
>
> update T set source_id = substr(sourceId, 2, 10);
>
> It took about 10h and created hundreds of millions of dead tuples, causing
> another couple of hours of vacuum.
>
> This was done during a maintenance window, and that table is read-only
> except when we ETL data to it on a weekly basis, so I was just
> wondering why I should pay the "bloat" penalty for this type of
> transaction. Is there a trick that could be used here?

Yes, there is a trick I like to use here, as long as you don't mind locking the
table (even against reads).

I'll assume T.source_id is of type text. If it's not, use whatever the actual
type is.

ALTER TABLE T
    ALTER COLUMN source_id TYPE text USING substr(sourceId, 2, 10);

I copied what you had verbatim; I earnestly hope you don't have two columns,
source_id and sourceId, in your table.

This will rewrite the entire table, just the same as a VACUUM FULL after your
UPDATE would.

Don't forget to VACUUM ANALYZE this table after the operation. Even though
there will be no dead rows, you still need to VACUUM it to generate the
visibility map, and you need to ANALYZE it for statistics on your "new" column.

Foreign keys remain intact with this solution, and you don't have double WAL
logging as you would with an UPDATE.
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
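Put together as one runnable sketch, under the same assumptions Vik states (the table is named T, the column is source_id rather than a separate sourceId, and it is already of type text):

-- Rewrites the whole table under an ACCESS EXCLUSIVE lock (blocks reads and
-- writes), but leaves no dead tuples behind, unlike a plain UPDATE.
ALTER TABLE T
    ALTER COLUMN source_id TYPE text USING substr(source_id, 2, 10);

-- No dead rows to clean up, but the visibility map and the statistics for the
-- rewritten column still have to be rebuilt.
VACUUM ANALYZE T;

On a 300-million-row table this still rewrites every row once, so it belongs in the same kind of maintenance window as the original UPDATE, just without the bloat and the long vacuum afterwards.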
