Re: Performance degrade in Planning Time to find appropriate Partial Index

2018-03-02 Thread Nandakumar M
Hi,

https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production

From the link shared above, it looks like what Meenatchi has done should work.

Do the conditions on the partial index and the query match exactly
(a greater-than vs. greater-than-or-equals mismatch, maybe)?

If conditions for those partial indexes are mutually exclusive and the
query has a matching condition then Postgres can use that index alone.
Are we missing something here?
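
As a concrete illustration of that exact-match requirement, here is a minimal
sketch using the table and column names from the quoted query (the index name
is invented):

-- Hypothetical partial unique index covering one form's ID range:
CREATE UNIQUE INDEX form_3001_4000_bigint50_key
    ON form_data_copy (bigint50)
    WHERE id BETWEEN 3001 AND 4000;

-- For the planner to pick it, the query's WHERE clause must imply the
-- index predicate:
SELECT id
  FROM form_data_copy
 WHERE id BETWEEN 3001 AND 4000
   AND bigint50 = 789;

-- A range written differently (e.g. id > 3000 AND id < 4001) may not be
-- provable from the query even though it is logically equivalent, and the
-- partial index is then not considered.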

Regards,
Nanda

On Thu, Mar 1, 2018 at 6:33 PM, Laurenz Albe  wrote:
> Meenatchi Sandanam wrote:
>> I have created a table with 301 columns (ID, 150 BIGINT, 150 TEXT). The
>> table contains data for multiple forms, differentiated by ID range, so a
>> single column contains data from more than one form. To achieve a unique
>> constraint and indexing per form, I chose PostgreSQL partial indexes, which
>> suit my requirement. I have created partial indexes with the ID range as the
>> criterion, and they provide uniqueness and indexing on a per-form basis as
>> expected. But DML operations on a particular form scan all the indexes
>> created for the entire table instead of scanning only the indexes created
>> for that particular form's ID range. This degrades planner performance and
>> query time by more than 10 times, as shown below:
>>
>> Query result for the table with 3000 partial indexes (15 indexes per form):
>
> It is crazy to create 3000 partial indexes on one table.
>
> No wonder planning and DML statements take very long; they have to consider
> all the indexes.
>
>> explain analyse select id from form_data_copy where id between 3001 and 4000 
>> and bigint50=789;
>
> Use a single index on (bigint50, id) for best performance.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
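
For reference, a sketch of the single index Laurenz suggests, with an invented
index name and the column names from the quoted query (note that it addresses
lookup performance only; the per-form unique constraints are a separate
requirement):

-- One ordinary btree index in place of the per-form partial indexes:
CREATE INDEX form_data_copy_bigint50_id_idx
    ON form_data_copy (bigint50, id);

-- The equality column comes first and the range column second, so the
-- quoted query can use both index columns:
EXPLAIN ANALYSE
SELECT id
  FROM form_data_copy
 WHERE id BETWEEN 3001 AND 4000
   AND bigint50 = 789;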



Re: Performance degrade in Planning Time to find appropriate Partial Index

2018-03-02 Thread Pavel Stehule
2018-03-02 14:49 GMT+01:00 Nandakumar M :

> Hi,
>
> https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production
>
> From the link shared above, it looks like what Meenatchi has done should
> work.
>

It can be a different situation; the article does not say how many indexes
there are per table. And just because some project works, that does not mean
it is well designed.

PostgreSQL does not have column storage. Look at column databases; they are
designed for extra-wide tables.

Regards

Pavel


>
> Do the conditions on the partial index and the query match exactly
> (a greater-than vs. greater-than-or-equals mismatch, maybe)?
>
> If conditions for those partial indexes are mutually exclusive and the
> query has a matching condition then Postgres can use that index alone.
> Are we missing something here?
>
> Regards,
> Nanda
>
> On Thu, Mar 1, 2018 at 6:33 PM, Laurenz Albe 
> wrote:
> > Meenatchi Sandanam wrote:
> >> I have created a table with 301 columns (ID, 150 BIGINT, 150 TEXT). The
> >> table contains data for multiple forms, differentiated by ID range, so a
> >> single column contains data from more than one form. To achieve a unique
> >> constraint and indexing per form, I chose PostgreSQL partial indexes,
> >> which suit my requirement. I have created partial indexes with the ID
> >> range as the criterion, and they provide uniqueness and indexing on a
> >> per-form basis as expected. But DML operations on a particular form scan
> >> all the indexes created for the entire table instead of scanning only the
> >> indexes created for that particular form's ID range. This degrades planner
> >> performance and query time by more than 10 times, as shown below:
> >>
> >> Query result for the table with 3000 partial indexes (15 indexes per form):
> >
> > It is crazy to create 3000 partial indexes on one table.
> >
> > No wonder planning and DML statements take very long; they have to
> > consider all the indexes.
> >
> >> explain analyse select id from form_data_copy where id between 3001 and
> 4000 and bigint50=789;
> >
> > Use a single index on (bigint50, id) for best performance.
> >
> > Yours,
> > Laurenz Albe
> > --
> > Cybertec | https://www.cybertec-postgresql.com
> >
>
>


Re: Performance degrade in Planning Time to find appropriate Partial Index

2018-03-02 Thread Pavel Stehule
2018-03-02 15:29 GMT+01:00 Pavel Stehule :

>
>
> 2018-03-02 14:49 GMT+01:00 Nandakumar M :
>
>> Hi,
>>
>> https://heapanalytics.com/blog/engineering/running-10-million-postgresql-indexes-in-production
>>
>> From the link shared above, it looks like what Meenatchi has done should
>> work.
>>
>
> It can be a different situation; the article does not say how many indexes
> there are per table. And just because some project works, that does not mean
> it is well designed.
>
> PostgreSQL does not have column storage. Look at column databases; they are
> designed for extra-wide tables.
>

I read the article:

1. They probably use Citus.

2. It says: "Since partial indexes are so easy to create and work with, we’ve
wound up with over 10 million partial indexes across our entire cluster."



> Regards
>
> Pavel
>
>
>>
>> Do the conditions on the partial index and the query match exactly
>> (a greater-than vs. greater-than-or-equals mismatch, maybe)?
>>
>> If conditions for those partial indexes are mutually exclusive and the
>> query has a matching condition then Postgres can use that index alone.
>> Are we missing something here?
>>
>> Regards,
>> Nanda
>>
>> On Thu, Mar 1, 2018 at 6:33 PM, Laurenz Albe 
>> wrote:
>> > Meenatchi Sandanam wrote:
>> >> I have created a table with 301 columns (ID, 150 BIGINT, 150 TEXT). The
>> >> table contains data for multiple forms, differentiated by ID range, so a
>> >> single column contains data from more than one form. To achieve a unique
>> >> constraint and indexing per form, I chose PostgreSQL partial indexes,
>> >> which suit my requirement. I have created partial indexes with the ID
>> >> range as the criterion, and they provide uniqueness and indexing on a
>> >> per-form basis as expected. But DML operations on a particular form scan
>> >> all the indexes created for the entire table instead of scanning only the
>> >> indexes created for that particular form's ID range. This degrades
>> >> planner performance and query time by more than 10 times, as shown below:
>> >>
>> >> Query result for the table with 3000 partial indexes (15 indexes per form):
>> >
>> > It is crazy to create 3000 partial indexes on one table.
>> >
>> > No wonder planning and DML statements take very long; they have to
>> > consider all the indexes.
>> >
>> >> explain analyse select id from form_data_copy where id between 3001
>> and 4000 and bigint50=789;
>> >
>> > Use a single index on (bigint50, id) for best performance.
>> >
>> > Yours,
>> > Laurenz Albe
>> > --
>> > Cybertec | https://www.cybertec-postgresql.com
>> >
>>
>>
>


why does this query not use a parallel query

2018-03-02 Thread Dave Cramer
Have a query:

explain analyze SELECT minion_id FROM mob_player_mob_118 WHERE player_id =
55351078;

 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using mob_player_mob_118_pkey on mob_player_mob_118
(cost=0.44..117887.06 rows=4623076 width=4) (actual time=0.062..3716.105
rows=4625123 loops=1)
   Index Cond: (player_id = 55351078)
   Heap Fetches: 1152408
 Planning time: 0.241 ms
 Execution time: 5272.171 ms

If I just get the count, it will use a parallel query:

explain analyze SELECT count(minion_id) FROM mob_player_mob_118 WHERE
player_id = 55351078;
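
One way to check whether a parallel plan is even considered for the first
query is to make parallelism look artificially cheap for the session and
compare plans; a rough sketch using standard planner settings (the values are
for experimentation only):

SET max_parallel_workers_per_gather = 4;  -- allow parallel workers
SET parallel_setup_cost = 0;              -- make launching workers look free
SET parallel_tuple_cost = 0;              -- make worker-to-leader tuple transfer look free

EXPLAIN ANALYZE
SELECT minion_id FROM mob_player_mob_118 WHERE player_id = 55351078;

RESET ALL;  -- restore the session defaults afterwards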

Thanks

Dave Cramer


Re: why does this query not use a parallel query

2018-03-02 Thread Tom Lane
Dave Cramer  writes:
> Have a query:
> explain analyze SELECT minion_id FROM mob_player_mob_118 WHERE player_id =
> 55351078;

>  Index Only Scan using mob_player_mob_118_pkey on mob_player_mob_118
> (cost=0.44..117887.06 rows=4623076 width=4) (actual time=0.062..3716.105
> rows=4625123 loops=1)

I don't think we have parallel index-only scans yet (I might be wrong).  If so,
the planner probably thinks this plan is cheaper than the best available
parallel plan.

> If I just get the count it will use a parallel query

Likely a parallelized aggregation.
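
For illustration, a parallelized aggregate plan typically has roughly this
shape (not output from this server; the scan node, worker count and costs
will differ):

 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan (or Parallel Index Scan) on mob_player_mob_118
                     Filter: (player_id = 55351078)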

regards, tom lane



Re: Updating large tables without dead tuples

2018-03-02 Thread Vik Fearing
On 02/24/2018 12:27 AM, [email protected] wrote:
> Hello
> 
>  
> 
> I work with a large and wide table (about 300 million rows, about 50
> columns), and from time to time, we get business requirements to make
> some modifications. But sometimes it’s just a plain mistake. This happened
> to us a few weeks ago, when someone made a mistake and we had to update a
> single column of a large and wide table. Literally, the source data screwed
> up a zip code and we had to patch it on our end.
> 
>  
> 
> Anyway… the query we ran was:
> 
>     update T set source_id = substr(sourceId, 2, 10);
> 
> It took about 10 hours and created hundreds of millions of dead tuples,
> causing another couple of hours of vacuum.
> 
>  
> 
> This was done during a maintenance window, and that table is read-only
> except when we ETL data to it on a weekly basis, and so I was just
> wondering why I should pay the “bloat” penalty for this type of
> transaction. Is there a trick that could be used here?
Yes, there is a trick I like to use here, as long as you don't mind
locking the table (even against reads).

I'll assume T.source_id is of type text.  If it's not, use whatever the
actual type is.

ALTER TABLE T
ALTER COLUMN source_id TYPE text USING substr(sourceId, 2, 10);

I copied what you had verbatim; I earnestly hope you don't have two
columns source_id and sourceId in your table.

This will rewrite the entire table just the same as a VACUUM FULL after
your UPDATE would.

Don't forget to VACUUM ANALYZE this table after the operation.  Even
though there will be no dead rows, you still need to VACUUM it to
generate the visibility map and you need to ANALYZE it for statistics on
your "new" column.

Foreign keys remain intact with this solution, and you don't have double
WAL logging like you would with an UPDATE.
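
Putting the pieces together, the whole operation is roughly the following
(using source_id consistently, per the remark above):

-- Rewrites the table once and leaves no dead tuples behind; takes an
-- ACCESS EXCLUSIVE lock for the duration:
ALTER TABLE T
    ALTER COLUMN source_id TYPE text USING substr(source_id, 2, 10);

-- No dead rows, but the visibility map and the statistics still need
-- rebuilding:
VACUUM ANALYZE T;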
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL: Expertise, Training and Support