Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Kristjan Mustkivi
Hello Tomas,

The auto explain analyze caught this:

2021-09-14 06:55:33 UTC, pid=12345  db=mydb, usr=myuser, client=ip,
app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
  Query Text: SELECT *   FROM myschema.mytable pbh WHERE
pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
$5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
  Limit  (cost=0.70..6.27 rows=1 width=66) (actual time=5934.154..5934.155 rows=1 loops=1)
    Buffers: shared hit=7623 read=18217
    ->  Index Scan Backward using mytable_idx2 on mytable pbh  (cost=0.70..21639.94 rows=3885 width=66) (actual time=5934.153..5934.153 rows=1 loops=1)
          Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND (modified_time < $5))

So it expected to get 3885 rows, but got just 1. So this is the
statistics issue, right?

For testing, I set autovacuum_vacuum_scale_factor = 0.0 and
autovacuum_vacuum_threshold = 1 for the table and am now
monitoring the behavior.
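
For reference, per-table autovacuum settings like these are applied with ALTER TABLE (a sketch using the values above; the table name is the thread's placeholder):

```sql
-- Sketch: per-table autovacuum tuning as described above.
ALTER TABLE myschema.mytable SET (
    autovacuum_vacuum_scale_factor = 0.0,
    autovacuum_vacuum_threshold = 1
);
```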

Best regards,

Kristjan

On Mon, Sep 13, 2021 at 4:50 PM Tomas Vondra
 wrote:
>
> On 9/13/21 3:24 PM, Kristjan Mustkivi wrote:
> > Dear community,
> >
> > I have a query that most of the time gets executed in a few
> > milliseconds yet occasionally takes ~20+ seconds. The difference, as
> > far as I am able to tell, comes whether it uses the table Primary Key
> > (fast) or an additional index with smaller size. The table in question
> > is INSERT ONLY - no updates or deletes done there.
> >
>
> It'd be really useful to have explain analyze for the slow execution.
>
> My guess is there's a poor estimate, affecting some of the parameter
> values, and it probably resolves itself after autoanalyze run.
>
> I see you mentioned SET STATISTICS, so you tried increasing the
> statistics target for some of the columns? Have you tried lowering
> autovacuum_analyze_scale_factor to make autoanalyze more frequent?
>
> It's also possible most values are independent, but some values have a
> rather strong dependency, skewing the estimates. The MCV would help with
> that, but those are in PG12 :-(
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Kristjan Mustkivi

Email: [email protected]




Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Kristjan Mustkivi
Hi Jeff,

The specialized index is present for some other queries, and it is used
frequently (according to the statistics). I do agree that in this
particular case the index btree (cage_code, cage_player_id,
product_code, balance_type, modified_time) would solve the problem,
but at the moment it is not possible to change that without unexpected
consequences (this odd behavior manifests on only one of our sites).

I will try whether more aggressive autovacuum/analyze settings
alleviate the issue, as Tomas Vondra suggested.


Thank you for the help!

Kristjan

On Mon, Sep 13, 2021 at 10:21 PM Jeff Janes  wrote:
>
> On Mon, Sep 13, 2021 at 9:25 AM Kristjan Mustkivi  
> wrote:
>
>>
>> SELECT
>> *
>> FROM
>> myschema.mytable pbh
>> WHERE
>> pbh.product_code = $1
>> AND pbh.cage_player_id = $2
>> AND pbh.cage_code = $3
>> AND balance_type = $4
>> AND pbh.modified_time < $5
>> ORDER BY
>> pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY;
>
>
>>
>> "mytable_idx2" btree (cage_code, cage_player_id, modified_time)
>
>
> Why does this index exist?  It seems rather specialized, but what is it 
> specialized for?
>
> If you are into specialized indexes, the ideal index for this query would be:
>
> btree (cage_code, cage_player_id, product_code, balance_type, modified_time)
>
> But the first 4 columns can appear in any order if that helps you combine 
> indexes.  If this index existed, then it wouldn't have to choose between two 
> other suboptimal indexes, and so would be unlikely to choose incorrectly 
> between them.
>
> Cheers,
>
> Jeff



-- 
Kristjan Mustkivi

Email: [email protected]




Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Laurenz Albe
On Tue, 2021-09-14 at 10:55 +0300, Kristjan Mustkivi wrote:
> 2021-09-14 06:55:33 UTC, pid=12345  db=mydb, usr=myuser, client=ip,
> app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
>   Query Text: SELECT *   FROM myschema.mytable pbh WHERE
> pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
> pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
> $5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
>   Limit  (cost=0.70..6.27 rows=1 width=66) (actual time=5934.154..5934.155 
> rows=1 loops=1)
>     Buffers: shared hit=7623 read=18217
>     ->  Index Scan Backward using mytable_idx2 on mytable pbh 
> (cost=0.70..21639.94 rows=3885 width=66) (actual time=5934.153..5934.153 
> rows=1 loops=1)
>   Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND 
> (modified_time < $5))

If it scanned the index for 6 seconds before finding the first result,
I'd suspect one of the following:

- the index is terribly bloated

- there were lots of deleted rows, and the index entries were marked as "dead"

- something locked the table for a long time
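
Each of these suspicions can be checked (a sketch; pgstatindex() comes from the contrib extension pgstattuple, and the table/index names follow the thread's placeholders):

```sql
-- Index bloat: pgstatindex() from the pgstattuple contrib extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('myschema.mytable_idx2');

-- Dead rows and last (auto)vacuum times
SELECT n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'myschema' AND relname = 'mytable';

-- Sessions holding locks on the table
SELECT l.pid, l.mode, l.granted, a.state, a.query_start
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE c.relname = 'mytable';
```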

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Jeff Janes
On Tue, Sep 14, 2021 at 3:55 AM Kristjan Mustkivi 
wrote:

> Hello Tomas,
>
> The auto explain analyze caught this:
>
> 2021-09-14 06:55:33 UTC, pid=12345  db=mydb, usr=myuser, client=ip,
> app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
>   Query Text: SELECT *   FROM myschema.mytable pbh WHERE
> pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
> pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
> $5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
>   Limit  (cost=0.70..6.27 rows=1 width=66) (actual
> time=5934.154..5934.155 rows=1 loops=1)
> Buffers: shared hit=7623 read=18217
> ->  Index Scan Backward using mytable_idx2 on mytable pbh
> (cost=0.70..21639.94 rows=3885 width=66) (actual
> time=5934.153..5934.153 rows=1 loops=1)
>   Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND
> (modified_time < $5))
>
> So it expected to get 3885 rows, but got just 1. So this is the
> statistics issue, right?
>

That would be true if there were no LIMIT.  But with the LIMIT, all this
means is that it stopped actually scanning after it found one row, but it
estimates that if it didn't stop it would have found 3885.  So it is not
very informative.  But the above plan appears incomplete, there should be a
line for "Rows Removed by Filter", and I think that that is what we really
want to see in this case.

Cheers,

Jeff


Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Kristjan Mustkivi
I am very sorry, I indeed copy-pasted an incomplete plan. Here it is in full:

2021-09-14 06:55:33 UTC, pid=27576  db=mydb, usr=myuser, client=ip,
app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
Query Text: SELECT *   FROM myschema.mytable pbh WHERE
pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
$5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
Limit  (cost=0.70..6.27 rows=1 width=66) (actual time=5934.154..5934.155 rows=1 loops=1)
  Buffers: shared hit=7623 read=18217
  ->  Index Scan Backward using player_balance_history_idx2 on mytable pbh  (cost=0.70..21639.94 rows=3885 width=66) (actual time=5934.153..5934.153 rows=1 loops=1)
        Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND (modified_time < $5))
        Filter: (((product_code)::text = ($1)::text) AND ((balance_type)::text = ($4)::text))
        Rows Removed by Filter: 95589
        Buffers: shared hit=7623 read=18217

Also, I have incrementally made the following changes: made autovacuum
more aggressive, added per-column statistics targets, and created an
extended statistics object. Yet there is no change in the plan
behavior. The table as it is now:

\d+ myschema.mytable
                                    Table "myschema.mytable"
     Column     │            Type             │ Collation │ Nullable │ Default │ Storage  │ Stats target │ Description
────────────────┼─────────────────────────────┼───────────┼──────────┼─────────┼──────────┼──────────────┼─────────────
 cage_code      │ integer                     │           │ not null │         │ plain    │ 500          │
 cage_player_id │ bigint                      │           │ not null │         │ plain    │ 500          │
 product_code   │ character varying(30)       │           │ not null │         │ extended │ 500          │
 balance_type   │ character varying(30)       │           │ not null │         │ extended │ 500          │
 version        │ bigint                      │           │ not null │         │ plain    │              │
 modified_time  │ timestamp(3) with time zone │           │ not null │         │ plain    │ 500          │
 amount         │ numeric(38,8)               │           │ not null │         │ main     │              │
 change         │ numeric(38,8)               │           │ not null │         │ main     │              │
 transaction_id │ bigint                      │           │ not null │         │ plain    │              │
Indexes:
    "mytable_pk" PRIMARY KEY, btree (cage_code, cage_player_id, product_code, balance_type, version)
    "mytable_idx1" btree (modified_time)
    "mytable_idx2" btree (cage_code, cage_player_id, modified_time)
Statistics objects:
    "myschema"."statistics_pbh_1" (ndistinct, dependencies) ON cage_code, cage_player_id, product_code, balance_type FROM myschema.mytable
Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000

I will investigate index bloat and whether something is blocking the
query, as suggested by Laurenz Albe.

Best,

Kristjan

On Tue, Sep 14, 2021 at 3:26 PM Jeff Janes  wrote:
>
> On Tue, Sep 14, 2021 at 3:55 AM Kristjan Mustkivi  
> wrote:
>>
>> Hello Tomas,
>>
>> The auto explain analyze caught this:
>>
>> 2021-09-14 06:55:33 UTC, pid=12345  db=mydb, usr=myuser, client=ip,
>> app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
>>   Query Text: SELECT *   FROM myschema.mytable pbh WHERE
>> pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
>> pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
>> $5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
>>   Limit  (cost=0.70..6.27 rows=1 width=66) (actual
>> time=5934.154..5934.155 rows=1 loops=1)
>> Buffers: shared hit=7623 read=18217
>> ->  Index Scan Backward using mytable_idx2 on mytable pbh
>> (cost=0.70..21639.94 rows=3885 width=66) (actual
>> time=5934.153..5934.153 rows=1 loops=1)
>>   Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND
>> (modified_time < $5))
>>
>> So it expected to get 3885 rows, but got just 1. So this is the
>> statistics issue, right?
>
>
> That would be true if there were no LIMIT.  But with the LIMIT, all this 
> means is that it stopped actually scanning after it found one row, but it 
> estimates that if it didn't stop it would have found 3885.  So it is not very 
> informative.  But the above plan appears incomplete, there should be a line 
> for "Rows Removed by Filter", and I think that that is what we really want to 
> see in this case.
>
> Cheers,
>
> Jeff



-- 
Kristjan Mustkivi

Email: [email protected]




Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Tom Lane
Kristjan Mustkivi  writes:
>   ->  Index Scan Backward using player_balance_history_idx2 on
> mytable pbh  (cost=0.70..21639.94 rows=3885 width=66) (actual
> time=5934.153..5934.153 rows=1 loops=1)
> Index Cond: ((cage_code = $3) AND (cage_player_id =
> $2) AND (modified_time < $5))
> Filter: (((product_code)::text = ($1)::text) AND
> ((balance_type)::text = ($4)::text))
> Rows Removed by Filter: 95589
> Buffers: shared hit=7623 read=18217

So indeed, the core issue is that that filter condition is very selective,
and applying it after the index scan is expensive.  Perhaps it would help
to create an index that includes those columns along with cage_code and
cage_player_id.  (It's not clear whether to bother with modified_time in
this specialized index, but if you do include it, it needs to be the last
column since you're putting a non-equality condition on it.)
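
A sketch of such an index, built without blocking writes (column names are from the thread; the index name is made up here):

```sql
-- Sketch: all four equality columns first, the range column last.
-- CONCURRENTLY avoids blocking writes while the index is built.
CREATE INDEX CONCURRENTLY mytable_idx3
    ON myschema.mytable (cage_code, cage_player_id, product_code,
                         balance_type, modified_time);
```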

regards, tom lane




Re: EnterpriseDB

2021-09-14 Thread sbob
EnterpriseDB is basically Postgres with added Oracle compatibility 
and some added external tools. The default user & db will no longer be 
postgres but 'enterprisedb', but it is still PostgreSQL, so you won't 
have any issues working with EDB if you already know Postgres.




On 9/13/21 20:52, Mladen Gogala wrote:
The company I work for will test EnterpriseDB. I am fairly well 
acquainted with Postgres but have no experience whatsoever with 
EnterpriseDB. How compatible with Postgres is it? Do pgAdmin4 and 
pgBadger work with EnterpriseDB? Are psql commands the same? Can 
anyone here share some impressions?


Regards






Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Kristjan Mustkivi
On Tue, Sep 14, 2021 at 5:15 PM Tom Lane  wrote:
>
> Kristjan Mustkivi  writes:
> >   ->  Index Scan Backward using player_balance_history_idx2 on
> > mytable pbh  (cost=0.70..21639.94 rows=3885 width=66) (actual
> > time=5934.153..5934.153 rows=1 loops=1)
> > Index Cond: ((cage_code = $3) AND (cage_player_id =
> > $2) AND (modified_time < $5))
> > Filter: (((product_code)::text = ($1)::text) AND
> > ((balance_type)::text = ($4)::text))
> > Rows Removed by Filter: 95589
> > Buffers: shared hit=7623 read=18217
>
> So indeed, the core issue is that that filter condition is very selective,
> and applying it after the index scan is expensive.  Perhaps it would help
> to create an index that includes those columns along with cage_code and
> cage_player_id.  (It's not clear whether to bother with modified_time in
> this specialized index, but if you do include it, it needs to be the last
> column since you're putting a non-equality condition on it.)
>
> regards, tom lane

But the Primary Key is defined as btree (cage_code, cage_player_id,
product_code, balance_type, version), so it should be exactly that
index (apart from the extra "version" column). And the majority of the
query plans do use the PK, with only a small number of cases going for
IDX2, which is btree (cage_code, cage_player_id, modified_time). So I
am wondering how to keep them from doing that.

But perhaps index bloat is indeed playing a part here, as both the
PK and IDX2 have a ~50% bloat ratio. I will try REINDEX-ing the table,
although finding a good window to do it might take some time.
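
On servers without REINDEX CONCURRENTLY (added in PostgreSQL 12), one way to avoid holding a long lock is to build a replacement index concurrently and swap it in (a sketch for the non-unique IDX2 only; the temporary name is made up):

```sql
-- Sketch: rebuild mytable_idx2 without a long-held lock.
CREATE INDEX CONCURRENTLY mytable_idx2_new
    ON myschema.mytable (cage_code, cage_player_id, modified_time);
DROP INDEX CONCURRENTLY myschema.mytable_idx2;
ALTER INDEX myschema.mytable_idx2_new RENAME TO mytable_idx2;
```

The primary key cannot be swapped this way directly; it needs a unique index plus ALTER TABLE ... ADD CONSTRAINT ... USING INDEX.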

Best regards,

Kristjan




Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Tom Lane
Kristjan Mustkivi  writes:
>>> Filter: (((product_code)::text = ($1)::text) AND
>>> ((balance_type)::text = ($4)::text))

> But the Primary Key is defined as btree (cage_code, cage_player_id,
> product_code, balance_type, version) so this should be exactly that
> (apart from the extra "version" column).

Oh, interesting.  So this is really a datatype mismatch problem.
I'd wondered idly why you were getting the explicit casts to text
in these conditions, but now it seems that that's key to the
problem: the casts prevent these clauses from being matched to
the index.  What are the declared data types of product_code
and balance_type?  And of the parameters they're compared to?

regards, tom lane




Re: EnterpriseDB

2021-09-14 Thread manish yadav
In addition to sbob's comments: if we install EDB with the PostgreSQL 
compatibility option, we may continue to use the postgres user as the 
default superuser, and the other configuration parameters remain the same as 
in community PostgreSQL.
It works perfectly fine with pgAdmin4 and pgBadger.
That is, if you choose to install in "Compatible with PostgreSQL" mode, the 
Advanced Server superuser name is postgres.
Thanks and Regards,
Manish Yadav
Mobile : +91 8527607945 

On Tuesday, 14 September 2021, 08:33:31 PM IST, sbob 
 wrote:  
 
 EnterpriseDB is basically Postgres with added Oracle compatibility 
and some added external tools. The default user & db will no longer be 
postgres but 'enterprisedb', but it is still PostgreSQL, so you won't 
have any issues working with EDB if you already know Postgres.



On 9/13/21 20:52, Mladen Gogala wrote:
> The company I work for will test EnterpriseDB. I am fairly well 
> acquainted with Postgres but have no experience whatsoever with 
> EnterpriseDB. How compatible with Postgres is it? Do pgAdmin4 and 
> pgBadger work with EnterpriseDB? Are psql commands the same? Can 
> anyone here share some impressions?
>
> Regards
>


  

Re: Postgres chooses slow query plan from time to time

2021-09-14 Thread Kristjan Mustkivi
Hello!

Both are of type varchar(30).

So is this something odd: Filter: (((product_code)::text = ($1)::text)
AND ((balance_type)::text = ($4)::text))?

But why does it apply the type cast if both product_code and
balance_type are of type varchar(30) and the parameter values are also
character strings?

Best regards,

Kristjan

On Tue, Sep 14, 2021 at 6:47 PM Tom Lane  wrote:
>
> Kristjan Mustkivi  writes:
> >>> Filter: (((product_code)::text = ($1)::text) AND
> >>> ((balance_type)::text = ($4)::text))
>
> > But the Primary Key is defined as btree (cage_code, cage_player_id,
> > product_code, balance_type, version) so this should be exactly that
> > (apart from the extra "version" column).
>
> Oh, interesting.  So this is really a datatype mismatch problem.
> I'd wondered idly why you were getting the explicit casts to text
> in these conditions, but now it seems that that's key to the
> problem: the casts prevent these clauses from being matched to
> the index.  What are the declared data types of product_code
> and balance_type?  And of the parameters they're compared to?
>
> regards, tom lane



-- 
Kristjan Mustkivi

Email: [email protected]