RE: wrong rows and cost estimation when generic plan

2022-12-05 Thread James Pang (chaolpan)
Hi,
   It's a prepared sql statement on a non-partitioned table , 16millions tuples 
 and multiple indexes on this table. pk_x  primary key (aid,bid,btype) all 
3 cols are bigint datatype, there is another index idx_x(starttime,endtime) 
, both cols are "timestamp(0) without time zone".
   the data distribution is skewed, not even. with first 5 times execution 
custom_plan, optimizer choose primary key, but when it start building generic 
plan and choose another index idx_, obviously generic plan make significant 
different rows and cost estimation.
below is the sql , sensitive info got masked here (tablename, columnname) .

   --with custom_plan
  Update on x  (cost=0.56..8.60 rows=1 width=2923) (actual 
time=0.030..0.031 rows=0 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using pk_x on xxx  (cost=0.56..8.60 rows=1 
width=2923) (actual time=0.028..0.028 rows=0 loops=1)
 Index Cond: ((aid = '14654072'::bigint) AND (bid = 
'243379969878556159'::bigint) AND (btype = '0'::bigint))
 Filter: ((password IS NULL) AND ...) AND (starttime = '2071-12-31 
00:00:00'::timestamp without time zone) AND (endtime = '2072-01-01 
00:00:00'::timestamp without time zone) AND (opentime = '2022-11-07 09:
40:26'::timestamp without time zone)
 Buffers: shared hit=4
 Planning Time: 1.575 ms
 Execution Time: 0.123 ms

 --after 5 times execution, it start to build generic plan and thought generic 
plan cost=0.44..8.48 that less than the customer plan ,so it choose generic 
plan for following sql executions,
   Update on x  (cost=0.44..8.48 rows=1 width=2923) (actual 
time=8136.243..8136.245 rows=0 loops=1)
   Buffers: shared hit=1284549
   ->  Index Scan using idx_x_time on x  (cost=0.44..8.48 rows=1 
width=2923) (actual time=8136.242..8136.242 rows=0 loops=1)
 Index Cond: ((starttime = $7) AND (endtime = $8))
 Filter: ((password IS NULL) AND ...(aid = $4) AND (bid = $5) AND 
(btype = $6) AND...
 Rows Removed by Filter: 5534630
 Buffers: shared hit=1284549
 Planning Time: 0.754 ms
 Execution Time: 8136.302 ms

as a workaround, I remove "starttime" and "endtime" stats tuple from  
pg_statistic, and optimizer use a DEFAULT value with NULL stats tuple  so that 
index_path cost > the primary key index_path cost, following eqsel function 
logic,  postgres/selfuncs.c at REL_13_STABLE · postgres/postgres · 
GitHub
optimzer is very complicated, could you direct me how optimizer to do 
selectivity estimation when building generic plan, for this case? for 
custom_plan, optimizer knows boundparams values, but when generic_plan, 
planner() use boundparams=NULL,  it try to calculate average value based on mcv 
list of the index attributes (starttime,endtime)  ?
please check attached about sql details and pg_stats tuple for the index 
attributes.

Thanks,

James


RE: wrong rows and cost estimation when generic plan

2022-12-05 Thread James Pang (chaolpan)
Hi,
   It's a prepared sql statement on a non-partitioned table , 16millions tuples 
 and multiple indexes on this table. pk_x  primary key (aid,bid,btype) all 
3 cols are bigint datatype, there is another index idx_x(starttime,endtime) 
, both cols are "timestamp(0) without time zone".
   the data distribution is skewed, not even. with first 5 times execution 
custom_plan, optimizer choose primary key, but when it start building generic 
plan and choose another index idx_, obviously generic plan make significant 
different rows and cost estimation.
below is the sql , sensitive info got masked here (tablename, columnname) .

   --with custom_plan
  Update on x  (cost=0.56..8.60 rows=1 width=2923) (actual 
time=0.030..0.031 rows=0 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using pk_x on xxx  (cost=0.56..8.60 rows=1 
width=2923) (actual time=0.028..0.028 rows=0 loops=1)
 Index Cond: ((aid = '14654072'::bigint) AND (bid = 
'243379969878556159'::bigint) AND (btype = '0'::bigint))
 Filter: ((password IS NULL) AND ...) AND (starttime = '2071-12-31 
00:00:00'::timestamp without time zone) AND (endtime = '2072-01-01 
00:00:00'::timestamp without time zone) AND (opentime = '2022-11-07 09:
40:26'::timestamp without time zone)
 Buffers: shared hit=4
 Planning Time: 1.575 ms
 Execution Time: 0.123 ms

 --after 5 times execution, it start to build generic plan and thought generic 
plan cost=0.44..8.48 that less than the customer plan ,so it choose generic 
plan for following sql executions,
   Update on x  (cost=0.44..8.48 rows=1 width=2923) (actual 
time=8136.243..8136.245 rows=0 loops=1)
   Buffers: shared hit=1284549
   ->  Index Scan using idx_x_time on x  (cost=0.44..8.48 rows=1 
width=2923) (actual time=8136.242..8136.242 rows=0 loops=1)
 Index Cond: ((starttime = $7) AND (endtime = $8))
 Filter: ((password IS NULL) AND ...(aid = $4) AND (bid = $5) AND 
(btype = $6) AND...
 Rows Removed by Filter: 5534630
 Buffers: shared hit=1284549
 Planning Time: 0.754 ms
 Execution Time: 8136.302 ms

as a workaround, I remove "starttime" and "endtime" stats tuple from  
pg_statistic, and optimizer use a DEFAULT value with NULL stats tuple  so that 
index_path cost > the primary key index_path cost, following eqsel function 
logic,  postgres/selfuncs.c at REL_13_STABLE · postgres/postgres · 
GitHub
optimzer is very complicated, could you direct me how optimizer to do 
selectivity estimation when building generic plan, for this case? for 
custom_plan, optimizer knows boundparams values, but when generic_plan, 
planner() use boundparams=NULL,  it try to calculate average value based on mcv 
list of the index attributes (starttime,endtime)  ?
please check attached about sql details and pg_stats tuple for the index 
attributes.

Thanks,

James
Postgres v13, on Redhat8 RHEL8.4

It's a prepared sql statement on a non-partitioned table , 
16millions tuples  and multiple indexes on this table. 
pk_x  primary key (aid,bid,btype) all 3 cols are bigint datatype, 
another index idx_x(starttime,endtime) , both cols are "timestamp(0) 
without time zone".    
the data distribution is skewed, not even. with first 5 times execution 
custom_plan, optimizer choose primary key, 
but when it start building generic plan and choose another index idx_, 
obviously generic plan make significant different rows and cost estimation.

SQL text:

UPDATE ""."x" set "bid" = $1::numeric::int8,"opentime" = 
$2,"lastmodifiedtime" = $3 
 where "aid" = $4::numeric::int8 AND "bid" = $5::numeric::int8 AND "btype" = 
$6::numeric::int8 
 AND "password" IS NULL  AND "starttime" = $7 AND "endtime" = $8 AND 


--with custom_plan
 
 Update on x  (cost=0.56..8.60 rows=1 width=2923) (actual time=0.030..0.031 
rows=0 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using pk_x on xxx  (cost=0.56..8.60 rows=1 
width=2923) (actual time=0.028..0.028 rows=0 loops=1)
 Index Cond: ((aid = '14654072'::bigint) AND (bid = 
'243379969878556159'::bigint) AND (btype = '0'::bigint))
 Filter: ((password IS NULL) AND ...) AND (starttime = '2071-12-31 
00:00:00'::timestamp without time zone) AND (endtime = '2072-01-01 
00:00:00'::timestamp without time zone) AND (opentime = '2022-11-07 09:
40:26'::timestamp without time zone)
 Buffers: shared hit=4
 Planning Time: 1.575 ms
 Execution Time: 0.123 ms
 
 
--after 5 times exeution, it start to build generic plan

 Update on x  (cost=0.44..8.48 rows=1 width=2923) (actual 
time=8136.243..8136.245 rows=0 loops=1)
   Buffers: shared hit=1284549
   ->  Index Scan using idx_x_time on x  (cost=0.44..8.48 rows=1 
width=2923) (actual time=8136.242..8136.242 rows=0 loops=1)
 Index Cond: ((starttime = $7) AND (endtime = $8))
 Filter: ((password IS NULL) AND 

Re: wrong rows and cost estimation when generic plan

2022-12-05 Thread David Rowley
On Tue, 6 Dec 2022 at 18:28, James Pang (chaolpan)  wrote:
>->  Index Scan using idx_x_time on x  (cost=0.44..8.48 rows=1 
> width=2923) (actual time=8136.242..8136.242 rows=0 loops=1)
>  Index Cond: ((starttime = $7) AND (endtime = $8))
>  Filter: ((password IS NULL) AND ...(aid = $4) AND (bid = $5) AND 
> (btype = $6) AND...
>  Rows Removed by Filter: 5534630

I wonder if you did:

create statistics x_starttime_endtime_stats  (ndistinct) on
starttime,endtime from x;
analyze x;

if the planner would come up with a higher estimate than what it's
getting for the above and cause it to use the other index instead.

> optimzer is very complicated, could you direct me how optimizer to do 
> selectivity estimation when building generic plan, for this case? for 
> custom_plan, optimizer knows boundparams values, but when generic_plan, 
> planner() use boundparams=NULL,  it try to calculate average value based on 
> mcv list of the index attributes (starttime,endtime)  ?

IIRC, generic plan estimates become based on distinct estimations
rather than histograms or MCVs.

David




RE: wrong rows and cost estimation when generic plan

2022-12-05 Thread James Pang (chaolpan)
  No create statistics on starttime_endtime(distinct), that index on 
"starttime,endtime", use default analyze, I tested to increase 
statistics_targets but that no help.  Could you provide the function name for 
generic plan selectivity estimation?  

Thanks,

James

-Original Message-
From: David Rowley  
Sent: Tuesday, December 6, 2022 1:59 PM
To: James Pang (chaolpan) 
Cc: [email protected]; [email protected]
Subject: Re: wrong rows and cost estimation when generic plan

On Tue, 6 Dec 2022 at 18:28, James Pang (chaolpan)  wrote:
>->  Index Scan using idx_x_time on x  (cost=0.44..8.48 rows=1 
> width=2923) (actual time=8136.242..8136.242 rows=0 loops=1)
>  Index Cond: ((starttime = $7) AND (endtime = $8))
>  Filter: ((password IS NULL) AND ...(aid = $4) AND (bid = $5) AND 
> (btype = $6) AND...
>  Rows Removed by Filter: 5534630

I wonder if you did:

create statistics x_starttime_endtime_stats  (ndistinct) on 
starttime,endtime from x; analyze x;

if the planner would come up with a higher estimate than what it's getting for 
the above and cause it to use the other index instead.

> optimzer is very complicated, could you direct me how optimizer to do 
> selectivity estimation when building generic plan, for this case? for 
> custom_plan, optimizer knows boundparams values, but when generic_plan, 
> planner() use boundparams=NULL,  it try to calculate average value based on 
> mcv list of the index attributes (starttime,endtime)  ?

IIRC, generic plan estimates become based on distinct estimations rather than 
histograms or MCVs.

David