Incorrect index used in few cases..
Hi
We recently upgraded one of the instances from 9.6.3 to 9.6.12 and seeing
following issue that occurs for few cases.
I have tried running analyze on the table with different values from 1000 -
5000 but it doesn't seem to help the issue. There is some skew in a_id
but the combination index i_tc_adid_tid btree (a_id, id) makes the index
unique as it includes primary key.
Is there an explanation why it is using incorrect index?
SQL:
SELECT count(*) FROM tc WHERE ((tc.a_id = $1)) AND ((tc.m_id = $2)) AND
((tc.ag_id is not null)) AND ((tc.id in ($3))) AND ((tc.pt in ($4, $5, $6)))
Indexes on the table:
i_tc_adid_tid btree (a_id, id)
pk_id PRIMARY KEY, btree (id)
i_agi_tc_tcn btree (ag_id, tname) ---> index that gets used
duration: 49455.649 ms execute S_10: SELECT count(*) FROM tc WHERE
((tc.a_id = $1)) AND ((tc.m_id = $2)) AND ((tc.ag_id is not null)) AND ((
tc.id in ($3))) AND ((tc.pt in ($4, $5, $6)))
DETAIL: parameters: $1 = '11786959222', $2 = '6', $3 = '54460816501', $4 =
'3', $5 = '6', $6 = '103'
LOG: duration: 49455.639 ms plan:
Query Text: SELECT count(*) FROM tc WHERE ((tc.a_id = $1)) AND
((tc.m_id = $2)) AND ((tc.ag_id is not null)) AND ((tc.id in ($3))) AND ((
tc.pt in ($4, $5, $6)))
Aggregate (cost=5009342.34..5009342.35 rows=1 width=8) (actual
time=49455.626..49455.626 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=56288997
-> Index Scan using i_agi_tc_tcn on b.tc (cost=0.57..5009342.34
rows=1 width=0) (actual time=46452.555..49455.616 rows=1 loops=1)
Output: id, tname, ...
Index Cond: (tc.ag_id IS NOT NULL)
Filter: ((tc.a_id = '11786959222'::numeric) AND (tc.m_id =
'6'::numeric) AND (tc.id = '54460816501'::numeric) AND (tc.pt = ANY
('{3,6,103}'::numeric[])))
Rows Removed by Filter: 70996637
Buffers: shared hit=56288997
Thanks
Re: Incorrect index used in few cases..
AminPG Jaffer writes: > Is there an explanation why it is using incorrect index? > SQL: > SELECT count(*) FROM tc WHERE ((tc.a_id = $1)) AND ((tc.m_id = $2)) AND > ((tc.ag_id is not null)) AND ((tc.id in ($3))) AND ((tc.pt in ($4, $5, $6))) What data types are these columns? For that matter, could we see the whole schema for the table (psql \d+ output or equivalent)? regards, tom lane
Re: Incorrect index used in few cases..
Here is the table structure.
Column |Type |
Modifiers
---+-+---
id| numeric(38,0) | not null
tname | character varying(255) | not null
ag_id | numeric(38,0) |
tc| character varying(255) | not null
status| numeric(10,0) | not null
internal_status | numeric(10,0) | not null
create_date | timestamp(6) with time zone | not null
version | numeric(38,0) | not null
match_type| numeric(10,0) | not null default 0
c_id | numeric(38,0) | not null
m_id | numeric(38,0) | not null
a_id | numeric(38,0) | not null
maxb | numeric(18,6) |
b_cc | character varying(10) |
ui_status | numeric(10,0) | not null default 0
destination_url | character varying(2084) |
created_by| character varying(64) | not null
creation_date | timestamp(0) with time zone | not null default
timezone('UTC'::text, clock_timestamp())
last_updated_by | character varying(64) | not null
last_updated_date | timestamp(0) with time zone | not null
pr| numeric(5,0)| not null default 0
ts| numeric(1,0)| not null default 0
uniqueness_hash_v2| numeric(29,0) | not null
pt| numeric(5,0)|
history | bigint |
t_secondary | text|
Indexes:
"pk_id" PRIMARY KEY, btree (id)
"i_agi_tc_tcn" btree (ag_id, tname)
"i_cid_agid_tcn" btree (c_id, ag_id, tname)
"i_tc_adid_tid" btree (a_id, id)
"i_tc_advertiser_id" btree (a_id)
"i_tc_campaign_id" btree (c_id)
"i_tc_lud_agi" btree (last_updated_date, ag_id)
"i_tc_uniqueness_hash_v2" btree (uniqueness_hash_v2)
Check constraints:
"tc_secondary" CHECK (length(t_secondary) <= 4500)
On Tue, Jun 18, 2019 at 6:35 AM Tom Lane wrote:
> AminPG Jaffer writes:
> > Is there an explanation why it is using incorrect index?
>
> > SQL:
> > SELECT count(*) FROM tc WHERE ((tc.a_id = $1)) AND ((tc.m_id = $2)) AND
> > ((tc.ag_id is not null)) AND ((tc.id in ($3))) AND ((tc.pt in ($4, $5,
> $6)))
>
> What data types are these columns? For that matter, could we see the
> whole schema for the table (psql \d+ output or equivalent)?
>
> regards, tom lane
>
Re: Incorrect index used in few cases..
AminPG Jaffer writes:
> Here is the table structure.
Hpmh. I thought it was just barely possible that you had a datatype
mismatch between the columns and the parameters, but nope, the columns
are "numeric" just like the parameters.
I'm pretty baffled. I tried to duplicate the problem with some dummy
data (as attached) and could not. In my hands, it wants to use the
i_tc_adid_tid index, or if I drop that then the pkey index, and any
other possible plan is orders of magnitude more expensive than those.
Another far-fetched theory is that the theoretically-better indexes
are so badly bloated as to discourage the planner from using them.
You could eliminate that one by checking the index sizes with "\di+".
Are you perhaps running with non-default values for any planner cost
parameters? Or it's not a stock build of Postgres?
If you could find a way to adjust the attached example so that it
produces the same misbehavior you see with live data, that would be
very interesting ...
regards, tom lane
drop table tc;
create table tc(
id numeric(38,0)primary key,
tname character varying(255) not null,
ag_id numeric(38,0) ,
tc character varying(255) not null,
status numeric(10,0)not null,
internal_statusnumeric(10,0)not null,
create_datetimestamp(6) with time zone not null,
versionnumeric(38,0)not null,
match_type numeric(10,0)not null default 0,
c_id numeric(38,0)not null,
m_id numeric(38,0)not null,
a_id numeric(38,0)not null,
maxb numeric(18,6) ,
b_cc character varying(10) ,
ui_status numeric(10,0)not null default 0,
destination_urlcharacter varying(2084) ,
created_by character varying(64)not null,
creation_date timestamp(0) with time zone not null default
timezone('UTC'::text, clock_timestamp()),
last_updated_bycharacter varying(64)not null,
last_updated_date timestamp(0) with time zone not null,
pr numeric(5,0) not null default 0,
ts numeric(1,0) not null default 0,
uniqueness_hash_v2 numeric(29,0)not null,
pt numeric(5,0),
historybigint ,
t_secondarytextCHECK (length(t_secondary) <= 4500)
);
insert into tc
select
x as id,
'tname_' || x as tname,
case when x % 10 = 0 then null else x end as ag_id,
'tc' as tc,
0 as status,
0 as internal_status,
now() as create_date,
42 as version,
0 as match_type,
(random()*1000)::int as c_id,
x/4 as m_id,
(random()*100)::int as a_id,
0 as maxb,
null as b_cc,
0 as ui_status,
null as destination_url,
'c_b' as created_by,
now() as creation_date,
'l_u' as last_updated_by,
now() as last_updated_date,
0 as pr,
0 as ts,
x as uniqueness_hash_v2,
x % 100 as pt,
0 as history,
null as t_secondary
from generate_series(1,100) x;
create index"i_agi_tc_tcn" on tc (ag_id, tname);
create index"i_cid_agid_tcn" on tc (c_id, ag_id, tname);
create index"i_tc_adid_tid" on tc (a_id, id);
create index"i_tc_advertiser_id" on tc (a_id);
create index"i_tc_campaign_id" on tc (c_id);
create index"i_tc_lud_agi" on tc (last_updated_date, ag_id);
create index"i_tc_uniqueness_hash_v2" on tc (uniqueness_hash_v2);
vacuum analyze tc;
prepare p as
SELECT count(*) FROM tc WHERE ((tc.a_id = $1)) AND ((tc.m_id = $2)) AND
((tc.ag_id is not null)) AND ((tc.id in ($3))) AND ((tc.pt in ($4, $5, $6)));
explain verbose execute p(0, 1, 42, 4,5,6);
Re: Incorrect index used in few cases..
On Tue, Jun 18, 2019 at 2:08 PM Tom Lane wrote: > Are you perhaps running with non-default values for any planner cost > parameters? Or it's not a stock build of Postgres? > > If you could find a way to adjust the attached example so that it > produces the same misbehavior you see with live data, that would be > very interesting ... FWIW, if you move the CREATE INDEX statements before the INSERT, and compared earlier versions of Postgres to 12, you'll see that the size of some of the indexes are a lot smaller on 12. v11 (representative of 9.6): pg@tc:5411 [1067]=# \di+ i_* List of relations Schema │ Name │ Type │ Owner │ Table │ Size │ Description ┼─┼───┼───┼───┼───┼─ public │ i_agi_tc_tcn│ index │ pg│ tc│ 74 MB │ public │ i_cid_agid_tcn │ index │ pg│ tc│ 82 MB │ public │ i_tc_adid_tid │ index │ pg│ tc│ 57 MB │ public │ i_tc_advertiser_id │ index │ pg│ tc│ 27 MB │ public │ i_tc_campaign_id│ index │ pg│ tc│ 28 MB │ public │ i_tc_lud_agi│ index │ pg│ tc│ 57 MB │ public │ i_tc_uniqueness_hash_v2 │ index │ pg│ tc│ 21 MB │ (7 rows) v12/master: pg@regression:5432 [1022]=# \di+ i_* List of relations Schema │ Name │ Type │ Owner │ Table │ Size │ Description ┼─┼───┼───┼───┼───┼─ public │ i_agi_tc_tcn│ index │ pg│ tc│ 69 MB │ public │ i_cid_agid_tcn │ index │ pg│ tc│ 78 MB │ public │ i_tc_adid_tid │ index │ pg│ tc│ 36 MB │ public │ i_tc_advertiser_id │ index │ pg│ tc│ 20 MB │ public │ i_tc_campaign_id│ index │ pg│ tc│ 24 MB │ public │ i_tc_lud_agi│ index │ pg│ tc│ 30 MB │ public │ i_tc_uniqueness_hash_v2 │ index │ pg│ tc│ 21 MB │ (7 rows) Note that i_tc_lud_agi is 30 MB, not 57MB, and that i_tc_adid_tid is 36 MB, not 57 MB. I can see that both i_tc_lud_agi and i_tc_adid_tid consistently use the "split after new tuple" optimization on v12. -- Peter Geoghegan
Re: Incorrect index used in few cases..
Hi, On 2019-06-18 06:11:54 -0700, AminPG Jaffer wrote: > We recently upgraded one of the instances from 9.6.3 to 9.6.12 and seeing > following issue that occurs for few cases. > > We recently upgraded one of the instances from 9.6.3 to 9.6.12 and seeing > following issue that occurs for few cases. > > I have tried running analyze on the table with different values from 1000 - > 5000 but it doesn't seem to help the issue. There is some skew in a_id > but the combination index i_tc_adid_tid btree (a_id, id) makes the index > unique as it includes primary key. > > Is there an explanation why it is using incorrect index? > > SQL: > SELECT count(*) FROM tc WHERE ((tc.a_id = $1)) AND ((tc.m_id = $2)) AND > ((tc.ag_id is not null)) AND ((tc.id in ($3))) AND ((tc.pt in ($4, $5, $6))) > > Indexes on the table: >i_tc_adid_tid btree (a_id, id) >pk_id PRIMARY KEY, btree (id) >i_agi_tc_tcn btree (ag_id, tname) ---> index that gets used Are those indexes used for other queries? Any chance they've been recently created? SELECT indexrelid::regclass, xmin, indcheckxmin, indisvalid, indisready, indislive, txid_current(), txid_current_snapshot() FROM pg_index WHERE indrelid = 'tc'::regclass; might tell us. On 2019-06-18 17:07:55 -0400, Tom Lane wrote: > I'm pretty baffled. I tried to duplicate the problem with some dummy > data (as attached) and could not. In my hands, it wants to use the > i_tc_adid_tid index, or if I drop that then the pkey index, and any > other possible plan is orders of magnitude more expensive than those. > Another far-fetched theory is that the theoretically-better indexes > are so badly bloated as to discourage the planner from using them. > You could eliminate that one by checking the index sizes with "\di+". > > Are you perhaps running with non-default values for any planner cost > parameters? Or it's not a stock build of Postgres? > > If you could find a way to adjust the attached example so that it > produces the same misbehavior you see with live data, that would be > very interesting ... Amin, might be worth to see what the query plan is if you disable that index. I assume it's too big to quickly drop (based on the ? Something like: BEGIN; LOCK tc; UPDATE pg_index SET indisvalid = false WHERE indexrelid = 'name_of_index'::regclass AND indisvalid; EXPLAIN yourquery; ROLLBACK; might allow to test that without actually dropping the index. But that of course requires superuser access. Greetings, Andres Freund
Re: Incorrect index used in few cases..
Andres Freund writes: > Are those indexes used for other queries? Any chance they've been > recently created? > SELECT indexrelid::regclass, xmin, indcheckxmin, indisvalid, indisready, > indislive, txid_current(), txid_current_snapshot() > FROM pg_index WHERE indrelid = 'tc'::regclass; > might tell us. Oh, that's a good idea. > Amin, might be worth to see what the query plan is if you disable that > index. I assume it's too big to quickly drop (based on the ? Considering that the "right" query plan would have a cost estimate in the single digits or close to it, I have to suppose that the planner is rejecting that index as unusable, not making a cost-based decision not to use it. (Well, maybe if it's bloated by three orders of magnitude compared to the other indexes, it'd lose on cost. Doesn't seem likely though.) So I think we're looking for a hard "can't use the index" reason, and now we've eliminated datatype mismatch which'd be the most obvious such reason. But index-isnt-valid or index-isnt-ready might do the trick. regards, tom lane
