Re: Catching up with performance & PostgreSQL 15

2022-12-01 Thread Andrew Dunstan


On 2022-11-30 We 11:36, Tom Lane wrote:
> Andres Freund  writes:
>> On November 30, 2022 3:47:32 AM PST, Andrew Dunstan  
>> wrote:
>>> I think Alvaro's point is that it would have been better to work out
>>> these wrinkles before turning on JIT by default. Based on anecdotal
>>> reports from the field I'm inclined to agree.
>> The problem is that back when it was introduced these problems didn't exist 
>> to a significant degree. JIT was developed when partitioning was very 
>> minimal- and the problems we're seeing are almost exclusively with queries 
>> with many partitions. The problems really only started much more recently. 
>> It also wasn't enabled in the first release..
> Well, wherever you want to pin the blame, it seems clear that we
> have a problem now.  And I don't think flipping back to off-by-default
> is the answer -- surely there is some population of users who will
> not be happy with that.  We really need to prioritize fixing the
> cost-estimation problems, and/or tweaking the default thresholds.
>
>   


+1


FTR I am not trying to pin blame anywhere. I think the work that's been
done on JIT is more than impressive.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Odd Choice of seq scan

2022-12-01 Thread Paul McGarry
Hi there,

I'm wondering if anyone has any insight into what might make the database
choose a sequential scan for a query (table defs and plan below) like :

SELECT orders.orderid FROM orders
WHERE (
orders.orderid IN ('546111')
  OR
orders.orderid IN  (select orderid FROM orderstotrans WHERE (transid IN
('546111')))
);

I have a couple of environments, all on Postgresql 13.7 and:
- on one the query executes with an sequential scan on the orders table
- on the other sequential scan on an index (ie walks index and filters,
rather than looking up ids on the index as an index condition.)

Plan and tables are below, but it seems to me that the planner knows the
subplan is going to return 1 row (max) and should "know" that there is a
max of 2 IDs to look up an indexes would be faster than a sequential scan
(of either table or index) and filter. I've tried re analyzing to make sure
stats are good and it hasn't helped

I can get a good plan that does use the index efficiently by using a union,
eg:

select orders.orderid FROM orders
WHERE (
orders.orderid IN (
  SELECT '546111'
  UNION
  SELECT orderid FROM orderstotrans WHERE (transid IN ('546111'))
)
);

but I want to understand what warning signs I should be aware of with the
original query that put it on the path of a bad plan, so I don't do it
again.


Plan - seq scan of table:
=
> explain
select orders.orderid FROM orders WHERE (orders.orderid IN ('546111') OR
orders.orderid IN  (select orderid FROM orderstotrans WHERE (transid IN
('546111';

  QUERY PLAN

---
 Seq Scan on orders  (cost=8.45..486270.87 rows=4302781 width=8)
   Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
   SubPlan 1
 ->  Index Scan using orderstotrans_transid_key on orderstotrans
 (cost=0.43..8.45 rows=1 width=8)
   Index Cond: (transid = '546111'::bigint)
(5 rows)
=

Plan - Seq scan and filter of index:
=
> explain select orders.orderid FROM orders WHERE (orders.orderid IN
('546111') OR orders.orderid IN  (select orderid FROM orderstotrans WHERE
(transid IN ('546111';
  QUERY PLAN

---
 Index Only Scan using orders_pkey on orders  (cost=9.16..4067888.60
rows=64760840 width=8)
   Filter: ((orderid = '546111'::bigint) OR (hashed SubPlan 1))
   SubPlan 1
 ->  Index Scan using orderstotrans_transid_key on orderstotrans
 (cost=0.57..8.59 rows=1 width=8)
   Index Cond: (transid = '546111'::bigint)
(5 rows)
=


Tables:
=
   Table "test.orders"
Column|Type | Collation | Nullable
|   Default
--+-+---+--+--
 orderid  | bigint  |   | not null |
 istest   | smallint|   | not null
| 0
 orderstatusid| integer |   |  |
 customername | text|   |  |
 customeraddress  | text|   |  |
 customercountry  | text|   |
 |
 customercity | text|   |
 |
 customerstate| text|   |
 |
 customerzip  | text|   |  |
"orders_pkey" PRIMARY KEY, btree (orderid)

  Table "test.orderstotrans"
   Column|  Type   | Collation | Nullable | Default
-+-+---+--+-
 orderid | bigint  |   |  |
 transid | bigint  |   |  |
 orderitemid | integer |   |  |
Indexes:
"orderstotrans_orderid_idx" btree (orderid)
"orderstotrans_orderitemid_idx" btree (orderitemid)
"orderstotrans_transid_key" UNIQUE, btree (transid)


Happier plan for the union version:

explain select orders.orderid FROM orders
WHERE (
orders.orderid IN (
  SELECT '3131275553'
  UNION
  select orderid FROM orderstotrans WHERE (transid IN ('3131275553'))
)
);

 QUERY PLAN

-
 Nested Loop  (cost=9.21..21.84 rows=2 width=8) (actual time=0.034..0.043
rows=1 loops=1)
   ->  Unique  (cost=8.64..8.65 rows=2 width=8) (actual time=0.024..0.026
rows=2 loops=1)
 ->  Sort  (cost=8.64..8.64 rows=2 width=8) (actual
time=0.023..0.024 rows=2 loops=1)
   Sort Key: ('3131275553'::bigint)
   Sort Method: quicksort  Memory: 25kB
   ->  Append  (cost=0.00..8.63 rows=2 width=8) (actual
time=0.001..0.0

Re: Odd Choice of seq scan

2022-12-01 Thread Justin Pryzby
On Fri, Dec 02, 2022 at 11:52:19AM +1100, Paul McGarry wrote:
> Hi there,
> 
> I'm wondering if anyone has any insight into what might make the database
> choose a sequential scan for a query (table defs and plan below) like :

> Plan - seq scan of table:
> =
> > explain select orders.orderid FROM orders WHERE (orders.orderid IN 
> > ('546111') OR orders.orderid IN  (select orderid FROM orderstotrans WHERE 
> > (transid IN ('546111';

> Plan - Seq scan and filter of index:
> =
> > explain select orders.orderid FROM orders WHERE (orders.orderid IN 
> > ('546111') OR orders.orderid IN  (select orderid FROM orderstotrans WHERE 
> > (transid IN ('546111';

Could you show explain analyze ?

Show the size of the table and its indexes 
And GUC settings
And the "statistics" here:
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
Maybe on both a well-behaving instance and a badly-beving instance.

-- 
Justin




Re: Odd Choice of seq scan

2022-12-01 Thread Ronuk Raval
On Thu, Dec 1, 2022 at 8:21 PM Justin Pryzby  wrote:
> Could you show explain analyze ?
>
> Maybe on both a well-behaving instance and a badly-beving instance.

Apologies for barging into this thread with a potentially unrelated
"me too" but here's a similar OR-causes-seqscan from 2018:
https://www.postgresql.org/message-id/CAPhHnhpc6bdGbRBa9hG7FQiKByVqR3s37VoY64DSMUxjeJGOjQ%40mail.gmail.com

I don't have other versions handy but can confirm that the problem
exists on Postgres 11.17 (dated but newer than the 10.1 in that post).

We've been working around the problem by rewriting queries to use UNION instead.




Re: Odd Choice of seq scan

2022-12-01 Thread Tom Lane
Ronuk Raval  writes:
> We've been working around the problem by rewriting queries to use UNION 
> instead.

Yeah, that.  The real issue here is that the seqscan and indexscan plans
both suck, because they will both run that sub-select for every row
in the table.  The index-only plan might fetch fewer blocks along the
way, because it only has to read the index not the table proper ...
but that's only true if the table's pages are mostly marked all-visible.
(My bet about the plan instability is that the planner might choose
differently depending on how much of the table it believes is
all-visible.)  That only helps a bit, though.

What you really want to have happen, assuming there are not too many
interesting orderid values, is to do a point indexscan for each one
of them.  Currently the planner won't think of that by itself when
faced with OR'd conditions in WHERE.  You have to help it along with
UNION or some similar locution.

regards, tom lane