Partitioning update-heavy queue with hash partitions vs partial indexes

2023-08-10 Thread Dorian Hoxha
Hi list,

I have a queue table with the schema:

```
create table queue(
    id bigserial primary key,
    view_time timestamp with time zone not null
);
create index queue_view_time ON queue (view_time ASC);
```

The most concurrent operation is:
```
UPDATE queue SET view_time = view_time + INTERVAL '60 seconds'
WHERE id = (
    SELECT id FROM queue
    WHERE view_time <= now() AT TIME ZONE 'utc'
    ORDER BY view_time ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
);
```
As you can imagine, with increased concurrency this query has to read and
skip a lot of locked and dead index entries, burning a lot of CPU time.

I'm assuming 10K+ queries/second will run the update above and actually
return a row.
You may wonder how I'd maintain 10K connections, but since the queries are
fast I can raise the connection limit, use a connection pooler, use
auto-commit, etc.

--

Since most of the overhead is in the `queue_view_time` index, I thought of
partitioning just that index with partial indexes and then probing the
indexes randomly. This is with 2 partitions:

```
create index queue_view_time_0 ON queue(view_time ASC) WHERE id%2=0;
create index queue_view_time_1 ON queue(view_time ASC) WHERE id%2=1;
```
The select above would gain `where id%2=0`, and I'd try the partitions
randomly until I get a row or have searched all of them.
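For concreteness, the worker query against one of the two partial indexes
would look roughly like this (a sketch, not tested; note the `id%2=0`
predicate has to match the partial index's WHERE clause so the planner can
prove the index usable):

```
UPDATE queue SET view_time = view_time + INTERVAL '60 seconds'
WHERE id = (
    SELECT id FROM queue
    WHERE view_time <= now() AT TIME ZONE 'utc'
      AND id % 2 = 0          -- matches queue_view_time_0's predicate
    ORDER BY view_time ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
);
```

On no row, the client would retry with `id % 2 = 1` before concluding the
queue is empty.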


But looking at the docs
https://www.postgresql.org/docs/current/indexes-partial.html, it says:

> Do Not Use Partial Indexes as a Substitute for Partitioning
> While a search in this larger index might have to descend through a
> couple more tree levels than a search in a smaller index, that's almost
> certainly going to be cheaper than the planner effort needed to select the
> appropriate one of the partial indexes. The core of the problem is that the
> system does not understand the relationship among the partial indexes, and
> will laboriously test each one to see if it's applicable to the current
> query.

Would this be true in my case too?

Is it faster for the planner to select the correct partition (hash
partitioning on the `id` column) than the correct partial index as in my
case? I don't think I'll need more than ~32 partitions/partial indexes even
in an extreme scenario.
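For concreteness, the hash-partitioned alternative I have in mind would look
roughly like this (a sketch, not tested; partition names are illustrative):

```
CREATE TABLE queue (
    id bigserial,
    view_time timestamp with time zone NOT NULL,
    PRIMARY KEY (id)
) PARTITION BY HASH (id);

CREATE TABLE queue_p0 PARTITION OF queue
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE queue_p1 PARTITION OF queue
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Creating the index on the parent creates one per partition.
CREATE INDEX queue_view_time ON queue (view_time ASC);
```

A worker could then target a randomly chosen partition directly
(`SELECT ... FROM queue_p0 ...`) instead of relying on the planner to pick
one.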

Regards,
Dorian


Re: Partitioning update-heavy queue with hash partitions vs partial indexes

2023-08-10 Thread David Rowley
On Thu, 10 Aug 2023 at 20:36, Dorian Hoxha  wrote:
> > Do Not Use Partial Indexes as a Substitute for Partitioning
> > While a search in this larger index might have to descend through a couple 
> > more tree levels than a search in a smaller index, that's almost certainly 
> > going to be cheaper than the planner effort needed to select the 
> > appropriate one of the partial indexes. The core of the problem is that the 
> > system does not understand the relationship among the partial indexes, and 
> > will laboriously test each one to see if it's applicable to the current 
> > query.
>
> Would this be true in my case too?

Yes.  The process of determining which partial indexes are valid for
the given query must consider each index one at a time and validate
the index's WHERE clause against the query's WHERE clause to see if it
can be used.  There is no shortcut that notices you have a series of
partial indexes with WHERE id % 10 = N and just picks one index
without testing all of them.

> Is it faster for the planner to select a correct partition(hash partitioning 
> on `id` column) instead of a correct partial index like in my case? I don't 
> think I'll need more than ~32 partitions/partial-indexes in an extreme 
> scenario.

I mean, test it and find out, but probably yes; the partition pruning
code for hash partitioning is an O(1) operation and is very fast.
Once the given constants have been hashed, finding the partition is
just a single divide operation away.
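As a quick sanity check, plan-time pruning can be observed with EXPLAIN on a
hash-partitioned setup (a sketch; assumes queue is hash-partitioned on id
into queue_p0/queue_p1 as discussed):

```
EXPLAIN (COSTS OFF) SELECT * FROM queue WHERE id = 42;
-- Only the single partition that 42 hashes to appears in the plan;
-- the other partition is pruned before execution.
```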

David




Re: Partitioning update-heavy queue with hash partitions vs partial indexes

2023-08-10 Thread burcinyazici
hi,
Have you considered adding id%10 as a new column? You would have a one-time
cost, but after creating an index on it, update performance should be fine.
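A sketch of that idea using a stored generated column (PostgreSQL 12+;
column and index names are illustrative, and the ALTER rewrites the table
once, which is the one-time cost mentioned above):

```
ALTER TABLE queue
    ADD COLUMN shard int GENERATED ALWAYS AS ((id % 10)::int) STORED;

CREATE INDEX queue_shard_view_time ON queue (shard, view_time ASC);
```

A worker would then pick a shard value and query
`WHERE shard = $1 AND view_time <= ...`, a plain btree prefix scan, so no
partial-index or partition selection is involved.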

Burcin 📱
