Re: Query planning read a large amount of buffers for partitioned tables

2025-02-18 Thread bruno vieira da silva
Hello, Your explanation aligns with the idea I had that having more
shared_buffers and connection pooling are very important in the context of
the partitioned tables.

Thanks. Regards.

On Tue, Feb 18, 2025 at 7:16 AM David Rowley  wrote:

> On Tue, 18 Feb 2025 at 09:18, bruno vieira da silva
>  wrote:
> >
> > Hello, I did a more comprehensive test with a different number of
> partitions and I found this:
> >
> > Summary buffers usage for the first call vs second call on the same
> session.
> >
> > Query 200, 100, 50, and 10 partitions:
> > 200 Partitions: 12,828 (100MB)
> > 100 Partitions:  9,329 (72MB)
> >  50 Partitions:  3,305 (25MB)
> >  10 Partitions:875 (7MB)
> >
> > Same query on the same session:
> > 200 Partitions:205 (1.6MB)
> > 100 Partitions:  5 (40KB)
> > 50  Partitions:  5 (40KB)
> > 10  Partitions:  5 (40KB)
> >
> > I did test on PG 17.3 no relevant changes.
> >
> > Question is, does it make sense?
>
> I didn't analyze this in great detail, but nothing looks too
> surprising to me. I get roughly the same numbers on the latest git
> master branch as you've shown above.
>
> A PostgreSQL backend will cache various metadata about relations the
> first time they're accessed in a backend.  Building those caches
> requires accessing the system catalogue tables. I expect the majority
> of the buffer accesses are for those tables. If you're curious about
> what's being accessed and have a fresh test instance handy, you could
> use strace to see which buffers are being read. You'll need to ensure
> the shared buffers are not caching anything. Restarting PostgreSQL
> should clear those out sufficiently. You can translate the filenodes
> back into relation names by using a query such as: select relname from
> pg_class where pg_relation_filenode(oid)=1259;
>
> If this is causing you problems then maybe a connection pooler would
> help you. With one of those, the backend will live longer than just 1
> query. You could also perhaps revisit your partition count to see if
> the number you've chosen gives you the best performance. It's very
> common for people to over-partition and not properly consider the
> overheads of partitioning.
>
> David
>


-- 
Bruno Vieira da Silva


Re: Query planning read a large amount of buffers for partitioned tables

2025-02-18 Thread David Rowley
On Tue, 18 Feb 2025 at 09:18, bruno vieira da silva
 wrote:
>
> Hello, I did a more comprehensive test with a different number of partitions 
> and I found this:
>
> Summary buffers usage for the first call vs second call on the same session.
>
> Query 200, 100, 50, and 10 partitions:
> 200 Partitions: 12,828 (100MB)
> 100 Partitions:  9,329 (72MB)
>  50 Partitions:  3,305 (25MB)
>  10 Partitions:875 (7MB)
>
> Same query on the same session:
> 200 Partitions:205 (1.6MB)
> 100 Partitions:  5 (40KB)
> 50  Partitions:  5 (40KB)
> 10  Partitions:  5 (40KB)
>
> I did test on PG 17.3 no relevant changes.
>
> Question is, does it make sense?

I didn't analyze this in great detail, but nothing looks too
surprising to me. I get roughly the same numbers on the latest git
master branch as you've shown above.

A PostgreSQL backend will cache various metadata about relations the
first time they're accessed in a backend.  Building those caches
requires accessing the system catalogue tables. I expect the majority
of the buffer accesses are for those tables. If you're curious about
what's being accessed and have a fresh test instance handy, you could
use strace to see which buffers are being read. You'll need to ensure
the shared buffers are not caching anything. Restarting PostgreSQL
should clear those out sufficiently. You can translate the filenodes
back into relation names by using a query such as: select relname from
pg_class where pg_relation_filenode(oid)=1259;

If this is causing you problems then maybe a connection pooler would
help you. With one of those, the backend will live longer than just 1
query. You could also perhaps revisit your partition count to see if
the number you've chosen gives you the best performance. It's very
common for people to over-partition and not properly consider the
overheads of partitioning.

David