Re: PostgreSQL 13 partitioning advice

2022-08-03 Thread Ameya Bidwalkar
Hello David,

Thank you for the valuable inputs. We will test these scenarios.

Regards,
Ameya

On Tue, Aug 2, 2022 at 12:16 PM David Rowley  wrote:

> On Tue, 2 Aug 2022 at 19:48, Ameya Bidwalkar
>  wrote:
> > We have a PostgreSQL 13 database with a single table holding several
> million rows. We plan to partition it based on timestamp.
> > We have been seeking advice on best practices for building this.
> > This table will get lots of updates to the same rows during a short
> period of time. During this time the rows would be in a single
> partition.
> > After this short period these rows would move to another partition,
> where no more updates take place, but some SELECT queries might still
> run against them.
> > We plan to have partitions based on months, then roll them up into a
> year, and then archive these older partitions.
> > One consultant we talked with told us this row movement between the
> partitions will have
> > huge complications, but that was an issue in the Postgres 10
> version.
>
> Define "huge complications"?
>
> The capabilities of partitioned tables have changed quite a bit since
> the feature was added. It's very easy for knowledge to get outdated
> in this area.  I did quite a bit of work on them and I struggle to
> remember off the top of my head which versions saw which improvements.
> PG12 saw lots. See [1], search for "partition".
>
> One possible complication is what is mentioned in [2] about
> "serialization failure error".  UPDATEs that cause a tuple to move to
> another partition can cause a serialization failure at any transaction
> isolation level, not just in serializable transactions. If it doesn't
> already, you might want to have your application retry transactions on
> SQLSTATE 40001 errors.
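>
> As a minimal sketch of that hazard (hypothetical table and column
> names), monthly RANGE partitioning with a cross-partition UPDATE might
> look like:
>
>     CREATE TABLE jobs (id bigint, state text, ts timestamptz)
>         PARTITION BY RANGE (ts);
>     CREATE TABLE jobs_2022_07 PARTITION OF jobs
>         FOR VALUES FROM ('2022-07-01') TO ('2022-08-01');
>     CREATE TABLE jobs_2022_08 PARTITION OF jobs
>         FOR VALUES FROM ('2022-08-01') TO ('2022-09-01');
>
>     -- Moves a row from jobs_2022_07 into jobs_2022_08.  If another
>     -- session concurrently UPDATEs or DELETEs the same row, the
>     -- loser can get SQLSTATE 40001 rather than simply blocking, as
>     -- it would for an update within a single partition.
>     UPDATE jobs SET ts = '2022-08-02' WHERE id = 42;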
>
> Apart from that, assuming there's comparatively a small number of rows
> in the partition being updated compared to the partition with the
> static rows, then it sounds fairly efficient. As you describe it, the
> larger static partition is effectively INSERT only and auto-vacuum
> will need to touch it only for tuple freezing work.  The smaller of
> the two tables will receive more churn but will be faster to vacuum.
> PG13 gained a new feature that makes sure auto-vacuum also does the
> rounds on INSERT-only tables, so the static partition is not going
> to be neglected until anti-wraparound autovacuums trigger, as it
> would have been in PG12 and earlier.
>
> Another thing to consider is that an UPDATE of a non-partitioned table
> has a chance at being a HOT update. That's possible if the tuple can
> fit on the same page and does not update any of the indexed columns. A
> HOT update means no indexes need to be updated so these perform faster
> and require less space in WAL than a non-HOT update.  An UPDATE that
> moves a tuple to another partition can never be a HOT update.  That's
> something to keep in mind, though if you're already updating indexed
> columns then it's not a factor.  There's also
> overhead to postgres having to find the partition for the newly
> updated version of the tuple.  That's not hugely expensive, but it's
> generally measurable. RANGE partitioned tables with a large number of
> partitions will have the most overhead for this. HASH partitioned
> tables, the least.
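>
> You can monitor how many of your updates are HOT per table, which
> will show what moving to partitioning costs you here; for example,
> with the hypothetical name pattern from above:
>
>     SELECT relname, n_tup_upd, n_tup_hot_upd
>     FROM pg_stat_user_tables
>     WHERE relname LIKE 'jobs%';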
>
> The best thing you can likely do is set up a scenario with pgbench and
> compare the performance. pgbench is a pretty flexible tool that will
> allow you to run certain queries X% of the time and even throttle the
> workload at what you expect your production server to experience. You
> could then run it overnight on a test server, or even for weeks and
> see how auto-vacuum keeps up when compared to the non-partitioned
> case. You can also check how much extra WAL is generated vs the
> non-partitioned case.
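>
> A rough sketch of such a test (hypothetical script and numbers): put
> the hot UPDATE in a custom script, say update_hot.sql,
>
>     \set id random(1, 1000000)
>     UPDATE jobs SET state = 'done', ts = now() WHERE id = :id;
>
> then run it throttled to roughly your expected production rate:
>
>     pgbench -n -c 16 -j 4 -T 3600 -R 500 -f update_hot.sql mydb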
>
> > So we are seeking advice on the performance perspective and things we
> should take care of along with manual vacuums on a regular schedule and
> indexing.
> > Are there any tunables I should experiment with in particular ?
>
> Perhaps if you want to keep a small high-churn table in check, you
> might want to consider whether autovacuum_naptime is set low enough.
> The standard 1min autovacuum_naptime may be fine if the space consumed
> between runs is small enough not to be a concern.
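>
> For example (illustrative value; autovacuum_naptime only needs a
> configuration reload, not a restart):
>
>     ALTER SYSTEM SET autovacuum_naptime = '15s';
>     SELECT pg_reload_conf();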
>
> David
>
> [1] https://www.postgresql.org/docs/release/12.0/
> [2] https://www.postgresql.org/docs/13/sql-update.html
>


Re: PostgreSQL 9.5.21 very slow to connect and perform basic queries

2022-08-03 Thread bruno da silva
Hello Tom. Thanks for your response.
Since then I have spent most of my time gathering evidence and checking
other installations with similar patterns.

> this installation is in the habit of doing pg_stat_statements_reset() a lot?
Resetting is very rare. How can I get "pgss->mean_query_len" via SQL?

> Maybe it does get truncated, but then the cycle repeats after a while?
It is possible, as the slowness happened three times, some days apart.

Question: besides the gc issue that you mentioned, could a large (700MB
or 1GB) pgss_query_texts.stat cause slowness in pg_stat_statements
processing, leading to slower query responses on a 32-bit PG? I'm
thinking of reducing pg_stat_statements.max from 10k to 3k.


Thanks

On Tue, Aug 2, 2022 at 3:14 PM Tom Lane  wrote:

> I wrote:
> > bruno da silva  writes:
> >> Do you have a lot of especially long statements being tracked
> >> in the pg_stat_statements view? Well, the view was showing the
> >> query column as null, but looking at pgss_query_texts.stat there
> >> are very large SQL statements, around ~400kB each, multiple
> >> thousands of them.
>
> I see one possible piece of the puzzle here: since you're using a 32-bit
> build, overflowing size_t is a reachable hazard.  Specifically, in this
> test to see if we need to garbage-collect the query text file:
>
> if (extent < pgss->mean_query_len * pgss_max * 2)
> return false;
>
> You said earlier that pg_stat_statements.max = 10000, so a mean_query_len
> exceeding about 2^32 / 10000 / 2 = 214748.3648 would be enough to overflow
> size_t and break this comparison.  Now, a mean SQL query length in excess
> of 200kB sounds mighty improbable, but it's really the mean length of the
> query texts in the view.  If your "normal" queries fall into just a few
> patterns they might be represented by a relatively small number of view
> entries.  And if the "big" queries are sufficiently not alike, they might
> each get their own view entry, which could potentially drive the mean high
> enough to cause trouble.  It'd be interesting to track what
> "SELECT avg(length(query)) FROM pg_stat_statements" gives.
>
> However, even if we grant that mean_query_len is that big, overflow here
> would make garbage collection of the query text file more likely not less
> so.  What I'm speculating is that overflow is occurring and causing all
> processes to decide they need to run gc_qtexts() every time they insert
> a new query entry, even though the query texts file isn't actually
> bloated.  That could possibly explain your performance issues: a garbage
> collection pass over a multi-gigabyte file will take a while, and what's worse
> is that it's done under an exclusive lock, meaning that all the backends
> stack up waiting their turn to perform a useless GC pass.
>
> What this doesn't explain is why the condition doesn't clear once you
> observe one of those "out of memory" complaints, because that should
> lead to truncating the texts file.  Maybe it does get truncated, but
> then the cycle repeats after a while?  If you have a steady stream of
> incoming new 400kB queries, you could build back up to 2.2GB of text
> after five thousand or so of those.
>
> I'm also curious whether this installation is in the habit of doing
> pg_stat_statements_reset() a lot.  It looks like that fails to
> reset mean_query_len, which might be intentional but perhaps it
> could play into getting a silly result here later on.
>
> regards, tom lane
>


-- 
Bruno da Silva


Re: PostgreSQL 9.5.21 very slow to connect and perform basic queries

2022-08-03 Thread Tom Lane
bruno da silva  writes:
> Question: besides the gc issue that you mentioned, could a large (700MB
> or 1GB) pgss_query_texts.stat cause slowness in pg_stat_statements
> processing, leading to slower query responses on a 32-bit PG? I'm
> thinking of reducing pg_stat_statements.max from 10k to 3k.

Whether or not we've fully identified the problem, I think cutting
pg_stat_statements.max is a good idea.  Especially as long as you're
stuck on an unsupported PG version.
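
A minimal sketch of that change (illustrative value; the parameter can
only be set at server start, so a restart is required):

    ALTER SYSTEM SET pg_stat_statements.max = 3000;
    -- restart the server, then optionally start from a clean slate:
    SELECT pg_stat_statements_reset();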

regards, tom lane