Re: Postgresql 13 partitioning advice
Hello David,

Thank you for the valuable inputs. We will test these scenarios.

Regards,
Ameya

On Tue, Aug 2, 2022 at 12:16 PM David Rowley wrote:
> On Tue, 2 Aug 2022 at 19:48, Ameya Bidwalkar wrote:
> > We have a Postgresql 13 database where we have a single table with
> > several million rows. We plan to partition it based on timestamp.
> > We have been seeking advice on best practices for building this.
> > This table will get lots of updates for the same rows during a
> > short period of time. During this time the rows would be in a
> > single partition. After this short time these rows would move to
> > another partition, where no more updates take place on these rows,
> > but some SELECT queries might still be running. We plan to have
> > partitions based on months, then roll them up in a year, and then
> > archive these older partitions. One consultant we talked with told
> > us this row movement between the partitions will have huge
> > complications, but this was an issue during the Postgres 10
> > version.
>
> Define "huge complications"?
>
> The capabilities of partitioned tables have changed quite a bit since
> the feature was added. It's very easy for knowledge to get outdated
> in this area. I did quite a bit of work on them and I struggle to
> remember off the top of my head which versions saw which
> improvements. PG12 saw lots. See [1], search for "partition".
>
> One possible complication is what is mentioned in [2] about the
> "serialization failure error". UPDATEs that cause a tuple to move to
> another partition can cause a serialization failure at any
> transaction isolation level, not just in serializable transactions.
> If it's not doing so already, you might want to have your application
> retry transactions on SQLSTATE 40001 errors.
>
> Apart from that, assuming there's a comparatively small number of
> rows in the partition being updated versus the partition with the
> static rows, it sounds fairly efficient. As you describe it, the
> larger static partition is effectively INSERT-only, and auto-vacuum
> will need to touch it only for tuple-freezing work. The smaller of
> the two tables will receive more churn but will be faster to vacuum.
> PG13 got a new feature that makes sure auto-vacuum also does the
> rounds on INSERT-only tables, so the static partition is not going to
> be neglected until anti-wraparound autovacuums trigger, as it would
> have been in PG12 and earlier.
>
> Another thing to consider is that an UPDATE of a non-partitioned
> table has a chance of being a HOT update. That's possible if the new
> tuple can fit on the same page and none of the indexed columns are
> updated. A HOT update means no indexes need to be updated, so these
> perform faster and require less space in WAL than a non-HOT update.
> An UPDATE that moves a tuple to another partition can never be a HOT
> update, which is something you might want to consider. If you're
> updating indexed columns already then it's not a factor.
>
> There's also overhead to Postgres having to find the partition for
> the newly updated version of the tuple. That's not hugely expensive,
> but it's generally measurable. RANGE partitioned tables with a large
> number of partitions will have the most overhead for this; HASH
> partitioned tables, the least.
> The best thing you can likely do is set up a scenario with pgbench
> and compare the performance. pgbench is a pretty flexible tool that
> will allow you to run certain queries X% of the time and even
> throttle the workload at what you expect your production server to
> experience. You could then run it overnight on a test server, or
> even for weeks, and see how auto-vacuum keeps up when compared to the
> non-partitioned case. You can also check how much extra WAL is
> generated versus the non-partitioned case.
>
> > So we are seeking advice from the performance perspective on the
> > things we should take care of, along with manual vacuums on a
> > regular schedule and indexing. Are there any tunables we should
> > experiment with in particular?
>
> Perhaps if you want to keep a small high-churn table in check you
> might want to consider whether autovacuum_naptime is set low enough.
> You may not care if the space consumed during the standard 1min
> autovacuum_naptime is small enough not to be of concern.
>
> David
>
> [1] https://www.postgresql.org/docs/release/12.0/
> [2] https://www.postgresql.org/docs/13/sql-update.html
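For reference, a minimal sketch of the monthly RANGE layout discussed
above; the table and column names (events, created_at) are invented:

    CREATE TABLE events (
        id         bigint NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    -- One partition per month.  An UPDATE that changes created_at
    -- across a month boundary moves the row to the other partition.
    CREATE TABLE events_2022_07 PARTITION OF events
        FOR VALUES FROM ('2022-07-01') TO ('2022-08-01');
    CREATE TABLE events_2022_08 PARTITION OF events
        FOR VALUES FROM ('2022-08-01') TO ('2022-09-01');

    -- Later: peel an old month off before rolling it up or archiving.
    ALTER TABLE events DETACH PARTITION events_2022_07;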
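And a hypothetical pgbench invocation of the sort David describes: two
custom script files (update_recent.sql and select_old.sql are invented
names) run 90% and 10% of the time respectively, throttled to roughly
500 transactions per second for an eight-hour run:

    pgbench -n --time=28800 --rate=500 \
        -f update_recent.sql@90 \
        -f select_old.sql@10 \
        mydb

Running the same scripts against the partitioned and non-partitioned
schemas and comparing the latency figures and WAL volume gives the
comparison David suggests.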
Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries
Hello Tom. Thanks for your response. Since then I have spent most of
the time looking for evidence and checking other installations with
similar patterns.

> this installation is in the habit of doing pg_stat_statements_reset()
> a lot?
Resetting is very rare. How can I get "pgss->mean_query_len" via SQL?

> Maybe it does get truncated, but then the cycle repeats after awhile?
It is possible, as the slowness happened three times, some days apart.

Question: besides the gc issue that you mentioned, could a large
(700MB or 1GB) pgss_query_texts.stat cause slowness in
pg_stat_statements processing, leading to slower query responses on a
32-bit PG? I'm thinking of reducing pg_stat_statements.max from 10k
to 3k.

Thanks

On Tue, Aug 2, 2022 at 3:14 PM Tom Lane wrote:
> I wrote:
> > bruno da silva writes:
> >> Do you have a lot of especially long statements being tracked in
> >> the pg_stat_statements view? Well, the view was showing the query
> >> column null, but looking at pgss_query_texts.stat there are very
> >> large SQL statements, of around ~400kB, multiple thousands of
> >> them.
>
> I see one possible piece of the puzzle here: since you're using a
> 32-bit build, overflowing size_t is a reachable hazard. Specifically,
> in this test to see if we need to garbage-collect the query text
> file:
>
>     if (extent < pgss->mean_query_len * pgss_max * 2)
>         return false;
>
> You said earlier that pg_stat_statements.max = 10000, so a
> mean_query_len exceeding about 2^32 / 10000 / 2 = 214748.3648 would
> be enough to overflow size_t and break this comparison. Now, a mean
> SQL query length in excess of 200kB sounds mighty improbable, but
> it's really the mean length of the query texts in the view. If your
> "normal" queries fall into just a few patterns they might be
> represented by a relatively small number of view entries. And if the
> "big" queries are sufficiently not alike, they might each get their
> own view entry, which could potentially drive the mean high enough to
> cause trouble. It'd be interesting to track what
> "SELECT avg(length(query)) FROM pg_stat_statements" gives.
>
> However, even if we grant that mean_query_len is that big, overflow
> here would make garbage collection of the query text file more
> likely, not less so. What I'm speculating is that overflow is
> occurring and causing all processes to decide they need to run
> gc_qtexts() every time they insert a new query entry, even though the
> query texts file isn't actually bloated. That could possibly explain
> your performance issues: a garbage collection pass over a multi-gig
> file will take awhile, and what's worse is that it's done under an
> exclusive lock, meaning that all the backends stack up waiting their
> turn to perform a useless GC pass.
>
> What this doesn't explain is why the condition doesn't clear once you
> observe one of those "out of memory" complaints, because that should
> lead to truncating the texts file. Maybe it does get truncated, but
> then the cycle repeats after awhile? If you have a steady stream of
> incoming new 400kB queries, you could build back up to 2.2GB of text
> after five thousand or so of those.
>
> I'm also curious whether this installation is in the habit of doing
> pg_stat_statements_reset() a lot. It looks like that fails to reset
> mean_query_len, which might be intentional but perhaps it could play
> into getting a silly result here later on.
>
> regards, tom lane

-- 
Bruno da Silva
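To make the bound concrete, here is the arithmetic Tom describes,
next to the query he suggests (pg_stat_statements must be installed
in the database you run it in):

    -- size_t on a 32-bit build wraps at 2^32 = 4294967296.
    -- With pgss_max = 10000, "pgss->mean_query_len * pgss_max * 2"
    -- overflows once mean_query_len exceeds
    -- 4294967296 / 10000 / 2 = 214748.3648 bytes (about 215kB).

    -- Mean length of the tracked query texts, per Tom's suggestion:
    SELECT avg(length(query)) AS mean_query_len,
           count(*)           AS tracked_entries
    FROM pg_stat_statements;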
Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries
bruno da silva writes:
> Question: besides the gc issue that you mentioned, could a large
> (700MB or 1GB) pgss_query_texts.stat cause slowness in
> pg_stat_statements processing, leading to slower query responses on
> a 32-bit PG? I'm thinking of reducing pg_stat_statements.max from
> 10k to 3k.

Whether or not we've fully identified the problem, I think cutting
pg_stat_statements.max is a good idea. Especially as long as you're
stuck on an unsupported PG version.

regards, tom lane
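A sketch of the change under discussion; pg_stat_statements.max lives
in postgresql.conf and only takes effect after a server restart:

    # postgresql.conf (illustrative; the module must be preloaded)
    shared_preload_libraries = 'pg_stat_statements'
    pg_stat_statements.max = 3000    # down from 10000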
