Postgresql server gets stuck at low load
I have a problem with one of my Postgres production servers. The server works fine almost always, but sometimes, without any increase in transaction or statement volume, the machine gets stuck. Cores go up to 100%, load up to 160%. When this happens there are problems connecting to the database, and even when a connection succeeds, simple queries take several seconds instead of milliseconds. The problem sometimes stops after a period of time (e.g. 35 min); sometimes we must restart Postgres, Linux, or even KVM (the virtualization host).

My hardware:
56 cores (Intel Core Processor (Skylake, IBRS))
400 GB RAM
RAID10 with about 40k IOPS

OS:
CentOS Linux release 7.7.1908
kernel 3.10.0-1062.18.1.el7.x86_64

Database:
size 100 GB (fits entirely in memory :) )
server_version 10.12
effective_cache_size 192000 MB
maintenance_work_mem 2048 MB
max_connections 150
shared_buffers 64000 MB
work_mem 96 MB

In the normal state I have about 500 tps, 5% usage of cores, about 3% load; the whole database fits in memory, there are no reads from disk, only writes at about the 500 IOPS level, sometimes spiking to 1500 IOPS, but on this hardware those values are no problem (no iowait on the cores). In the normal state this machine does "nothing". Connections to the database are created by two Java-based app servers through connection pools, so the connection count is limited by the pool configuration to a maximum of 120, which is lower than the Postgres configuration (150). In the normal state there are about 20 connections; when the server gets stuck, the count goes to the maximum (120).

Correlated with the stalls I see messages in the kernel log like:
NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
but I don't know whether this is the cause or an effect of the problem. I investigated with pgBadger and ... nothing strange happens, just normal statements.

Any ideas?

Thanks,
Kris
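For illustration, the connection counts described above can be watched with a standard catalog query (not part of the original report):

-- Connection count by state; during a stall the total should climb
-- toward the pool maximum (120).
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;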
Re: Postgresql server gets stuck at low load
De: "Krzysztof Olszewski" Para: [email protected] Enviadas: Sexta-feira, 5 de junho de 2020 7:07:02 Assunto: Postgresql server gets stuck at low load BQ_BEGIN I have problem with one of my Postgres production server. Server works fine almost always, but sometimes without any increase of transactions or statements amount, machine gets stuck. Cores goes up to 100%, load up to 160%. When it happens then there are problems with connect to database and even it will succeed, simple queries works several seconds instead of milliseconds.Problem sometimes stops after a period a time (e.g. 35 min), sometimes we must restart Postgres, Linux, or even KVM (which exists as virtualization host). My hardware56 cores (Intel Core Processor (Skylake, IBRS))400 GB RAMRAID10 with about 40k IOPS Os CentOS Linux release 7.7.1908 kernel 3.10.0-1062.18.1.el7.x86_64 Databasesize 100 GB (entirely fit in memory :) )server_version 10.12effective_cache_size 192000 MBmaintenance_work_mem 2048 MBmax_connections 150 shared_buffers 64000 MBwork_mem 96 MBOn normal state, i have about 500 tps, 5% usage of cores, about 3% of load, whole database fits in memory, no reads from disk, only writes on about 500 IOPS level, sometimes in spikes on 1500 IOPS level, but on this hardware there is no problem with this values (no iowaits on cores). In normal state this machine does "nothing". Connections to database are created by two app servers based on Java, through connection pools, so connections count is limited by configuration of pools and max is 120, is lower value than in Postgres configuration (150). On normal state there is about 20 connections, when stuck goes into max (120).In correlation with stucks i see informations in kernel log aboutNMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]but i don't know this is reason or effect of problemI made investigation with pgBadger and ... nothing strange happens, just normal statements Any ideas? Thanks, Kris BQ_END Hi Krzysztof! I would enable pg_stat_statements extension and check if there are long running queries that should be quick.
Re: Postgresql server gets stuck at low load
On Fri, 5 Jun 2020 at 12:07, Krzysztof Olszewski wrote:

You can try installing perf plus debug symbols for Postgres. When you hit this problem again, run "perf top"; you can see which routines are eating your CPU. Maybe it is a spinlock problem:

https://www.postgresql.org/message-id/CAHyXU0yAsVxoab2PcyoCuPjqymtnaE93v7bN4ctv2aNi92fefA%40mail.gmail.com

Your answers to Merlin's questions from that mail could be interesting:

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

Regards
Pavel
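Complementing perf on the OS side, a minimal sketch of a catalog query (standard pg_stat_activity columns, available since PostgreSQL 9.6) to sample what backends are doing during a stall:

-- Backends grouped by wait event; a NULL wait_event for an active
-- backend means it is on CPU rather than waiting.
SELECT wait_event_type, wait_event, state, count(*)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY 1, 2, 3
ORDER BY count(*) DESC;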
Re: When to use PARTITION BY HASH?
> "Bulk loads ...", As I see - There is an interesting bulkload benchmark: "How Bulkload performance is affected by table partitioning in PostgreSQL" by Beena Emerson (Enterprisedb, December 4, 2019 ) *SUMMARY: This article covers how benchmark tests can be used to demonstrate the effect of table partitioning on performance. Tests using range- and hash-partitioned tables are compared and the reasons for their different results are explained: 1. Range partitions 2. Hash partitions 3. Combination graphs 4. Explaining the behavior 5. Conclusion* *"For the hash-partitioned table, the first value is inserted in the first partition, the second number in the second partition and so on till all the partitions are reached before it loops back to the first partition again until all the data is exhausted. Thus it exhibits the worst-case scenario where the partition is repeatedly switched for every value inserted. As a result, the number of times the partition is switched in a range-partitioned table is equal to the number of partitions, while in a hash-partitioned table, the number of times the partition has switched is equal to the amount of data being inserted. This causes the massive difference in timing for the two partition types."* https://www.enterprisedb.com/postgres-tutorials/how-bulkload-performance-affected-table-partitioning-postgresql Regards, Imre Oleksandr Shulgin ezt írta (időpont: 2020. jún. 2., K, 19:17): > Hi! > > I was reading up on declarative partitioning[1] and I'm not sure what > could be a possible application of Hash partitioning. > > Is anyone actually using it? What are typical use cases? What benefits > does such a partitioning scheme provide? > > On its face, it seems that it can only give you a number of tables which > are smaller than the un-partitioned one, but I fail to see how it would > provide any of the potential advantages listed in the documentation. > > With a reasonable hash function, the distribution of rows across > partitions should be more or less equal, so I wouldn't expect any of the > following to hold true: > - "...most of the heavily accessed rows of the table are in a single > partition or a small number of partitions." > - "Bulk loads and deletes can be accomplished by adding or removing > partitions...", > etc. > > That *might* turn out to be the case with a small number of distinct > values in the partitioning column(s), but then why rely on hash > assignment instead of using PARTITION BY LIST in the first place? > > Regards, > -- > Alex > > [1] https://www.postgresql.org/docs/12/ddl-partitioning.html > >
Re: Date vs Timestamp without timezone Partition Key
Somewhat unrelated, but a note to anyone who wants to swap out partition keys: don't create a clone of the table with the new partition key and insert the data. It messes up the query planner massively and makes everything much slower.

On Mon, May 25, 2020 at 12:48 AM Tom Lane wrote:

> Cedric Leong writes:
> > Just in case someone is interested enough to answer this: does anyone know whether the performance difference between a date column and a timestamp column as a partition key is large?
>
> I doubt it's even measurable, at least on 64-bit machines. You're basically talking about 32-bit integer comparisons vs 64-bit integer comparisons.
>
> On a 32-bit machine it's possible that an index on a date column will be physically smaller, so you could get some wins from reduced I/O. But on (most?) 64-bit machines that difference goes away too, because of alignment restrictions.
>
> As always, YMMV; it never hurts to do your own testing.
>
> regards, tom lane
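In the spirit of "do your own testing", a throwaway sketch (hypothetical temp table) comparing filter performance on the two column types:

-- One row per minute for a year, stored both as date and as timestamp.
CREATE TEMP TABLE t AS
SELECT g::date AS d_date, g AS d_ts
FROM generate_series(timestamp '2020-01-01',
                     timestamp '2020-12-31',
                     interval '1 minute') AS g;

EXPLAIN ANALYZE SELECT count(*) FROM t
WHERE d_date = date '2020-06-01';

EXPLAIN ANALYZE SELECT count(*) FROM t
WHERE d_ts >= timestamp '2020-06-01' AND d_ts < timestamp '2020-06-02';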
Re: Date vs Timestamp without timezone Partition Key
On Sat, 6 Jun 2020 at 14:12, Cedric Leong wrote:
> Somewhat unrelated, but a note to anyone who wants to swap out partition keys: don't create a clone of the table with the new partition key and insert the data. It messes up the query planner massively and makes everything much slower.

That complaint would have more meaning if you'd mentioned which version of PostgreSQL you're using. The performance of partitioning in PostgreSQL has changed significantly over the past 3 releases. It would also be useful to know what you've actually done (the actual commands). I can't imagine it makes *everything* slower, so it might be good to mention what is actually slower.

David
Re: Date vs Timestamp without timezone Partition Key
It's less a complaint than a warning not to do what I did.

Version: PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit

The actual command list would probably be impractical to put in here, because the majority of it would just be creating a large number of partition tables. But in summary, what I've done is basically this. The existing database has a partitioned fact table:

1. Create an exact clone of that partitioned fact table, including all the same indexes, columns, and partitions.
2. Change the partitioned table's partition key from an indexed date column to an indexed timestamp without time zone column.
3. Insert into the new partitioned fact table from the old one, copying all the same rows (an insert rather than a straight copy, since I wanted the timestamp without time zone column to be in a new time zone).
4. Switch the names of the tables so the new one is the one that's used.
5. VACUUM FULL; ANALYZE;

For my use case, which is a data warehouse star schema, this fact table is basically the base table of every report. To be more specific, the reports I've tested varied from 2x slower to 4x slower. From what I can see so far, that's because the query plan is drastically different for the two. An example of this test looks like this: https://explain.depesz.com/s/6rP8 and https://explain.depesz.com/s/cLUY. These tests are running the exact same query on two different tables with the exception that they use their respective partition keys.
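For illustration, a minimal sketch of those steps with hypothetical table and column names (syntax as of PostgreSQL 11):

-- 1+2. Clone of the fact table, repartitioned by a timestamp column
-- instead of the old date column.
CREATE TABLE fact_new (
    created_at timestamp NOT NULL,
    amount     numeric
) PARTITION BY RANGE (created_at);

CREATE TABLE fact_new_2020_06 PARTITION OF fact_new
    FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');

CREATE INDEX ON fact_new (created_at);

-- 3. Copy the rows, converting the old date key to a timestamp
-- (any time-zone shift elided here).
INSERT INTO fact_new (created_at, amount)
SELECT event_date::timestamp, amount FROM fact;

-- 4. Swap the names so the new table is the one in use.
ALTER TABLE fact RENAME TO fact_old;
ALTER TABLE fact_new RENAME TO fact;

-- 5.
VACUUM FULL fact;
ANALYZE fact;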
Re: Date vs Timestamp without timezone Partition Key
On Sat, 6 Jun 2020 at 14:49, Cedric Leong wrote:
> It's less a complaint than a warning not to do what I did.
My point was really that nobody really knew what you did or what you
did it on. So it didn't seem like a worthwhile warning as it
completely lacked detail.
> These tests are running the exact same query on two different tables with the
> exception that they use their respective partition keys.
Are you sure? It looks like the old one does WHERE date =
((now())::date - '7 days'::interval) and the new version does
date(created_at) = ((now())::date - '7 days'::interval). I guess you
renamed date to "created_at" and changed the query to use date(). If
that expression is not indexed then I imagine that would be a good
reason for the planner to have moved away from using the index on that
column. Also, having date(created_at) will not allow run-time
pruning to work since your partition key is "created_at".
You might be able to change the query to query a range of values on the
new timestamp column. This will allow you to get rid of the date()
function. For example:
where created_at >= date_trunc('day', now() - '7 days'::interval) and
created_at < date_trunc('day', now() - '6 days'::interval)
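
If the query must keep the date() expression, an expression index is
another option (a sketch with a hypothetical table name; note it still
would not help run-time pruning):

CREATE INDEX fact_created_at_date_idx ON fact (date(created_at));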
David
Re: Date vs Timestamp without timezone Partition Key
I can confirm that was the issue; after removing the expression and
using only what was indexed, the query plan was definitely fixed. I
appreciate all the help you've given me. I didn't really think to look
there, but it makes a ton of sense that a filter on the database only
works well if it's indexed.
Thanks again,