PostgreSQL 13.7 hangs and becomes unresponsive
Hi all,
I have a PostgreSQL cluster (13.7) with two nodes (master, slave), built in Docker (Alpine 3.15) with these flags:

./configure --prefix=${PG_DIR} --exec-prefix=${PG_DIR} --enable-integer-datetimes --enable-thread-safety --disable-rpath --with-uuid=e2fs --with-gnu-ld --with-pgport=5432 --with-system-tzdata=/usr/share/zoneinfo --with-llvm --with-gssapi --with-ldap --with-icu --with-tcl --with-perl --with-python --with-pam --with-openssl --with-libxml --with-libxslt

and running with the following configuration:

listen_addresses = '*'
port = 5432
max_connections = 1000
unix_socket_directories = '/opt/pg/data'
superuser_reserved_connections = 3
shared_buffers = 6GB
temp_buffers = 32MB
max_prepared_transactions = 100
work_mem = 1146kB
maintenance_work_mem = 1792MB
max_stack_depth = 4MB
dynamic_shared_memory_type = posix
archive_command = '/opt/pg/bin/pgbackrest --stanza=aws-prdan archive-push %p'
archive_mode = on
max_wal_senders = 10
min_wal_size = 2GB
max_wal_size = 3GB
wal_level = logical
checkpoint_completion_target = 0.9
effective_cache_size = 18GB
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_checkpoints = on
log_line_prefix = '%m [%p] - [%a - %u - %d] [%h] : %e'
log_min_duration_statement = 0
stats_temp_directory = '/opt/pg/pg_stat_tmp'
autovacuum = on
autovacuum_max_workers = 3
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
datestyle = 'iso, dmy'
default_text_search_config = 'pg_catalog.simple'
jit = on
jit_above_cost = 10
jit_debugging_support = off
jit_dump_bitcode = off
jit_expressions = on
jit_inline_above_cost = 50
jit_optimize_above_cost = 50
jit_profiling_support = off
jit_provider = llvmjit
jit_tuple_deforming = on
max_worker_processes = 8
max_parallel_workers_per_gather = 2
max_parallel_workers = 2
default_statistics_target = 100
synchronous_commit = off
random_page_cost = 1.1
effective_io_concurrency = 200
track_activity_query_size = 1
pg_stat_statements.track = all

Sometimes my cluster nodes hang and stop responding. The logs end without any error:

2022-07-02 00:42:51.755 P00 INFO: archive-push command begin 2.39: [pg_wal/00015596002C] --archive-async --compress-type=lz4 --exec-id=787394-196ba324 --log-level-console=info --log-level-file=detail --pg1-path=/opt/pg/data --process-max=4 --repo1-cipher-pass= --repo1-cipher-type=aes-256-cbc --repo1-path=/repo-path --repo1-s3-bucket=backup-postgresql --repo1-s3-endpoint=s3.eu-central-1.amazonaws.com --repo1-s3-key= --repo1-s3-key-secret= --repo1-s3-region=eu-central-1 --repo1-type=s3 --stanza=aws-prdan
2022-07-02 00:42:51.755 P00 INFO: pushed WAL file '00015596002C' to the archive asynchronously
2022-07-02 00:42:51.755 P00 INFO: archive-push command end: completed successfully (1ms)
2022-07-02 07:54:34.333 GMT [15] - [ - - ] [] : 0LOG: starting PostgreSQL 13.7 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
2022-07-02 07:54:34.335 GMT [15] - [ - - ] [] : 0LOG: listening on IPv4 address "0.0.0.0", port 5432

I can log in to the database using psql but cannot execute any command, and I cannot stop the database with kill -INT, kill -QUIT, or docker stop. The database on the ZFS volume is already 3.8 TB. In the end I restart the whole host and everything returns to normal. The situation occurs on both master and slave nodes.

Please tell me what I should do to trace and fix the problem.
Thanks for your attention.
Regards
BS
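For a hang like this it helps to capture state while the node is wedged rather than after the restart. Below is a minimal diagnostic sketch, assuming psql still connects as described above; the commands are generic, not specific to this setup:

# Sketch only: capture what a hung PostgreSQL 13 node is doing.
# Assumes psql can still connect, as reported above.

# 1. What are the non-idle backends waiting on?
psql -X -d postgres -c "
SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS running_for,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;"

# 2. Are postgres processes stuck in uninterruptible I/O wait (state D)?
#    If so, the problem is usually below PostgreSQL (storage, ZFS, Docker volume).
ps -eo pid,stat,wchan:32,cmd | grep '[p]ostgres'

If the backends show I/O-related wait events or the processes sit in state D, the next place to look is the storage layer rather than the server itself.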
PostgreSQL blocked for more than 120 s
Hi,
I have a problem with my slave PostgreSQL instance. During normal operations the database is locked and in syslog I see this stack trace:

Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.289110] INFO: task postgres:1172 blocked for more than 120 seconds.
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.292137] Not tainted 5.19.0-1028-aws #29~22.04.1-Ubuntu
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.295239] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299360] task:postgres state:D stack:0 pid: 1172 ppid: 1133 flags:0x0004
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299366] Call Trace:
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299369]
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299373] __schedule+0x254/0x5a0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299381] schedule+0x5d/0x100
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299385] io_schedule+0x46/0x80
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299389] blk_mq_get_tag+0x117/0x300
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299394] ? destroy_sched_domains_rcu+0x40/0x40
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299399] __blk_mq_alloc_requests+0xc4/0x1e0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299404] blk_mq_get_new_requests+0xcc/0x190
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299408] blk_mq_submit_bio+0x1eb/0x450
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299411] __submit_bio+0xf6/0x190
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299414] ? kmem_cache_alloc+0x1a6/0x2f0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299419] __submit_bio_noacct+0x81/0x1f0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299422] submit_bio_noacct_nocheck+0x104/0x1c0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299425] ? ext4_inode_block_valid+0x1d/0x40
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299430] submit_bio_noacct+0x1b9/0x600
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299433] submit_bio+0x40/0xf0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299437] ext4_mpage_readpages+0x254/0xc20
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299441] ? xas_load+0x1f/0x100
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299447] ? __filemap_add_folio+0x1ca/0x540
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299452] ext4_readahead+0x3f/0x50
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299455] read_pages+0x7b/0x2e0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299459] page_cache_ra_unbounded+0x12d/0x180
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299462] force_page_cache_ra+0xc5/0x100
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299466] generic_fadvise+0x19d/0x280
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299469] ksys_fadvise64_64+0x9f/0xb0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299471] ? syscall_trace_enter.constprop.0+0xb5/0x1a0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299475] __x64_sys_fadvise64+0x1c/0x30
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299478] do_syscall_64+0x5c/0x90
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299482] ? exc_page_fault+0x92/0x190
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299485] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299489] RIP: 0033:0x7f7358c993c2
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299493] RSP: 002b:7ffd448a8a88 EFLAGS: 0206 ORIG_RAX: 00dd
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299498] RAX: ffda RBX: 0383 RCX: 7f7358c993c2
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299501] RDX: 2000 RSI: 099a RDI: 0389
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299503] RBP: 2000 R08: 0002 R09:
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299506] R10: 0003 R11: 0206 R12: 099a
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299508] R13: 0a10 R14: 01504cd0 R15: 7ffd448a8ad0
Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.299514]
Jul 16 22:36:13 ec1-prd-postgresan-2 kernel: [ 1452.119882] INFO: task postgres:1172 blocked for more than 241 seconds.
Jul 16 22:36:13 ec1-prd-postgresan-2 kernel: [ 1452.122832] Not tainted 5.19.0-1028-aws #29~22.04.1-Ubuntu
Jul 16 22:36:13 ec1-prd-postgresan-2 kernel: [ 1452.128443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 16 22:36:13 ec1-prd-postgresan-2 kernel: [ 1452.132098] task:postgres state:D stack:0 pid: 1172 ppid: 1133 flags:0x0004
Jul 16 22:36:13 ec1-prd-postgresan-2 kernel: [ 1452.132106] Call Trace:
Jul 16 22:36:13 ec1-prd-postgresan-2 kernel: [ 1452.132109]
Jul 16 22:36:13 ec1-prd-postgresan-2 kernel:
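The backend in this trace is parked in io_schedule() under blk_mq_get_tag(), i.e. it is waiting for the block layer to hand out a request slot, which normally means the device queue is saturated rather than anything inside PostgreSQL. A quick way to confirm while the hung-task warning is firing, as a sketch only (the tools come from the sysstat and iotop packages and are not specific to this host):

# Sketch: confirm device-level I/O saturation while the task is hung.

# Extended device statistics every 2 seconds:
# watch %util, aqu-sz and r_await/w_await for the volume holding the data directory.
iostat -x 2

# Per-process I/O, batch mode, processes only (requires root);
# postgres processes stuck in D state with high IO% point at storage.
iotop -obP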
Re: PostgreSQL blocked for more than 120 s
Hi,
The problem was disk I/O starvation; migrating the storage to a volume with more IOPS resolved it.
Regards
BS

On Mon, 17 Jul 2023 at 07:40, Bogdan Siara wrote:
> Hi, I have a problem with my slave PostgreSQL instance. During normal
> operations the database is locked and in syslog I see this stack trace:
>
> Jul 16 22:34:12 ec1-prd-postgresan-2 kernel: [ 1331.289110] INFO: task
> postgres:1172 blocked for more than 120 seconds.
> [...]
huge_pages=on causes "could not map anonymous shared memory: Cannot allocate memory"
Hi,
I have a problem running PostgreSQL on an Ubuntu 24.04 server with huge_pages = on. My instance has 8 GB RAM and 2 vCPUs (t3a.large). My configuration is:

max_connections = 1000
superuser_reserved_connections = 3
shared_buffers = 1960MB
effective_cache_size = 5881MB
huge_pages = on
temp_buffers = 32MB
max_prepared_transactions = 100
work_mem = 1MB
maintenance_work_mem = 392MB
max_stack_depth = 4MB
dynamic_shared_memory_type = posix
archive_mode = on
max_wal_senders = 10
wal_sender_timeout = 0
min_wal_size = 4GB
max_wal_size = 16GB
wal_level = logical
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

In my sysctl.conf I have:

vm.overcommit_memory=2
vm.overcommit_ratio=50
vm.vfs_cache_pressure=50
vm.dirty_background_ratio=10
vm.dirty_ratio=40
fs.nr_open=1000
fs.file-max=1000
vm.nr_hugepages=980
vm.hugetlb_shm_group=1010

My huge page meminfo looks like this:

cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:     980
HugePages_Free:      980
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2007040 kB

When I start the postgres instance I get this error:

May 07 10:56:32 pg1 pg_ctl[16753]: server starting
May 07 10:56:32 pg1 systemd[1]: Started postgres-17.4-local.service - PostgreSQL 17.4 database local server.
May 07 10:56:35 pg1 pg_ctl[16756]: 2025-05-07 10:56:35.826 CEST [16756] - [ - - ] [] : XX000FATAL: could not map anonymous shared memory: Cannot allocate memory
May 07 10:56:35 pg1 pg_ctl[16756]: 2025-05-07 10:56:35.826 CEST [16756] - [ - - ] [] : XX000HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 2204106752 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing "shared_buffers" or "max_connections".
May 07 10:56:35 pg1 pg_ctl[16756]: 2025-05-07 10:56:35.826 CEST [16756] - [ - - ] [] : 0LOG: database system is shut down
May 07 10:56:35 pg1 systemd[1]: postgres-17.4-sdx.service: Main process exited, code=exited, status=1/FAILURE
May 07 10:56:35 pg1 systemd[1]: postgres-17.4-sdx.service: Failed with result 'exit-code'.
May 07 10:56:35 pg1 systemd[1]: postgres-17.4-sdx.service: Consumed 3.493s CPU time, 2.8M memory peak, 0B memory swap peak.

When I comment out huge_pages = on in the configuration, PostgreSQL starts without problems. Can someone tell me where the problem is?
Regards
Bogdan
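The numbers in the hint already explain the failure: the requested segment is 2204106752 bytes, which with 2048 kB huge pages is 1051 pages, while only 980 are reserved, so the MAP_HUGETLB mapping cannot be satisfied. A sketch of checking the requirement directly (postgres -C shared_memory_size_in_huge_pages exists from PostgreSQL 15 on and is read with the server stopped; the data directory path below is illustrative):

# Sketch: how many 2 MB huge pages does this instance actually need?

# Ask the server itself (PostgreSQL 15+, run as the postgres OS user with the
# server stopped; the data directory path is illustrative):
postgres -D /var/lib/postgresql/17/main -C shared_memory_size_in_huge_pages

# The same figure by hand, from the size in the error message, rounded up:
echo $(( (2204106752 + 2097152 - 1) / 2097152 ))   # -> 1051 pages, more than the 980 reserved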
Re: huge_pages=on causes "could not map anonymous shared memory: Cannot allocate memory"
Hi Alicja,
Thanks for your advice, PostgreSQL now works fine with huge_pages = on.
Regards
Bogdan

On Wed, 7 May 2025 at 14:17, Alicja Kucharczyk wrote:
> Hi Bogdan,
> The root cause here is that the number of huge pages you've configured
> (vm.nr_hugepages = 980) is not sufficient. Each huge page on your system is
> 2 MB in size, so 980 pages give you roughly 1.96 GB of memory (980 × 2 MB).
> However, PostgreSQL is clearly requesting about 2.2 GB of shared memory
> (specifically, 2204106752 bytes as shown in the error message you provided),
> which exceeds what's available through huge pages.
>
> That's why PostgreSQL fails to start when huge_pages = on - it requires the
> entire shared memory segment to come from huge pages and refuses to fall
> back to regular ones.
>
> Earlier, you had the huge_pages setting commented out, which means
> PostgreSQL used the default value: huge_pages = try. In that mode, it first
> attempts to use huge pages, but if that fails (like in your case due to
> insufficient allocation), it falls back to standard memory pages, which is
> why the instance started without issues then.
>
> To fix the issue, you should increase vm.nr_hugepages to at least 1100 to
> fully cover the shared memory request (you can go a bit higher to be safe
> and then reduce it as described in the article I'm pasting the link to).
>
> Also, a side note: max_connections = 1000 is quite high for an instance
> with 8 GB of RAM and only 2 vCPUs. Even if huge pages are properly
> allocated, such a high number of connections can lead to performance
> issues. You might want to consider lowering it or using a connection
> pooler like PgBouncer.
>
> If you'd like to understand how huge pages work in PostgreSQL, including
> how to calculate memory needs and configure the OS properly, I wrote a
> detailed article some time ago (still valid). It's in Polish, which I
> assume is fine for you:
> https://linuxpolska.com/pl/baza-wiedzy/blog/postgres-pamieci-ram-tipstricks/
>
> best regards,
> Alicja Kucharczyk
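For completeness, the fix that follows from the advice above is simply to enlarge the huge page pool and make the setting persistent. A sketch using the value suggested in the reply (the sysctl.d file name is illustrative):

# Sketch: reserve enough 2 MB huge pages for the ~2.2 GB shared memory segment.
sudo sysctl -w vm.nr_hugepages=1100

# Persist the setting across reboots (file name is illustrative):
echo 'vm.nr_hugepages = 1100' | sudo tee /etc/sysctl.d/90-postgres-hugepages.conf

# Verify the pool before starting PostgreSQL with huge_pages = on;
# if the kernel cannot find enough contiguous memory at runtime, a reboot may be needed.
grep HugePages /proc/meminfo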