Fwd: Query error: could not resize shared memory segment
Hello,

I got the following error when running some heavy queries:

    ERROR: could not resize shared memory segment "/PostgreSQL.388782411" to 50438144 bytes: No space left on device
    SQL state: 53100

I'm using a PostGIS 10 Docker container with a mounted volume on an Ubuntu 16 VPS.

Some of the failed queries can run after I increased my work_mem. On the other hand, this issue is not reproducible on PostgreSQL 9.6.

Hope to get some advice here.

Thanks and regards,
Thuc
Re: Query error: could not resize shared memory segment
Hi,

The dynamic_shared_memory_type is posix; the before and after values for work_mem are ~41MB and ~64MB. I'm using a DigitalOcean VPS with 16 GB RAM and 8 cores.

For more information, I managed to reproduce this issue on a fresh VPS after I changed random_page_cost from 4.0 to 1.1. That is, I reduced random_page_cost to 1.1 in order to optimize PostgreSQL performance on SSD (DO uses SSDs), and got this issue.

On Wed, Jan 3, 2018 at 10:53 AM, Thomas Munro wrote:
> On Wed, Jan 3, 2018 at 1:22 PM, Thuc Nguyen Canh wrote:
> > I got the following error when running some heavy queries:
> >
> >     ERROR: could not resize shared memory segment "/PostgreSQL.388782411" to 50438144 bytes: No space left on device
> >     SQL state: 53100
> >
> > I'm using a PostGIS 10 Docker container with a mounted volume on an Ubuntu 16 VPS.
> >
> > Some of the failed queries can run after I increased my work_mem.
> >
> > On the other hand, this issue is not reproducible on PostgreSQL 9.6.
>
> Hi,
>
> So it couldn't allocate 50MB of dynamic shared memory. Can you show
> the work_mem settings, the query plan with the two different work_mem
> settings (the one that works and the one that doesn't), the value of
> dynamic_shared_memory_type, and tell us how much memory and swap space
> you have? Do you run many of these queries in parallel? I guess this
> is probably a parallel query using a parallel bitmap heap scan and seeing
> the error coming from the change in commit
> 899bd785c0edf376077d3f5d65c316f92c1b64b5, meaning that it would have
> risked death by SIGBUS before that commit. What is surprising is that
> increasing work_mem helped.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Re: Query error: could not resize shared memory segment
Hi,

Here is the query plan of a query that causes the above issue for any random_page_cost < 3 (I kept work_mem at its default):

Sort  (cost=9441498.11..9542397.83 rows=40359886 width=64) (actual time=33586.588..33586.590 rows=4 loops=1)
  Sort Key: (to_char(b.week, 'dd-mm-'::text))
  Sort Method: quicksort  Memory: 25kB
  CTE sumorder
    ->  GroupAggregate  (cost=763614.25..775248.11 rows=513746 width=16) (actual time=16587.507..17320.290 rows=4 loops=1)
          Group Key: (date_trunc('month'::text, to_timestamp("order".order_time)))
          ->  Sort  (cost=763614.25..764923.47 rows=523689 width=14) (actual time=16587.362..16913.230 rows=539089 loops=1)
                Sort Key: (date_trunc('month'::text, to_timestamp("order".order_time)))
                Sort Method: quicksort  Memory: 47116kB
                ->  Bitmap Heap Scan on "order"  (cost=12679.94..713868.12 rows=523689 width=14) (actual time=516.465..15675.517 rows=539089 loops=1)
                      Recheck Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))
                      Heap Blocks: exact=242484
                      ->  Bitmap Index Scan on order_service_id_order_time_idx  (cost=0.00..12549.02 rows=523689 width=0) (actual time=425.697..425.697 rows=539089 loops=1)
                            Index Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))
  CTE badorder
    ->  Finalize GroupAggregate  (cost=993588.49..995549.11 rows=15712 width=16) (actual time=16257.720..16263.183 rows=13 loops=1)
          Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))
          ->  Gather Merge  (cost=993588.49..995247.93 rows=13100 width=16) (actual time=16257.435..16263.107 rows=39 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Partial GroupAggregate  (cost=992588.46..992735.84 rows=6550 width=16) (actual time=16246.191..16251.348 rows=13 loops=3)
                      Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))
                      ->  Sort  (cost=992588.46..992604.84 rows=6550 width=14) (actual time=16245.767..16248.316 rows=3715 loops=3)
                            Sort Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))
                            Sort Method: quicksort  Memory: 274kB
                            ->  Parallel Seq Scan on "order" order_1  (cost=0.00..992173.28 rows=6550 width=14) (actual time=4.162..16230.174 rows=3715 loops=3)
                                  Filter: ((order_time >= '1483203600'::double precision) AND (service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (rating_by_user < 5) AND (rating_by_user > 0))
                                  Rows Removed by Filter: 1801667
  ->  Merge Join  (cost=60414.85..1271289.99 rows=40359886 width=64) (actual time=33586.471..33586.503 rows=4 loops=1)
        Merge Cond: (b.week = s.week)
        ->  Sort  (cost=1409.33..1448.61 rows=15712 width=16) (actual time=16263.259..16263.276 rows=13 loops=1)
              Sort Key: b.week
              Sort Method: quicksort  Memory: 25kB
              ->  CTE Scan on badorder b  (cost=0.00..314.24 rows=15712 width=16) (actual time=16257.737..16263.220 rows=13 loops=1)
        ->  Sort  (cost=59005.52..60289.88 rows=513746 width=16) (actual time=17320.506..17320.509 rows=4 loops=1)
              Sort Key: s.week
              Sort Method: quicksort  Memory: 25kB
              ->  CTE Scan on sumorder s  (cost=0.00..10274.92 rows=513746 width=16) (actual time=16587.532..17320.352 rows=4 loops=1)
Planning time: 3.202 ms
Execution time: 33589.971 ms

On Wed, Jan 3, 2018 at 11:13 AM, Thomas Munro wrote:
> On Wed, Jan 3, 2018 at 5:05 PM, Thuc Nguyen Canh wrote:
> > The dynamic_shared_memory_type is posix; the before and after values for
> > work_mem are ~41MB and ~64MB.
> > I'm using a DigitalOcean VPS with 16 GB RAM and 8 cores.
> > For more information, I managed to reproduce this issue on a fresh VPS
> > after I changed random_page_cost from 4.0 to 1.1. That is, I reduced
> > random_page_cost to 1.1 in order to optimize PostgreSQL performance on
> > SSD (DO uses SSDs), and got this issue.
>
> So you have 16GB of RAM and here we're failing to posix_fallocate()
> 50MB (actually we can't tell if it's the ftruncate() or
> posix_fallocate() call that failed, but the latter seems more likely
> since the former just creates a big hole in the underlying tmpfs
> file). Can you share the query plan (EXPLAIN SELECT ...)?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Re: Query error: could not resize shared memory segment
The last query plan was with random_page_cost = 3. Here is the plan with random_page_cost = 2.5, which causes the 'shared memory segment' issue:

Sort  (cost=9255854.81..9356754.53 rows=40359886 width=64)
  Sort Key: (to_char(b.week, 'dd-mm-'::text))
  CTE sumorder
    ->  GroupAggregate  (cost=692280.90..703914.76 rows=513746 width=16)
          Group Key: (date_trunc('month'::text, to_timestamp("order".order_time)))
          ->  Sort  (cost=692280.90..693590.12 rows=523689 width=14)
                Sort Key: (date_trunc('month'::text, to_timestamp("order".order_time)))
                ->  Bitmap Heap Scan on "order"  (cost=11461.44..642534.77 rows=523689 width=14)
                      Recheck Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))
                      ->  Bitmap Index Scan on order_service_id_order_time_idx  (cost=0.00..11330.52 rows=523689 width=0)
                            Index Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))
  CTE badorder
    ->  Finalize GroupAggregate  (cost=987667.04..989627.66 rows=15712 width=16)
          Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))
          ->  Gather Merge  (cost=987667.04..989326.48 rows=13100 width=16)
                Workers Planned: 2
                ->  Partial GroupAggregate  (cost=986667.01..986814.39 rows=6550 width=16)
                      Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))
                      ->  Sort  (cost=986667.01..986683.39 rows=6550 width=14)
                            Sort Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))
                            ->  Parallel Bitmap Heap Scan on "order" order_1  (cost=35678.61..986251.83 rows=6550 width=14)
                                  Recheck Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1483203600'::double precision))
                                  Filter: ((rating_by_user < 5) AND (rating_by_user > 0))
                                  ->  Bitmap Index Scan on order_service_id_order_time_idx  (cost=0.00..35674.67 rows=1740356 width=0)
                                        Index Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1483203600'::double precision))
  ->  Merge Join  (cost=60414.85..1271289.99 rows=40359886 width=64)
        Merge Cond: (b.week = s.week)
        ->  Sort  (cost=1409.33..1448.61 rows=15712 width=16)
              Sort Key: b.week
              ->  CTE Scan on badorder b  (cost=0.00..314.24 rows=15712 width=16)
        ->  Sort  (cost=59005.52..60289.88 rows=513746 width=16)
              Sort Key: s.week
              ->  CTE Scan on sumorder s  (cost=0.00..10274.92 rows=513746 width=16)

On Wed, Jan 3, 2018 at 11:43 AM, Thomas Munro wrote:
> On Wed, Jan 3, 2018 at 5:22 PM, Thuc Nguyen Canh wrote:
> > Here is the query plan of a query that causes the above issue for any
> > random_page_cost < 3 (I kept work_mem at its default):
> >
> > Sort  (cost=9441498.11..9542397.83 rows=40359886 width=64) (actual
> > time=33586.588..33586.590 rows=4 loops=1)
>
> I guess that must be EXPLAIN ANALYZE, because it includes "actual"
> time, so it must be the plan when you set random_page_cost >= 3,
> right? Otherwise it would raise the error. Can you now set it to <
> 3 and do just EXPLAIN (no ANALYZE) so that we can see the failing plan
> without trying to run it? I'm guessing it's different, because the
> plan you showed doesn't look like it would want 50MB of DSM.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Re: Query error: could not resize shared memory segment
Hi,

Here is the result from the host:

    mount | grep /dev/shm
    => tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)

    du -hs /dev/shm
    => 0 /dev/shm

    df /dev/shm
    => Filesystem  1K-blocks  Used  Available  Use%  Mounted on
       tmpfs         2023252     0    2023252    0%  /dev/shm

And here is the result from the postgres container:

    mount | grep /dev/shm
    => shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)

    du -hs /dev/shm
    => 8.0K /dev/shm

    df /dev/shm
    => Filesystem  1K-blocks  Used  Available  Use%  Mounted on
       shm             65536     8      65528    1%  /dev/shm

On Wed, Jan 3, 2018 at 12:05 PM, Thomas Munro wrote:
> On Wed, Jan 3, 2018 at 5:39 PM, Tom Lane wrote:
>> Thomas Munro writes:
>>> So you have 16GB of RAM and here we're failing to posix_fallocate()
>>> 50MB (actually we can't tell if it's the ftruncate() or
>>> posix_fallocate() call that failed, but the latter seems more likely
>>> since the former just creates a big hole in the underlying tmpfs
>>> file). Can you share the query plan (EXPLAIN SELECT ...)?
>>
>> I wonder if the OP is running with a tmpfs size setting that's less than
>> the traditional Linux default of half of physical RAM size.
>
> Hmm. Canh, can you please share the output of the following commands?
>
>     mount | grep /dev/shm
>     du -hs /dev/shm
>     df /dev/shm
>
> --
> Thomas Munro
> http://www.enterprisedb.com
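[Editor's note: the arithmetic behind the failure can be sketched with a few shell commands. This is an illustration, not from the thread; the fallback to /tmp is only so the snippet runs on systems without /dev/shm. The segment size comes from the original error message, and the container's 64MB shm mount above clearly cannot hold it.]

```shell
# Sketch: check whether a DSM segment of the size from the error
# message would fit in the free space of the shm filesystem.
SHM=/dev/shm
[ -d "$SHM" ] || SHM=/tmp              # fallback so this runs anywhere POSIX
SEGMENT=50438144                       # bytes, from the original ERROR line

AVAIL_KB=$(df -Pk "$SHM" | awk 'NR==2 {print $4}')
AVAIL_BYTES=$((AVAIL_KB * 1024))

echo "$SHM free: $AVAIL_BYTES bytes"
if [ "$AVAIL_BYTES" -lt "$SEGMENT" ]; then
    echo "a ${SEGMENT}-byte segment would fail with ENOSPC"
fi
```

With the container's 65536k shm mount minus 8k in use, the available ~67MB total is capped at 64MB, so a ~48MB segment plus anything else PostgreSQL has already mapped there tips it over the edge.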
Re: Query error: could not resize shared memory segment
Thank you Thomas, I made it work with the extra setting --shm-size=1g in my docker run script.

On Wed, Jan 3, 2018 at 12:16 PM, Thomas Munro wrote:
> On Wed, Jan 3, 2018 at 6:13 PM, Thuc Nguyen Canh wrote:
>> And here is the result from the postgres container:
>>
>> mount | grep /dev/shm
>> => shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
>
> Bingo. Somehow your container tech is limiting shared memory. That
> error is working as designed. You could figure out how to fix the
> mount options, or you could disable parallelism with
> max_parallel_workers_per_gather = 0.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
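[Editor's note: for readers landing here with the same error, the fix Thuc describes can be sketched as below. The image name, container name, and volume path are placeholders, not from the thread.]

```shell
# Docker mounts a 64MB tmpfs on /dev/shm by default; PostgreSQL 10's
# parallel queries allocate dynamic shared memory segments there, so
# the mount must be enlarged with --shm-size when starting the container.
docker run -d \
  --name pg10 \
  --shm-size=1g \
  -v /path/to/data:/var/lib/postgresql/data \
  mdillon/postgis:10

# The docker-compose equivalent is the shm_size key:
#   services:
#     db:
#       image: mdillon/postgis:10
#       shm_size: "1g"
```

Alternatively, as Thomas notes, setting max_parallel_workers_per_gather = 0 avoids the DSM allocation entirely, at the cost of losing parallel query.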