Fwd: Query error: could not resize shared memory segment

2018-01-02 Thread Thuc Nguyen Canh
Hello,

I got the following error when running some heavy queries:
"ERROR: could not resize shared memory segment "/PostgreSQL.388782411" to
50438144 bytes: No space left on device SQL state: 53100"

I'm using a PostGIS 10 Docker container with a mounted volume on an Ubuntu 16 VPS.

Some of the failed queries run successfully after I increased my work_mem.

On the other hand, this issue is not reproducible on PostgreSQL 9.6.

I hope to get some advice here.

Thanks and regards,

Thuc


Re: Query error: could not resize shared memory segment

2018-01-02 Thread Thuc Nguyen Canh
Hi,

The dynamic_shared_memory_type is posix; the before and after values for
work_mem are ~41MB and ~64MB.
I'm using a DigitalOcean VPS with 16 GB RAM and 8 cores.
For more information, I managed to reproduce this issue on a fresh VPS after
I changed random_page_cost from 4.0 to 1.1. In other words, I reduced
random_page_cost to 1.1 to optimize PostgreSQL performance on SSD
(DigitalOcean uses SSDs) and then hit this issue.




On Wed, Jan 3, 2018 at 10:53 AM, Thomas Munro  wrote:

> On Wed, Jan 3, 2018 at 1:22 PM, Thuc Nguyen Canh
>  wrote:
> > I got following error when running some heavy queries
> > "ERROR: could not resize shared memory segment "/PostgreSQL.388782411" to
> > 50438144 bytes: No space left on device SQL state: 53100"
> >
> > I'm using a postgis 10 docker container with mounted volume on ubuntu 16
> > vps.
> >
> > Some of failed queries can run after I increased my work_mem.
> >
> > On the other hand, this issue is not producible on postgresql 9.6.
>
> Hi,
>
> So it couldn't allocate 50MB of dynamic shared memory.  Can you show
> the work_mem settings, the query plan with the two different work_mem
> settings (the one that works and the one that doesn't), the value of
> dynamic_shared_memory_type, and tell us how much memory and swap space
> you have?  Do you run many of these queries in parallel?  I guess this
> is probably a parallel query using parallel bitmap heapscan and seeing
> the error coming from the change in commit
> 899bd785c0edf376077d3f5d65c316f92c1b64b5, meaning that it would have
> risked death by SIGBUS before that commit.  What is surprising is that
> increasing work_mem helped.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


Re: Query error: could not resize shared memory segment

2018-01-02 Thread Thuc Nguyen Canh
Hi,
Here is the query plan of a query that causes the above issue for any
random_page_cost < 3 (I keep work_mem at its default):

'Sort  (cost=9441498.11..9542397.83 rows=40359886 width=64) (actual time=33586.588..33586.590 rows=4 loops=1)'
'  Sort Key: (to_char(b.week, 'dd-mm-'::text))'
'  Sort Method: quicksort  Memory: 25kB'
'  CTE sumorder'
'    ->  GroupAggregate  (cost=763614.25..775248.11 rows=513746 width=16) (actual time=16587.507..17320.290 rows=4 loops=1)'
'          Group Key: (date_trunc('month'::text, to_timestamp("order".order_time)))'
'          ->  Sort  (cost=763614.25..764923.47 rows=523689 width=14) (actual time=16587.362..16913.230 rows=539089 loops=1)'
'                Sort Key: (date_trunc('month'::text, to_timestamp("order".order_time)))'
'                Sort Method: quicksort  Memory: 47116kB'
'                ->  Bitmap Heap Scan on "order"  (cost=12679.94..713868.12 rows=523689 width=14) (actual time=516.465..15675.517 rows=539089 loops=1)'
'                      Recheck Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))'
'                      Heap Blocks: exact=242484'
'                      ->  Bitmap Index Scan on order_service_id_order_time_idx  (cost=0.00..12549.02 rows=523689 width=0) (actual time=425.697..425.697 rows=539089 loops=1)'
'                            Index Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))'
'  CTE badorder'
'    ->  Finalize GroupAggregate  (cost=993588.49..995549.11 rows=15712 width=16) (actual time=16257.720..16263.183 rows=13 loops=1)'
'          Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))'
'          ->  Gather Merge  (cost=993588.49..995247.93 rows=13100 width=16) (actual time=16257.435..16263.107 rows=39 loops=1)'
'                Workers Planned: 2'
'                Workers Launched: 2'
'                ->  Partial GroupAggregate  (cost=992588.46..992735.84 rows=6550 width=16) (actual time=16246.191..16251.348 rows=13 loops=3)'
'                      Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))'
'                      ->  Sort  (cost=992588.46..992604.84 rows=6550 width=14) (actual time=16245.767..16248.316 rows=3715 loops=3)'
'                            Sort Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))'
'                            Sort Method: quicksort  Memory: 274kB'
'                            ->  Parallel Seq Scan on "order" order_1  (cost=0.00..992173.28 rows=6550 width=14) (actual time=4.162..16230.174 rows=3715 loops=3)'
'                                  Filter: ((order_time >= '1483203600'::double precision) AND (service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (rating_by_user < 5) AND (rating_by_user > 0))'
'                                  Rows Removed by Filter: 1801667'
'  ->  Merge Join  (cost=60414.85..1271289.99 rows=40359886 width=64) (actual time=33586.471..33586.503 rows=4 loops=1)'
'        Merge Cond: (b.week = s.week)'
'        ->  Sort  (cost=1409.33..1448.61 rows=15712 width=16) (actual time=16263.259..16263.276 rows=13 loops=1)'
'              Sort Key: b.week'
'              Sort Method: quicksort  Memory: 25kB'
'              ->  CTE Scan on badorder b  (cost=0.00..314.24 rows=15712 width=16) (actual time=16257.737..16263.220 rows=13 loops=1)'
'        ->  Sort  (cost=59005.52..60289.88 rows=513746 width=16) (actual time=17320.506..17320.509 rows=4 loops=1)'
'              Sort Key: s.week'
'              Sort Method: quicksort  Memory: 25kB'
'              ->  CTE Scan on sumorder s  (cost=0.00..10274.92 rows=513746 width=16) (actual time=16587.532..17320.352 rows=4 loops=1)'
'Planning time: 3.202 ms'
'Execution time: 33589.971 ms'
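
To make the plan easier to follow, here is a rough reconstruction of the query shape it implies. The CTE names, grouping expressions and filter conditions are taken from the plan above; the aggregates and the output column list are guesses:

WITH sumorder AS (
    SELECT date_trunc('month', to_timestamp(order_time)) AS week,
           count(*) AS total_orders                    -- aggregate assumed
    FROM "order"
    WHERE service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])
      AND order_time >= 1509469200
    GROUP BY 1
),
badorder AS (
    SELECT date_trunc('month', to_timestamp(order_time)) AS week,
           count(*) AS bad_orders                      -- aggregate assumed
    FROM "order"
    WHERE service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])
      AND order_time >= 1483203600
      AND rating_by_user > 0
      AND rating_by_user < 5
    GROUP BY 1
)
SELECT to_char(b.week, 'dd-mm-') AS week,              -- format string as pasted in the plan
       s.total_orders,
       b.bad_orders
FROM badorder b
JOIN sumorder s ON b.week = s.week
ORDER BY 1;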

On Wed, Jan 3, 2018 at 11:13 AM, Thomas Munro  wrote:

> On Wed, Jan 3, 2018 at 5:05 PM, Thuc Nguyen Canh
>  wrote:
> > The dynamic_shared_memory_type is posix, the before and after values for
> > work_mem are ~41MB and ~64MB.
> > I'm using a Digital Ocean vps of 16RAM 8 Cores.
> > For more information, I managed to reproduce this issue on a fresh vps
> after
> > I changed the random_page_cost from 4.0 to 1.1. So that said, I did
> reduce
> > the random_page_cost to 1.1, in order to optimize postgresql performance
> on
> > SSD (DO uses SSD) and got this issue.
>
> So you have 16GB of RAM and here we're failing to posix_fallocate()
> 50MB (actually we can't tell if it's the ftruncate() or
> posix_fallocate() call that failed, but the latter seems more likely
> since the former just creates a big hole in the underlying tmpfs
> file).  Can you share the query plan (EXPLAIN SELECT ...)?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


Re: Query error: could not resize shared memory segment

2018-01-02 Thread Thuc Nguyen Canh
The last query plan was with random_page_cost = 3.
Here is the query plan with random_page_cost = 2.5, which causes the
'shared memory segment' issue.

'Sort  (cost=9255854.81..9356754.53 rows=40359886 width=64)'
'  Sort Key: (to_char(b.week, 'dd-mm-'::text))'
'  CTE sumorder'
'    ->  GroupAggregate  (cost=692280.90..703914.76 rows=513746 width=16)'
'          Group Key: (date_trunc('month'::text, to_timestamp("order".order_time)))'
'          ->  Sort  (cost=692280.90..693590.12 rows=523689 width=14)'
'                Sort Key: (date_trunc('month'::text, to_timestamp("order".order_time)))'
'                ->  Bitmap Heap Scan on "order"  (cost=11461.44..642534.77 rows=523689 width=14)'
'                      Recheck Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))'
'                      ->  Bitmap Index Scan on order_service_id_order_time_idx  (cost=0.00..11330.52 rows=523689 width=0)'
'                            Index Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1509469200'::double precision))'
'  CTE badorder'
'    ->  Finalize GroupAggregate  (cost=987667.04..989627.66 rows=15712 width=16)'
'          Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))'
'          ->  Gather Merge  (cost=987667.04..989326.48 rows=13100 width=16)'
'                Workers Planned: 2'
'                ->  Partial GroupAggregate  (cost=986667.01..986814.39 rows=6550 width=16)'
'                      Group Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))'
'                      ->  Sort  (cost=986667.01..986683.39 rows=6550 width=14)'
'                            Sort Key: (date_trunc('month'::text, to_timestamp(order_1.order_time)))'
'                            ->  Parallel Bitmap Heap Scan on "order" order_1  (cost=35678.61..986251.83 rows=6550 width=14)'
'                                  Recheck Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1483203600'::double precision))'
'                                  Filter: ((rating_by_user < 5) AND (rating_by_user > 0))'
'                                  ->  Bitmap Index Scan on order_service_id_order_time_idx  (cost=0.00..35674.67 rows=1740356 width=0)'
'                                        Index Cond: ((service_id = ANY ('{SGN-BIKE,SGN-POOL}'::text[])) AND (order_time >= '1483203600'::double precision))'
'  ->  Merge Join  (cost=60414.85..1271289.99 rows=40359886 width=64)'
'        Merge Cond: (b.week = s.week)'
'        ->  Sort  (cost=1409.33..1448.61 rows=15712 width=16)'
'              Sort Key: b.week'
'              ->  CTE Scan on badorder b  (cost=0.00..314.24 rows=15712 width=16)'
'        ->  Sort  (cost=59005.52..60289.88 rows=513746 width=16)'
'              Sort Key: s.week'
'              ->  CTE Scan on sumorder s  (cost=0.00..10274.92 rows=513746 width=16)'
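
A plan like the one above can be produced without executing the query by changing random_page_cost only for the current session, roughly like this (the stand-in query below is a placeholder for the full query shown earlier in the thread):

SET random_page_cost = 2.5;              -- session-local, postgresql.conf is untouched
EXPLAIN SELECT count(*) FROM "order";    -- stand-in query; substitute the real one
RESET random_page_cost;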

On Wed, Jan 3, 2018 at 11:43 AM, Thomas Munro  wrote:

> On Wed, Jan 3, 2018 at 5:22 PM, Thuc Nguyen Canh
>  wrote:
> > Here is the query plan of a query that causes above issue for any
> > random_page_cost < 3 (I keep the work_mem by default)
> >
> > 'Sort  (cost=9441498.11..9542397.83 rows=40359886 width=64) (actual
> > time=33586.588..33586.590 rows=4 loops=1)'
>
> I guess that must be EXPLAIN ANALYZE, because it includes "actual"
> time, so it must be the plan when you set random_page_cost >= 3,
> right?  Otherwise it would raise the error.  Can you now set it to  <
> 3 and do just EXPLAIN (no ANALYZE) so that we can see the failing plan
> without trying to run it?  I'm guessing it's different, because the
> plan you showed doesn't look like it would want 50MB of DSM.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


Re: Query error: could not resize shared memory segment

2018-01-02 Thread Thuc Nguyen Canh
Hi,
Here is the output from the host:

mount | grep /dev/shm
=> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)

du -hs /dev/shm
=>  0 /dev/shm

df /dev/shm
=>
Filesystem     1K-blocks  Used  Available Use% Mounted on
tmpfs            2023252     0    2023252   0% /dev/shm

--
And here is the output from the postgres container:

mount | grep /dev/shm
=> shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)

du -hs /dev/shm
=>  8.0K /dev/shm

df /dev/shm
=>
Filesystem     1K-blocks  Used  Available Use% Mounted on
shm                65536     8      65528   1% /dev/shm
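
As a cross-check, the same limit can be read from Docker itself. This is a sketch, assuming the container is named postgres; the 67108864 bytes (64 MB) is Docker's default /dev/shm size and matches the size=65536k mount option above:

# container name is an assumption
docker inspect --format '{{.HostConfig.ShmSize}}' postgres
# => 67108864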

On Wed, Jan 3, 2018 at 12:05 PM, Thomas Munro  wrote:
> On Wed, Jan 3, 2018 at 5:39 PM, Tom Lane  wrote:
>> Thomas Munro  writes:
>>> So you have 16GB of RAM and here we're failing to posix_fallocate()
>>> 50MB (actually we can't tell if it's the ftruncate() or
>>> posix_fallocate() call that failed, but the latter seems more likely
>>> since the former just creates a big hole in the underlying tmpfs
>>> file).  Can you share the query plan (EXPLAIN SELECT ...)?
>>
>> I wonder if OP is running with a tmpfs size setting that's less than
>> the traditional Linux default of half of physical RAM size.
>
> Hmm.  Canh, can you please share the output of the following commands?
>
> mount | grep /dev/shm
> du -hs /dev/shm
> df /dev/shm
>
> --
> Thomas Munro
> http://www.enterprisedb.com



Re: Query error: could not resize shared memory segment

2018-01-02 Thread Thuc Nguyen Canh
Thank you Thomas,

I made it work by adding --shm-size=1g to my docker run script.
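
For anyone hitting the same problem, a sketch of what that kind of docker run invocation looks like. Apart from --shm-size=1g, everything here (container name, image, volume path, port) is a placeholder rather than the original script:

# raise /dev/shm inside the container from Docker's 64 MB default to 1 GB
docker run -d \
  --name postgres \
  --shm-size=1g \
  -v /path/to/pgdata:/var/lib/postgresql/data \
  -p 5432:5432 \
  mdillon/postgis:10

With the larger mount, df /dev/shm inside the container reports roughly 1 GB instead of 65536k, which leaves room for the parallel query's dynamic shared memory segments.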

On Wed, Jan 3, 2018 at 12:16 PM, Thomas Munro  wrote:
> On Wed, Jan 3, 2018 at 6:13 PM, Thuc Nguyen Canh
>  wrote:
>> And here is the result from postgres container:
>>
>> mount | grep /dev/shm
>> => shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
>
> Bingo.  Somehow your container tech is limiting shared memory.  That
> error is working as designed.  You could figure out how to fix the
> mount options, or you could disable parallelism with
> max_parallel_workers_per_gather = 0.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
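
For completeness, the alternative workaround Thomas mentions is a one-line configuration change; a sketch (it can also be set per session or per role):

-- disable parallel query instead of enlarging /dev/shm
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();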