Re: ERROR: invalid memory alloc request size 1073741824

2020-03-11 Thread Stefan Blanke

Hi,

We've upgraded to PostgreSQL 11.5 (postgresql.org rhel 6 rpm) and I have 
had another occurrence of this invalid alloc of 1GB. Apologies for never 
providing a query plan when discussing this two years ago; we decided to 
move to a newer PostgreSQL to see if the issue went away but took a 
while to complete the move.


The invalid alloc still only occurs occasionally every few months on a 
query that we run every minute; so unfortunately we still don't have a 
contained reproducible test case.


This is the SQL we are running with a query plan - the query plan is 
from an new empty database so the planner has no stats.


CREATE OR REPLACE FUNCTION
create_table()
RETURNS TABLE (a INTEGER, b INTEGER)
AS $$
UPDATE y
SET c = true,
d = false
WHERE e IS NOT true
AND f IS NOT true
AND g = 1
AND h = 0
AND i = 0
AND (j IS NULL
OR j > 0)
RETURNING y.a, y.b;
$$ LANGUAGE SQL;

-- Prepared statement (PQprepare)
CREATE TEMPORARY TABLE temp_table AS
SELECT * FROM create_table()

-- Prepared statement (PQprepare)
SELECT y.a, y.b,
 x.k,
 x.l,
 y.m,
 y.n,
 y.o
 FROM temp_table
 INNER JOIN y ON temp_table.b = y.b
 AND temp_table.a = y.a
 INNER JOIN x ON x.a = y.a

-- The two prepared statements are executed one after another
-- in the order shown many times an hour.

The query plan for the second prepared statement is:

 Nested Loop  (cost=17.14..64.38 rows=16 width=112)
   ->  Hash Join  (cost=17.00..61.47 rows=16 width=80)
 Hash Cond: ((temp_table.b = y.b) AND (temp_table.a = y.a))
 ->  Seq Scan on temp_table  (cost=0.00..32.60 rows=2260 width=8)
 ->  Hash  (cost=12.80..12.80 rows=280 width=76)
   ->  Seq Scan on y  (cost=0.00..12.80 rows=280 width=76)
   ->  Index Scan using x_pkey on x  (cost=0.14..0.18 rows=1 width=40)
 Index Cond: (a = temp_table.a)

Thanks,
Stefan

On 31/01/2018 21:23, Tomas Vondra wrote:



On 01/31/2018 09:51 PM, Jan Wieck wrote:



On Wed, Jan 31, 2018 at 12:32 PM, Stefan Blanke
mailto:stefan.bla...@framestore.com>> wrote:

 >
 > I'll bet you it's not that. It's quite unlikely that would fail with
 > exactly 1GB request size. It seems much more like a buffer that we keep
 > to be power of 2. The question is which one.

 I had dismissed corruption before writing in. It's exactly 1GB every
 time this has happened - and we can dump the full dataset
 periodically without issue.

 >> I have my money on a corrupted TOAST entry. Is this happening on
 >> trustworthy hardware or beige box with no ECC or RAID?

 It's good quality commercial hardware in our colo - no exactly sure
 what.


If it is a sporadic issue and you can dump the full dataset, then I just
lost my money (Tomas, you coming to PGConf in Jersey City?).



Unfortunately no, but I'm sure there will be other opportunities to buy
me a beer ;-) Like pgcon, for example.



But then, if this is a plain COPY to stdout ... I am wondering what
could possibly be in that path that wants to allocate 1GB. Or is this
not so plain but rather a COPY ... SELECT ...?



That's what I've been guessing, and why I was asking for a query plan.


regards






Re: ERROR: invalid memory alloc request size 1073741824

2020-03-13 Thread Stefan Blanke

> Could you adjust your process to log the actual temp table size
> each time, ie "select count(*) from temp_table" in between the
> two steps, and then note whether the failures are correlated
> with unusual temp table sizes?

Thanks, I'll add a count there and come back with a number when we next 
hit the "invalid memory alloc".


Stefan

On 11/03/2020 15:37, Tom Lane wrote:

Stefan Blanke  writes:

We've upgraded to PostgreSQL 11.5 (postgresql.org rhel 6 rpm) and I have
had another occurrence of this invalid alloc of 1GB. Apologies for never
providing a query plan when discussing this two years ago; we decided to
move to a newer PostgreSQL to see if the issue went away but took a
while to complete the move.



The invalid alloc still only occurs occasionally every few months on a
query that we run every minute; so unfortunately we still don't have a
contained reproducible test case.


Given the lack of stats, I wonder whether the issue could be related
to the plan sometimes being horribly bad, eg due to the temp table
being much larger than expected.  (A possible mechanism would be
hash table bloat, perhaps, but that's getting way ahead of the
evidence.)

Could you adjust your process to log the actual temp table size
each time, ie "select count(*) from temp_table" in between the
two steps, and then note whether the failures are correlated
with unusual temp table sizes?

regards, tom lane






ERROR: invalid memory alloc request size 1073741824

2018-01-30 Thread Stefan Blanke

Hello,

We've tripped over an error when doing a "COPY.. TO STDOUT WITH BINARY" 
query.


"ERROR:  invalid memory alloc request size 1073741824"
(exactly 1GB)

So a palloc() call is failing on the AllocSizeIsValid() check.

Does anyone know if this a safety catch we are tripping with a bad query 
- or whether this check is something that should never be hit (i.e. a bug).


This has been some discussion before about making a change before the 
9.5beta1:


https://www.postgresql.org/message-id/flat/9A28C8860F777E439AA12E8AEA7694F8010F6F3F%40BPXM15GP.gisp.nec.co.jp

https://www.postgresql.org/message-id/flat/CAKJS1f_sxPyBHF6%3DnEYwPJdKKDNjVDP91b_EJApfyGTei%3DViDw%40mail.gmail.com

postgresql 9.5.4 (2PGDG.rhel6)
work_mem = 256MB

Thanks,
Stefan



Re: ERROR: invalid memory alloc request size 1073741824

2018-01-31 Thread Stefan Blanke

> That very much depends on where exactly the failure happens - it might
> be both (or relying on an assumption that changed over). It's not clear
> if it fails during query execution or formatting the output (I seem to
> remember there were/are issues with rows containing multiple large bytea
> values, for example).
>
> Can you attach gdb to the backend, set breakpoint on that elog and share
> the backtrace? Then we can investigate further.

I can't easily get a meaningful query plan (i.e. from the live system) 
as the query that failed is based on a temporary table built by some 
pl/pgsql (the database is more of a conveyor belt than a long term store).


Unfortunately it's difficult to reproduce the ERROR. This has only 
occurred a couple of times on a live system in the last year and these 
queries run frequently.


I think we will move to 9.5.10 and see if it ever happens again; then if 
it does get a backtrace on that newer version. Incidentally we never saw 
this error on 8.4 where this code was running from 2009 until 2017.



> I'll bet you it's not that. It's quite unlikely that would fail with
> exactly 1GB request size. It seems much more like a buffer that we keep
> to be power of 2. The question is which one.

I had dismissed corruption before writing in. It's exactly 1GB every 
time this has happened - and we can dump the full dataset periodically 
without issue.


>> I have my money on a corrupted TOAST entry. Is this happening on
>> trustworthy hardware or beige box with no ECC or RAID?

It's good quality commercial hardware in our colo - no exactly sure what.


Thanks for taking the time to look at this!
Stefan




On 01/30/18 22:00, Tomas Vondra wrote:



On 01/30/2018 10:43 PM, Jan Wieck wrote:



On Tue, Jan 30, 2018 at 12:35 PM, Stefan Blanke
mailto:stefan.bla...@framestore.com>> wrote:

Hello,

We've tripped over an error when doing a "COPY.. TO STDOUT WITH
BINARY" query.

"ERROR:  invalid memory alloc request size 1073741824"
(exactly 1GB)


I have my money on a corrupted TOAST entry. Is this happening on
trustworthy hardware or beige box with no ECC or RAID?



I'll bet you it's not that. It's quite unlikely that would fail with
exactly 1GB request size. It seems much more like a buffer that we keep
to be power of 2. The question is which one.


regards