Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Hi Tom and Thomas, thanks for your help.

@Tom:
If it really is per-process then I would have expected it to die after
200GB was used?
As far as "valid bug" is concerned: I had hoped this would be per session,
as this at least delivers a reasonable and usable limit; it is easy to
control the number of sessions/statements in execution.
If it really is per process then the limit is not really useful, just like
work_mem: the execution plan of a query determines the number of processes
(up to the max, at least that is way better than work_mem) and that can
change whenever Postgres feels a new plan is in order. I can understand
that solving this might be harder (although to me it looks like just a
little bit of shared memory per session to keep a number). To me it does
not sound like an invalid bug, just one you do not want to solve now ;) And
the real problem, for me, is actually that both work_mem and
temp_file_limit should be for the entire instance/cluster ;) I know that
that is even harder.
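
To make the arithmetic explicit, assuming the per-process interpretation, settings
like these (mirroring this test) give a worst case of roughly the limit times the
number of processes:

set temp_file_limit = '100GB';             -- 104857600kB, superuser-only setting
set max_parallel_workers_per_gather = 1;   -- leader + 1 worker => up to ~200GB of temp files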

For us it means we really cannot use Postgres parallelism: it is infinitely
better to have a query that runs longer but finishes than to have the
database randomly die and recover because of OOM, or to have the disk fill
up and kill random queries. Which is a bit of a pity, of course.

@Justin
The test is running on Ubuntu 22.04.1, x86_64, the disk is an NVMe 2TB
WD850X with ext4 as a file system.
I will collect the other data around tmpfiles hopefully later today.
I have already set max_parallel_workers_per_gather to 1. I will probably
disable all parallelism for the next runs to see whether that makes the size
limit more workable.
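
Disabling it for those runs would presumably just be:

set max_parallel_workers_per_gather = 0;   -- no parallel workers, one process per statement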


Re: Fwd: temp_file_limit?

2022-12-19 Thread Ranier Vilela
On Mon, Dec 19, 2022 at 06:47, Frits Jalvingh wrote:

>
> The test is running on Ubuntu 22.04.1, x86_64, the disk is an NVMe 2TB
> WD850X with ext4 as a file system.
>
It's probably not a production environment.
Any chance of adding another 2TB NVMe, just for the temp files?
To see if Postgres can finish the queries and provide more information?
What exact version of Postgres (14.???) are you using?

regards,
Ranier Vilela


Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Hi Ranier, thanks for your help.

I do not have more disks lying around, and I fear that if it does not
complete with 1.3TB of disk space it might not be that likely that adding
750GB would work...
Postgres version: the original (prd) issue was on 10.x. I also tested it on
14.x with the same issue. I then upgraded my machine to 15.1 to make sure
to report on the latest version, and all information mentioned in this
thread is from that version.

Btw, this query generates quite different plans when tweaking things like
enable_nestloop=false/true, and the "fast" plan requires enable_nestloop=true
and join_collapse_limit=1 (5 seconds response time). An odd thing is that
both plans contain only one nested loop (a cross join; I think it cannot do
that without one), but the general plan changes a lot. I am trying to get
output from that second plan, because this one just loops using CPU, not
disk... Perhaps that one will finish with some statistics...
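
For the record, the fast (~5 second) plan shows up with something like this at
the session level (we normally run with enable_nestloop off):

set enable_nestloop = on;
set join_collapse_limit = 1;
-- then EXPLAIN (ANALYZE) the query; the query text itself is not in this thread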


Re: Fwd: temp_file_limit?

2022-12-19 Thread Ranier Vilela
On Mon, Dec 19, 2022 at 11:45, Frits Jalvingh wrote:

> Hi Ranier, thanks for your help.
>
> I do not have more disks lying around, and I fear that if it does not
> complete with 1.3TB of disk space it might not be that likely that adding
> 750GB would work...
> Postgres version: the original (prd) issue was on 10.x. I also tested it
> on 14.x with the same issue. I then upgraded my machine to 15.1 to make
> sure to report on the latest version, and all information mentioned in this
> thread is from that version.
>
Can you run with a debug-compiled version of Postgres?
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

Maybe that will shed some light.
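
Roughly, the recipe from that page (after installing the debug symbols for your
packages; untested here): find the backend PID (SELECT pg_backend_pid() before
starting the query, or via pg_stat_activity), then as root:

gdb -p <backend pid>
(gdb) bt    # prints a backtrace of where the backend currently is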

regards,
Ranier Vilela



Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Hehehe, that is not the worst plan ;) I did that once to debug a deadlock
in the JDBC driver when talking with Postgres, but it's not an adventure
I'd like to repeat right now ;)



RE: Postgres12 looking for possible HashAggregate issue workarounds?

2022-12-19 Thread João Paulo Luís
Thank you David Rowley (best performance fix so far)!

nsoamt=> select attname,n_distinct from pg_Stats where tablename = 'sentencesource' and attname = 'sentence';
 attname  | n_distinct
----------+------------
 sentence |     255349
(1 row)

select count(*), count(distinct sentence) from sentencesource;
   count    |   count
------------+-----------
 1150174041 | 885946963
(1 row)

-- Seems badly estimated to me.

-- I expect +/-80% of rows to have a distinct value. Manual says -1 is for all rows being distinct.
nsoamt=> ALTER TABLE sentencesource ALTER COLUMN sentence SET (n_distinct = -1);

nsoamt=> ANALYZE VERBOSE sentencesource ;
INFO:  analyzing "public.sentencesource"
INFO:  "sentencesource": scanned 3 of 9028500 pages, containing 3819977 
live rows and 260307 dead rows; 3 rows in sample, 1149622078 estimated 
total rows
ANALYZE

nsoamt=> select attname,n_distinct from pg_Stats where tablename = 'sentencesource' and attname = 'sentence';
 attname  | n_distinct
----------+------------
 sentence |         -1
(1 row)


nsoamt=> EXPLAIN SELECT COUNT(*), COUNT(NULLIF(Stchk.haserrors,'f'))
FROM SentenceToolCheck Stchk
WHERE EXISTS (SELECT SSrc.sentence
  FROM SentenceSource SSrc, Document Doc
  WHERE SSrc.sentence = Stchk.id
  AND Doc.id = SSrc.document
  AND Doc.source ILIKE '/bigpostgres/misc/arxiv/arxiv/arxiv/pdf/%');
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=275199757.84..275199757.85 rows=1 width=16)
   ->  Gather  (cost=275199757.62..275199757.83 rows=2 width=16)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=275198757.62..275198757.63 rows=1 width=16)
               ->  Hash Join  (cost=228004096.84..273527643.59 rows=222815204 width=1)
                     Hash Cond: (stchk.id = ssrc.sentence)
                     ->  Parallel Seq Scan on sentencetoolcheck stchk  (cost=0.00..35858393.80 rows=232196880 width=9)
                     ->  Hash  (cost=209905168.81..209905168.81 rows=1103172722 width=8)
                           ->  Unique  (cost=204389305.20..209905168.81 rows=1103172722 width=8)
                                 ->  Sort  (cost=204389305.20..207147237.01 rows=1103172722 width=8)
                                       Sort Key: ssrc.sentence
                                       ->  Hash Join  (cost=73287.01..23615773.05 rows=1103172722 width=8)
                                             Hash Cond: (ssrc.document = doc.id)
                                             ->  Seq Scan on sentencesource ssrc  (cost=0.00..20524720.16 rows=1149622016 width=16)
                                             ->  Hash  (cost=54327.65..54327.65 rows=1516749 width=4)
                                                   ->  Seq Scan on document doc  (cost=0.00..54327.65 rows=1516749 width=4)
                                                         Filter: (source ~~* '/bigpostgres/misc/arxiv/arxiv/arxiv/pdf/%'::text)
 JIT:
   Functions: 25
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(20 rows)

And the query finished in Time: 2637891.352 ms (43:57.891) (the best 
performance so far, although sentencesource fits in RAM :-)


Curious: now that I've manually set it to -1, who/what will change that setting
in the future (not ANALYZE, I assume)?
Will it stay that way until someone (a human user) changes it? (How do I set it
back to "automatic"?)
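
My guess is that it stays until it is reset by hand, i.e. something like this
(untested) should put the column back to automatic estimation:

ALTER TABLE sentencesource ALTER COLUMN sentence RESET (n_distinct);
ANALYZE sentencesource;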

I hope this poor estimation can be fixed in a future release...

João Luís

Senior Developer

[email protected]

+351 210 337 700




From: David Rowley
Sent: 18 December 2022 11:06
To: João Paulo Luís
Cc: Justin

Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@justin:

Ran the query again. Top shows the following processes:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 650830 postgres  20   0 7503,2m   2,6g   2,6g R 100,0   4,2  12:46.34 postgres: jal datavault_317_prd [local] EXPLAIN
 666141 postgres  20   0 7486,3m   2,6g   2,6g R 100,0   4,1   2:10.24 postgres: parallel worker for PID 650830

Your command shows, during execution:
root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
68629 ./pgsql_tmp/pgsql_tmp650830.3.fileset
68629 ./pgsql_tmp

root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
194494 ./pgsql_tmp
194493 ./pgsql_tmp/pgsql_tmp650830.3.fileset

root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
335289 ./pgsql_tmp/pgsql_tmp650830.3.fileset
335289 ./pgsql_tmp

root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
./pgsql_tmp |sort -nr
412021 ./pgsql_tmp/pgsql_tmp650830.3.fileset
412021 ./pgsql_tmp
^^^ a few seconds after this last try the query aborted:
ERROR:  temporary file size exceeds temp_file_limit (104857600kB)

> One possibility is that there are files in the tmpdir, which have been
> unlinked, but are still opened, so their space hasn't been reclaimed.
> You could check for that by running lsof -nn |grep pgsql_tmp.  Any deleted
> files would say things like 'DEL|deleted|inode|no such'
>
I do not really understand what you would like me to do, and when. The disk
space is growing, and it is actual files under pgsql_tmp?

Hope this tells you something, please let me know if you would like more
info, and again - thanks!


Re: Fwd: temp_file_limit?

2022-12-19 Thread Justin Pryzby
On Mon, Dec 19, 2022 at 05:57:42PM +0100, Frits Jalvingh wrote:
> @justin:
> 
> Ran the query again. Top shows the following processes:
>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+

Thanks

> root@chatelet:/d2/var/lib/postgresql/15/main/base# du --max=2 -mx
> ./pgsql_tmp |sort -nr
> 412021 ./pgsql_tmp/pgsql_tmp650830.3.fileset
> 412021 ./pgsql_tmp
> ^^^ a few seconds after this last try the query aborted:
> ERROR:  temporary file size exceeds temp_file_limit (104857600kB)
> 
> > One possibility is that there are files in the tmpdir, which have been
> > unlinked, but are still opened, so their space hasn't been reclaimed.
> > You could check for that by running lsof -nn |grep pgsql_tmp.  Any deleted
> > files would say things like 'DEL|deleted|inode|no such'
>
> I do not really understand what you would like me to do, and when. The disk
> space is growing, and it is actual files under pgsql_tmp?

Run this during the query as either postgres or root:
| lsof -nn |grep pgsql_tmp |grep -E 'DEL|deleted|inode|no such'

Any files it lists would be interesting to know about.

> Hope this tells you something, please let me know if you would like more
> info, and again - thanks!

I think Thomas' idea is more likely.  We'd want to know the names of
files being written, either as logged by log_temp_files or from 
| find pgsql_tmp -ls
during the query.
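
For the log route, something like this should do (0 means every temp file is
logged, with no size threshold):
| ALTER SYSTEM SET log_temp_files = 0;
| SELECT pg_reload_conf();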

-- 
Justin




Re: Fwd: temp_file_limit?

2022-12-19 Thread Justin Pryzby
On Mon, Dec 19, 2022 at 06:27:57PM +0100, Frits Jalvingh wrote:
> I have listed the files during that run,

> 213M -rw--- 1 postgres postgres 213M dec 19 17:46 i100of128.p0.0
> 207M -rw--- 1 postgres postgres 207M dec 19 17:46 i100of128.p1.0
> 210M -rw--- 1 postgres postgres 210M dec 19 17:49 i100of256.p0.0
> 211M -rw--- 1 postgres postgres 211M dec 19 17:49 i100of256.p1.0
> 188M -rw--- 1 postgres postgres 188M dec 19 17:53 i100of512.p0.0
[...]

I think that proves Thomas' theory.  I'm not sure how that helps you,
though...

On Mon, Dec 19, 2022 at 01:51:33PM +1300, Thomas Munro wrote:
> One possibility is that you've hit a case that needs several rounds of
> repartitioning (because of a failure to estimate the number of tuples
> well), but we can't see that because you didn't show EXPLAIN (ANALYZE)
> output (understandably if it runs out of disk space before
> completing...).  The parallel hash code doesn't free up the previous
> generations' temporary files; it really only needs two generations'
> worth concurrently (the one it's reading from and the one it's writing
> to).  In rare cases where more generations are needed it could unlink
> the older ones -- that hasn't been implemented yet.  If you set
> log_temp_files = 0 to log temporary file names, it should be clear if
> it's going through multiple rounds of repartitioning, from the names
> (...of32..., ...of64..., ...of128..., ...of256..., ...).

-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581




Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
Ok, just to make sure that I understand correctly:
The parallel hash implementation needs to resize its table because of a
mismatch in expected tuple count. I do expect this to be true: Postgres
often grossly underestimates the expected row counts in our queries.
This is not fully implemented yet: removing the "old" files is not yet
done, so every time the table resizes it creates a new set of files and the
old ones remain.
I assume that the "used file size" only includes the "current" set of
files, and that the old ones are not counted towards that amount? That
would explain why it overallocates, of course.

By itself I now know what to do: I just need to disable all parallelism (
•̀ᴗ•́ )و ̑̑

I usually do that anyway because parallelism makes queries die randomly.
This is just another reason.

I restarted that query with max_parallel_workers_per_gather=0, and this
does not seem to use tempspace at all. It was not exactly fast, it took 82
minutes of a single process running at 100% cpu.
https://explain.depesz.com/s/HedE

Thanks a lot for your help Justin, and all others that chimed in too.

Next round is to try to get that query to run in the 5 seconds that we know
it can do, reliably.


On Mon, Dec 19, 2022 at 6:46 PM Justin Pryzby  wrote:

> On Mon, Dec 19, 2022 at 06:27:57PM +0100, Frits Jalvingh wrote:
> > I have listed the files during that run,
>
> > 213M -rw--- 1 postgres postgres 213M dec 19 17:46 i100of128.p0.0
> > 207M -rw--- 1 postgres postgres 207M dec 19 17:46 i100of128.p1.0
> > 210M -rw--- 1 postgres postgres 210M dec 19 17:49 i100of256.p0.0
> > 211M -rw--- 1 postgres postgres 211M dec 19 17:49 i100of256.p1.0
> > 188M -rw--- 1 postgres postgres 188M dec 19 17:53 i100of512.p0.0
> [...]
>
> I think that proves Thomas' theory.  I'm not sure how that helps you,
> though...
>
> On Mon, Dec 19, 2022 at 01:51:33PM +1300, Thomas Munro wrote:
> > One possibility is that you've hit a case that needs several rounds of
> > repartitioning (because of a failure to estimate the number of tuples
> > well), but we can't see that because you didn't show EXPLAIN (ANALYZE)
> > output (understandably if it runs out of disk space before
> > completing...).  The parallel hash code doesn't free up the previous
> > generations' temporary files; it really only needs two generations'
> > worth concurrently (the one it's reading from and the one it's writing
> > to).  In rare cases where more generations are needed it could unlink
> > the older ones -- that hasn't been implemented yet.  If you set
> > log_temp_files = 0 to log temporary file names, it should be clear if
> > it's going through multiple rounds of repartitioning, from the names
> > (...of32..., ...of64..., ...of128..., ...of256..., ...).
>
> --
> Justin Pryzby
> System Administrator
> Telsasoft
> +1-952-707-8581
>


Re: Fwd: temp_file_limit?

2022-12-19 Thread Ranier Vilela
On Mon, Dec 19, 2022 at 16:29, Frits Jalvingh wrote:

> Ok, just to make sure that I understand correctly:
> The parallel hash implementation needs to resize its table because of a
> mismatch in expected tuple count. I do expect this to be true: Postgres
> often grossly underestimates the expected row counts in our queries.
> This is not fully implemented yet: removing the "old" files is not yet
> done, so every time the table resizes it creates a new set of files and the
> old ones remain.
> I assume that the "used file size" only includes the "current" set of
> files, and that the old ones are not counted towards that amount? That
> would explain why it overallocates, of course.
>
That is not necessarily what is happening.
Could you try manually deleting (rm) these files, using the postgres user?
It's an ugly and dirty test, but it could indicate that files are really
being left behind, without being deleted by Postgres.

Alternatively, you could compile a version with
CHECK_WRITE_VS_EXTEND set, and try to fetch as much information from the
logs as possible,
as has been indicated by others here.
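
A rough sketch of such a build from source (paths are examples; assert-enabled
builds are slow, so not for production):

./configure --prefix=$HOME/pg15-debug --enable-debug --enable-cassert \
    CPPFLAGS='-DCHECK_WRITE_VS_EXTEND'
make -j4 && make install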


> By itself I now know what to do: I just need to disable all parallelism (
> •̀ᴗ•́ )و ̑̑
>
> I usually do that anyway because parallelism makes queries die randomly.
> This is just another reason.
>
> I restarted that query with max_parallel_workers_per_gather=0, and this
> does not seem to use tempspace at all. It was not exactly fast, it took 82
> minutes of a single process running at 100% cpu.
> https://explain.depesz.com/s/HedE
>
Anyway, see the hint page (https://explain.depesz.com/s/HedE#hints),
maybe it will be useful.

regards,
Ranier Vilela


Re: Fwd: temp_file_limit?

2022-12-19 Thread Justin Pryzby
On Sun, Dec 18, 2022 at 06:29:41PM +0100, Frits Jalvingh wrote:
> By itself I'm used to bad query performance in Postgresql; our application
> only does bulk queries and Postgres quite often makes terrible plans for
> those, but with set enable_nestloop=false set always most of them at least
> execute. The remaining failing queries are almost 100% caused by bad join
> sequences; I plan to work around those by forcing the join order from our
> application. For instance, the exact same query above can also generate the
> following plan (this one was created by manually setting
> join_collapse_limit = 1, but fast variants also occur quite often when
> disabling parallelism):

I, too, ended up setting enable_nestloop=false for our report queries,
to avoid the worst-case plans.

But you should also try to address the rowcount misestimates.  This
underestimates the rowcount by a factor of 69 (or 138 in the plan you
sent today):

| (soort = 'MSL'::text) AND (code = 'DAE'::text)

If those conditions are correlated, you can improve the estimate by
adding an extended stats object.

| CREATE STATISTICS s_h_sturingslabel_ssm_stats ON soort, code FROM s_h_sturingslabel_ssm;
| ANALYZE s_h_sturingslabel_ssm;

Unfortunately, stats objects currently only improve scans, and not
joins, so that might *improve* some queries, but it won't resolve the
worst problems:

| Hash Join (cost=22,832.23..44,190.21 rows=185 width=47) (actual time=159.725..2,645,634.918 rows=28,086,472,886 loops=1)

Maybe you can improve that by adjusting the stats target or ndistinct...
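
For example (the numbers here are just illustrative):
| ALTER TABLE s_h_sturingslabel_ssm ALTER COLUMN soort SET STATISTICS 1000;
| ALTER TABLE s_h_sturingslabel_ssm ALTER COLUMN soort SET (n_distinct = -0.5);
| ANALYZE s_h_sturingslabel_ssm;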

-- 
Justin




Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@ranier
These files ONLY exist during the query. They get deleted as soon as the
query terminates, by Postgres itself. Once the query terminates pgsql_tmp
is completely empty. Considering what Thomas said (and the actual
occurrence of the files he mentioned) this does seem to be the more likely
cause to me.


Re: Fwd: temp_file_limit?

2022-12-19 Thread Thomas Munro
On Tue, Dec 20, 2022 at 8:59 AM Frits Jalvingh  wrote:
> @ranier
> These files ONLY exist during the query. They get deleted as soon as the 
> query terminates, by Postgres itself. Once the query terminates pgsql_tmp is 
> completely empty. Considering what Thomas said (and the actual occurrence of 
> the files he mentioned) this does seem to be the more likely cause to me.

I'm working on some bug fixes near this area at the moment, so I'll
also see if I can figure out how to implement the missing eager
cleanup of earlier generations.  It's still a pretty bad scenario once
you reach it (repartitioning repeatedly, that is) and the solution to
that is probably much harder, but it's obviously not great to waste
temporary disk space like that.  BTW you can disable just parallel
hash with enable_parallel_hash=false.
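
For example, per session, or persisted for the whole database:

SET enable_parallel_hash = off;
ALTER DATABASE datavault_317_prd SET enable_parallel_hash = off;   -- database name as seen in the top output earlier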




Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@justin

I tried the CREATE STATISTICS variant and that definitely improves the
estimate; with it, one of the "bad" cases (the one with the 82 minute plan)
now gets a good plan and finishes in only a few seconds.
That is a worthwhile path to follow. It is a bit hard to do, because those
conditions can be anything, but I can probably calculate the ones used per
customer and create the correlation statistics from that... It is
definitely better than tweaking the "poor man's query hints" (the enable_*
settings) 8-/ which really do not help with plan stability either.
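
A sketch of what I have in mind, assuming a (hypothetical) correlated_pairs
table that holds the per-customer column pairs:

select format('CREATE STATISTICS %I ON %I, %I FROM %I;',
              tbl || '_' || col_a || '_' || col_b || '_stats',
              col_a, col_b, tbl)
from correlated_pairs;   -- assumed layout: (tbl text, col_a text, col_b text)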

That will be a lot of work, but I'll let you know the results ;)


Re: Fwd: temp_file_limit?

2022-12-19 Thread Frits Jalvingh
@Thomas


Thanks for helping identify the issue. I think it would be nice to clean up
those obsoleted files during the run, because running out of disk is really
not a good thing to have ;) Of course the bad estimates leading to the
resize are the real issue, but this at least makes it less bad.

Thanks for the tip about disabling parallel hash, but I had also found it in
the source. As mentioned before, I disable all parallelism on production
systems, not just because of this issue but to prevent the OOM killer from
killing Postgres - which happens far more often with parallelism on...