PostgreSQL v14 TPC-H performance GCC vs Clang

2021-11-16 Thread arjun shetty
Yes, I am currently focusing on the affected queries as well.
In the meantime, from hardware-level analysis and sample examples I noticed:
1. GCC performs better than Clang on __int128.
2. Clang performs better than GCC on long long.
   Reference example:
https://stackoverflow.com/questions/63029428/why-is-int128-t-faster-than-long-long-on-x86-64-gcc

3. GCC is built with “-fexcess-precision=standard” (standard excess-precision
handling for floating point).

Could these three points account for the GCC vs Clang performance difference
with PostgreSQL v14 in the Apple and AMD environments (the Intel environment
still needs to be checked)? In these environments int128 is enabled in the
PostgreSQL v14 build.
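
For reference, a quick way to confirm whether a given PostgreSQL build actually
uses a 128-bit integer type, and which compiler and flags it was built with, is
to inspect the installed pg_config.h and pg_config output (a minimal sketch; it
assumes the pg_config belonging to the build under test is on PATH):

# Show whether configure found a working 128-bit integer type for this build.
grep PG_INT128_TYPE "$(pg_config --includedir-server)/pg_config.h"

# Show the compiler and flags the server was built with (useful when comparing
# the gcc and clang builds).
pg_config --cc
pg_config --cflags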

On Friday, November 5, 2021, Tomas Vondra 
wrote:

> Hi,
>
> IMO this thread provides so little information it's almost impossible to
> answer the question. There's almost no information about the hardware,
> scale of the test, configuration of the Postgres instance, the exact build
> flags, differences in generated asm code, etc.
>
> I find it hard to believe merely switching from clang to gcc yields 22%
> speedup - that's way higher than any differences we've seen in the past.
>
> In my experience, the speedup is unlikely to be "across the board". There
> will be a handful of affected queries, while most remaining queries will be
> about the same. In that case you need to focus on those queries, see if the
> plans are the same, do some profiling, etc.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
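
As a rough sketch of that kind of per-query comparison between the two builds
(the file names, database name, and the perf invocation below are illustrative
assumptions):

# Run one affected TPC-H query against each build and diff the plans.
# q01.sql is assumed to hold the query prefixed with EXPLAIN (ANALYZE, BUFFERS).
psql -d tpch -f q01.sql > plan_gcc.txt      # connected to the gcc build
psql -d tpch -f q01.sql > plan_clang.txt    # connected to the clang build
diff plan_gcc.txt plan_clang.txt

# Profile the backend while the query runs (Linux perf; substitute the real
# backend PID), then compare the hot functions between builds.
perf record -g -p <backend_pid> -- sleep 30
perf report --sort symbol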


Re: Need help identifying a periodic performance issue.

2021-11-16 Thread Robert Creager


On Nov 15, 2021, at 10:29 PM, Justin Pryzby <[email protected]> wrote:

On Tue, Nov 16, 2021 at 04:43:25AM +, Robert Creager wrote:
> We’re executing the following copy to fill a table with approximately 5k 
> records, then repeating for a total of 250k records. Normally, this copy 
> executes < 1 second, with the entire set taking a couple of minutes. The 
> problem is not reproducible on command, but usually within a couple of hours 
> of starting some test runs.
>
> COPY ds3.blob (byte_offset, checksum, checksum_type, id, length, object_id) 
> FROM STDIN WITH DELIMITER AS ‘|’
>
> But, occasionally we get into a huge performance bottleneck for about 2 
> hours, where these copy operations are taking 140 seconds or so
>
> Nov 15 22:25:49 sm4u-34 postgres[5799]: [381-1] 
> db=tapesystem,user=Administrator,app=PostgreSQL JDBC 
> Driver,client=127.0.0.1 LOG: duration: 145326.293 ms 
> statement: COPY ds3.blob (byte_offset, checksum, checksum_type, id, length, 
> object_id) FROM STDIN WITH DELIMITER AS '|'

> I’m logging statements with pgbadger monitoring the logs. There are no 
> apparent autovacuums running, nor any vacuums, nor anything at all really. 
> Other select queries around that time frame are executing normally.

What about checkpoints?

Would you show the "^checkpoint starting" and "^checkpoint complete" logs
surrounding a slow COPY?
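
For instance, with syslog-style logging as in the excerpt below, something like
this pulls the checkpoint lines and the slow COPY statements together in time
order (the log path is an assumption; adjust it to wherever syslog writes the
postgres facility):

grep -E 'checkpoint (starting|complete)|statement: COPY ds3\.blob' /var/log/messages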

Sorry, it was late last night; I meant to include the checkpoint info. I didn’t 
have enough logs around the one I pointed out above, as my tail got aborted by a 
reboot. Working on a log server… From an earlier occurrence:

Nov  5 03:56:28 sm4u-34 postgres[4934]: [2679-1] db=,user=,app=,client= LOG:  
checkpoint complete: wrote 247 buffers (0.0%); 0 WAL file(s) added, 0 removed, 
0 recycled; write=26.279 s, sync=0.002 s, total=26.323 s; sync files=142, 
longest=0.001 s, average=0.001 s; distance=592 kB, estimate=279087 kB
Nov  5 04:26:03 sm4u-34 postgres[4934]: [2680-1] db=,user=,app=,client= LOG:  
checkpoint starting: time
Nov  5 04:26:14 sm4u-34 postgres[4934]: [2681-1] db=,user=,app=,client= LOG:  
checkpoint complete: wrote 115 buffers (0.0%); 0 WAL file(s) added, 0 removed, 
0 recycled; write=11.880 s, sync=0.003 s, total=11.885 s; sync files=75, 
longest=0.001 s, average=0.001 s; distance=541 kB, estimate=251232 kB
Nov  5 04:56:03 sm4u-34 postgres[4934]: [2682-1] db=,user=,app=,client= LOG:  
checkpoint starting: time
Nov  5 04:56:15 sm4u-34 postgres[4934]: [2683-1] db=,user=,app=,client= LOG:  
checkpoint complete: wrote 103 buffers (0.0%); 0 WAL file(s) added, 0 removed, 
0 recycled; write=11.011 s, sync=0.002 s, total=11.015 s; sync files=74, 
longest=0.001 s, average=0.001 s; distance=528 kB, estimate=226162 kB
Nov  5 05:15:28 sm4u-34 postgres[59442]: [24-1] 
db=tapesystem,user=Administrator,app=PostgreSQL JDBC Driver,client=127.0.0.1 
LOG:  duration: 1.059 ms  statement: COPY ds3.blob (byte_offset, checksum, 
checksum_type, id, length, object_id) FROM STDIN WITH DELIMITER AS '|'
Nov  5 05:26:03 sm4u-34 postgres[4934]: [2684-1] db=,user=,app=,client= LOG:  
checkpoint starting: time
Nov  5 05:26:27 sm4u-34 postgres[4934]: [2685-1] db=,user=,app=,client= LOG:  
checkpoint complete: wrote 226 buffers (0.0%); 0 WAL file(s) added, 1 removed, 
0 recycled; write=24.000 s, sync=0.006 s, total=24.037 s; sync files=122, 
longest=0.001 s, average=0.001 s; distance=583 kB, estimate=203604 kB
Nov  5 05:56:03 sm4u-34 postgres[4934]: [2686-1] db=,user=,app=,client= LOG:  
checkpoint starting: time
Nov  5 05:56:24 sm4u-34 postgres[4934]: [2687-1] db=,user=,app=,client= LOG:  
checkpoint complete: wrote 199 buffers (0.0%); 0 WAL file(s) added, 0 removed, 
0 recycled; write=21.212 s, sync=0.004 s, total=21.218 s; sync files=122, 
longest=0.001 s, average=0.001 s; distance=580 kB, estimate=183302 kB
Nov  5 06:26:03 sm4u-34 postgres[4934]: [2688-1] db=,user=,app=,client= LOG:  
checkpoint starting: time
Nov  5 06:26:22 sm4u-34 postgres[4934]: [2689-1] db=,user=,app=,client= LOG:  
checkpoint complete: wrote 178 buffers (0.0%); 0 WAL file(s) added, 0 removed, 
0 recycled; write=19.078 s, sync=0.005 s, total=19.084 s; sync files=120, 
longest=0.001 s, average=0.001 s; distance=563 kB, estimate=165028 kB
Nov  5 06:32:27 sm4u-34 postgres[7728]: [213-1] 
db=tapesystem,user=Administrator,app=PostgreSQL JDBC Driver,client=127.0.0.1 
LOG:  duration: 143318.661 ms  statement: COPY ds3.blob (byte_offset, checksum, 
checksum_type, id, length, object_id) FROM STDIN WITH DELIMITER AS '|'
Nov  5 06:34:56 sm4u-34 postgres[7728]: [214-1] 
db=tapesystem,user=Administrator,app=PostgreSQL JDBC Driver,client=127.0.0.1 
LOG:  duration: 149175.227 ms  statement: COPY ds3.blob (byte_offset, checksum, 
checksum_type, id, length, object_id) FROM STDIN WITH DELIMITER AS '|'
Nov  5 06:37:27 sm4u-34 postgres[7728]: [215-1] 
db=tapesystem,user=Administrator,app=PostgreSQL JDBC Driver,client=127.0.0.1 
LOG:  duration: 15

Re: Need help identifying a periodic performance issue.

2021-11-16 Thread Robert Creager

On Nov 15, 2021, at 10:50 PM, Thomas Munro <[email protected]> wrote:

On Tue, Nov 16, 2021 at 5:43 PM Robert Creager <[email protected]> wrote:
One CPU is pegged; the data has been sent over STDIN, so Postgres is not 
waiting for more; and there are no other queries running, as verified with this select:

So PostgreSQL is eating 100% CPU, with no value shown in
wait_event_type, and small numbers of system calls are counted.  In
that case, is there an interesting user stack that jumps out with a
profiler during the slowdown (or the kernel version, stack())?

sudo dtrace -n 'profile-99 /arg0/ { @[ustack()] = count(); } tick-10s { exit(0); }'

I set up a monitoring script to run the dtrace stack sampler you sent once a 
minute against the top CPU-consuming Postgres process.  Now I wait until we 
reproduce it.

#!/usr/local/bin/bash

# Once a minute, sample the busiest postgres backend for 10 seconds with dtrace.
while [[ true ]]; do
   DATE=$(date "+%d-%H:%M:%S")
   # PID of the top CPU-consuming postgres process.
   PID=$(top -b | grep postgres | head -n 1 | awk '{print $1}')
   echo "${DATE} ${PID}"
   # 99 Hz user-stack profile of that PID, aggregated over a 10 second window.
   dtrace -n 'profile-99 /pid == '$PID'/ { @[ustack()] = count(); } tick-10s { exit(0); }' > dtrace/dtrace_${DATE}.txt
   sleep 60
done

Presuming this is the type of output you are expecting:

CPU     ID                    FUNCTION:NAME
  0  58709                        :tick-10s


  postgres`AtEOXact_LargeObject+0x11
  postgres`CommitTransaction+0x127
  postgres`CommitTransactionCommand+0xf2
  postgres`PostgresMain+0x1fef
  postgres`process_startup_packet_die
  postgres`0x73055b
  postgres`PostmasterMain+0xf36
  postgres`0x697837
  postgres`_start+0x100
  `0x80095f008
1

  postgres`printtup+0xf3
  postgres`standard_ExecutorRun+0x136
  postgres`PortalRunSelect+0x10f
  postgres`PortalRun+0x1c8
  postgres`PostgresMain+0x1f94
  postgres`process_startup_packet_die
  postgres`0x73055b
  postgres`PostmasterMain+0xf36
  postgres`0x697837
  postgres`_start+0x100
  `0x80095f008
1
...




Re: Need help identifying a periodic performance issue.

2021-11-16 Thread Thomas Munro
On Wed, Nov 17, 2021 at 11:40 AM Robert Creager
 wrote:
> Presuming this is the type of output you are expecting:
>
> CPU     ID                    FUNCTION:NAME
>   0  58709                        :tick-10s
>
>
>   postgres`AtEOXact_LargeObject+0x11
>   postgres`CommitTransaction+0x127
>   postgres`CommitTransactionCommand+0xf2
>   postgres`PostgresMain+0x1fef
>   postgres`process_startup_packet_die
>   postgres`0x73055b
>   postgres`PostmasterMain+0xf36
>   postgres`0x697837
>   postgres`_start+0x100
>   `0x80095f008
> 1

It's the right output format, but isn't /pid == '$PID'/ only going to
match one single process called "postgres"?   Maybe /execname ==
"postgres"/ to catch them all?  Hopefully it'll be obvious what's
going on from an outlier stack with a high sample count.  Can also be
useful to convert the output to flamegraph format if CPU time is
distributed over many distinct stacks.
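
A minimal sketch of that conversion, assuming a local clone of Brendan Gregg's
FlameGraph scripts and using one of the capture files written by the monitoring
script above (the exact file name is illustrative):

git clone https://github.com/brendangregg/FlameGraph
# Fold the dtrace stack samples and render them as an interactive SVG flame graph.
./FlameGraph/stackcollapse.pl dtrace/dtrace_17-06:32:27.txt > copy_slowdown.folded
./FlameGraph/flamegraph.pl copy_slowdown.folded > copy_slowdown.svg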




Re: Need help identifying a periodic performance issue.

2021-11-16 Thread Thomas Munro
On Wed, Nov 17, 2021 at 11:51 AM Thomas Munro  wrote:
> It's the right output format, but isn't /pid == '$PID'/ only going to
> match one single process called "postgres"?   Maybe /execname ==
> "postgres"/ to catch them all?

Oh, duh, it's the top CPU one.  Makes sense.  Never mind :-)




Re: Lock contention high

2021-11-16 Thread arjun shetty
Hi Ashkil

PostgreSQL uses lightweight locks (LWLocks) to synchronize and control access
to buffer contents. A process acquires an LWLock in shared mode to read from a
buffer and in exclusive mode to write to it. While holding an exclusive lock, a
process therefore prevents other processes from acquiring either a shared or an
exclusive lock, whereas a shared lock can be held concurrently by other
processes. The problem starts when many processes try to acquire an exclusive
lock on the same buffer content; as a result, LWLockAcquire shows up as the top
hot function in profiling.
The question is whether the time in LWLockAcquire reflects lock contention or
CPU time spent inside the function itself (the top function in the profile).

This can be analysed by logging the LWLock statistics (“LwStatus”) with
counters such as ex-acquire-count (exclusive mode), sh-acquire-count,
block-count and spin-delay-count:

Total lock acquisition requests = ex-acquire-count + sh-acquire-count
Lock contention % = block-count / total lock acquisition requests

The contention percentage indicates whether most of the CPU time is spent
inside the function itself rather than spinning/waiting for the lock.
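
To see how many backends are actually waiting on those LWLocks at any given
moment, a minimal sketch (connection options omitted) is to sample
pg_stat_activity repeatedly while the workload runs:

# Count active backends per wait event; LWLock rows indicate lock waits.
psql -c "SELECT wait_event_type, wait_event, count(*)
           FROM pg_stat_activity
          WHERE state = 'active'
          GROUP BY 1, 2
          ORDER BY count(*) DESC;"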

On Friday, November 12, 2021, Ashkil Dighin 
wrote:

> Hi
> I suspect lock contention and performance issues with __int128, and I
> would like to check the performance by forcibly disabling
> int128 (maxalign 16 bytes) and enabling long long (maxalign 8 bytes) instead.
> Is it possible to disable int128 in PostgreSQL?
>
> On Thursday, October 28, 2021, Andres Freund  wrote:
>
>> Hi,
>>
>> On October 27, 2021 2:44:56 PM PDT, Ashkil Dighin <
>> [email protected]> wrote:
>> >Hi,
>> >Yes, lock contention is reduced with PostgreSQL v14.
>> >Lock acquire dropped from 18% to 10%:
>> >10.49 %postgres  postgres[.] LWLockAcquire
>> >5.09%  postgres  postgres[.] _bt_compare
>> >
>> >Can lock contention be reduced to 0-3%?
>>
>> Probably not, or at least not easily. Because of the atomic instructions
>> the locking also includes  some other costs (e.g. cache misses, serializing
>> store buffers,...).
>>
>> There's a good bit we can do to increase the cache efficiency around
>> buffer headers, but it won't get us quite that low I'd guess.
>>
>>
>> >In pg_stat_activity the LWLock waits shown were “BufferCounter” and “WalInsert”.
>>
>> Without knowing what proportion they have relative to each other and to
>> non-waiting backends, that unfortunately doesn't help that much.
>>
>> Andres
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>
>