Re: FreeBSD UFS & fsync
On Thu, Mar 11, 2021 at 3:29 PM Bruce Momjian wrote:
>
> You should really be running pg_test_fsync for this kind of testing.
>

Sorry Bruce, but it is not clear to me: pg_test_fsync compares different fsync implementations, but not the fsync on/off setting of a cluster. Now, pg_test_fsync reports the "non-synced writes", which are effectively 15x faster (that is near to what I was expecting when turning off fsync):

% pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      n/a
        fdatasync                  16269.365 ops/sec      61 usecs/op
        fsync                       8471.429 ops/sec     118 usecs/op
        fsync_writethrough                 n/a
        open_sync                   5664.861 ops/sec     177 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      n/a
        fdatasync                  15196.244 ops/sec      66 usecs/op
        fsync                       7754.729 ops/sec     129 usecs/op
        fsync_writethrough                 n/a
        open_sync                   2670.645 ops/sec     374 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write   5486.140 ops/sec     182 usecs/op
         2 *  8kB open_sync writes  2344.310 ops/sec     427 usecs/op
         4 *  4kB open_sync writes  1323.548 ops/sec     756 usecs/op
         8 *  2kB open_sync writes   659.449 ops/sec    1516 usecs/op
        16 *  1kB open_sync writes   332.844 ops/sec    3004 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close         7515.006 ops/sec     133 usecs/op
        write, close, fsync         7107.698 ops/sec     141 usecs/op

Non-sync'ed 8kB writes:
        write                     278484.510 ops/sec       4 usecs/op

However, these are not the results I'm getting via pgbench.
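(As a sanity check on the figures above: the two columns in a pg_test_fsync line are just reciprocals of each other. A quick sketch, not part of the original report, using the numbers quoted there:)

```python
# Each pg_test_fsync line reports ops/sec and usecs/op; they are
# reciprocals: usecs_per_op = 1_000_000 / ops_per_sec.
def usecs_per_op(ops_per_sec: float) -> float:
    return 1_000_000.0 / ops_per_sec

# Figures taken from the report above.
print(round(usecs_per_op(16269.365)))   # fdatasync  -> 61
print(round(usecs_per_op(8471.429)))    # fsync      -> 118
print(round(usecs_per_op(278484.510)))  # non-synced -> 4
```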
% sudo -u postgres postgres -C fsync -D /postgres/12/data
on
% sudo -u postgres postgres -C checkpoint_timeout -D /postgres/12/data
30
% pgbench -T 60 -c 4 -r -n -U luca pgbench
...
number of transactions actually processed: 7347
latency average = 32.947 ms
tps = 121.405308 (including connections establishing)
tps = 121.429075 (excluding connections establishing)

% sudo -u postgres postgres -C checkpoint_timeout -D /postgres/12/data
30
% sudo -u postgres postgres -C fsync -D /postgres/12/data
off
% pgbench -T 60 -c 4 -r -n -U luca pgbench
...
number of transactions actually processed: 8220
latency average = 29.212 ms
tps = 136.929481 (including connections establishing)
tps = 136.963971 (excluding connections establishing)

Of course, the above test is really quick (and covers at least one checkpoint), but even longer tests provide similar results, which are somehow in contrast with the pg_test_fsync results. However, apparently the problem is not related to the disk cache, since pg_test_fsync reports correct times (as far as I understand).
Am I missing something?

Luca
Re: FreeBSD UFS & fsync
On Fri, Mar 12, 2021 at 10:09 PM Luca Ferrari wrote:
>         fdatasync                  16269.365 ops/sec      61 usecs/op
>         fsync                       8471.429 ops/sec     118 usecs/op
>
> Non-sync'ed 8kB writes:
>         write                     278484.510 ops/sec       4 usecs/op
>
> tps = 136.963971 (excluding connections establishing)

It looks like your system is performing very badly for some other reason, so that synchronous I/O waits are only a small proportion of the time, and thus fsync=off doesn't speed things up very much. I'd look into profiling the system to try to figure out what it's doing... maybe it's suffering from super slow hypercalls for gettimeofday(), or something like that?
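(To put rough numbers on that argument — a back-of-the-envelope sketch, not from the original mail, combining the pgbench average latency with the fdatasync cost quoted above:)

```python
# If a transaction takes ~33 ms on average but a WAL flush (fdatasync)
# costs only ~61 us, sync I/O is a tiny fraction of each transaction,
# so turning fsync off cannot buy much.
txn_latency_us = 32.947 * 1000   # pgbench average latency, in microseconds
fdatasync_us = 61                # fdatasync cost from pg_test_fsync

sync_fraction = fdatasync_us / txn_latency_us
print(f"{sync_fraction:.1%}")  # roughly 0.2% of each transaction
```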
Re: Fwd: different execution time for the same query (and same DB status)
I have re-tested the execution times with several different values of shared_buffers in the range 256 MB - 4 GB. It didn't solve the problem, and I noticed that for values greater than 3 GB the executions halt very frequently. I also tried to disable JIT, and this slowed things down further.

But there is an interesting piece of news. I managed to exploit some properties of the data I am modelling, and I have changed the types of the tables and the query as follows:

CREATE TABLE A (
    a1 int2,
    a2 int2,
    v  int4 PRIMARY KEY
);

CREATE TABLE B (
    a1 int2,
    a2 int2,
    v  int4 PRIMARY KEY
);

CREATE INDEX hash_pkA ON A USING hash(v);
CREATE INDEX hash_pkB ON B USING hash(v);

CREATE TABLE C (
    la1 int2,
    la2 int2,
    va1 int2,
    va2 int2,
    res text,
    c   int8
);

CREATE INDEX hash_C ON C USING hash(c);

SELECT count(*) FROM (
    ((SELECT A.v,
             coalesce(A.a1, 0) AS la1,
             coalesce(A.a2, 0) AS la2,
             coalesce(B.a1, 0) AS va1,
             coalesce(B.a2, 0) AS va2
      FROM A LEFT JOIN B ON A.v = B.v)
     UNION ALL
     SELECT B.v,
            0 AS la1,
            0 AS la2,
            B.a1 AS va1,
            B.a2 AS va2
     FROM B
     WHERE B.v NOT IN (SELECT A.v FROM A)) AS ta
    INNER JOIN C
    ON ta.la1 | (ta.la2::int8 << 10) | (ta.va1::int8 << 20) | (ta.va2::int8 << 30) = C.c);

With these changes I get stable results around 15 seconds. Here is the plan: https://explain.depesz.com/s/Y9dT. I also verified that I can decrease work_mem to 300 MB (against 800 MB for the original query) while keeping the same execution time. In the original one, decreasing that value worsens the overall performance instead.

In the new query there is only one comparison on a single column (there were four in the original one), and I started wondering whether, in the previous case, the DBMS considers the overall memory consumption too high and changes the plan. If so, I would be interested in understanding how the optimisation algorithm works and whether there is a way to disable it. That way I can try to better figure out what to do, in the future, in case the data model cannot be re-arranged as in this case.

Thanks again.
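(For what it's worth, the join condition above packs the four small columns into a single int8 key, 10 bits per field. A quick sketch of the same packing outside SQL — the helper names are mine, not from the query:)

```python
# Mirrors the join expression in the query above:
#   la1 | (la2 << 10) | (va1 << 20) | (va2 << 30)
# Each field must fit in 10 bits (0..1023) for the packing to be lossless.
def pack_key(la1: int, la2: int, va1: int, va2: int) -> int:
    for f in (la1, la2, va1, va2):
        assert 0 <= f < (1 << 10), "fields must fit in 10 bits"
    return la1 | (la2 << 10) | (va1 << 20) | (va2 << 30)

def unpack_key(c: int) -> tuple:
    mask = (1 << 10) - 1
    return (c & mask, (c >> 10) & mask, (c >> 20) & mask, (c >> 30) & mask)

key = pack_key(1, 2, 3, 4)
print(key)                               # 4298115073
assert unpack_key(key) == (1, 2, 3, 4)   # round-trips losslessly
```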
Best regards,
Francesco De Angelis

On Wed, Mar 10, 2021 at 2:29 PM Michael Lewis wrote:
> I would increase shared_buffers to 1GB or more. Also, it would be very
> interesting to see these queries executed with JIT off.
>
Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
https://www.postgresql-archive.org/CPU-hogged-by-concurrent-SELECT-FOR-UPDATE-SKIP-LOCKED-td6150480.html

David Rowley on 20 Aug 2020:
"When updates occur in a non-partitioned table we can follow item pointer chains to find the live row and check if the WHERE clause still matches to determine if the row should be updated, or in this case just locked since it's a SELECT FOR UPDATE. However, with partitioned table, a concurrent UPDATE may have caused the row to have been moved off to another partition, in which case the tuple's item pointer cannot point to it since we don't have enough address space, we only have 6 bytes for a TID. To get around the fact that we can't follow these update chains, we just throw the serialization error, which is what you're getting. Ideally, we'd figure out where the live version of the tuple is and check if it matches the WHERE clause and lock it if it does, but we've no means to do that with the current design."

Moving data between partitions is supported, but maybe another partitioning design is better suited for high-concurrency use cases.
Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
To me it seems like a bug, because it clearly states it tries to lock a non-existing row:

"tuple to be locked was already moved to another partition due to concurrent update"

But there is a SKIP LOCKED clause, so why throw an error that it can't lock, when we explicitly ask it not to bother if it can't lock and to just skip the row?

Best regards,
Kamil Dziedzic

On Fri, Mar 12, 2021, 17:22 Michael Lewis wrote:
>
> https://www.postgresql-archive.org/CPU-hogged-by-concurrent-SELECT-FOR-UPDATE-SKIP-LOCKED-td6150480.html
>
> David Rowley on 20 Aug 2020-
> "When updates occur in a non-partitioned table we can follow item
> pointer chains to find the live row and check if the WHERE clause
> still matches to determine if the row should be updated, or in this
> case just locked since it's a SELECT FOR UPDATE. However, with
> partitioned table, a concurrent UPDATE may have caused the row to have
> been moved off to another partition, in which case the tuple's item
> pointer cannot point to it since we don't have enough address space,
> we only have 6 bytes for a TID. To get around the fact that we can't
> follow these update chains, we just throw the serialization error,
> which is what you're getting. Ideally, we'd figure out where the live
> version of the tuple is and check if it matches the WHERE clause and
> lock it if it does, but we've no means to do that with the current
> design."
>
> Moving data between partitions is supported, but maybe another
> partitioning design is better suited for high concurrency use cases.
>
Re: FreeBSD UFS & fsync
On Fri, Mar 12, 2021 at 10:33:29PM +1300, Thomas Munro wrote:
> On Fri, Mar 12, 2021 at 10:09 PM Luca Ferrari wrote:
> >         fdatasync                  16269.365 ops/sec      61 usecs/op
> >         fsync                       8471.429 ops/sec     118 usecs/op
> >
> > Non-sync'ed 8kB writes:
> >         write                     278484.510 ops/sec       4 usecs/op
> >
> > tps = 136.963971 (excluding connections establishing)
>
> It looks like your system is performing very badly for some other
> reason, so that synchronous I/O waits are only a small proportion of
> the time, and thus fsync=off doesn't speed things up very much. I'd
> look into profiling the system to try to figure out what it's doing...
> maybe it's suffering from super slow hypercalls for gettimeofday(), or
> something like that?

And we have pg_test_timing for gettimeofday() testing.

-- 
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
