Re: FreeBSD UFS & fsync
On Thu, Mar 11, 2021 at 3:29 PM Bruce Momjian wrote:
>
> You should really be running pg_test_fsync for this kind of testing.
>

Sorry Bruce, but it is not clear to me: pg_test_fsync compares different fsync implementations, but not the fsync on/off setting of a cluster. Now, pg_test_fsync reports the "non-synced writes", which are effectively 15x faster (that is near to what I was expecting when turning off fsync):

% pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      n/a
        fdatasync                  16269.365 ops/sec      61 usecs/op
        fsync                       8471.429 ops/sec     118 usecs/op
        fsync_writethrough                 n/a
        open_sync                   5664.861 ops/sec     177 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      n/a
        fdatasync                  15196.244 ops/sec      66 usecs/op
        fsync                       7754.729 ops/sec     129 usecs/op
        fsync_writethrough                 n/a
        open_sync                   2670.645 ops/sec     374 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write   5486.140 ops/sec     182 usecs/op
         2 *  8kB open_sync writes  2344.310 ops/sec     427 usecs/op
         4 *  4kB open_sync writes  1323.548 ops/sec     756 usecs/op
         8 *  2kB open_sync writes   659.449 ops/sec    1516 usecs/op
        16 *  1kB open_sync writes   332.844 ops/sec    3004 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close         7515.006 ops/sec     133 usecs/op
        write, close, fsync         7107.698 ops/sec     141 usecs/op

Non-sync'ed 8kB writes:
        write                     278484.510 ops/sec       4 usecs/op

However, these are not the results I'm getting via pgbench.
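(As a sanity check on the figures above: the two columns in a pg_test_fsync line are just reciprocals of each other. A quick sketch, not part of the original report, using the numbers quoted there:)

```python
# Each pg_test_fsync line reports ops/sec and usecs/op; they are
# reciprocals: usecs_per_op = 1_000_000 / ops_per_sec.
def usecs_per_op(ops_per_sec: float) -> float:
    return 1_000_000.0 / ops_per_sec

# Figures taken from the report above.
print(round(usecs_per_op(16269.365)))   # fdatasync  -> 61
print(round(usecs_per_op(8471.429)))    # fsync      -> 118
print(round(usecs_per_op(278484.510)))  # non-synced -> 4
```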
% sudo -u postgres postgres -C fsync -D /postgres/12/data
on
% sudo -u postgres postgres -C checkpoint_timeout -D /postgres/12/data
30
% pgbench -T 60 -c 4 -r -n -U luca pgbench
...
number of transactions actually processed: 7347
latency average = 32.947 ms
tps = 121.405308 (including connections establishing)
tps = 121.429075 (excluding connections establishing)

% sudo -u postgres postgres -C checkpoint_timeout -D /postgres/12/data
30
% sudo -u postgres postgres -C fsync -D /postgres/12/data
off
% pgbench -T 60 -c 4 -r -n -U luca pgbench
...
number of transactions actually processed: 8220
latency average = 29.212 ms
tps = 136.929481 (including connections establishing)
tps = 136.963971 (excluding connections establishing)

Of course, the above test is really quick (and covers at least one checkpoint), but even longer tests provide similar results, which are somehow in contrast with the pg_test_fsync results. However, apparently the problem is not related to the disk cache, since pg_test_fsync reports correct times (as far as I understand).
Am I missing something?

Luca
Re: FreeBSD UFS & fsync
On Fri, Mar 12, 2021 at 10:09 PM Luca Ferrari wrote:
>         fdatasync                  16269.365 ops/sec      61 usecs/op
>         fsync                       8471.429 ops/sec     118 usecs/op
>
> Non-sync'ed 8kB writes:
>         write                     278484.510 ops/sec       4 usecs/op
>
> tps = 136.963971 (excluding connections establishing)

It looks like your system is performing very badly for some other reason, so that synchronous I/O waits are only a small proportion of the time, and thus fsync=off doesn't speed things up very much. I'd look into profiling the system to try to figure out what it's doing... maybe it's suffering from super slow hypercalls for gettimeofday(), or something like that?
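(To put rough numbers on that argument — a back-of-the-envelope sketch, not from the original mail, combining the pgbench average latency with the fdatasync cost quoted above:)

```python
# If a transaction takes ~33 ms on average but a WAL flush (fdatasync)
# costs only ~61 us, sync I/O is a tiny fraction of each transaction,
# so turning fsync off cannot buy much.
txn_latency_us = 32.947 * 1000   # pgbench average latency, in microseconds
fdatasync_us = 61                # fdatasync cost from pg_test_fsync

sync_fraction = fdatasync_us / txn_latency_us
print(f"{sync_fraction:.1%}")  # roughly 0.2% of each transaction
```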
Re: Fwd: different execution time for the same query (and same DB status)
I have re-tested the execution times with several different values of shared_buffers in the range 256 MB - 4 GB. It didn't solve the problem, and I noticed that for values greater than 3 GB the executions halt very frequently. I also tried to disable JIT, and this slowed things down further.

But there is an interesting piece of news. I managed to exploit some properties of the data I am modelling, and I have changed the types of the tables and the query as follows:

CREATE TABLE A (
    a1 int2,
    a2 int2,
    v  int4 PRIMARY KEY
);

CREATE TABLE B (
    a1 int2,
    a2 int2,
    v  int4 PRIMARY KEY
);

CREATE INDEX hash_pkA ON A USING hash(v);
CREATE INDEX hash_pkB ON B USING hash(v);

CREATE TABLE C (
    la1 int2,
    la2 int2,
    va1 int2,
    va2 int2,
    res text,
    c   int8
);

CREATE INDEX hash_C ON C USING hash(c);

SELECT count(*) FROM (
    ((SELECT A.v,
             coalesce(A.a1, 0) AS la1,
             coalesce(A.a2, 0) AS la2,
             coalesce(B.a1, 0) AS va1,
             coalesce(B.a2, 0) AS va2
      FROM A LEFT JOIN B ON A.v = B.v)
     UNION ALL
     SELECT B.v,
            0 AS la1,
            0 AS la2,
            B.a1 AS va1,
            B.a2 AS va2
     FROM B
     WHERE B.v NOT IN (SELECT A.v FROM A)) AS ta
    INNER JOIN C
    ON ta.la1 | (ta.la2::int8 << 10) | (ta.va1::int8 << 20) | (ta.va2::int8 << 30) = C.c);

With these changes I get stable results around 15 seconds. Here is the plan: https://explain.depesz.com/s/Y9dT. I also verified that I can decrease work_mem to 300 MB (against 800 MB for the original query) while keeping the same execution time. In the original one, decreasing that value worsens the overall performance instead.

In the new query there is only one comparison on a single column (there were four in the original one), and I started wondering whether, in the previous case, the DBMS considers the overall memory consumption too high and changes the plan. If so, I would be interested in understanding how the optimisation algorithm works and whether there is a way to disable it. That way I can try to better figure out what to do, in the future, in case the data model cannot be re-arranged as in this case.

Thanks again.
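(For what it's worth, the join condition above packs the four small columns into a single int8 key, 10 bits per field. A quick sketch of the same packing outside SQL — the helper names are mine, not from the query:)

```python
# Mirrors the join expression in the query above:
#   la1 | (la2 << 10) | (va1 << 20) | (va2 << 30)
# Each field must fit in 10 bits (0..1023) for the packing to be lossless.
def pack_key(la1: int, la2: int, va1: int, va2: int) -> int:
    for f in (la1, la2, va1, va2):
        assert 0 <= f < (1 << 10), "fields must fit in 10 bits"
    return la1 | (la2 << 10) | (va1 << 20) | (va2 << 30)

def unpack_key(c: int) -> tuple:
    mask = (1 << 10) - 1
    return (c & mask, (c >> 10) & mask, (c >> 20) & mask, (c >> 30) & mask)

key = pack_key(1, 2, 3, 4)
print(key)                               # 4298115073
assert unpack_key(key) == (1, 2, 3, 4)   # round-trips losslessly
```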
Best regards,
Francesco De Angelis

On Wed, Mar 10, 2021 at 2:29 PM Michael Lewis wrote:
> I would increase shared_buffers to 1GB or more. Also, it would be very
> interesting to see these queries executed with JIT off.
>
Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
https://www.postgresql-archive.org/CPU-hogged-by-concurrent-SELECT-FOR-UPDATE-SKIP-LOCKED-td6150480.html

David Rowley on 20 Aug 2020:
"When updates occur in a non-partitioned table we can follow item pointer chains to find the live row and check if the WHERE clause still matches to determine if the row should be updated, or in this case just locked since it's a SELECT FOR UPDATE. However, with partitioned table, a concurrent UPDATE may have caused the row to have been moved off to another partition, in which case the tuple's item pointer cannot point to it since we don't have enough address space, we only have 6 bytes for a TID. To get around the fact that we can't follow these update chains, we just throw the serialization error, which is what you're getting. Ideally, we'd figure out where the live version of the tuple is and check if it matches the WHERE clause and lock it if it does, but we've no means to do that with the current design."

Moving data between partitions is supported, but maybe another partitioning design is better suited for high-concurrency use cases.
Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
To me it seems like a bug, because it clearly states it tries to lock a non-existing row:

"tuple to be locked was already moved to another partition due to concurrent update"

But there is a SKIP LOCKED clause, so why throw an error that it can't lock, when we explicitly ask it not to bother if it can't lock and to just skip the row?

Best regards,
Kamil Dziedzic

On Fri, Mar 12, 2021, 17:22 Michael Lewis wrote:
>
> https://www.postgresql-archive.org/CPU-hogged-by-concurrent-SELECT-FOR-UPDATE-SKIP-LOCKED-td6150480.html
>
> David Rowley on 20 Aug 2020-
> "When updates occur in a non-partitioned table we can follow item
> pointer chains to find the live row and check if the WHERE clause
> still matches to determine if the row should be updated, or in this
> case just locked since it's a SELECT FOR UPDATE. However, with
> partitioned table, a concurrent UPDATE may have caused the row to have
> been moved off to another partition, in which case the tuple's item
> pointer cannot point to it since we don't have enough address space,
> we only have 6 bytes for a TID. To get around the fact that we can't
> follow these update chains, we just throw the serialization error,
> which is what you're getting. Ideally, we'd figure out where the live
> version of the tuple is and check if it matches the WHERE clause and
> lock it if it does, but we've no means to do that with the current
> design."
>
> Moving data between partitions is supported, but maybe another
> partitioning design is better suited for high concurrency use cases.
>
Re: FreeBSD UFS & fsync
On Fri, Mar 12, 2021 at 10:33:29PM +1300, Thomas Munro wrote:
> On Fri, Mar 12, 2021 at 10:09 PM Luca Ferrari wrote:
> >         fdatasync                  16269.365 ops/sec      61 usecs/op
> >         fsync                       8471.429 ops/sec     118 usecs/op
> >
> > Non-sync'ed 8kB writes:
> >         write                     278484.510 ops/sec       4 usecs/op
> >
> > tps = 136.963971 (excluding connections establishing)
>
> It looks like your system is performing very badly for some other
> reason, so that synchronous I/O waits are only a small proportion of
> the time, and thus fsync=off doesn't speed things up very much. I'd
> look into profiling the system to try to figure out what it's doing...
> maybe it's suffering from super slow hypercalls for gettimeofday(), or
> something like that?

And we have pg_test_timing for gettimeofday() testing.

-- 
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
