Re: ERROR: found xmin from before relfrozenxid

2019-01-26 Thread Adrien NAYRAT

On 1/25/19 6:20 PM, Mariel Cherkassky wrote:
I'm getting this issue when I try to connect to a specific db. Does it 
matter what table I specify? Should I just choose a random table from 
the problematic db? If I dump the db and restore it, can that help?


The error message is about "db1.public.table_1", but other tables may be 
impacted as well.


If you can dump and restore your database it should fix your issue. Be 
careful to apply the minor update (9.6.11 if you can).
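
For what it's worth, a minimal sketch of the dump-and-restore, assuming the 
affected database is db1 (as in the error message) and there is room for a 
second copy:

pg_dump -Fc db1 > db1.dump           # custom-format dump of the affected database
createdb db1_restored                # create an empty target database
pg_restore -d db1_restored db1.dump  # rebuild schema and data from the dump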





Re: ERROR: found xmin from before relfrozenxid

2019-01-26 Thread Mariel Cherkassky
Update to the minor version should be an easy solution - yum update
postgresql. What did you mean by careful?

On Sat, Jan 26, 2019, 12:48 PM Adrien NAYRAT wrote:
> On 1/25/19 6:20 PM, Mariel Cherkassky wrote:
> > I'm getting this issue when I try to connect to a specific db. Does it
> > matter what table I specify? Should I just choose a random table from
> > the problematic db? If I dump the db and restore it, can that help?
>
> The error message is about "db1.public.table_1", but other tables may be
> impacted as well.
>
> If you can dump and restore your database it should fix your issue. Be
> careful to apply the minor update (9.6.11 if you can).
>
>


Re: ERROR: found xmin from before relfrozenxid

2019-01-26 Thread Adrien NAYRAT

On 1/26/19 11:56 AM, Mariel Cherkassky wrote:
Update to the minor version should be an easy solution - yum update 
postgresql. What did you mean by careful?



Sorry, I meant: do not forget to apply the minor update, to make sure the 
same bug does not happen again.
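
As a rough sketch (the exact package and service names depend on how 9.6 was 
installed; the ones below assume the PGDG packages on a RHEL/CentOS-style 
system):

sudo yum update postgresql96 postgresql96-server   # brings in 9.6.11 or later
sudo systemctl restart postgresql-9.6              # a minor update requires a server restart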




Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-26 Thread Saurabh Nanda
Hi Jeff,

Thank you for replying.



> wal_sync_method=fsync
>
>
Why this change?


Actually, I re-checked and noticed that this config section was left at
its default values, which are the following. Since the commented line said
`wal_sync_method = fsync`, I _assumed_ that fsync was the default value. But it
seems that Linux uses fdatasync by default, and the output of
pg_test_fsync also shows that it is /probably/ the fastest method on my
hardware.

# wal_sync_method = fsync
# the default is the first option supported by the operating system:
#   open_datasync
#   fdatasync (default on Linux)
#   fsync
#   fsync_writethrough
#   open_sync
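
As a quick sanity check (assuming psql access to the running server), the
effective value can be confirmed with:

psql -c "SHOW wal_sync_method;"   # should report fdatasync if the Linux default is in effect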


PGOPTIONS="-c synchronous_commit=off" pgbench -T 3600 -P 10 


I am currently running all my benchmarks with synchronous_commit=off and
will get back with my findings.
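
For reference, a sketch of what such a run looks like (the client/thread
counts and the bench database name here are only illustrative):

PGOPTIONS="-c synchronous_commit=off" pgbench -c 24 -j 24 -T 3600 -P 10 bench_db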

> You could also try pg_test_fsync to get low-level information, to
> supplement the high level you get from pgbench.


Thanks for pointing me to this tool. I never knew pg_test_fsync existed! I've
run `pg_test_fsync -s 60` twice and this is the output -
https://gist.github.com/saurabhnanda/b60e8cf69032b570c5b554eb50df64f8 I'm
not sure what to make of it. Should I tweak the setting of `wal_sync_method`
to something other than the default value?

> The effects of max_wal_size are going to depend on how you have IO
> configured, for example does pg_wal share the same devices and controllers
> as the base data?  It is mostly about controlling disk usage and
> crash-recovery performance, neither of which is of primary importance to
> pgbench performance.


The WAL and the data directory reside on the same SSD disk -- is this a
bad idea? I was under the impression that smaller values for max_wal_size
cause the server to do "maintenance work" related to WAL rotation, etc. more
frequently, which would lead to lower pgbench performance.
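
In case separating them turns out to matter, a hedged sketch of moving pg_wal
to a different device (the paths and service name assume the Ubuntu/PG 11
packaging layout and a hypothetical mount point /mnt/wal; the server must be
stopped first):

sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/11/main/pg_wal /mnt/wal/pg_wal
sudo ln -s /mnt/wal/pg_wal /var/lib/postgresql/11/main/pg_wal   # leave a symlink in the data directory
sudo systemctl start postgresql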

Not all SSD are created equal, so the details here matter, both for the
> underlying drives and the raid controller.


Here's the relevant output from lshw --
https://gist.github.com/saurabhnanda/d7107d4ab1bb48e94e0a5e3ef96e7260 It
seems I have Micron SSDs. I tried finding more information on RAID but
couldn't get anything in the lshw or lspci output except the following --
`SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI
mode] (rev 31)`. Moreover, the devices are showing up as /dev/md1,
/dev/md2, etc. So, if my understanding is correct, I don't think I'm on
hardware RAID, but software RAID, right?
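
A quick way to confirm this (standard Linux tools, nothing Postgres-specific):

cat /proc/mdstat          # lists Linux software-RAID (md) arrays and their member disks
lsblk -o NAME,TYPE,SIZE   # shows how the md devices are assembled from the physical drives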

These machines are from the EX-line of dedicated servers provided by
Hetzner, btw.

PS: Cc-ing the list back again because I assume you didn't intend for your
reply to be private, right?

-- Saurabh.


Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-26 Thread Saurabh Nanda
>
>
> PGOPTIONS="-c synchronous_commit=off" pgbench -T 3600 -P 10 
>
>
> I am currently running all my benchmarks with synchronous_commit=off and
> will get back with my findings.
>


It seems that PGOPTIONS="-c synchronous_commit=off" has a significant
impact. However, I still cannot understand why the TPS for the optimised
case is LOWER than the defaults at higher concurrency levels!

+--------+---------------------+------------------------+
| client | Mostly defaults [1] | Optimised settings [2] |
+--------+---------------------+------------------------+
| 1      | 80-86               | 169-180                |
+--------+---------------------+------------------------+
| 6      | 350-376             | 1265-1397              |
+--------+---------------------+------------------------+
| 12     | 603-619             | 1746-2352              |
+--------+---------------------+------------------------+
| 24     | 947-1015            | 1869-2518              |
+--------+---------------------+------------------------+
| 48     | 1435-1512           | 1912-2818              |
+--------+---------------------+------------------------+
| 96     | 1769-1811           | 1546-1753              |
+--------+---------------------+------------------------+
| 192    | 1857-1992           | 1332-1508              |
+--------+---------------------+------------------------+
| 384    | 1667-1793           | 1356-1450              |
+--------+---------------------+------------------------+


[1] "Mostly default" settings are whatever ships with Ubuntu 18.04 + PG 11.
A snippet of the relevant settings is given below:

max_connections=400
work_mem=4MB
maintenance_work_mem=64MB
shared_buffers=128MB
temp_buffers=8MB
effective_cache_size=4GB
wal_buffers=-1
wal_sync_method=fsync
max_wal_size=1GB
autovacuum=off    # Auto-vacuuming was disabled


[2] An optimised version of the settings was obtained from
https://pgtune.leopard.in.ua/#/ and, along with that, the benchmarks were
run with PGOPTIONS="-c synchronous_commit=off":

max_connections = 400
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 3495kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 12
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
autovacuum = off    # Auto-vacuuming was disabled
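
In case it's useful, one hedged way to apply such settings without hand-editing
postgresql.conf is ALTER SYSTEM (only two of the parameters from the list are
shown here; shared_buffers and max_connections still require a server restart
to take effect):

psql -c "ALTER SYSTEM SET shared_buffers = '8GB';"   # written to postgresql.auto.conf
psql -c "ALTER SYSTEM SET max_wal_size = '2GB';"
psql -c "SELECT pg_reload_conf();"                   # reloads config; restart-only parameters need a full restart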