Re: tablespace to benefit from ssd ?

2020-02-19 Thread Sebastiaan Mannem
Unless this is about reads exclusively, I would start by putting the WAL on SSD.
What you might also do is create separate filesystems (LVM). You can then keep
track of I/O per filesystem with iostat and see what would benefit most, and also
see how much storage each one uses. And you could use LVM to move filesystems to
and from the SSD while the system is live, so you only have downtime once.
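
If you also want a rough in-database view to complement iostat, the standard
statistics views can show which relations are read the most. A minimal sketch
(note that "read" here only means "not found in shared buffers", so some of these
reads are served from the OS cache, and the counters accumulate since the last
stats reset):

  SELECT relname, heap_blks_read, idx_blks_read
  FROM pg_statio_user_tables
  ORDER BY heap_blks_read + COALESCE(idx_blks_read, 0) DESC
  LIMIT 20;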

Please share your final findings in this thread too.

On Wed, 19 Feb 2020 at 04:42, Nicolas PARIS wrote:

> Hi
>
> I have both hdd and ssd disk on the postgres server. The cluster is
> right now created on the hdd only. I am considering using a tablespace
> to put some highly used postgres object on the ssd disk. Of course the
> ssd is small compared to the hdd, and I need to choose carefully what
> objects are stored on that side.
>
> I am wondering what kind of object (indexes, data) would benefit from
> ssd. The database primary/foreign keys are highly used and there is
> almost no sequential scan. However the server has a large amount of RAM
> and I suspect all of those indexes are already cached in RAM.
>
> I have read that tablespaces introduce overhead of maintenance and
> introduce complication for replication. But on the other hand I have
> this ssd disk ready for something.
>
> Any recommendation?
>
> --
> nicolas paris
>
>
> --


Sebastiaan Alexander Mannem
Product Manager, EDB
Anthony Fokkerweg 1
1059 CM Amsterdam, The Netherlands
T: +31 6 82521560
www.edbpostgres.com



Re: tablespace to benefit from ssd ?

2020-02-19 Thread Laurenz Albe
On Wed, 2020-02-19 at 05:42 +0100, Nicolas PARIS wrote:
> I have both hdd and ssd disk on the postgres server. The cluster is
> right now created on the hdd only. I am considering using a tablespace
> to put some highly used postgres object on the ssd disk. Of course the
> ssd is small compared to the hdd, and I need to choose carefully what
> objects are stored on that side.
> 
> I am wondering what kind of object (indexes, data) would benefit from
> ssd. The database primary/foreign keys are highly used and there is
> almost no sequential scan. However the server has a large amount of RAM
> and I suspect all of those indexes are already cached in RAM.
> 
> I have read that tablespaces introduce overhead of maintenance and
> introduce complication for replication. But on the other hand I have
> this ssd disk ready for something.
> 
> Any recommendation?

Put "pg_stat_statements" into "shared_preload_libraries" and restart the server.

Set "track_io_timing" to on.

Let your workload run for at least a day.

Install the "pg_stat_statements" extension and run

  SELECT blk_read_time, query
  FROM pg_stat_statements
  ORDER BY blk_read_time DESC LIMIT 20;

That will give you the 20 queries that spent the most time on read I/O.

Examine those queries with EXPLAIN (ANALYZE, BUFFERS) and see which tables or
indexes cause the I/O.
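
For example, with placeholder object names:

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM some_table WHERE some_indexed_column = 42;

The "Buffers: shared hit=... read=..." lines in the plan show where the actual
reads happen.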

Then you have a list of candidates for the fast tablespace.
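
Moving a candidate is then one statement per object, for example (the tablespace
name, path and object names below are placeholders, and SET TABLESPACE takes an
exclusive lock while it copies the data):

  CREATE TABLESPACE ssd LOCATION '/mnt/ssd/pg_tblspc';
  ALTER TABLE busy_table SET TABLESPACE ssd;
  ALTER INDEX busy_table_pkey SET TABLESPACE ssd;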

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: SubtransControlLock and performance problems

2020-02-19 Thread Lars Aksel Opsahl
Hi



>From: Laurenz Albe
>Sent: Tuesday, February 18, 2020 6:27 PM
>To: Pavel Stehule; Tom Lane
>Cc: Lars Aksel Opsahl; [email protected]
>Subject: Re: SubtransControlLock and performance problems
>
>Did you have several concurrent sessions accessing the rows that others created?


Thanks, everybody. I have done more testing here.


- I was not able to fix the problem by increasing these values:

  src/include/access/subtrans.h:  #define NUM_SUBTRANS_BUFFERS 8196
  src/include/storage/proc.h:     #define PGPROC_MAX_CACHED_SUBXIDS 128


If I tried to increase PGPROC_MAX_CACHED_SUBXIDS beyond 128, Postgres core
dumped. I tried to increase shared memory and other settings, but I was not able
to get it stable.


With the values above I still saw the same performance problems, and we ended up
with a lot of SubtransControlLock waits.


So I started to change the code based on your feedback.


- What seems to work very well, in combination with a catch-exception-and-retry
pattern, is to insert the data into a separate table for each job. (In the
current test case we reduced the number of SubtransControlLock waits from many
hundreds to almost none.)


Later I can pick up these results from the different tables with another job
that inserts the data into the common data structure, and in this job I don't
have any catch-and-retry pattern. That way I was able to handle 534 of 592
jobs/cells without any SubtransControlLock at all.


But 58 jobs did not finish, so for these I had to use the catch-and-retry
pattern, and then I got the SubtransControlLock problems again, but only for a
limited subset of the data.


Between jobs I also close and reopen the dblink connections.


In this test I used a data set of 619230 surfaces (25909671 in total), and it
finished in 24:42.363 with NUM_SUBTRANS_BUFFERS 8196 and
PGPROC_MAX_CACHED_SUBXIDS 128. When I changed these back to the original values,
the same test took 23:54.973.


To me it seems that in Postgres it is better to have functions that return an
error state together with the result rather than throw exceptions, because
exceptions lead to performance degradation when working with big data sets.
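
For what it is worth, part of why the catch-and-retry pattern is so expensive is
that every plpgsql block with an EXCEPTION clause starts a subtransaction when it
is entered. A minimal sketch of the pattern (table and function names are made up):

  -- Each call enters a block with an EXCEPTION clause, which starts a
  -- subtransaction; called once per row, this is what fills up subtrans.
  CREATE OR REPLACE FUNCTION try_insert(p_id integer) RETURNS boolean
  LANGUAGE plpgsql AS $$
  BEGIN
      INSERT INTO target_table (id) VALUES (p_id);
      RETURN true;
  EXCEPTION WHEN unique_violation THEN
      RETURN false;
  END;
  $$;

  -- Where the conflict can be expressed declaratively, something like this
  -- avoids the subtransaction entirely:
  INSERT INTO target_table (id) VALUES (1) ON CONFLICT DO NOTHING;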


Thanks


Lars



Re: tablespace to benefit from ssd ?

2020-02-19 Thread Justin Pryzby
On Wed, Feb 19, 2020 at 05:42:41AM +0100, Nicolas PARIS wrote:
> Hi
> 
> I have both hdd and ssd disk on the postgres server. The cluster is
> right now created on the hdd only. I am considering using a tablespace
> to put some highly used postgres object on the ssd disk. Of course the
> ssd is small compared to the hdd, and I need to choose carefully what
> objects are stored on that side.
> 
> I am wondering what kind of object (indexes, data) would benefit from
> ssd. The database primary/foreign keys are highly used and there is
> almost no sequential scan. However the server has a large amount of RAM
> and I suspect all of those indexes are already cached in RAM.
> 
> I have read that tablespaces introduce overhead of maintenance and
> introduce complication for replication. But on the other hand I have
> this ssd disk ready for something.

To start with, you can:
ALTER SYSTEM SET temp_tablespaces='ssd';

That will improve the speed of any sorts that spill to disk.
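
To see whether sorts or hashes are spilling at all, the cumulative counters in
pg_stat_database are a quick check (they only grow since the last stats reset):

  SELECT temp_files, temp_bytes
  FROM pg_stat_database
  WHERE datname = current_database();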

+1 to using LVM for purposes of instrumentation.

You can also:
ALTER TABLESPACE ssd SET (random_page_cost=1.0);

It'd be difficult to suggest anything further without knowing about your
workload or performance goals or issues.

-- 
Justin




Re: SubtransControlLock and performance problems

2020-02-19 Thread Alvaro Herrera
On 2020-Feb-19, Lars Aksel Opsahl wrote:

> With the values above I still saw the same performance problems, and we
> ended up with a lot of SubtransControlLock waits.
> 
> So I started to change the code based on your feedbacks.
> 
> - What seems to work very well, in combination with a catch-exception-and-retry
> pattern, is to insert the data into a separate table for each job. (In the
> current test case we reduced the number of SubtransControlLock waits from
> many hundreds to almost none.)

I think at this point your only recourse is to start taking profiles to
see where the time is going.  Without that, you're just flying blind and
whatever you do will not necessarily move the needle at all.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: How to avoid UPDATE performance degradation in a transaction

2020-02-19 Thread Andres Freund
Hi,

On 2020-02-13 16:16:14 -0500, Tom Lane wrote:
> In principle perhaps we could improve the granularity of dead-row
> detection, so that if a row version is both created and deleted by
> the current transaction, and we have no live snapshots that could
> see it, we could go ahead and mark the row dead.  But it's not clear
> that that'd be worth the extra cost to do.  Certainly no existing PG
> release tries to do it.

I've repeatedly wondered about improving our logic around this. There are
a lot of cases where we deal with a lot of bloat solely because of our
simplistic liveness analysis.

This is not just an issue within a single transaction; it also makes the impact
of long-running transactions significantly worse. It's common to have "areas" of
some tables that change quickly without normally causing a lot of problems - but
once there is a single long-running transaction, the amount of bloat created is
huge. It wouldn't be that bad to have the "hot areas" grow by 2-3x, but right now
they will often grow by several orders of magnitude.

But perhaps it doesn't make sense to conflate your suggestion above with
what I brought up: there might not be a lot of common code/infrastructure
between deleting row versions that are invisible because no backend has a
snapshot that can see them (presumably inferred via xmin/xmax), and newly
created row versions within a transaction that are invisible because there
is no snapshot with that cid.
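
A rough illustration of the kind of case being discussed (table and column names
are made up):

  -- Intermediate versions created and replaced inside one transaction are
  -- never visible to any other session, yet they are only marked dead later
  -- by pruning/VACUUM, not at the moment they become unreachable.
  BEGIN;
  UPDATE t SET counter = counter + 1 WHERE id = 1;
  UPDATE t SET counter = counter + 1 WHERE id = 1;  -- ... repeated many times
  COMMIT;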

Greetings,

Andres Freund