Re: 600 million rows of data. Bad hardware or need partitioning?
On Sun, May 03, 2020 at 11:51:44PM -0400, Arya F wrote:
> On Sun, May 3, 2020 at 11:46 PM Michael Lewis wrote:
> > What kinds of storage (ssd or old 5400 rpm)? What else is this machine
> > running?
>
> Not an SSD, but an old 1TB 7200 RPM HDD
>
> > What configs have been customized such as work_mem or random_page_cost?
>
> work_mem = 2403kB
> random_page_cost = 1.1

I mentioned in February and March that you should plan to set shared_buffers
to fit the indexes currently being updated. Partitioning can help with that
*if* the writes mostly affect 1-2 partitions at a time (otherwise not).

On Wed, Feb 05, 2020 at 11:15:48AM -0600, Justin Pryzby wrote:
> > Would that work? Or any recommendations how I can achieve good performance
> > for a lot of writes?
>
> Can you use partitioning so the updates are mostly affecting only one table at
> once, and its indices are of reasonable size, such that they can fit easily in
> shared_buffers.

On Sun, Mar 22, 2020 at 08:29:04PM -0500, Justin Pryzby wrote:
> On Sun, Mar 22, 2020 at 09:22:50PM -0400, Arya F wrote:
> > I have noticed that my write/update performance starts to dramatically
> > reduce after about 10 million rows on my hardware. The reason for the
> > slowdown is the index updates on every write/update.
>
> It's commonly true that the indexes need to fit entirely in shared_buffers for
> good write performance. I gave some suggestions here:
> https://www.postgresql.org/message-id/20200223101209.GU31889%40telsasoft.com
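For what it's worth, a minimal sketch of that kind of layout (the table and
column names here are invented for illustration, not from the thread):
range-partition on the key the writes cluster around, so that only the
current partition's indexes need to stay hot in shared_buffers.

  CREATE TABLE measurements (
      id         bigint NOT NULL,
      created_at timestamptz NOT NULL,
      payload    text
  ) PARTITION BY RANGE (created_at);

  -- one partition per month; inserts/updates land almost entirely here
  CREATE TABLE measurements_2020_05 PARTITION OF measurements
      FOR VALUES FROM ('2020-05-01') TO ('2020-06-01');

  CREATE INDEX ON measurements_2020_05 (id);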
Re: Recursive query slow on strange conditions
Hello,

I have rewritten the function/query to make it a PL/pgSQL function and split the query into ~20 smaller queries. Now the problem of the JIT compiler kicking in also happens on PG 11.6.

Although the 2-second induced delay is not a serious problem when I execute the query for thousands of items, it really becomes one when querying ONE item.

Is there a way to disable JIT (I use the apt.postgresql.org repository) in both 11.6 and 12.2? I would have liked to disable it on this particular query, but maybe I could live with disabling JIT everywhere.

Thanks for your help,

JC
Re: Recursive query slow on strange conditions
On Mon, May 4, 2020 at 9:12 AM Jean-Christophe Boggio <[email protected]> wrote:
> Is there a way to disable JIT (I use the apt.postgresql.org repository)
> in both 11.6 and 12.2 ? I would have liked to disable it on this
> particular query but maybe I could live with disabling JIT everywhere.

https://www.postgresql.org/docs/12/jit-decision.html

David J.
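For reference, a rough sketch of the relevant settings (not from the thread;
the page above explains the cost thresholds behind the automatic decision):

  SET jit = off;                -- current session only
  ALTER SYSTEM SET jit = off;   -- instance-wide, written to postgresql.auto.conf
  SELECT pg_reload_conf();      -- pick up the change without a restart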
Re: NUMA settings
Hi,

On 2020-04-29 10:50:54 +0200, Laurenz Albe wrote:
> On Wed, 2020-04-29 at 08:54 +0200, Marc Rechté wrote:
> > I am trying to figure out the recommended settings for a PG dedicated
> > machine regarding NUMA.
> >
> > I assume that the shared buffers are using Huge Pages only. Please
> > correct if I am wrong:
> >
> > 1) postgres is started with numactl --interleave=all, in order to spread
> > memory pages evenly on nodes.
> > 2) vm.swappiness is left to the default 60 value, because Huge Pages
> > never swap, and we wish the idle backends to be swapped out if necessary.
> > 3) vm.zone_reclaim_mode = 0. I am not sure it is the right choice.
> > 4) kernel.numa_balancing = 1. Only if it is confirmed that it will not
> > affect postgres, because started with the interleave policy.
>
> I am not the top expert on this, but as far as I can tell:
>
> - Disabling NUMA is good if you want to run a single database cluster
>   on the machine that should use all resources.
>
>   If you want to run several clusters that share the resources, leaving
>   NUMA support enabled might be the better thing to do.
>
> - If you can, disable NUMA in the BIOS, on as low a level as possible.

I am doubtful that that's generally going to be beneficial. I think the
strategy of starting postgres with interleave is probably a better answer.

- Andres
Re: Recursive query slow on strange conditions
> https://www.postgresql.org/docs/12/jit-decision.html

Thanks a lot David, I missed that part of the doc.

JC
Re: Duplicate WHERE condition changes performance and plan
Why not vacuum analyze both tables to ensure stats are up to date?

Have you customized default_statistics_target from 100? It may be that 250 would give you a more complete sample of the table without increasing the size of the stats tables so much that planning time increases hugely.

Do you know if any of these columns are correlated? Custom stats with CREATE STATISTICS may help the planner make better decisions if so.

I usually hesitate to put any boolean field in an index. Do you only need the rows where proposal.has_been_anonymised is false? If so, you could add that as a WHERE condition on the index (a partial index) instead of including it as the leading column.
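If rough SQL helps, something along these lines (the statistics target, the
correlated-column pair, and the indexed column are illustrative guesses; only
proposal.has_been_anonymised comes from the thread):

  -- per-column alternative to raising default_statistics_target globally
  ALTER TABLE proposal ALTER COLUMN some_column SET STATISTICS 250;
  ANALYZE proposal;

  -- extended statistics if two columns are known to be correlated
  CREATE STATISTICS proposal_dep_stats (dependencies)
      ON column_a, column_b FROM proposal;
  ANALYZE proposal;

  -- partial index instead of a leading boolean column
  CREATE INDEX proposal_pending_idx ON proposal (created_at)
      WHERE has_been_anonymised = false;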
Re: Please help! Query jumps from 1s -> 4m
On Mon, May 04, 2020 at 08:07:07PM +0100, Jamie Thompson wrote:
> Additionally, the execution plans for the 10th + following queries look
> fine, they have the same structure as if I run the query manually. It's not
> that the query plan switches, it seems as though the same query plan is
> just > 200X slower than usual.

Are you able to reproduce the problem manually ? With/without PREPARE ?
https://www.postgresql.org/docs/current/sql-prepare.html

--
Justin
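If it helps, a bare-bones way to reproduce via PREPARE (the query below is a
placeholder, not the thread's actual statement; the generic-plan switch is
normally considered from the sixth execution onward):

  PREPARE q(text) AS SELECT * FROM table1 WHERE col20 = $1;
  EXECUTE q('some value');   -- repeat 6+ times and compare timings
  DEALLOCATE q;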
Re: Please help! Query jumps from 1s -> 4m
On Mon, 4 May 2020 at 02:35, James Thompson wrote:
> buffers do look different - but still, reading 42k doesn't seem like it would
> cause a delay of 4m?

You could do:

SET track_io_timing TO on;

then:

EXPLAIN (ANALYZE, BUFFERS) your query

and see if the time is spent doing IO.

David
Re: Please help! Query jumps from 1s -> 4m
The change is abrupt, on the 10th execution (but I hadn't spotted it was always after the same number of executions until your suggestion - thanks for pointing me in that direction). I don't see any custom configuration on our end that changes the threshold for this from 5 -> 10. Debugging the query call I also see that PgConnection has the prepareThreshold set to 5.

Additionally, the execution plans for the 10th and following queries look fine, they have the same structure as if I run the query manually. It's not that the query plan switches, it seems as though the same query plan is just > 200X slower than usual.

As for the heap fetches -> as far as I can tell, on both occasions the fetches are relatively low and shouldn't account for minutes of execution (even if one is lower than the other). Looking through one day's logs I do find cases with lower heap fetches too, for example the one below, which has 1977 fetches instead of the previous 6940 but took approximately the same time:

->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual time=56.858..120893.874 rows=67000 loops=1)
      Index Cond: (col20 = $2005)
      Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY ((ARRAY[$1004, ..., $2003])::text[])))
      Rows Removed by Filter: 2662793
      Heap Fetches: 1977
      Buffers: shared hit=84574 read=3522

Would you agree the statement threshold / heap fetches seem unlikely to be causing this? Any other thoughts?

Thanks!

On Sun, 3 May 2020 at 16:38, Justin Pryzby wrote:
> On Sun, May 03, 2020 at 09:58:27AM +0100, James Thompson wrote:
> > Hi,
> >
> > Hoping someone can help with this performance issue that's been driving a
> > few of us crazy :-) Any guidance greatly appreciated.
> >
> > A description of what you are trying to achieve and what results you
> > expect.:
> > - I'd like to get an understanding of why the following query (presented
> > in full, but there are specific parts that are confusing me) starts off
> > taking ~second in duration but 'switches' to taking over 4 minutes.
>
> Does it "switch" abruptly or do you get progressively slower queries ?
> If it's abrupt following the 5th execution, I guess you're hitting this:
>
> https://www.postgresql.org/message-id/[email protected]
> https://www.postgresql.org/message-id/a737b7a37273e048b164557adef4a58b50fb8...@ntex2010i.host.magwien.gv.at
>
> > - we initially saw this behaviour for the exact same sql with a different
> > index that resulted in an index scan. To try and fix the issue we've
> > created an additional index with additional included fields so we now have
> > Index Only Scans, but are still seeing the same problem.
> >
> > Segments of interest:
> > 1.
> > ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> > table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual
> > time=110.539..123828.134 rows=67000 loops=1)
> >       Index Cond: (col20 = $2005)
> >       Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY
> > ((ARRAY[$1004, ..., $2003])::text[])))
> >       Rows Removed by Filter: 2662652
> >       Heap Fetches: 6940
> >       Buffers: shared hit=46619 read=42784 written=52
> >
> > If I run the same queries now:
> > Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> > table1alias1  (cost=0.56..2549.69 rows=69 width=36)
> > (actual time=1.017..1221.375 rows=67000 loops=1)
> > Heap Fetches: 24
> > Buffers: shared hit=2849 read=2483
>
> It looks to me like you're getting good performance following a vacuum, when
> Heap Fetches is low.  So you'd want to run vacuum more often, like:
> | ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor=0.005).
>
> But maybe I've missed something - you showed the bad query plan, but not the
> good one, and I wonder if they may be subtly different, and that's maybe
> masked by the replaced identifiers.
>
> --
> Justin
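If it is a vacuum/visibility-map effect, one way to check whether the slow
runs line up with stale vacuum state (a sketch; "table1" is the thread's
anonymised table name):

  SELECT relname, last_vacuum, last_autovacuum, n_dead_tup, n_live_tup
  FROM pg_stat_user_tables
  WHERE relname = 'table1';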
good book or any other resources for Postgresql
Hi,

I have been an Oracle DBA for 20+ years and am well versed in Oracle internals and all related details: performance optimization, replication, etc. I'm looking to acquire similar expertise for PostgreSQL. I am now using Aurora PostgreSQL and am looking for excellent technical books on PostgreSQL internals, the optimizer, debugging techniques, and replication internals. Or any other resources as appropriate. Please suggest. Thanks in advance.

I saw a few books:

1. PostgreSQL for DBA volume 1: Structure and Administration
   ISBN-13: 978-1791794125 / ISBN-10: 1791794122
2. PostgreSQL for DBA: PostgreSQL 12
   ISBN-13: 978-1796506044 / ISBN-10: 1796506044
3. The Art of PostgreSQL, by Dimitri Fontaine
4. PostgreSQL 11 Administration Cookbook, by Simon Riggs, Gianni Ciolli, et al.
   https://www.packtpub.com/big-data-and-business-intelligence/postgresql-11-administration-cookbook

Thanks.
Bhupendra B Babu
Re: good book or any other resources for Postgresql
I don't know the others, but have enjoyed and learned a great deal from The Art of PostgreSQL.
Re: good book or any other resources for Postgresql
We are currently engaged in an Oracle to Postgres migration. Our DBA team has been going through this book and we have learned a lot from it.

PostgreSQL 12 High Availability Cookbook - Third Edition
https://www.packtpub.com/data/postgresql-12-high-availability-cookbook-third-edition

On Mon, May 4, 2020 at 5:42 PM Michael Lewis wrote:
> I don't know the others, but have enjoyed and learned a great deal from
> The Art of PostgreSQL.

--
Craig
Re: good book or any other resources for Postgresql
Thanks all for the suggestions.

On Mon, May 4, 2020 at 4:46 PM Craig Jackson wrote:
> We are currently engaged in an Oracle to Postgres migration. Our DBA team
> has been going through this book and we have learned a lot from it.
>
> PostgreSQL 12 High Availability Cookbook - Third Edition
> https://www.packtpub.com/data/postgresql-12-high-availability-cookbook-third-edition
>
> On Mon, May 4, 2020 at 5:42 PM Michael Lewis wrote:
> > I don't know the others, but have enjoyed and learned a great deal from
> > The Art of PostgreSQL.
>
> --
> Craig

--
Thanks.
Bhupendra B Babu
Re: NUMA settings
> Hi,
>
> On 2020-04-29 10:50:54 +0200, Laurenz Albe wrote:
> > On Wed, 2020-04-29 at 08:54 +0200, Marc Rechté wrote:
> > > I am trying to figure out the recommended settings for a PG dedicated
> > > machine regarding NUMA.
> > >
> > > I assume that the shared buffers are using Huge Pages only. Please
> > > correct if I am wrong:
> > >
> > > 1) postgres is started with numactl --interleave=all, in order to spread
> > > memory pages evenly on nodes.
> > > 2) vm.swappiness is left to the default 60 value, because Huge Pages
> > > never swap, and we wish the idle backends to be swapped out if necessary.
> > > 3) vm.zone_reclaim_mode = 0. I am not sure it is the right choice.
> > > 4) kernel.numa_balancing = 1. Only if it is confirmed that it will not
> > > affect postgres, because started with the interleave policy.
> >
> > I am not the top expert on this, but as far as I can tell:
> >
> > - Disabling NUMA is good if you want to run a single database cluster
> >   on the machine that should use all resources.
> >
> >   If you want to run several clusters that share the resources, leaving
> >   NUMA support enabled might be the better thing to do.
> >
> > - If you can, disable NUMA in the BIOS, on as low a level as possible.
>
> I am doubtful that that's generally going to be beneficial. I think the
> strategy of starting postgres with interleave is probably a better answer.
>
> - Andres

Thanks for the answers. Further reading makes me think that we should *not* start postgres with numactl --interleave=all: this may have a counter-productive effect on the backends' anonymous memory (heap, stack).

IMHO, what is important is to use Huge Pages for the shared buffers: they are allocated (reserved) by the kernel at boot time and spread evenly across all nodes. On top of that, they never swap.

My (tentative) conclusions are the following:

vm.zone_reclaim_mode = 0
kernel.numa_balancing = 0 (still not sure about that choice)
vm.swappiness = 60 (default)
start postgres as usual (no numactl)
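On the huge-pages side, a minimal way to enforce and verify this from
PostgreSQL itself (a sketch, not from the thread; huge_pages = on makes the
server refuse to start rather than silently fall back when the pages are not
available):

  ALTER SYSTEM SET huge_pages = 'on';
  -- restart required, then:
  SHOW huge_pages;
  SHOW shared_buffers;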
