Re: Sudden insert performance degradation

2020-07-15 Thread Sebastian Dressler
Hi Henrique,

On 15. Jul 2020, at 03:13, Henrique Montenegro wrote:
[...]

```
ssl = off
shared_buffers = 160GB  # min 128kB
work_mem = 96GB # min 64kB
maintenance_work_mem = 12GB # min 1MB
max_stack_depth = 4MB   # min 100kB
dynamic_shared_memory_type = posix  # the default is the first option
synchronous_commit = off# synchronization level;
commit_delay = 10   # range 0-100000, in microseconds
max_wal_size = 3GB
min_wal_size = 1GB
min_parallel_index_scan_size = 64kB
effective_cache_size = 96GB
log_min_messages = debug1 # values in order of decreasing detail:
log_checkpoints = on
log_error_verbosity = verbose # terse, default, or verbose messages
log_line_prefix = '%m [%p] %q%u@%d '# special values:
log_lock_waits = on # log lock waits >= deadlock_timeout
log_timezone = 'America/New_York'
log_executor_stats = on
datestyle = 'iso, mdy'
```

[...]

  Limit  |  Time (seconds)
---------|----------------
 10      | 0.6
 100     | 0.6
 1000    | 1.3
 10000   | 116.9
 100000  | 134.8
 1000000 | 193.2

Notice the jump in execution time from the 1k limit to the 10k limit: the
amount of data increased 10x, but the execution time increased almost 100x.

It seems to me that inserting the data in this case is slow because identifying
the duplicate records (which I assume is done in a fashion similar to the
queries above) takes a long time.
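
(Just to illustrate what I mean, here is a simplified sketch of a
duplicate-skipping insert, not the exact statement I run; it assumes the
primary key on user_id is what detects the duplicates:)

```
-- simplified sketch: insert new rows, silently skipping user_ids that
-- already exist in the partitioned table (detected via the primary key)
insert into users_basic_profile_no_dups_partitioned (user_id)
select user_id
from users_basic_profile
on conflict do nothing;
```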

I have attached the `explain analyze` output for the 1k and 10k queries to
this email (they are 4k+ lines each; I didn't want to make this message bigger
than it already is).

* exp1k.txt
* exp10k.txt

[...]

I quickly glanced at the exp10k plan and there are some things I noticed (sorry
for not going over the whole mail, I have to re-read it again):

- There are a lot of partitions now; you may want to consider reducing their
number. It seems to me that you are overloading the system: the scan times are
low, but the overhead to start each scan is likely quite high.
- work_mem = 96GB seems very high to me. I guess you would be better off with
e.g. 4GB as a start, but with many more parallel workers. For instance,
depending on your machine, try adjusting max_worker_processes,
max_parallel_workers and max_parallel_workers_per_gather. The values depend a
bit on your system; make sure that max_parallel_workers_per_gather is much
lower than max_parallel_workers, which in turn must be lower than
max_worker_processes. You can try large values, for instance 128, 120 and 12
(see the sketch below).
- You may want to test with min_parallel_table_scan_size = 0.
- Did you enable partition pruning, partitionwise join and partitionwise
aggregate?
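
As a rough sketch (illustrative starting values only; the right numbers depend
on your CPU count and workload, and these are just the standard GUC names, not
values taken from your config):

```
work_mem = 4GB                        # instead of 96GB
max_worker_processes = 128            # changing this needs a restart
max_parallel_workers = 120            # must stay below max_worker_processes
max_parallel_workers_per_gather = 12  # much lower than max_parallel_workers
min_parallel_table_scan_size = 0

# partition-related planner settings to check:
enable_partition_pruning = on         # default is on
enable_partitionwise_join = on        # default is off
enable_partitionwise_aggregate = on   # default is off
```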

Thanks,
Sebastian

--

Sebastian Dressler, Solution Architect
+49 30 994 0496 72 | [email protected]

Swarm64 AS
Parkveien 41 B | 0258 Oslo | Norway
Registered at Brønnøysundregistrene in Norway under Org.-Number 911 662 787
CEO/Geschäftsführer (Daglig Leder): Thomas Richter; Chairman/Vorsitzender 
(Styrets Leder): Dr. Sverre Munck

Swarm64 AS Zweigstelle Hive
Ullsteinstr. 120 | 12109 Berlin | Germany
Registered at Amtsgericht Charlottenburg - HRB 154382 B



Re: Sudden insert performance degradation

2020-07-15 Thread Henrique Montenegro
On Wed, Jul 15, 2020 at 4:03 AM Sebastian Dressler wrote:

> [...]

Hi Sebastian,

That is a good idea about the parallel workers. I have tried to update them
and will post the results as soon as I have them.
Regarding partition pruning, it is set to the default (which is on).
partitionwise_join and partitionwise_aggregate are both set to off; I will
turn them on as well and see how it goes.
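
For reference, this is roughly what I plan to run (session-level first; the
same settings could also go into postgresql.conf):

```
-- check the current values
show enable_partition_pruning;
show enable_partitionwise_join;
show enable_partitionwise_aggregate;

-- enable the two that are currently off
set enable_partitionwise_join = on;
set enable_partitionwise_aggregate = on;
```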

Thanks for the suggestions! I will keep the list updated.

Henrique


Re: Sudden insert performance degradation

2020-07-15 Thread Henrique Montenegro
On Wed, Jul 15, 2020 at 8:24 AM Henrique Montenegro wrote:

> [...]

Changing those parameters had almost no effect on performance. I just
executed the following `SELECT` again:

```
explain analyze
select user_id from users_basic_profile_no_dups_partitioned
where
user_id in (
select user_id from users_basic_profile order by user_id
);
```

I am looking at the plan and seeing things like this:

```
Index Only Scan using ubp_from_1000367_to_1000368_pkey on ubp_from_1000367_to_1000368
    (cost=0.42..1.99 rows=1 width=8) (actual time=3.276..3.276 rows=1 loops=611)
  Index Cond: (user_id = users_basic_profile.user_id)
  Heap Fetches: 0
  Buffers: shared hit=1688 read=146
```

Any idea why the actual time is in the 3ms range? If I query that partition
directly, like this:

```
explain analyze
select user_id from ubp_from_1000367_to_1000368
where
user_id in (
select user_id from users_basic_profile order by user_id
);
```

I get this:

```
->  Index Only Scan using ubp_from_1000367_to_1000368_pkey on ubp
```

Re: Sudden insert performance degradation

2020-07-15 Thread Justin Pryzby
On Wed, Jul 15, 2020 at 02:49:16PM -0400, Henrique Montenegro wrote:
> Any idea why the actual time is in the 3ms range? If I query that partition
> directly, like this:
> [...]
> As you can see, the `actual_time` when querying the partition table
> directly goes to 0.002 which is almost 2000x faster.

Because querying parents of 1000s of tables is slow.
That's improved in v12.  You can read a previous discussion about it here:
https://www.postgresql.org/message-id/[email protected]

But I think you need to know more about partitioning.  It doesn't magically
make things faster for you, and if you just guess, then it's likely to perform
worse for reading and/or writing.

Partitioning only helps for INSERTs if nearly all the insertions happening at a
given time go into a small number of partitions.  Like inserting data
partitioned by "timestamp", where all the new data goes into a partition for
the current date.  Otherwise instead of one gigantic index which doesn't fit in
shared_buffers or RAM, you have some hundreds of indexes which also don't
simultaneously fit into RAM.  That doesn't help writes, and hurts planning
time.
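
For example (a generic sketch, not your schema), data partitioned by timestamp
where all current inserts land in a single partition looks like this:

```
-- generic sketch: range-partition by timestamp so that today's inserts
-- all go into one partition whose index stays hot in memory
create table events (
    event_time timestamptz not null,
    user_id    bigint not null,
    payload    text
) partition by range (event_time);

create table events_2020_07_15 partition of events
    for values from ('2020-07-15') to ('2020-07-16');

create index on events_2020_07_15 (user_id);
```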

-- 
Justin