Re: Performance loss after upgrading from 12.15 to 17.2

2025-02-03 Thread Laurenz Albe
On Mon, 2025-02-03 at 15:11 +0100, Tobias Orlamünde wrote:
> We are currently in the process of upgrading from 12.15 with Timescale
> 2.11 to 17.2 with Timescale 2.17.2.
> On our pre-prod env we have already upgraded and noticed a remarkable
> performance issue compared to the not yet upgraded production environment.
> If we run the exact same query in our pre-prod env, the execution time
> increased from ~250 ms to over 377k ms.
> Changing random_page_cost from 1.1 to 4 does not help, but changing
> work_mem from 64 MB to 256 MB does, whereas any value lower than 256 MB
> does not. The prod server is configured with work_mem = 50 MB and
> executes the query in 13 ms.
> 
> The Hypertable has been created on column received_time with dimension Time.
> 
> In general, we see that if we lower the filter period on
> tick.received_time to ~14 days, the query is performant (on pre-prod).
> In prod, neither increasing nor decreasing this period significantly
> changes the execution time.
> 
> My first assumption is that accessing the compressed chunks is eating
> up all the time (which we can also see in the explain statements
> below). I tend to suspect the compression methods, which, IIRC,
> changed significantly from 12 to 17.
> 
> Maybe someone could have a look at this and point me to the right spot
> for further examination, or even help solve this issue?

The difference is here (there are two instances per query):

Bad:

  ->  Seq Scan on compress_hyper_6_106_chunk  (cost=0.00..6428297.17 rows=2518 width=321) (actual time=196292.784..196292.784 rows=0 loops=1)
        Filter: ((_ts_meta_min_1 <= '2025-01-29 14:31:36'::timestamp without time zone) AND (_ts_meta_max_1 > '2025-01-01 14:31:36'::timestamp without time zone) AND ((xx_id)::text = 'XS2991917530'::text))
        Rows Removed by Filter: 30492771
        Buffers: shared read=5894720

Good:

  ->  Index Scan using compress_hyper_6_106_chunk_xx_id_xx_feed_id__ts_meta_min_1_idx on compress_hyper_6_106_chunk  (cost=0.56..1571.33 rows=2518 width=321) (actual time=0.010..0.010 rows=0 loops=1)
        Index Cond: (((xx_id)::text = 'XS2991917530'::text) AND (_ts_meta_min_1 <= '2025-01-29 14:31:36'::timestamp without time zone) AND (_ts_meta_max_1 > '2025-01-01 14:31:36'::timestamp without time zone))
        Buffers: shared hit=4

As a first measure, I would run

   ANALYZE compress_hyper_6_106_chunk;

or analyze the partitioned table.  It might well be that the statistics are off.
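
To see whether the chunk has current statistics at all, something like this
should tell you (just a sketch using the standard catalog views, nothing
Timescale-specific):

   -- planner's row estimate for the compressed chunk
   SELECT relname, reltuples
   FROM pg_class
   WHERE relname = 'compress_hyper_6_106_chunk';

   -- when it was last analyzed (manually or by autovacuum)
   SELECT last_analyze, last_autoanalyze
   FROM pg_stat_user_tables
   WHERE relname = 'compress_hyper_6_106_chunk';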

If that doesn't help, it would be interesting to run the query with the
low "work_mem" setting, but with "enable_seqscan = off" (a sketch follows
the questions below).

- Does PostgreSQL choose the correct index then?
- What are the cost estimates for the index scan?
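
For example, in a test session (just a sketch; the settings are
session-local, and the query is the one from your original mail):

   SET work_mem = '64MB';
   SET enable_seqscan = off;
   EXPLAIN (ANALYZE, BUFFERS)
   SELECT ...;             -- the query from the original mail
   RESET enable_seqscan;
   RESET work_mem;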

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Performance loss after upgrading from 12.15 to 17.2

2025-02-03 Thread Tobias Orlamünde

Hi everyone.

We are currently in the process of upgrading from 12.15 with Timescale
2.11 to 17.2 with Timescale 2.17.2.
On our pre-prod env we have already upgraded and noticed a remarkable
performance issue compared to the not yet upgraded production environment.
If we run the exact same query in our pre-prod env, the execution time
increased from ~250 ms to over 377k ms.
Changing random_page_cost from 1.1 to 4 does not help, but changing
work_mem from 64 MB to 256 MB does, whereas any value lower than 256 MB
does not. The prod server is configured with work_mem = 50 MB and
executes the query in 13 ms.


The Hypertable has been created on column received_time with dimension Time.
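
For reference, that roughly corresponds to a call like the following (a
sketch only; the chunk interval and other options on the real table may
differ):

   SELECT create_hypertable('xx.xx_tick_data_entity', 'received_time');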

In general, we see that if we lower the filter period on
tick.received_time to ~14 days, the query is performant (on pre-prod).
In prod, neither increasing nor decreasing this period significantly
changes the execution time.


My first assumption is that accessing the compressed chunks is eating
up all the time (which we can also see in the explain statements
below). I tend to suspect the compression methods, which, IIRC,
changed significantly from 12 to 17.


Maybe someone could have a look at this and point me to the right spot
for further examination, or even help solve this issue?


Best,
Tobias

The query is as follows (I had to anonymize certain things, therefore I 
used xx):

select   ins.xx_id,
         tick.exchtim,
         tick.received_time,
         tick.ask,
         tick.ask_spread,
         tick.ask_zsprd,
         tick.bid,
         tick.bid_spread,
         tick.bid_zsprd,
         tick.oas,
         ins.currency
from     xx.xx_tick_data_entity tick,
         xx.xx_reference_data_entity ins
where    ins.xx_id = 'XS2991917530' and
         tick.xx_id = 'XS2991917530' and
         tick.received_time <= '2025-01-29T14:31:36' and
         tick.received_time > '2025-01-01T14:31:36'
order by tick.received_time desc limit 1;

The following settings are on the hypertable (a sketch of the equivalent policy calls follows the list):
pre-prod:
- policy_compression => compress_after 14 days
- policy_retention => drop_after 90 days

prod:
- policy_compression => compress_after 90 days
- policy_retention => drop_after 10 years
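
For reference, these settings roughly correspond to policy calls like the
following (a sketch only, shown for pre-prod; the actual options on the real
tables may differ):

   SELECT add_compression_policy('xx.xx_tick_data_entity', INTERVAL '14 days');
   SELECT add_retention_policy('xx.xx_tick_data_entity', INTERVAL '90 days');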


Config for the pre-prod env:
Server:
- VM running RHEL 9.5
- 8 cores
- 32 GB RAM
- 3 TB LVM (mounted as PGDATA) with XFS, attached via 4x100 GBit/s NFS to a VMware server

postgresql.conf:
- max_connections = 100
- shared_buffers = 8 GB
- work_mem = 64 MB
- maintenance_work_mem = 2 GB
- effective_cache_size = 22 GB
- archive_mode = on
- random_page_cost = 1.1
- effective_io_concurrency = 200
- default_statistics_target = 200
- max_worker_processes = 16
- max_parallel_workers = 16
- max_parallel_workers_per_gather = 4
- timescaledb.max_background_workers = 25

Config for the prod env:
Server:
- VM running RHEL 9.4
- 16 cores
- 128 GB RAM
- 9 TB LVM (mounted as PGDATA) with XFS, attached via 4x100 GBit/s NFS to a VMware server

postgresql.conf:
- max_connections = 250
- shared_buffers = 24 GB
- work_mem = 50 MB
- maintenance_work_mem = 2 GB
- effective_cache_size = 68 GB
- archive_mode = on
- random_page_cost = 1.1
- effective_io_concurrency = 200
- default_statistics_target = 200
- max_worker_processes = 32
- max_parallel_workers = 32
- max_parallel_workers_per_gather = 8
- timescaledb.max_background_workers = 16

Explain (analyze, buffers) on the pre-prod server with work_mem = 64 MB:
Limit  (cost=0.85..3.41 rows=1 width=89) (actual time=377174.493..377174.498 rows=0 loops=1)
  Buffers: shared hit=12 read=12328808
  ->  Nested Loop  (cost=0.85..13555863.74 rows=5293428 width=89) (actual time=377174.492..377174.497 rows=0 loops=1)
        Buffers: shared hit=12 read=12328808
        ->  Custom Scan (ChunkAppend) on xx_tick_data_entity tick  (cost=0.56..13489693.39 rows=5293428 width=72) (actual time=377174.491..377174.495 rows=0 loops=1)
              Order: tick.received_time DESC
              Buffers: shared hit=12 read=12328808
              ->  Index Scan Backward using "101_28_xx_tick_data_entity_pkey" on _hyper_2_101_chunk tick_1  (cost=0.56..3159.73 rows=2789 width=72) (actual time=0.009..0.009 rows=0 loops=1)
                    Index Cond: (((xx_id)::text = 'XS2991917530'::text) AND (received_time <= '2025-01-29 14:31:36'::timestamp without time zone) AND (received_time > '2025-01-01 14:31:36'::timestamp without time zone))
                    Buffers: shared hit=4
              ->  Index Scan Backward using "89_25_xx_tick_data_entity_pkey" on _hyper_2_89_chunk tick_2  (cost=0.56..1856.80 rows=1639 width=72) (actual time=0.016..0.016 rows=0 loops=1)
                    Index Cond: (((xx_id)::text = 'XS2991917530'::text) AND (received_time <= '2025-01