Re: Extremely slow HashAggregate in simple UNION query

2019-08-22 Thread Felix Geisendörfer



> On 21. Aug 2019, at 20:26, Jeff Janes  wrote:
> 
> As noted elsewhere, v12 thwarts your attempts to deliberately design the bad 
> estimates.  You can still get them, you just have to work a bit harder at it:
> 
> CREATE FUNCTION j (bigint, bigint) returns setof bigint as
>   $$ select generate_series($1,$2) $$ rows 1000 language sql;

Yeah, that's awesome! I didn't know about this until I ran into this issue;
I'll definitely be using it for future estimation problems that are difficult
to fix otherwise!

> I've made an extension which has a function which always returns true, but 
> lies about how often it is expected to return true. See the attachment.  With 
> that, you can fine-tune the planner.
> 
> CREATE EXTENSION pg_selectivities ;

Very cool and useful : )!

I think in most cases I'll be okay with declaring a function with a static ROWS 
estimate, but I'll consider your extension if I need more flexibility in the 
future!
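
For the archives: a static ROWS estimate can also be attached to an existing
function without re-creating it; a minimal sketch, using Jeff's function from
above with an illustrative value:

```
ALTER FUNCTION j(bigint, bigint) ROWS 100;
```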

Thanks
Felix



Re: Erratically behaving query needs optimization

2019-08-22 Thread Luís Roberto Weck

On 22/08/2019 08:51, Barbu Paul - Gheorghe wrote:

That query, if I add the ORDER BY and LIMIT, returns the same results.

The problem is that it behaves the same way regarding its
speed as the original query with the index you suggested.
Sometimes it takes 800ms, sometimes 6s to run; how the hell
can I get it to behave the same every time?
After I added the index you suggested, it was fine for a while; the next
morning the run time exploded back to several seconds per query... and
it oscillates.

On Wed, Aug 21, 2019 at 2:25 PM Luís Roberto Weck
 wrote:

On 21/08/2019 04:30, Barbu Paul - Gheorghe wrote:

I wonder how I missed that... probably because of the "WHERE" clause
in what I already had.

I indexed by scheduler_task_executions.device_id and the new plan is
as follows: https://explain.depesz.com/s/cQRq

Can it be further improved?

Limit  (cost=138511.45..138519.36 rows=2 width=54) (actual
time=598.703..618.524 rows=2 loops=1)
Buffers: shared hit=242389 read=44098
->  Unique  (cost=138511.45..138527.26 rows=4 width=54) (actual
time=598.701..598.878 rows=2 loops=1)
  Buffers: shared hit=242389 read=44098
  ->  Sort  (cost=138511.45..138519.36 rows=3162 width=54)
(actual time=598.699..598.767 rows=2052 loops=1)
Sort Key: results.attribute_id, results."timestamp" DESC
Sort Method: quicksort  Memory: 641kB
Buffers: shared hit=242389 read=44098
->  Gather  (cost=44082.11..138327.64 rows=3162
width=54) (actual time=117.548..616.456 rows=4102 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=242389 read=44098
  ->  Nested Loop  (cost=43082.11..137011.44
rows=1318 width=54) (actual time=47.436..525.664 rows=1367 loops=3)
Buffers: shared hit=242389 read=44098
->  Parallel Hash Join
(cost=43081.68..124545.34 rows=24085 width=4) (actual
time=33.099..469.958 rows=19756 loops=3)
  Hash Cond:
(scheduler_operation_executions.task_execution_id =
scheduler_task_executions.id)
  Buffers: shared hit=32518 read=44098
  ->  Parallel Seq Scan on
scheduler_operation_executions  (cost=0.00..74945.82 rows=2482982
width=8) (actual time=8.493..245.190 rows=1986887 loops=3)
Buffers: shared hit=6018 read=44098
  ->  Parallel Hash
(cost=42881.33..42881.33 rows=16028 width=4) (actual
time=23.272..23.272 rows=13558 loops=3)
Buckets: 65536  Batches: 1
Memory Usage: 2112kB
Buffers: shared hit=26416
->  Parallel Bitmap Heap Scan on
scheduler_task_executions  (cost=722.55..42881.33 rows=16028 width=4)
(actual time=27.290..61.563 rows=40675 loops=1)
  Recheck Cond: (device_id = 97)
  Heap Blocks: exact=26302
  Buffers: shared hit=26416
  ->  Bitmap Index Scan on
scheduler_task_executions_device_id_idx  (cost=0.00..712.93 rows=38467
width=0) (actual time=17.087..17.087 rows=40675 loops=1)
Index Cond: (device_id = 97)
Buffers: shared hit=114
->  Index Scan using
index_operation_execution_id_asc on results  (cost=0.43..0.51 rows=1
width=58) (actual time=0.003..0.003 rows=0 loops=59269)
  Index Cond: (operation_execution_id =
scheduler_operation_executions.id)
  Filter: ((data IS NOT NULL) AND (data
<> ''::text) AND (attribute_id = ANY ('{4,5}'::integer[]))
AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
  Rows Removed by Filter: 0
  Buffers: shared hit=209871
Planning Time: 2.327 ms
Execution Time: 618.935 ms

On Tue, Aug 20, 2019 at 5:54 PM Luís Roberto Weck
 wrote:

On 20/08/2019 10:54, Barbu Paul - Gheorghe wrote:

Hello,
I'm running "PostgreSQL 11.2, compiled by Visual C++ build 1914,
64-bit" and I have a query that runs several times per user action
(9-10 times).
The query takes a long time to execute, specially at first, due to
cold caches I think, but the performance varies greatly during a run
of the application (while applying the said action by the user several
times).

My tables are only getting bigger with time; not many DELETEs and even
fewer UPDATEs, as far as I can tell.

Problematic query:

EXPLAIN (ANALYZE,BUFFERS)
SELECT DISTINCT ON (results.attribute_id) results.timestamp,
results.data FROM results
   JOIN scheduler_opera

Extremely slow count (simple query, with index)

2019-08-22 Thread Marco Colli
Hello!

Any help would be greatly appreciated.
I need to run these simple queries on a table with millions of rows:

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
123;
```

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
123 AND "subscriptions"."trashed_at" IS NULL;
```

The count result for both queries, for project 123, is about 5M.

I have an index in place on `project_id`, and also another index on
`(project_id, trashed_at)`:

```
"index_subscriptions_on_project_id_and_created_at" btree (project_id,
created_at DESC)
"index_subscriptions_on_project_id_and_trashed_at" btree (project_id,
trashed_at DESC)
```

The problem is that both queries are extremely slow and take about 17s each.

These are the results of `EXPLAIN ANALYZE`:


```
  QUERY PLAN

--
 Aggregate  (cost=2068127.29..2068127.30 rows=1 width=0) (actual
time=17342.420..17342.420 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23
rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
 Recheck Cond: (project_id = 123)
 Rows Removed by Index Recheck: 23746378
 Heap Blocks: exact=131205 lossy=1480411
 ->  Bitmap Index Scan on
index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74
rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
   Index Cond: (project_id = 123)
 Planning time: 0.090 ms
 Execution time: 17344.182 ms
(9 rows)
```


```
  QUERY PLAN

--
 Aggregate  (cost=2047881.69..2047881.70 rows=1 width=0) (actual
time=17557.218..17557.218 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19
rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
 Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Rows Removed by Index Recheck: 23746273
 Heap Blocks: exact=131144 lossy=1480409
 ->  Bitmap Index Scan on
index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55
rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
   Index Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Planning time: 0.084 ms
 Execution time: 17558.522 ms
(9 rows)
```

What is the problem?
What can I do to improve the performance (i.e. count in a few seconds)?

I have also tried to increase work_mem from 16MB to 128MB without any
improvement.
Even an approximate count would be enough.
PostgreSQL v9.5


Re: Extremely slow count (simple query, with index)

2019-08-22 Thread MichaelDBA

Hi Marco,

Since you said an approximate count would be good enough, there are two
ways to get one: query pg_class.reltuples or pg_stat_user_tables.n_live_tup.
Personally, I prefer pg_stat_user_tables, since it is more current than
pg_class (unless you run ANALYZE on your target table just before
querying pg_class). Either way you get results in a few milliseconds,
since you do not incur the table-scan cost of counting directly from the
target table.
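
For example (a minimal sketch using the table name from this thread; both
are table-wide estimates only, with no WHERE-clause support):

```
-- Estimate from the last VACUUM/ANALYZE of the table:
SELECT reltuples::bigint AS estimate
FROM pg_class WHERE relname = 'subscriptions';

-- Estimate of live tuples from the statistics collector:
SELECT n_live_tup
FROM pg_stat_user_tables WHERE relname = 'subscriptions';
```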


Regards,
Michael Vitale


Marco Colli wrote on 8/22/2019 8:44 AM:

[...]





Re: Extremely slow count (simple query, with index)

2019-08-22 Thread Ravikumar Reddy
Dear Marco,

Create the indexes below and try it!!!

create index ind_subscriptions_project_id on "subscriptions" ("project_id")
where "project_id" = 123;

create index ind_subscriptions_trashed_at on "subscriptions" ("trashed_at")
where "trashed_at" is null;



On Thu, Aug 22, 2019 at 6:36 PM Marco Colli  wrote:

> [...]

-- 
*Regards,*
*Ravikumar S,*
*Ph: 8106741263*


Re: Erratically behaving query needs optimization

2019-08-22 Thread Barbu Paul - Gheorghe
That query, if I add the ORDER BY and LIMIT, returns the same results.

The problem is that it behaves the same way regarding its
speed as the original query with the index you suggested.
Sometimes it takes 800ms, sometimes 6s to run; how the hell
can I get it to behave the same every time?
After I added the index you suggested, it was fine for a while; the next
morning the run time exploded back to several seconds per query... and
it oscillates.

On Wed, Aug 21, 2019 at 2:25 PM Luís Roberto Weck
 wrote:
> [...]

Re: Erratically behaving query needs optimization

2019-08-22 Thread Barbu Paul - Gheorghe
With ORDER BY, so I get the correct results (first run, 163 seconds):
https://explain.depesz.com/s/j3o1

Unique  (cost=164620.19..164650.19 rows=4 width=54) (actual
time=163953.091..163954.621 rows=2 loops=1)
  Buffers: shared hit=183080 read=103411
  ->  Sort  (cost=164620.19..164635.19 rows=5999 width=54) (actual
time=163953.081..163953.570 rows=4102 loops=1)
Sort Key: results.attribute_id, results."timestamp" DESC
Sort Method: quicksort  Memory: 641kB
Buffers: shared hit=183080 read=103411
->  Nested Loop  (cost=132172.41..164243.74 rows=5999
width=54) (actual time=3054.965..163928.686 rows=4102 loops=1)
  Buffers: shared hit=183074 read=103411
  ->  HashAggregate  (cost=132171.98..132779.88 rows=60790
width=4) (actual time=2484.449..2581.582 rows=59269 loops=1)
Group Key: scheduler_operation_executions.id
Buffers: shared hit=87 read=76529
->  Gather  (cost=44474.33..132020.01 rows=60790
width=4) (actual time=312.503..2463.254 rows=59269 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=87 read=76529
  ->  Parallel Hash Join
(cost=43474.33..124941.01 rows=25329 width=4) (actual
time=124.733..2279.986 rows=19756 loops=3)
Hash Cond:
(scheduler_operation_executions.task_execution_id =
scheduler_task_executions.id)
Buffers: shared hit=87 read=76529
->  Parallel Seq Scan on
scheduler_operation_executions  (cost=0.00..74948.21 rows=2483221
width=8) (actual time=0.126..1828.461 rows=1986887 loops=3)
  Buffers: shared hit=2 read=50114
->  Parallel Hash
(cost=43263.67..43263.67 rows=16853 width=4) (actual
time=123.631..123.631 rows=13558 loops=3)
  Buckets: 65536  Batches: 1
Memory Usage: 2144kB
  Buffers: shared hit=1 read=26415
  ->  Parallel Bitmap Heap Scan on
scheduler_task_executions  (cost=757.90..43263.67 rows=16853 width=4)
(actual time=6.944..120.405 rows=13558 loops=3)
Recheck Cond: (device_id = 97)
Heap Blocks: exact=24124
Buffers: shared hit=1 read=26415
->  Bitmap Index Scan on
scheduler_task_executions_device_id_idx  (cost=0.00..747.79 rows=40448
width=0) (actual time=13.378..13.378 rows=40675 loops=1)
  Index Cond: (device_id = 97)
  Buffers: shared read=114
  ->  Index Scan using index_operation_execution_id_asc on
results  (cost=0.43..0.51 rows=1 width=58) (actual time=2.720..2.720
rows=0 loops=59269)
Index Cond: (operation_execution_id =
scheduler_operation_executions.id)
Filter: ((data IS NOT NULL) AND (data <>
''::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND
(object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
Rows Removed by Filter: 0
Buffers: shared hit=182987 read=26882
Planning Time: 349.908 ms
Execution Time: 163962.314 ms


With ORDER BY (on the second run, 0.6 seconds):
https://explain.depesz.com/s/QZ1Z
Unique  (cost=164620.19..164650.19 rows=4 width=54) (actual
time=621.057..621.527 rows=2 loops=1)
  Buffers: shared hit=236659 read=49826
  ->  Sort  (cost=164620.19..164635.19 rows=5999 width=54) (actual
time=621.056..621.188 rows=4102 loops=1)
Sort Key: results.attribute_id, results."timestamp" DESC
Sort Method: quicksort  Memory: 641kB
Buffers: shared hit=236659 read=49826
->  Nested Loop  (cost=132172.41..164243.74 rows=5999
width=54) (actual time=503.577..619.250 rows=4102 loops=1)
  Buffers: shared hit=236659 read=49826
  ->  HashAggregate  (cost=132171.98..132779.88 rows=60790
width=4) (actual time=503.498..513.551 rows=59269 loops=1)
Group Key: scheduler_operation_executions.id
Buffers: shared hit=26790 read=49826
->  Gather  (cost=44474.33..132020.01 rows=60790
width=4) (actual time=65.499..489.396 rows=59269 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=26790 read=49826
  ->  Parallel Hash Join
(cost=43474.33..124941.01 rows=25329 width=4) (actual
time=22.059..441.847 rows=19756 loops=3)
Hash Cond:
(scheduler_operation_executions.task_execution_id =
scheduler_task_executions.id)
Buffers: shared hit=26790 read=4982

Re: Erratically behaving query needs optimization

2019-08-22 Thread Barbu Paul - Gheorghe
If I restart the PostgreSQL server, then the performance is bad:
several seconds, up to one or two hundred seconds.
This is reflected in the "buffers read" indicator, which is >0 when
performance is bad for the first "Index Scan using
index_operation_execution_id_asc on
results".

Probably this explains the oscillations in running time as well?
The cache gets filled after the first run, hence the performance improves;
then, as the system runs, the relevant pages get evicted and performance
for this particular query degrades again.
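
If so, one option (a sketch, not something tried in this thread) would be to
warm the cache explicitly after a restart using the pg_prewarm contrib module:

```
-- Loads the hot index into shared_buffers (PostgreSQL >= 9.4):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('index_operation_execution_id_asc');
```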

On Thu, Aug 22, 2019 at 3:22 PM Barbu Paul - Gheorghe
 wrote:
> [...]

Re: Extremely slow count (simple query, with index)

2019-08-22 Thread Justin Pryzby
On Thu, Aug 22, 2019 at 02:44:15PM +0200, Marco Colli wrote:
> SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
> SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 
> AND "subscriptions"."trashed_at" IS NULL;

>->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 
> rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
>  Recheck Cond: (project_id = 123)
>  Rows Removed by Index Recheck: 23746378
>  Heap Blocks: exact=131205 lossy=1480411
>  ->  Bitmap Index Scan on 
> index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74 
> rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)

>->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 
> rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
>  Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
>  Rows Removed by Index Recheck: 23746273
>  Heap Blocks: exact=131144 lossy=1480409
>  ->  Bitmap Index Scan on 
> index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55 
> rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)

You can see it used the same index in both cases, and the index scan was
reasonably fast (compared to your goal), but the heap component was slow.

I suggest running VACUUM FREEZE on the table, to try to encourage index-only
scans.  If that works, you should consider setting an aggressive autovacuum
parameter, at least for the table:
ALTER TABLE subscriptions SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);
-- And possibly lower value of autovacuum_freeze_max_age

Or, run a manual vacuum, possibly during quiet hours (possibly setting
vacuum_freeze_table_age to encourage aggressive vacuum).
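
A sketch of that manual variant, using the table from this thread:

```
-- Freezes tuples and updates the visibility map (which index-only scans need):
VACUUM (FREEZE, ANALYZE) subscriptions;
```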

> Even an approximate count would be enough.

You can SELECT reltuples FROM pg_class WHERE oid='subscriptions'::oid, but its
accuracy depends on frequency of vacuum (and if a large delete/insert happened
since the most recent vacuum/analyze).

Justin




Re: Extremely slow count (simple query, with index)

2019-08-22 Thread Michael Lewis
>
> You can SELECT reltuples FROM pg_class WHERE oid='subscriptions'::oid, but
> its
> accuracy depends on frequency of vacuum (and if a large delete/insert
> happened
> since the most recent vacuum/analyze).
>

This only seems helpful for finding an approximate count for the entire
table, without considering the WHERE condition.

Marco,
As Justin pointed out, you have most of your time in the bitmap heap scan.
Are you running SSDs? I wonder about tuning effective_io_concurrency to
make more use of them.

"Currently, this setting only affects bitmap heap scans."
https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
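
A hedged sketch of that tuning (the value is illustrative, not a
recommendation from this thread; the setting can also be changed per session):

```
-- Higher values suit SSDs; takes effect after a config reload:
ALTER SYSTEM SET effective_io_concurrency = 200;
SELECT pg_reload_conf();
```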

Also, how many million rows is this table in total? Have you considered
partitioning?


Re: Erratically behaving query needs optimization

2019-08-22 Thread legrand legrand
Hello,

1/ accessing scheduler_task_executions by index with device_id = 97
seems OK

2/ I don't understand why joining
scheduler_task_executions.id = scheduler_operation_executions.task_execution_id
is done using a parallel hash join
when a nested loop would be better (given the number of rows involved).

Maybe it is because the index on scheduler_operation_executions.task_execution_id,

"index_task_execution_id_desc" btree (task_execution_id DESC NULLS LAST)

is not usable, or bloated, or because of DESC NULLS LAST?
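
One way to test that hypothesis would be to add a plain ascending index on
the join column and compare plans (a sketch; the index name is illustrative,
not from this thread):

```
CREATE INDEX index_task_execution_id_asc_test
    ON scheduler_operation_executions (task_execution_id);
```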


3/ join with results.operation_execution_id
by index
seems OK

Regards
PAscal







Re: Extremely slow count (simple query, with index)

2019-08-22 Thread Marco Colli
I have completely solved it (from 17s to 1s) by running this command:
vacuum analyze subscriptions;

Now I run the autovacuum more frequently using these settings in
postgresql.conf:
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.01

Thanks to everyone - and in particular to Justin Pryzby for pointing me in
the right direction.

On Thu, Aug 22, 2019 at 7:37 PM Michael Lewis  wrote:

> [...]


Re: Extremely slow count (simple query, with index)

2019-08-22 Thread Justin Pryzby
On Thu, Aug 22, 2019 at 07:54:57PM +0200, Marco Colli wrote:
> I have completely solved (from 17s to 1s) by running this command:
> vacuum analyze subscriptions;

Thanks for following through.

On Thu, Aug 22, 2019 at 08:19:10AM -0500, Justin Pryzby wrote:
> You can see it used the same index in both cases, and the index scan was
> reasonably fast (compared to your goal), but the heap component was slow.
> 
> I suggest to run VACUUM FREEZE on the table, to try to encourage index only
> scan.  If that works, you should condider setting aggressive autovacuum

I should've used a better word, since aggressive means something specific.
Perhaps just: "parameter to encourage more frequent autovacuums".

> parameter, at least for the table:
> ALTER TABLE subscriptions SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);
> -- And possibly lower value of autovacuum_freeze_max_age
> 
> Or, running manual vacuum possibly during quiet hours (possibly setting
> vacuum_freeze_table_age to encourage aggressive vacuum).

I think my references to autovacuum_freeze_max_age and vacuum_freeze_table_age
were incorrect; what's important is "relallvisible", not "relfrozenxid",
and xid wraparound isn't at issue here.

> > Even an approximate count would be enough.
> 
> You can SELECT reltuples FROM pg_class WHERE oid='subscriptions'::oid, but its

Should be: oid='subscriptions'::regclass
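
That is, the corrected query would read:

```
SELECT reltuples::bigint FROM pg_class WHERE oid = 'subscriptions'::regclass;
```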

> accuracy depends on frequency of vacuum (and if a large delete/insert happened
> since the most recent vacuum/analyze).

Justin