Index on (fixed size) bytea value

2023-06-19 Thread Les
   Dear fellow list members,

I'm in the process of implementing a file storage system that is based on
PostgreSQL and streaming replication. There will possibly be many similar
files stored. I would like to implement block-level deduplication: each
file consists of a series of blocks, and each unique block is stored only
once (i.e. one block may be reused by multiple files). It will be part of a
larger piece of software, i.e. the same database will also be used for
other purposes.

Here is the basic idea for storing individual blocks:


create table block(
  id uuid not null primary key,
  block bytea not null,
  hs256 bytea not null
);
create unique index uidx_block_hs256 on block(hs256);

-- digest() comes from the pgcrypto extension
create or replace function trg_biu_block() returns trigger language plpgsql
as
$function$
begin
  new.hs256 = digest(new.block, 'sha256');
  return new; -- a BEFORE row trigger must return the (possibly modified) row
end;
$function$;

create trigger trg_biu_block before insert or update on block for each row
execute procedure trg_biu_block();
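
(To illustrate how the unique hs256 index would drive the deduplication,
here is a hedged sketch of an insert-or-reuse statement; the :block_data
parameter and the no-op conflict update are illustrative, not part of the
schema above:)

insert into block (id, block, hs256)
values (gen_random_uuid(), :block_data, ''::bytea) -- hs256 is overwritten by the trigger
on conflict (hs256) do update
  set hs256 = excluded.hs256 -- no-op update, so RETURNING yields the existing row
returning id;

-- gen_random_uuid() is available from pgcrypto (core since PostgreSQL 13)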

This is just for storing the blocks. I'm going to put this "block" table
into a separate tablespace. File operations will be at least 95% read and
at most 5% write. (Streaming replication will hopefully allow almost
horizontal scaling for read operations.) Most of the files will be several
megabytes in size (images), and some of them will be 100MB or more
(videos). Total capacity is in the 10TB range. Storage will be SSD
(possibly fiber connected, or local RAID, we are not sure yet).

I do not want to use PostgreSQL large objects, because they do not support
block-level deduplication.

Here are some things that I need help with:

1. What should be the maximum size of a block? I was trying to find out the
optimal value. The default BLCKSZ is 8192 bytes. AFAIK PostgreSQL does not
allow a row to occupy multiple blocks. I don't know enough to calculate the
optimal "block size" (the max. number of bytes stored in a single row in
the block.block field), but I suspect that it should be 4K or something
similar. I think that it should be as large as possible without hitting
TOAST. My thinking is this: most files will be at least 1MB in size, so
most "block" rows will reach the maximum tuple size. It would be practical
to make one row in the "block" table occupy almost one PostgreSQL block.
2. I'm not sure if it would be beneficial to increase BLCKSZ. I will be
able to test the speed of my (not yet finished) implementation with
different BLCKSZ values, but that alone won't help me make the decision,
because AFAIK BLCKSZ must have a fixed value for the PostgreSQL instance,
so it will affect all other tables in the database. It would be hard to
tell how changing BLCKSZ would affect the system as a whole.
3. In the above example, I used SHA-256 (pgcrypto), because apparently it
is very well optimized for 64-bit machines, and it has practically zero
chance of a collision. I think that SHA-512 would be overkill. But I'm not
sure that this is the best choice. Maybe somebody with more experience can
suggest a better hash function.
4. The hs256 value will always be a non-null, fixed 32-byte binary value,
but the query planner probably will not know anything about that. I was
also thinking about bit(256), but I don't see an easy way to convert the
bytea digest into bit(256); a simple type cast won't work here. Maybe using
bytea here is perfectly fine, and creating an index on the hs256 bytea
field is as effective as possible.
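
(For what it's worth, a bytea digest can be turned into bit(256) by going
through its hex text form; the 'x' prefix makes the bit-string input parser
read hex digits. A hedged sketch, worth testing before relying on it:)

-- convert a 32-byte digest to bit(256) via its hex representation
select ('x' || encode(hs256, 'hex'))::bit(256) from block;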

I'm not looking for a definitive answer, just trying to get some hints from
more experienced users before I fill up the drives with terabytes of data.

Thank you,

Laszlo


Re: Index on (fixed size) bytea value

2023-06-19 Thread Les
David G. Johnston wrote (on Mon, Jun 19, 2023, 22:30):

> On Mon, Jun 19, 2023 at 1:05 PM Les  wrote:
>
>> AFAIK PostgreSQL does not allow a row to occupy multiple blocks.
>>
>
> Your plan is going to heavily involve out-of-band storage.  Please read up
> on it here:
>
> https://www.postgresql.org/docs/current/storage-toast.html
>
I'm aware of TOAST and how it works; I was referring to it above ("I think
that it should be as large as possible, without hitting TOAST"). I have
designed a separate "block" table specifically to avoid storing binary data
in TOAST. So my plan is not going to involve out-of-band storage.

Just to make this very clear: a record in the block table would store a
single block, not the whole file. My question is about finding the optimal
block size (without hitting TOAST), and finding the optimal hash algorithm
for block de-duplication.

Unless I totally misunderstood how TOAST works. (?)

Laszlo


Re: Index on (fixed size) bytea value

2023-06-20 Thread Les
>
>
> Then you would ALTER the column and SET STORAGE MAIN, so that it does not
> ever use TOAST.
>
> The size limit for a row would then be 8kB minus page header minus row
> header, which
> should be somewhere in the vicinity of 8140 bytes.
>
> If you want your block size to be a power of two, the limit would be 4kB,
> which would waste
> almost half your storage space.
>

Oh, I see. So if I want to save some space for future columns, then storing
about 7500 bytes in the "block" bytea column would be close to optimal,
utilizing more than 90% of the block space. I guess that the fillfactor
setting will have no effect on this table, so it does not matter whether I
set it or not.
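
(For reference, the quoted advice translates to DDL along these lines; a
sketch, with the size arithmetic as a comment:)

-- keep the bytea columns in the main heap, never in TOAST
alter table block alter column block set storage main;
alter table block alter column hs256 set storage main;
-- rough per-row budget on an 8 kB page, one tuple per page:
--   8192 - 24 (page header) - 4 (line pointer) - ~24 (tuple header) ≈ 8140 bytes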


slow delete

2023-08-15 Thread Les
I have created a table called _td with about 43 000 rows. I have tried to
use this as a primary key id list to delete records from my
product.product_file table, but I could not do it: it uses 100% of one CPU
and takes forever. Then I changed the query to delete 100 records only, and
measured the speed:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
delete from product.product_file where id in (
    select pf2_id from _td limit 100
)

It still takes 11 seconds. That is 110 msec per record, which is
unacceptable.

I'm going to post the whole query plan at the end of this email, but I
would like to highlight the "Triggers" part:


"Triggers": [

{

"Trigger Name": "RI_ConstraintTrigger_a_26535",

"Constraint Name": "fk_pfft_product",

"Relation": "product_file",

"Time": 4.600,

"Calls": 90

},

{

"Trigger Name": "RI_ConstraintTrigger_a_26837",

"Constraint Name": "fk_product_file_src",

"Relation": "product_file",

"Time": 5.795,

"Calls": 90

},

{

"Trigger Name": "RI_ConstraintTrigger_a_75463",

"Constraint Name": "fk_pfq_src_product_file",

"Relation": "product_file",

"Time": 11179.429,

"Calls": 90

},

{

"Trigger Name": "_trg_002_aiu_audit_row",

"Relation": "product_file",

"Time": 49.410,

"Calls": 90

}

]

It seems that two foreign key constraints use 10.395 seconds out of the
total 11.24 seconds. But I don't see why it takes that much?
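
(For context: for every deleted product_file row, each incoming foreign key
runs a check of roughly this shape; this is a sketch of what the RI trigger
executes internally. Without an index on the referencing column it becomes
a sequential scan, and 11179.429 ms / 90 calls ≈ 124 ms per check:)

SELECT 1 FROM ONLY product.product_file_queue x
WHERE x.src_product_file_id = $1 FOR KEY SHARE OF x;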

The product.product_file table has 477 000 rows:

CREATE TABLE product.product_file (
    id uuid NOT NULL,
    c_tim timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    c_uid uuid NULL,
    c_sid uuid NULL,
    m_tim timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    m_uid uuid NULL,
    m_sid uuid NULL,
    product_id uuid NOT NULL,
    product_file_type_id uuid NOT NULL,
    file_id uuid NOT NULL,
    product_file_status_id uuid NOT NULL,
    dl_url text NULL,
    src_product_file_id uuid NULL,
    CONSTRAINT product_file_pkey PRIMARY KEY (id),
    CONSTRAINT fk_pf_file FOREIGN KEY (file_id) REFERENCES media.file(id),
    CONSTRAINT fk_pf_file_type FOREIGN KEY (product_file_type_id) REFERENCES product.product_file_type(id),
    CONSTRAINT fk_pf_product FOREIGN KEY (product_id) REFERENCES product.product(id) ON DELETE CASCADE,
    CONSTRAINT fk_product_file_src FOREIGN KEY (src_product_file_id) REFERENCES product.product_file(id),
    CONSTRAINT fk_product_file_status FOREIGN KEY (product_file_status_id) REFERENCES product.product_file_status(id)
);

CREATE INDEX idx_product_file_dl_url ON product.product_file USING btree (dl_url) INCLUDE (product_id) WHERE (dl_url IS NOT NULL);
CREATE INDEX idx_product_file_file_product ON product.product_file USING btree (file_id, product_id);
CREATE INDEX idx_product_file_product_file ON product.product_file USING btree (product_id, file_id);
CREATE INDEX idx_product_file_src ON product.product_file USING btree (src_product_file_id) WHERE (src_product_file_id IS NOT NULL);


The table behind fk_pfft_product looks like this; it has about 5000 records
in it:

CREATE TABLE product.product_file_tag (
    id uuid NOT NULL,
    c_tim timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    c_uid uuid NULL,
    c_sid uuid NULL,
    m_tim timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    m_uid uuid NULL,
    m_sid uuid NULL,
    product_file_id uuid NOT NULL,
    file_tag_id uuid NOT NULL,
    CONSTRAINT product_file_tag_pkey PRIMARY KEY (id),
    CONSTRAINT fk_pfft_file_tag FOREIGN KEY (file_tag_id) REFERENCES product.file_tag(id) ON DELETE CASCADE DEFERRABLE,
    CONSTRAINT fk_pfft_product FOREIGN KEY (product_file_id) REFERENCES product.product_file(id) ON DELETE CASCADE DEFERRABLE
);

CREATE UNIQUE INDEX uidx_product_file_file_tag ON product.product_file_tag USING btree (product_file_id, file_tag_id);


The other constraint has zero actual references; this returns zero:

select count(*) from product.product_file where src_product_file_id in (
    select pf2_id from _td
); -- 0

I was trying to figure out how a foreign key constraint with zero actual
references can cost 100 msec / record, but I failed.

Can somebody please explain what is wrong here?

The plan is also visualized here:
http://tatiyants.com/pev/#/plans/plan_1692129126258

[
  {
    "Plan": {
      "Node Type": "ModifyTable",
      "Operation": "Delete",
      "Parallel Aware": false,
      "Async Capable": false,
      "Relation Name": "product_file",
      "Schema": "product",
      "Alias": "product_file",
      "Startup Cost": 4.21,
      "Total Cost": 840.79,
      "Plan Rows": 0,
      "Plan Width": 0,
      "Actual Startup Time": 0.567,
      "Actual Total Time": 0.568,
      "Actual Rows": 0,
      "Actual Loops": 1,
      "Shared Hit Blocks": 582,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 10,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0,
      "Plans": [
        {
          "Node Type": "Nested Loop",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Async Capable": false,
          "Join Type": "Inner",
          "Startup Cost": 4.21,
          "Total Cost": 840.79,
          "Plan Rows": 100,
          "Plan Width": 46,
          "A

Re: slow delete

2023-08-15 Thread Les
Tom Lane wrote (on Tue, Aug 15, 2023, 22:37):

> Les  writes:
> > It seems that two foreign key constraints use 10.395 seconds out of the
> > total 11.24 seconds. But I don't see why it takes that much?
>
> Probably because you don't have an index on the referencing column.
> You can get away with that, if you don't care about the speed of
> deletes from the PK table ...
>

For the fk_pfft_product constraint this is true, but I always thought that
PostgreSQL can use an index "partially". There is already an index:

CREATE UNIQUE INDEX uidx_product_file_file_tag ON product.product_file_tag
USING btree (product_file_id, file_tag_id);

It has the same leading column, only with one more column after it.
Wouldn't it be possible to use it for the plan?

After I created these two missing indices:

CREATE INDEX idx_pft_pf ON product.product_file_tag USING btree (product_file_id);
CREATE INDEX idx_pfq_src_pf ON product.product_file_queue USING btree (src_product_file_id);


I could delete all 40 000 records in 10 seconds.
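
(For anyone hitting the same problem, a catalog query along these lines can
list single-column foreign keys referencing a table whose referencing
column has no index with that column leading. This is an editorial sketch,
not from the thread:)

select c.conrelid::regclass as referencing_table,
       c.conname            as constraint_name,
       a.attname            as referencing_column
from pg_constraint c
join pg_attribute a
  on a.attrelid = c.conrelid
 and a.attnum   = c.conkey[1]
where c.contype = 'f'
  and c.confrelid = 'product.product_file'::regclass
  and cardinality(c.conkey) = 1          -- keep the sketch to single-column FKs
  and not exists (
        select 1
        from pg_index i
        where i.indrelid = c.conrelid
          and i.indkey[0] = c.conkey[1]  -- an index whose leading column is the FK column
      );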

Thank you!

 Laszlo



Slow query, possibly not using index

2023-08-27 Thread Les
I have this table:

CREATE TABLE media.block (
    id uuid NOT NULL,
    "size" int8 NOT NULL,
    nrefs int8 NOT NULL DEFAULT 0,
    block bytea NOT NULL,
    hs256 bytea NOT NULL,
    CONSTRAINT block_pkey PRIMARY KEY (id),
    CONSTRAINT chk_nrefs CHECK ((nrefs >= 0))
)
WITH (
    toast_tuple_target=8160
)
TABLESPACE data_slow
;

alter table media.block alter column block set storage main;
alter table media.block alter column hs256 set storage main;

CREATE INDEX idx_block_unused ON media.block USING btree (id) WHERE (nrefs = 0);
CREATE UNIQUE INDEX uidx_block_hs256 ON media.block USING btree (hs256);


Number of rows in this table is about 40M, and most of the rows occupy a
full 8K block (in most cases, the "block" field contains 7500 bytes).

The idx_block_unused index should be used to find blocks that are unused,
so they can be deleted at some point.

The idx_block_unused index is less than 400MB:


SELECT i.relname "Table Name", indexrelname "Index Name",
    pg_size_pretty(pg_total_relation_size(relid)) As "Total Size",
    pg_size_pretty(pg_indexes_size(relid)) as "Total Size of all Indexes",
    pg_size_pretty(pg_relation_size(relid)) as "Table Size",
    pg_size_pretty(pg_relation_size(indexrelid)) "Index Size",
    reltuples::bigint "Estimated table row count"
FROM pg_stat_all_indexes i JOIN pg_class c ON i.relid = c.oid
where i.relid = 'media.block'::regclass


Table Name|Index Name      |Total Size|Total Size of all Indexes|Table Size|Index Size|Estimated table row count|
----------+----------------+----------+-------------------------+----------+----------+-------------------------+
block     |block_pkey      |352 GB    |5584 MB                  |347 GB    |1986 MB   |                 38958848|
block     |uidx_block_hs256|352 GB    |5584 MB                  |347 GB    |3226 MB   |                 38958848|
block     |idx_block_unused|352 GB    |5584 MB                  |347 GB    |372 MB    |                 38958848|

If I try to select a single unused block this way:

explain analyze select id from media.block b where nrefs =0 limit 1


then it runs for more than 10 minutes (I'm not sure how long, I cancelled
the query after 10 minutes).

If I run this without analyze:

explain select id from media.block b where nrefs =0 limit 1


QUERY PLAN
Limit  (cost=0.38..0.76 rows=1 width=16)
  ->  Index Only Scan using idx_block_unused on block b  (cost=0.38..869.83 rows=2274 width=16)

I believe it is not actually using the index, because reading a single
(random?) entry from an index should not run for >10 minutes.

What am I doing wrong?

Thank you,

 Laszlo


Re: Slow query, possibly not using index

2023-08-27 Thread Les
> > If I try to select a single unused block this way:
> > explain analyze select id from media.block b where nrefs =0 limit 1
> > then it runs for more than 10 minutes (I'm not sure how long, I cancelled
> > the query after 10 minutes).
>
> Are you sure it isn't blocked on a lock?
>

Yes, I'm sure. I have created a single database instance from a zfs
snapshot and tried the query on that database. It was the only client.


> Another theory is that the index contains many thousands of references
> to now-dead rows, and the query is vainly searching for a live entry.
> Given that EXPLAIN thinks there are only about 2300 live entries,
> and yet you say the index is 400MB, this seems pretty plausible.
>

Nobody ever deleted anything from this table. Since it was created, this
has been a write-only table.


> Have you disabled autovacuum, or something like that?  (REINDEX
> could help here, at least till the index gets bloated again.)
>
I did not disable autovacuum.

>
> You might think that even so, it shouldn't take that long ... but
> indexes on UUID columns are a well known performance antipattern.
> The index entry order is likely to have precisely zip to do with
> the table's physical order, resulting in exceedingly-random access
> to the table, which'll be horribly expensive when the table is so
> much bigger than RAM.  Can you replace the UUID column with a simple
> serial (identity) column?
>

I'm aware of the problems with random UUID values. I was using this
function to create ulids from the beginning:

CREATE OR REPLACE FUNCTION public.gen_ulid()
RETURNS uuid
LANGUAGE sql
AS $function$
    SELECT (lpad(to_hex(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint), 12, '0')
            || encode(gen_random_bytes(10), 'hex'))::uuid;
$function$;


If I order some rows by id values, I can see that their creation times are
strictly ascending. I did not write this function, it was taken from this
website:

https://blog.daveallie.com/ulid-primary-keys

They have a benchmark section where they show that these ULID values are
slower to generate (at least with this implementation) but much faster to
insert.

I might be able to replace these with int8 values, I need to check.


>
> > I believe it is not actually using the index, because reading a single
> > (random?) entry from an index should not run for >10 minutes.
>
> You should believe what EXPLAIN tells you about the plan shape.
> (Its rowcount estimates are only estimates, though.)
>

All of the 40M rows in this table are live. I'm 100% sure about this,
because nobody ever deleted rows from this table.

I can try to do VACUUM on this table, but I'm limited on resources. I think
it will take days to do this. Maybe I can try to dump the whole database
and restore it on another machine. Would that eliminate dead rows? (Is
there a way to check the number of dead rows?)
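
(Two quick ways to check, for reference: the first is an estimate from the
statistics collector; the second, used later in this thread, scans the
whole table. Both are sketches:)

-- estimated live/dead tuples, from the stats collector
select n_live_tup, n_dead_tup
from pg_stat_user_tables
where relid = 'media.block'::regclass;

-- exact counts, but reads the whole table (pgstattuple extension)
select dead_tuple_count, dead_tuple_percent from pgstattuple('media.block');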

Regards,

Laszlo


Re: Slow query, possibly not using index

2023-08-27 Thread Les
>
>
>
> > I'm aware of the problems with random UUID values. I was using this
> > function to create ulids from the beginning:
>
> Oh, well that would have been useful information to provide at the
> outset.

I'm sorry, I left this out.

> Now that we know the index order is correlated with creation
> time, I wonder if it is also correlated with nrefs, in such a way that
> scanning in index order is disastrous because all the desired rows are
> at the end of the index.
>
Possibly, I have no idea.

>
> Also, you deny deleting any rows, but that wasn't the point.  Do you
> ever update nrefs from zero to nonzero?  That'd also leave dead
> entries behind in this index.  If that is a routine event that is
> correlated with creation time, it gets easier to believe that your
> index could have lots of dead entries at the front.
>

I have checked the trigger that is maintaining the nrefs field. Blocks are
referenced from a "file_block" table. Every time a block is created, it
first has an initial value of nrefs=0, then a file_block row (reference) is
inserted, and nrefs is incremented to one. It means that every block has
shown up once in the index, and then disappeared. If the index was never
vacuumed, then it is very plausible that it is full of dead rows.

CREATE OR REPLACE FUNCTION media.trg_aiud_file_block()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
begin
    if TG_OP = 'INSERT' then
        update media.block set nrefs = nrefs + 1 where id = new.block_id;
        return new;
    end if;
    if TG_OP = 'UPDATE' then
        if old.block_id is distinct from new.block_id then
            update media.block set nrefs = nrefs + 1 where id = new.block_id;
            update media.block set nrefs = nrefs - 1 where id = old.block_id;
        end if;
        return new;
    end if;
    if TG_OP = 'DELETE' then
        update media.block set nrefs = nrefs - 1 where id = old.block_id;
        return old;
    end if;
end;
$function$;


The idea was to create an index that helps to quickly find unused blocks,
so that disk space can be freed. It would be much better to keep the
initially inserted (not yet referenced) blocks out of the index, but I
don't know how to do this.
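
(One hedged possibility, not from the thread: add a hypothetical
"deletable" flag, so that freshly inserted rows never enter the partial
index, at the cost of an extra column and an explicit marking step:)

-- hypothetical schema change; rows enter the index only once explicitly marked
alter table media.block add column deletable boolean not null default false;
create index idx_block_deletable on media.block (id) where deletable;

-- a periodic job (or the trigger above, when nrefs drops to 0) would run:
update media.block set deletable = true where nrefs = 0 and not deletable;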


> We'd still have to figure out why autovacuum is failing to clean out
> those entries in a timely fashion, but that seems like a plausible
> way for the performance problem to exist.
>
Yes, that would be very good to know. I could drop and recreate the index
now, but after some time I would be facing the same situation again.

I double checked, and the "autovacuum launcher" process is running.

Here are the current settings:

=# select name, setting, unit, min_val, max_val, boot_val, reset_val,
pending_restart from pg_settings where name like '%vacuum%';

 name                                  | setting   | unit | min_val | max_val    | boot_val  | reset_val | pending_restart
---------------------------------------+-----------+------+---------+------------+-----------+-----------+-----------------
 autovacuum                            | on        |      |         |            | on        | on        | f
 autovacuum_analyze_scale_factor       | 0.1       |      | 0       | 100        | 0.1       | 0.1       | f
 autovacuum_analyze_threshold          | 50        |      | 0       | 2147483647 | 50        | 50        | f
 autovacuum_freeze_max_age             | 200000000 |      | 100000  | 2000000000 | 200000000 | 200000000 | f
 autovacuum_max_workers                | 3         |      | 1       | 262143     | 3         | 3         | f
 autovacuum_multixact_freeze_max_age   | 400000000 |      | 10000   | 2000000000 | 400000000 | 400000000 | f
 autovacuum_naptime                    | 60        | s    | 1       | 2147483    | 60        | 60        | f
 autovacuum_vacuum_cost_delay          | 2         | ms   | -1      | 100        | 2         | 2         | f
 autovacuum_vacuum_cost_limit          | -1        |      | -1      | 10000      | -1        | -1        | f
 autovacuum_vacuum_insert_scale_factor | 0.2       |      | 0       | 100        | 0.2       | 0.2       | f
 autovacuum_vacuum_insert_threshold    | 1000      |      | -1      | 2147483647 | 1000      | 1000      | f
 autovacuum_vacuum_scale_factor        | 0.2       |      | 0       | 100        | 0.2       | 0.2       | f
 autovacuum_vacuum_threshold           | 50        |      | 0       | 2147483647 | 50        | 50        | f
 autovacuum_work_mem                   | -1        | kB   | -1      | 2147483647 | -1        | -1        | f
 log_autovacuum_min_duration           | 60        | ms   | -1      | 2147483647 | 60        | 60        | f
 vacuum_cost_delay                     | 0         | ms   | 0       | 100        | 0         | 0         | f
 vacuum_cost_limit                     | 200       |      | 1       | 10000      | 200       | 200       | f
 vacuum_cost_page_dirty                | 20        |      | 0       | 10000      | 20        | 20        | f
 vacuum_cost_page_hit                  | 1         |      | 0

Re: Slow query, possibly not using index

2023-08-28 Thread Les
>
>>
> All right, I started pgstattuple()  and I'll also do pgstatindex(), but it
> takes a while. I'll get back with the results.
>

=# select * from pgstattuple('media.block');

  table_len   | tuple_count |  tuple_len   | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space  | free_percent
--------------+-------------+--------------+---------------+------------------+----------------+--------------------+-------------+--------------
 372521984000 |    39652836 | 299148572428 |          80.3 |          3578319 |    26977942540 |               7.24 | 44638265312 |        11.98
(1 row)

=# select * from pgstatindex('media.idx_block_unused');

 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       4 |          2 |  389677056 |           546 |            114 |      23069 |           0 |         24384 |            90.03 |                  0
(1 row)

As far as I understand these numbers, the media.block table itself is in
good shape, but the index is not. Should I vacuum the whole table? Or would
it be better to run REINDEX INDEX CONCURRENTLY media.idx_block_unused?

The more important question is: how can I find out why the index was not
autovacuumed?

Thank you,

   Laszlo


Re: Slow query, possibly not using index

2023-08-28 Thread Les
>
> =# select * from pgstatindex('media.idx_block_unused');
>  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
> ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
>        4 |          2 |  389677056 |           546 |            114 |      23069 |           0 |         24384 |            90.03 |                  0
> (1 row)
>

After reindex:

=# select * from pgstatindex('media.idx_block_unused');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       4 |          0 |       8192 |             0 |              0 |          0 |           0 |             0 |              NaN |                NaN
(1 row)

explain analyze select id from media.block b where nrefs = 0 limit 1

QUERY PLAN
Limit  (cost=0.14..0.46 rows=1 width=16) (actual time=0.010..0.011 rows=0 loops=1)
  ->  Index Only Scan using idx_block_unused on block b  (cost=0.14..698.91 rows=2231 width=16) (actual time=0.008..0.009 rows=0 loops=1)
        Heap Fetches: 0
Planning Time: 0.174 ms
Execution Time: 0.030 ms

It is actually empty.

Now I only need to figure out why autovacuum did not work on the index.

Thank you

Laszlo


Re: Slow query, possibly not using index

2023-08-28 Thread Les
>
>
>
> > More important question is, how can I find out why the index was not
> auto vacuumed.
>
> You should have a look at pg_stat_user_tables.  It'll let you know if
> the table is being autovacuumed and how often.  If you're concerned
> about autovacuum not running properly, then you might want to lower
> log_autovacuum_min_duration.  Generally, anything that takes a
> conflicting lock will cause autovacuum to cancel so that the
> conflicting locker can get through.  Things like ALTER TABLE or even
> an ANALYZE running will cancel most autovacuum runs on tables.
>
> Also, this is a fairly large table and you do have the standard
> autovacuum settings.  Going by pgstattuple, the table has 39652836
> tuples.  Autovacuum will trigger when the statistics indicate that 20%
> of tuples are dead, which is about 8 million tuples. Perhaps that's
> enough for the index scan to have to skip over a large enough number
> of dead tuples to make it slow. You might want to consider lowering
> the autovacuum scale factor for this table.
>
> Also, ensure you're not doing anything like calling pg_stat_reset();
>
> It might be worth showing us the output of:
>
> select * from pg_stat_user_tables where relid = 'media.block'::regclass;
>
Thank you for your suggestion, this is really very helpful.

select * from pg_stat_user_tables where relid = 'media.block'::regclass;



Name   |Value|
---+-+
relid  |25872|
schemaname |media|
relname|block|
seq_scan   |8|
seq_tup_read   |139018370|
idx_scan   |45023556 |
idx_tup_fetch  |37461539 |
n_tup_ins  |7556051  |
n_tup_upd  |7577720  |
n_tup_del  |0|
n_tup_hot_upd  |0|
n_live_tup |39782042 |
n_dead_tup |5938057  |
n_mod_since_analyze|1653427  |
n_ins_since_vacuum |5736676  |
last_vacuum| |
last_autovacuum|2023-08-17 22:39:29.383 +0200|
last_analyze   | |
last_autoanalyze   |2023-08-22 16:02:56.093 +0200|
vacuum_count   |0|
autovacuum_count   |1|
analyze_count  |0|
autoanalyze_count  |4|
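
(Following the suggestion above, the autovacuum trigger point can be
lowered per table with storage parameters; a hedged sketch, where the
numbers are illustrative: with ~40M rows, a 0.01 factor means vacuum at
roughly 400k dead tuples instead of ~8M:)

alter table media.block set (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);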

Regards,

Laszlo


Terribly slow query with very good plan?

2022-02-04 Thread Les
Hello,

I have a table that contains folders, and another one that contains files.

Here are the table definitions. I have removed most of the columns because
they are not important for this question. (There are lots of columns.)

CREATE TABLE media.oo_folder (
    id int8 NOT NULL,
    parent_id int8 NULL,  -- not in the original excerpt; needed by the constraints below
    is_active bool NOT NULL DEFAULT true,
    title text NOT NULL,
    relpath text NOT NULL,
    CONSTRAINT chk_no_slash CHECK (("position"(title, '/'::text) = 0)),
    CONSTRAINT oo_folder_chk_no_slash CHECK (("position"(title, '/'::text) = 0)),
    CONSTRAINT pk_oo_folder PRIMARY KEY (id),
    CONSTRAINT fk_oo_folder_parent_id FOREIGN KEY (parent_id) REFERENCES media.oo_folder(id) ON DELETE CASCADE DEFERRABLE
);
CREATE INDEX oo_folder_idx_parent ON media.oo_folder USING btree (parent_id);
CREATE INDEX oo_folder_idx_relpath ON media.oo_folder USING btree (relpath);
CREATE UNIQUE INDEX uidx_oo_folder_active_title ON media.oo_folder USING btree (parent_id, title) WHERE is_active;


CREATE TABLE media.oo_file (
    id int8 NOT NULL,
    oo_folder_id int8 NOT NULL,  -- not in the original excerpt; needed by the constraints below
    is_active bool NOT NULL DEFAULT true,
    title text NOT NULL,
    ext text NULL,
    relpath text NOT NULL,
    sha1 text NOT NULL,
    CONSTRAINT chk_no_slash CHECK (("position"(title, '/'::text) = 0)),
    CONSTRAINT oo_file_chk_no_slash CHECK (("position"(title, '/'::text) = 0)),
    CONSTRAINT pk_oo_file PRIMARY KEY (id),
    CONSTRAINT fk_oo_file_oo_folder_id FOREIGN KEY (oo_folder_id) REFERENCES media.oo_folder(id) ON DELETE CASCADE DEFERRABLE
);
CREATE INDEX oo_file_idx_oo_folder_id ON media.oo_file USING btree (oo_folder_id);
CREATE INDEX oo_file_idx_relpath ON media.oo_file USING btree (relpath);
CREATE INDEX oo_file_idx_sha1 ON media.oo_file USING btree (sha1);
CREATE UNIQUE INDEX uidx_oo_file_active_title ON media.oo_file USING btree (oo_folder_id, title) WHERE is_active;

The "replath" field contains the path of the file/folder. For example:
"/folder1/folder2/folder3/filename4.ext5".  The replath field is managed by
triggers. There are about 1M rows for files and 600K folder rows in the
database. The files are well distributed between folders, and there are
only 45 root folders ( parent_id is null)

This query runs very fast:


EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select id, title from media.oo_folder f where f.parent_id is null

QUERY PLAN
Index Scan using oo_folder_idx_parent on media.oo_folder f  (cost=0.42..73.70 rows=20 width=25) (actual time=0.030..0.159 rows=45 loops=1)
  Output: id, title
  Index Cond: (f.parent_id IS NULL)
  Buffers: shared hit=40
Planning Time: 0.123 ms
Execution Time: 0.187 ms

My task is to write a query that tells if a folder has any active file
inside it - directly or in subfolders. Here is the query for that:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select id, title,
    (exists (select f2.id from media.oo_file f2 where f2.relpath like f.relpath || '%')) as has_file
from media.oo_folder f where f.parent_id is null

QUERY PLAN
Index Scan using oo_folder_idx_parent on media.oo_folder f  (cost=0.42..488.02 rows=20 width=26) (actual time=713.419..25414.969 rows=45 loops=1)
  Output: f.id, f.title, (SubPlan 1)
  Index Cond: (f.parent_id IS NULL)
  Buffers: shared hit=7014170
  SubPlan 1
    ->  Index Only Scan using oo_file_idx_relpath on media.oo_file f2  (cost=0.55..108499.27 rows=5381 width=0) (actual time=564.756..564.756 rows=0 loops=45)
          Filter: (f2.relpath ~~ (f.relpath || '%'::text))
          Rows Removed by Filter: 792025
          Heap Fetches: 768960
          Buffers: shared hit=7014130
Planning Time: 0.361 ms
Execution Time: 25415.088 ms

It also returns 45 rows, but in 25 seconds, which is unacceptable.

If I execute the "has_file" subquery for one specific relpath, then it
speeds up again, to < 1 msec:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select exists (select id from media.oo_file of2 where relpath like 'Felhasználók%')

QUERY PLAN
Result  (cost=1.66..1.67 rows=1 width=1) (actual time=0.049..0.050 rows=1 loops=1)
  Output: $0
  Buffers: sh

Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
Laurenz Albe wrote (on Fri, Feb 4, 2022, 10:18):

> > It also returns 45 rows, but in 25 seconds which is unacceptable.
>
> You should create an index that supports LIKE; for example
>
> CREATE INDEX ON media.oo_file (relpath COLLATE "C");
>
CREATE INDEX test ON media.oo_file (relpath COLLATE "C");

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select id, title,
    (exists (select f2.id from media.oo_file f2 where f2.relpath like f.relpath || '%')) as has_file
from media.oo_folder f where f.parent_id is null;
QUERY PLAN
Index Scan using oo_folder_idx_parent on media.oo_folder f  (cost=0.42..459.38 rows=20 width=26) (actual time=772.566..24081.820 rows=45 loops=1)
  Output: f.id, f.title, (SubPlan 1)
  Index Cond: (f.parent_id IS NULL)
  Buffers: shared hit=6672274
  SubPlan 1
    ->  Index Only Scan using test on media.oo_file f2  (cost=0.55..100756.64 rows=5379 width=0) (actual time=535.113..535.113 rows=0 loops=45)
          Filter: (f2.relpath ~~ (f.relpath || '%'::text))
          Rows Removed by Filter: 777428
          Heap Fetches: 736418
          Buffers: shared hit=6672234
Planning Time: 0.338 ms
Execution Time: 24082.152 ms

Not helping :-(


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
I really think now that the query plan is wrong (or "could be improved", so
to say). As far as I understand, the "index only scan" here is essentially
a sequential scan over the index data. In this specific case, where the
filter is a "begins with" condition on a field that is the first (and only)
column of an index, there is a much better way to find out whether a
matching row exists: look up the closest value in the index and check
whether it begins with the prefix. Looking up the closest value in an index
would be much more efficient.

> I don't understand how it is possible in the slow case Rows Removed by
> Filter: 792025 (returns 0 rows) and in the second case Rows Removed by
> Filter: 15 (returns 1 row).

Pavel, I think it is because the scan found a suitable row at the beginning
of the scan and stopped. If you look at that plan you will see that it uses
a seq scan. It was fast by accident. :-)

The plan of that single-row version was changed to a normal index scan,
after I added the collation "C" index:


EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select exists (
    select id from media.oo_file of2 where relpath like 'this does not exist%'
);

QUERY PLAN
Result  (cost=0.63..0.64 rows=1 width=1) (actual time=0.022..0.023 rows=1 loops=1)
  Output: $0
  Buffers: shared hit=4
  InitPlan 1 (returns $0)
    ->  Index Only Scan using test on media.oo_file of2  (cost=0.55..8.57 rows=108 width=0) (actual time=0.018..0.018 rows=0 loops=1)
          Index Cond: ((of2.relpath >= 'this does not exist'::text) AND (of2.relpath < 'this does not exisu'::text))
          Filter: (of2.relpath ~~ 'this does not exist%'::text)
          Heap Fetches: 0
          Buffers: shared hit=4
Planning Time: 0.530 ms
Execution Time: 0.055 ms

I would expect the same for the originally slow query with the has_file
column, but it does not happen. :-(


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
Nick Cleaton wrote (on Fri, Feb 4, 2022, 11:00):

>
> In the fast case the 'Felhasználók%' part is known at query planning
> time, so it can be a prefix search.
>
> In the slow case, the planner doesn't know what that value will be, it
> could be something that starts with '%' for example.
>
>
First of all, it CANNOT start with '%'. This is a fact and this fact can be
determined by analyzing the query. Something that the query planner should
do, right?

Second argument: the same query is also slow with the ^@ operator...

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select id, title,
    (exists (select f2.id from media.oo_file f2 where f2.relpath ^@ f.relpath)) as has_file
from media.oo_folder f where f.parent_id is null

QUERY PLAN
Index Scan using oo_folder_idx_parent on media.oo_folder f  (cost=0.42..449.38 rows=20 width=26) (actual time=1652.624..61636.232 rows=45 loops=1)
  Output: f.id, f.title, (SubPlan 1)
  Index Cond: (f.parent_id IS NULL)
  Buffers: shared hit=6672274
  SubPlan 1
    ->  Index Only Scan using test on media.oo_file f2  (cost=0.55..98067.11 rows=5379 width=0) (actual time=1369.665..1369.665 rows=0 loops=45)
          Filter: (f2.relpath ^@ f.relpath)
          Rows Removed by Filter: 777428
          Heap Fetches: 736418
          Buffers: shared hit=6672234
Planning Time: 0.346 ms
Execution Time: 61636.319 ms


> Also your logic looks a bit unsafe, the query you have would include
> files under all top-level folders with names starting with
> Felhasználók, so you could accidentally merge in files in folders
> called Felhasználókfoo and Felhasználókbar for example.
>

Forgive me, I typed in these examples for demonstration. The actual code
uses relpath || '/%' and it avoids those cases.


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
>> In the fast case the 'Felhasználók%' part is known at query planning
>> time, so it can be a prefix search.
>>
>> In the slow case, the planner doesn't know what that value will be, it
>> could be something that starts with '%' for example.
>>
>>
> First of all, it CANNOT start with '%'. This is a fact and this fact can
> be determined by analyzing the query. Something that the query planner
> should do, right?
>
> Second argument: the same query is also slow with the ^@ operator...
>

Oh I see, the query planner does not know that there will be no %
characters in file and folder names.

But what is the solution then? It just seems wrong that I can speed up a
query 1000 times by replacing it with a nested loop in a PL/pgSQL function :(
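
(One possible rewrite, using the collate "C" range trick that comes up
later in this thread; this is an editorial sketch, not from the original
posts. The range conditions are plain btree comparisons, so the index is
usable even though the prefix is only known at run time; note that the
chr(255) upper bound is only approximate for multibyte text:)

select f.id, f.title,
    (exists (
        select 1 from media.oo_file f2
        where f2.relpath >= (f.relpath collate "C")
          and f2.relpath <  ((f.relpath || chr(255)) collate "C")
    )) as has_file
from media.oo_folder f where f.parent_id is null;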


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
>
>
>>
>> First of all, it CANNOT start with '%'. This is a fact and this fact can
>> be determined by analyzing the query. Something that the query planner
>> should do, right?
>>
>> Second argument: the same query is also slow with the ^@ operator...
>>
>
> Oh I see, the query planner does not know that there will be no %
> characters in file and folder names.
>

On second thought, it does not explain why it is also slow with the ^@
operator.


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
Nick Cleaton wrote (on Fri, Feb 4, 2022, 11:57):

>
> With the ^@ operator, my guess is that because the planner knows
> nothing about the folder name value it could be the empty string,
> which would be a prefix of everything.
>

I think I could narrow the problem down to the simplest possible query.

The "it could be the empty string" objection does not hold for this:

CREATE index test ON media.oo_file (relpath COLLATE "C");

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active  and fi.relpath ^@ 'Természettudomány' limit 1

Limit  (cost=0.00..2.70 rows=1 width=8) (actual time=14445.559..14445.561 rows=0 loops=1)
  Output: id
  Buffers: shared hit=22288 read=108975
  ->  Seq Scan on media.oo_file fi  (cost=0.00..144710.65 rows=53574 width=8) (actual time=14445.555..14445.556 rows=0 loops=1)
        Output: id
        Filter: (fi.is_active AND (fi.relpath ^@ 'Természettudomány'::text))
        Rows Removed by Filter: 1075812
        Buffers: shared hit=22288 read=108975
Planning Time: 0.398 ms
Execution Time: 14445.593 ms

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath like 'Természettudomány%' limit 1

Limit  (cost=0.00..2.70 rows=1 width=8) (actual time=11222.280..11222.282 rows=0 loops=1)
  Output: id
  Buffers: shared hit=22320 read=108943
  ->  Seq Scan on media.oo_file fi  (cost=0.00..144710.65 rows=53574 width=8) (actual time=11222.278..11222.279 rows=0 loops=1)
        Output: id
        Filter: (fi.is_active AND (fi.relpath ~~ 'Természettudomány%'::text))
        Rows Removed by Filter: 1075812
        Buffers: shared hit=22320 read=108943
Planning Time: 0.488 ms
Execution Time: 11222.307 ms

It is using seq scan for both cases. This is definitely wrong!

One of my colleagues has noticed that the LIKE query uses an index scan for
some of the letters:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath like 'A%' limit 1;

Limit  (cost=0.55..60.85 rows=1 width=8) (actual time=6.508..6.509 rows=0 loops=1)
  Output: id
  Buffers: shared hit=2776
  ->  Index Scan using test on media.oo_file fi  (cost=0.55..4583.29 rows=76 width=8) (actual time=6.506..6.507 rows=0 loops=1)
        Output: id
        Index Cond: ((fi.relpath >= 'A'::text) AND (fi.relpath < 'B'::text))
        Filter: (fi.is_active AND (fi.relpath ~~ 'A%'::text))
        Rows Removed by Filter: 3784
        Buffers: shared hit=2776
Planning Time: 0.543 ms
Execution Time: 6.560 ms

Actually, the number of files per starting letter is:

select substr(relpath, 0, 2), count(*)
from media.oo_file
group by substr(relpath, 0, 2)
order by count(*) desc
substr|count |
--+--+
O |386087|
F |236752|
N |167171|
Ó |111479|
T |109786|
M | 34348|
P | 19878|
L |  5657|
A |  3784|
I |   869|
C | 1|

PostgreSQL uses seq scan for O, F, N, T letters, but it uses index scan for
A, I, C letters (with the "like" query).

There might be a problem with the planner here, because I think that an
index scan will always be faster than a seq scan in this case. The number
of rows for the prefix should not matter at all, because we are trying to
get the first matching row only. For some reason it decides between
seq/index scan based on the row counts stored in the statistics. At least
it seems that way.

If I could tell the planner to use the index, I think my problem would be
solved. Is there a way to put optimizer hints into the query?
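
(Not in core PostgreSQL; the pg_hint_plan extension is the usual route. A
hedged sketch, assuming the extension is installed and loaded:)

/*+ IndexScan(fi test) */
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath like 'Természettudomány%' limit 1;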

There could be some improvement made to the ^@ operator too, because it
always uses a seq scan, no matter what.

What do you think?


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
> > PostgreSQL uses seq scan for O, F, N, T letters, but it uses index scan
> for A, I, C letters (with the "like" query).
>
> That's interesting.
>
> Does it help if you create an additional index on relpath with the
> text_pattern_ops modifier, e.g.
>
> CREATE INDEX ... USING btree (relpath text_pattern_ops);
>

It does not help. Details below. (PostgreSQL version 12.8)

CREATE index test ON media.oo_file (relpath COLLATE "C");
CREATE INDEX test2 ON media.oo_file USING btree (relpath text_pattern_ops);
CREATE INDEX test3 ON media.oo_file USING btree (relpath collate "C" text_pattern_ops);

-- letter "A", ^@ operator -> slow seq scan
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath ^@ 'A' limit 1;

QUERY PLAN
Limit  (cost=0.00..1904.09 rows=1 width=8) (actual time=10779.585..10779.587 rows=0 loops=1)
  Output: id
  Buffers: shared hit=9960 read=121303
  ->  Seq Scan on media.oo_file fi  (cost=0.00..144710.65 rows=76 width=8) (actual time=10779.582..10779.583 rows=0 loops=1)
        Output: id
        Filter: (fi.is_active AND (fi.relpath ^@ 'A'::text))
        Rows Removed by Filter: 1075812
        Buffers: shared hit=9960 read=121303
Planning Time: 0.428 ms
Execution Time: 10779.613 ms

-- letter 'A', LIKE expression -> fast index scan
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath like 'A%' limit 1;

QUERY PLAN
Limit  (cost=0.55..60.85 rows=1 width=8) (actual time=7.047..7.048 rows=0 loops=1)
  Output: id
  Buffers: shared hit=2776
  ->  Index Scan using test on media.oo_file fi  (cost=0.55..4583.29 rows=76 width=8) (actual time=7.045..7.045 rows=0 loops=1)
        Output: id
        Index Cond: ((fi.relpath >= 'A'::text) AND (fi.relpath < 'B'::text))
        Filter: (fi.is_active AND (fi.relpath ~~ 'A%'::text))
        Rows Removed by Filter: 3784
        Buffers: shared hit=2776
Planning Time: 0.937 ms
Execution Time: 7.091 ms


-- letter 'T', LIKE expression -> slow seq scan
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath like 'Természettudomány%' limit 1;

QUERY PLAN
Limit  (cost=0.00..2.70 rows=1 width=8) (actual time=9842.935..9842.938 rows=0 loops=1)
  Output: id
  Buffers: shared hit=10024 read=121239
  ->  Seq Scan on media.oo_file fi  (cost=0.00..144710.65 rows=53574 width=8) (actual time=9842.933..9842.934 rows=0 loops=1)
        Output: id
        Filter: (fi.is_active AND (fi.relpath ~~ 'Természettudomány%'::text))
        Rows Removed by Filter: 1075812
        Buffers: shared hit=10024 read=121239
Planning Time: 0.975 ms
Execution Time: 9842.962 ms


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
> >
> > It does not help.
>
> What if you try applying the C collation to the values from the table:
>
> where fi.is_active  and fi.relpath collate "C" ^@ 'A'
>

Slow

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active and fi.relpath collate "C" ^@ 'A' limit 1;

QUERY PLAN
Limit  (cost=0.00..1904.09 rows=1 width=8) (actual time=3837.338..3837.340 rows=0 loops=1)
  Output: id
  Buffers: shared hit=9355 read=121908
  ->  Seq Scan on media.oo_file fi  (cost=0.00..144710.65 rows=76 width=8) (actual time=3837.336..3837.336 rows=0 loops=1)
        Output: id
        Filter: (fi.is_active AND ((fi.relpath)::text ^@ 'A'::text))
        Rows Removed by Filter: 1075812
        Buffers: shared hit=9355 read=121908
Planning Time: 0.391 ms
Execution Time: 3837.364 ms


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
> Slow
>
> What about this:
>
> fi.relpath between ('A' collate "C") and ('A'||chr(255) collate "C")
>

It uses an index scan.

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active
and fi.relpath between ('A' collate "C") and ('A'||chr(255) collate "C")
limit 1;

QUERY PLAN
Limit  (cost=0.55..6.12 rows=1 width=8) (actual time=1623.069..1623.070 rows=0 loops=1)
  Output: id
  Buffers: shared hit=2439 read=1994 dirtied=1107
  ->  Index Scan using test on media.oo_file fi  (cost=0.55..5732.47 rows=1029 width=8) (actual time=1623.067..1623.067 rows=0 loops=1)
        Output: id
        Index Cond: ((fi.relpath >= 'A'::text COLLATE "C") AND (fi.relpath <= 'A '::text COLLATE "C"))
        Filter: fi.is_active
        Rows Removed by Filter: 3784
        Buffers: shared hit=2439 read=1994 dirtied=1107
Planning Time: 18.817 ms
Execution Time: 1623.104 ms

Although the same with 'Természettudomány' uses a seq scan:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active
and fi.relpath between
    ('Természettudomány' collate "C")
    and ('Természettudomány'||chr(255) collate "C")
limit 1;

QUERY PLAN
Limit  (cost=0.00..2.13 rows=1 width=8) (actual time=7521.531..7521.532 rows=0 loops=1)
  Output: id
  Buffers: shared hit=17018 read=150574
  ->  Seq Scan on media.oo_file fi  (cost=0.00..188195.39 rows=88290 width=8) (actual time=7521.528..7521.529 rows=0 loops=1)
        Output: id
        Filter: (fi.is_active AND (fi.relpath >= 'Természettudomány'::text COLLATE "C") AND (fi.relpath <= 'Természettudomány '::text COLLATE "C"))
        Rows Removed by Filter: 1075812
        Buffers: shared hit=17018 read=150574
Planning Time: 8.918 ms
Execution Time: 7521.560 ms


Re: Terribly slow query with very good plan?

2022-02-04 Thread Les
>
> That may be because it's expecting to get 88290 rows from the
> sequential scan, and the"limit 1" means it expects sequential scan to
> be fast because on average it will only need to scan 1/88290 of the
> table before it finds a matching row, then it can stop.
>


We are looking for a single row. With an index scan, it is always much
faster to find a single row. No seq scan can be faster "on average", when
you are looking for a single row. Am I wrong?

> Try it without the "limit 1"


Without the limit it uses a bitmap heap scan. Unbelievable!

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
select fi.id from media.oo_file fi
where fi.is_active
and fi.relpath between
    ('Természettudomány' collate "C")
    and ('Természettudomány'||chr(255) collate "C");

QUERY PLAN
Bitmap Heap Scan on media.oo_file fi  (cost=10480.10..140065.96 rows=70010 width=8) (actual time=9757.917..9757.920 rows=0 loops=1)
  Output: id
  Recheck Cond: ((fi.relpath >= 'Természettudomány'::text COLLATE "C") AND (fi.relpath <= 'Természettudomány '::text COLLATE "C"))
  Filter: fi.is_active
  Rows Removed by Filter: 85207
  Heap Blocks: exact=24954
  Buffers: shared hit=197 read=26531
  ->  Bitmap Index Scan on test  (cost=0.00..10462.59 rows=99404 width=0) (actual time=425.571..425.572 rows=85207 loops=1)
        Index Cond: ((fi.relpath >= 'Természettudomány'::text COLLATE "C") AND (fi.relpath <= 'Természettudomány '::text COLLATE "C"))
        Buffers: shared hit=6 read=1768
Planning Time: 1.145 ms
JIT:
  Functions: 6
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.295 ms, Inlining 0.000 ms, Optimization 1.142 ms, Emission 11.632 ms, Total 15.070 ms
Execution Time: 9760.361 ms




