Re: Index for range queries on JSON (user defined fields)
On Fri, 4 Dec 2020 at 15:39, Marco Colli wrote:
> Hello!
>
> We have a multi-tenant service where each customer has millions of users
> (total: ~150M rows). Now we would like to let each customer define some
> custom columns for his users and then let the customer search his users
> efficiently based on these columns.
>
> This problem seems really hard to solve with PostgreSQL:
>
> https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields
>
> In particular the easiest way would be to add a JSON field on the users
> table (e.g. user metadata). However the PostgreSQL GIN index only supports
> exact matches and not range queries. This means that a query on a range
> (e.g. age > 30) would be extremely inefficient and would result in a table
> scan.

You could have a table of (tenant, customer, setting_name, setting_value)
so that a btree index on (tenant, setting_name, setting_value) would work
for

    select customer from my_table
    where tenant = $1 and setting_name = 'age' and setting_value > 30

That doesn't deal with setting values having a variety of types, but you
could have a distinct user-defined settings table for each setting value
type that you want to support.
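A sketch of that per-type layout for integer-valued settings (all names
here are illustrative, not from the original post):

    -- One table per value type; this one holds integer settings.
    CREATE TABLE customer_settings_int (
        tenant        bigint  NOT NULL,
        customer      bigint  NOT NULL,
        setting_name  text    NOT NULL,
        setting_value integer NOT NULL,
        PRIMARY KEY (tenant, customer, setting_name)
    );

    -- The btree index described above: equality on tenant and setting
    -- name, then a range over the value.
    CREATE INDEX customer_settings_int_idx
        ON customer_settings_int (tenant, setting_name, setting_value);

    -- A range query this index can serve:
    SELECT customer
    FROM customer_settings_int
    WHERE tenant = $1
      AND setting_name = 'age'
      AND setting_value > 30;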
Re: Terribly slow query with very good plan?
On Fri, 4 Feb 2022 at 09:11, Les wrote:
>   ->  Index Only Scan using oo_file_idx_relpath on media.oo_file f2
>       (cost=0.55..108499.27 rows=5381 width=0)
>       (actual time=564.756..564.756 rows=0 loops=45)
>         Filter: (f2.relpath ~~ (f.relpath || '%'::text))
>         Rows Removed by Filter: 792025
>         Heap Fetches: 768960
>         Buffers: shared hit=7014130
> Planning Time: 0.361 ms
> Execution Time: 25415.088 ms
>
>   ->  Seq Scan on media.oo_file of2
>       (cost=0.00..144714.70 rows=86960 width=0)
>       (actual time=0.044..0.044 rows=1 loops=1)
>         Filter: (of2.relpath ~~ 'Felhasználók%'::text)
>         Rows Removed by Filter: 15
>         Buffers: shared hit=2
> Planning Time: 0.290 ms
> Execution Time: 0.076 ms
>
> In other words, I could write a pl/sql function with a nested loop instead
> of the problematic query, and it would be 1000 times faster.
>
> What am I missing?

In the fast case the 'Felhasználók%' part is known at query planning time,
so it can be a prefix search. In the slow case the planner doesn't know
what that value will be; it could be something that starts with '%', for
example.

Also, your logic looks a bit unsafe: the query you have would include files
under all top-level folders whose names start with 'Felhasználók', so you
could accidentally merge in files from folders called 'Felhasználókfoo' or
'Felhasználókbar', for example.
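To illustrate the difference (table and column names taken from the plans
above; the queries themselves are made up):

    -- Prefix known at plan time: with a suitable index (text_pattern_ops
    -- or "C" collation), the planner can derive index range bounds,
    -- roughly: relpath >= 'Felhasználók' AND relpath < 'Felhasználól'.
    select count(*)
    from media.oo_file
    where relpath like 'Felhasználók%';

    -- Prefix computed from another row: unknown at plan time, so the
    -- planner has to allow for any pattern (including one that begins
    -- with '%') and can only apply the LIKE as a per-row filter.
    select count(*)
    from media.oo_file f
    join media.oo_file f2 on f2.relpath like (f.relpath || '%');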
Re: Terribly slow query with very good plan?
On Fri, 4 Feb 2022 at 10:09, Les wrote:
>
> Oh I see, the query planner does not know that there will be no %
> characters in file and folder names.
>
> But what is the solution then? It just seems wrong that I can speed up a
> query 1000 times by replacing it with a nested loop in a pl/sql function :(

You don't need a nested loop; doing it in two stages in pl/pgsql would be
enough, I think: first get the folder name, and then construct a new query
using it as a constant. I'd use SELECT FOR SHARE when getting the folder
name, so that no other process can change it underneath you before you run
your second query.

With the ^@ operator, my guess is that because the planner knows nothing
about the folder name value, it could be the empty string, which would be a
prefix of everything.
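A minimal sketch of that two-stage approach, assuming a folder table named
media.oo_folder with bigint ids (the function name and signature are made
up). Appending '/' before the wildcard also avoids the 'Felhasználókfoo'
problem mentioned earlier, if '/' is the path separator:

    CREATE OR REPLACE FUNCTION media.files_under_folder(p_folder_id bigint)
    RETURNS SETOF bigint
    LANGUAGE plpgsql AS $$
    DECLARE
        v_relpath text;
    BEGIN
        -- Stage 1: read the folder path, locking the row so a concurrent
        -- transaction cannot rename it before stage 2 runs.
        SELECT fo.relpath INTO v_relpath
        FROM media.oo_folder fo
        WHERE fo.id = p_folder_id
        FOR SHARE;

        -- Stage 2: build a new query with the path embedded as a
        -- constant, so the planner sees a fixed prefix and can use an
        -- index for the LIKE.
        RETURN QUERY EXECUTE format(
            'SELECT fi.id FROM media.oo_file fi WHERE fi.relpath LIKE %L',
            v_relpath || '/%');
    END;
    $$;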
Re: Terribly slow query with very good plan?
On Fri, 4 Feb 2022 at 12:27, Les wrote:
> PostgreSQL uses seq scan for O, F, N, T letters, but it uses index scan
> for A, I, C letters (with the "like" query).

That's interesting.

Does it help if you create an additional index on relpath with the
text_pattern_ops modifier, e.g.

    CREATE INDEX ... USING btree (relpath text_pattern_ops);
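Spelled out against the table from the thread (the index name here is made
up):

    CREATE INDEX oo_file_relpath_pattern_idx
        ON media.oo_file USING btree (relpath text_pattern_ops);

text_pattern_ops compares strings character by character, which lets a
LIKE 'prefix%' search use a btree index even when the database collation
is not "C".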
Re: Terribly slow query with very good plan?
On Fri, 4 Feb 2022 at 13:07, Les wrote:
>
>> > PostgreSQL uses seq scan for O, F, N, T letters, but it uses index
>> > scan for A, I, C letters (with the "like" query).
>>
>> That's interesting.
>>
>> Does it help if you create an additional index on relpath with the
>> text_pattern_ops modifier, e.g.
>>
>> CREATE INDEX ... USING btree (relpath text_pattern_ops);
>
> It does not help.

What if you try applying the C collation to the values from the table:

    where fi.is_active and fi.relpath collate "C" ^@ 'A'
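(For reference, ^@ is the starts_with operator; both the operator and the
function were added in PostgreSQL 11, so the predicate above could
equivalently be written as:

    where fi.is_active and starts_with(fi.relpath collate "C", 'A')

)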
Re: Terribly slow query with very good plan?
On Fri, 4 Feb 2022 at 13:21, Les wrote:
>
>> What if you try applying the C collation to the values from the table:
>>
>> where fi.is_active and fi.relpath collate "C" ^@ 'A'
>
>
> Slow
What about this:
fi.relpath between ('A' collate "C") and ('A'||chr(255) collate "C")
Re: Terribly slow query with very good plan?
On Fri, 4 Feb 2022 at 14:07, Les wrote:
>
>
>
>> > Slow
>>
>> What about this:
>>
>> fi.relpath between ('A' collate "C") and ('A'||chr(255) collate "C")
>
>
> It uses index scan.
> Although the same with 'Természettudomány' uses seq scan:
>
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
> select fi.id from media.oo_file fi
> where fi.is_active
> and fi.relpath between
> ('Természettudomány' collate "C")
> and ('Természettudomány'||chr(255) collate "C")
> limit 1;
>
> QUERY PLAN
> --------------------------------------------------------------------------
> Limit  (cost=0.00..2.13 rows=1 width=8) (actual time=7521.531..7521.532 rows=0 loops=1)
>   Output: id
>   Buffers: shared hit=17018 read=150574
>   ->  Seq Scan on media.oo_file fi  (cost=0.00..188195.39 rows=88290 width=8) (actual time=7521.528..7521.529 rows=0 loops=1)
>         Output: id
>         Filter: (fi.is_active AND (fi.relpath >= 'Természettudomány'::text COLLATE "C") AND (fi.relpath <= 'Természettudományÿ'::text COLLATE "C"))
>         Rows Removed by Filter: 1075812
>         Buffers: shared hit=17018 read=150574
> Planning Time: 8.918 ms
> Execution Time: 7521.560 ms
That may be because it's expecting to get 88290 rows from the sequential
scan, and the "limit 1" means it expects the sequential scan to be cheap:
on average it should only need to scan 1/88290 of the table before finding
a matching row, at which point it can stop. That matches the plan above:
188195.39 / 88290 ≈ 2.13, the cost shown on the Limit node. In reality no
rows match at all, so the scan ends up reading the whole table.

Try it without the "limit 1".
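That is, the same query with the limit removed:

    EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
    select fi.id from media.oo_file fi
    where fi.is_active
    and fi.relpath between
        ('Természettudomány' collate "C")
        and ('Természettudomány'||chr(255) collate "C");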
