Slow query because lexeme index not used

2021-08-07 Thread Alex
Table "product" has a GIN index on "lexeme" column (tsvector) that is not used.

Query that doesn't use lexeme idx:  https://explain.dalibo.com/plan/BlB#plan, 
~8s, ~60.000 blocks needed

Query forced to use lexeme idx: https://explain.dalibo.com/plan/i52, ~800ms 
(10x less), ~15.000 blocks needed (x4 less)
Table metdata:
 relname  | relpages | reltuples | relallvisible | relkind | 
relnatts | relhassubclass | reloptions | pg_table_size
--+--+---+---+-+--+++---
 product_property_default | 8992 |    622969 |  8992 | r   |
   16 | f  |    |  73719808
 product  |    49686 |    413840 | 49686 | r   |
   14 | f  |    | 493314048
Table stats:
   frac_mcv    |    tablename | attname | inherited | null_frac | 
n_distinct  | n_mcv | n_hist | correlation
---+--+-+---+---+-+---++-
   | product  | lexeme  | f | 0 |   
   -1 |   |    |
    0.99773335 | product_property_default | meaning | f | 0 |   
   63 |    39 | 24 |  0.19444875
 0.6416333 | product_property_default | first   | f | 0 |   
 2193 |   100 |    101 | -0.09763639
 0.0002334 | product_property_default | product | f | 0 | 
-0.15221785 | 1 |    101 |  0.08643274


Using windows docker with wsl2.Both cases are run with cold cache.All database 
memory is limited to 1GB by using .wslconfig file with memory=1GB, also the 
docker container is limited to 1GB. 
My requirement is to optimize disk access with this limited memory


Postgres 12.4










Re: Slow query because lexeme index not used

2021-08-07 Thread Justin Pryzby
On Sat, Aug 07, 2021 at 07:35:25PM +, Alex wrote:
> Table "product" has a GIN index on "lexeme" column (tsvector) that is not 
> used.
> 
> Query that doesn't use lexeme idx:  https://explain.dalibo.com/plan/BlB#plan, 
> ~8s, ~60.000 blocks needed
> 
> Query forced to use lexeme idx: https://explain.dalibo.com/plan/i52, ~800ms 
> (10x less), ~15.000 blocks needed (x4 less)

Could you show the table stats for product.id ?  In particular its
"correlation".

I guess the correlation is ~1, and the 10,659 index scans on product.id are
considered to be cheaper than scannning the lexeme index - since there are no
correlation stats for tsvector.

How large is shared_buffers ?

Does the query plan improve if you increase work_mem ?

Maybe you could encourage scanning in order of product_property.product.
You could CLUSTER product_property_default on an index on "product" and then
ANALYZE.  Or you could write the query with a temp table:

CREATE TEMP TABLE product_ids AS
SELECT product
FROM product_property
WHERE "meaning" = 'B' AND "first" = 1.7179869184E10
GROUP BY 1 -- or DISTINCT, because the table is only used for EXISTS
ORDER BY 1; -- to scan product in order of id
ANALYZE product_ids;

The index scans on product.id should be faster when you use
EXISTS(SELECT 1 FROM product_ids ...), even though it didn't use the lexeme 
index.

Maybe it would help to create stats on "first" and "meaning"; the rowcount is
underestimated by 3x, which means it did several times more index scans into
"product" than planned.
| Bitmap Heap Scan on product_property_default product_property_default 
(cost=2,748.6..8,823.4 rows=6,318 width=4) (actual time=43.945..211.621 
rows=21,061 loops=1) 

CREATE STATISTICS first_meaning ON first,meaning FROM product_property;
ANALYZE product_property;

> Table metdata:
>  relname  | relpages | reltuples | relallvisible | relkind | 
> relnatts | relhassubclass | reloptions | pg_table_size
> --+--+---+---+-+--+++---
>  product_property_default | 8992 |    622969 |  8992 | r   |  
>  16 | f  |    |  73719808
>  product  |    49686 |    413840 | 49686 | r   |  
>  14 | f  |    | 493314048
>
> Table stats:
>    frac_mcv    |    tablename | attname | inherited | null_frac | 
> n_distinct  | n_mcv | n_hist | correlation
> ---+--+-+---+---+-+---++-
>    | product  | lexeme  | f | 0 | 
>  -1 |   |    |
>     0.99773335 | product_property_default | meaning | f | 0 | 
>  63 |    39 | 24 |  0.19444875
>  0.6416333 | product_property_default | first   | f | 0 | 
>    2193 |   100 |    101 | -0.09763639
>  0.0002334 | product_property_default | product | f | 0 | 
> -0.15221785 | 1 |    101 |  0.08643274
> 
> 
> Using windows docker with wsl2.Both cases are run with cold cache.All 
> database memory is limited to 1GB by using .wslconfig file with memory=1GB, 
> also the docker container is limited to 1GB. 
> My requirement is to optimize disk access with this limited memory