Simple JOIN on heavy table not using expected index

2024-02-09 Thread kimaidou
Hi all,

I have performance issue for a pretty simple request in a PostgreSQL server
14.10

* Request

SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
;

* Table definition (extract)

  Table « private.parcelles »
Colonne|Type | Collationnement |
NULL-able | Par défaut
---+-+-+---+
 id| integer | |
|
 geom  | geometry(MultiPolygon,2154) | |
|
 fid   | bigint  | |
|
 id_parcelle   | character varying(14)   | |
not null  |
 insee_col | character varying(5)| |
|
 nom_col   | character varying   | |
|
 section   | character varying(2)| |
|
 numero| character varying(4)| |
|
 contenance| bigint  | |
|
 epci_nom  | character varying   | |
|
 dep   | character varying   | |
|
 dep_nom   | character varying   | |
|
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)

-> First comment, the primary Key should be on id (integer) and not on
id_parcelle (a text code)


* Statistiques

lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname =
'private' AND relname = 'parcelles';
-[ RECORD 1 ]---+--
relid   | 2364725
schemaname  | private
relname | parcelles
seq_scan| 1891
seq_tup_read| 552509679
idx_scan| 19144304
idx_tup_fetch   | 38926631
n_tup_ins   | 3
n_tup_upd   | 3073182
n_tup_del   | 0
n_tup_hot_upd   | 2996591
n_live_tup  | 92876681
n_dead_tup  | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum  | 3
last_vacuum |
last_autovacuum |
last_analyze| 2024-02-08 15:33:14.008286+01
last_autoanalyze|
vacuum_count| 0
autovacuum_count| 0
analyze_count   | 1
autoanalyze_count   | 0

* Plan :
https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2

It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text
field), even if the corresponding number of lines for this WHERE clause is
a smal subset of the entire data:
approx 6M against 80M in total

Thanks in advance for any hint regarding this cumbersome query.

Regards
Kimaidou


Re: Simple JOIN on heavy table not using expected index

2024-02-09 Thread Burçin Yazıcı
can you share result for:

*explain analyze* SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
;

On Fri, 9 Feb 2024 at 17:14, kimaidou  wrote:

> Hi all,
>
> I have performance issue for a pretty simple request in a PostgreSQL
> server 14.10
>
> * Request
>
> SELECT p.id_parcelle
> FROM private.parcelles p
> WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
> ;
>
> * Table definition (extract)
>
>   Table « private.parcelles »
> Colonne|Type | Collationnement |
> NULL-able | Par défaut
>
> ---+-+-+---+
>  id| integer | |
> |
>  geom  | geometry(MultiPolygon,2154) | |
> |
>  fid   | bigint  | |
> |
>  id_parcelle   | character varying(14)   | |
> not null  |
>  insee_col | character varying(5)| |
> |
>  nom_col   | character varying   | |
> |
>  section   | character varying(2)| |
> |
>  numero| character varying(4)| |
> |
>  contenance| bigint  | |
> |
>  epci_nom  | character varying   | |
> |
>  dep   | character varying   | |
> |
>  dep_nom   | character varying   | |
> |
> Index :
> "foncier_pkey" PRIMARY KEY, btree (id_parcelle)
> "idx_extension_eol_parcelle" btree (extension_eol)
> "idx_lien_hubspot_parcelels" btree (lien_hubspot)
> "idx_reg_parcelle" btree (reg)
> "idx_type_ener_parcelles" btree (type_d_energie)
> "parcelles_dep_idx" btree (dep)
> "parcelles_id_parcelle_idx" btree (id_parcelle)
> "parcelles_inseecol_idx" btree (insee_col)
> "parcelles_object_id_idx" btree (hs_object_id)
> "parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
> "parcelles_synctohubspot_idx" btree (synctohubspot)
> "sidx_foncier_geom" gist (geom)
>
> -> First comment, the primary Key should be on id (integer) and not on
> id_parcelle (a text code)
>
>
> * Statistiques
>
> lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname
> = 'private' AND relname = 'parcelles';
> -[ RECORD 1 ]---+--
> relid   | 2364725
> schemaname  | private
> relname | parcelles
> seq_scan| 1891
> seq_tup_read| 552509679
> idx_scan| 19144304
> idx_tup_fetch   | 38926631
> n_tup_ins   | 3
> n_tup_upd   | 3073182
> n_tup_del   | 0
> n_tup_hot_upd   | 2996591
> n_live_tup  | 92876681
> n_dead_tup  | 1836882
> n_mod_since_analyze | 769313
> n_ins_since_vacuum  | 3
> last_vacuum |
> last_autovacuum |
> last_analyze| 2024-02-08 15:33:14.008286+01
> last_autoanalyze|
> vacuum_count| 0
> autovacuum_count| 0
> analyze_count   | 1
> autoanalyze_count   | 0
>
> * Plan :
> https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2
>
> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep"
> (text field), even if the corresponding number of lines for this WHERE
> clause is a smal subset of the entire data:
> approx 6M against 80M in total
>
> Thanks in advance for any hint regarding this cumbersome query.
>
> Regards
> Kimaidou
>
>
>

-- 
https://www.burcinyazici.com


Re: Simple JOIN on heavy table not using expected index

2024-02-09 Thread kimaidou
The query plan is visible here :
https://explain.dalibo.com/plan/50a719h92hde6950

Regards

Le vendredi 9 février 2024, Burçin Yazıcı  a écrit :

> can you share result for:
>
> *explain analyze* SELECT p.id_parcelle
> FROM private.parcelles p
> WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
> ;
>
> On Fri, 9 Feb 2024 at 17:14, kimaidou  wrote:
>
>> Hi all,
>>
>> I have performance issue for a pretty simple request in a PostgreSQL
>> server 14.10
>>
>> * Request
>>
>> SELECT p.id_parcelle
>> FROM private.parcelles p
>> WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
>> ;
>>
>> * Table definition (extract)
>>
>>   Table « private.parcelles »
>> Colonne|Type | Collationnement |
>> NULL-able | Par défaut
>> ---+-+--
>> ---+---+
>>  id| integer | |
>>   |
>>  geom  | geometry(MultiPolygon,2154) | |
>>   |
>>  fid   | bigint  | |
>>   |
>>  id_parcelle   | character varying(14)   | |
>> not null  |
>>  insee_col | character varying(5)| |
>>   |
>>  nom_col   | character varying   | |
>>   |
>>  section   | character varying(2)| |
>>   |
>>  numero| character varying(4)| |
>>   |
>>  contenance| bigint  | |
>>   |
>>  epci_nom  | character varying   | |
>>   |
>>  dep   | character varying   | |
>>   |
>>  dep_nom   | character varying   | |
>>   |
>> Index :
>> "foncier_pkey" PRIMARY KEY, btree (id_parcelle)
>> "idx_extension_eol_parcelle" btree (extension_eol)
>> "idx_lien_hubspot_parcelels" btree (lien_hubspot)
>> "idx_reg_parcelle" btree (reg)
>> "idx_type_ener_parcelles" btree (type_d_energie)
>> "parcelles_dep_idx" btree (dep)
>> "parcelles_id_parcelle_idx" btree (id_parcelle)
>> "parcelles_inseecol_idx" btree (insee_col)
>> "parcelles_object_id_idx" btree (hs_object_id)
>> "parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
>> "parcelles_synctohubspot_idx" btree (synctohubspot)
>> "sidx_foncier_geom" gist (geom)
>>
>> -> First comment, the primary Key should be on id (integer) and not on
>> id_parcelle (a text code)
>>
>>
>> * Statistiques
>>
>> lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname
>> = 'private' AND relname = 'parcelles';
>> -[ RECORD 1 ]---+--
>> relid   | 2364725
>> schemaname  | private
>> relname | parcelles
>> seq_scan| 1891
>> seq_tup_read| 552509679
>> idx_scan| 19144304
>> idx_tup_fetch   | 38926631
>> n_tup_ins   | 3
>> n_tup_upd   | 3073182
>> n_tup_del   | 0
>> n_tup_hot_upd   | 2996591
>> n_live_tup  | 92876681
>> n_dead_tup  | 1836882
>> n_mod_since_analyze | 769313
>> n_ins_since_vacuum  | 3
>> last_vacuum |
>> last_autovacuum |
>> last_analyze| 2024-02-08 15:33:14.008286+01
>> last_autoanalyze|
>> vacuum_count| 0
>> autovacuum_count| 0
>> analyze_count   | 1
>> autoanalyze_count   | 0
>>
>> * Plan :
>> https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2
>>
>> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep"
>> (text field), even if the corresponding number of lines for this WHERE
>> clause is a smal subset of the entire data:
>> approx 6M against 80M in total
>>
>> Thanks in advance for any hint regarding this cumbersome query.
>>
>> Regards
>> Kimaidou
>>
>>
>>
>
> --
> https://www.burcinyazici.com
>


Re: Simple JOIN on heavy table not using expected index

2024-02-09 Thread Tom Lane
kimaidou  writes:
> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text
> field), even if the corresponding number of lines for this WHERE clause is
> a smal subset of the entire data:
> approx 6M against 80M in total

6M out of 80M rows is not a "small subset".  Typically I'd expect
the planner to use an index-based scan for up to 1 or 2 percent of
the table.  Beyond that, you're going to be touching most pages
of the table anyway.

You can try reducing random_page_cost to favor indexscans, but
you might not find that the query gets any faster.

regards, tom lane




Re: Simple JOIN on heavy table not using expected index

2024-02-09 Thread kimaidou
Tom, thanks a lot for your suggestion.
Indeed, setting random_page_cost to 2 instead of 4 improves this query a
lot !

See the new plan :
https://explain.dalibo.com/plan/h924389529e11244

30 seconds VS 17 minutes before

Cheers
Michaël

Le vendredi 9 février 2024, Tom Lane  a écrit :

> kimaidou  writes:
> > It seems PostgreSQL does not use the index parcelles_dep_idx on "dep"
> (text
> > field), even if the corresponding number of lines for this WHERE clause
> is
> > a smal subset of the entire data:
> > approx 6M against 80M in total
>
> 6M out of 80M rows is not a "small subset".  Typically I'd expect
> the planner to use an index-based scan for up to 1 or 2 percent of
> the table.  Beyond that, you're going to be touching most pages
> of the table anyway.
>
> You can try reducing random_page_cost to favor indexscans, but
> you might not find that the query gets any faster.
>
> regards, tom lane
>