Simple JOIN on heavy table not using expected index
Hi all, I have performance issue for a pretty simple request in a PostgreSQL server 14.10 * Request SELECT p.id_parcelle FROM private.parcelles p WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68')) ; * Table definition (extract) Table « private.parcelles » Colonne|Type | Collationnement | NULL-able | Par défaut ---+-+-+---+ id| integer | | | geom | geometry(MultiPolygon,2154) | | | fid | bigint | | | id_parcelle | character varying(14) | | not null | insee_col | character varying(5)| | | nom_col | character varying | | | section | character varying(2)| | | numero| character varying(4)| | | contenance| bigint | | | epci_nom | character varying | | | dep | character varying | | | dep_nom | character varying | | | Index : "foncier_pkey" PRIMARY KEY, btree (id_parcelle) "idx_extension_eol_parcelle" btree (extension_eol) "idx_lien_hubspot_parcelels" btree (lien_hubspot) "idx_reg_parcelle" btree (reg) "idx_type_ener_parcelles" btree (type_d_energie) "parcelles_dep_idx" btree (dep) "parcelles_id_parcelle_idx" btree (id_parcelle) "parcelles_inseecol_idx" btree (insee_col) "parcelles_object_id_idx" btree (hs_object_id) "parcelles_pipelinestage_idx" btree (hs_pipeline_stage) "parcelles_synctohubspot_idx" btree (synctohubspot) "sidx_foncier_geom" gist (geom) -> First comment, the primary Key should be on id (integer) and not on id_parcelle (a text code) * Statistiques lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'private' AND relname = 'parcelles'; -[ RECORD 1 ]---+-- relid | 2364725 schemaname | private relname | parcelles seq_scan| 1891 seq_tup_read| 552509679 idx_scan| 19144304 idx_tup_fetch | 38926631 n_tup_ins | 3 n_tup_upd | 3073182 n_tup_del | 0 n_tup_hot_upd | 2996591 n_live_tup | 92876681 n_dead_tup | 1836882 n_mod_since_analyze | 769313 n_ins_since_vacuum | 3 last_vacuum | last_autovacuum | last_analyze| 2024-02-08 15:33:14.008286+01 last_autoanalyze| vacuum_count| 0 autovacuum_count| 0 analyze_count | 1 autoanalyze_count | 0 * Plan : https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2 It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text field), even if the corresponding number of lines for this WHERE clause is a smal subset of the entire data: approx 6M against 80M in total Thanks in advance for any hint regarding this cumbersome query. Regards Kimaidou
Re: Simple JOIN on heavy table not using expected index
can you share result for: *explain analyze* SELECT p.id_parcelle FROM private.parcelles p WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68')) ; On Fri, 9 Feb 2024 at 17:14, kimaidou wrote: > Hi all, > > I have performance issue for a pretty simple request in a PostgreSQL > server 14.10 > > * Request > > SELECT p.id_parcelle > FROM private.parcelles p > WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68')) > ; > > * Table definition (extract) > > Table « private.parcelles » > Colonne|Type | Collationnement | > NULL-able | Par défaut > > ---+-+-+---+ > id| integer | | > | > geom | geometry(MultiPolygon,2154) | | > | > fid | bigint | | > | > id_parcelle | character varying(14) | | > not null | > insee_col | character varying(5)| | > | > nom_col | character varying | | > | > section | character varying(2)| | > | > numero| character varying(4)| | > | > contenance| bigint | | > | > epci_nom | character varying | | > | > dep | character varying | | > | > dep_nom | character varying | | > | > Index : > "foncier_pkey" PRIMARY KEY, btree (id_parcelle) > "idx_extension_eol_parcelle" btree (extension_eol) > "idx_lien_hubspot_parcelels" btree (lien_hubspot) > "idx_reg_parcelle" btree (reg) > "idx_type_ener_parcelles" btree (type_d_energie) > "parcelles_dep_idx" btree (dep) > "parcelles_id_parcelle_idx" btree (id_parcelle) > "parcelles_inseecol_idx" btree (insee_col) > "parcelles_object_id_idx" btree (hs_object_id) > "parcelles_pipelinestage_idx" btree (hs_pipeline_stage) > "parcelles_synctohubspot_idx" btree (synctohubspot) > "sidx_foncier_geom" gist (geom) > > -> First comment, the primary Key should be on id (integer) and not on > id_parcelle (a text code) > > > * Statistiques > > lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname > = 'private' AND relname = 'parcelles'; > -[ RECORD 1 ]---+-- > relid | 2364725 > schemaname | private > relname | parcelles > seq_scan| 1891 > seq_tup_read| 552509679 > idx_scan| 19144304 > idx_tup_fetch | 38926631 > n_tup_ins | 3 > n_tup_upd | 3073182 > n_tup_del | 0 > n_tup_hot_upd | 2996591 > n_live_tup | 92876681 > n_dead_tup | 1836882 > n_mod_since_analyze | 769313 > n_ins_since_vacuum | 3 > last_vacuum | > last_autovacuum | > last_analyze| 2024-02-08 15:33:14.008286+01 > last_autoanalyze| > vacuum_count| 0 > autovacuum_count| 0 > analyze_count | 1 > autoanalyze_count | 0 > > * Plan : > https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2 > > It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" > (text field), even if the corresponding number of lines for this WHERE > clause is a smal subset of the entire data: > approx 6M against 80M in total > > Thanks in advance for any hint regarding this cumbersome query. > > Regards > Kimaidou > > > -- https://www.burcinyazici.com
Re: Simple JOIN on heavy table not using expected index
The query plan is visible here : https://explain.dalibo.com/plan/50a719h92hde6950 Regards Le vendredi 9 février 2024, Burçin Yazıcı a écrit : > can you share result for: > > *explain analyze* SELECT p.id_parcelle > FROM private.parcelles p > WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68')) > ; > > On Fri, 9 Feb 2024 at 17:14, kimaidou wrote: > >> Hi all, >> >> I have performance issue for a pretty simple request in a PostgreSQL >> server 14.10 >> >> * Request >> >> SELECT p.id_parcelle >> FROM private.parcelles p >> WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68')) >> ; >> >> * Table definition (extract) >> >> Table « private.parcelles » >> Colonne|Type | Collationnement | >> NULL-able | Par défaut >> ---+-+-- >> ---+---+ >> id| integer | | >> | >> geom | geometry(MultiPolygon,2154) | | >> | >> fid | bigint | | >> | >> id_parcelle | character varying(14) | | >> not null | >> insee_col | character varying(5)| | >> | >> nom_col | character varying | | >> | >> section | character varying(2)| | >> | >> numero| character varying(4)| | >> | >> contenance| bigint | | >> | >> epci_nom | character varying | | >> | >> dep | character varying | | >> | >> dep_nom | character varying | | >> | >> Index : >> "foncier_pkey" PRIMARY KEY, btree (id_parcelle) >> "idx_extension_eol_parcelle" btree (extension_eol) >> "idx_lien_hubspot_parcelels" btree (lien_hubspot) >> "idx_reg_parcelle" btree (reg) >> "idx_type_ener_parcelles" btree (type_d_energie) >> "parcelles_dep_idx" btree (dep) >> "parcelles_id_parcelle_idx" btree (id_parcelle) >> "parcelles_inseecol_idx" btree (insee_col) >> "parcelles_object_id_idx" btree (hs_object_id) >> "parcelles_pipelinestage_idx" btree (hs_pipeline_stage) >> "parcelles_synctohubspot_idx" btree (synctohubspot) >> "sidx_foncier_geom" gist (geom) >> >> -> First comment, the primary Key should be on id (integer) and not on >> id_parcelle (a text code) >> >> >> * Statistiques >> >> lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname >> = 'private' AND relname = 'parcelles'; >> -[ RECORD 1 ]---+-- >> relid | 2364725 >> schemaname | private >> relname | parcelles >> seq_scan| 1891 >> seq_tup_read| 552509679 >> idx_scan| 19144304 >> idx_tup_fetch | 38926631 >> n_tup_ins | 3 >> n_tup_upd | 3073182 >> n_tup_del | 0 >> n_tup_hot_upd | 2996591 >> n_live_tup | 92876681 >> n_dead_tup | 1836882 >> n_mod_since_analyze | 769313 >> n_ins_since_vacuum | 3 >> last_vacuum | >> last_autovacuum | >> last_analyze| 2024-02-08 15:33:14.008286+01 >> last_autoanalyze| >> vacuum_count| 0 >> autovacuum_count| 0 >> analyze_count | 1 >> autoanalyze_count | 0 >> >> * Plan : >> https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2 >> >> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" >> (text field), even if the corresponding number of lines for this WHERE >> clause is a smal subset of the entire data: >> approx 6M against 80M in total >> >> Thanks in advance for any hint regarding this cumbersome query. >> >> Regards >> Kimaidou >> >> >> > > -- > https://www.burcinyazici.com >
Re: Simple JOIN on heavy table not using expected index
kimaidou writes: > It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text > field), even if the corresponding number of lines for this WHERE clause is > a smal subset of the entire data: > approx 6M against 80M in total 6M out of 80M rows is not a "small subset". Typically I'd expect the planner to use an index-based scan for up to 1 or 2 percent of the table. Beyond that, you're going to be touching most pages of the table anyway. You can try reducing random_page_cost to favor indexscans, but you might not find that the query gets any faster. regards, tom lane
Re: Simple JOIN on heavy table not using expected index
Tom, thanks a lot for your suggestion. Indeed, setting random_page_cost to 2 instead of 4 improves this query a lot ! See the new plan : https://explain.dalibo.com/plan/h924389529e11244 30 seconds VS 17 minutes before Cheers Michaël Le vendredi 9 février 2024, Tom Lane a écrit : > kimaidou writes: > > It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" > (text > > field), even if the corresponding number of lines for this WHERE clause > is > > a smal subset of the entire data: > > approx 6M against 80M in total > > 6M out of 80M rows is not a "small subset". Typically I'd expect > the planner to use an index-based scan for up to 1 or 2 percent of > the table. Beyond that, you're going to be touching most pages > of the table anyway. > > You can try reducing random_page_cost to favor indexscans, but > you might not find that the query gets any faster. > > regards, tom lane >
