Suggestion to improve query performance for GIS query.
Hi Team,

Thanks for your support. Could you please suggest on the below query.

Environment:
PostgreSQL: 9.5.15
PostGIS: 2.2.7

The table contains GIS data, which is fiber data (underground routes). We are using the below query inside a procedure, and it is taking a long time to complete.

SELECT seq_no + 1
  INTO pair_seq_no
  FROM SCHEMA.TABLE
 WHERE (column1 LIKE '%,sheath--' || cable_seq_id || ',%'
    OR column1 LIKE 'sheath--' || cable_seq_id || ',%'
    OR column1 LIKE '%,sheath--' || cable_seq_id
    OR column1 = 'sheath--' || cable_seq_id)
 ORDER BY seq_no DESC
 LIMIT 1;

We have created an index on column1 (parental_path), but the query still takes about 4 seconds to return results. Could you please suggest a better way to execute the query?

Thanks for your support.

Regards,
PostgAnn.
Re: Suggestion to improve query performance for GIS query.
On Thursday, May 21, 2020, postgann2020 s wrote:

> SELECT seq_no + 1 INTO pair_seq_no FROM SCHEMA.TABLE
> WHERE (column1 LIKE '%,sheath--' || cable_seq_id || ',%'
>     OR column1 LIKE 'sheath--' || cable_seq_id || ',%'
>     OR column1 LIKE '%,sheath--' || cable_seq_id
>     OR column1 = 'sheath--' || cable_seq_id)
> ORDER BY seq_no DESC LIMIT 1;
>
> Could you please suggest a better way to execute the query?

Add a trigger to the table to normalize the contents of column1 upon insert, and then rewrite your query to reference the newly created normalized fields.

David J.
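A minimal sketch of what that normalization could look like. All table, function, and trigger names here are hypothetical, and the parsing assumes the comma-separated "sheath--<id>" convention visible in the original query:

```sql
-- Hypothetical side table: one row per sheath id found in column1.
CREATE TABLE schema_name.table_name_sheaths (
    seq_no    integer NOT NULL,   -- points back at the parent row
    sheath_id integer NOT NULL
);
CREATE INDEX ON schema_name.table_name_sheaths (sheath_id, seq_no DESC);

-- Trigger function: split column1 on commas and keep the sheath-- entries.
CREATE OR REPLACE FUNCTION schema_name.sync_sheaths() RETURNS trigger AS $$
BEGIN
    DELETE FROM schema_name.table_name_sheaths WHERE seq_no = NEW.seq_no;
    INSERT INTO schema_name.table_name_sheaths (seq_no, sheath_id)
    SELECT NEW.seq_no,
           substring(part FROM 'sheath--(\d+)')::integer
      FROM unnest(string_to_array(NEW.column1, ',')) AS part
     WHERE part LIKE 'sheath--%';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_name_sheaths_sync
AFTER INSERT OR UPDATE OF column1 ON schema_name.table_name
FOR EACH ROW EXECUTE PROCEDURE schema_name.sync_sheaths();
```

With that in place, the four-way LIKE becomes a plain indexable lookup, e.g. SELECT seq_no + 1 FROM schema_name.table_name_sheaths WHERE sheath_id = cable_seq_id ORDER BY seq_no DESC LIMIT 1;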
Re: Suggestion to improve query performance for GIS query.
Dear team,

Kindly try running a vacuum analyze on that particular table, then refresh the session and execute the query again:

VACUUM (VERBOSE, ANALYZE) tablename;

Regards,
Mohammed Afsar
Database engineer

On Fri, May 22, 2020, 12:30 PM postgann2020 s wrote:

> Hi Team,
>
> Could you please suggest on below query.
>
> SELECT seq_no + 1 INTO pair_seq_no FROM SCHEMA.TABLE
> WHERE (column1 LIKE '%,sheath--' || cable_seq_id || ',%'
>     OR column1 LIKE 'sheath--' || cable_seq_id || ',%'
>     OR column1 LIKE '%,sheath--' || cable_seq_id
>     OR column1 = 'sheath--' || cable_seq_id)
> ORDER BY seq_no DESC LIMIT 1;
>
> We have created an index on column1 (parental_path), but it still takes
> about 4 seconds to get the results.
Re: Suggestion to improve query performance for GIS query.
Thanks for your support, David and Afsar.

Hi David,

Could you please point me to a resource link for "Add a trigger to the table to normalize the contents of column1 upon insert and then rewrite your query to reference the newly created normalized fields", if anything is available? That would help me work through the issue.

Thanks for your support.

Regards,
Postgann.

On Fri, May 22, 2020 at 12:46 PM Mohammed Afsar wrote:

> Kindly try to execute the vacuum analyze on that particular table and
> refresh the session and execute the query.
>
> VACUUM (VERBOSE, ANALYZE) tablename;
Request to help on GIS Query improvement suggestion.
Hi Team,

Thanks for your support. Could someone please suggest on the below query.

One of the queries created on GIS data is taking a long time, and it is not using the index either. I have included all the required details for reference.

Database stack:
PostgreSQL: 9.5.15
PostGIS: 2.2.7

Table structure:
ALTER TABLE SCHEMA.TABLE_NAME ADD COLUMN parental_path text;

Indexes created on column parental_path:
CREATE INDEX cable_pair_parental_path_idx
    ON SCHEMA.TABLE_NAME
 USING btree (md5(parental_path) COLLATE pg_catalog."default");
CREATE INDEX cable_pair_parental_path_idx_fulltext
    ON SCHEMA.TABLE_NAME
 USING gist (parental_path COLLATE pg_catalog."default");

Sample data in "parental_path" column:
'route--2309421/2951584/3373649/2511322/1915187/2696397/2623291/2420708/2144348/2294454,circuit--88458/88460,sheath--8874'

Actual query:
SELECT seq_no + 1
  FROM SCHEMA.TABLE_NAME
 WHERE (parental_path LIKE '%,sheath--' || cable_seq_id || ',%'
    OR parental_path LIKE 'sheath--' || cable_seq_id || ',%'
    OR parental_path LIKE '%,sheath--' || cable_seq_id
    OR parental_path = 'sheath--' || cable_seq_id)
 ORDER BY seq_no DESC
 LIMIT 1;

Explain plan:
Limit (cost=108111.60..108111.61 rows=1 width=4) (actual time=4597.605..4597.605 rows=0 loops=1)
  Output: ((seq_no + 1)), seq_no
  Buffers: shared hit=2967 read=69606 dirtied=1
  ->  Sort (cost=108111.60..108113.09 rows=595 width=4) (actual time=4597.603..4597.603 rows=0 loops=1)
        Output: ((seq_no + 1)), seq_no
        Sort Key: TABLE_NAME.seq_no DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=2967 read=69606 dirtied=1
        ->  Seq Scan on SCHEMA.TABLE_NAME (cost=0.00..108108.63 rows=595 width=4) (actual time=4597.595..4597.595 rows=0 loops=1)
              Output: (seq_no + 1), seq_no
              Filter: ((TABLE_NAME.parental_path ~~ '%,sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~ 'sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~ '%,sheath--64690'::text) OR (TABLE_NAME.parental_path = 'sheath--64690'::text))
              Rows Removed by Filter: 1930188
              Buffers: shared hit=2967 read=69606 dirtied=1

Please share your suggestions on whether I should change or add new objects to the table, etc.

Thanks & Regards,
PostgAnn.
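One option that fits the leading-wildcard LIKE patterns above is a trigram index. This is a sketch, assuming the pg_trgm contrib extension can be installed on this 9.5 cluster; the index name is invented:

```sql
-- pg_trgm ships with core PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A trigram GIN index can serve LIKE patterns even when they begin
-- with a wildcard, unlike btree or the plain-collation GiST index above.
CREATE INDEX cable_pair_parental_path_trgm_idx
    ON SCHEMA.TABLE_NAME
 USING gin (parental_path gin_trgm_ops);
```

After creating it and running ANALYZE, the EXPLAIN for the same query should show a bitmap index scan instead of the sequential scan, provided the search string is at least a few characters long (trigram matching needs 3-character substrings, which 'sheath--64690' easily satisfies).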
Request to help on Query improvement suggestion.
Hi Team,

Thanks for your support. Could someone please suggest on the below query.

One of the queries created on GIS data is taking a long time, and it is not using the index either. I have included all the required details for reference.

Database stack:
PostgreSQL: 9.5.15
PostGIS: 2.2.7

Table structure:
ALTER TABLE SCHEMA.TABLE_NAME ADD COLUMN parental_path text;

Indexes created on column parental_path:
CREATE INDEX cable_pair_parental_path_idx
    ON SCHEMA.TABLE_NAME
 USING btree (md5(parental_path) COLLATE pg_catalog."default");
CREATE INDEX cable_pair_parental_path_idx_fulltext
    ON SCHEMA.TABLE_NAME
 USING gist (parental_path COLLATE pg_catalog."default");

Sample data in "parental_path" column:
'route--2309421/2951584/3373649/2511322/1915187/2696397/2623291/2420708/2144348/2294454,circuit--88458/88460,sheath--8874'

Actual query:
SELECT seq_no + 1
  FROM SCHEMA.TABLE_NAME
 WHERE (parental_path LIKE '%,sheath--' || cable_seq_id || ',%'
    OR parental_path LIKE 'sheath--' || cable_seq_id || ',%'
    OR parental_path LIKE '%,sheath--' || cable_seq_id
    OR parental_path = 'sheath--' || cable_seq_id)
 ORDER BY seq_no DESC
 LIMIT 1;

Explain plan:
Limit (cost=108111.60..108111.61 rows=1 width=4) (actual time=4597.605..4597.605 rows=0 loops=1)
  Output: ((seq_no + 1)), seq_no
  Buffers: shared hit=2967 read=69606 dirtied=1
  ->  Sort (cost=108111.60..108113.09 rows=595 width=4) (actual time=4597.603..4597.603 rows=0 loops=1)
        Output: ((seq_no + 1)), seq_no
        Sort Key: TABLE_NAME.seq_no DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=2967 read=69606 dirtied=1
        ->  Seq Scan on SCHEMA.TABLE_NAME (cost=0.00..108108.63 rows=595 width=4) (actual time=4597.595..4597.595 rows=0 loops=1)
              Output: (seq_no + 1), seq_no
              Filter: ((TABLE_NAME.parental_path ~~ '%,sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~ 'sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~ '%,sheath--64690'::text) OR (TABLE_NAME.parental_path = 'sheath--64690'::text))
              Rows Removed by Filter: 1930188
              Buffers: shared hit=2967 read=69606 dirtied=1

Please share your suggestion.

Thanks & Regards,
Devchef.
Re: Request to help on GIS Query improvement suggestion.
Your indexes and operators are not compatible. You added a btree index on the md5() of the column, but your query does not use md5(), and LIKE is not one of the operators a plain btree index supports anyway. A btree index on the plain value (not the md5 result) might be used if you were always searching for "string starts with X but I don't know what it ends with", but a btree index can never help when you put a wildcard at the front of the pattern.

Index types in 9.5:
https://www.postgresql.org/docs/9.5/indexes-types.html

GiST built-in operator classes:
https://www.postgresql.org/docs/9.5/gist-builtin-opclasses.html

Here's a whole page on full text search; it would be worth a read:
https://www.postgresql.org/docs/9.5/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
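To illustrate the "starts with" case: a sketch (index name invented) using the text_pattern_ops operator class, which makes a btree index usable for left-anchored LIKE even under a non-C collation:

```sql
-- Btree index usable for left-anchored LIKE patterns.
CREATE INDEX cable_pair_parental_path_prefix_idx
    ON SCHEMA.TABLE_NAME (parental_path text_pattern_ops);

-- Can use this index (pattern anchored at the left):
--   WHERE parental_path LIKE 'sheath--64690%'
-- Cannot use any btree index (leading wildcard):
--   WHERE parental_path LIKE '%,sheath--64690%'
```

The leading-wildcard variants in the original query would still need a different approach, such as a trigram index or normalized columns.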
Strategy for materialisation and centralisation of data
Apologies for the cross-post to the general list. I'm keen to know if there are any good reasons, apart from disk space and possible replication connection overhead, to avoid the strategy proposed below.

We have quite a few databases of type a and many of type b in a cluster. Both a and b types are fairly complex and are different solutions to a similar problem domain. All the databases are very read-centric, and all database interaction is currently through plpgsql, with no materialised data.

Some organisations have several type a and many type b databases, and need to query these in a homogeneous manner. We presently do this with many middleware requests or pl/proxy. An a or b type database belongs to 0 or 1 organisations. Making a and b generally the same would be a very big project. Consequently I'm discussing materialising a subset of data in a common format between the two database types and shipping that data to organisation databases. This would have the benefit of providing a common data interface and speeding up queries for all database types. Users would have faster queries, and it would be a big time saver for our development team, who presently have to deal with three quite different data APIs.

Presently I've been thinking of using triggers or materialized views in each database to materialise data into a "matview" schema, which is then shipped via logical replication to an organisation database when required. New columns in the matview schema tables would ensure replica identity uniqueness and allow the data to be safely stored in common tables in the organisation database.

A few issues I foresee with this approach include:

* requiring two to three times the current storage for materialisation (the cluster is currently ~250GB)
* having to have many logical replication slots (we sometimes suffer from pl/proxy connection storms)

Commentary gratefully received,
Rory
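A minimal sketch of the matview-schema-plus-logical-replication idea described above. This assumes PostgreSQL 10+ built-in logical replication is available; every name (schema, table, publication, subscription, connection string) is hypothetical:

```sql
-- In each source (type a or b) database: a common-format table.
CREATE SCHEMA IF NOT EXISTS matview;

-- source_db is the extra column that keeps replica identity unique
-- when rows from many source databases land in one shared table.
CREATE TABLE matview.items (
    source_db text   NOT NULL,   -- identifies the originating database
    item_id   bigint NOT NULL,
    payload   jsonb,
    PRIMARY KEY (source_db, item_id)
);

-- Publish only the materialised tables from the source database.
CREATE PUBLICATION matview_pub FOR TABLE matview.items;

-- In the organisation database (same matview.items table must exist
-- there): one subscription per source database, hence one replication
-- slot each -- which is the slot-count concern raised above.
CREATE SUBSCRIPTION org_sub_from_db_a
    CONNECTION 'host=db-a dbname=a user=replicator'
    PUBLICATION matview_pub;
```

Each subscription holds a replication slot open on its source, so the number of slots grows with the number of source databases per organisation; that, plus the duplicated storage, matches the two issues listed in the post.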
