Suggestion to improve query performance for GIS query.

2020-05-22 Thread postgann2020 s
Hi Team,

Thanks for your support.

Could you please suggest on below query.

EnvironmentPostgreSQL: 9.5.15
Postgis: 2.2.7

The table contains GIS data which is fiber data(underground routes).

We are using the below query inside the proc which is taking a long time to
complete.

*

SELECT seq_no+1 INTO pair_seq_no FROM SCHEMA.TABLE WHERE (Column1 like
'%,sheath--'||cable_seq_id ||',%' or Column1 like 'sheath--'||cable_seq_id
||',%' or Column1 like '%,sheath--'||cable_seq_id  or
Column1='sheath--'||cable_seq_id) order by seq_no desc limit 1 ;



We have created an index on parental_path Column1 still it is taking 4secs
to get the results.

Could you please suggest a better way to execute the query.

Thanks for your support.

Regards,
PostgAnn.


Re: Suggestion to improve query performance for GIS query.

2020-05-22 Thread David G. Johnston
On Thursday, May 21, 2020, postgann2020 s  wrote:

>
> SELECT seq_no+1 INTO pair_seq_no FROM SCHEMA.TABLE WHERE (Column1 like
> '%,sheath--'||cable_seq_id ||',%' or Column1 like 'sheath--'||cable_seq_id
> ||',%' or Column1 like '%,sheath--'||cable_seq_id  or
> Column1='sheath--'||cable_seq_id) order by seq_no desc limit 1 ;
>
>
> Could you please suggest a better way to execute the query
>

Add a trigger to the table to normalize the contents of column1 upon insert
and then rewrite your query to reference the newly created normalized
fields.

David J.


Re: Suggestion to improve query performance for GIS query.

2020-05-22 Thread Mohammed Afsar
Dear team,

Kindly try to execute the vacuum analyzer on that particular table and
refresh the session and execute the query.

VACUUM (VERBOSE, ANALYZE) tablename;

Regards,
Mohammed Afsar
Database engineer

On Fri, May 22, 2020, 12:30 PM postgann2020 s 
wrote:

> Hi Team,
>
> Thanks for your support.
>
> Could you please suggest on below query.
>
> EnvironmentPostgreSQL: 9.5.15
> Postgis: 2.2.7
>
> The table contains GIS data which is fiber data(underground routes).
>
> We are using the below query inside the proc which is taking a long time
> to complete.
>
> *
>
> SELECT seq_no+1 INTO pair_seq_no FROM SCHEMA.TABLE WHERE (Column1 like
> '%,sheath--'||cable_seq_id ||',%' or Column1 like 'sheath--'||cable_seq_id
> ||',%' or Column1 like '%,sheath--'||cable_seq_id  or
> Column1='sheath--'||cable_seq_id) order by seq_no desc limit 1 ;
>
> 
>
> We have created an index on parental_path Column1 still it is taking 4secs
> to get the results.
>
> Could you please suggest a better way to execute the query.
>
> Thanks for your support.
>
> Regards,
> PostgAnn.
>


Re: Suggestion to improve query performance for GIS query.

2020-05-22 Thread postgann2020 s
Thanks for your support David and Afsar.

Hi David,

Could you please suggest the resource link to  "Add a trigger to the table
to normalize the contents of column1 upon insert and then rewrite your
query to reference the newly created normalized fields." if anything
available. So that it will help me to get into issues.

Thanks for your support.

Regards,
Postgann.


On Fri, May 22, 2020 at 12:46 PM Mohammed Afsar  wrote:

> Dear team,
>
> Kindly try to execute the vacuum analyzer on that particular table and
> refresh the session and execute the query.
>
> VACUUM (VERBOSE, ANALYZE) tablename;
>
> Regards,
> Mohammed Afsar
> Database engineer
>
> On Fri, May 22, 2020, 12:30 PM postgann2020 s 
> wrote:
>
>> Hi Team,
>>
>> Thanks for your support.
>>
>> Could you please suggest on below query.
>>
>> EnvironmentPostgreSQL: 9.5.15
>> Postgis: 2.2.7
>>
>> The table contains GIS data which is fiber data(underground routes).
>>
>> We are using the below query inside the proc which is taking a long time
>> to complete.
>>
>> *
>>
>> SELECT seq_no+1 INTO pair_seq_no FROM SCHEMA.TABLE WHERE (Column1 like
>> '%,sheath--'||cable_seq_id ||',%' or Column1 like 'sheath--'||cable_seq_id
>> ||',%' or Column1 like '%,sheath--'||cable_seq_id  or
>> Column1='sheath--'||cable_seq_id) order by seq_no desc limit 1 ;
>>
>> 
>>
>> We have created an index on parental_path Column1 still it is taking
>> 4secs to get the results.
>>
>> Could you please suggest a better way to execute the query.
>>
>> Thanks for your support.
>>
>> Regards,
>> PostgAnn.
>>
>


Request to help on GIS Query improvement suggestion.

2020-05-22 Thread postggen2020 s
Hi Team,

Thanks for your support.

Could someone please suggest on the below query.

One of the query which was created on GIS data is taking a long time and
even it is not taking the index as well. I have included all the required
details for reference.

Database Stack:
===
PostgreSQL : 9.5.15
Postgis: 2.2.7

Table Structure:
===

ALTER TABLE SCHEMA.TABLE_NAME ADD COLUMN parental_path text;

Created Indexes on column parental_path:
=

CREATE INDEX cable_pair_parental_path_idx
  ON SCHEMA.TABLE_NAME
  USING btree
  (md5(parental_path) COLLATE pg_catalog."default");

CREATE INDEX cable_pair_parental_path_idx_fulltext
  ON SCHEMA.TABLE_NAME
  USING gist
  (parental_path COLLATE pg_catalog."default");

Sample data in "parental_path" column:
==

'route--2309421/2951584/3373649/2511322/1915187/2696397/2623291/2420708/2144348/2294454,circuit--88458/88460,sheath--8874'

Actual Query:
=

SELECT seq_no + 1 FROM SCHEMA.TABLE_NAME WHERE (parental_path LIKE
'%,sheath--' || cable_seq_id || ',%' OR parental_path LIKE 'sheath--' ||
cable_seq_id || ',%' OR parental_path LIKE '%,sheath--' || cable_seq_id OR
parental_path = 'sheath--' || cable_seq_id) ORDER BY seq_no DESC LIMIT 1;

Explain Plan:
=

Limit  (cost=108111.60..108111.61 rows=1 width=4) (actual
time=4597.605..4597.605 rows=0 loops=1)
 Output: ((seq_no + 1)), seq_no
 Buffers: shared hit=2967 read=69606 dirtied=1
 ->  Sort  (cost=108111.60..108113.09 rows=595 width=4) (actual
time=4597.603..4597.603 rows=0 loops=1)
   Output: ((seq_no + 1)), seq_no
   Sort Key: TABLE_NAME.seq_no DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=2967 read=69606 dirtied=1
   ->  *Seq Scan on SCHEMA.TABLE_NAME  (cost=0.00..108108.63 rows=595
width=4) (actual time=4597.595..4597.595 rows=0 loops=1)*
 Output: (seq_no + 1), seq_no
 Filter: ((TABLE_NAME.parental_path ~~
'%,sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~
'sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~
'%,sheath--64690'::text) OR (TABLE_NAME.parental_path =
'sheath--64690'::text))
 Rows Removed by Filter: 1930188
 Buffers: shared hit=2967 read=69606 dirtied=1

Please share your suggestion if I have to change or add new objects to the
table etc..


Thanks & Regards,
PostgAnn.


Request to help on Query improvement suggestion.

2020-05-22 Thread devchef2020 d
Hi Team,

Thanks for your support.

Could someone please suggest on the below query.

One of the query which was created on GIS data is taking a long time and
even it is not taking the index as well. I have included all the required
details for reference.

Database Stack:
===
PostgreSQL : 9.5.15
Postgis: 2.2.7

Table Structure:
===

ALTER TABLE SCHEMA.TABLE_NAME ADD COLUMN parental_path text;

Created Indexes on column parental_path:
=

CREATE INDEX cable_pair_parental_path_idx
  ON SCHEMA.TABLE_NAME
  USING btree
  (md5(parental_path) COLLATE pg_catalog."default");

CREATE INDEX cable_pair_parental_path_idx_fulltext
  ON SCHEMA.TABLE_NAME
  USING gist
  (parental_path COLLATE pg_catalog."default");

Sample data in "parental_path" column:
==

'route--2309421/2951584/3373649/2511322/1915187/2696397/2623291/2420708/2144348/2294454,circuit--88458/88460,sheath--8874'

Actual Query:
=

SELECT seq_no + 1 FROM SCHEMA.TABLE_NAME WHERE (parental_path LIKE
'%,sheath--' || cable_seq_id || ',%' OR parental_path LIKE 'sheath--' ||
cable_seq_id || ',%' OR parental_path LIKE '%,sheath--' || cable_seq_id OR
parental_path = 'sheath--' || cable_seq_id) ORDER BY seq_no DESC LIMIT 1;

Explain Plan:
=

Limit  (cost=108111.60..108111.61 rows=1 width=4) (actual
time=4597.605..4597.605 rows=0 loops=1)
 Output: ((seq_no + 1)), seq_no
 Buffers: shared hit=2967 read=69606 dirtied=1
 ->  Sort  (cost=108111.60..108113.09 rows=595 width=4) (actual
time=4597.603..4597.603 rows=0 loops=1)
   Output: ((seq_no + 1)), seq_no
   Sort Key: TABLE_NAME.seq_no DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=2967 read=69606 dirtied=1
   ->  *Seq Scan on SCHEMA.TABLE_NAME  (cost=0.00..108108.63 rows=595
width=4) (actual time=4597.595..4597.595 rows=0 loops=1)*
 Output: (seq_no + 1), seq_no
 Filter: ((TABLE_NAME.parental_path ~~
'%,sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~
'sheath--64690,%'::text) OR (TABLE_NAME.parental_path ~~
'%,sheath--64690'::text) OR (TABLE_NAME.parental_path =
'sheath--64690'::text))
 Rows Removed by Filter: 1930188
 Buffers: shared hit=2967 read=69606 dirtied=1

Please share your suggestion.

Thanks & Regards,
Devchef.


Re: Request to help on GIS Query improvement suggestion.

2020-05-22 Thread Michael Lewis
Your indexes and operators are not compatible. You have added a btree index
on md5 function result and are not using md5 in your query, and also using
LIKE operator not one of the supported ones. I believe it might use a btree
operator (plain value, not md5 result) if you are always searching for
"string starts with  but I don't know what it ends with" but you can't
possibly use a btree index where you are putting a wild card at the front.

https://www.postgresql.org/docs/9.5/indexes-types.html

a gist index operators supported-
https://www.postgresql.org/docs/9.5/gist-builtin-opclasses.html

Here's a whole page on full text search, it would be worth a read-
https://www.postgresql.org/docs/9.5/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX


Strategy for materialisation and centralisation of data

2020-05-22 Thread Rory Campbell-Lange
Apologies for the cross-post to the general list.

I'm keen to know if there are any good reasons apart from disk space and
possible replication connection overhead to avoid the strategy proposed
below.

We have quite a few databases of type a and many of type b in a cluster.
Both a and b types are fairly complex and are different solutions to a
similar problem domain. All the databases are very read-centric, and all
database interaction is currently through plpgsql with no materialised
data.

Some organisations have several type a and many type b databases, and
need to query these in a homogeneous manner. We presently do this with
many middleware requests or pl/proxy. An a or b type database belongs to
0 or 1 organisations.

Making a and b generally the same would be a very big project.
Consequently I'm discussing materialising a subset of data in a common
format between the two database types and shipping that data to
organisation databases. This would have the benefit of providing a
common data interface and speeding up queries for all database types.
Users would have faster queries, and it would be a big time saver for
our development team, who presently have to deal with three quite
different data APIs.

Presently I've been thinking of using triggers or materialized views in
each database to materialise data into a "matview" schema which is then
shipped via logical replication to an organisation database when
required. New columns in the matview schema tables would ensure replica
identity uniqueness and allow the data to be safely stored in common
tables in the organisation database.

A few issues I foresee with this approach include:

* requiring two to three times current storage for materialisation
  (the cluster is currently ~250GB)

* having to have many logical replication slots
  (we sometimes suffer from pl/proxy connection storms)

Commentary gratefully received,
Rory