Re: Searching in varchar column having 100M records

2019-07-19 Thread mayank rupareliya
Well, you haven't shown us the execution plan, so it's hard to check why
it did not help much and give you further advice.


This is the latest query execution plan (EXPLAIN ANALYZE) after adding
indexes on both columns.

Aggregate  (cost=174173.57..174173.58 rows=1 width=8) (actual time=65087.657..65087.658 rows=1 loops=1)
  ->  Bitmap Heap Scan on fields  (cost=1382.56..174042.61 rows=52386 width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
        Recheck Cond: ((field)::text = 'Champlin'::text)
        Heap Blocks: exact=31433
        ->  Bitmap Index Scan on index_field  (cost=0.00..1369.46 rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
              Index Cond: ((field)::text = 'Champlin'::text)
Planning Time: 8.595 ms
Execution Time: 65093.508 ms
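
For context, the plan above corresponds to a count query of roughly this
shape (reconstructed from the plan for illustration; the exact original
query text isn't shown in the thread, only the table and column names are
certain):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM fields
WHERE field = 'Champlin';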

On Thu, Jul 18, 2019 at 6:11 PM Tomas Vondra wrote:

> On Thu, Jul 18, 2019 at 05:21:49PM +0530, mayank rupareliya wrote:
> >Please recheck with track_io_timing = on in configuration. EXPLAIN
> >(ANALYZE, BUFFERS) with this option will report how much time we spend
> >on I/O.
> >
> >>   Buffers: shared hit=2 read=31492
> >
> >31492 blocks / 65 sec ~ 480 IOPS; not bad if you are using an HDD.
> >
> >Your query reads table data from disk (well, or from the OS cache). You
> >need more RAM for shared_buffers or disks with better performance.
> >
> >
> >Thanks Sergei. track_io_timing = on helps. Following is the result after
> >changing that config.
> >
> >Aggregate  (cost=10075.78..10075.79 rows=1 width=8) (actual time=63088.198..63088.199 rows=1 loops=1)
> >  Buffers: shared read=31089
> >  I/O Timings: read=61334.252
> >  ->  Bitmap Heap Scan on fields  (cost=72.61..10069.32 rows=2586 width=0) (actual time=69.509..63021.448 rows=31414 loops=1)
> >        Recheck Cond: ((field)::text = 'Klein'::text)
> >        Heap Blocks: exact=30999
> >        Buffers: shared read=31089
> >        I/O Timings: read=61334.252
> >        ->  Bitmap Index Scan on index_field  (cost=0.00..71.96 rows=2586 width=0) (actual time=58.671..58.671 rows=31414 loops=1)
> >              Index Cond: ((field)::text = 'Klein'::text)
> >              Buffers: shared read=90
> >              I/O Timings: read=45.316
> >Planning Time: 66.435 ms
> >Execution Time: 63088.774 ms
> >
>
> How did that help? It only gives you insight that it's really the I/O that
> takes time. You need to reduce that, somehow.
>
> >
> >So try something like
> >
> >CREATE INDEX ios_idx ON table (field, user_id);
> >
> >and make sure the table is vacuumed often enough (so that the visibility
> >map is up to date).
> >
> >Thanks Tomas... I tried this and the result improved, but not by much.
> >
>
> Well, you haven't shown us the execution plan, so it's hard to check why
> it did not help much and give you further advice.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>


Re: Searching in varchar column having 100M records

2019-07-19 Thread Michael Lewis
On Fri, Jul 19, 2019 at 8:13 AM mayank rupareliya wrote:

> Well, you haven't shown us the execution plan, so it's hard to check why
> it did not help much and give you further advice.
>
>
> This is the latest query execution plan (EXPLAIN ANALYZE) after adding
> indexes on both columns.
>
> Aggregate  (cost=174173.57..174173.58 rows=1 width=8) (actual time=65087.657..65087.658 rows=1 loops=1)
>   ->  Bitmap Heap Scan on fields  (cost=1382.56..174042.61 rows=52386 width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
>         Recheck Cond: ((field)::text = 'Champlin'::text)
>         Heap Blocks: exact=31433
>         ->  Bitmap Index Scan on index_field  (cost=0.00..1369.46 rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
>               Index Cond: ((field)::text = 'Champlin'::text)
> Planning Time: 8.595 ms
> Execution Time: 65093.508 ms
>

Are you on a solid-state drive? If so, have you tried setting
effective_io_concurrency to 200 or 300 and checking performance? Given that
nearly all of the execution time is spent in the bitmap heap scan, I wonder
about adjusting this setting.

https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
effective_io_concurrency
"The allowed range is 1 to 1000, or zero to disable issuance of
asynchronous I/O requests. Currently, this setting only affects bitmap heap
scans."
"The default is 1 on supported systems, otherwise 0. "


Re: Searching in varchar column having 100M records

2019-07-19 Thread Tomas Vondra

On Fri, Jul 19, 2019 at 07:43:26PM +0530, mayank rupareliya wrote:

Well, you haven't shown us the execution plan, so it's hard to check why
it did not help much and give you further advice.


This is the latest query execution plan (EXPLAIN ANALYZE) after adding
indexes on both columns.

Aggregate  (cost=174173.57..174173.58 rows=1 width=8) (actual time=65087.657..65087.658 rows=1 loops=1)
  ->  Bitmap Heap Scan on fields  (cost=1382.56..174042.61 rows=52386 width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
        Recheck Cond: ((field)::text = 'Champlin'::text)
        Heap Blocks: exact=31433
        ->  Bitmap Index Scan on index_field  (cost=0.00..1369.46 rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
              Index Cond: ((field)::text = 'Champlin'::text)
Planning Time: 8.595 ms
Execution Time: 65093.508 ms



That very clearly does not use the index-only scan, so it's not
surprising it's not any faster. You need to find out why the planner
makes that decision.
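
One place to look, as a sketch (assuming the table is the fields table and
the covering index is the ios_idx from earlier): index-only scans only pay
off when the visibility map marks most heap pages all-visible, which
pg_class exposes as relallvisible.

-- Compare visibility-map coverage to the table size:
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'fields';

-- If relallvisible is far below relpages, vacuum and retry:
VACUUM fields;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM fields WHERE field = 'Champlin';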

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Searching in varchar column having 100M records

2019-07-19 Thread mgbii bax
Another suggestion: try clustering the table on the index for the "field"
column, then run ANALYZE (see the sketch below). If you're on a spinning
disk, it will also help to sort rows by "field" during bulk inserts.
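
A minimal sketch, assuming the index is the index_field seen in the plans
above (note that CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the
whole table, so run it in a maintenance window):

-- Physically reorder the table by the index on "field", then refresh stats.
CLUSTER fields USING index_field;
ANALYZE fields;
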
--

regards

marie g. bacuno ii


On Fri, Jul 19, 2019 at 11:39 AM Tomas Vondra wrote:

> On Fri, Jul 19, 2019 at 07:43:26PM +0530, mayank rupareliya wrote:
> >Well, you haven't shown us the execution plan, so it's hard to check why
> >it did not help much and give you further advice.
> >
> >
> >This is the latest query execution plan (EXPLAIN ANALYZE) after adding
> >indexes on both columns.
> >
> >Aggregate  (cost=174173.57..174173.58 rows=1 width=8) (actual time=65087.657..65087.658 rows=1 loops=1)
> >  ->  Bitmap Heap Scan on fields  (cost=1382.56..174042.61 rows=52386 width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
> >        Recheck Cond: ((field)::text = 'Champlin'::text)
> >        Heap Blocks: exact=31433
> >        ->  Bitmap Index Scan on index_field  (cost=0.00..1369.46 rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
> >              Index Cond: ((field)::text = 'Champlin'::text)
> >Planning Time: 8.595 ms
> >Execution Time: 65093.508 ms
> >
>
> That very clearly does not use the index-only scan, so it's not
> surprising it's not any faster. You need to find out why the planner
> makes that decision.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Speeding up query pulling comments from pg_catalog

2019-07-19 Thread Ken Tanzer
Hi.  I've got an app that queries pg_catalog to find any table columns that
have comments.  After setting up pgBadger, it was #2 on my list of
time-consuming queries, with min/max/avg durations of 199/2351/385 ms
(across ~12,000 logged executions).

I'm wondering if there are any ways to speed this query up, including
whether there are better options for what to query.

I'm running on 9.6.14 on CentOS 7.

I've copied the EXPLAIN below.  Let me know if additional info would be
helpful.  Thanks in advance!

Ken

ag_reach=> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING, COSTS)
SELECT c.relname AS table, a.attname AS column,
       pg_catalog.col_description(a.attrelid, a.attnum) AS comment
FROM pg_catalog.pg_attribute a, pg_class c
WHERE a.attrelid = c.oid
AND pg_catalog.col_description(a.attrelid, a.attnum) IS NOT NULL;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=197.09..22533.42 rows=39858 width=160) (actual time=92.538..386.047 rows=8 loops=1)
   Output: c.relname, a.attname, col_description(a.attrelid, (a.attnum)::integer)
   Hash Cond: (a.attrelid = c.oid)
   Buffers: shared hit=81066
   ->  Seq Scan on pg_catalog.pg_attribute a  (cost=0.00..11718.81 rows=41278 width=70) (actual time=76.069..369.410 rows=8 loops=1)
         Output: a.attrelid, a.attname, a.atttypid, a.attstattarget, a.attlen, a.attnum, a.attndims, a.attcacheoff, a.atttypmod, a.attbyval, a.attstorage, a.attalign, a.attnotnull, a.atthasdef, a.attisdropped, a.attislocal, a.attinhcount, a.attcollation, a.attacl, a.attoptions, a.attfdwoptions
         Filter: (col_description(a.attrelid, (a.attnum)::integer) IS NOT NULL)
         Rows Removed by Filter: 40043
         Buffers: shared hit=80939
   ->  Hash  (cost=144.82..144.82 rows=4182 width=68) (actual time=15.932..15.934 rows=4183 loops=1)
         Output: c.relname, c.oid
         Buckets: 8192  Batches: 1  Memory Usage: 473kB
         Buffers: shared hit=103
         ->  Seq Scan on pg_catalog.pg_class c  (cost=0.00..144.82 rows=4182 width=68) (actual time=0.015..7.667 rows=4183 loops=1)
               Output: c.relname, c.oid
               Buffers: shared hit=103
 Planning time: 0.408 ms
 Execution time: 386.148 ms
(18 rows)
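
For what it's worth, one alternative I've been considering (sketched here,
untested): scan pg_description directly instead of calling col_description()
once per attribute, so only rows that actually have comments get visited.
This covers column comments only; table-level comments have objsubid = 0.

SELECT c.relname AS table,
       a.attname AS column,
       d.description AS comment
FROM pg_catalog.pg_description d
JOIN pg_catalog.pg_class c ON c.oid = d.objoid
JOIN pg_catalog.pg_attribute a
  ON a.attrelid = d.objoid AND a.attnum = d.objsubid
WHERE d.classoid = 'pg_catalog.pg_class'::regclass
  AND d.objsubid > 0;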



-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://demo.agency-software.org/client
[email protected]
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the
discussion.