Inconsistent query times and spiky CPU with GIN tsvector search

2018-09-04 Thread Scott Rankin
Hello all,

We are running postgresql 9.4 and we have a table where we do some full-text 
searching using a GIN index on a tsvector column:

CREATE TABLE public.location_search
(
id bigint NOT NULL DEFAULT nextval('location_search_id_seq'::regclass),
…
search_field_tsvector tsvector
)

and

CREATE INDEX location_search_tsvector_idx
ON public.location_search USING gin
(search_field_tsvector)
TABLESPACE pg_default;

The search_field_tsvector column contains the data from the location's name and 
address:

to_tsvector('pg_catalog.english', COALESCE(NEW.name, '')) || 
to_tsvector(COALESCE(address, ''))
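
For context, the column is populated by a row trigger. A minimal sketch of what that trigger looks like (the function and trigger names here are placeholders, and the address column is named as in the expression above; the real column name may differ):

CREATE OR REPLACE FUNCTION location_search_tsvector_update() RETURNS trigger AS $$
BEGIN
    -- same concatenation as above: English-stemmed name plus the address text
    NEW.search_field_tsvector :=
        to_tsvector('pg_catalog.english', COALESCE(NEW.name, '')) ||
        to_tsvector(COALESCE(NEW.address, ''));   -- "address" follows the snippet above
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER location_search_tsvector_trg
BEFORE INSERT OR UPDATE ON public.location_search
FOR EACH ROW EXECUTE PROCEDURE location_search_tsvector_update();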

This setup has been running very well, but as our load is getting heavier, the 
performance seems to be getting much more inconsistent.  Our searches are run 
on a dedicated read replica, so this server is only doing queries against this 
one table.  IO is very low, indicating to me that the data is all in memory.  
However, we're getting some queries taking upwards of 15-20 seconds, while the 
average is closer to 1 second.

A sample query that's running slowly is

explain (analyze, buffers)
SELECT ls.location AS locationId FROM location_search ls
WHERE ls.client = 1363
AND ls.favorite = TRUE
AND search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*')
LIMIT 4;

And the explain analyze is:

Limit  (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.120..4471.120 rows=0 loops=1)
  Buffers: shared hit=25613
  ->  Bitmap Heap Scan on location_search ls  (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.117..4471.117 rows=0 loops=1)
        Recheck Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
        Filter: (favorite AND (client = 1363))
        Rows Removed by Filter: 74
        Heap Blocks: exact=84
        Buffers: shared hit=25613
        ->  Bitmap Index Scan on location_search_tsvector_idx  (cost=0.00..39865.85 rows=6 width=0) (actual time=4470.895..4470.895 rows=84 loops=1)
              Index Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
              Buffers: shared hit=25529
Planning time: 0.335 ms
Execution time: 4487.224 ms

I'm a little bit at a loss as to where to start with this - any suggestions would be 
hugely appreciated!

Thanks,
Scott



Slow Bitmap Index Scan

2018-11-28 Thread Scott Rankin
Hello all,

We recently moved our production database systems from a 9.4 running on a 
self-managed EC2 instance to 9.6.10 on Amazon’s AWS (same RAM, CPU).  After the 
move, we’re finding that certain queries that we run against a GIN full-text 
index have some occasionally very slow executions and I’m struggling to figure 
out what to do about it.   I would be very grateful for any ideas!

Thanks,
Scott

The setup we have is a 32-core, 244 GB RAM primary with a same-sized read 
replica.  The queries are running off the replica, but performance is roughly 
the same between the master and the replica.

Here’s a query that’s performing badly:


SELECT ls.location AS locationId
FROM location_search ls
WHERE ls.client = 83
  AND search_field_tsvector @@ to_tsquery('9000:* &Smith''s:* &Mill:*')
  AND ls.favorite = TRUE
LIMIT 1

The mean time for this query (and others like it) is about 900ms, but the std 
deviation is over 1000ms and the max is 11000ms.

The explain looks like this:

Limit  (cost=1516.25..1520.52 rows=1 width=223) (actual time=4506.482..4506.482 rows=0 loops=1)
  Buffers: shared hit=9073
  ->  Bitmap Heap Scan on location_search ls  (cost=1516.25..1520.52 rows=1 width=223) (actual time=4506.480..4506.480 rows=0 loops=1)
        Recheck Cond: (search_field_tsvector @@ to_tsquery('9000:* &Smith''s:* &Mill:*'::text))
        Filter: (favorite AND (client = 83))
        Rows Removed by Filter: 8
        Heap Blocks: exact=12
        Buffers: shared hit=9073
        ->  Bitmap Index Scan on location_search_tsvector_idx  (cost=0.00..1516.25 rows=1 width=0) (actual time=4506.450..4506.450 rows=12 loops=1)
              Index Cond: (search_field_tsvector @@ to_tsquery('9000:* &Smith''s:* &Mill:*'::text))
              Buffers: shared hit=9061
Planning time: 0.240 ms
Execution time: 4509.995 ms

The table has about 30 million rows in it.   The table and index definition are:

CREATE TABLE public.location_search
(
id bigint NOT NULL DEFAULT nextval('location_search_id_seq'::regclass),
person_location bigint,
person bigint,
client_location bigint,
client bigint,
location bigint,
org_unit_id bigint,
latitude numeric(10,7),
longitude numeric(10,7),
geofence numeric(10,7),
address_line_one text COLLATE pg_catalog."default",
address_line_two text COLLATE pg_catalog."default",
city text COLLATE pg_catalog."default",
state text COLLATE pg_catalog."default",
postal_code text COLLATE pg_catalog."default",
country text COLLATE pg_catalog."default",
full_address text COLLATE pg_catalog."default",
is_google_verified boolean,
address_source text COLLATE pg_catalog."default",
active boolean,
name character varying(255) COLLATE pg_catalog."default",
external_client_location_id character varying(500) COLLATE pg_catalog."default",
custom_field_values hstore,
location_tags hstore,
legacy_location_id bigint,
favorite boolean,
search_field_tsvector tsvector
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX location_search_tsvector_idx
ON public.location_search USING gin
(search_field_tsvector)
TABLESPACE pg_default;

Right now the output of pgstatginindex is this:
 version | pending_pages | pending_tuples
---------+---------------+----------------
       2 |           214 |           9983
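
For reference, that output comes from the pgstattuple extension's pgstatginindex() function, which reports the state of the GIN fast-update pending list for an index; run roughly like this:

-- pgstatginindex() ships with the pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatginindex('location_search_tsvector_idx');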

Lastly, here are some of the relevant config entries:

autovacuum on
autovacuum_analyze_scale_factor 0
autovacuum_analyze_threshold 50
autovacuum_freeze_max_age 4
autovacuum_max_workers 3
autovacuum_multixact_freeze_max_age 4
autovacuum_naptime 30s
autovacuum_vacuum_cost_delay 20ms
autovacuum_vacuum_cost_limit -1
autovacuum_vacuum_scale_factor 0
autovacuum_vacuum_threshold 50
cpu_index_tuple_cost 0
cpu_operator_cost 0
cpu_tuple_cost 0
cursor_tuple_fraction 0
effective_cache_size 12584kB
effective_io_concurrency 1
gin_fuzzy_search_limit 0
gin_pending_list_limit 4MB
maintenance_work_mem 4027MB
seq_page_cost 1
shared_buffers 6288kB
work_mem 20kB
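
A query along these lines against pg_settings shows the same values together with their units (the list of names is just illustrative):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem',
               'maintenance_work_mem', 'gin_pending_list_limit',
               'autovacuum_vacuum_scale_factor')
ORDER BY name;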


SCOTT RANKIN
VP, Technology
Motus, LLC
Two Financial Center, 60 South Street, Boston, MA 02111
617.467.1900 (O) | [email protected]

Follow us on LinkedIn (https://www.linkedin.com/company/motus-llc/) | Visit us at motus.com (http://www.motus.com/)



Re: Slow Bitmap Index Scan

2018-11-28 Thread Scott Rankin

On 11/28/18, 2:18 PM, "Justin Pryzby"  wrote:

On Wed, Nov 28, 2018 at 07:08:53PM +0000, Scott Rankin wrote:
> We recently moved our production database systems from a 9.4 running on a self-managed EC2 instance to 9.6.10 on Amazon’s AWS (same RAM, CPU). After the move, we’re finding that certain queries that we run against a GIN full-text index have some occasionally very slow executions and I’m struggling to figure out what to do about it. I would be very grateful for any ideas!
>
> The setup we have is a 32-core, 244 GB RAM primary with a same-sized read replica. The queries are running off the replica, but performance is roughly the same between the master and the replica.
>
> Here’s a query that’s performing badly:

    Can you compare or show the explain(analyze,buffers) for a fast query instance
    vs slow query instance ?  Is it slower due to index access or heap?  Due to
    cache misses ?

If I reduce the number of search terms in the query, I get this:

SELECT ls.location AS locationId
FROM location_search ls
WHERE ls.client = 83
  AND search_field_tsvector @@ to_tsquery('9000:*'::text)
  AND ls.favorite = TRUE
LIMIT 100

Limit  (cost=13203.99..13627.40 rows=100 width=8) (actual time=66.568..66.759 rows=100 loops=1)
  Buffers: shared hit=1975
  ->  Bitmap Heap Scan on location_search ls  (cost=13203.99..13923.79 rows=170 width=8) (actual time=66.568..66.729 rows=100 loops=1)
        Recheck Cond: ((search_field_tsvector @@ to_tsquery('9000:*'::text)) AND (client = 83))
        Filter: favorite
        Heap Blocks: exact=86
        Buffers: shared hit=1975
        ->  BitmapAnd  (cost=13203.99..13203.99 rows=170 width=0) (actual time=66.471..66.472 rows=0 loops=1)
              Buffers: shared hit=1889
              ->  Bitmap Index Scan on location_search_tsvector_idx  (cost=0.00..2235.02 rows=11570 width=0) (actual time=20.603..20.604 rows=29155 loops=1)
                    Index Cond: (search_field_tsvector @@ to_tsquery('9000:*'::text))
                    Buffers: shared hit=546
              ->  Bitmap Index Scan on location_search_client_idx  (cost=0.00..10968.63 rows=442676 width=0) (actual time=40.682..40.682 rows=482415 loops=1)
                    Index Cond: (client = 83)
                    Buffers: shared hit=1343
Planning time: 0.181 ms
Execution time: 66.806 ms

I see almost no IO reads, and pg_stat_statements shows no cache misses.
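
A sketch of how that can be checked in pg_stat_statements (column names as of 9.6; the text filter is illustrative):

-- shared_blks_read > 0 would indicate blocks read from the OS rather than
-- found in shared_buffers
SELECT calls, mean_time, shared_blks_hit, shared_blks_read
FROM pg_stat_statements
WHERE query LIKE '%search_field_tsvector @@ to_tsquery%'
ORDER BY mean_time DESC
LIMIT 5;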

Also, you have big ram - have you tried disabling KSM or THP ?
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

Since this is Amazon RDS, we don't have any control over or access to the 
underlying OS.  I know that huge_page support is on for these instances.  I 
would hope that Amazon's already done that...

Justin





Re: Slow Bitmap Index Scan

2018-12-03 Thread Scott Rankin
Upon further analysis, this is - unsurprisingly - taking place when we have 
multiple prefixed search terms in a ts_query going against a tsvector index.

We have roughly 30 million rows in the table, and the search column is 
basically a concatenation of a location's name (think "Walmart #123456") and 
its street address.

We use these searches mostly for autocompleting of a location search.  So the 
search for that record above might be "Walmart 123", which we change to be 
to_tsquery('walmart:* &123:*').  We prefix both terms to correct for 
misspellings or lazy typing.
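
The application builds that string itself, but the equivalent transformation expressed in SQL looks roughly like this (the input literal is just an example):

-- split the raw input on whitespace and append ':*' to each token
SELECT to_tsquery('english',
           array_to_string(regexp_split_to_array(trim('Walmart 123'), '\s+'), ':* & ') || ':*');
-- result: 'walmart':* & '123':*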

Is it unrealistic to think that we could have sub-1000ms searches against that 
size of a table?

On 11/28/18, 2:18 PM, "Justin Pryzby"  wrote:

    On Wed, Nov 28, 2018 at 07:08:53PM +0000, Scott Rankin wrote:
> We recently moved our production database systems from a 9.4 running on a self-managed EC2 instance to 9.6.10 on Amazon’s AWS (same RAM, CPU). After the move, we’re finding that certain queries that we run against a GIN full-text index have some occasionally very slow executions and I’m struggling to figure out what to do about it. I would be very grateful for any ideas!
>
> The setup we have is a 32-core, 244 GB RAM primary with a same-sized read replica. The queries are running off the replica, but performance is roughly the same between the master and the replica.
>
> Here’s a query that’s performing badly:

    Can you compare or show the explain(analyze,buffers) for a fast query instance
    vs slow query instance ?  Is it slower due to index access or heap?  Due to
    cache misses ?

Also, you have big ram - have you tried disabling KSM or THP ?
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

Justin





Huge shared hit for small table

2019-11-04 Thread Scott Rankin
Hello all,

We are trying to debug some slow performance in our production environment 
(Amazon RDS, Postgresql 9.6.11), and we’re looking at a particular EXPLAIN node 
that seems… weird.  This is a very large query involving a number of joins, but 
it performs pretty well in our staging environment (which has roughly the same 
data set as production, with a few tweaks).  However, there is one node in the 
EXPLAIN plan that is wildly different:

In the staging environment, we get this:

Index Scan using "programPK" on public.program prog  (cost=0.29..0.35 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=21965)
  Output: prog.id, prog.version, prog.active, prog.created_date, prog.last_modified_date, prog.created_by, prog.last_modified_by, prog.client_id, prog.scheme_id, prog.name, prog.legacy_group_id, prog.custom_fields, prog.setup_complete, prog.setup_messages, prog.legacy_program_type
  Index Cond: (prog.id = per.program)
  Buffers: shared hit=87860

In the production environment, we get this:

Index Scan using "programPK" on public.program prog  (cost=0.29..0.36 rows=1 width=16) (actual time=0.017..4.251 rows=1 loops=21956)
  Output: prog.id, prog.version, prog.active, prog.created_date, prog.last_modified_date, prog.created_by, prog.last_modified_by, prog.client_id, prog.scheme_id, prog.name, prog.legacy_group_id, prog.custom_fields, prog.setup_complete, prog.setup_messages, prog.legacy_program_type
  Index Cond: (prog.id = per.program)
  Buffers: shared hit=25437716

The tables in both environments are about the same size (18MB) and the indexes 
are about the same size (360kb/410kb) – and the shared hits are pretty much the 
same on the other nodes of the query between the two environments.
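
(The sizes here were measured with the usual size functions, something like:)

SELECT pg_size_pretty(pg_table_size('public.program'))   AS table_size,
       pg_size_pretty(pg_relation_size('"programPK"'))   AS pk_index_size;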

Lastly, if we take out the join to the “program” table, the query performs much 
faster in production and the timing between staging and production is similar.

This has happened one time before, and we did a “REINDEX” on the program table 
– and that made the problem mostly go away.  Now it seems to be back, and I’m 
not sure what to make of it.

Thanks in advance for any help you can offer!

Scott

SCOTT RANKIN
VP, Technology
Motus, LLC
Two Financial Center, 60 South Street, Boston, MA 02111
617.467.1900 (O) | [email protected]

Follow us on LinkedIn (https://www.linkedin.com/company/motus-llc/) | Visit us at motus.com (http://www.motus.com/)




Re: Huge shared hit for small table

2019-11-04 Thread Scott Rankin
The index is exceedingly simple:


CREATE UNIQUE INDEX "programPK" ON program(id int8_ops);

From pg_stat_user_indexes:

Staging:

idx_scan: 5826745
idx_tup_read: 52715470
idx_tup_fetch: 52644465

Production:

idx_scan : 7277919087
idx_tup_read: 90612605047
idx_tup_fetch: 5207807880
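
(Those counters come from pg_stat_user_indexes, e.g.:)

SELECT idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
  AND indexrelname = 'programPK';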

From: Andres Freund 
Date: Monday, November 4, 2019 at 2:46 PM
To: Scott Rankin 
Cc: "[email protected]" 

Subject: Re: Huge shared hit for small table

Hi,

On 2019-11-04 19:38:40 +0000, Scott Rankin wrote:
> In the staging environment, we get this:
>
> Index Scan using "programPK" on public.program prog (cost=0.29..0.35 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=21965)
> Output: prog.id, prog.version, prog.active, prog.created_date, prog.last_modified_date, prog.created_by, prog.last_modified_by, prog.client_id, prog.scheme_id, prog.name, prog.legacy_group_id, prog.custom_fields, prog.setup_complete, prog.setup_messages, prog.legacy_program_type
> Index Cond: (prog.id = per.program)
> Buffers: shared hit=87860
>
> In the production environment, we get this:
>
> Index Scan using "programPK" on public.program prog (cost=0.29..0.36 rows=1 width=16) (actual time=0.017..4.251 rows=1 loops=21956)
> Output: prog.id, prog.version, prog.active, prog.created_date, prog.last_modified_date, prog.created_by, prog.last_modified_by, prog.client_id, prog.scheme_id, prog.name, prog.legacy_group_id, prog.custom_fields, prog.setup_complete, prog.setup_messages, prog.legacy_program_type
> Index Cond: (prog.id = per.program)
> Buffers: shared hit=25437716
>
> The tables in both environments are about the same size (18MB) and the indexes are about the same size (360kb/410kb) – and the shared hits are pretty much the same on the other nodes of the query between the two environments.

It'd be worthwhile to look at the index stats using pgstatindex. Also,
could you show the definition of those indexes?


> This email message contains information that Motus, LLC considers 
> confidential and/or proprietary, or may later designate as confidential and 
> proprietary. It is intended only for use of the individual or entity named 
> above and should not be forwarded to any other persons or entities without 
> the express consent of Motus, LLC, nor should it be used for any purpose 
> other than in the course of any potential or actual business relationship 
> with Motus, LLC. If the reader of this message is not the intended recipient, 
> or the employee or agent responsible to deliver it to the intended recipient, 
> you are hereby notified that any dissemination, distribution, or copying of 
> this communication is strictly prohibited. If you have received this 
> communication in error, please notify sender immediately and destroy the 
> original message.
>
> Internal Revenue Service regulations require that certain types of written 
> advice include a disclaimer. To the extent the preceding message contains 
> advice relating to a Federal tax issue, unless expressly stated otherwise the 
> advice is not intended or written to be used, and it cannot be used by the 
> recipient or any other taxpayer, for the purpose of avoiding Federal tax 
> penalties, and was not written to support the promotion or marketing of any 
> transaction or matter discussed herein.

GNGNGGRR.

Greetings,

Andres Freund





Re: Huge shared hit for small table

2019-11-04 Thread Scott Rankin
Thanks to Justin for the clarification around pgstatindex:

Staging:

version            2
tree_level         1
index_size         425984
root_block_no      3
internal_pages     1
leaf_pages         50
empty_pages        0
deleted_pages      0
avg_leaf_density   70.86
leaf_fragmentation 16

Production:

version            2
tree_level         1
index_size         360448
root_block_no      3
internal_pages     1
leaf_pages         41
empty_pages        0
deleted_pages      1
avg_leaf_density   60.44
leaf_fragmentation 39.02
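
(Both sets of numbers come from the pgstattuple extension's pgstatindex() function, e.g.:)

SELECT * FROM pgstatindex('"programPK"');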

On 11/4/19, 3:07 PM, "Peter Geoghegan"  wrote:

On Mon, Nov 4, 2019 at 11:56 AM Justin Pryzby  wrote:
> I think it's because some heap pages are being visited many times, due to 
the
> index tuples being badly "fragmented".  Note, I'm not talking about
> fragmentation of index *pages*, which is what pgstattuple reports (which
> wouldn't have nearly so detrimental effect).  I could probably say that 
the
> index tuples are badly "correlated" with the heap.

But this is a unique index, and Scott indicates that the problem seems
to go away for a while following a REINDEX.

> In PG v12 you can use REINDEX CONCURRENTLY (but beware there's a crash
> affecting its progress reporting, fix to be included in v12.1).

PG v12 will store B-Tree duplicates in heap TID order, so if that's
the problem then upgrading to v12 (and REINDEXing if the upgrade was
performed using pg_upgrade) will fix it for good.

--
Peter Geoghegan





Re: Huge shared hit for small table

2019-11-04 Thread Scott Rankin
Definitely no long-running transactions on this table; in fact, this table is 
pretty infrequently updated – on the order of a few tens of rows updated per 
day.

From: Jeff Janes 
Date: Monday, November 4, 2019 at 3:32 PM
To: Scott Rankin 
Cc: "[email protected]" 

Subject: Re: Huge shared hit for small table

On Mon, Nov 4, 2019 at 2:38 PM Scott Rankin <[email protected]> wrote:
Hello all,

We are trying to debug some slow performance in our production environment 
(Amazon RDS, Postgresql 9.6.11), and we’re looking at a particular EXPLAIN node 
that seems… weird.  This is a very large query involving a number of joins, but 
it performs pretty well in our staging environment (which has roughly the same 
data set as production, with a few tweaks).  However, there is one node in the 
EXPLAIN plan that is wildly different:

Could there be a long-open transaction, which is preventing hint-bits from 
getting set on the table rows, as well as on the index rows?

...

The tables in both environments are about the same size (18MB) and the indexes 
are about the same size (360kb/410kb) – and the shared hits are pretty much the 
same on the other nodes of the query between the two environments.

If this table has more turn-over than those other tables (as measured in rows, 
not in percentage of the table), this would not be inconsistent with my theory.

This has happened one time before, and we did a “REINDEX” on the program table 
– and that made the problem mostly go away.  Now it seems to be back, and I’m 
not sure what to make of it.


A reindex would not by itself fix the problem if it were the long open 
transaction.  But  if the long open transaction held a sufficient lock on the 
table, then the reindex would block until the transaction went away on its own, 
at which point the problem would go away on its own, so it might **appear** to 
have fixed the problem.

Cheers,

Jeff



Re: Huge shared hit for small table

2019-11-04 Thread Scott Rankin
I think we have a winner.  I looked and found a process that had been 'idle in 
transaction' for a couple of days - and once I killed it, query performance went 
back to normal.
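
For anyone who hits the same thing, this kind of session can be found and ended from pg_stat_activity, roughly:

-- list sessions that have been idle in an open transaction for a while
SELECT pid, usename, xact_start, state, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND xact_start < now() - interval '1 hour'
ORDER BY xact_start;

-- then terminate the offending backend (12345 is a placeholder pid)
SELECT pg_terminate_backend(12345);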

Thank you all for the very quick responses on this.

On 11/4/19, 3:41 PM, "Peter Geoghegan"  wrote:

On Mon, Nov 4, 2019 at 12:38 PM Scott Rankin  wrote:
> Definitely no long-running transactions on this table; in fact, this 
table is pretty infrequently updated – on the order of a few tens of rows 
updated per day.

But a long running transaction will have an impact on all tables --
not just the tables that happen to have been accessed so far in the
long running transaction. This is necessary because nothing stops the
long running transaction from SELECTing data from any table at any
time -- we need to pessimistically keep around the data required to
make that work.

--
Peter Geoghegan


