materialized view refresh of a foreign table

2019-06-23 Thread Rick Otten
I'm not sure where else to look, so I'm asking here for tips.

I have a table in a remote (Heroku-managed) postgresql database (PG 10.7).

On the other end, (PG 11.3) I have a foreign table configured with a
materialized view in front of it.
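
For context, such a setup is typically built with postgres_fdw; a minimal
sketch with hypothetical server, mapping, and column names (not my actual
definitions):

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER heroku_src FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'source.example.com', port '5432', dbname 'sourcedb');

    CREATE USER MAPPING FOR CURRENT_USER SERVER heroku_src
        OPTIONS (user 'readonly', password 'secret');

    CREATE FOREIGN TABLE remote_events (
        id      bigint,
        payload text
        -- ... remaining columns ...
    ) SERVER heroku_src OPTIONS (schema_name 'public', table_name 'events');

    CREATE MATERIALIZED VIEW events_mv AS SELECT * FROM remote_events;
    CREATE UNIQUE INDEX ON events_mv (id);  -- needed for REFRESH ... CONCURRENTLY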

Up until Thursday evening, it was taking about 12-15 seconds to refresh; the
table is only 15,000 rows with 41 columns.  Since Thursday evening it has
been taking 15 _minutes_ or more to refresh.  Nothing changed on my end
that I'm aware of.  It completes; it just takes forever.

Here is a summary of what I've tried:

1) Refreshing the materialized views of other tables from that same source
database, some much bigger, still completes within seconds, as it always
has.
2) Dropping the foreign table and the materialized view and recreating them
didn't help.
3) It doesn't matter whether I refresh concurrently or not.
4) Configuring the foreign table and materialized view on my laptop's
postgresql instance exhibited the same behavior for just this one table.
5) Querying the foreign table directly for a specific row was fast.
6) Reindex and vacuum full analyze on the source table didn't help.
7) Bumping the database on my end to 11.4 didn't help.
8) There are no locks on either database that I can see while the query
appears to be stalled.
9) Running the materialized view select directly against the source table
completes within seconds.
10) Running the materialized view select directly against the foreign table
also completes within a few seconds.
11) Dropping all of the indexes on the materialized view, including the
unique one and running the refresh (without 'concurrently') does not help.
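
For reference, the two refresh variants mentioned in points 3 and 11 look
like this (view name hypothetical):

    REFRESH MATERIALIZED VIEW events_mv;               -- takes an exclusive lock on the view
    REFRESH MATERIALIZED VIEW CONCURRENTLY events_mv;  -- allows reads during refresh; requires a unique index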

I feel like I'm missing something obvious here, but I'm just not seeing
it.  Any thoughts about where else to look?


monitoring tuple_count vs dead_tuple_count

2019-06-23 Thread Mariel Cherkassky
Hi,
I wrote a script that monitored the size of a specific table of mine (dead
tuple MB vs. live tuple MB). The script ran a query against pgstattuple every
15 minutes: select * from pgstattuple('table_name'). I know that every
night there is a huge delete query that deletes most of the table's
content. In addition, I set the following parameters for the table:
toast.autovacuum_vacuum_scale_factor=0,
toast.autovacuum_vacuum_threshold=1,
toast.autovacuum_vacuum_cost_limit=1,
toast.autovacuum_vacuum_cost_delay=5
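
For reference, the monitoring query and the per-table settings look roughly
like this (table name is a placeholder; sizes converted from bytes to MB):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    SELECT now() AS sample_time,
           tuple_len      / 1024.0 / 1024.0 AS live_tuples_mb,
           dead_tuple_len / 1024.0 / 1024.0 AS dead_tuples_mb
    FROM pgstattuple('table_name');

    ALTER TABLE table_name SET (
        toast.autovacuum_vacuum_scale_factor = 0,
        toast.autovacuum_vacuum_threshold    = 1,
        toast.autovacuum_vacuum_cost_limit   = 1,
        toast.autovacuum_vacuum_cost_delay   = 5
    );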

After a week of monitoring I generated a CSV of the results and created a
graph from that data.  However, the graph confused me very much.
A small sample of all the data that I gathered:
date  toasted_live_tup_size_MB  toasted_dead_tup_size_MB
6/16/19 0:00 58.8537941 25.68760395
6/16/19 0:15 8.725102425 25.02167416
6/16/19 0:30 8.668716431 25.08410168
6/16/19 0:45 8.810066223 24.94327927
6/16/19 1:00 8.732183456 25.02435684
6/16/19 1:15 8.67656517 20.01097107
6/16/19 1:30 9.573832512 20.76298809
6/16/19 1:45 9.562319756 20.7739706
6/16/19 2:00 9.567030907 21.01560402
6/16/19 2:15 9.576253891 70.62042999
6/16/19 2:30 9.715950966 492.2445602
6/16/19 2:45 9.59837532 801.455843
6/16/19 3:00 9.599774361 1110.201434
6/16/19 3:15 9.606671333 1402.255548
6/16/19 3:30 9.601698875 1698.487226
6/16/19 3:45 9.606934547 2003.051514
6/16/19 4:00 9.600641251 2307.625901
6/16/19 4:15 9.61320591 2612.196963
6/16/19 4:30 9.606646538 2916.773588
6/16/19 4:45 9.61294651 3221.337314
6/16/19 5:00 9.607636452 3525.914713
6/16/19 5:15 5.447218895 3826.313025
6/16/19 5:30 9.621054649 4130.883012
6/16/19 5:45 11.48730659 4433.29188
6/16/19 6:00 7.311745644 4742.039024
6/16/19 6:15 12.31321144 5135.994677
6/16/19 6:30 12.12382507 5671.512811
6/16/19 6:45 8.029448509 6171.677253
6/16/19 7:00 7.955677986 .846472
6/16/19 7:15 12.21173954 7161.934807
6/16/19 7:30 7.96325779 7661.273341
6/16/19 7:45 12.20623493 8156.362462
6/16/19 8:00 7.960205078 8655.704986
6/16/19 8:15 12.13819695 33.60424519
6/16/19 8:30 12.21746635 57.87192154
6/16/19 8:45 12.2179966 33.52415848
6/16/19 9:00 12.14417744 33.60204792
6/16/19 9:15 12.21954441 26.85134888


As you can see in this sample, the size of the dead rows increased from 2am
until 8am while there was no change in the size of the live rows.  During
that time I know that a delete query ran and deleted a lot of rows.  That is
why I'm confused: if more dead rows are generated because of a delete, the
number of live tuples should decrease, but that doesn't happen here.  Any
idea why?
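
For reference, a cross-check against the cumulative statistics view, which
tracks deletes and autovacuum runs separately, might help interpret the graph
(table name is the same placeholder as above):

    SELECT relname, n_live_tup, n_dead_tup, n_tup_del, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'table_name';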


Re: Incorrect index used in few cases..

2019-06-23 Thread AminPG Jaffer
Sorry for late reply.

The initial values before the upgrade were seq_page_cost=1, random_page_cost=4.
After upgrading, when we started to see the issues ("Seq Scan" plans), we
changed them to seq_page_cost=1, random_page_cost=1.

The issue happens only in production, so making the index invalid would
affect service; it isn't something we can do.
I have tried rebuilding the PK index to see whether it helps, but it doesn't
seem to.
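
For reference, one way to rebuild a primary-key index without blocking writes
is to build a replacement index concurrently and swap it in (index and column
names hypothetical; assumes no foreign keys reference the old constraint):

    CREATE UNIQUE INDEX CONCURRENTLY tc_pkey_new ON tc (id);
    ALTER TABLE tc
        DROP CONSTRAINT tc_pkey,
        ADD CONSTRAINT tc_pkey PRIMARY KEY USING INDEX tc_pkey_new;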

Related to the same issue, we sometimes see the following Seq Scan on an
update when querying by the PK alone, which appears to be related.

   update tc set ... where id = $1 and version < $2

   Update on tc  (cost=10000000000.00..10003184001.52 rows=1 width=1848)
     ->  Seq Scan on tc  (cost=10000000000.00..10003184001.52 rows=1 width=1848)
           Filter: ((version < '38'::numeric) AND (id = '53670604704'::numeric))

I was trying to find where the cost=10000000000 is set in the source code
but wasn't able to find it; does anyone know where it is set?
And if someone can point me to the code that builds the execution plan when a
SQL statement is sent, I can try to go through it to see if I can figure out
what it is doing behind the scenes in its calculation.
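
For context, a startup cost on the order of 1e10 usually comes from the
planner's internal disable_cost constant (defined in
src/backend/optimizer/path/costsize.c), which is added when a plan type such
as a sequential scan has been disabled but no other path is available. A
quick session-level check, as a sketch (the UPDATE's SET list below is just a
placeholder):

    -- Which planner methods are currently disabled in this session:
    SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';

    -- Re-check the plan with sequential scans re-enabled:
    SET enable_seqscan = on;
    EXPLAIN UPDATE tc SET version = version
        WHERE id = '53670604704'::numeric AND version < '38'::numeric;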

Thanks

On Tue, Jun 18, 2019 at 3:23 PM Tom Lane  wrote:

> Andres Freund  writes:
> > Are those indexes used for other queries? Any chance they've been
> > recently created?
>
> > SELECT indexrelid::regclass, xmin, indcheckxmin, indisvalid, indisready,
> > indislive, txid_current(), txid_current_snapshot()
> > FROM pg_index WHERE indrelid = 'tc'::regclass;
>
> > might tell us.
>
> Oh, that's a good idea.
>
> > Amin, might be worth to see what the query plan is if you disable that
> > index. I assume it's too big to quickly drop (based on the ?
>
> Considering that the "right" query plan would have a cost estimate in
> the single digits or close to it, I have to suppose that the planner is
> rejecting that index as unusable, not making a cost-based decision not
> to use it.  (Well, maybe if it's bloated by three orders of magnitude
> compared to the other indexes, it'd lose on cost.  Doesn't seem likely
> though.)
>
> So I think we're looking for a hard "can't use the index" reason, and
> now we've eliminated datatype mismatch which'd be the most obvious
> such reason.  But index-isnt-valid or index-isnt-ready might do the
> trick.
>
> regards, tom lane
>