Re: Active connections are terminated because of small wal_sender_timeout

2019-07-08 Thread Michael Paquier
On Fri, Jul 05, 2019 at 10:03:16AM -0400, Tom Lane wrote:
> ayaho...@ibagroup.eu writes:
>> Do you have any thoughts regarding this issue? 
> 
> I do not think anybody thinks this is a bug.  Setting wal_sender_timeout
> too small is a configuration mistake.

Yeah.  I don't see any bug here.  Please note that it can also be a
problem to set too high a value in some configuration setups.  The
lack of flexibility in this area is why wal_sender_timeout has been
switched to be user-settable in v12.  In short, you can configure it in
the connection string to enforce a custom value per standby.
--
Michael




Incremental aggregate/rollup strategy advice

2019-07-08 Thread Morris de Oryx
I'm researching strategies for incrementally updating aggregate/rollup
tables. The problem is how to do so without double-counting changes, and
not skipping changes. I know enough about concurrency issues to ask the
question, but do *not* know enough about the features and details of
Postgres' concurrency management to figure out a 100% reliable solution
without some help. And, with concurrency-related stuff, you're either 100%
right or you're buggy.

And *thanks in advance* to anyone who can help out. I'm not good at writing
short :( I've tried to put in enough detail to get to the point, which is
"how do I find unprocessed records without missing any."

Okay, the setup is that we've got a lot of tables where we would like to do
incremental aggregates. To simplify things, mostly these are INSERT-only
tables, sometimes UPDATE, not worrying about DELETE yet. A couple of
strategies I'd like to avoid:

* Full queries will take too long, and will scale poorly, so a MATERIALIZED
VIEW is unappealing. Hence rollup tables, since it's possible to update them
incrementally.

* We may have multiple aggregates off the same base data, and may change
them over time. So, putting some kind of flag field in the source table
doesn't really fit.

* I was thinking about a posting/diff/delta/audit-like table, but that's a
pretty "heavy" solution. You need some kind of ON AFTER INSERT/UPDATE
selection-based trigger to push over the data that's needed to update the
aggregates. Which, again, means the source table needs to know what
aggregations are going to take place. Plus, it's just a ton of churn and
extra data...when all of the necessary data exists in the source table already.

* I saw one strategy that looks good from the folks at CitusData:
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/

Briefly, they use a bigserial counter which, I guess, is not
transaction-bound so that record insertions have a chronological stamp. 1,
2, 3, etc. This is a design familiar to me from other environments and is
sometimes called a "concurrency ID." In our case, we need to support UPDATE
as well, so I don't think the sequence idea will work (?) To make this more
concrete, here's a simplified table with source data:

CREATE TABLE "error_report" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We've got
distributed sources, so UUIDs for IDs.
"error_name" text NOT NULL DEFAULT false,-- Something we'll
summarize by.
"facility_id" uuid NOT NULL DEFAULT NULL,-- Something we'll
summarize by.
"error_dts" timestamptz NOT NULL DEFAULT NULL,   -- Set on the
source machine in UTC
"last_updated_dts" timestamptz NOT NULL DEFAULT NULL);   -- Set on Postgres
after INSERT or UPDATE.
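
(For completeness, here's one way a last_updated_dts column like this could
be kept current on the Postgres side -- a sketch only, with made-up
function/trigger names, since the exact mechanism isn't the point of the
question: a row-level trigger.)

CREATE OR REPLACE FUNCTION touch_last_updated_dts() RETURNS trigger AS $$
BEGIN
    -- clock_timestamp() is the server's wall-clock time; stored in a
    -- timestamptz column it is normalized to UTC.
    NEW.last_updated_dts := clock_timestamp();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER error_report_touch
    BEFORE INSERT OR UPDATE ON error_report
    FOR EACH ROW EXECUTE PROCEDURE touch_last_updated_dts();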

The idea is that you have a stable number line as a number or a timestamp.
We use timestamptz and store everything in UTC. Otherwise, it's the same
basic idea as what the CitusData folks said: You have an ever-increasing
number line so that you can mark where you've processed to. This way, you
can fetch unprocessed rows without missing any, without a flag field in the
source table, and without an audit table/change queue of any kind. I've
simplified the timestamps below for legibility to spell this out, as it's
the crux of my question about Postgres specifics. And, just pretend that
these rows are all on page 0...I've faked ctid values to make the rows
easier to keep track of.

ctid   last_updated_dts
(0,1)  2018-09-25 05:53:00
(0,2)  2018-09-25 05:54:00
(0,3)  2018-09-25 05:55:00
(0,4)  2018-09-26 02:23:00
(0,5)  2018-09-26 03:14:00
(0,6)  2018-09-26 03:15:00
(0,7)  2018-09-28 05:10:00
(0,8)  2018-09-28 05:14:00
(0,9)  2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00

You need a small utility table to hold details about which records you've
aggregated or processed.

CREATE TABLE "rollup_status" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We use UUIDs,
not necessary here, but it's what we use.
"rollup_name" text NOT NULL DEFAULT false,
"last_processed_dts" timestamptz NOT NULL DEFAULT NULL); -- Marks the last
timestamp processed.

Now imagine that I've got a rollup_status record

rollup_name last_processed_dts
error_name_counts   2018-09-26 02:23:00

If I search for rows that were modified after the "processed until", I get
these:

ctid   last_updated_dts
(0,5)  2018-09-26 03:14:00
(0,6)  2018-09-26 03:15:00
(0,7)  2018-09-28 05:10:00
(0,8)  2018-09-28 05:14:00
(0,9)  2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00

And I update last_processed_dts in the rollup_status record to max(last_updated_dts):

rollup_name last_processed_dts
error_name_counts   2018-09-28 05:15:00
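
Spelled out as SQL, one pass of that bookkeeping might look roughly like
this (a sketch only, using the names above; note that as written it still
has the concurrency hole this whole question is about, since a row from a
transaction that commits after the SELECT, but with an earlier
last_updated_dts, could be skipped):

-- Fetch the unprocessed slice.
SELECT er.*
FROM   error_report AS er
WHERE  er.last_updated_dts >
       (SELECT last_processed_dts
        FROM   rollup_status
        WHERE  rollup_name = 'error_name_counts');

-- ...aggregate that slice into the rollup table, then advance the bookmark.
UPDATE rollup_status
SET    last_processed_dts = (SELECT max(last_updated_dts) FROM error_report)
WHERE  rollup_name = 'error_name_counts';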

So, I got a chunk of the timeline, recorded how far I went, and processed
those records. The beauty part of this technique, if I can get it
implemented correctly, is that this doesn't have to block new records.
While I'm processing those 5 (or 5K), new records can be added onto the end
of error_report and,

Re: Incremental aggregate/rollup strategy advice

2019-07-08 Thread Tatsuo Ishii
> * Full queries will take too long, and will scale poorly. So, MATERIALIZED
> VIEW is unappealing. So, rollup tables as it's possible to update them
> incrementally.

F.Y.I. there is a proposal to implement incremental updates for
MATERIALIZED VIEW. It is still a WIP patch, but it currently supports
count and sum.

https://www.postgresql.org/message-id/20190628195620.c306e3003a83bb85a12f54c5%40sraoss.co.jp

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Incremental aggregate/rollup strategy advice

2019-07-08 Thread Morris de Oryx
Tatsuo,

Thank you for your response, I have followed the discussion on Hackers with
interest. I hope that your efforts are a great success! In my case, I need
to find a solution available in shipping versions of Postgres. But, since
you've joined in, I'm curious: What is the advantage of a materialized view
over a real table? It seems like the update semantics and mechanics are
more straightforward with a table.


RE: Incremental aggregate/rollup strategy advice

2019-07-08 Thread Steven Winfield
There are a couple of extensions that might help you:

PipelineDB[1]: Their “Continuous Views” could be useful. A big caveat here is 
that PipelineDB’s future is uncertain[2], though.
I haven’t used it myself, but it looks like you could put triggers onto your 
existing tables to insert data into PipelineDB “Streams”, then build whatever 
continuous views are needed around those streams.

The other is TimescaleDB[3], which has “Continuous Aggregates”, but they are 
fairly new and currently have some limitations.

Steve.

[1] https://www.pipelinedb.com/
[2] https://www.pipelinedb.com/blog/pipelinedb-is-joining-confluent
[3] https://www.timescale.com/





Re: Incremental aggregate/rollup strategy advice

2019-07-08 Thread Morris de Oryx
Thanks Steven, nice suggestions. I should have mentioned that the
deployment setup is on RDS on PG 11.x, which rules out those extensions.
I've looked at TimescaleDB several times, and it looks pretty great.

I've now read through some of the archives from years back when
pg_xact_commit_timestamp was still in development, and I'm thinking it
might be the right solution. I'm still not clear on how long commit
timestamps are retained, or what the overhead is. It sounds like many
commit timestamps might be exactly the same, but that's fine for me. So
long as they're never in the past, it doesn't matter how many of them are
identical.
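
For anyone following along, usage looks roughly like this (a sketch; it
assumes track_commit_timestamp = on was enabled, and the server restarted,
before the rows of interest were written):

-- postgresql.conf (requires a restart):
--   track_commit_timestamp = on

SELECT pg_xact_commit_timestamp(xmin) AS committed_at,
       id,
       error_name
FROM   error_report
WHERE  pg_xact_commit_timestamp(xmin) > '2018-09-26 02:23:00+00'
ORDER  BY committed_at;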


Re: Incremental aggregate/rollup strategy advice

2019-07-08 Thread Tatsuo Ishii
> Thank you for your response, I have followed the discussion on Hackers with
> interest. I hope that your efforts are a great success! In my case, I need
> to find a solution available in shipping versions of Postgres. But, since
> you've joined in, I'm curious: What is the advantage of a materialized view
> over a real table? It seems like the update semantics and mechanics are
> more straightforward with a table.

In my understanding, views and materialized views give users a more
flexible and convenient way to access the base tables. In an RDB, base
tables are usually heavily normalized, and it may not be easy for
applications to extract information from them. By defining views, apps
get a convenient and intuitive way to pull information out of the base
tables.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Measuring the Query Optimizer Effect: Turning off the QO?

2019-07-08 Thread Andrew Gierth
> "Tom" == Tom Mercha  writes:

 Tom> Hi All

 Tom> As we know, a query goes through number of stages before it is
 Tom> executed. One of these stages is query optimization (QO).

That's not really true at all. One of the stages is query _planning_,
which takes the (rewritten) query as input and converts it to something
that the executor can take action on. There isn't actually any separate
"optimization" phase.

-- 
Andrew (irc:RhodiumToad)




Re: Measuring the Query Optimizer Effect: Turning off the QO?

2019-07-08 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Two I'd particularly draw your attention to are
 Tom> join_collapse_limit and from_collapse_limit --- if you set both to
 Tom> 1, that'll effectively disable searching for a good join order,
 Tom> causing the join order to match the syntactic structure of the
 Tom> FROM clause. For instance "FROM a,b,c" will always be done by
 Tom> joining a to b first

FROM a,b,c can always be planned in any join order. If you want to force
the join order you have to set join_collapse_limit=1 AND write it as
FROM a JOIN b ON ... JOIN c ON ...

For an example, try:

explain select * from onek o1, tenk1 t, onek o2
 where o1.unique1=t.unique1 and t.unique1=o2.unique1
   and o1.unique2<10 and o2.unique2<10;

which (at least for me) joins o1 and o2 together first even with the
collapse limits set to 1.

-- 
Andrew (irc:RhodiumToad)




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-08 Thread John Lumby

> From: Peter Geoghegan 
> Sent: June 26, 2019 12:09 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
> 
> > >
> > > I suspect that you might find that the enhancements to B-Tree indexes
> > > that went into Postgres 12 would help with this workload, especially
> > > if you notice that this happens with indexes that have a lot of duplicates
> > >
> 
> > 
> > I had not noticed that,   thanks for pointing it out.  Yes ,  in my 
> > workload most of the indexes in question
> >  are non-unique and some have very low key card.I will try out the 
> > pg-12 beta when I get a chance.
> > 
> 
> 
> It's easy to show problems with very low cardinality indexes in the
> old code. You'll definitely notice a difference there.
> 

I've run a comparison of pg-12beta2 with an older release, pg-9.4.6,
running the same intensive delete-insert workload mentioned earlier in this
thread, and would appreciate it if you could comment on the results,
especially as to whether they are roughly in line with your expectations.
I also have one question about a new-in-pg-11 config parm.

Briefly, the workload repeats a loop in which, on one single table with
8 indexes (2 unique and 6 non-unique), about 4300 records are deleted, and
for each of those records a corresponding record is inserted in which one or
more of the non-unique key values are modified to values not present in the
relevant index at that point. In other words, across all the indexes, 4300
key-tids are deleted and then 4300 new key-tids are inserted. At the end of
each loop there is zero net change in the counts of records and keys, but
possibly some increase in the number of pages, which is what the test is
interested in, along with the overall throughput rate.

For this test, I did not modify the index fill factors, which therefore
remained at the default of 90%, in order to take a stab at evaluating not
setting an explicit fillfactor. In each case the indexes were either freshly
loaded or else reindexed so as to start at the same density. Here are counts
and sizes after 768 iterations:



| tbpages | tbtuples     | ixname                       | isuniq | livetuples | deadtuples | avg_leaf_density | ixpages
+---------+--------------+------------------------------+--------+------------+------------+------------------+--------

pg-9.4.6
--------
|   32160 | 2.55548e+06  | metadata_value_boolean       | f      |    2932852 |          0 |            46.39 |   13535
|   32160 | 2.55548e+06  | metadata_value_field_id      | f      |    2932852 |          0 |            48.58 |   12916
|   32160 | 2.55548e+06  | metadata_value_floatnumber   | f      |    2932852 |          0 |            45.97 |   13658
|   32160 | 2.55548e+06  | metadata_value_longnumber    | f      |    2932852 |          0 |            48.26 |   13009
|   32160 | 2.55548e+06  | metadata_value_owner_field_u | t      |    2932852 |          0 |            58.69 |   14990
|   32160 | 2.55548e+06  | metadata_value_owner_id      | f      |    2932852 |          0 |            53.06 |   11817
|   32160 | 2.55548e+06  | metadata_value_pkey          | t      |    2932852 |          0 |            57.83 |   10842
|   32160 | 2.55548e+06  | metadata_value_timestamp     | f      |    2932852 |          0 |            45.96 |   13663

pg-12beta2
----------
|   41814 | 2.519268e+06 | metadata_value_boolean       | f      |    2519268 |       6766 |            63.17 |   10768
|   41814 | 2.519268e+06 | metadata_value_field_id      | f      |    2519268 |       6766 |            68.7  |   12031
|   41814 | 2.519268e+06 | metadata_value_floatnumber   | f      |    2519268 |       6766 |            61.48 |   11225
|   41814 | 2.519268e+06 | metadata_value_longnumber    | f      |    2519268 |       6766 |            58.34 |   12397
|   41814 | 2.519268e+06 | metadata_value_owner_field_u | t      |    2519268 |       6766 |            61.69 |   14780
|   41814 | 2.519268e+06 | metadata_value_owner_id      | f      |    2519268 |       6766 |            48.86 |   12947
|   41814 | 2.519268e+06 | metadata_value_pkey          | t      |    2519268 |       6766 |            59.71 |   11076
|   41814 | 2.519268e+06 | metadata_value_timestamp     | f      |    2519268 |       6766 |            57.81 |   12295

Overall, pg-12beta2 yielded a 6.7% reduction in total index pages, which
was most noticeable with the 6 non-unique indexes. In fact the primary-key
index was larger with pg-12. Would you have expected better than 6.7%?
Although a welcome improvement, I think it is not enough to justify giving
up setting a lower explicit FILLFACTOR, which brings me back to thinking
there is a case for the subject of this thread: an automatic way to
preserve density.

Secondary points:

I did not expe

Re: Measuring the Query Optimizer Effect: Turning off the QO?

2019-07-08 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Two I'd particularly draw your attention to are
>  Tom> join_collapse_limit and from_collapse_limit --- if you set both to
>  Tom> 1, that'll effectively disable searching for a good join order,
>  Tom> causing the join order to match the syntactic structure of the
>  Tom> FROM clause. For instance "FROM a,b,c" will always be done by
>  Tom> joining a to b first

> FROM a,b,c can always be planned in any join order.

Ah, right, -ENOCAFFEINE.  What from_collapse_limit really does is
prevent flattening sub-SELECTs when doing so would add more join-order
freedom in the parent query.  But ignoring sub-SELECTs, using explicit
JOIN syntax with join_collapse_limit=1 will let you control the
join order.

regards, tom lane




OOM with many sorts

2019-07-08 Thread Justin Pryzby
One of our instances was killed:

| Out of memory: Kill process 27670 (postmaster) score 381 or sacrifice child
| Killed process 27670 (postmaster) total-vm:13207796kB, anon-rss:5736kB, file-rss:0kB, shmem-rss:12421696kB

The VM has: 32GB RAM, shared_buffers=12GB, work_mem=128MB 
Running only pg11.3

Trying to diagnose the issue by running the query on a narrower range of
dates, it looks to me like it's due to many sort nodes per worker.

Part of the query plan looks like this:

 [...]
 ->  Finalize GroupAggregate (actual rows=1788 loops=1)
       Group Key: huawei_umts_ucell_20181217.sect_id, (to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
       ->  Gather Merge (actual rows=56921 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             ->  Sort (actual rows=18974 loops=3)
                   Sort Key: huawei_umts_ucell_20181217.sect_id, (to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
                   Sort Method: quicksort  Memory: 36499kB
                   Worker 0:  Sort Method: quicksort  Memory: 40275kB
                   Worker 1:  Sort Method: quicksort  Memory: 40263kB
                   ->  Parallel Append (actual rows=18974 loops=3)
                         ->  Partial GroupAggregate (actual rows=1783 loops=1)
                               Group Key: huawei_umts_ucell_20181217.sect_id, (to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
                               ->  Sort (actual rows=42792 loops=1)
                                     Sort Key: huawei_umts_ucell_20181217.sect_id, (to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
                                     Worker 0:  Sort Method: quicksort  Memory: 44997kB
                                     ->  Parallel Seq Scan on huawei_umts_ucell_20181217 (actual rows=42792 loops=1)
                                           Filter: ((start_time >= '2018-12-01 00:00:00'::timestamp without time zone) AND (start_time < '2019-01-01 00:00:00'::timestamp without time zone))
                         ->  Partial GroupAggregate (actual rows=1783 loops=1)
                               Group Key: huawei_umts_ucell_20181216.sect_id, (to_char(huawei_umts_ucell_20181216.start_time, '-MM'::text))
                               ->  Sort (actual rows=42792 loops=1)
                                     Sort Key: huawei_umts_ucell_20181216.sect_id, (to_char(huawei_umts_ucell_20181216.start_time, '-MM'::text))
                                     Worker 1:  Sort Method: quicksort  Memory: 44997kB
                                     ->  Parallel Seq Scan on huawei_umts_ucell_20181216 (actual rows=42792 loops=1)
                                           Filter: ((start_time >= '2018-12-01 00:00:00'::timestamp without time zone) AND (start_time < '2019-01-01 00:00:00'::timestamp without time zone))

 [...many partial GroupAggregate/Sort repeated here for ~200 child tables...]

This isn't the first time I've had to reduce work_mem on a parallel query to
avoid OOM, but it seems unreasonable to be concerned with 50MB sorts.

It looks like the results of each Sort node stay in RAM during processing
of additional sort nodes (is that required?)

I SET max_parallel_workers_per_gather=0 and dumped mem context:
...
PortalContext: 203948152 total in 25756 blocks; 176600 free (319 chunks); 
203771552 used:
...
Grand total: 1918118632 bytes in 30716 blocks; 192472208 free (3685 chunks); 
1725646424 used

The total shows ~1700MB but the largest context is only 200MB.
"Caller tuples" accounts for 1400MB.

awk '/Caller tuples/{s+=$3}END{print s/1024^3}' /tmp/mem-context
1.44043

I'm attaching full plan and mem context.

This doesn't seem to have changed under pg12b2, which is concerning since
PG12 advertises that it gracefully supports "thousands" of partitions. It
seems to me that would require a tiny work_mem, which would be devastating
to some workloads. This is not a contrived test case; it's one of our
reports run across ~10 months of history with weekly granularity.

Thanks in advance for any advice.

Justin

QUERY PLAN
------------------------------------------------------------------------
 GroupAggregate (actual rows=277 loops=1)
   Group Key: sites.site_office, sites.site_location, (to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text)), sites.site_name
   ->  Sort (actual rows=1783 loops=1)
         Sort Key: sites.site_office, sites.site_location, (to_char(huawei_um

Elastic Search much faster at statistics?

2019-07-08 Thread Guyren Howe
I find this… surprising. Caching?

http://blog.nrowegt.com/database-vs-elasticsearch-speed-column-statistics/


Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-08 Thread Peter Geoghegan
On Mon, Jul 8, 2019 at 9:23 AM John Lumby  wrote:
> Overall,  pg-12beta2 yielded a 6.7% reduction in sizes (total pages) of 
> indexes,   which was most noticable with the 6 non-unique ones.
> In fact the primary-key index was larger with pg-12.

The avg_leaf_density was actually higher for the primary key index, so
it looks like it really came out slightly ahead on v12. Perhaps you
didn't take deleted_pages into account -- there must be free space
that is reusable by the index that has yet to be reused. It would
probably make sense to subtract that across the board.
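
For what it's worth, a quick way to check that (a sketch, assuming the
pgstattuple extension is installed and using one of the index names from
the table above):

SELECT leaf_pages,
       deleted_pages,
       leaf_pages - deleted_pages AS live_leaf_pages,
       avg_leaf_density
FROM   pgstatindex('metadata_value_pkey');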

> Would you have expected better than 6.7%?

I don't think that a test case that runs VACUUM when there are only
4300 deletions and 4300 insertions is particularly realistic, in
general. You might see a larger difference if there was more churn
between each VACUUM run.

> Although a welcome improvement,  I think it is not enough to justify stopping 
> use of setting a lower explicit FILLFACTOR. Which then brings me back to  
> thinking there is a case for the subject of this thread,  an automatic way to 
> preserve density.

I don't think that such an option would make much sense. The "waves of
misery" paper is about smoothing out the frequency of page splits
following bulk loading and a CREATE INDEX. It is not about making
splits occur less often. It's well understood that a certain amount of
free space is the overhead of B-Tree indexes, albeit an overhead that
can be avoided in certain specific instances.

> And one question :
> I notice that in some pg-11 release,   a new config parameter appeared  :
>   vacuum_cleanup_index_scale_factor

> I have not researched this at all and nor did I set it to anything for my 
> pg-12beta2 run,  but it sounds as though maybe it could be relevant to 
> this kind of workload  -   Is that so?

You seem to be worried about keeping indexes as small as possible.
vacuum_cleanup_index_scale_factor won't help with that.

-- 
Peter Geoghegan




Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-08 Thread John Lumby
Thanks Peter

> From: Peter Geoghegan 
> Sent: July 8, 2019 1:39 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
> 
> Perhaps you didn't take deleted_pages into account -- there must be
> free space that is reusable by the index that has yet to be reused.
> It would probably make sense to subtract that across the board.
> 

Correct,  I did not,   but will do so for the next runs.

> 
> I don't think that a test case that runs VACUUM when there are only
> 4300 deletions and 4300 insertions is particularly realistic, in
> general. You might see a larger difference if there was more churn
> between each VACUUM run.
> 

Actually the test workload does not run any explicit VACUUM command; it
relies on autovacuum with these settings
(same settings for 9.4 and 12beta2):

 autovacuum  | on  |   
 autovacuum_analyze_scale_factor | 0.4 |   
 autovacuum_analyze_threshold| 5   |   
 autovacuum_max_workers  | 6   |   
 autovacuum_naptime  | 20  | s 
 autovacuum_vacuum_cost_delay| 0   | ms
 autovacuum_vacuum_cost_limit| |   
 autovacuum_vacuum_scale_factor  | 0   |   
 autovacuum_vacuum_threshold | 2000|   
 autovacuum_work_mem | 1048576 | kB


To get "more churn between each VACUUM", as you describe, would you then
suggest increasing autovacuum_vacuum_cost_delay and/or
autovacuum_vacuum_scale_factor?

Cheers,  John Lumby



Re: Measuring the Query Optimizer Effect: Turning off the QO?

2019-07-08 Thread Tom Mercha
On 08/07/2019 18:29, Tom Lane wrote:
> Andrew Gierth  writes:
>> "Tom" == Tom Lane  writes:
>>   Tom> Two I'd particularly draw your attention to are
>>   Tom> join_collapse_limit and from_collapse_limit --- if you set both to
>>   Tom> 1, that'll effectively disable searching for a good join order,
>>   Tom> causing the join order to match the syntactic structure of the
>>   Tom> FROM clause. For instance "FROM a,b,c" will always be done by
>>   Tom> joining a to b first
> 
>> FROM a,b,c can always be planned in any join order.
> 
> Ah, right, -ENOCAFFEINE.  What from_collapse_limit really does is
> prevent flattening sub-SELECTs when doing so would add more join-order
> freedom in the parent query.  But ignoring sub-SELECTs, using explicit
> JOIN syntax with join_collapse_limit=1 will let you control the
> join order.
> 
>   regards, tom lane
> 

Thanks for the clarification. I really appreciate you taking the time to
provide such in-depth info about my query!


Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-08 Thread Peter Geoghegan
On Mon, Jul 8, 2019 at 12:10 PM John Lumby  wrote:
> Actually the test workload does not run any explicit VACUUM command,
> it relies on autovacuum with these settings
> (same settings for 9.4 and 12beta2)

> To correspond to your " more churn between each VACUUM"
> Would you then suggest increasing
> autovacuum_vacuum_cost_delay and/or  autovacuum_vacuum_scale_factor?

Well, you're still running autovacuum very aggressively here. It'll
easily keep up when run on a relatively small table such as this.

BTW, you should definitely run the latest point release of 9.4 -- not
9.4.6. You're missing years of bug fixes by sticking to such an old
point release, including some rather nasty ones -- 9.4.23 is the
current 9.4 point release. Actually, 9.4 is going to lose support this
year, as the oldest stable version that's currently supported by the
community.

-- 
Peter Geoghegan




Re: Measuring the Query Optimizer Effect: Turning off the QO?

2019-07-08 Thread Tom Mercha
On 08/07/2019 16:23, Andrew Gierth wrote:
>> "Tom" == Tom Mercha  writes:
> 
>   Tom> Hi All
> 
>   Tom> As we know, a query goes through number of stages before it is
>   Tom> executed. One of these stages is query optimization (QO).
> 
> That's not really true at all. One of the stages is query _planning_,
> which takes the (rewritten) query as input and converts it to something
> that the executor can take action on. There isn't actually any separate
> "optimization" phase.
> 

Hi

I was just speaking loosely - I am merely referring to the concept that
a query can be optimized vs. unoptimized. Of course it follows that there
is a phase, or a subset of stages, responsible for this purpose.


Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

2019-07-08 Thread Peter Geoghegan
On Mon, Jul 8, 2019 at 12:19 PM Peter Geoghegan  wrote:
> Well, you're still running autovacuum very aggressively here. It'll
> easily keep up when run on a relatively small table such as this.

Also, an exactly equal number of insertions and deletions is rather
likely to result in bloated indexes in a way that probably isn't
representative of many workloads. Even if the number of insertions was
only slightly greater than the number of deletions, then the overall
pattern would be one of continual growth, which is generally
considered much more interesting.

For far far more information on the topic than you want, see the paper
"B-Trees with Inserts and Deletes: Why Free-at-Empty Is Better Than
Merge-at-Half":

https://www.sciencedirect.com/science/article/pii/00229390020W

The salient point made by the paper is that good space utilization
rests on the assumption that there are fewer deletes than inserts,
though maybe only slightly fewer:

"The tendency of the utilization to remain near 69% can be explained
by the following arguments: If there are even just a few more inserts
than deletes, the B-tree will grow at the net insert rate (the rate of
inserts minus the rate of deletes)."

If the volume of data never grows past a certain point, then it's
unlikely that the space utilization is very important. This may even
be premature optimization.
-- 
Peter Geoghegan




PGPOOL Question

2019-07-08 Thread Andrew Kerber
I know this is the wrong list for this question, but the pgpool list doesn't
appear to be working currently.

I am hoping someone on this list can point me to an answer.


I have a fairly (I thought) straightforward pgpool setup; the backend is a
postgresql database that is streaming to a hot standby. I am testing
failover, but when I shut down the primary database instance and open the
standby database instance for read/write, I get this message: "health check
failed on node 0 but failover is disallowed for the node". What setting am
I missing here?


Applicable section from the log:

Jul 5 10:45:14 pgpool1 pgpool-II-10[32640]: [13-1] 2019-07-05 10:45:14: pid 32640: ERROR: failed to make persistent db connection
Jul 5 10:45:14 pgpool1 pgpool-II-10[32640]: [13-2] 2019-07-05 10:45:14: pid 32640: DETAIL: connection to host:"10.12.1.225:5432" failed
Jul 5 10:45:14 pgpool1 pgpool-II-10[32640]: [14-1] 2019-07-05 10:45:14: pid 32640: LOG: health check failed on node 0 but failover is disallowed for the node

Here are the applicable sections from my pgpool.conf
- Backend Connection Settings -

backend_hostname0 = '10.12.1.225'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/postgres/data/pgdata'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = '10.12.1.61'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/postgres/data/pgdata'
backend_flag1 = 'ALLOW_TO_FAILOVER'

master_slave_mode = on
master_slave_sub_mode = 'stream'
health_check_period = 10
health_check_timeout = 5
health_check_user = 'nobody'
health_check_password = ''
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 1
failover_command = ''
failback_command = ''

-- 
Andrew W. Kerber

'If at first you dont succeed, dont take up skydiving.'


Re: Elastic Search much faster at statistics?

2019-07-08 Thread Miles Elam
Not enough information. It looks far more like he's testing Ruby's support
for ElasticSearch vs ActiveRecord rather than ES vs PostgreSQL. Caching
could definitely play a role, as could the choice of indexes. If ES is
calculating some aggregate info on the fly, the equivalent in PG would be a
stats table updated by trigger or as part of a regularly refreshed
materialized view. That said, if ES does some of that aggregation out of
the box, the convenience by default is compelling for some.

There are indeed cases where a caching engine will outperform a general
purpose data management engine. There are many cases where ACID
requirements preclude the use of a dedicated search engine. Use the right
tool for the job, and for a sufficiently large scale, use multiple tools.

On Mon, Jul 8, 2019 at 9:54 AM Guyren Howe  wrote:

> I find this… surprising. Caching?
>
> http://blog.nrowegt.com/database-vs-elasticsearch-speed-column-statistics/
>


pgpool, pgmaster and pgslave migration to ubuntu 18.04

2019-07-08 Thread a venkatesh
Hi,

I'm exploring the options to migrate postgresql master, slave along with
pgpool from ubuntu14.04 to 18.04. Please help me understand the following.

1. What are the available options for migrating from one version of O/S to
the other (Here ubuntu 14.04 to 18.04) ?
2. Document references.
3. Any best practices.
4. I'm assuming that migrating pgpool and pgmaster,slave can be done
separately and there isn't any dependency between them. (I mean first I can
separately migrate pgpool and then later migrate pgmaster/slave) is that
correct?

Regards,
Venkatesh.


Re: pgpool, pgmaster and pgslave migration to ubuntu 18.04

2019-07-08 Thread Tatsuo Ishii
> Hi,
> 
> I'm exploring the options to migrate postgresql master, slave along with
> pgpool from ubuntu14.04 to 18.04. Please help me understand the following.
> 
> 1. What are the available options for migrating from one version of O/S to
> the other (Here ubuntu 14.04 to 18.04) ?
> 2. Document references.
> 3. Any best practices.
> 4. I'm assuming that migrating pgpool and pgmaster,slave can be done
> separately and there isn't any dependency between them. (I mean first I can
> separately migrate pgpool and then later migrate pgmaster/slave) is that
> correct?

As far as Pgpool-II is concerned, that's generally true. I don't know what
version of Pgpool-II you are using or trying to migrate to, but if you are
using Pgpool-II's dedicated extensions, be sure to upgrade them as well.
For more information regarding Pgpool-II upgrades, please refer to the
release notes sections.

https://pgpool.net/mediawiki/index.php/Documentation

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: OOM with many sorts

2019-07-08 Thread Thomas Munro
On Tue, Jul 9, 2019 at 4:44 AM Justin Pryzby  wrote:
>  [...many partial GroupAggregate/Sort repeated 
> here for ~200 child tables...]
>
> This isn't the first time I've had to reduce work_mem on a parallel query to
> avoid OOM, but it seems unreasonable to be concerned with 50MB sorts.

This is a fundamental problem with our model of memory budgets.  To
oversimplify massively, we can use about work_mem * ntables * the
number of parallel workers.  Core counts are going up, and now we have
a convenient way to get large values of ntables.  One of many -hackers
threads discussing the problem:

https://www.postgresql.org/message-id/flat/CAH2-WzmNwV%3DLfDRXPsmCqgmm91mp%3D2b4FvXNF%3DcCvMrb8YFLfQ%40mail.gmail.com
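
To put rough numbers on the plan above (an illustrative upper bound, not a
measurement): 128MB of work_mem times roughly 200 partition-level sorts
times 3 processes (leader plus two workers) is on the order of 75GB of
permitted sort memory on a 32GB box, even though no individual sort node
ever exceeds its own budget.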

> It looks like the results of each Sort node stay in RAM, during processing of
> additional sort nodes (is that required?)

That's a very interesting question linked to whole-query memory
budgets.  If you don't have a whole-query memory budget, then you have
nothing to lose by keeping hash and sort results in memory, and you
gain the ability to do cheap rescans (if the node happens to be in a
nested loop).  I discussed this in the context of hash joins over here:

https://www.postgresql.org/message-id/CAEepm%3D0N6DODN7nx6Zb93YOW-y%3DRftNNFZJRaLyG6jbJHJVjsA%40mail.gmail.com

The TL;DR is that some other databases throw out eg hash tables as
soon as possible, and consider changing the shape of their join
nesting in order to minimise the total number of hash tables in memory
at once, in order to come in under a certain budget for memory used at
any one point in time.  That is, they don't just divide query_work_mem
up over all the nodes, they understand things about when nodes will
allocate and free memory.

There are some less clever things we can do, though, before we tackle
the big problems involved.  We could probably still opportunistically
give back memory sooner, when we know there is no chance of rescan,
and other things along those lines.

> Thanks in advance for any advice.

I think it's impossible to choose a single value for work_mem if you
have a mixture of types of queries that hit wildly different numbers
of partitions and workers.  I think this is an ongoing topic for
-hackers.

-- 
Thomas Munro
https://enterprisedb.com




pg_dump and search_path

2019-07-08 Thread Igal @ Lucee.org

I have a custom search_path:

# show search_path;
   search_path
--
 "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output to 
search_path is:


  SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of a 
table with its schema fails with "relation [rel-name] does not exist".


Is that a bug?  I have seen some old posts about this issue but am not 
sure if there is a ticket or why it still is an issue.


Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org 



Re: pg_dump and search_path

2019-07-08 Thread Igal @ Lucee.org

On 7/8/2019 11:48 PM, Igal @ Lucee.org wrote:


I have a custom search_path:

# show search_path;
   search_path
--
 "staging, transient, pg_catalog"
(1 row)

I ran `pg_dump --schema-only` and the only reference in the output to 
search_path is:


  SELECT pg_catalog.set_config('search_path', '', false);

Then one of my functions which does not reference the full name of a 
table with its schema fails with "relation [rel-name] does not exist".


Is that a bug?  I have seen some old posts about this issue but am not 
sure if there is a ticket or why it still is an issue.


Looks like this might be by design.  I will follow the links at 
https://www.postgresql.org/message-id/flat/MWHPR14MB160079A6D9DC64A2F60E9004C0D00%40MWHPR14MB1600.namprd14.prod.outlook.com 
and ask more questions if I have them.


I might need to add the schema name to the table in my function.
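
For reference, one way to make such a function independent of the caller's
search_path (a sketch with made-up function and table names, since the
actual function isn't shown) is to pin the setting on the function itself:

ALTER FUNCTION staging.my_report_fn()
    SET search_path = staging, transient, pg_catalog;

-- Or, equivalently, at creation time:
CREATE OR REPLACE FUNCTION staging.my_report_fn() RETURNS bigint
    LANGUAGE sql
    SET search_path = staging, transient, pg_catalog
AS $$ SELECT count(*) FROM some_table $$;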

Igal