Re: PostgreSQL Replication and question on maintenance

2019-11-16 Thread github kran
Any reply on this, please?

On Fri, Nov 15, 2019 at 9:10 AM github kran  wrote:

>
>
> On Thu, Nov 14, 2019 at 11:42 PM Pavel Stehule 
> wrote:
>
>>   these numbers look crazy high - how much memory does your server have -
>> more than 1TB?
>>
>
> The cluster has 244 GB of RAM and its storage capacity is 64 TB.
>
>>
>>
>> On Fri, Nov 15, 2019 at 6:26, github kran
>> wrote:
>>
>>>
>>> Hello PostgreSQL community,



 Hope everyone is doing great!


 *Background*

 We use PostgreSQL version 10.6 and rely on it heavily for our day-to-day
 reads and writes. We have 2 clusters running the PostgreSQL engine: one
 cluster keeps data for up to 60 days and the other retains data beyond 1
 year. The data is partitioned roughly weekly (~every 5 days per partition),
 which gives around 5 partitions per month per table; with 2 primary tables
 that is about 10 partitions per month. So in cluster-1 we have around 20
 partitions and in cluster-2 we have around 160 partitions (data from 2018).
 We also want to keep the data for up to 2 years in cluster-2 to serve the
 customer's data needs, and so far we have reached about 1 year of retained
 data.



 *Current activity*

 We have a custom weekly migration DB script job that moves data from one
 cluster to the other. It does the following (a rough sketch of these steps
 is shown below):

 1) Uses the COPY command to export the data from cluster-1, splitting it
 into binary files

 2) Writes the binary data into the cluster-2 table

 3) Creates indexes after the data is copied.
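
 As a rough sketch, the three steps amount to something like the following
 (table, file and column names here are placeholders, not the real ones from
 our job, and server-side COPY of course needs the files reachable from each
 server):

 -- step 1, on cluster-1: export one partition to a binary file
 COPY events_p2019_w45 TO '/migration/events_p2019_w45.bin' WITH (FORMAT binary);
 -- step 2, on cluster-2: load the binary file into the matching partition
 COPY events_p2019_w45 FROM '/migration/events_p2019_w45.bin' WITH (FORMAT binary);
 -- step 3, on cluster-2: build the indexes only after the data is loaded
 CREATE INDEX CONCURRENTLY events_p2019_w45_ts_idx ON events_p2019_w45 (event_time);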



 *Problem we have right now*

 The last 2 times the (weekly) migration activity ran, we saw the cluster's
 read replica instance restart because it had fallen behind the master (writer
 instance). Everything after that worked seamlessly, but we want to avoid the
 replica restarting. To avoid the restart we started producing smaller binary
 files and copying those to cluster-2 instead of writing 1 big file of 450
 million records. The most recent migration succeeded: the reader instance
 didn't restart after we split the 1 big file into multiple files to copy the
 data over, but it did restart after the indexes were created on the new
 table, presumably because index creation is write intensive.



 *DB parameters set on migration job*

 work_mem is set to 8 GB and maintenance_work_mem to 32 GB.
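
 (These are applied per session by the migration job, i.e. effectively:)

 SET work_mem = '8GB';
 SET maintenance_work_mem = '32GB';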

>>>
>>
>>
>> Indexes per table = 3

 total indexes for 2 tables = 5



 *DB size*

 Cluster-2 = 8.6 TB

 Cluster-1 = 3.6 TB

 Peak table row count = 400 - 480 million rows

 Average table row count = 300 - 350 million rows

 Per-table size = 90 - 95 GB; per-table index size is about 45 GB



 *Questions*

 1) Can we decrease maintenance_work_mem to 16 GB, and would that slow down
 the writes to the cluster enough for the reader instance to keep the data in
 sync?

 2) Based on the above use case, what are your recommendations for keeping
 the data for up to 2 years?

 3) What other recommendations do you have?





 Appreciate your replies.

 Thanks
 githubkran

>


Re: access to original-statement predicates in an INSTEAD-OF row trigger

2019-11-16 Thread John Lumby
On 11/15/19 17:38, Adrian Klaver wrote:
> On 11/15/19 1:54 PM, John Lumby wrote:
>> Adrian Klaver wrote :
>>>
>> We need to run with Read Committed.
>>
>> I am looking for a solution which does not alter the application or 
>> overall behaviour,
>> but just addresses detecting which predicates to apply in some way.
>
> Not sure how that could be pulled off with Read Committed as it would 
> involve predicting the future from the POV of the transaction.
>

No need for time-travel!
Let me re-phrase the question in a simpler fashion:
How can a row trigger access the original SQL statement at the root of 
the current operation?
Either in string form or any other form (e.g. parse tree).

>>
>> Cheers,  John
>>
>
>





Function performance degrades after repeated execution

2019-11-16 Thread Dave Roberge
Hi,

We've been troubleshooting a slow-running function in our Postgres database.
I've been able to boil it down to the simplest function possible. It looks like 
this:

FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
  join table3 t3 on t3.col = t2.col
  WHERE t1.col = id
LOOP
  IF rec.matchval > 0 THEN
co := co + 1;
  END IF;
  if co % 100 = 0 then
raise notice 'match value %', co;
  end if;
END LOOP;

Here's the interesting parts:
- The result of that query returns about 13,000 rows.
- If I open a PSQL session and execute the function it returns almost 
immediately.
- If I execute the same function 4 more times in the same session (a total of 5 
times) it returns immediately.
- On the 6th execution it slows down. It processes 100 records every 1.5 
minutes.
- On every subsequent execution from the same session (after 5 times) it is 
slow.
- It reliably slows down after 5 consecutive executions.
- If I exit the PSQL session and open a new one the function returns 
immediately (up to the 6th execution.)
- If I replace the function from a separate session after executing it 5 times, 
it returns immediately up to 5 executions.
- The CPU spikes to 100% after the 5th execution.

I'm attempting to understand what is causing the slowdown after 5 consecutive
executions. But I'm having a hard time getting insight. We are on PostgreSQL 
9.6.15.

We've tried:
- Increasing logging to debug5, but we don't get any helpful feedback there.
- Reviewing the execution plan of the query; it seems fine when running it
outside of the function.
- Turning on temp file logging -- but no temp files are logged.

Any ideas for where we might get insight? Or clues as to what is happening?

Thank you.



Re: Function performance degrades after repeated execution

2019-11-16 Thread Pavel Stehule
On Sat, Nov 16, 2019 at 16:06, Dave Roberge
wrote:

> Hi,
>
> We've been troubleshooting a slow running function in our postgres
> database. I've been able to boil it down to the simplest function possible.
> It looks like this:
>
> FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
>   join table3 t3 on t3.col = t2.col
>   WHERE t1.col = id
> LOOP
>   IF rec.matchval > 0 THEN
> co := co + 1;
>   END IF;
>   if co % 100 = 0 then
> raise notice 'match value %', co;
>   end if;
> END LOOP;
>
> Here's the interesting parts:
> - The result of that query returns about 13,000 rows.
> - If I open a PSQL session and execute the function it returns almost
> immediately.
> - If I execute the same function 4 more times in the same session (a total
> of 5 times) it returns immediately.
> - On the 6th execution it slows down. It processes 100 records every 1.5
> minutes.
> - On every subsequent execution from the same session (after 5 times) it
> is slow.
> - It reliably slows down after 5 consecutive executions.
> - If I exit the PSQL session and open a new one the function returns
> immediately (up to the 6th execution.)
> - If I replace the function from a separate session after executing it 5
> times, it returns immediately up to 5 executions.
> - The CPU spikes to 100% after the 5 execution.
>
> I'm attempting to understand what is causing the slow down after 5
> consecutive executions. But I'm having a hard time getting insight. We are
> on PostgreSQL 9.6.15.
>

The slow execution starting with the 6th call is a known issue - it is an
unwanted effect of the query plan cache.

See this part of the docs: https://www.postgresql.org/docs/9.6/sql-prepare.html

Prepared statements can use generic plans rather than re-planning with each
set of supplied EXECUTE values. This occurs immediately for prepared
statements with no parameters; otherwise it occurs only after five or more
executions produce plans whose estimated cost average (including planning
overhead) is more expensive than the generic plan cost estimate. Once a
generic plan is chosen, it is used for the remaining lifetime of the
prepared statement. Using EXECUTE values which are rare in columns with
many duplicates can generate custom plans that are so much cheaper than the
generic plan, even after adding planning overhead, that the generic plan
might never be used.
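
You can see the same switch outside PL/pgSQL with an explicit prepared
statement - a sketch reusing the table and column names from your example
(assuming t1.col is an integer):

PREPARE q(int) AS
  SELECT 1 AS matchval FROM table1 t1, table2 t2
  JOIN table3 t3 ON t3.col = t2.col
  WHERE t1.col = $1;

-- the first five EXECUTE q(...) calls are planned for the actual parameter;
-- from the sixth execution on the cached generic plan may be used instead
EXPLAIN EXECUTE q(42);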

In PostgreSQL 9.6 you can use a dynamic query as a workaround - it generates
only one-shot plans, so it should be OK every time:

FOR rec IN EXECUTE 'SELECT ...

https://www.postgresql.org/docs/9.6/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING
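
A minimal sketch of that workaround, filling in the query from your example
(only the loop header changes; the loop body stays the same):

FOR rec IN EXECUTE
  'SELECT 1 AS matchval FROM table1 t1, table2 t2
     JOIN table3 t3 ON t3.col = t2.col
   WHERE t1.col = $1' USING id
LOOP
  -- same body as before
END LOOP;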

Regards

Pavel



> We've tried:
> - Increase logging to debug5 but don't get any helpful feedback there.
> - Reviewing the execution plan of the query. Seems fine when running it
> outside of the function.
> - Turn on temp file logging -- but no temp files are logged.
>
> Any ideas for where we might get insight? Or clues as to what is happening?
>
> Thank you.
>
>


Re: access to original-statement predicates in an INSTEAD-OF row trigger

2019-11-16 Thread Tom Lane
John Lumby  writes:
> How can a row trigger access the original SQL statement at the root of 
> the current operation?

It can't; at least not in any way that'd be reliable or maintainable.

I concur with the upthread recommendation that switching to serializable
mode would be a more manageable way of dealing with concurrent-update
problems.

regards, tom lane




Re: Function performance degrades after repeated execution

2019-11-16 Thread Ron

On 11/16/19 8:22 AM, Dave Roberge wrote:

Hi,

We've been troubleshooting a slow running function in our postgres database. 
I've been able to boil it down to the simplest function possible. It looks like 
this:

FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
   join table3 t3 on t3.col = t2.col
   WHERE t1.col = id
LOOP
   IF rec.matchval > 0 THEN
 co := co + 1;
   END IF;
   if co % 100 = 0 then
 raise notice 'match value %', co;
   end if;
END LOOP;


Where are you joining table1 to either table2 or table3?

--
Angular momentum makes the world go 'round.




Re: Function performance degrades after repeated execution

2019-11-16 Thread Pavel Stehule
On Sat, Nov 16, 2019 at 16:46, Ron wrote:

> On 11/16/19 8:22 AM, Dave Roberge wrote:
> > Hi,
> >
> > We've been troubleshooting a slow running function in our postgres
> database. I've been able to boil it down to the simplest function possible.
> It looks like this:
> >
> > FOR rec IN select 1 as matchval FROM table1 t1, table2 t2
> >join table3 t3 on t3.col = t2.col
> >WHERE t1.col = id
> > LOOP
> >IF rec.matchval > 0 THEN
> >  co := co + 1;
> >END IF;
> >if co % 100 = 0 then
> >  raise notice 'match value %', co;
> >end if;
> > END LOOP;
>
> Where are you joining table1 to either table2 or table3?
>

Good shot - there may be an unwanted Cartesian product there.

Pavel


> --
> Angular momentum makes the world go 'round.
>
>
>


Re: PostgreSQL Replication and question on maintenance

2019-11-16 Thread Jeff Janes
On Thu, Nov 14, 2019 at 12:23 PM github kran  wrote:

>
>>
>> *Problem what we have right now. *
>>
>> When the migration activity runs(weekly) from past 2 times , we saw the
>> cluster read replica instance has restarted as it fallen behind the
>> master(writer instance).
>>
>
I can't figure out what your setup is here.  You must be using logical
replication (not physical) or you wouldn't be able to write to the replica
at all.  But if you are using logical replication, why do you also need
these weekly jobs?  Why isn't logical replication taking care of it?




> Everything
>>
>> after that worked seamlessly but we want to avoid the replica getting
>> restarted. To avoid from restart we started doing smaller binary files and
>> copy those files to the cluster-2
>>
>
Who restarted it?  I am not aware of any case where the replica responds to
falling behind by restarting itself.  With physical replication, it can
start cancelling queries, but you don't seem to be using physical
replication.

Cheers,

Jeff

>


Re: Weird ranking results with ts_rank

2019-11-16 Thread Jeff Janes
On Fri, Nov 15, 2019 at 1:31 AM Javier Ayres  wrote:

> Hi everybody.
>
> I'm implementing a solution that uses PostgreSQL's full text search
> capabilities and I have come across a particular set of results for ts_rank
> that don't seem to make sense according to the documentation.
>

While the documentation doesn't come out and say so, my interpretation is that
ts_rank assumes there is a match in the first place, and by implication its
result is undefined/unspecified if there is no match. For example, the query
in question does not in fact match:

select to_tsvector('search for a text box') @@ to_tsquery('circle | lot <->
box');
 ?column?
--
 f
(1 row)
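
Ranking the same non-matching pair still returns a number, but given the
above that number shouldn't be treated as meaningful:

select ts_rank(to_tsvector('search for a text box'),
               to_tsquery('circle | lot <-> box'));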

Cheers,

Jeff


REINDEX VERBOSE unknown option

2019-11-16 Thread Josef Šimánek
Hello,

according to https://www.postgresql.org/docs/11/sql-reindex.html the VERBOSE
option is valid for the REINDEX command on a PostgreSQL 11.3 server. Anyway,
I'm getting an error when using the VERBOSE option.

project_production=# REINDEX VERBOSE TABLE sales;
ERROR:  syntax error at or near "VERBOSE"
LINE 1: REINDEX VERBOSE TABLE sales;

Time: 0.235 ms

I'm wondering if I'm doing something wrong or whether the documentation is wrong.
Any ideas?


Re: REINDEX VERBOSE unknown option

2019-11-16 Thread Josef Šimánek
Ahh, I just tried to do the same with the reindexdb CLI tool, and the
actual syntax is REINDEX (VERBOSE) TABLE sales; sorry for the unnecessary
question. Anyway, maybe we can add this to the documentation as an example. I
can prepare a patch for this if that is welcome.
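
For the record, the working form against the same (example) table is simply:

project_production=# REINDEX (VERBOSE) TABLE sales;

With the option in parentheses the statement is accepted, and the server then
reports each index as it is rebuilt.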

On Sat, Nov 16, 2019 at 18:40, Josef Šimánek
wrote:

> Hello,
>
> according to https://www.postgresql.org/docs/11/sql-reindex.html VERBOSE
> option is valid for REINDEX command for 11.3 PostgreSQL server. Anyway I'm
> getting error using VERBOSE option.
>
> project_production=# REINDEX VERBOSE TABLE sales;
> ERROR:  syntax error at or near "VERBOSE"
> LINE 1: REINDEX VERBOSE TABLE sales;
>
> Time: 0.235 ms
>
> I'm wondering if I'm doing anything wrong or actual documentation is
> wrong. Any ideas?
>


Re: here does postgres take its timezone information from?

2019-11-16 Thread Thomas Munro
On Sat, Nov 16, 2019 at 7:13 PM Tom Lane  wrote:
> Palle Girgensohn  writes:
> > 15 nov. 2019 kl. 21:32 skrev Thomas Munro :
> >> Ugh.  It doesn't have the old backward compatibility names like
> >> US/Pacific installed by default, which is a problem if that's what
> >> initdb picked for your cluster (or you've stored references to any of
> >> those names in other ways).
>
> > One quick fix is to revert the change. Tom thinks this is not a reason to
> > revert. Would it be enough to edit postgresql.conf to use the correct
> > "modern" name for US/Pacific (PST?)? In that case, an update note might be
> > sufficient?
>
> I think the "official" name of that zone is America/Los_Angeles.
> But initdb might seize on the US/Pacific alias, if available,
> because it's shorter.  We've seen related problems with other
> time zone names, though usually it was just cosmetic and not a
> reason for the postmaster to fail to start.
>
> Yes, changing the zone name in postgresql.conf should be a sufficient
> fix.  In theory, a FreeBSD user ought to know the "official" alias
> for their zone, since the rest of the system would expect that.
> So this is slightly tedious if initdb chose a non-official alias,
> but I don't think it's reason to panic.
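
(For anyone hitting this: one way to check and change the setting without
editing postgresql.conf by hand - noting that ALTER SYSTEM writes to
postgresql.auto.conf rather than to postgresql.conf itself - would be:)

SHOW TimeZone;
ALTER SYSTEM SET timezone = 'America/Los_Angeles';
SELECT pg_reload_conf();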

Perhaps the best thing would be to revert this for the older
PostgreSQL releases, so that people doing minor version upgrades aren't
inconvenienced by a system that can't start up after "pkg upgrade", but keep
the change for 12 since not many people will be using that yet?




Re: ERROR: COPY escape must be a single one-byte character (multi-delimiter appears to work on Postgres 9.0 but does not on Postgres 9.2)

2019-11-16 Thread Nicolas Paris
> I am unable to edit this Talend job, as it's very old and we do not have the
> source code for the job anymore. I am unable to see what the actual delimiter

Compiled Talend jobs produce jar files containing Java .class files, in which
the SQL statements are stored as plain text. You should at least be able to
find the COPY statement (which is plain-text SQL), and you may be able to
modify it as well.

On Wed, Nov 13, 2019 at 07:40:59PM -0500, Brandon Ragland wrote:
> Hello,
> 
> I have a Talend enterprise job that loads data into a PostgreSQL database via
> the COPY command. When migrating to a new server this command fails with the
> following error message: org.postgresql.util.PSQLException:ERROR: COPY escape
> must be a single one-byte character
> 
> The thing is, I looked over the documentation for both Postgres 9.0 and 9.2.
> Both documentations say that multi-byte delimiters are not allowed. So I'm 
> very
> confused on why this job works perfectly on Postgres 9.0 but not on 9.2.
> 
> I am unable to edit this Talend job, as it's very old and we do not have the
> source code for the job anymore. I am unable to see what the actual delimiter
> is. I am also unable to see exactly how the COPY command is being run, such as
> whether it's pushing directly to the server via the Postgres driver, or if 
> it's
> created a temporary CSV file somewhere and then loading the data into the
> server. I believe the reason we have multi byte delimiters setup is due to the
> use of various special characters in a few of the columns for multiple tables.
> 
> I am not aware of any edits to the source code of the old 9.0 Postgres server.
> 
> The reason we are migrating servers is due to the end of life for CentOS 5. 
> The
> new server runs CentOS 7. I believe that both servers are using the default
> Postgres versions that come in the default CentOS repositories. I know for 
> sure
> that the CentOS 7 server is indeed running the default Postgres version, as I
> installed it myself through yum.
> 
> Any help would be greatly appreciated.
> 
> Also, is there a way to copy the old Postgres server, dependencies, and
> executables to our new server, in case the source was modified?
> 
> Brandon Ragland
> Software Engineer
> BREAKFRONT SOFTWARE
> Office: 704.688.4085 | Mobile: 240.608.9701 | Fax: 704.973.0607

-- 
nicolas




Re: pg12 rc1 on CentOS8 depend python2

2019-11-16 Thread Devrim Gündüz

Hi,

On Fri, 2019-09-27 at 10:50 +0900, keisuke kuroda wrote:
> CentOS8 does not have python2 installed by default, But PostgreSQL is
> dependent on python2.
> 
> Do we need to install python2 when we use PostgreSQL on CentOS8?

For the archives: I fixed this in the 12.1 packages. The core packages no
longer depend on Python X.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: pg12 rc1 on CentOS8 depend python2

2019-11-16 Thread Devrim Gündüz

Hi,

On Fri, 2019-09-27 at 09:38 -0400, Tom Lane wrote:
> Another idea might be to bundle them into the plpython package
> instead of contrib (and similarly for the plperl transforms).

This went into last week's minor updates.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: REINDEX VERBOSE unknown option

2019-11-16 Thread Pavel Stehule
On Sat, Nov 16, 2019 at 18:43, Josef Šimánek
wrote:

> Ahh, I just tried to do the same with reindexdb cli tool and the
> actual syntax is REINDEX (VERBOSE) TABLE sales; Sorry for unnecessary
> question. Anyway maybe we can add this to documentation as a example. I can
> prepare patch for this if welcomed.
>

A documentation patch is a good idea.

Pavel


> On Sat, Nov 16, 2019 at 18:40, Josef Šimánek
> wrote:
>
>> Hello,
>>
>> according to https://www.postgresql.org/docs/11/sql-reindex.html VERBOSE
>> option is valid for REINDEX command for 11.3 PostgreSQL server. Anyway I'm
>> getting error using VERBOSE option.
>>
>> project_production=# REINDEX VERBOSE TABLE sales;
>> ERROR:  syntax error at or near "VERBOSE"
>> LINE 1: REINDEX VERBOSE TABLE sales;
>>
>> Time: 0.235 ms
>>
>> I'm wondering if I'm doing anything wrong or actual documentation is
>> wrong. Any ideas?
>>
>


Function's execute overhead reducing

2019-11-16 Thread Игорь Выскорко
Hi all!

The best way to describe my question is to show the code first:

create table tst(
  id int primary key,
  j1 jsonb,
  j2 jsonb
);

insert into tst
select
  ser,
  jsonb_build_object(
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1)
  ),
  jsonb_build_object(
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1),
floor(random() * 10 + 1), floor(random() * 1000 + 1)
  )
from generate_series(1, 50) ser;

analyze tst;


-- original func is a bit complicated. But it doesn't matter here
create or replace function tst_func(a jsonb, b jsonb) returns bigint
  stable
  language sql
as $$
  select
sum(
  ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
)
  from
jsonb_each(a) _a
$$;

-- get plain data
explain analyze select
  id,
  j1,
  j2
from
  tst;

-- use subquery (the same code as in function)
explain analyze select
  id,
  j1,
  j2,
  (
select
  sum(
((_a.value::text)::int - (coalesce(j2->>_a.key, '0'))::int)::int
  )
from
  jsonb_each(j1) _a
  )
from
  tst;

-- use function
explain analyze select
  id,
  j1,
  j2,
  tst_func(j1, j2)
from
  tst;

select version();


And after running it I got the following results:

CREATE TABLE
INSERT 0 50
ANALYZE
CREATE FUNCTION
  QUERY PLAN
  
--
 Seq Scan on tst  (cost=0.00..13558.00 rows=50 width=108) (actual 
time=0.009..40.348 rows=50 loops=1)
 Planning time: 0.189 ms
 Execution time: 56.356 ms
(3 rows)

QUERY PLAN  
  
--
 Seq Scan on tst  (cost=0.00..1644808.00 rows=50 width=116) (actual 
time=0.021..1966.190 rows=50 loops=1)
   SubPlan 1
 ->  Aggregate  (cost=3.25..3.26 rows=1 width=8) (actual time=0.004..0.004 
rows=1 loops=50)
   ->  Function Scan on jsonb_each _a  (cost=0.00..1.00 rows=100 
width=64) (actual time=0.002..0.002 rows=3 loops=50)
 Planning time: 0.072 ms
 Execution time: 1982.192 ms
(6 rows)

   QUERY PLAN   
 
-
 Seq Scan on tst  (cost=0.00..138558.00 rows=50 width=116) (actual 
time=0.072..5308.897 rows=50 loops=1)
 Planning time: 0.067 ms
 Execution time: 5328.196 ms
(3 rows)

 version
 
-
 PostgreSQL 9.6.13 on x86_64-pc-linux-gnu (Ubuntu 9.6.13-1.pgdg16.04+1), 
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
(1 row)


As you can see, the subquery version needs 2 seconds while the function version
needs more than 5, which is sad to see. I love functions, and this function
(its production version) is widely used across our project.
I tried marking the function as immutable and even parallel safe, but it
doesn't help.
I understand that function execution is not free, but why so much?
So, my question is: is there any way to make this function run faster?
Something similar to a prepared statement, or maybe rules (CREATE RULE), or
whatever else is available.

PS: the current situation could be solved by denormalization - precalculating
the function result and storing it along with the data - but that's not the
approach I would like to take, because it leads to more issues to solve
(invalidation, for instance).