case statement within insert

2018-05-25 Thread tango ward
I want to insert data from MySQL into a table in PostgreSQL. I want to
check whether the subject code contains PE or NSTP so I can assign True or
False to another column in the destination DB.


# Source data: MySQL
curr_msql.execute(''' SELECT code, subjectname
                      FROM test_subj ''')

# Destination
for row in curr_msql:
    curr_psql.execute(''' INSERT INTO subs (
                              created, modified,
                              subjcode, subjname,
                              is_pe_or_nstp)
                          VALUES (current_timestamp,
                                  current_timestamp,
                                  %s, %s,
                                  CASE
                                      WHEN code like '%%PE%%' or code like '%%NSTP%%'
                                      THEN True
                                      ELSE False
                                  END) ''', (row['code'], row['subjectname']))

I am getting TypeError: not all arguments converted during string
formatting.

Any advice pls?


Re: case statement within insert

2018-05-25 Thread David G. Johnston
On Friday, May 25, 2018, tango ward  wrote:

>
> WHEN code like '%%PE%%' or code like '%%NSTP%%'
>
> I am getting TypeError: not all arguments converted during string
> formatting.
>
> Any advice pls?
>

It's unclear how to correctly inject percent signs into the query string
here.  I'd just avoid them and use regular expressions instead.
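
A minimal sketch of that approach, assuming psycopg2 and PostgreSQL's POSIX
regex operator ~ (the pattern and the inlined boolean test are illustration,
not code from this thread):

    # With a regex there is no LIKE wildcard in the query string, so no
    # %% escaping is needed; the pattern travels as an ordinary bound
    # parameter, and "text ~ pattern" yields the boolean directly.
    pattern = 'PE|NSTP'
    curr_psql.execute('''INSERT INTO subs (created, modified,
                                           subjcode, subjname, is_pe_or_nstp)
                         VALUES (current_timestamp, current_timestamp,
                                 %s, %s, %s ~ %s)''',
                      (row['code'], row['subjectname'], row['code'], pattern))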

David J.


RE: Insert data if it is not existing

2018-05-25 Thread Igor Neyman

From: tango ward [mailto:tangowar...@gmail.com]
Sent: Thursday, May 24, 2018 8:16 PM
To: Adrian Klaver 
Cc: David G. Johnston ; 
pgsql-generallists.postgresql.org 
Subject: Re: Insert data if it is not existing

On Thu, May 24, 2018 at 9:38 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
On 05/23/2018 06:03 PM, tango ward wrote:


Updated my code to this:

curr_psql.execute('''
INSERT INTO my_table(name, age)
SELECT %s, %s
WHERE NOT EXISTS(SELECT name FROM my_table WHERE name = name)
''', ('Scott', 23))

If I remove the SELECT statement, I get an error:
psycopg2.ProgrammingError: syntax error at or near "WHERE"
LINE 12: WHERE NOT EXISTS

Trying to coordinate with Lead Dev about adding Index On The Fly

I tried to figure out how to make this work and could not, so I led you down
a false path.




--
Adrian Klaver
adrian.kla...@aklaver.com

What I tried is:

'''INSERT INTO my_table(name, age)
SELECT %s, %s,
WHERE NOT EXISTS(SELECT name FROM my_table WHERE name=%s)''', ('Scott', 23, 'Scott'))

How about:

'''INSERT INTO my_table(name, age)
VALUES( %s, %s)
WHERE NOT EXISTS(SELECT name FROM my_table WHERE name=%s)''', ('Scott', 23, 'Scott'))

Regards,
Igor Neyman





Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunk number 0 for toast value .. in pg_toast_2619

2018-05-25 Thread Justin Pryzby
On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > [pryzbyj@database ~]$ while :; do for db in `psql postgres -Atc "SELECT 
> > datname FROM pg_database WHERE datallowconn"`; do for t in pg_statistic 
> > pg_attrdef pg_constraint; do echo "$db.$t..."; 
> > PGOPTIONS=-cstatement_timeout='9s' psql $db -qc "VACUUM FULL $t"; done; 
> > done; done
> 
> > ...
> > postgres.pg_statistic...
> > postgres.pg_attrdef...
> > postgres.pg_constraint...
> > template1.pg_statistic...
> > template1.pg_attrdef...
> > template1.pg_constraint...
> > ts.pg_statistic...
> > ERROR:  canceling statement due to statement timeout
> > ts.pg_attrdef...
> > ts.pg_constraint...
> > postgres.pg_statistic...
> > ERROR:  missing chunk number 0 for toast value 3372855171 in pg_toast_2619
> 
> Hm, so was the timeout error happening every time through on that table,
> or just occasionally, or did you provoke it somehow?  I'm wondering how
> your 9s timeout relates to the expected completion time.

Actually statement_timeout isn't essential for this (maybe it helps to trigger
it more often - not sure).

Could you try:

time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE pg_toast.pg_toast_2619"; psql postgres -c "VACUUM FULL VERBOSE pg_statistic"; done'; date

Three servers experienced the error within 30 min, but one server didn't fail
until 12 h later, and a handful of others still haven't failed.

Does this help at all?
 2018-05-24 21:57:49.98-03  | 5b075f8d.1ad1 | LOG   | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
 2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO  | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619"
 2018-05-24 21:57:50.09-03  | 5b075f8d.1ad1 | INFO  | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 408 nonremovable row versions in 99 pages
 2018-05-24 21:57:50.12-03  | 5b075f8e.1ada | LOG   | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic
 2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO  | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic"
 2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR | pryzbyj | postgres | missing chunk number 0 for toast value 3382957233 in pg_toast_2619

Same thing; this server has autovacuum logging, although it's not clear to me
whether that's an essential component of the problem, either:

 2018-05-24 21:16:39.856-06 | LOG   | 5b078017.7b99 | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
 2018-05-24 21:16:39.876-06 | LOG   | 5b078010.7968 |         |          | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1
                                                                           pages: 0 removed, 117 r
 2018-05-24 21:16:39.909-06 | INFO  | 5b078017.7b99 | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619"
 2018-05-24 21:16:39.962-06 | INFO  | 5b078017.7b99 | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 492 nonremovable row versions in 117 pages
 2018-05-24 21:16:40.025-06 | LOG   | 5b078018.7b9b | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic
 2018-05-24 21:16:40.064-06 | INFO  | 5b078018.7b9b | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic"
 2018-05-24 21:16:40.145-06 | ERROR | 5b078018.7b9b | pryzbyj | postgres | missing chunk number 0 for toast value 765874692 in pg_toast_2619

Or this one?

postgres=# SELECT log_time, database, user_name, session_id, left(message,999)
           FROM postgres_log
           WHERE (log_time>='2018-05-24 19:56' AND log_time<'2018-05-24 19:58')
             AND (database='postgres' OR database IS NULL OR user_name IS NULL OR user_name='pryzbyj')
             AND message NOT LIKE 'statement:%'
           ORDER BY 1;

log_time   | 2018-05-24 19:56:35.396-04
database   | 
user_name  | 
session_id | 5b075131.3ec0
left   | skipping vacuum of "pg_toast_2619" --- lock not available

...

log_time   | 2018-05-24 19:57:35.78-04
database   | 
user_name  | 
session_id | 5b07516d.445e
left   | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1
   : pages: 0 removed, 85 remain, 0 skipped due to pins, 0 skipped frozen
   : tuples: 1 removed, 348 remain, 0 are dead but not yet removable, oldest xmin: 63803106
   : buffer usage: 179 hits, 4 misses, 87 dirtied
   : avg read rate: 1.450 MB/s, avg write rate: 31.531 MB/s
   : system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s

log_time   | 2018-05-24 19:57:35.879-04
database   | postgres
user_name  | pryzbyj
session_id | 5b07516f.447f
left   | missing chunk number 0 for toast value 624341680 in pg_toast_2619

log_time   | 2018-05-24 19:57:44.332-04
database   |
user_name  |
session_id | 5af9fda3.70d5
left   | checkpoint starting: time

Justin



Re: case statement within insert

2018-05-25 Thread Adrian Klaver

On 05/25/2018 02:04 AM, tango ward wrote:



I want to insert data from MySQL into a table in PostgreSQL. I want to
check whether the subject code contains PE or NSTP so I can assign True or
False to another column in the destination DB.



# Source data: MySQL
curr_msql.execute(''' SELECT code, subjectname
                      FROM test_subj ''')

# Destination
for row in curr_msql:
    curr_psql.execute(''' INSERT INTO subs (
                              created, modified,
                              subjcode, subjname,
                              is_pe_or_nstp)
                          VALUES (current_timestamp,
                                  current_timestamp,
                                  %s, %s,
                                  CASE
                                      WHEN code like '%%PE%%' or code like '%%NSTP%%'


Shouldn't the above be:

subjcode like '%%PE%%' or subjcode like '%%NSTP%%'


                                      THEN True
                                      ELSE False
                                  END) ''', (row['code'], row['subjectname']))

I am getting TypeError: not all arguments converted during string
formatting.


Any advice pls?

--
Adrian Klaver
adrian.kla...@aklaver.com



Re: case statement within insert

2018-05-25 Thread Adrian Klaver

On 05/25/2018 06:52 AM, Adrian Klaver wrote:

On 05/25/2018 02:04 AM, tango ward wrote:



I want to insert data from MySQL into a table in PostgreSQL. I want to
check whether the subject code contains PE or NSTP so I can assign True or
False to another column in the destination DB.



# Source data: MySQL
curr_msql.execute(''' SELECT code, subjectname
                      FROM test_subj ''')

# Destination
for row in curr_msql:
    curr_psql.execute(''' INSERT INTO subs (
                              created, modified,
                              subjcode, subjname,
                              is_pe_or_nstp)
                          VALUES (current_timestamp,
                                  current_timestamp,
                                  %s, %s,
                                  CASE
                                      WHEN code like '%%PE%%' or code like '%%NSTP%%'


Shouldn't the above be:

subjcode like '%%PE%%' or subjcode like '%%NSTP%%'


Turns out that won't work as you cannot refer to a column in the CASE:

cur.execute("insert into  cell_per(category, cell_per, season, 
plant_type, short_category)  values('herb test', 1, 'annual', CASE WHEN 
category like '%%herb%%' THEN 'herb' ELSE 'flower' END, 'ht' )")


ProgrammingError: column "category" does not exist
LINE 1: ...gory)  values('herb test', 1, 'annual', CASE WHEN category l...
                                                             ^
HINT:  There is a column named "category" in table "cell_per", but it
cannot be referenced from this part of the query.



This works:

cur.execute("insert into  cell_per(category, cell_per, season, 
plant_type, short_category)  values(%s, 1, 'annual', CASE WHEN %s like 
'%%herb%%' THEN 'herb' ELSE 'flower' END, 'ht' )", ('herb test', 'herb 
test'))



So change code to row['code']?


                                      THEN True
                                      ELSE False
                                  END) ''', (row['code'], row['subjectname']))

I am getting TypeError: not all arguments converted during string
formatting.

Any advice pls?

--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Insert data if it is not existing

2018-05-25 Thread Raymond O'Donnell

On 25/05/18 14:35, Igor Neyman wrote:


'''INSERT INTO my_table(name, age)

VALUES( %s, %s)

WHERE NOT EXISTS(SELECT name FROM my_table WHERE name=%s)''', ('Scott', 
23, 'Scott'))


I haven't been following this thread, so maybe this has already been 
said, but I think you need a SELECT in there as well:


insert into my_table(name, age)
select 'value 1', 'value 2'
where not exists (
  ...
);
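
Putting that together with the parameter handling from upthread, a working
version might look like this (a sketch assuming psycopg2 and the my_table
example from the thread):

    # VALUES cannot take a WHERE clause; INSERT ... SELECT can, which is
    # what makes the NOT EXISTS guard possible.
    curr_psql.execute('''
        INSERT INTO my_table(name, age)
        SELECT %s, %s
        WHERE NOT EXISTS (SELECT name FROM my_table WHERE name = %s)
    ''', ('Scott', 23, 'Scott'))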

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie



Re: case statement within insert

2018-05-25 Thread Adrian Klaver

On 05/25/2018 07:05 AM, Adrian Klaver wrote:

On 05/25/2018 06:52 AM, Adrian Klaver wrote:

On 05/25/2018 02:04 AM, tango ward wrote:



I want to insert data from MySQL into a table in PostgreSQL. I want
to check whether the subject code contains PE or NSTP so I can assign
True or False to another column in the destination DB.



# Source data: MySQL
curr_msql.execute(''' SELECT code, subjectname
                      FROM test_subj ''')

# Destination
for row in curr_msql:
    curr_psql.execute(''' INSERT INTO subs (
                              created, modified,
                              subjcode, subjname,
                              is_pe_or_nstp)
                          VALUES (current_timestamp,
                                  current_timestamp,
                                  %s, %s,
                                  CASE
                                      WHEN code like '%%PE%%' or code like '%%NSTP%%'


Shouldn't the above be:

subjcode like '%%PE%%' or subjcode like '%%NSTP%%'


Turns out that won't work as you cannot refer to a column in the CASE:

cur.execute("insert into  cell_per(category, cell_per, season, 
plant_type, short_category)  values('herb test', 1, 'annual', CASE WHEN 
category like '%%herb%%' THEN 'herb' ELSE 'flower' END, 'ht' )")


ProgrammingError: column "category" does not exist
LINE 1: ...gory)  values('herb test', 1, 'annual', CASE WHEN category l...
                                                             ^
HINT:  There is a column named "category" in table "cell_per", but it
cannot be referenced from this part of the query.



This works:

cur.execute("insert into  cell_per(category, cell_per, season, 
plant_type, short_category)  values(%s, 1, 'annual', CASE WHEN %s like 
'%%herb%%' THEN 'herb' ELSE 'flower' END, 'ht' )", ('herb test', 'herb 
test'))



So change code to row['code']?


Insufficient caffeine.

...
WHEN %s like '%%PE%%' or %s like '%%NSTP%%'

...
,  (row['code'], row['subjectname'], row['code'], row['code'])

FYI, this is why I like named parameters; then the above could be
shortened to:


{'code': row['code'], 'subjectname': row['subjectname']}

and you get clearer query code:

VALUES (current_timestamp, current_timestamp,
        %(code)s, %(subjectname)s,
        CASE
            WHEN %(code)s like '%%PE%%' or %(code)s like '%%NSTP%%'
            THEN True
            ELSE False
        END)
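
Assembled into the original loop, that could look like this (a sketch only,
using the OP's cursor and table names):

    # Named parameters: each %(code)s refers to the same dict entry, so the
    # value is passed once; %% still escapes a literal % in the LIKE patterns.
    for row in curr_msql:
        params = {'code': row['code'], 'subjectname': row['subjectname']}
        curr_psql.execute('''
            INSERT INTO subs (created, modified, subjcode, subjname,
                              is_pe_or_nstp)
            VALUES (current_timestamp, current_timestamp,
                    %(code)s, %(subjectname)s,
                    CASE
                        WHEN %(code)s like '%%PE%%' or %(code)s like '%%NSTP%%'
                        THEN True
                        ELSE False
                    END)
        ''', params)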








                                      THEN True
                                      ELSE False
                                  END) ''', (row['code'], row['subjectname']))

I am getting TypeError: not all arguments converted during string
formatting.

Any advice pls?

--
Adrian Klaver
adrian.kla...@aklaver.com



Re: case statement within insert

2018-05-25 Thread Francisco Olarte
On Fri, May 25, 2018 at 11:04 AM, tango ward  wrote:
>   CASE
> WHEN code like '%%PE%%' or code like '%%NSTP%%'
> THEN True
> ELSE False

I cannot advise you on the %% stuff, which I do not totally understand, but

CASE WHEN condition THEN true ELSE false END

when the condition is (as it should be in a CASE) a boolean expression, is a
classic antipattern; just use 'condition' or '(condition)'.
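
Applied to the query upthread, the CASE disappears entirely; the LIKE test is
already boolean and can be stored directly (a sketch, not Francisco's code):

    # The OR of two LIKE tests is a boolean expression, so it can feed the
    # boolean column is_pe_or_nstp without any CASE wrapper.
    curr_psql.execute('''
        INSERT INTO subs (created, modified, subjcode, subjname, is_pe_or_nstp)
        VALUES (current_timestamp, current_timestamp,
                %(code)s, %(subjectname)s,
                %(code)s like '%%PE%%' or %(code)s like '%%NSTP%%')
    ''', {'code': row['code'], 'subjectname': row['subjectname']})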

Francisco Olarte.



Syndicating PostgreSQL mailing list to Discourse

2018-05-25 Thread Erlend Sogge Heggen
Hey all,

I'm with the Discourse.org team and I'm just here to ask if anyone would be
interested in having a read-only Discourse mirror of the PostgreSQL mailing
list.

Since it's read-only it would only be used for things like:

   - Fast search with advanced filters (Discourse puts PostgreSQL full text
     search to good use!)
   - All active lists aggregated into one feed, but also available as
     individual categories that can be tracked/watched.
   - Single-page topics, easier to read on mobile for the young'uns.

If our mirror archive gets enough traction we'd like to make it possible to
sign up to the forum and seamlessly interact with the mailing list,
provided it's something the community actually wants, of course. We're doing
a similar experiment with the ruby-talk mailing list, which you can see
being tested at https://rubytalk.org/.

Lemme know if there's any interest, and thanks to everyone involved in
PostgreSQL for an outstanding piece of software!

Sincerely,

Erlend


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-25 Thread Andres Freund
On 2018-05-24 16:46:24 -0400, Alvaro Herrera wrote:
> On 2018-May-24, Andres Freund wrote:
> 
> > On 2018-05-24 13:08:53 -0400, Alvaro Herrera wrote:
> > > Hmm .. surely
> 
> > >   xid = HeapTupleHeaderGetXmin(tuple);
> > >   xmin_frozen = ((xid == FrozenTransactionId) ||
> > >  HeapTupleHeaderXminFrozen(tuple));
> > > - if (TransactionIdIsNormal(xid))
> > > + if (!xmin_frozen && TransactionIdIsNormal(xid))
> 
> > I don't think that's necessary - HeapTupleHeaderGetXmin() returns
> > FrozenTransactionId if the tuple is frozen (note the
> > HeapTupleHeaderXminFrozen() within).
> 
> Ah, yeah ... I probably thought about this when writing it and removed
> it for that reason.
> 
> BTW I think the definition of HeapTupleHeaderXminFrozen is seriously
> confusing, by failing to return true if the xmin is numerically
> FrozenXid (which it'll be if the database was pg_upgraded).  I wonder
> about this one in HeapTupleSatisfiesMVCC:

I suggest raising this on -hackers. I agree that it's unfortunate.

Greetings,

Andres Freund



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-25 Thread Peter Geoghegan
On Fri, May 25, 2018 at 1:38 PM, Andres Freund  wrote:
>> BTW I think the definition of HeapTupleHeaderXminFrozen is seriously
>> confusing, by failing to return true if the xmin is numerically
>> FrozenXid (which it'll be if the database was pg_upgraded).  I wonder
>> about this one in HeapTupleSatisfiesMVCC:
>
> I suggest raising this on -hackers. I agree that it's unfortunate.

I wonder if BootstrapTransactionId also needs to be considered here.

-- 
Peter Geoghegan



Re: Syndicating PostgreSQL mailing list to Discourse

2018-05-25 Thread Adrian Klaver

On 05/25/2018 01:03 PM, Erlend Sogge Heggen wrote:

Hey all,

I'm with the Discourse.org team and I'm just here to ask if anyone would 
be interested in having a read-only Discourse mirror of the PostgreSQL 
mailing list.


Since it's read-only it would only be used for things like:

  * Fast search with advanced filters (Discourse puts PostgreSQL full
text search to good use!)
  * All active lists aggregated into one feed, but also available as
individual categories that can be tracked/watched.
  * Single-page topics, easier to read on mobile for the young'uns.

If our mirror archive gets enough traction we'd like to make it possible 
to sign up to the forum and seamlessly interact with the mailing list, 


Well, 'seamlessly works' would be the important part. In the past when
this was tried (can't remember by whom) the posts lagged noticeably and
content went missing.


provided it's something the community actually wants of course. We're 
doing a similar experiment with the ruby-talk mailing list, which you 
can see being tested  at https://rubytalk.org/.


Lemme know if there's any interest, and thanks to everyone involved in 
PostgreSQL for an outstanding piece of software!


Sincerely,

Erlend



--
Adrian Klaver
adrian.kla...@aklaver.com



Fast logical replication jump start with PG 10

2018-05-25 Thread Olivier Gautherot
Hi,

I just sent the question on StackOverflow but realized that this audience
may be more savvy. So sorry in advance for cross-posting...

I'm in the process of upgrading a PG from 9.2 to 10.4. pg_upgrade worked
fine on the master and was rather fast. The database is replicated, and I'm
planning to switch from streaming to logical replication. The problem is
that it is rather slow (30 minutes for the master and over 3 hours for the
replica, between data transfer and indexes).

Is there a way to speed up the replication or should I rather stick to
streaming replication? As I have only 1 database on the server, it would
not be a show-stopper.


Thanks in advance
Olivier Gautherot
http://www.linkedin.com/in/ogautherot


Re: Syndicating PostgreSQL mailing list to Discourse

2018-05-25 Thread Tom Lane
Adrian Klaver  writes:
> On 05/25/2018 01:03 PM, Erlend Sogge Heggen wrote:
>> If our mirror archive gets enough traction we'd like to make it possible 
>> to sign up to the forum and seamlessly interact with the mailing list, 

> Well, 'seamlessly works' would be the important part. In the past when 
> this was tried (can't remember by whom) the posts lagged noticeably and 
> content went missing.

TBH, the existing services that try to do that provide a remarkably
unfriendly experience on this side, and haven't shown much interest in
improving that (I'm thinking of Nabble in particular).  So my initial
reaction is "thanks but no thanks, we don't need another of those".
But maybe you can do it a lot better than they have.

regards, tom lane



Re: Syndicating PostgreSQL mailing list to Discourse

2018-05-25 Thread Ian Zimmerman
On 2018-05-25 14:03, Adrian Klaver wrote:

> > If our mirror archive gets enough traction we'd like to make it
> > possible to sign up to the forum and seamlessly interact with the
> > mailing list,
> 
> Well, 'seamlessly works' would be the important part. In the past when
> this was tried (can't remember by whom) the posts lagged noticeably and
> content went missing.

Also (as with any kind of message gateway) please, Please, **PLEASE**
preserve the Message-ID when crossing the boundaries.  Otherwise
threading goes out of the window, despite the best efforts of those
among us who still care about it.

This is the reason why I stopped following gnu.* hierarchy (which is
joined two way with @gnu.org mailing lists by Mailman, and Mailman
violates this rule).  If you do it wrong, I'll stop following this list
too. 

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.



Re: Fast logical replication jump start with PG 10

2018-05-25 Thread Adrian Klaver

On 05/25/2018 02:12 PM, Olivier Gautherot wrote:

Hi,

I just sent the question on StackOverflow but realized that this 
audience may be more savvy. So sorry in advance for cross-posting...


I'm in the process of upgrading a PG from 9.2 to 10.4. pg_upgrade worked 
fine on the master and was rather fast. The problem is that the database 
is replicated and I'm planning to switch from streaming to logical. The 
problem is that it is rather slow (30 minutes for the master and over 3 
hours for the replication, between data transfer and indexes).


I am not clear on what you did, so can you clarify the following:

1) pg_upgrade from 9.2 master instance to 10.4 master instance, correct?

2) What replication are you talking about for the 3 hour value?

3) What is the 30 minute value referring to?

4) When you say database are you talking about a Postgres cluster or a 
database in the cluster?




Is there a way to speed up the replication or should I rather stick to 
streaming replication? As I have only 1 database on the server, it would 
not be a show-stopper.


See 4) above, but if you are talking about a single database in a
cluster, streaming replication will not work for that.





Thanks in advance
Olivier Gautherot
http://www.linkedin.com/in/ogautherot



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Fast logical replication jump start with PG 10

2018-05-25 Thread Olivier Gautherot
Hi Adrian, thanks for your reply. Here is the clarification.

1) It is indeed a pg_upgrade from 9.2 to 10.4. Depending on the test
machine, it runs in between 15 and 20 minutes for just over 100GB. I can
negotiate this time with our customer. The vacuum process took another 5 to
7 minutes. This is what I was referring to with the 30 minutes (point 3
in your questions).

2) After pg_upgrade, I published the tables on the database (in the sense of
"CREATE DATABASE") and subscribed to this publication on the second server
(logical replication; a sketch of the commands follows below). The data copy
process started immediately and took around 1 hour. I then loaded the
indexes, which took another 2h20m. At that point the active-passive cluster
was ready to go. Note that the active and the passive databases are on
different machines.

4) By "database" I mean the result of "CREATE DATABASE" and we have 1 per
server (or "cluster" in your terminology - I tend to use this word for a
group of machines). We are currently using a streaming replication between
the 9.2 servers, so it could be a fall-back option after the upgrade (I
wanted to remove part of the indexes on the master to lower the load,
reason to use the logical replication... if the execution time is not too
excessive).
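
For reference, a minimal sketch of that publish/subscribe setup (PostgreSQL
10 logical replication; the connection strings and object names are
hypothetical placeholders, not details from this thread):

    import psycopg2

    # On the upgraded 10.4 master: publish the tables.
    pub = psycopg2.connect("host=master dbname=mydb")
    pub.autocommit = True
    with pub.cursor() as cur:
        cur.execute("CREATE PUBLICATION my_pub FOR ALL TABLES")

    # On the second server: subscribe. CREATE SUBSCRIPTION cannot run
    # inside a transaction block, hence autocommit; the initial table
    # copy starts as soon as the subscription is created.
    sub = psycopg2.connect("host=replica dbname=mydb")
    sub.autocommit = True
    with sub.cursor() as cur:
        cur.execute("CREATE SUBSCRIPTION my_sub "
                    "CONNECTION 'host=master dbname=mydb' "
                    "PUBLICATION my_pub")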

Hope it clarifies the question
Best regards
Olivier


Olivier Gautherot
oliv...@gautherot.net
Cel:+56 98 730 9361
Skype: ogautherot
www.gautherot.net
http://www.linkedin.com/in/ogautherot

On Fri, May 25, 2018 at 7:51 PM, Adrian Klaver wrote:

> On 05/25/2018 02:12 PM, Olivier Gautherot wrote:
>
>> Hi,
>>
>> I just sent the question on StackOverflow but realized that this audience
>> may be more savvy. So sorry in advance for cross-posting...
>>
>> I'm in the process of upgrading a PG from 9.2 to 10.4. pg_upgrade worked
>> fine on the master and was rather fast. The problem is that the database is
>> replicated and I'm planning to switch from streaming to logical. The
>> problem is that it is rather slow (30 minutes for the master and over 3
>> hours for the replication, between data transfer and indexes).
>>
>
> I am not clear on what you did, so can you clarify the following:
>
> 1) pg_upgrade from 9.2 master instance to 10.4 master instance, correct?
>
> 2) What replication are you talking about for the 3 hour value?
>
> 3) What is the 30 minute value referring to?
>
> 4) When you say database are you talking about a Postgres cluster or a
> database in the cluster?
>
>
>> Is there a way to speed up the replication or should I rather stick to
>> streaming replication? As I have only 1 database on the server, it would
>> not be a show-stopper.
>>
>
> See 4) above, but if you are talking about a single database in a cluster
> streaming replication will not work for that.
>
>
>
>>
>> Thanks in advance
>> Olivier Gautherot
>> http://www.linkedin.com/in/ogautherot
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Fast logical replication jump start with PG 10

2018-05-25 Thread Adrian Klaver

On 05/25/2018 06:35 PM, Olivier Gautherot wrote:

Hi Adrian, thanks for your reply. Here is the clarification.

1) It is indeed a pg_upgrade from 9.2 to 10.4. Depending on the test 
machine, it runs in between 15 and 20 minutes for just over 100GB. I can 
negotiate this time with our customer. The vacuum process took another 5 
to 7 minutes. This this what I was referring to with the 30 minutes 
(point 3 in your questions)


2) After pg_upgrade, I published the tables on the database (in the
sense "CREATE DATABASE") and subscribed to this publication on the
second server (logical replication). The data copy process started
immediately and took around 1 hour. I then loaded the indexes, which took
another 2h20m. At that point the active-passive cluster was ready to go.


The index creation was done on the replicated machine I presume, using 
what command?



Note that the active and the passive databases are on different machines.

4) By "database" I mean the result of "CREATE DATABASE" and we have 1 
per server (or "cluster" in your terminology - I tend to use this word 
for a group of machines). We are currently using a streaming replication 


Yeah, I understand, it is just that database and cluster have specific
meanings in Postgres and it helps to stick to those meanings when
discussing replication operations. Lowers the confusion level :)


between the 9.2 servers, so it could be a fall-back option after the 
upgrade (I wanted to remove part of the indexes on the master to lower 
the load, reason to use the logical replication... if the execution time 
is not too excessive).


So the time you showed was with those indexes removed or not?



Hope it clarifies the question
Best regards
Olivier


Olivier Gautherot
oliv...@gautherot.net 
Cel:+56 98 730 9361
Skype: ogautherot
www.gautherot.net 
http://www.linkedin.com/in/ogautherot

On Fri, May 25, 2018 at 7:51 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:


On 05/25/2018 02:12 PM, Olivier Gautherot wrote:

Hi,

I just sent the question on StackOverflow but realized that this
audience may be more savvy. So sorry in advance for cross-posting...

I'm in the process of upgrading a PG from 9.2 to 10.4.
pg_upgrade worked fine on the master and was rather fast. The
problem is that the database is replicated and I'm planning to
switch from streaming to logical. The problem is that it is
rather slow (30 minutes for the master and over 3 hours for the
replication, between data transfer and indexes).


I am not clear on what you did, so can you clarify the following:

1) pg_upgrade from 9.2 master instance to 10.4 master instance, correct?

2) What replication are you talking about for the 3 hour value?

3) What is the 30 minute value referring to?

4) When you say database are you talking about a Postgres cluster or
a database in the cluster?


Is there a way to speed up the replication or should I rather
stick to streaming replication? As I have only 1 database on the
server, it would not be a show-stopper.


See 4) above, but if you are talking about a single database in a
cluster streaming replication will not work for that.




Thanks in advance
Olivier Gautherot
http://www.linkedin.com/in/ogautherot




-- 
Adrian Klaver

adrian.kla...@aklaver.com 





--
Adrian Klaver
adrian.kla...@aklaver.com