Regular Expression For Duplicate Words

2022-02-02 Thread Shaozhong SHI
This link is interesting.

regex - Regular Expression For Duplicate Words - Stack Overflow


Is there any example in Postgres?

Regards,

David


Re: Regular Expression For Duplicate Words

2022-02-02 Thread David G. Johnston
On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI  wrote:

> This link is interesting.
>
> regex - Regular Expression For Duplicate Words - Stack Overflow
> 
>
> Is there any example in Postgres?
>
>
Not that I'm immediately aware of, and I'm not going to search the internet
for you.

The regex capabilities in PostgreSQL are pretty full-featured so a solution
should be possible.  You should try translating the SO post concepts into
PostgreSQL yourself and ask specific questions if you get stuck.

David J.


pg_basebackup with hostssl ?

2022-02-02 Thread Laura Smith
I've had a quick glance through the man page for pg_basebackup but can't see 
any flags to set ssl.

Long story short, I've got the following in pg_hba.conf of the master:
hostssl    replication    all    10.0.0.0/8    md5

But the slave is complaining:
FATAL:  no pg_hba.conf entry for replication connection from host "10.1.2.3", 
user "myrepl", no encryption

I suspect this is almost certainly because I'm using "hostssl" instead of 
"host".

But other than the obvious and undesirable quick-fix, how should I be calling 
pg_basebackup to make sure it uses encryption?




Re: pg_basebackup with hostssl ?

2022-02-02 Thread Adrian Klaver

On 2/2/22 07:48, Laura Smith wrote:

I've had a quick glance through the man page for pg_basebackup but can't see 
any flags to set ssl.

Long story short, I've got the following in pg_hba.conf of the master:
hostssl    replication    all    10.0.0.0/8    md5

But the slave is complaining:
FATAL:  no pg_hba.conf entry for replication connection from host "10.1.2.3", user 
"myrepl", no encryption

I suspect this is almost certainly because I'm using "hostssl" instead of 
"host".

But other than the obvious and undesirable quick-fix, how should I be calling 
pg_basebackup to make sure it uses encryption ?


I am not following. pg_basebackup is a standalone client that is not 
involved in replication other than establishing a starting point from 
which to later establish a replication relationship.


The error is saying that whatever instance you are pointing 
pg_basebackup at does not have an entry in pg_hba.conf for the 
host/user/encryption combination pg_basebackup is trying to connect as.






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: pg_basebackup with hostssl ?

2022-02-02 Thread Laura Smith



Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐

On Wednesday, February 2nd, 2022 at 16:30, Adrian Klaver 
 wrote:

> I am not following. pg_basebackup is a stand alone client that is not
>
> involved in replication other then establishing a starting point from
>
> which to later establish a replication relationship.
>

Indeed. And that is exactly what I am trying to do (i.e. have master, want new 
slave).

> The error is saying that whatever instance you are pointing
>
> pg_basebackup at does not have entry in pg_hba.conf for the
>
> host/user/encryption combination pg_basebackup is trying to connect as.
>

But it *DOES* have an entry, per my original message:
"hostsslreplicationall10.0.0.0/8md5"

And yes, that hba is loaded and live because other remote clients are happily 
connected to that server and thus reliant on a valid hba.conf.

The host is correct. The user is correct.

The point I am making is that pg_basebackup is connecting without encryption 
and I want to know how to tell it to use encryption. The docs are silent on the 
subject.




Re: pg_basebackup with hostssl ?

2022-02-02 Thread Laura Smith
Forgot to add that I also have:
"hostssl    all    all    10.0.0.0/8    md5"






Re: pg_basebackup with hostssl ?

2022-02-02 Thread Adrian Klaver

On 2/2/22 08:37, Laura Smith wrote:



Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐

On Wednesday, February 2nd, 2022 at 16:30, Adrian Klaver 
 wrote:


I am not following. pg_basebackup is a stand alone client that is not

involved in replication other then establishing a starting point from

which to later establish a replication relationship.



Indeed. And that is exactly that I am trying to do (i.e. have master, want new 
slave).


The error is saying that whatever instance you are pointing

pg_basebackup at does not have entry in pg_hba.conf for the

host/user/encryption combination pg_basebackup is trying to connect as.



But it *DOES* have an entry, per my original message:
"hostsslreplicationall10.0.0.0/8md5"


From the OP:

"
Long story short, I've got the following in pg_hba.conf of the master:
hostssl    replication    all    10.0.0.0/8    md5

But the slave is complaining:
FATAL:  no pg_hba.conf entry for replication connection from host 
"10.1.2.3", user "myrepl", no encryption


"

The error is coming from a connection to the slave which does not have 
data yet, correct? Or a matching pg_hba.conf entry, it would seem.






And yes, that hba is loaded and live because other remote clients are happily 
connected to that server and thus reliant on a valid hba.conf.

The host is correct. The user is correct.

The point I am making is that pg_basebackup is connecting without encryption 
and I want to know how to tell it to use encryption. The docs are silent on the 
subject.


Not completely:

https://www.postgresql.org/docs/current/app-pgbasebackup.html

-d connstr
--dbname=connstr

Specifies parameters used to connect to the server, as a 
connection string; these will override any conflicting command line 
options.


The option is called --dbname for consistency with other client 
applications, but because pg_basebackup doesn't connect to any 
particular database in the cluster, any database name in the connection 
string will be ignored.


Following the connection string link:

https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING

You can use that to set sslmode.
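For example, something like this (a sketch; the host, user and target 
directory here are only placeholders):

pg_basebackup -D /path/to/standby/data \
  -d "host=10.1.2.3 user=myrepl sslmode=require" \
  -X stream -P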

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: pg_basebackup with hostssl ?

2022-02-02 Thread David G. Johnston
On Wed, Feb 2, 2022 at 9:37 AM Laura Smith <
n5d9xq3ti233xiyif...@protonmail.ch> wrote:

> The point I am making is that pg_basebackup is connecting without
> encryption and I want to know how to tell it to use encryption. The docs
> are silent on the subject.
>

It is a client application that talks libpq.  All of them understand a
common set of environment variables:

https://www.postgresql.org/docs/current/libpq-envars.html

Many of those can also be supplied as part of the connection string.  These
include SSL mode control.
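For instance (a sketch; the host and user are just placeholders):

PGSSLMODE=require pg_basebackup -h 10.1.2.3 -U myrepl -D /path/to/standby/data -X stream -P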

The question I would ask is whether psql connects by default using ssl in
the same basic configuration.  If so, then the inconsistency amounts to
pg_basebackup having an insecure default connection method while psql has a
secure one.

David J.


Re: pg_basebackup with hostssl ?

2022-02-02 Thread Laura Smith



Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐

On Wednesday, February 2nd, 2022 at 16:50, Adrian Klaver 
 wrote:

> Not completely:
>
> https://www.postgresql.org/docs/current/app-pgbasebackup.html
>
> -d connstr
>
> --dbname=connstr
>
> Specifies parameters used to connect to the server, as a
>
> ; these will override any conflicting command line
>
> options.
>
> The option is called --dbname for consistency with other client
>
> applications, but because pg_basebackup doesn't connect to any
>
> particular database in the cluster, any database name in the connection
>
> string will be ignored.
>

Seems like I was looking for the wrong words on the right page!

Thanks, will try that.




Re: pg_basebackup with hostssl ?

2022-02-02 Thread Adrian Klaver

On 2/2/22 09:18, Laura Smith wrote:



Seems like I was looking for the wrong words on the right page !

Thanks, will try that.


Before you do that I would establish that you are connecting to the 
correct Postgres instance.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Regular Expression For Duplicate Words

2022-02-02 Thread Jian He
It's an interesting question, but I also don't know how to do it in
PostgreSQL. However, I figured out alternative solutions.

GNU grep: grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'
ripgrep:  rg '(hello)[[:blank:]]+\1' --pcre2 <<<'one hello hello world'
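A PostgreSQL version of the same idea may also be possible, since its regexes
support back references (a sketch, not verified against Oracle's behaviour):

select regexp_matches('one hello hello world', '\m(\w+)\M[[:blank:]]+\1\M', 'g');
-- or just test whether any duplicated word exists:
select 'one hello hello world' ~ '\m(\w+)\M[[:blank:]]+\1\M';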

On Wed, Feb 2, 2022 at 8:53 PM David G. Johnston 
wrote:

> On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI 
> wrote:
>
>> This link is interesting.
>>
>> regex - Regular Expression For Duplicate Words - Stack Overflow
>> 
>>
>> Is there any example in Postgres?
>>
>>
> Not that I'm immediately aware of, and I'm not going to search the
> internet for you.
>
> The regex capabilities in PostgreSQL are pretty full-featured so a
> solution should be possible.  You should try translating the SO post
> concepts into PostgreSQL yourself and ask specific questions if you get
> stuck.
>
> David J.
>
>


Re: pg_basebackup with hostssl ?

2022-02-02 Thread Laura Smith



Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐

On Wednesday, February 2nd, 2022 at 17:20, Adrian Klaver 
 wrote:

>
> Before you do that I would establish that you are connecting to the
>
> correct Postgres instance.
>
>

Good news, all up and running!

The new "postgresql.auto.conf" automagic is pure awesome. ;-)




increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-02 Thread Artyom Shaposhnikov
I recently moved a Postgres DB to a more powerful server with 1TB of
RAM instead of the previous 64GB. To my surprise, after running the tuning on
the new server, the performance of joins deteriorated by 4000x
compared to the old server. I carefully examined all of the changes
and found the culprit:

If I set effective_cache_size > 25GB, the query plan used is this:

Limit (cost=124.12..590.33 rows=1000 width=205) (actual
time=42326.662..42336.212 rows=1000 loops=1)
-> Nested Loop (cost=124.12..31909018.46 rows=68443040 width=205)
(actual time=42326.660..42336.127 rows=1000 loops=1)
-> Merge Join (cost=124.12..30710356.97 rows=68443040 width=169)
(actual time=42326.613..42332.764 rows=1000 loops=1)
Merge Cond: (d.id = dc.data_id)
-> Nested Loop (cost=1.00..31036282.72 rows=58785023 width=165)
(actual time=0.042..5.533 rows=854 loops=1)
-> Index Scan using data_pkey on data t (cost=0.57..4787030.00
rows=58785023 width=131) (actual time=0.023..0.526 rows=854 loops=1)
Index Cond: (id > 205284974)
-> Index Scan using user_pkey on data_user u (cost=0.43..0.45 rows=1
width=42) (actual time=0.005..0.005 rows=1 loops=854)
Index Cond: (id = d.user_id)
-> Index Only Scan using data_class_pkey on data_class ta
(cost=0.57..4935483.78 rows=216964862 width=8) (actual
time=0.018..35022.908 rows=151321889 loops=1)
Heap Fetches: 151321889
-> Index Scan using class_id_index on class a (cost=0.00..0.02 rows=1
width=44) (actual time=0.003..0.003 rows=1 loops=1000)
Index Cond: (id = dc.class_id)
Planning Time: 4.114 ms
Execution Time: 42336.397 ms

and it is 4000x slower than the query plan used with the lower
effective_cache_size that uses indexes instead of the merge joins:

Limit (cost=1.57..4832.30 rows=1000 width=205) (actual
time=0.081..10.457 rows=1000 loops=1)
-> Nested Loop (cost=1.57..330629805.46 rows=68443040 width=205)
(actual time=0.080..10.378 rows=1000 loops=1)
-> Nested Loop (cost=1.57..267793481.39 rows=68443040 width=169)
(actual time=0.065..7.496 rows=1000 loops=1)
-> Nested Loop (cost=1.00..100917823.18 rows=58785023 width=165)
(actual time=0.040..5.424 rows=854 loops=1)
-> Index Scan using data_pkey on data t (cost=0.57..21427806.53
rows=58785023 width=131) (actual time=0.024..0.482 rows=854 loops=1)
Index Cond: (id > 205284974)
-> Index Scan using user_pkey on data_user u (cost=0.43..1.35 rows=1
width=42) (actual time=0.005..0.005 rows=1 loops=854)
Index Cond: (id = d.user_id)
-> Index Only Scan using data_class_pkey on data_class ta
(cost=0.57..2.80 rows=4 width=8) (actual time=0.002..0.002 rows=1
loops=854)
Index Cond: (data_id = d.id)
Heap Fetches: 1000
-> Index Scan using class_id_index on class a (cost=0.00..0.92 rows=1
width=44) (actual time=0.002..0.002 rows=1 loops=1000)
Index Cond: (id = dc.class_id)
Planning Time: 5.074 ms
Execution Time: 10.614 ms

query:

explain analyze select d.time as time,d.id as id, u.username as
username, a.query_symbol as query_symbol from data as d, data_user as
u, class as a, data_class as dc
   where dc.class_id = a.id and dc.data_id = d.id and d.user_id = u.id
and d.id > 205284974
 order by d.id
 limit 1000;

I found a way to fix it by creating ndistinct statistics on the
largest join table:

create statistics stat_data_class (ndistinct) on data_id, class_id
from data_class;
analyze data_class;

Question: are there other ways to give PostgreSQL a hint to use the
indexes instead of the merge join plan without artificially lowering
the memory available in the parameter effective_cache_size or creating
the additional statistics on the table?

Thank you,

-Art




Re: increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-02 Thread Michael Lewis
What does the row estimate look like on the scan of the data table with that
statistic in place? Anytime the stats give a mis-estimate this far off, I
wouldn't expect plans to be optimal except by luck.

Index Scan using data_pkey on data t (cost=0.57..21427806.53 *rows=58785023*
width=131) (actual time=0.024..0.482 *rows=854 *loops=1)
Index Cond: (id > 205284974)


Can Postgres beat Oracle for regexp_count?

2022-02-02 Thread Shaozhong SHI
It has been found that regexp_count works brilliantly in Oracle.

However, it is not easy to replicate that in Postgres.  The following code
has been tried, but without any luck.

select regexp_matches('My High Street', '([A-Z][a-z]+[\s])', 'g')

select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)', 'g')

Count occurrences of 'My High Street' in each of the following strings:

'My High Street'                  1
''                                0
'My High Street My High Street'   2

Can anyone enlighten all of us?

Regards,

David


Re: Can Postgres beat Oracle for regexp_count?

2022-02-02 Thread David G. Johnston
On Wed, Feb 2, 2022 at 1:20 PM Shaozhong SHI  wrote:

> It has been found that regexp_count works brilliantly in Oracle.
>

What query exactly did you execute in Oracle that you wish to see if an
equivalent can be formulated in PostgreSQL?

>
> However, it is not easy to replicate that in Postgres.
>

Given we don't have a regexp_count function this isn't surprising...



>   The following codes have been experimented but without any luck.
>
> select regexp_matches('My High Street', '([A-Z][a-z]+[\s])', 'g')
>
> select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)', 'g')
>

See my first point.

David J.


Re: increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-02 Thread Artyom Shaposhnikov
the row estimate became ~1000x smaller with the stat in place, so it
looks like the planner grossly miscalculates the query plans without the stats
for large tables representing M:M relations.

On Wed, Feb 2, 2022 at 11:47 AM Michael Lewis  wrote:
>
> What does the row estimate look like on the scan of data table with that 
> statistic in place? Anytime the stats give a mis-estimate this far off, I 
> wouldn't expect that plans would be optimal except by luck.
>
> Index Scan using data_pkey on data t (cost=0.57..21427806.53 rows=58785023 
> width=131) (actual time=0.024..0.482 rows=854 loops=1)
> Index Cond: (id > 205284974)




Re: increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-02 Thread Imre Samu
> Question: are there other ways to give PostgreSQL a hint

What is your pg version?

Maybe with pg_hint_plan extension ( https://github.com/ossc-db/pg_hint_plan
)
"pg_hint_plan makes it possible to tweak PostgreSQL execution plans using
so-called "hints" in SQL comments, like /*+ SeqScan(a) */."
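For example, a sketch using the aliases from the original query (assuming
pg_hint_plan is installed; hint names as per its documentation):

/*+ NestLoop(d dc) IndexScan(dc) */
explain analyze
select d.time, d.id, u.username, a.query_symbol
  from data as d, data_user as u, class as a, data_class as dc
 where dc.class_id = a.id and dc.data_id = d.id and d.user_id = u.id
   and d.id > 205284974
 order by d.id
 limit 1000;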

regards,
 Imre



Artyom Shaposhnikov  ezt írta (időpont: 2022. febr. 2.,
Sze, 20:05):

> I recently moved a postgres DB to a more powerful server with 1TB of
> RAM instead of 64GB before. To my surprise after running the tuning on
> the new server, the performance of joins deteriorated by 4000x
> compared to the old server. I carefully examined all of the changes
> and found the culprit:
>
> if I use the effective_cache_size > 25GB, the query plan used is this:
>
> Limit (cost=124.12..590.33 rows=1000 width=205) (actual
> time=42326.662..42336.212 rows=1000 loops=1)
> -> Nested Loop (cost=124.12..31909018.46 rows=68443040 width=205)
> (actual time=42326.660..42336.127 rows=1000 loops=1)
> -> Merge Join (cost=124.12..30710356.97 rows=68443040 width=169)
> (actual time=42326.613..42332.764 rows=1000 loops=1)
> Merge Cond: (d.id = dc.data_id)
> -> Nested Loop (cost=1.00..31036282.72 rows=58785023 width=165)
> (actual time=0.042..5.533 rows=854 loops=1)
> -> Index Scan using data_pkey on data t (cost=0.57..4787030.00
> rows=58785023 width=131) (actual time=0.023..0.526 rows=854 loops=1)
> Index Cond: (id > 205284974)
> -> Index Scan using user_pkey on data_user u (cost=0.43..0.45 rows=1
> width=42) (actual time=0.005..0.005 rows=1 loops=854)
> Index Cond: (id = d.user_id)
> -> Index Only Scan using data_class_pkey on data_class ta
> (cost=0.57..4935483.78 rows=216964862 width=8) (actual
> time=0.018..35022.908 rows=151321889 loops=1)
> Heap Fetches: 151321889
> -> Index Scan using class_id_index on class a (cost=0.00..0.02 rows=1
> width=44) (actual time=0.003..0.003 rows=1 loops=1000)
> Index Cond: (id = dc.class_id)
> Planning Time: 4.114 ms
> Execution Time: 42336.397 ms
>
> and it is 4000x slower than the query plan used with the lower
> effective_cache_size that uses indexes instead of the merge joins:
>
> Limit (cost=1.57..4832.30 rows=1000 width=205) (actual
> time=0.081..10.457 rows=1000 loops=1)
> -> Nested Loop (cost=1.57..330629805.46 rows=68443040 width=205)
> (actual time=0.080..10.378 rows=1000 loops=1)
> -> Nested Loop (cost=1.57..267793481.39 rows=68443040 width=169)
> (actual time=0.065..7.496 rows=1000 loops=1)
> -> Nested Loop (cost=1.00..100917823.18 rows=58785023 width=165)
> (actual time=0.040..5.424 rows=854 loops=1)
> -> Index Scan using data_pkey on data t (cost=0.57..21427806.53
> rows=58785023 width=131) (actual time=0.024..0.482 rows=854 loops=1)
> Index Cond: (id > 205284974)
> -> Index Scan using user_pkey on data_user u (cost=0.43..1.35 rows=1
> width=42) (actual time=0.005..0.005 rows=1 loops=854)
> Index Cond: (id = d.user_id)
> -> Index Only Scan using data_class_pkey on data_class ta
> (cost=0.57..2.80 rows=4 width=8) (actual time=0.002..0.002 rows=1
> loops=854)
> Index Cond: (data_id = d.id)
> Heap Fetches: 1000
> -> Index Scan using class_id_index on class a (cost=0.00..0.92 rows=1
> width=44) (actual time=0.002..0.002 rows=1 loops=1000)
> Index Cond: (id = dc.class_id)
> Planning Time: 5.074 ms
> Execution Time: 10.614 ms
>
> query:
>
> explain analyze select d.time as time,d.id as id, u.username as
> username, a.query_symbol as query_symbol from data as d, data_user as
> u, class as a, data_class as dc
>where dc.class_id = a.id and dc.data_id = d.id and d.user_id = u.id
> and d.id > 205284974
>  order by d.id
>  limit 1000;
>
> I found a way to fix it by creating a distinct statistics on the
> largest join table:
>
> create statistics stat_data_class (ndistinct) on data_id, class_id
> from data_class;
> analyze data_class;
>
> Question: are there other ways to give PostgreSQL a hint to use the
> indexes instead of the merge join plan without artificially lowering
> the memory available in the parameter effective_cache_size or creating
> the additional statistics on the table?
>
> Thank you,
>
> -Art
>
>
>


Re: Can Postgres beat Oracle for regexp_count?

2022-02-02 Thread Tom Lane
"David G. Johnston"  writes:
> Given we don't have a regexp_count function this isn't surprising...

FYI, it's there in HEAD.

In the meantime, you could possibly do something like

=# select count(*) from regexp_matches('My High Street', '([A-Z][a-z]+[\s])', 
'g');
 count 
-------
     2
(1 row)

(Note that 2 is the correct answer given that there's no space
after the third word; I trust Oracle agrees.)
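For reference, the HEAD version would look roughly like this (a sketch):

=# select regexp_count('My High Street', '[A-Z][a-z]+[\s]');
 regexp_count 
--------------
            2
(1 row)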

regards, tom lane




Re: increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-02 Thread Tomas Vondra

On 2/2/22 22:10, Artyom Shaposhnikov wrote:

the row estimate became ~1000x smaller with the stat in place, so it
looks like it grossly miscalculates the query plans without the stats
for large tables representing M:M relations.



Well, if the estimates are significantly off (and 3 orders of magnitude 
certainly qualifies), then all bets are off. There's no magical option 
that'd fix planning in such conditions.


Ultimately, fixing the estimates (e.g. by creating extended statistics) 
is the right "fix" as it gives the optimizer the information needed to 
pick the right plan.
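For example, a sketch along the lines of what was already tried, possibly
adding the other statistics kinds too (dependencies and mcv need a reasonably
recent release):

create statistics stat_data_class (ndistinct, dependencies, mcv)
  on data_id, class_id from data_class;
analyze data_class;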



On Wed, Feb 2, 2022 at 11:47 AM Michael Lewis  wrote:


What does the row estimate look like on the scan of data table with that 
statistic in place? Anytime the stats give a mis-estimate this far off, I 
wouldn't expect that plans would be optimal except by luck.

Index Scan using data_pkey on data t (cost=0.57..21427806.53 rows=58785023 
width=131) (actual time=0.024..0.482 rows=854 loops=1)
Index Cond: (id > 205284974)




It'd be interesting to see the plans without the LIMIT, as that makes 
the "actual" values low simply by terminating early.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Subscription stuck at initialize state

2022-02-02 Thread Abhishek Bhola
Update: Tried dropping and recreating publication on the source DB and
subscription still won't move ahead.
Not sure what I am missing.

On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola <
abhishek.bh...@japannext.co.jp> wrote:

> The only statement I see on the target DB log is
> "logical replication apply worker for subscription ""sub_omx_archive_tci""
> has started",""
>
> I don't see the logical replication table synchronization worker started
> for any of the tables in this subscription as I see in the other one.
>
> Is there anything in particular that I should be looking for in the log
> files?
>
> On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin 
> wrote:
>
>> Hi Abishek,
>>
>> Have you checked the subscriber and publisher database log files to see
>> if there is a problem blocking the subscription? For example, a subscribed
>> table missing a column that exists in the publisher.
>>
>> Cheers,
>>
>> Steve
>>
>> On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola <
>> abhishek.bh...@japannext.co.jp> wrote:
>>
>>> I have 2 sets of publication/subscription between my 2 DBs.
>>> One of them is working fine and the other one is stuck at initializing
>>> state for all the tables.
>>>
>>> sourcedb=# select * from pg_publication;
>>>   oid  |pubname | pubowner | puballtables | pubinsert |
>>> pubupdate | pubdelete | pubtruncate | pubviaroot
>>>
>>> ---++--+--+---+---+---+-+
>>>  19585 | omx_archive_big_tables |16420 | f| t |
>>> t | t | t   | f
>>>  19584 | omx_archive|16420 | f| t
>>>   | t | t | t   | f
>>>
>>>
>>> targetdb=# select * from pg_subscription_rel ;
>>>   srsubid   |  srrelid   | srsubstate |   srsublsn
>>> +++---
>>>  3615804367 | 3322052690 | i  |
>>>  3615804367 | 3322052570 | i  |
>>>  3615756798 | 3322051793 | r  | 9E7E/BF5F82D8
>>>  3615804367 | 3322052133 | i  |
>>>  3615804367 | 3322054214 | i  |
>>>  3615756798 | 3322051802 | r  | 9E7E/C149BBD8
>>>  3615804367 | 3322051757 | i  |
>>>
>>> targetdb=# select * from pg_subscription;
>>> -[ RECORD 1
>>> ]---+--
>>> oid | 3615756798
>>> subdbid | 16589
>>> subname | sub_omx_archive_big_tables_tci
>>> subowner| 16420
>>> subenabled  | t
>>> subconninfo | xxx
>>> subslotname | sub_omx_archive_big_tables_tci
>>> subsynccommit   | off
>>> subpublications | {omx_archive_big_tables}
>>> -[ RECORD 2
>>> ]---+--
>>> oid | 3615804367
>>> subdbid | 16589
>>> subname | sub_omx_archive_tci
>>> subowner| 16420
>>> subenabled  | t
>>> subconninfo | 
>>> subslotname | sub_omx_archive_tci
>>> subsynccommit   | off
>>> subpublications | {omx_archive}
>>>
>>> I have dropped the subscription, recreated and refreshed it many times,
>>> but it won't move from the initializing phase.
>>>
>>> Any suggestions on how to start copying the data again, other than
>>> dropping the publication and re-creating it?
>>>
>>>
>>
>>


Re: Subscription stuck at initialize state

2022-02-02 Thread Abhishek Bhola
So far I figured out that the problem is on the subscriber side.
The same publication, when subscribed to on another DB, works fine.
Also noticed that the remote_lsn value on the target DB is still 0/0.

targetdb=# select * from pg_replication_origin_status ;
 local_id |  external_id  |   remote_lsn   |   local_lsn
----------+---------------+----------------+----------------
        1 | pg_3615756798 | 9E96/37152C80  | 1518C/9014BD98
        2 | pg_3616584803 | 0/0            | 0/0
(2 rows)

Would really appreciate it if someone could help me with this.



On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola <
abhishek.bh...@japannext.co.jp> wrote:

> Update: Tried dropping and recreating publication on the source DB and
> subscription still won't move ahead.
> Not sure what I am missing.
>
> On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> The only statement I see on the target DB log is
>> "logical replication apply worker for subscription
>> ""sub_omx_archive_tci"" has started",""
>>
>> I don't see the logical replication table synchronization worker started
>> for any of the tables in this subscription as I see in the other one.
>>
>> Is there anything in particular that I should be looking for in the log
>> files?
>>
>> On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin 
>> wrote:
>>
>>> Hi Abishek,
>>>
>>> Have you checked the subscriber and publisher database log files to see
>>> if there is a problem blocking the subscription? For example, a subscribed
>>> table missing a column that exists in the publisher.
>>>
>>> Cheers,
>>>
>>> Steve
>>>
>>> On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola <
>>> abhishek.bh...@japannext.co.jp> wrote:
>>>
 I have 2 sets of publication/subscription between my 2 DBs.
 One of them is working fine and the other one is stuck at initializing
 state for all the tables.

 sourcedb=# select * from pg_publication;
   oid  |pubname | pubowner | puballtables | pubinsert |
 pubupdate | pubdelete | pubtruncate | pubviaroot

 ---++--+--+---+---+---+-+
  19585 | omx_archive_big_tables |16420 | f| t |
 t | t | t   | f
  19584 | omx_archive|16420 | f| t
   | t | t | t   | f


 targetdb=# select * from pg_subscription_rel ;
   srsubid   |  srrelid   | srsubstate |   srsublsn
 +++---
  3615804367 | 3322052690 | i  |
  3615804367 | 3322052570 | i  |
  3615756798 | 3322051793 | r  | 9E7E/BF5F82D8
  3615804367 | 3322052133 | i  |
  3615804367 | 3322054214 | i  |
  3615756798 | 3322051802 | r  | 9E7E/C149BBD8
  3615804367 | 3322051757 | i  |

 targetdb=# select * from pg_subscription;
 -[ RECORD 1
 ]---+--
 oid | 3615756798
 subdbid | 16589
 subname | sub_omx_archive_big_tables_tci
 subowner| 16420
 subenabled  | t
 subconninfo | xxx
 subslotname | sub_omx_archive_big_tables_tci
 subsynccommit   | off
 subpublications | {omx_archive_big_tables}
 -[ RECORD 2
 ]---+--
 oid | 3615804367
 subdbid | 16589
 subname | sub_omx_archive_tci
 subowner| 16420
 subenabled  | t
 subconninfo | 
 subslotname | sub_omx_archive_tci
 subsynccommit   | off
 subpublications | {omx_archive}

 I have dropped the subscription, recreated and refreshed it many times,
 but it won't move from the initializing phase.

 Any suggestions on how to start copying the data again, other than
 dropping the publication and re-creating it?



Re: Can Postgres beat Oracle for regexp_count?

2022-02-02 Thread Shaozhong SHI
Hi Tom Lane,

On Wed, 2 Feb 2022 at 22:26, Tom Lane  wrote:

> "David G. Johnston"  writes:
> > Given we don't have a regexp_count function this isn't surprising...
>
> FYI, it's there in HEAD.
>
> In the meantime, you could possibly do something like
>
> =# select count(*) from regexp_matches('My High Street',
> '([A-Z][a-z]+[\s])', 'g');
>  count
> ---
>  2
> (1 row)
>
> (Note that 2 is the correct answer given that there's no space
> after the third word; I trust Oracle agrees.)
>
> Can the whole 3 or 4 or 5 words be matched as one?
>

The following has been attempted but no luck.

select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g')
It is intended to match 'My High Street', but it turned out only 'Street'
was matched.

Regards,  David


Re: Can Postgres beat Oracle for regexp_count?

2022-02-02 Thread David G. Johnston
On Wed, Feb 2, 2022 at 10:26 PM Shaozhong SHI 
wrote:

>
> select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g')
> It is intended to match 'My High Street, but it turned out only 'Street'
> was matched.
>
>
I'm too tired to find the documentation for why you saw your result but
basically you only have a single capturing parentheses pair and since
you've quantified that you end up with just the last capture that was found
- Street.  If you want to capture the entire found expression you need to
capture the quantifier.  So put parentheses around the entire regexp.

select regexp_matches('My High Street', '(([A-Z][a-z]+[\s]*)+)', 'g')

You now have a two element array, slots filled left-to-right based upon the
opening parenthesis.  So {"My High Street",Street}

To get rid of the undesired Street and only return a single element array
you need to make the inner parentheses non-capturing.

select regexp_matches('My High Street', '((?:[A-Z][a-z]+[\s]*)+)', 'g')

David J.


Re: Can Postgres beat Oracle for regexp_count?

2022-02-02 Thread Tom Lane
Shaozhong SHI  writes:
> The following has been attempted but no luck.

> select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g')
> It is intended to match 'My High Street, but it turned out only 'Street'
> was matched.

You've got the parentheses in the wrong place, ie inside not outside the
"+" quantifier.  Per the fine manual [1], the result is determined by the
last match of quantified capturing parens.

You could avoid using any capturing parens, so that the result is
the whole match:

regression=# select regexp_matches('My High Street', '(?:[A-Z][a-z]+[\s]*)+', 
'g');
   regexp_matches   
--------------------
 {"My High Street"}
(1 row)

or you could do

regression=# select regexp_matches('My High Street', '(([A-Z][a-z]+[\s]*)+)', 
'g');
  regexp_matches   
---------------------------
 {"My High Street",Street}
(1 row)

but then you have two sets of capturing parens and you get results for
both, so you might prefer

regression=# select regexp_matches('My High Street', '((?:[A-Z][a-z]+[\s]*)+)', 
'g');
   regexp_matches   
--------------------
 {"My High Street"}
(1 row)

In any case, there's no substitute for reading the manual.

regards, tom lane

[1] 
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP




Re: Undetected Deadlock

2022-02-02 Thread Michael Harris
Hi again

Some good news. After some more debugging & reflection, I realized
that the likely cause is one of our own libraries that gets loaded as
part of some custom functions we are using.

Some of these functions trigger fetching of remote resources, for
which a timeout is set using `alarm`. The function unfortunately does
not re-establish any pre-existing interval timers after it is done,
which leads to PostgreSQL missing its own expected alarm signal.
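
Roughly, the fix on our side will look something like this (a sketch,
assuming the library keeps using alarm() for the fetch timeout):

#include <sys/time.h>   /* getitimer, setitimer */
#include <unistd.h>     /* alarm */

static void
fetch_with_timeout(unsigned int fetch_timeout_secs)
{
    struct itimerval saved;

    getitimer(ITIMER_REAL, &saved);        /* remember any pre-existing timer */
    alarm(fetch_timeout_secs);             /* our own timeout for the fetch */

    /* ... fetch the remote resource here ... */

    alarm(0);                              /* cancel our alarm */
    setitimer(ITIMER_REAL, &saved, NULL);  /* restore the backend's timer */
}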

The reason that this was not affecting us on previous postgres
versions was this commit:


https://github.com/postgres/postgres/commit/09cf1d52267644cdbdb734294012cf1228745aaa#diff-b12a7ca3bf9c6a56745844c2670b0b28d2a4237741c395dda318c6cc3664ad4a

After this commit, once an alarm is missed, that backend never sets
one again, so no timeouts of any kind will work. Therefore, the
deadlock detector was never being run. Prior to that, the next time
any timeout was set by the backend it would re-establish its timer.

We will of course fix our own code to prevent this issue, but I am a
little concerned at the above commit as it reduces the robustness of
postgres in this situation. Perhaps I will raise it on the
pgsql-hackers list.

Cheers
Mike

On Tue, 1 Feb 2022 at 17:50, Michael Harris  wrote:
>
> Hi
>
> The undetected deadlock occurred again today and I was able to collect
> some more info.
>
> The presentation was very similar to the case I reported previously:
> - One backend trying to do a DROP TABLE on a partition of a
> partitioned table, waiting for an AccessExclusiveLock on that table
> - Another backend trying to do a DELETE on records within a partition
> of another table, waiting on an AccessShareLock which is already held
> by the first table and already holding a lock on the table that the
> first backend is waiting for
> - A load of other backends also trying to do a DELETE.
>
> I was able to attach gdb to the backends and I discovered a few things.
>
> First, taking a stack trace of the process doing the DELETE, it looks like 
> this:
>
> #0  0x7fc82a6750bb in epoll_wait (epfd=13, events=0x18696f8,
> maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at
> ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x007d0c32 in WaitEventSetWaitBlock (nevents=1,
> occurred_events=<optimized out>, cur_timeout=-1, set=0x1869698) at
> latch.c:1450
> #2  WaitEventSetWait (set=0x1869698, timeout=<optimized out>,
> occurred_events=<optimized out>, nevents=1, wait_event_info=<optimized out>) at latch.c:1396
> #3  0x007d0f94 in WaitLatch (latch=<optimized out>,
> wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=0,
> wait_event_info=50331648) at latch.c:473
> #4  0x007eaea7 in ProcSleep
> (locallock=locallock@entry=0x24571d0,
> lockMethodTable=lockMethodTable@entry=0xdb2360 )
> at proc.c:1361
> #5  0x007df419 in WaitOnLock
> (locallock=locallock@entry=0x24571d0, owner=owner@entry=0x18b80f8) at
> lock.c:1858
> #6  0x007e052c in LockAcquireExtended
> (locktag=locktag@entry=0x7ffced1d0b80, lockmode=lockmode@entry=1,
> sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false,
> reportMemoryError=reportMemoryError@entry=true,
> locallockp=locallockp@entry=0x7ffced1d0b78) at lock.c:1100
> #7  0x007ddb23 in LockRelationOid (relid=1842227607,
> lockmode=1) at lmgr.c:117
> #8  0x0050dba5 in relation_open (relationId=1842227607,
> lockmode=lockmode@entry=1) at relation.c:56
> #9  0x008fc838 in generate_partition_qual (rel=0x56614e8) at
> partcache.c:361
> #10 generate_partition_qual (rel=0x56614e8) at partcache.c:336
> #11 0x007502bf in set_baserel_partition_constraint
> (relation=relation@entry=0x56614e8, rel=<optimized out>,
> rel=<optimized out>) at plancat.c:2402
> ~snip~
>
> The lock is being taken out by the function generate_partition_qual
> (in partcache.c) which has this comment:
>
> /*
>  * Grab at least an AccessShareLock on the parent table.  Must do this
>  * even if the partition has been partially detached, because transactions
>  * concurrent with the detach might still be trying to use a partition
>  * descriptor that includes it.
>  */
>
> This is happening during query planning rather than execution, where
> it is trying to fetch the partition bound expression.
> It explains why this lock is not always required (the result is cached
> in the relcache, and anyway it looks to me like this code path is
> called depending on factors such as the row estimates).
>
> The second thing was that the DROP process, and all the DELETE
> processes, were all waiting inside epoll_wait ultimately called from
> ProcSleep as shown above. It is sitting in this section of code
> (proc.c line 1361):
>
> (void) WaitLatch(MyLatch, WL_LATCH_SET |
> WL_EXIT_ON_PM_DEATH, 0,   <--- Processes sitting
> here
>  PG_WAIT_LOCK | locallock->tag.lock.locktag_type);
> ResetLatch(MyLatch);
> /* check for deadlocks first, as that's probably log-worthy */
> if (got_deadlock_t

Re: Subscription stuck at initialize state

2022-02-02 Thread Vijaykumar Jain
On Thu, Feb 3, 2022, 10:32 AM Abhishek Bhola 
wrote:

> So far I figured out that the problem is on the subscriber side.
> The same publication, when subscribed to on another DB, works fine.
> Also noticed that the remote_lsn value on the target DB is still 0/0.
>
> targetdb=# select * from pg_replication_origin_status ;
>  local_id |  external_id  |   remote_lsn   |   local_lsn
> --+---++
> 1 | pg_3615756798 | 9E96/37152C80  | 1518C/9014BD98
> 2 | pg_3616584803 | 0/0   | 0/0
> (2 rows)
>
> Would really appreciate it if someone could help me with this.
>
>
>
> On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> Update: Tried dropping and recreating publication on the source DB and
>> subscription still won't move ahead.
>> Not sure what I am missing.
>>
>> On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola <
>> abhishek.bh...@japannext.co.jp> wrote:
>>
>>> The only statement I see on the target DB log is
>>> "logical replication apply worker for subscription
>>> ""sub_omx_archive_tci"" has started",""
>>>
>>> I don't see the logical replication table synchronization worker started
>>> for any of the tables in this subscription as I see in the other one.
>>>
>>

This might help you track subscription state along with
pg_stat_subscription.
https://www.postgresql.org/docs/10/catalog-pg-subscription-rel.html
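
For example, quick checks on the subscriber might look like this (a sketch):

-- per-table sync state ('i' = initialize, 'd' = copying data, 'r' = ready)
select s.subname, sr.srrelid::regclass as tab, sr.srsubstate
  from pg_subscription_rel sr
  join pg_subscription s on s.oid = sr.srsubid
 where sr.srsubstate <> 'r';

-- are the apply / tablesync workers actually running?
select subname, pid, relid::regclass, received_lsn, latest_end_lsn
  from pg_stat_subscription;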

It might be a huge table being 'copy'-ed, and writes would be slow due to
too many indexes etc.

https://postgresteam.slack.com/files/UQMFAU01W/F02V69YK59P/untitled.sql

Also, if you see nothing is moving, you may want to log in to the server and
strace the worker process's pid to see if there is any movement at all or if it
keeps looping on some errors.


Re: Subscription stuck at initialize state

2022-02-02 Thread Abhishek Bhola
Hi Vijaykumar,

I checked the pg_subscription_rel and all the tables in that subscription
are in the state - i (initialize).
I also tried creating a new publication on the source DB with just one
table and tried to subscribe it, it doesn't work either.
However, when I try to subscribe it on some other DB than the one mentioned
above, it works.
By which I am deducing that the publication and the source DB are okay; the
problem is on the target DB and its subscription.
Maybe I will have to restart the DB as a last resort, but I am not sure if
that will solve the problem either.


On Thu, Feb 3, 2022 at 3:33 PM Vijaykumar Jain <
vijaykumarjain.git...@gmail.com> wrote:

>
>
> On Thu, Feb 3, 2022, 10:32 AM Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> So far I figured out that the problem is on the subscriber side.
>> The same publication, when subscribed to on another DB, works fine.
>> Also noticed that the remote_lsn value on the target DB is still 0/0.
>>
>> targetdb=# select * from pg_replication_origin_status ;
>>  local_id |  external_id  |   remote_lsn   |   local_lsn
>> --+---++
>> 1 | pg_3615756798 | 9E96/37152C80  | 1518C/9014BD98
>> 2 | pg_3616584803 | 0/0   | 0/0
>> (2 rows)
>>
>> Would really appreciate it if someone could help me with this.
>>
>>
>>
>> On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola <
>> abhishek.bh...@japannext.co.jp> wrote:
>>
>>> Update: Tried dropping and recreating publication on the source DB and
>>> subscription still won't move ahead.
>>> Not sure what I am missing.
>>>
>>> On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola <
>>> abhishek.bh...@japannext.co.jp> wrote:
>>>
 The only statement I see on the target DB log is
 "logical replication apply worker for subscription
 ""sub_omx_archive_tci"" has started",""

 I don't see the logical replication table synchronization worker
 started for any of the tables in this subscription as I see in the other
 one.

>>>
>
> This might help you track subscription state along with
> pg_stat_subscription.
> https://www.postgresql.org/docs/10/catalog-pg-subscription-rel.html
>
> It might be a huge table being 'copy' ed and writes would be slow due to
> too many indexes etc.
>
> https://postgresteam.slack.com/files/UQMFAU01W/F02V69YK59P/untitled.sql
>
> also if you see nothing is moving, you may want to login to the server and
> strace the worker processes pid to see if you see any movement at all or it
> keeps looping on some errors.
>
>
