Re: Multixact wraparound monitoring

2023-09-15 Thread Alvaro Herrera
On 2023-Sep-14, bruno da silva wrote:

> This problem is more acute when the FK Table stores a small number of rows
> like types or codes.

Right, because the likelihood of multiple transactions creating
new references to the same row is higher.

> I think in those cases an enum type should be used instead of a column with
> a FK.

Right, that alleviates the issue, but IMO it's a workaround whose need
is caused by a deficiency in our implementation.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)




pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Achilleas Mantzios - cloud

Dear All

I have a weird problem, I am trying to improve performance on this query :

SELECT text('[email protected]') from mail_vessel_addressbook where 
text('[email protected]') ~* address_regex limit 1;


The first system (linux) is a linux hosted in a cloud, kernel 
3.16.0-4-amd64, 32GB mem, SSD, 4 x Intel(R) Xeon(R) CPU E7-4860 v2 @ 
2.60GHz ,


The second (freebsd) system, used as test, is my local FreeBSD 
13.1-RELEASE workstation, 32GB mem, ZFS/magnetic disks ,16 x AMD Ryzen 7 
5800X 3800.16-MHz .


Overall my workstation is faster, but my issue is not plain speed. The 
problem is as follows :


*FreeBSD*

postgres@[local]/dynacom=# explain (analyze,buffers) SELECT 
text('[email protected]') from mail_vessel_addressbook where 
text('[email protected]') ~* address_regex limit 1;


 QUERY PLAN
-- 

Limit  (cost=0.42..5.11 rows=1 width=32) (actual time=96.705..96.706 
rows=1 loops=1)

  Buffers: shared hit=71
  ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
on mail_vessel_addressbook  (cost=0.42..2912.06 rows=620 width=32) 
(actual time=96.704..96.705 rows=1 loops=1)

Filter: ('[email protected]'::text ~* address_regex)
Rows Removed by Filter: 14738
Heap Fetches: 0
Buffers: shared hit=71
Planning time: 0.082 ms
Execution time: 96.725 ms
(9 rows)

Time: 97.038 ms
postgres@[local]/dynacom=#

*Linux*

dynacom=# explain (analyze,buffers) SELECT text('[email protected]') from 
mail_vessel_addressbook where text('[email protected]') ~* address_regex limit 1;


   QUERY PLAN
-- 

Limit  (cost=0.42..5.12 rows=1 width=32) (actual time=1768.725..1768.727 
rows=1 loops=1)

  Buffers: shared hit=530
  ->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
on mail_vessel_addressbook  (cost=0.42..2913.04 rows=620 width=32) 
(actual time=1768.724..1768.725 rows=1 loops=1)

Filter: ('[email protected]'::text ~* address_regex)
Rows Removed by Filter: 97781
Heap Fetches: 0
Buffers: shared hit=530
Planning time: 1.269 ms
Execution time: 1768.998 ms
(9 rows)


The file in FreeBSD came by pg_dump from the linux system, I am puzzled 
why this huge difference in Buffers: shared hit. All table/index sizes 
are identical on both systems, I did vacuum full on the linux one, and 
also did vacuum freeze on both. I analyzed both, reindexed both (several 
times). Still the FreeBSD seems to access about 7 times less number of 
blocks from shared_buffers than linux : 71 vs 530 . There is no bloat , 
I tested with newly fresh table in both systems as well.


Thank you for any help.


Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Tom Lane
Achilleas Mantzios - cloud  writes:
> *FreeBSD*
> 
>->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
> on mail_vessel_addressbook  (cost=0.42..2912.06 rows=620 width=32) 
> (actual time=96.704..96.705 rows=1 loops=1)
>  Filter: ('[email protected]'::text ~* address_regex)
>  Rows Removed by Filter: 14738
>  Heap Fetches: 0
>  Buffers: shared hit=71
> 
> *Linux*
> 
>->  Index Only Scan using mail_vessel_addressbook_address_regex_idx 
> on mail_vessel_addressbook  (cost=0.42..2913.04 rows=620 width=32) 
> (actual time=1768.724..1768.725 rows=1 loops=1)
>  Filter: ('[email protected]'::text ~* address_regex)
>  Rows Removed by Filter: 97781
>  Heap Fetches: 0
>  Buffers: shared hit=530

> The file in FreeBSD came by pg_dump from the linux system, I am puzzled 
> why this huge difference in Buffers: shared hit.

The "rows removed" value is also quite a bit different, so it's not
just a matter of buffer touches --- there's evidently some real difference
in how much of the index is being scanned.  I speculate that you are
using different collations on the two systems, and FreeBSD's collation
happens to place the first matching row earlier in the index.

regards, tom lane




Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Achilleas Mantzios

Στις 15/9/23 18:23, ο/η Tom Lane έγραψε:

Achilleas Mantzios - cloud  writes:

*FreeBSD*

->  Index Only Scan using mail_vessel_addressbook_address_regex_idx
on mail_vessel_addressbook  (cost=0.42..2912.06 rows=620 width=32)
(actual time=96.704..96.705 rows=1 loops=1)
  Filter: ('[email protected]'::text ~* address_regex)
  Rows Removed by Filter: 14738
  Heap Fetches: 0
  Buffers: shared hit=71

*Linux*

->  Index Only Scan using mail_vessel_addressbook_address_regex_idx
on mail_vessel_addressbook  (cost=0.42..2913.04 rows=620 width=32)
(actual time=1768.724..1768.725 rows=1 loops=1)
  Filter: ('[email protected]'::text ~* address_regex)
  Rows Removed by Filter: 97781
  Heap Fetches: 0
  Buffers: shared hit=530
The file in FreeBSD came by pg_dump from the linux system, I am puzzled
why this huge difference in Buffers: shared hit.

The "rows removed" value is also quite a bit different, so it's not
just a matter of buffer touches --- there's evidently some real difference
in how much of the index is being scanned.  I speculate that you are
using different collations on the two systems, and FreeBSD's collation
happens to place the first matching row earlier in the index.


Thank you, I see that both systems use en_US.UTF-8 as lc_collate and 
lc_ctype, and that in both systems :


dynacom=# \dOS+
   List of collations
  Schema   |  Name   | Collate | Ctype | Provider | Description
+-+-+---+--+-- 


pg_catalog | C   | C   | C | libc | standard C collation
pg_catalog | POSIX   | POSIX   | POSIX | libc | standard POSIX 
collation
pg_catalog | default | |   | default  | database's default 
collation

(3 rows)


dynacom=# \l
  List of databases
  Name    |  Owner   | Encoding  |   Collate   |    Ctype    |   Access 
privileges
---+--+---+-+-+ 


dynacom   | postgres | SQL_ASCII | en_US.UTF-8 | en_US.UTF-8 |

the below seems ok

FreeBSD :

postgres@[local]/dynacom=# select * from (values 
('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;

column1
-
_
.
0
a
Z
(5 rows)

Linux:

dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry 
order by column1::text;

column1
-
_
.
0
a
Z
(5 rows)

dynacom=#

but :

Freebsd :

postgres@[local]/dynacom=# select distinct address_regex from 
mail_vessel_addressbook order by address_regex::text ASC limit 5;

 address_regex
--
_cmo.ship.inf@.
_EMD_REEFER@hide>.
_OfficeHayPoint@hide>.
_Sabtank_PCQ1_All_SSVSSouth_area@hide>.
_Sabtank_PCQ1_Lead_OperatorsSouth_area@hide>.
(5 rows)

While in Linux :

dynacom=# select distinct address_regex from mail_vessel_addressbook 
order by address_regex::text ASC limit 5;

  address_regex
---
0033240902573@.
0033442057364@.
0072usl@.

0081354426912@.
00862163602861@.
(5 rows)

somethings does not seem right.



regards, tom lane


--
Achilleas Mantzios
 IT DEV - HEAD
 IT DEPT
 Dynacom Tankers Mgmt


Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Tom Lane
Achilleas Mantzios  writes:
> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and 
> lc_ctype,

Doesn't necessarily mean they interpret that the same way, though :-(

> the below seems ok

> FreeBSD :

> postgres@[local]/dynacom=# select * from (values 
> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
> column1
> -
> _
> .
> 0
> a
> Z
> (5 rows)

Sadly, this proves very little about Linux's behavior.  glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the
alphanumeric parts of the strings sort equal.

regards, tom lane




Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Achilleas Mantzios

Στις 15/9/23 22:42, ο/η Tom Lane έγραψε:

Achilleas Mantzios  writes:

Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
lc_ctype,

Doesn't necessarily mean they interpret that the same way, though :-(


the below seems ok
FreeBSD :
postgres@[local]/dynacom=# select * from (values
('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1
-
_
.
0
a
Z
(5 rows)

Sadly, this proves very little about Linux's behavior.  glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the


Thank you so much. Makes perfect sense.

This begs the question asked also in the -sql list : how do I index on 
regex'es, or at least have a barely scalable solution? Here I try to 
match a given string against a stored regex, whereas in pg_trgm's case 
the user tries to match a stored text against a given regex.



alphanumeric parts of the strings sort equal.

regards, tom lane


--
Achilleas Mantzios
 IT DEV - HEAD
 IT DEPT
 Dynacom Tankers Mgmt





Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Thomas Munro
On Sat, Sep 16, 2023 at 7:42 AM Tom Lane  wrote:
> Sadly, this proves very little about Linux's behavior.  glibc's idea
> of en_US involves some very complicated multi-pass sort rules.
> AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
> as "same as C except case-insensitive", whereas I'm pretty sure
> that underscores and other punctuation are nearly ignored in
> glibc's interpretation; they'll only be taken into account if the
> alphanumeric parts of the strings sort equal.

Achilleas didn't mention the glibc version, but based on the kernel
vintage mentioned I guess that must be the "old" (pre 2.28) glibc
sorting.  In 2.28 they did a big sync-up with ISO 14651, while FreeBSD
follows the UCA, a closely related standard[1].  I think newer
Linux/glibc systems should agree with FreeBSD's libc in more cases
(and also agree with ICU).

[1] https://unicode.org/reports/tr10/#Synch_ISO14651




Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit

2023-09-15 Thread Achilleas Mantzios

Στις 16/9/23 02:08, ο/η Thomas Munro έγραψε:

On Sat, Sep 16, 2023 at 7:42 AM Tom Lane  wrote:

Sadly, this proves very little about Linux's behavior.  glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the
alphanumeric parts of the strings sort equal.

Achilleas didn't mention the glibc version, but based on the kernel
vintage mentioned I guess that must be the "old" (pre 2.28) glibc
sorting.  In 2.28 they did a big sync-up with ISO 14651, while FreeBSD
follows the UCA, a closely related standard[1].  I think newer
Linux/glibc systems should agree with FreeBSD's libc in more cases
(and also agree with ICU).
Thank you Thomas , our linux's glibc is on version : 2.19-18+deb8u10, we 
need to upgrade on so many levels.


[1]https://unicode.org/reports/tr10/#Synch_ISO14651


--
Achilleas Mantzios
 IT DEV - HEAD
 IT DEPT
 Dynacom Tankers Mgmt