Re: Multixact wraparound monitoring
On 2023-Sep-14, bruno da silva wrote: > This problem is more acute when the FK Table stores a small number of rows > like types or codes. Right, because the likelihood of multiple transactions creating new references to the same row is higher. > I think in those cases an enum type should be used instead of a column with > a FK. Right, that alleviates the issue, but IMO it's a workaround whose need is caused by a deficiency in our implementation. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Dear All
I have a weird problem, I am trying to improve performance on this query :
SELECT text('[email protected]') from mail_vessel_addressbook where
text('[email protected]') ~* address_regex limit 1;
The first system (linux) is a linux hosted in a cloud, kernel
3.16.0-4-amd64, 32GB mem, SSD, 4 x Intel(R) Xeon(R) CPU E7-4860 v2 @
2.60GHz ,
The second (freebsd) system, used as test, is my local FreeBSD
13.1-RELEASE workstation, 32GB mem, ZFS/magnetic disks ,16 x AMD Ryzen 7
5800X 3800.16-MHz .
Overall my workstation is faster, but my issue is not plain speed. The
problem is as follows :
*FreeBSD*
postgres@[local]/dynacom=# explain (analyze,buffers) SELECT
text('[email protected]') from mail_vessel_addressbook where
text('[email protected]') ~* address_regex limit 1;
QUERY PLAN
--
Limit (cost=0.42..5.11 rows=1 width=32) (actual time=96.705..96.706
rows=1 loops=1)
Buffers: shared hit=71
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx
on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32)
(actual time=96.704..96.705 rows=1 loops=1)
Filter: ('[email protected]'::text ~* address_regex)
Rows Removed by Filter: 14738
Heap Fetches: 0
Buffers: shared hit=71
Planning time: 0.082 ms
Execution time: 96.725 ms
(9 rows)
Time: 97.038 ms
postgres@[local]/dynacom=#
*Linux*
dynacom=# explain (analyze,buffers) SELECT text('[email protected]') from
mail_vessel_addressbook where text('[email protected]') ~* address_regex limit 1;
QUERY PLAN
--
Limit (cost=0.42..5.12 rows=1 width=32) (actual time=1768.725..1768.727
rows=1 loops=1)
Buffers: shared hit=530
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx
on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32)
(actual time=1768.724..1768.725 rows=1 loops=1)
Filter: ('[email protected]'::text ~* address_regex)
Rows Removed by Filter: 97781
Heap Fetches: 0
Buffers: shared hit=530
Planning time: 1.269 ms
Execution time: 1768.998 ms
(9 rows)
The file in FreeBSD came by pg_dump from the linux system, I am puzzled
why this huge difference in Buffers: shared hit. All table/index sizes
are identical on both systems, I did vacuum full on the linux one, and
also did vacuum freeze on both. I analyzed both, reindexed both (several
times). Still the FreeBSD seems to access about 7 times less number of
blocks from shared_buffers than linux : 71 vs 530 . There is no bloat ,
I tested with newly fresh table in both systems as well.
Thank you for any help.
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Achilleas Mantzios - cloud writes:
> *FreeBSD*
>
>-> Index Only Scan using mail_vessel_addressbook_address_regex_idx
> on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32)
> (actual time=96.704..96.705 rows=1 loops=1)
> Filter: ('[email protected]'::text ~* address_regex)
> Rows Removed by Filter: 14738
> Heap Fetches: 0
> Buffers: shared hit=71
>
> *Linux*
>
>-> Index Only Scan using mail_vessel_addressbook_address_regex_idx
> on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32)
> (actual time=1768.724..1768.725 rows=1 loops=1)
> Filter: ('[email protected]'::text ~* address_regex)
> Rows Removed by Filter: 97781
> Heap Fetches: 0
> Buffers: shared hit=530
> The file in FreeBSD came by pg_dump from the linux system, I am puzzled
> why this huge difference in Buffers: shared hit.
The "rows removed" value is also quite a bit different, so it's not
just a matter of buffer touches --- there's evidently some real difference
in how much of the index is being scanned. I speculate that you are
using different collations on the two systems, and FreeBSD's collation
happens to place the first matching row earlier in the index.
regards, tom lane
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Στις 15/9/23 18:23, ο/η Tom Lane έγραψε:
Achilleas Mantzios - cloud writes:
*FreeBSD*
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx
on mail_vessel_addressbook (cost=0.42..2912.06 rows=620 width=32)
(actual time=96.704..96.705 rows=1 loops=1)
Filter: ('[email protected]'::text ~* address_regex)
Rows Removed by Filter: 14738
Heap Fetches: 0
Buffers: shared hit=71
*Linux*
-> Index Only Scan using mail_vessel_addressbook_address_regex_idx
on mail_vessel_addressbook (cost=0.42..2913.04 rows=620 width=32)
(actual time=1768.724..1768.725 rows=1 loops=1)
Filter: ('[email protected]'::text ~* address_regex)
Rows Removed by Filter: 97781
Heap Fetches: 0
Buffers: shared hit=530
The file in FreeBSD came by pg_dump from the linux system, I am puzzled
why this huge difference in Buffers: shared hit.
The "rows removed" value is also quite a bit different, so it's not
just a matter of buffer touches --- there's evidently some real difference
in how much of the index is being scanned. I speculate that you are
using different collations on the two systems, and FreeBSD's collation
happens to place the first matching row earlier in the index.
Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
lc_ctype, and that in both systems :
dynacom=# \dOS+
List of collations
Schema | Name | Collate | Ctype | Provider | Description
+-+-+---+--+--
pg_catalog | C | C | C | libc | standard C collation
pg_catalog | POSIX | POSIX | POSIX | libc | standard POSIX
collation
pg_catalog | default | | | default | database's default
collation
(3 rows)
dynacom=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access
privileges
---+--+---+-+-+
dynacom | postgres | SQL_ASCII | en_US.UTF-8 | en_US.UTF-8 |
the below seems ok
FreeBSD :
postgres@[local]/dynacom=# select * from (values
('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1
-
_
.
0
a
Z
(5 rows)
Linux:
dynacom=# select * from (values ('a'),('Z'),('_'),('.'),('0')) as qry
order by column1::text;
column1
-
_
.
0
a
Z
(5 rows)
dynacom=#
but :
Freebsd :
postgres@[local]/dynacom=# select distinct address_regex from
mail_vessel_addressbook order by address_regex::text ASC limit 5;
address_regex
--
_cmo.ship.inf@.
_EMD_REEFER@hide>.
_OfficeHayPoint@hide>.
_Sabtank_PCQ1_All_SSVSSouth_area@hide>.
_Sabtank_PCQ1_Lead_OperatorsSouth_area@hide>.
(5 rows)
While in Linux :
dynacom=# select distinct address_regex from mail_vessel_addressbook
order by address_regex::text ASC limit 5;
address_regex
---
0033240902573@.
0033442057364@.
0072usl@.
0081354426912@.
00862163602861@.
(5 rows)
somethings does not seem right.
regards, tom lane
--
Achilleas Mantzios
IT DEV - HEAD
IT DEPT
Dynacom Tankers Mgmt
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Achilleas Mantzios writes:
> Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
> lc_ctype,
Doesn't necessarily mean they interpret that the same way, though :-(
> the below seems ok
> FreeBSD :
> postgres@[local]/dynacom=# select * from (values
> ('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
> column1
> -
> _
> .
> 0
> a
> Z
> (5 rows)
Sadly, this proves very little about Linux's behavior. glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the
alphanumeric parts of the strings sort equal.
regards, tom lane
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Στις 15/9/23 22:42, ο/η Tom Lane έγραψε:
Achilleas Mantzios writes:
Thank you, I see that both systems use en_US.UTF-8 as lc_collate and
lc_ctype,
Doesn't necessarily mean they interpret that the same way, though :-(
the below seems ok
FreeBSD :
postgres@[local]/dynacom=# select * from (values
('a'),('Z'),('_'),('.'),('0')) as qry order by column1::text;
column1
-
_
.
0
a
Z
(5 rows)
Sadly, this proves very little about Linux's behavior. glibc's idea
of en_US involves some very complicated multi-pass sort rules.
AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
as "same as C except case-insensitive", whereas I'm pretty sure
that underscores and other punctuation are nearly ignored in
glibc's interpretation; they'll only be taken into account if the
Thank you so much. Makes perfect sense.
This begs the question asked also in the -sql list : how do I index on
regex'es, or at least have a barely scalable solution? Here I try to
match a given string against a stored regex, whereas in pg_trgm's case
the user tries to match a stored text against a given regex.
alphanumeric parts of the strings sort equal.
regards, tom lane
--
Achilleas Mantzios
IT DEV - HEAD
IT DEPT
Dynacom Tankers Mgmt
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
On Sat, Sep 16, 2023 at 7:42 AM Tom Lane wrote: > Sadly, this proves very little about Linux's behavior. glibc's idea > of en_US involves some very complicated multi-pass sort rules. > AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US > as "same as C except case-insensitive", whereas I'm pretty sure > that underscores and other punctuation are nearly ignored in > glibc's interpretation; they'll only be taken into account if the > alphanumeric parts of the strings sort equal. Achilleas didn't mention the glibc version, but based on the kernel vintage mentioned I guess that must be the "old" (pre 2.28) glibc sorting. In 2.28 they did a big sync-up with ISO 14651, while FreeBSD follows the UCA, a closely related standard[1]. I think newer Linux/glibc systems should agree with FreeBSD's libc in more cases (and also agree with ICU). [1] https://unicode.org/reports/tr10/#Synch_ISO14651
Re: pgsql 10.23 , different systems, same table , same plan, different Buffers: shared hit
Στις 16/9/23 02:08, ο/η Thomas Munro έγραψε: On Sat, Sep 16, 2023 at 7:42 AM Tom Lane wrote: Sadly, this proves very little about Linux's behavior. glibc's idea of en_US involves some very complicated multi-pass sort rules. AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US as "same as C except case-insensitive", whereas I'm pretty sure that underscores and other punctuation are nearly ignored in glibc's interpretation; they'll only be taken into account if the alphanumeric parts of the strings sort equal. Achilleas didn't mention the glibc version, but based on the kernel vintage mentioned I guess that must be the "old" (pre 2.28) glibc sorting. In 2.28 they did a big sync-up with ISO 14651, while FreeBSD follows the UCA, a closely related standard[1]. I think newer Linux/glibc systems should agree with FreeBSD's libc in more cases (and also agree with ICU). Thank you Thomas , our linux's glibc is on version : 2.19-18+deb8u10, we need to upgrade on so many levels. [1]https://unicode.org/reports/tr10/#Synch_ISO14651 -- Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt
