Would it be possible to utilize a GIN index to query for distinct values ?

2022-05-24 Thread Danny Shemesh
Hey everyone !

Bumping an older thread
,
I've read the GIN readme in the code base and have skimmed through the
implementation, it made me wonder -
would it be possible to use the index to query for distinct / count
distinct values, at least for some key types ?

For instance, say I have a GIN index on a single, highly cardinal but
non-unique text column (a-la 'name'); from my very limited understanding,
would it be possible to query for distinct / count distinct values via
roughly:
- Traversing the GIN entry tree tuples
- Gathering the key data from said tuples
- Discarding keys with no / empty posting lists, as they aren't discarded
from the tree
- Traversing the pending list for indices with fastupdate & merging the
result set

I'm probably off by quite a lot, but I'd really appreciate your great
insight on the above.

Thanks !
Danny


Re: cast to domain with default collation issue.

2022-05-24 Thread Tom Lane
I wrote:
> Perhaps this should be documented more clearly, but it's not obviously
> wrong.  If the domain declaration doesn't include an explicit COLLATE
> then casting to the domain doesn't create an explicit collation
> requirement.  (That is, the domain *doesn't* have a specific
> collation attached to it, any more than type text does.)

Perhaps we could improve matters like this?

diff --git a/doc/src/sgml/ref/create_domain.sgml 
b/doc/src/sgml/ref/create_domain.sgml
index 81a8924926..e4b856d630 100644
--- a/doc/src/sgml/ref/create_domain.sgml
+++ b/doc/src/sgml/ref/create_domain.sgml
@@ -94,7 +94,8 @@ CREATE DOMAIN name [ AS ] 

 An optional collation for the domain.  If no collation is
-specified, the underlying data type's default collation is used.
+specified, the domain has the same collation behavior as its
+underlying data type.
 The underlying type must be collatable if COLLATE
 is specified.



regards, tom lane




existing row not found by SELECT ... WHERE CTID = ?

2022-05-24 Thread Matthias Apitz


We have a C-written program, written in ESQL/C, of our LMS where the logic
crawls with FETCH through a hit list and does UPDATE on some rows which
match certain condition. This works fine for thousands of rows every night,
but magically sometimes it fails. I have here the part of ESQL/C logs
when I was able to catch such a case:

The hit list for the FETCHes is built with:

[29858] [23.05.2022 23:11:06:419]: prepare_common on line 1825: name 
sid_d01buch; query: "SELECT ctid, * from d01buch WHERE d01status = 4 "
[29858] [23.05.2022 23:11:06:419]: ecpg_execute on line 2026: query: declare 
d01buch_scr scroll cursor with hold for SELECT ctid, * from d01buch WHERE 
d01status = 4 ; with 0 parameter(s) on connection sisis

then the CURSOR d01buch_scr is FETCHed some 59537 times, some of the
rows were updated and here is the failing situation with a good FETCH of
the row:

...
[29858] [23.05.2022 23:21:21:842]: ecpg_process_output on line 2655: correctly 
got 1 tuples with 79 fields
[29858] [23.05.2022 23:21:21:842]: ecpg_get_data on line 2655: RESULT: 
(668486,20) offset: 19; array: no
[29858] [23.05.2022 23:21:21:842]: ecpg_get_data on line 2655: RESULT: 03904016 
   offset: 1152; array: no
...


but when a new CURSOR should be build to UPDATE the row based
on its CTID = (668486,20), the row could not be found:

[29858] [23.05.2022 23:21:21:843]: deallocate_one on line 2494: name hs_d01buch
[29858] [23.05.2022 23:21:21:843]: prepare_common on line 2494: name 
hs_d01buch; query: "SELECT * FROM d01buch WHERE ctid = $1 FOR UPDATE"
[29858] [23.05.2022 23:21:21:843]: ecpg_execute on line 2526: query: declare 
hc_d01buch cursor for SELECT * FROM d01buch WHERE ctid = $1 FOR UPDATE; with 1 
parameter(s) on connection sisis
[29858] [23.05.2022 23:21:21:844]: ecpg_execute on line 2526: using PQexecParams
[29858] [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = 
(668486,20)

...

and the first FETCH in the CURSOR hc_d01buch could not see any row:


104 [29858] [23.05.2022 23:21:21:844]: raising sqlcode 100 on line 2531: no 
data found on  line 2531

Why is this? Thanks

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: existing row not found by SELECT ... WHERE CTID = ?

2022-05-24 Thread Tom Lane
Matthias Apitz  writes:
> We have a C-written program, written in ESQL/C, of our LMS where the logic
> crawls with FETCH through a hit list and does UPDATE on some rows which
> match certain condition. This works fine for thousands of rows every night,
> but magically sometimes it fails.

Maybe something else already updated the row since the cursor was opened?
That would change its CTID.

regards, tom lane




Re: Can I start Update row in After Insert trigger function?

2022-05-24 Thread Adrian Klaver

On 5/23/22 23:17, Durumdara wrote:

Dear Adrian!

Thank you for the information. For me the main question is that:
Can I execute an UPDATE in the  AFTER INSERT trigger procedure?
Or is this confuses PGSQL, because prior operation (INSERT) isn't fully 
finished?


It is spelled out here:

https://www.postgresql.org/docs/current/sql-createtrigger.html

"The trigger can be specified to fire before the operation is attempted 
on a row (before constraints are checked and the INSERT, UPDATE, or 
DELETE is attempted); or after the operation has completed (after 
constraints are checked and the INSERT, UPDATE, or DELETE has 
completed); or instead of the operation (in the case of inserts, updates 
or deletes on a view)."


So the INSERT has completed in an AFTER trigger.



Thank you!

BR,
dd





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: existing row not found by SELECT ... WHERE CTID = ?

2022-05-24 Thread Matthias Apitz
El día martes, mayo 24, 2022 a las 10:47:11 -0400, Tom Lane escribió:

> Matthias Apitz  writes:
> > We have a C-written program, written in ESQL/C, of our LMS where the logic
> > crawls with FETCH through a hit list and does UPDATE on some rows which
> > match certain condition. This works fine for thousands of rows every night,
> > but magically sometimes it fails.
> 
> Maybe something else already updated the row since the cursor was opened?
> That would change its CTID.

If you compare the time when the CTID was read:

[29858] [23.05.2022 23:21:21:842]: ecpg_get_data on line 2655: RESULT: 
(668486,20) offset:
19; array: no

with the time when it was not found:

[29858] [23.05.2022 23:21:21:844]: raising sqlcode 100 on line 2531: no data 
found on
line 2531

it's nearly imposible that our software could have done this. Maybe
VACUUM hits exactly this moment in time and row, but this sounds also like 5
good numbers in the lotery jackpot :-)

Thanks

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub




Re: existing row not found by SELECT ... WHERE CTID = ?

2022-05-24 Thread Laurenz Albe
On Tue, 2022-05-24 at 16:44 +0200, Matthias Apitz wrote:
> We have a C-written program, written in ESQL/C, of our LMS where the logic
> crawls with FETCH through a hit list and does UPDATE on some rows which
> match certain condition. This works fine for thousands of rows every night,
> but magically sometimes it fails. I have here the part of ESQL/C logs
> when I was able to catch such a case:
> 
> The hit list for the FETCHes is built with:
> 
> [29858] [23.05.2022 23:11:06:419]: prepare_common on line 1825: name 
> sid_d01buch; query: "SELECT ctid, * from d01buch WHERE d01status = 4 "
> [29858] [23.05.2022 23:11:06:419]: ecpg_execute on line 2026: query: declare 
> d01buch_scr scroll cursor with hold for SELECT ctid, * from d01buch WHERE 
> d01status = 4 ; with 0 parameter(s) on
> connection sisis
> 
> then the CURSOR d01buch_scr is FETCHed some 59537 times, some of the
> rows were updated and here is the failing situation with a good FETCH of
> the row:

I cannot understand many of your log messages, but it seems quite clear that
you are declaring a WITH HOLD cursor.

Such cursors are materialized when the transactoin that creates the cursor
commits, so the result set is "frozen" and does no longer reflect the current
state of the table.

It may well be that somebody deleted or updated a few rows between the time
the cursor was materialized and the time the 5th row was fetched.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




About psql \dt unable display same name table which have different schema

2022-05-24 Thread 徐志宇徐
Hello All

   I experience a problem. The psql \dt command unable display
   same name table which have different schema.

   The table new. test10 unable display.
Owing to the public.test10 exist.   The pg_type_is_visible display "F"

I don't know why this problem exist. Why unable display both table "
public.test10 、 new. test10 "

For example:

 new01=# create table public.test10 (id int);
CREATE TABLE
new01=# create table new.test10(id int);
CREATE TABLE
new01=# create table new.test11(id int);
CREATE TABLE
new01=# show search_path ;
search_path

 "$user", public, new2, new
(1 row)

new01=# \dt
 List of relations
 Schema |  Name  | Type  |  Owner
++---+--
 new| test01 | table | test01
 new| test02 | table | postgres
 new| test11 | table | postgres
 public | tbl_a  | table | postgres
 public | tbl_c  | table | postgres
 public | test10 | table | postgres
(6 rows)
new01=# SELECT pg_type_is_visible('public.test10'::regtype);
 pg_type_is_visible

 t
(1 row)

new01=# SELECT pg_type_is_visible('new.test10'::regtype);
 pg_type_is_visible

 f
(1 row)


Re: existing row not found by SELECT ... WHERE CTID = ?

2022-05-24 Thread Tom Lane
Laurenz Albe  writes:
> It may well be that somebody deleted or updated a few rows between the time
> the cursor was materialized and the time the 5th row was fetched.

Even without HOLD, a cursor will return a view of the data as it stood
when the cursor was opened, just as a plain SELECT does.  There is
*plenty* of time for another session to get in there if you've been
groveling through 50K records one at a time.

regards, tom lane




Re: About psql \dt unable display same name table which have different schema

2022-05-24 Thread Tom Lane
=?UTF-8?B?5b6Q5b+X5a6H5b6Q?=  writes:
> I don't know why this problem exist. Why unable display both table "
> public.test10 、 new. test10 "

That's behaving as designed: \dt will show you tables that are
accessible with an unqualified name, but new.test10 is hidden
behind public.test10, so it's not accessible except by
qualification.

You can do "\dt *.*" or "\dt new.*", etc, to see such tables.

See here for more info:

https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-PATTERNS

regards, tom lane




Re: About psql \dt unable display same name table which have different schema

2022-05-24 Thread 徐志宇徐
Hi Tom.

  Thanks for your explanation. I got it.
  I really appreciate it.


Jack Xu.

Tom Lane  于2022年5月25日周三 00:16写道:

> =?UTF-8?B?5b6Q5b+X5a6H5b6Q?=  writes:
> > I don't know why this problem exist. Why unable display both table "
> > public.test10 、 new. test10 "
>
> That's behaving as designed: \dt will show you tables that are
> accessible with an unqualified name, but new.test10 is hidden
> behind public.test10, so it's not accessible except by
> qualification.
>
> You can do "\dt *.*" or "\dt new.*", etc, to see such tables.
>
> See here for more info:
>
> https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-PATTERNS
>
> regards, tom lane
>


Pg14 possible index corruption after reindex concurrently

2022-05-24 Thread Aleš Zelený
Hello,

we have a problem with an index on a database we recently upgraded from
PG13 to Pg14.3 using pg_upgrade. After all the upgrade steps including
analyze in stages, we run  "vacuumdb -Fvaz -j 8" and the user workload was
started afterward.
In order to get one of the Pg14 benefits (b-tree deduplication), we decided
to rebuild all indexes:

-- CONCURRENTLY can not be used for REINDEX DATABASE and system catalog was
brand new on the upgraded database, so no need to reindex system.
SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)
FROM pg_catalog.pg_namespace n
WHERE n.nspname !~ '^pg_' AND n.nspname <> 'information_schema'
ORDER BY 1;
\gexec


PG Version: PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
OS: CentOS 7
Indexed columns are BIGINT (so no collation issues).

A few days later we realized, that some queries did not return the expected
data. By disabling indexscan and bitmapindex scan, expected row was found
using sequential scan.

Now we have a table (80GB, 314.108.951 rows) with a reproducible testcase
to demonstrate that using the primary key, requested data are returned
while using another index no rows are returned.

Testcase:

START TRANSACTION;
EXPLAIN ANALYZE SELECT * FROM opportunities.tab_odds WHERE id_odds =
1652734429;
  QUERY PLAN

--
 Index Scan using pk_tabodds_idodds on tab_odds  (cost=0.57..2.59 rows=1
width=229) (actual time=0.076..0.078 rows=1 loops=1)
   Index Cond: (id_odds = 1652734429)
 Planning Time: 0.152 ms
 Execution Time: 0.119 ms
(4 rows)

ALTER TABLE opportunities.tab_odds DROP CONSTRAINT pk_tabodds_idodds
CASCADE;

EXPLAIN ANALYZE SELECT * FROM opportunities.tab_odds WHERE id_odds =
1652734429;
 QUERY PLAN


 Index Scan using ix_tabodds_idodds_idopportunity on tab_odds
 (cost=0.57..2.59 rows=1 width=229) (actual time=0.026..0.027 rows=0
loops=1)
   Index Cond: (id_odds = 1652734429)
 Planning Time: 0.248 ms
 Execution Time: 0.072 ms
(4 rows)

ROLLBACK;

The indexes are:
Indexes:
"pk_tabodds_idodds" PRIMARY KEY, btree (id_odds)
"ix_tabodds_idodds_idopportunity" btree (id_odds, id_opportunity)

While we can do another reindex and check query results, we would like to
be able to identify such issues systematically rather than by en user
complaints.

I've tried to use amcheck extension to validate the index:

select * from
bt_index_check('opportunities.ix_tabodds_idodds_idopportunity'::regclass);
 bt_index_check


(1 row)

Running the pg_amcheck utility on the table (including its indexes):

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds -j 8 -P
; echo $?
heap table "prematch.opportunities.tab_odds", block 1579813, offset 62:
xmax 4051057343 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 1580759, offset 43:
xmin 4051053353 precedes relation freeze threshold 2:3960858664

heap table "prematch.opportunities.tab_odds", block 2164163, offset 5:
xmin 4051075211 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2164163, offset 6:
xmin 4051075236 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2164163, offset 7:
xmin 4051075236 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2164164, offset 1:
xmin 4051075260 precedes relation freeze threshold 2:3960858664
2

The default access method check returns exit code2, while when using
--rootdescend option  return code is zero and no xmin/xmax messages:

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds
--rootdescend -j 8 -P ; echo $?

 0/15 relations (0%),0/16831435 pages (0%)
 9/15 relations (60%), 16831427/16831435 pages (99%)
10/15 relations (66%), 16831429/16831435 pages (99%)
11/15 relations (73%), 16831431/16831435 pages (99%)
12/15 relations (80%), 16831433/16831435 pages (99%)
13/15 relations (86%), 16831434/16831435 pages (99%)
14/15 relations (93%), 16831435/16831435 pages (100%)
15/15 relations (100%), 16831435/16831435 pages (100%)
0

I've thought about using pageinspect to dump the invalid index page, but
haven't found a way how to identify the index block number.

Is there a way, how to diagnose such corruption to make sure that after
(and ideally also before) upgrading the database no such corruption
happened (I'd like to test the diagnostic approach before rebuilding the
index)?

Unfortunately, the cluster was so old, and planned outage was limited -
page checksums were not enable

Re: Pg14 possible index corruption after reindex concurrently

2022-05-24 Thread Thomas Munro
On Wed, May 25, 2022 at 6:17 AM Aleš Zelený  wrote:
> SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)

This may be related to bug #17485, discussed at:

https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org




Re: Pg14 possible index corruption after reindex concurrently

2022-05-24 Thread Aleš Zelený
Thanks for the bug link, I haven't found it.

Ales

út 24. 5. 2022 v 21:58 odesílatel Thomas Munro 
napsal:

> On Wed, May 25, 2022 at 6:17 AM Aleš Zelený  wrote:
> > SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)
>
> This may be related to bug #17485, discussed at:
>
>
> https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org
>


Re: cast to domain with default collation issue.

2022-05-24 Thread jian he
Otherwise, all input expressions must have the same implicit collation
> derivation or the default collation. If any non-default collation is
> present, that is the result of the collation combination. Otherwise, the
> result is the default collation.
>

I think the above quote part can be used to explain the  following
examples.

> CREATE TABLE test1 ( a text COLLATE "de_DE", b text COLLATE "es_ES", c
> text );
> SELECT a < 'foo' FROM test1;

SELECT c < 'foo' FROM test1;

But the *non-default* seems not that correct for me. Like a column if
it does not mention anything, then the default value is null. So
* create table test111( a tex*t) The default collation for column a is
the same as the output of  *show lc_collate*.

so there is no *non-default? *




On Tue, May 24, 2022 at 10:43 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Monday, May 23, 2022, jian he  wrote:
>
>> CREATE DOMAIN testdomain AS text;
>>
>>  --asume the default collation is as per show LC_COLLATE;
>>
>> – on my pc, it is C.UTF-8.
>>
>> --So the testdomain will be collation "C.UTF-8"
>> 
>>
>> => \d collate_test1
>>
>> Table "test.collate_test1"
>>
>>  Column |  Type   | Collation | Nullable | Default
>>
>> +-+---+--+-
>>
>>  a  | integer |   |  |
>>
>>  b  | text| en-x-icu  | not null |
>>
>> ---
>>
>> My guess is that the following should be the same.
>>
>>
>>
> My reading of the docs say this is consistent with outcome #2.
>
> https://www.postgresql.org/docs/current/collation.html
>
>  David J.
>
>

-- 
 I recommend David Deutsch's <>

  Jian


Re: cast to domain with default collation issue.

2022-05-24 Thread David G. Johnston
Please don’t top-post.

On Tuesday, May 24, 2022, jian he  wrote:

>
> Otherwise, all input expressions must have the same implicit collation
>> derivation or the default collation. If any non-default collation is
>> present, that is the result of the collation combination. Otherwise, the
>> result is the default collation.
>>
>
> I think the above quote part can be used to explain the  following
> examples.
>
>> CREATE TABLE test1 ( a text COLLATE "de_DE", b text COLLATE "es_ES", c
>> text );
>> SELECT a < 'foo' FROM test1;
>
> SELECT c < 'foo' FROM test1;
>
> But the *non-default* seems not that correct for me. Like a column if it does 
> not mention anything, then the default value is null. So
> * create table test111( a tex*t) The default collation for column a is the 
> same as the output of  *show lc_collate*.
>
> so there is no *non-default? *
>
>
I’m not following the point you are trying to make.  table111.a contributes
the default collation for any expression needing a collation implicitly
resolved.

David J.