Re: Define hash partition for certain column values

2021-01-11 Thread Ron

On 1/11/21 12:36 AM, Tom Lane wrote:

Голубева Яна  writes:

Hello,  I've found in the source code that there is a function 
satisfies_hash_partition(oid, modulus, remainder, column_values[]) which 
allows checking whether a certain column value will be placed in a certain 
partition. I'd like to know if there is a way not merely to check a given 
partition, but to determine which partition a given column value will be 
placed in.
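
(As an illustrative sketch with hypothetical table names, under the 
assumption of four hash partitions: satisfies_hash_partition() tests a 
single candidate partition, while tableoid reports the partition a row 
actually landed in.)

    -- hypothetical hash-partitioned table with four partitions
    CREATE TABLE t (id int) PARTITION BY HASH (id);
    CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE t_p1 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE t_p2 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE t_p3 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 3);

    -- does id = 42 belong in the partition with modulus 4, remainder 0?
    SELECT satisfies_hash_partition('t'::regclass, 4, 0, 42);

    -- or insert and let the server report the receiving partition
    INSERT INTO t VALUES (42) RETURNING tableoid::regclass;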

If you want to control what goes where, use list partitioning (or,
perhaps, range partitioning).  Hash is only suitable if you do not
care which partition any particular row goes to.
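
(A minimal sketch with hypothetical names: list partitioning makes the 
routing explicit.)

    -- rows are routed by explicit region values
    CREATE TABLE orders (id int, region text) PARTITION BY LIST (region);
    CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('de', 'fr');
    CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us');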

Personally, I think hash partitioning is mostly academic, precisely
because of that.  If the partitioning doesn't line up with application
requirements, you give up too much of the benefit of using partitions.


In non-MVCC systems, hash partitioning minimizes lock conflicts because the 
writes aren't all going into the same page.  OLTP systems can use this 
feature to distribute writes across pages; some also allow for "mixed 
pages", where records from multiple tables get written to the same page.  
(This means that a single disk I/O can read a parent and all its child 
records.  Naturally, range reports are *very* slow, but sometimes OLTP 
performance is paramount.)


--
Angular momentum makes the world go 'round.


Understanding GIN indexes

2021-01-11 Thread Jack Orenstein
I am building a new type, which will be indexed using a GIN index. Things
are starting to work, and I am seeing queries use the index, call the
partialMatch(), consistent(), and compare() functions, and return correct
results.

However, I am still unclear on some aspects of how partialMatch and
consistent are supposed to work (so my implementation of consistent()
always sets *recheck to true).

1) The recheck logic of consistent() is unclear to me. The docs say (
https://www.postgresql.org/docs/12/gin-extensibility.html):

On success, *recheck should be set to true if the heap tuple needs to be
rechecked against the query operator, or false if the index test is exact.
That is, a false return value guarantees that the heap tuple does not match
the query; a true return value with *recheck set to false guarantees that
the heap tuple does match the query; and a true return value with *recheck
set to true means that the heap tuple might match the query, so it needs to
be fetched and rechecked by evaluating the query operator directly against
the originally indexed item.
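
(As a concrete illustration, with hypothetical names: pg_trgm's GIN opclass
is lossy for LIKE, so its consistent() reports *recheck = true and matching
rows are re-evaluated against the operator; an opclass whose stored keys
fully determine the operator's result can instead report an exact match
with *recheck = false.)

    -- a minimal sketch: trigram keys cannot prove a LIKE match exactly,
    -- so heap tuples found via the index are rechecked by the executor
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE TABLE docs (body text);
    CREATE INDEX docs_body_trgm ON docs USING gin (body gin_trgm_ops);
    EXPLAIN SELECT * FROM docs WHERE body LIKE '%postgres%';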

How can it ever be correct to return true and set *recheck to false? My
understanding of conventional (btree) indexes is that the row needs to be
retrieved, and the index condition rechecked, because the table has
visibility information, and the index does not -- a key in the index might
correspond to an obsolete row version. I understand visibility map
optimizations, and the fact that going to the actual data page can
sometimes be skipped. But that doesn't seem to be what the consistent()
recheck flag is about.

In other words, how can consistent() ever decide that a recheck is not
necessary, since the index entry may be from an obsolete row version?
Couldn't returning true and setting *recheck to false result in a false
positive?

2) For partial matches, why does consistent() need to be called at all? For
a given key (2nd arg), partialMatch() decides whether the key satisfies the
index condition. Why is a further check by consistent() required?

I think that my mental model of how GIN works must be way off. Is there a
presentation or paper that explains how GIN works?

Jack Orenstein


Re: Static memory, shared memory

2021-01-11 Thread Jack Orenstein



On Sat, Jan 9, 2021 at 12:18 PM Tom Lane  wrote:
> Jack Orenstein  writes:
> > I am writing a Postgres extension, and thought that I had memory
> > corruption (thanks for the --enable-cassert lead). I might, but it now
> > looks like I need to understand the use of shared memory and locking in
> > Postgres. So I have two questions.
> 
> > 1) I am now guessing that my original problem is caused by relying on
> > static memory in my extension (i.e., in the source declaring
> > PG_MODULE_MAGIC). This static memory is almost but not quite constant -- it
> > is initialized from _PG_init, and then never modified. I suspect that this
> > cannot work in general (since Postgres is multi-process), but I thought it
> > would be adequate for early development. However, I am seeing this static
> > memory get corrupted even when there is only a single process executing the
> > extension code (verified by examining getpid()).
> 
> Define what you mean by "corrupted".  It seems highly unlikely that any
> code but your own is touching this memory.

Some fields have expected values, others do not.

I think I have just figured out this problem, and it was indeed a case of 
shooting myself in the foot.
 
> 
> Really the big-picture question here is what are you hoping to accomplish
> and why do you think this memory might need to be shared?

The type I am implementing depends on looking up data in an array whose size is 
approximately 64k. This array needs to be computed once and for all early on, 
and is then consulted as the type is used. For now, I have hardwired the 
parameters that determine the array's contents, and the array is constructed 
during _PG_init. I will eventually remove this scaffolding, and compute the 
array contents when required, which should be a rare event.

Jack Orenstein


What to do with tablespaces when upgrading to pg13 from pg1X?

2021-01-11 Thread James(王旭)
Hello,
I was planning to upgrade from 12 to 13 using this command:
/usr/pgsql-13/bin/pg_upgrade -b /usr/pgsql-12/bin/ -B /usr/pgsql-13/bin/ -d 
/data/pg/ -D /pg/pgdata_13/ --jobs=10

And I got this output:
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories
new cluster tablespace directory already exists: 
"/data/tsdb/metas/PG_13_202007201"
Failure, exiting
I think it's because /data/tsdb was my tablespace dir, which was outside the 
old main data directory (/data/pg/).


So what should I do with old tablespaces when upgrading?
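
(A minimal pre-flight sketch: list each tablespace and its on-disk location
in the old cluster before deciding what to do.)

    -- show tablespace names and locations (empty = inside the data dir)
    SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;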


Thanks a lot!

Re: What to do with tablespaces when upgrading to pg13 from pg1X?

2021-01-11 Thread Bruce Momjian
On Tue, Jan 12, 2021 at 10:28:52AM +0800, James(王旭) wrote:
> Hello,
> I was planning to upgrade from 12 to 13 using this command:
> 
> /usr/pgsql-13/bin/pg_upgrade -b /usr/pgsql-12/bin/ -B /usr/pgsql-13/bin/ 
> -d
> /data/pg/ -D /pg/pgdata_13/ --jobs=10
> 
> 
> And I got this output:
> 
> Checking for presence of required libraries ok
> Checking database user is the install user  ok
> Checking for prepared transactions  ok
> Checking for new cluster tablespace directories
> new cluster tablespace directory already exists: "/data/tsdb/metas/
> PG_13_202007201"
> Failure, exiting
> 
> I think it's because /data/tsdb was my tablespace dir, which was outside the
> old main data directory (/data/pg/).
> 
> So what should I do with old tablespaces when upgrading ?

There should be a subdirectory under your tablespace directory for every
major version, e.g., PG_13_202007201.  I have no idea why your _new_
version already has a directory there.  Do you have a second cluster on
the machine that is using that tablespace directory for PG 13?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: What to do with tablespaces when upgrading to pg13 from pg1X?

2021-01-11 Thread James(王旭)
You mean PG_13_202007201 was created by PG 13? No, there aren't any other 
clusters; maybe it was created by my previous failed attempts at upgrading.
So it seems it should be OK if I issue a command like this:

find . -name "PG_13*" | xargs rm -rf

and then do the upgrade again?


There should be a subdirectory under your tablespace directory for every
major version, e.g., PG_13_202007201.  I have no idea why your _new_
version already has a directory there.  Do you have a second cluster on
the machine that is using that tablespace directory for PG 13?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: What to do with tablespaces when upgrading to pg13 from pg1X?

2021-01-11 Thread Bruce Momjian
On Tue, Jan 12, 2021 at 10:50:16AM +0800, James(王旭) wrote:
> You mean PG_13_202007201 was created by PG 13 ? No, there isn't any other
> clusters, maybe it was created by my previous failed attempts of upgrading.

Yes, it had to be created by PG 13 --- it starts with PG_13, followed by
the system catalog version.  Yes, it might be left over.  You have to
delete the old cluster and its tablespaces when you re-try.

> So it seems it should be ok that I issue a command like this 
> 
> find . -name "PG_13*"|xargs rm -rf 
> 
> and then do upgrade again?

I suggest you do more research than just run that --- at least I would.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: What to do with tablespaces when upgrading to pg13 from pg1X?

2021-01-11 Thread James(王旭)
Got it, thanks a lot Bruce!
 
 
-- Original --
From: "Bruce Momjian"