Re: Partition column should be part of PK

2021-07-11 Thread Nagaraj Raj
 personally, I feel this design is very bad compared to other DB servers.
> If the goal is to make sure there is only one (billing_account_uid, ban) in 
> any partition regardless of date, you'll need to do something more > 
> sophisticated to make sure that two sessions don't insert an 
> (billing_account_uid, ban) value into two different partitions.  This isn't a 
> great fit > for table partitioning, and you might want to reconsider if 
> partitioning the table is the right answer here.  If you *must* have table 
> partitioning, a > possible algorithm is:
yes, this is my use case.
can I use some trigger on the partition table before inserting the call that 
function this one handle conflict? 

CREATE or replace FUNCTION insert_trigger()    RETURNS trigger    LANGUAGE 
'plpgsql'    COST 100    VOLATILE NOT LEAKPROOFAS $BODY$DECLARE      conn_name 
text;   c_table TEXT;   t_schema text;   c_table1 text;   m_table1 text;     
BEGIN     c_table1 := TG_TABLE_NAME;    t_schema := TG_TABLE_SCHEMA;    
m_table1 := t_schema||'.'||TG_TABLE_NAME;    SELECT conname FROM pg_constraint 
WHERE conrelid = TG_TABLE_NAME ::regclass::oid and contype = 'u' into 
conn_name;    execute 'insert into '|| m_table1 || ' values ' || new.* || ' on 
conflict on constraint ' || conn_name || ' do nothing -- or somthing';    
RETURN null;    end;    $BODY$;
CREATE TRIGGER insert    BEFORE INSERT    ON t4    FOR EACH ROW    WHEN 
(pg_trigger_depth() < 1)    EXECUTE FUNCTION insert_trigger();    CREATE 
TRIGGER insert    BEFORE INSERT    ON t3    FOR EACH ROW    WHEN 
(pg_trigger_depth() < 1)    EXECUTE FUNCTION insert_trigger(); .. so on ..    


https://dbfiddle.uk/?rdbms=postgres_11&fiddle=bcfdfc26685ffb498bf82e6d50da95e3


Please suggest.

Thanks,Rj
On Thursday, July 8, 2021, 08:52:35 PM PDT, Christophe Pettus 
 wrote:  
 
 

> On Jul 8, 2021, at 20:32, Nagaraj Raj  wrote:
> 
> My apologies for making confusion with new thread. Yes its same issue related 
> to earlier post.
> 
> I was trying to figure out  how to ensure unique values for columns 
> (billing_account_guid, ban). If i add partition key to constraint , it wont 
> be possible what im looking for.
> 
> My use case as below 
> 
> INSERT INTO t1 SELECT * from t2 ON CONFLICT (billing_account_guid,ban) DO 
> UPDATE SET something…
> 
> Or
> 
> INSERT INTO t1 SELECT * from t2 ON CONFLICT constraint (pk or 
> uk)(billing_account_guid,ban) DO UPDATE SET something…

Right now, PostgreSQL does not support unique indexes on partitioned tables 
(that operate across all partitions) unless the partition key is included in 
the index definition.  If it didn't have that requirement, it would have to 
independently (and in a concurrency-supported way) scan every partition 
individually to see if there is a duplicate key violation in any of the 
partitions, and the machinery to do that does not exist right now.

If the goal is to make sure there is only one (billing_account_guid, ban, date) 
combination across the entire partition set, you can create an index unique 
index on the partitioned set as (billing_account_guid, ban, date), and INSERT 
... ON CONFLICT DO NOTHING works properly then.

If the goal is to make sure there is only one (billing_account_uid, ban) in any 
partition regardless of date, you'll need to do something more sophisticated to 
make sure that two sessions don't insert an (billing_account_uid, ban) value 
into two different partitions.  This isn't a great fit for table partitioning, 
and you might want to reconsider if partitioning the table is the right answer 
here.  If you *must* have table partitioning, a possible algorithm is:

-- Start a transaction
-- Hash the (billing_account_uid, ban) key into a 64 bit value.
-- Use that 64 bit value as a key to a call to pg_advisory_xact_lock() [1] to, 
in essence, create a signal to any other transaction attempting to insert that 
pair that it is being modified.
-- SELECT on that pair to make sure one does not already exist.
-- If one does not, do the INSERT.
-- Commit, which releases the advisory lock.

This doesn't provide quite the same level of uniqueness that a cross-partition 
index would, but if this is the only code path that does the INSERT, it should 
keep duplicate from showing up in different partitions.

[1] 
https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS

  

Re: Partition column should be part of PK

2021-07-11 Thread Justin Pryzby
On Fri, Jul 09, 2021 at 03:32:46AM +, Nagaraj Raj wrote:
> My apologies for making confusion with new thread. Yes its same issue related 
> to earlier post.
> I was trying to figure out  how to ensure unique values for columns 
> (billing_account_guid, ban). If i add partition key to constraint , it wont 
> be possible what im looking for.
> My use case as below 
> INSERT INTO t1 SELECT * from t2 ON CONFLICT (billing_account_guid,ban) DO 
> UPDATE SET something…
> 
> Or
> INSERT INTO t1 SELECT * from t2 ON CONFLICT constraint (pk or 
> uk)(billing_account_guid,ban) DO UPDATE SET something…

I'm not sure, but did you see the 2nd half of what I wrote in June ?

lightly edited:
> It sounds like you want a unique index on (billing_account_guid, ban) to
> support INSERT ON CONFLICT.  If DO UPDATE SET will never move tuples to a new
> partition, then you could do INSERT ON CONFLICT to a partition rather
> than its parent.
> 
> But it cannot be a unique, "partitioned" index, without including load_dttm.

You could try something that doesn't use a parent/partitioned index (as I was
suggesting).  Otherwise I think you'd have to partition by something else
involving the unique columns, or not use declarative partitioning, or not use
insert on conflict.

Justin

PS, I'm sorry for my own confusion, but happy if people found it amusing.




Re: Partition column should be part of PK

2021-07-11 Thread Christophe Pettus



> On Jul 11, 2021, at 17:36, Nagaraj Raj  wrote:
> 
> personally, I feel this design is very bad compared to other DB servers.

Patches accepted.  The issue is that in order to have a partition-set-wide 
unique index, the system would have to lock the unique index entries in *all* 
partitions, not just the target one. This logic does not currently exist, and 
it's not trivial to implement efficiently.

> can I use some trigger on the partition table before inserting the call that 
> function this one handle conflict? 

That doesn't handle the core problem, which is ensuring that two different 
sessions do not insert the same (billing_account_uid, ban) into two different 
partitions.  That requires some kind of higher-level lock.  The example you 
give isn't required; PostgreSQL will perfectly happily accept a unique 
constraint on (billing_account_uid, ban) on each partition, and handle attempts 
to insert a duplicate row correctly (either by returning an error or processing 
an ON CONFLICT) clause.  What that does not prevent is a duplicate 
(billing_account_uid, ban) in two different partitions.

There's another issue here, which is this design implies that once a particular 
(billing_account_uid, ban) row is created in the partitioned table, it is never 
deleted.  This means older partitions are never dropped, which means the number 
of partitions in the table will row unbounded.  This is not going to scale well 
as the number of partitions starts getting very large.

You might consider, instead, hash-partitioning on one of billing_account_uid or 
ban, or reconsider if partitioning is the right solution here.



Re: Partition column should be part of PK

2021-07-11 Thread David Rowley
On Mon, 12 Jul 2021 at 12:37, Nagaraj Raj  wrote:
> personally, I feel this design is very bad compared to other DB servers.

I'm not sure exactly what you're referring to here as you didn't quote
it, but my guess is you mean our lack of global index support.

Generally, there's not all that much consensus in the community that
this would be a good feature to have.  Why do people want to use
partitioning?  Many people do it so that they can quickly remove data
that's no longer required with a simple DETACH operation.  This is
metadata only and is generally very fast.  Another set of people
partition as their tables are very large and they become much easier
to manage when broken down into parts.  There's also a group of people
who do it for the improved data locality.   Unfortunately, if we had a
global index feature then that requires building a single index over
all partitions.  DETACH is no longer a metadata-only operation as we
must somehow invalidate or remove tuples that belong to the detached
partition. The group of people who partitioned to get away from very
large tables now have a very large index.  Maybe the only group to get
off lightly here are the data locality group. They'll still have the
same data locality on the heap.

So in short, many of the benefits of partitioning disappear when you
have a global index.

So, why did you partition your data in the first place?  If you feel
like you wouldn't mind having a large global index over all partitions
then maybe you're better off just using a non-partitioned table to
store this data.

David




Re: Partition column should be part of PK

2021-07-11 Thread Thomas Kellerer


David Rowley schrieb am 12.07.2021 um 02:57:
> Generally, there's not all that much consensus in the community that
> this would be a good feature to have.  Why do people want to use
> partitioning?  Many people do it so that they can quickly remove data
> that's no longer required with a simple DETACH operation.  This is
> metadata only and is generally very fast.  Another set of people
> partition as their tables are very large and they become much easier
> to manage when broken down into parts.  There's also a group of people
> who do it for the improved data locality.   Unfortunately, if we had a
> global index feature then that requires building a single index over
> all partitions.  DETACH is no longer a metadata-only operation as we
> must somehow invalidate or remove tuples that belong to the detached
> partition. The group of people who partitioned to get away from very
> large tables now have a very large index.  Maybe the only group to get
> off lightly here are the data locality group. They'll still have the
> same data locality on the heap.
>
> So in short, many of the benefits of partitioning disappear when you
> have a global index.

The situations where this is useful are large tables where partitioning
would turn Seq Scans of the whole table into Seq Scans of a partition,
or where it would allow for partition wise joins and still have
foreign keys referencing the partitioned table.

I agree they do have downsides. I only know Oracle as one of those systems
where this is possible, and in general global indexes are somewhat
avoided but there are still situations where they are useful.
E.g. if you want to have foreign keys referencing your partitioned
table and including the partition key in the primary key makes no
sense.

Even though they have disadvantages, I think it would be nice to
have the option to create them.

I know that in the Oracle world, they are used seldomly (precisely
because of the disadvantages you mentioned) but they do have a place.

Thomas