Re: Partition column should be part of PK
personally, I feel this design is very bad compared to other DB servers. > If the goal is to make sure there is only one (billing_account_uid, ban) in > any partition regardless of date, you'll need to do something more > > sophisticated to make sure that two sessions don't insert an > (billing_account_uid, ban) value into two different partitions. This isn't a > great fit > for table partitioning, and you might want to reconsider if > partitioning the table is the right answer here. If you *must* have table > partitioning, a > possible algorithm is: yes, this is my use case. can I use some trigger on the partition table before inserting the call that function this one handle conflict? CREATE or replace FUNCTION insert_trigger() RETURNS trigger LANGUAGE 'plpgsql' COST 100 VOLATILE NOT LEAKPROOFAS $BODY$DECLARE conn_name text; c_table TEXT; t_schema text; c_table1 text; m_table1 text; BEGIN c_table1 := TG_TABLE_NAME; t_schema := TG_TABLE_SCHEMA; m_table1 := t_schema||'.'||TG_TABLE_NAME; SELECT conname FROM pg_constraint WHERE conrelid = TG_TABLE_NAME ::regclass::oid and contype = 'u' into conn_name; execute 'insert into '|| m_table1 || ' values ' || new.* || ' on conflict on constraint ' || conn_name || ' do nothing -- or somthing'; RETURN null; end; $BODY$; CREATE TRIGGER insert BEFORE INSERT ON t4 FOR EACH ROW WHEN (pg_trigger_depth() < 1) EXECUTE FUNCTION insert_trigger(); CREATE TRIGGER insert BEFORE INSERT ON t3 FOR EACH ROW WHEN (pg_trigger_depth() < 1) EXECUTE FUNCTION insert_trigger(); .. so on .. https://dbfiddle.uk/?rdbms=postgres_11&fiddle=bcfdfc26685ffb498bf82e6d50da95e3 Please suggest. Thanks,Rj On Thursday, July 8, 2021, 08:52:35 PM PDT, Christophe Pettus wrote: > On Jul 8, 2021, at 20:32, Nagaraj Raj wrote: > > My apologies for making confusion with new thread. Yes its same issue related > to earlier post. > > I was trying to figure out how to ensure unique values for columns > (billing_account_guid, ban). If i add partition key to constraint , it wont > be possible what im looking for. > > My use case as below > > INSERT INTO t1 SELECT * from t2 ON CONFLICT (billing_account_guid,ban) DO > UPDATE SET something… > > Or > > INSERT INTO t1 SELECT * from t2 ON CONFLICT constraint (pk or > uk)(billing_account_guid,ban) DO UPDATE SET something… Right now, PostgreSQL does not support unique indexes on partitioned tables (that operate across all partitions) unless the partition key is included in the index definition. If it didn't have that requirement, it would have to independently (and in a concurrency-supported way) scan every partition individually to see if there is a duplicate key violation in any of the partitions, and the machinery to do that does not exist right now. If the goal is to make sure there is only one (billing_account_guid, ban, date) combination across the entire partition set, you can create an index unique index on the partitioned set as (billing_account_guid, ban, date), and INSERT ... ON CONFLICT DO NOTHING works properly then. If the goal is to make sure there is only one (billing_account_uid, ban) in any partition regardless of date, you'll need to do something more sophisticated to make sure that two sessions don't insert an (billing_account_uid, ban) value into two different partitions. This isn't a great fit for table partitioning, and you might want to reconsider if partitioning the table is the right answer here. If you *must* have table partitioning, a possible algorithm is: -- Start a transaction -- Hash the (billing_account_uid, ban) key into a 64 bit value. -- Use that 64 bit value as a key to a call to pg_advisory_xact_lock() [1] to, in essence, create a signal to any other transaction attempting to insert that pair that it is being modified. -- SELECT on that pair to make sure one does not already exist. -- If one does not, do the INSERT. -- Commit, which releases the advisory lock. This doesn't provide quite the same level of uniqueness that a cross-partition index would, but if this is the only code path that does the INSERT, it should keep duplicate from showing up in different partitions. [1] https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
Re: Partition column should be part of PK
On Fri, Jul 09, 2021 at 03:32:46AM +, Nagaraj Raj wrote: > My apologies for making confusion with new thread. Yes its same issue related > to earlier post. > I was trying to figure out how to ensure unique values for columns > (billing_account_guid, ban). If i add partition key to constraint , it wont > be possible what im looking for. > My use case as below > INSERT INTO t1 SELECT * from t2 ON CONFLICT (billing_account_guid,ban) DO > UPDATE SET something… > > Or > INSERT INTO t1 SELECT * from t2 ON CONFLICT constraint (pk or > uk)(billing_account_guid,ban) DO UPDATE SET something… I'm not sure, but did you see the 2nd half of what I wrote in June ? lightly edited: > It sounds like you want a unique index on (billing_account_guid, ban) to > support INSERT ON CONFLICT. If DO UPDATE SET will never move tuples to a new > partition, then you could do INSERT ON CONFLICT to a partition rather > than its parent. > > But it cannot be a unique, "partitioned" index, without including load_dttm. You could try something that doesn't use a parent/partitioned index (as I was suggesting). Otherwise I think you'd have to partition by something else involving the unique columns, or not use declarative partitioning, or not use insert on conflict. Justin PS, I'm sorry for my own confusion, but happy if people found it amusing.
Re: Partition column should be part of PK
> On Jul 11, 2021, at 17:36, Nagaraj Raj wrote: > > personally, I feel this design is very bad compared to other DB servers. Patches accepted. The issue is that in order to have a partition-set-wide unique index, the system would have to lock the unique index entries in *all* partitions, not just the target one. This logic does not currently exist, and it's not trivial to implement efficiently. > can I use some trigger on the partition table before inserting the call that > function this one handle conflict? That doesn't handle the core problem, which is ensuring that two different sessions do not insert the same (billing_account_uid, ban) into two different partitions. That requires some kind of higher-level lock. The example you give isn't required; PostgreSQL will perfectly happily accept a unique constraint on (billing_account_uid, ban) on each partition, and handle attempts to insert a duplicate row correctly (either by returning an error or processing an ON CONFLICT) clause. What that does not prevent is a duplicate (billing_account_uid, ban) in two different partitions. There's another issue here, which is this design implies that once a particular (billing_account_uid, ban) row is created in the partitioned table, it is never deleted. This means older partitions are never dropped, which means the number of partitions in the table will row unbounded. This is not going to scale well as the number of partitions starts getting very large. You might consider, instead, hash-partitioning on one of billing_account_uid or ban, or reconsider if partitioning is the right solution here.
Re: Partition column should be part of PK
On Mon, 12 Jul 2021 at 12:37, Nagaraj Raj wrote: > personally, I feel this design is very bad compared to other DB servers. I'm not sure exactly what you're referring to here as you didn't quote it, but my guess is you mean our lack of global index support. Generally, there's not all that much consensus in the community that this would be a good feature to have. Why do people want to use partitioning? Many people do it so that they can quickly remove data that's no longer required with a simple DETACH operation. This is metadata only and is generally very fast. Another set of people partition as their tables are very large and they become much easier to manage when broken down into parts. There's also a group of people who do it for the improved data locality. Unfortunately, if we had a global index feature then that requires building a single index over all partitions. DETACH is no longer a metadata-only operation as we must somehow invalidate or remove tuples that belong to the detached partition. The group of people who partitioned to get away from very large tables now have a very large index. Maybe the only group to get off lightly here are the data locality group. They'll still have the same data locality on the heap. So in short, many of the benefits of partitioning disappear when you have a global index. So, why did you partition your data in the first place? If you feel like you wouldn't mind having a large global index over all partitions then maybe you're better off just using a non-partitioned table to store this data. David
Re: Partition column should be part of PK
David Rowley schrieb am 12.07.2021 um 02:57: > Generally, there's not all that much consensus in the community that > this would be a good feature to have. Why do people want to use > partitioning? Many people do it so that they can quickly remove data > that's no longer required with a simple DETACH operation. This is > metadata only and is generally very fast. Another set of people > partition as their tables are very large and they become much easier > to manage when broken down into parts. There's also a group of people > who do it for the improved data locality. Unfortunately, if we had a > global index feature then that requires building a single index over > all partitions. DETACH is no longer a metadata-only operation as we > must somehow invalidate or remove tuples that belong to the detached > partition. The group of people who partitioned to get away from very > large tables now have a very large index. Maybe the only group to get > off lightly here are the data locality group. They'll still have the > same data locality on the heap. > > So in short, many of the benefits of partitioning disappear when you > have a global index. The situations where this is useful are large tables where partitioning would turn Seq Scans of the whole table into Seq Scans of a partition, or where it would allow for partition wise joins and still have foreign keys referencing the partitioned table. I agree they do have downsides. I only know Oracle as one of those systems where this is possible, and in general global indexes are somewhat avoided but there are still situations where they are useful. E.g. if you want to have foreign keys referencing your partitioned table and including the partition key in the primary key makes no sense. Even though they have disadvantages, I think it would be nice to have the option to create them. I know that in the Oracle world, they are used seldomly (precisely because of the disadvantages you mentioned) but they do have a place. Thomas
