"Would probably look at a nested partitioning"
I'm not the original poster, but I have a schema with nested
(composite) partitions, and on tables with 1,000+ total partitions I
do run into some significant inefficiencies in various schema metadata
operations (queries to get the list of tables, creating foreign keys,
etc.) compared to flat partitions.
One example:
https://www.postgresql.org/message-id/CAE%2BE%3DSQacy6t_3XzCWnY1eiRcNWfz4pp02FER0N7mU_F%2Bo8G_Q%40mail.gmail.com
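To make the "queries to get the list of tables" part concrete, this is
roughly the shape of query involved in just enumerating a nested
hierarchy (the parent table name "orders" below is only a placeholder):

    -- List every partition, at any nesting level, under one parent.
    -- pg_partition_tree() walks the whole hierarchy (PostgreSQL 12+).
    SELECT relid, parentrelid, level, isleaf
    FROM pg_partition_tree('orders')
    ORDER BY level, relid::text;

With flat partitioning the same list is a single pg_inherits lookup;
with composite partitioning anything that enumerates the partitions has
to walk the whole tree, and at 1,000+ partitions that shows up.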
Alec
On Sun, Feb 11, 2024 at 8:25 AM Justin wrote:
>
> Hi Marc,
>
> Nested partitioning still allows for simple data deletion by dropping the
> partition (table) that falls in that date range.
>
> You are probably thinking of partitioning by multicolumn rules, which is
> very complex to set up.
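For what it's worth, the layout being described would look roughly like
this; every name, range, and date below is only a placeholder:

    -- Parent, partitioned first by customer id range...
    CREATE TABLE txn (
        customer_id  bigint  NOT NULL,
        txn_date     date    NOT NULL,
        amount       numeric
    ) PARTITION BY RANGE (customer_id);

    -- ...each customer range subpartitioned by transaction date.
    CREATE TABLE txn_c1 PARTITION OF txn
        FOR VALUES FROM (1) TO (1000)
        PARTITION BY RANGE (txn_date);

    CREATE TABLE txn_c1_2024_01 PARTITION OF txn_c1
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    -- Retiring a month is still just a drop (or detach) per customer
    -- range, with no DELETE and no VACUUM of live partitions.
    DROP TABLE txn_c1_2024_01;
    -- or: ALTER TABLE txn_c1 DETACH PARTITION txn_c1_2024_01;

so the bulk-delete-and-vacuum concern Marc raises further down does not
apply, at the cost of one drop per customer range per month.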
>
> On Fri, Feb 9, 2024, 10:29 AM Marc Millas wrote:
>>
>>
>>
>>
>> On Thu, Feb 8, 2024 at 10:25 PM Justin wrote:
>>>
>>> Hi Sud,
>>>
>>> I would not look at HASH partitioning, as it is very expensive to add or
>>> remove partitions.
>>>
>>> Would probably look at a nested partitioning using customer ID, with a
>>> range or list of IDs, then by transaction date. It's easy to add partitions
>>> and balance the partition segments.
>>
>>
>> I won't do that, because then, when getting rid of obsolete data, you must
>> delete a huge number of records and vacuum each partition.
>> If partitioning by date, you greatly ease the cleaning by just getting rid
>> of obsolete partitions, which is quite speedy (no delete, no vacuum, no
>> index updates, ...).
>> Marc
>>
>>>
>>> Keep in mind that SELECT queries run against the partitioned table must
>>> use the partitioning KEY in the WHERE clause, or performance will
>>> suffer.
>>>
>>> I suggest doing a query analysis before deploying partitioning, to confirm
>>> that the queries' WHERE clauses match the planned partition rule. At least
>>> 80% of the executed queries should match the partition rule; if not, don't
>>> deploy partitioning, or change the queries in the application to match the
>>> partition rule.
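As a concrete check against the placeholder schema sketched earlier in
this mail, EXPLAIN shows whether pruning happens:

    -- Prunes: the WHERE clause constrains both partition keys, so the
    -- plan should touch only txn_c1_2024_01.
    EXPLAIN (COSTS OFF)
    SELECT * FROM txn
    WHERE customer_id = 42
      AND txn_date >= '2024-01-01' AND txn_date < '2024-02-01';

    -- Does not prune: no partition key in the WHERE clause, so every
    -- leaf partition gets scanned.
    EXPLAIN (COSTS OFF)
    SELECT * FROM txn WHERE amount > 100;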
>>>
>>>
>>> On Thu, Feb 8, 2024 at 3:51 PM Greg Sabino Mullane
>>> wrote:
>
> Out of curiosity: as the OP mentioned that there will be joins and also
> filters on the customer_id column, why don't you think that subpartitioning
> by customer_id would be a good option? I understand LIST subpartitioning may
> not be an option, considering that new customer_ids get added slowly over
> time (and a default list may not be allowed), and the OP also mentioned that
> the data distribution for customer_id is skewed. However, what is the
> problem if the OP opts for a HASH subpartition on customer_id in this
> situation?
It doesn't really gain you much, given that you would be hashing it, the
customers are unevenly distributed, and the OP talked about filtering on
the customer_id column. A hash partition would just be a lot more work and
complexity for us humans and for Postgres. Partitioning for the sake of
partitioning is not a good thing. Yes, smaller tables are better, but they
have to be smaller targeted tables.
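For concreteness, the extra moving parts would look something like this,
assuming a parent already partitioned by RANGE on the transaction date
(every name and the modulus below are placeholders):

    -- One date partition, itself subpartitioned by HASH(customer_id)...
    CREATE TABLE trans_2024_01 PARTITION OF trans
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
        PARTITION BY HASH (customer_id);

    -- ...needs one child per hash bucket, repeated for every date
    -- partition you ever create.
    CREATE TABLE trans_2024_01_h0 PARTITION OF trans_2024_01
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE trans_2024_01_h1 PARTITION OF trans_2024_01
        FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE trans_2024_01_h2 PARTITION OF trans_2024_01
        FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE trans_2024_01_h3 PARTITION OF trans_2024_01
        FOR VALUES WITH (MODULUS 4, REMAINDER 3);

And since the hash only routes rows rather than balancing them, a skewed
customer still lands entirely in one bucket.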
sud wrote:
> 130GB of storage space as we verified using the "pg_relation_size"
> function, for a sample data set.
You might also want to closely examine your schema. At that scale, every
byte saved per row can add up.
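For example, something along these lines (table and column names below
are made up) shows where the bytes are going:

    -- Average on-disk width per column, from the planner statistics.
    SELECT attname, avg_width
    FROM pg_stats
    WHERE schemaname = 'public' AND tablename = 'transactions'
    ORDER BY avg_width DESC;

    -- Total size of one partition, indexes and TOAST included.
    SELECT pg_size_pretty(pg_total_relation_size('transactions_2024_01'));

Type choices (int vs bigint, for example) and column ordering to reduce
alignment padding are the usual places those bytes hide.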
Cheers,
Greg