When to use PARTITION BY HASH?

2020-06-02 Thread Oleksandr Shulgin
Hi!

I was reading up on declarative partitioning[1] and I'm not sure what could
be a possible application of Hash partitioning.

Is anyone actually using it?  What are typical use cases?  What benefits
does such a partitioning scheme provide?

On its face, it seems that it can only give you a number of tables which
are smaller than the un-partitioned one, but I fail to see how it would
provide any of the potential advantages listed in the documentation.

With a reasonable hash function, the distribution of rows across partitions
should be more or less equal, so I wouldn't expect any of the following to
hold true:
- "...most of the heavily accessed rows of the table are in a single
partition or a small number of partitions."
- "Bulk loads and deletes can be accomplished by adding or removing
partitions...",
etc.

That *might* turn out to be the case with a small number of distinct values
in the partitioning column(s), but then why rely on hash assignment instead
of using PARTITION BY LIST in the first place?

Regards,
-- 
Alex

[1] https://www.postgresql.org/docs/12/ddl-partitioning.html


Re: When to use PARTITION BY HASH?

2020-06-02 Thread Justin Pryzby
> To: [email protected], [email protected]

Please don't cross post to multiple lists.

On Tue, Jun 02, 2020 at 07:17:11PM +0200, Oleksandr Shulgin wrote:
> I was reading up on declarative partitioning[1] and I'm not sure what could
> be a possible application of Hash partitioning.

It's a good question.  See Tom's complaint here.
https://www.postgresql.org/message-id/31605.1586112900%40sss.pgh.pa.us

It *does* provide the benefit of smaller indexes and smaller tables, which
might allow seq scans to outpeform index scans.

It's maybe only useful for equality conditions on the partition key, and not
for ranges.  Here, it scans a single partition:

postgres=# CREATE TABLE t(i int) PARTITION BY HASH(i); CREATE TABLE t1 
PARTITION OF t FOR VALUES WITH (REMAINDER 0, MODULUS 3);
postgres=# CREATE TABLE t2 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 
1);
postgres=# CREATE TABLE t3 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 
2);
postgres=# INSERT INTO t SELECT i%9 FROM generate_series(1,)i; ANALYZE t;
postgres=# explain analyze SELECT * FROM t WHERE i=3;
 Seq Scan on t2  (cost=0.00..75.55 rows= width=4) (actual time=0.021..0.518 
rows= loops=1)
   Filter: (i = 3)
   Rows Removed by Filter: 

-- 
Justin




Re: When to use PARTITION BY HASH?

2020-06-02 Thread MichaelDBA

Hi,

I use it quite often, since I'm dealing with partitioning keys that have 
high cardinality, ie, high number of different values.  If your 
cardinality is very high, but your spacing between values is not 
uniform, HASH will balance your partitioned tables naturally.  If your 
spacing between values is consistent, perhaps RANGE partitioning would 
be better.


Regards,
Michael Vitale

Oleksandr Shulgin wrote on 6/2/2020 1:17 PM:

Hi!

I was reading up on declarative partitioning[1] and I'm not sure what 
could be a possible application of Hash partitioning.


Is anyone actually using it? What are typical use cases?  What 
benefits does such a partitioning scheme provide?


On its face, it seems that it can only give you a number of tables 
which are smaller than the un-partitioned one, but I fail to see how 
it would provide any of the potential advantages listed in the 
documentation.


With a reasonable hash function, the distribution of rows across 
partitions should be more or less equal, so I wouldn't expect any of 
the following to hold true:
- "...most of the heavily accessed rows of the table are in a single 
partition or a small number of partitions."
- "Bulk loads and deletes can be accomplished by adding or removing 
partitions...",

etc.

That *might* turn out to be the case with a small number of distinct 
values in the partitioning column(s), but then why rely on hash 
assignment instead of using PARTITION BY LIST in the first place?


Regards,
--
Alex

[1] https://www.postgresql.org/docs/12/ddl-partitioning.html







Re: When to use PARTITION BY HASH?

2020-06-02 Thread David G. Johnston
On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <
[email protected]> wrote:

> That *might* turn out to be the case with a small number of distinct
> values in the partitioning column(s), but then why rely on hash
> assignment instead of using PARTITION BY LIST in the first place?
>
> [1] https://www.postgresql.org/docs/12/ddl-partitioning.html
>

Why the cross-posting? (-performance is oriented toward problem solving,
not theory, so -general is the one and only PostgreSQL list this should
have been sent to)

Anyway, quoting the documentation you linked to:

"When choosing how to partition your table, it's also important to consider
what changes may occur in the future. For example, if you choose to have
one partition per customer and you currently have a small number of large
customers, consider the implications if in several years you instead find
yourself with a large number of small customers. In this case, it may be
better to choose to partition by HASH and choose a reasonable number of
partitions rather than trying to partition by LIST and hoping that the
number of customers does not increase beyond what it is practical to
partition the data by."

Hashing does indeed preclude some of the benefits and introduces others.

I suspect that having a hash function that turns its input into a different
output and checking for equality on the output would be better than trying
to "OR" a partition list together in order to combine multiple inputs onto
the same table.

David J.


Re: When to use PARTITION BY HASH?

2020-06-02 Thread Michel Pelletier
On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <
[email protected]> wrote:

> Hi!
>
> I was reading up on declarative partitioning[1] and I'm not sure what
> could be a possible application of Hash partitioning.
>
> Is anyone actually using it?  What are typical use cases?  What benefits
> does such a partitioning scheme provide?
>
> On its face, it seems that it can only give you a number of tables which
> are smaller than the un-partitioned one, but I fail to see how it would
> provide any of the potential advantages listed in the documentation.
>

I'm sure there will be many delightful answers to your question, and I look
forward to them!  From my point of view, hash partitioning is very useful
for spreading out high insert/update load.  Yes its' true you end up with
more smaller tables than one big large one, but remember the indexes are
(often) tree data structures.  Smaller trees are faster than bigger trees.
By making the indexes smaller they are faster.  Since the planner can knows
to only examine the specific index it needs, this ends up being a lot
faster.

Postgres can also parallelize queries on partitions.  This is different
from a parallel sequential scan, which can also happen per-partition, so
there are multiple levels of parallel opportunity.

And last that I can think of, you can put the different partitions in
different tablespaces, improving the total IO bandwidth.

-Michel



> With a reasonable hash function, the distribution of rows across
> partitions should be more or less equal, so I wouldn't expect any of the
> following to hold true:
> - "...most of the heavily accessed rows of the table are in a single
> partition or a small number of partitions."
> - "Bulk loads and deletes can be accomplished by adding or removing
> partitions...",
> etc.
>
> That *might* turn out to be the case with a small number of distinct
> values in the partitioning column(s), but then why rely on hash
> assignment instead of using PARTITION BY LIST in the first place?
>
> Regards,
> --
> Alex
>
> [1] https://www.postgresql.org/docs/12/ddl-partitioning.html
>
>