Re: Queries on QMF to POSTGRE

2019-07-22 Thread Oleksandr Shulgin
-hackers
+pgsql general 

On Sun, Jul 21, 2019 at 7:33 PM JVM .  wrote:

>
>
> I'm looking to convert QMF queries, QMF forms and QMF procedures to
> PostgreSQL -- will it support all of them?
>
> If yes, please help us with a sample example or any documentation.
>

What would help anyone willing to help you is providing documentation
or a definition of QMF and some examples of those.

Cheers,
--
Alex


When to use PARTITION BY HASH?

2020-06-02 Thread Oleksandr Shulgin
Hi!

I was reading up on declarative partitioning[1] and I'm not sure what could
be a possible application of Hash partitioning.

Is anyone actually using it?  What are typical use cases?  What benefits
does such a partitioning scheme provide?
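
For concreteness, this is the sort of DDL I mean (table and column names
are invented for illustration):

    -- Rows are routed to a partition by hashing the key;
    -- MODULUS/REMAINDER select the bucket.
    CREATE TABLE events (
        customer_id bigint NOT NULL,
        payload     text
    ) PARTITION BY HASH (customer_id);

    CREATE TABLE events_p0 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE events_p1 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE events_p2 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE events_p3 PARTITION OF events
        FOR VALUES WITH (MODULUS 4, REMAINDER 3);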

On its face, it seems that it can only give you a number of tables which
are smaller than the un-partitioned one, but I fail to see how it would
provide any of the potential advantages listed in the documentation.

With a reasonable hash function, the distribution of rows across partitions
should be more or less equal, so I wouldn't expect any of the following to
hold true:
- "...most of the heavily accessed rows of the table are in a single
partition or a small number of partitions."
- "Bulk loads and deletes can be accomplished by adding or removing
partitions...",
etc.

That *might* turn out to be the case with a small number of distinct values
in the partitioning column(s), but then why rely on hash assignment instead
of using PARTITION BY LIST in the first place?

Regards,
-- 
Alex

[1] https://www.postgresql.org/docs/12/ddl-partitioning.html


Re: When to use PARTITION BY HASH?

2020-06-03 Thread Oleksandr Shulgin
On Tue, Jun 2, 2020 at 7:47 PM Stephen Frost  wrote:

>
> Please don't cross-post to multiple lists without any particular reason
> for doing so -- pick whichever list makes sense and post to that.
>

Sorry for the trouble, I should've checked it more carefully.
When posting I did think it might be relevant to the performance list as
well.

At the same time, wouldn't it make sense to document this policy explicitly?
/me resists the urge to cross-post to pgsql-www ;)

Cheers,
--
Alex


Re: When to use PARTITION BY HASH?

2020-06-03 Thread Oleksandr Shulgin
(sticking to pgsql-general)

On Tue, Jun 2, 2020 at 7:45 PM Michel Pelletier 
wrote:

>
> On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>>
>> I was reading up on declarative partitioning[1] and I'm not sure what
>> could be a possible application of Hash partitioning.
>>
>> Is anyone actually using it?  What are typical use cases?  What benefits
>> does such a partitioning scheme provide?
>>
>> On its face, it seems that it can only give you a number of tables which
>> are smaller than the un-partitioned one, but I fail to see how it would
>> provide any of the potential advantages listed in the documentation.
>>
>
>

> From my point of view, hash partitioning is very useful for spreading out
> high insert/update load.
>

Do you also assign the partitions to different tablespaces, as you've
hinted below, or do you see a performance improvement from partitioning
alone?  How does that work?  Does it give better results than using a RAID
to spread the disk IO, for example?
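
Something like the following is what I understand you to mean (tablespace
and table names are invented):

    -- Each hash partition placed on its own tablespace/device:
    CREATE TABLE events_p0 PARTITION OF events
        FOR VALUES WITH (MODULUS 2, REMAINDER 0) TABLESPACE ts0;
    CREATE TABLE events_p1 PARTITION OF events
        FOR VALUES WITH (MODULUS 2, REMAINDER 1) TABLESPACE ts1;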

> Yes, it's true you end up with more, smaller tables than one big one,
> but remember the indexes are (often) tree data structures.  Smaller trees
> are faster than bigger trees.  By making the indexes smaller they are
> faster.  Since the planner knows to only examine the specific index it
> needs, this ends up being a lot faster.
>

That sounds logical, but can it be demonstrated?  If the index(es) fit
fully in memory, it doesn't make a measurable difference, I guess?
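
At the very least, one could measure the total index size with something
like this (assuming the parent table is called events):

    -- Sum index sizes over all partitions of the parent table:
    SELECT pg_size_pretty(sum(pg_indexes_size(inhrelid)))
      FROM pg_inherits
     WHERE inhparent = 'events'::regclass;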

With hash partitioning you are not expected, in general, to end up with a
small number of partitions being accessed more heavily than the rest.  So
your indexes will also not fit into memory.

I have the feeling that using a hash function to distribute rows simply
contradicts the basic assumption of when you would think of partitioning
your table at all: that is to make sure the most active part of the table
and indexes is small enough to be cached in memory.

Regards,
--
Alex


Re: When to use PARTITION BY HASH?

2020-06-05 Thread Oleksandr Shulgin
On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes  wrote:

> On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest.  So
>> your indexes will also not fit into memory.
>>
>> I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> While hash partitioning doesn't appeal to me, I think this may be overly
> pessimistic.  It would not be all that unusual for your customers to take
> turns being highly active and less active.  Especially if you do occasional
> bulk loads all with the same customer_id for any given load, for example.
>

For a bulk load you'd likely want to go with an empty partition w/o indexes
and build them later, after loading the tuples, as sketched below.  While
that might not be possible with every partitioning scheme either, using
hash partitioning most certainly precludes it.
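
Something along these lines, assuming a table range-partitioned on a date
column (all names and the file path are invented):

    -- Load into a standalone table, index it, then attach:
    CREATE TABLE measurements_2020_06
        (LIKE measurements INCLUDING DEFAULTS);
    COPY measurements_2020_06 FROM '/path/to/batch.csv' WITH (FORMAT csv);
    CREATE INDEX ON measurements_2020_06 (logdate);
    ALTER TABLE measurements ATTACH PARTITION measurements_2020_06
        FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');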


> So while you might not have a permanently hot partition, you could have
> partitions which are hot in turn.  Of course you could get the same benefit
> (and probably better) with list or range partitioning rather than hash, but
> then you have to maintain those lists or ranges when you add new customers.
>

Why is LRU eviction from the shared buffers and the OS disk cache not good
enough to handle this?

This actually applies to any partitioning scheme: the hot dataset could be
recognized by these caching layers.  Does it not happen in practice?

--
Alex


Re: When to use PARTITION BY HASH?

2020-06-08 Thread Oleksandr Shulgin
On Sat, Jun 6, 2020 at 6:14 PM Michel Pelletier 
wrote:

>
> Well, let's take a step back here and look at the question: hash
> partitioning exists in Postgres -- is it useful?  While I appreciate the need
> to see a fact demonstrated, and generally avoiding argument by authority,
> it is true that many of the very smartest database people in the world
> conceived of, discussed, implemented and documented this feature for us.
> It stands to reason that it is useful, or it wouldn't exist.  So maybe this
> is more about finding or needing better partitioning documentation.
>

Fair point.

I've found the original commit adding this feature in version 11:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e
It says:

"Hash partitioning is useful when you want to partition a growing data
set evenly.  This can be useful to keep table sizes reasonable, which
makes maintenance operations such as VACUUM faster, or to enable
partition-wise join."
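
The partition-wise join it mentions could be sketched as follows (table
names are invented; the setting is off by default):

    -- With both tables hash-partitioned identically on the join key,
    -- the planner can join matching partition pairs instead of the
    -- whole tables:
    SET enable_partitionwise_join = on;
    EXPLAIN
    SELECT *
      FROM events e
      JOIN customers c USING (customer_id);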

It also includes a link to discussion, though that starts in the middle of
a long thread.
The original thread is here:
https://www.postgresql.org/message-id/flat/20170228233313.fc14d8b6.nagata%40sraoss.co.jp

However, these threads only argue about implementation details, and it's
not easy to find a discussion of the motivation for supporting this
particular partitioning scheme.
I guess it was quite obvious to the participants at that point already.

With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest.  So
>> your indexes will also not fit into memory.
>>
>
> Indexes are not (usually) constant-time structures; they take more time
> the bigger they get.  So partitioned indexes will be smaller, quicker to
> insert into, and quicker to vacuum, and also gain possible pruning
> advantages on query when you split them up.  If the planner can, knowing
> the key, exclude all but one partition, it won't even look at the other
> tables, so if you hash partition by primary key, you reduce the search
> space to 1/N immediately.
>
> Indexes with high update activity also suffer from a problem called "index
> bloat", where sparse "holes" get punched in the buckets of btree indexes
> by updates and deletes.  These holes are minimized by vacuuming,
> but the bigger the index gets, the harder that process is to maintain.
> Smaller indexes suffer less from index bloat, and remedying the situation
> is easier because you can reindex partitions independently of each other.
> You're not just reducing the query load to an Nth, you're reducing the
> maintenance load.
>

Thanks for taking the time to explain it in detail.  Though I don't tend
to believe the insert/scan performance benefit is measurable without
trying it, I do see the benefits for maintenance.
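
E.g. maintenance could then proceed one partition at a time, along these
lines (partition names are invented):

    -- Rebuild or vacuum individual partitions independently:
    REINDEX TABLE events_p0;
    VACUUM (VERBOSE) events_p1;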

I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> I think you might be framing this with a specific data pattern in mind,
> not all data distributions have a "most active" or power law distribution
> of data.
>

I'm just referring to the first bullet-point in the docs:

"Query performance can be improved dramatically in certain situations,
particularly when most of the heavily accessed rows of the table are in a
single partition or a small number of partitions. The partitioning
substitutes for leading columns of indexes, reducing index size and making
it more likely that the heavily-used parts of the indexes fit in memory."

I think it does not apply to hash partitioning in the general case.

--
Alex


Re: Error on insert xml

2018-12-19 Thread Oleksandr Shulgin
On Wed, Dec 19, 2018 at 1:38 PM Михаил Яремчук 
wrote:

> When I insert this data into a column of type xml, I get an error "SSL
> SYSCALL error: EOF detected" and all current connections are broken.  When
> I insert a simpler but larger xml, everything is OK.
> version: PostgreSQL 9.6.11 on x86_64-pc-linux-gnu (Ubuntu
> 9.6.11-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10)
> 5.4.0 20160609, 64-bit
>

(-bugs, +general)

Assuming the server is running remotely, maybe you should check the link's
MTU on the client?  Or does the server backend crash when you run the
query--did you check the server logs?

Regards,
--
Alex