Re: Queries on QMF to POSTGRE
(-hackers, +pgsql-general)

On Sun, Jul 21, 2019 at 7:33 PM JVM . wrote:
>
> I'm looking to convert QMF queries, QMF forms and QMF procedures to
> PostgreSQL.  Will it support all of them?
>
> If yes, please help us with a sample example, or any documentation.

What would help anyone willing to help you is you providing
documentation or a definition of QMF, and some examples of those
queries, forms and procedures.

Cheers,
--
Alex
When to use PARTITION BY HASH?
Hi!

I was reading up on declarative partitioning[1] and I'm not sure what
could be a possible application of hash partitioning.

Is anyone actually using it?  What are typical use cases?  What
benefits does such a partitioning scheme provide?

On its face, it seems that it can only give you a number of tables
which are smaller than the un-partitioned one, but I fail to see how it
would provide any of the potential advantages listed in the
documentation.

With a reasonable hash function, the distribution of rows across
partitions should be more or less equal, so I wouldn't expect any of
the following to hold true:

- "...most of the heavily accessed rows of the table are in a single
  partition or a small number of partitions."
- "Bulk loads and deletes can be accomplished by adding or removing
  partitions...", etc.

That *might* turn out to be the case with a small number of distinct
values in the partitioning column(s), but then why rely on hash
assignment instead of using PARTITION BY LIST in the first place?

Regards,
--
Alex

[1] https://www.postgresql.org/docs/12/ddl-partitioning.html
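For reference, declaring a hash-partitioned table looks like this (a
minimal sketch; the "orders" table and its columns are made up for
illustration):

    CREATE TABLE orders (
        order_id    bigint NOT NULL,
        customer_id bigint NOT NULL,
        payload     jsonb
    ) PARTITION BY HASH (customer_id);

    -- Each row lands in the partition whose remainder matches the
    -- hashed customer_id modulo 4.
    CREATE TABLE orders_p0 PARTITION OF orders
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE orders_p1 PARTITION OF orders
        FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE orders_p2 PARTITION OF orders
        FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE orders_p3 PARTITION OF orders
        FOR VALUES WITH (MODULUS 4, REMAINDER 3);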
Re: When to use PARTITION BY HASH?
On Tue, Jun 2, 2020 at 7:47 PM Stephen Frost wrote:
>
> Please don't cross post to multiple lists without any particular
> reason for doing so- pick whichever list makes sense and post to
> that.

Sorry for the trouble, I should've checked it more carefully.  When
posting I did think it might be relevant to the performance list as
well.

At the same time, wouldn't it make sense to document this policy
explicitly?

/me resists the urge to cross-post to pgsql-www ;)

Cheers,
--
Alex
Re: When to use PARTITION BY HASH?
(sticking to pgsql-general)

On Tue, Jun 2, 2020 at 7:45 PM Michel Pelletier wrote:
>
> On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>>
>> I was reading up on declarative partitioning[1] and I'm not sure
>> what could be a possible application of hash partitioning.
>>
>> Is anyone actually using it?  What are typical use cases?  What
>> benefits does such a partitioning scheme provide?
>>
>> On its face, it seems that it can only give you a number of tables
>> which are smaller than the un-partitioned one, but I fail to see how
>> it would provide any of the potential advantages listed in the
>> documentation.
>
> From my point of view, hash partitioning is very useful for spreading
> out high insert/update load.

Do you also assign the partitions to different tablespaces, as you've
hinted below, or do you see a performance improvement from partitioning
alone?  How does that work?  (A sketch of what I have in mind follows
at the end of this message.)  Does it give better results than using a
RAID to spread the disk I/O, for example?

> Yes, it's true you end up with more, smaller tables than one big
> large one, but remember the indexes are (often) tree data structures.
> Smaller trees are faster than bigger trees.  By making the indexes
> smaller they are faster.  Since the planner knows to only examine the
> specific index it needs, this ends up being a lot faster.

That sounds logical, but can it be demonstrated?  If the index(es) fit
in memory fully, it doesn't make a measurable difference, I guess?

With hash partitioning you are not expected, in general, to end up with
a small number of partitions being accessed more heavily than the rest.
So your indexes will also not fit into memory.

I have the feeling that using a hash function to distribute rows simply
contradicts the basic assumption of when you would think of
partitioning your table at all: that is, to make sure the most active
part of the table and indexes is small enough to be cached in memory.

Regards,
--
Alex
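What I imagine by "assigning partitions to different tablespaces" is
something like the following (a sketch only; the tablespaces, table and
column names are all made up):

    CREATE TABLESPACE disk_a LOCATION '/mnt/disk_a/pgdata';
    CREATE TABLESPACE disk_b LOCATION '/mnt/disk_b/pgdata';

    CREATE TABLE events (
        user_id bigint NOT NULL,
        created timestamptz NOT NULL,
        payload text
    ) PARTITION BY HASH (user_id);

    -- Spread the partitions (and their I/O) across the two disks.
    CREATE TABLE events_p0 PARTITION OF events
        FOR VALUES WITH (MODULUS 2, REMAINDER 0) TABLESPACE disk_a;
    CREATE TABLE events_p1 PARTITION OF events
        FOR VALUES WITH (MODULUS 2, REMAINDER 1) TABLESPACE disk_b;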
Re: When to use PARTITION BY HASH?
On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes wrote:
>
> On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>>
>> With hash partitioning you are not expected, in general, to end up
>> with a small number of partitions being accessed more heavily than
>> the rest.  So your indexes will also not fit into memory.
>>
>> I have the feeling that using a hash function to distribute rows
>> simply contradicts the basic assumption of when you would think of
>> partitioning your table at all: that is to make sure the most active
>> part of the table and indexes is small enough to be cached in
>> memory.
>
> While hash partitioning doesn't appeal to me, I think this may be
> overly pessimistic.  It would not be all that unusual for your
> customers to take turns being highly active and less active.
> Especially if you do occasional bulk loads all with the same
> customer_id for any given load, for example.

For a bulk load you'd likely want to go with an empty partition without
indexes and build them later, after loading the tuples (see the sketch
below).  While that might not be possible with every other partitioning
scheme either, using hash partitioning most certainly precludes it.

> So while you might not have a permanently hot partition, you could
> have partitions which are hot in turn.  Of course you could get the
> same benefit (and probably better) with list or range partitioning
> rather than hash, but then you have to maintain those lists or ranges
> when you add new customers.

Why are LRU eviction from the shared buffers and the OS disk cache not
good enough to handle this?  This actually applies to any partitioning
scheme: the hot dataset could be recognized by these caching layers.
Does it not happen in practice?

--
Alex
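The load-then-attach pattern I mean would look roughly like this (a
sketch with made-up names, assuming a parent table list-partitioned by
customer_id):

    -- Load into a standalone table with no indexes...
    CREATE TABLE measurements_c42
        (LIKE measurements INCLUDING DEFAULTS);
    COPY measurements_c42 FROM '/tmp/customer_42.csv' (FORMAT csv);

    -- ...then build the index and attach it as a new partition.
    CREATE INDEX ON measurements_c42 (recorded_at);
    ALTER TABLE measurements ATTACH PARTITION measurements_c42
        FOR VALUES IN (42);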
Re: When to use PARTITION BY HASH?
On Sat, Jun 6, 2020 at 6:14 PM Michel Pelletier wrote:
>
> Well, let's take a step back here and look at the question: hash
> partitioning exists in Postgres, is it useful?  While I appreciate
> the need to see a fact demonstrated, and generally avoiding argument
> by authority, it is true that many of the very smartest database
> people in the world conceived of, discussed, implemented and
> documented this feature for us.  It stands to reason that it is
> useful, or it wouldn't exist.  So maybe this is more about finding or
> needing better partitioning documentation.

Fair point.  I've found the original commit adding this feature in
version 11:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e

It says: "Hash partitioning is useful when you want to partition a
growing data set evenly.  This can be useful to keep table sizes
reasonable, which makes maintenance operations such as VACUUM faster,
or to enable partition-wise join."

It also includes a link to the discussion, though that starts in the
middle of a long thread.  The original thread is here:
https://www.postgresql.org/message-id/flat/20170228233313.fc14d8b6.nagata%40sraoss.co.jp

However, these threads mostly argue about implementation details and
it's not easy to find a discussion of the motivation for supporting
this particular partitioning scheme.  I guess it was quite obvious to
the participants at that point already.

>> With hash partitioning you are not expected, in general, to end up
>> with a small number of partitions being accessed more heavily than
>> the rest.  So your indexes will also not fit into memory.
>
> Indexes are not (usually) constant-time structures, they take more
> time the bigger they get.  So partitioned indexes will be smaller,
> quicker to insert into, and quicker to vacuum, and they also gain
> possible pruning advantages on query when you split them up.  If the
> planner can, knowing the key, exclude all but one partition, it won't
> even look at the other tables, so if you hash partition by primary
> key, you reduce the search space to 1/N immediately.
>
> Indexes with high update activity also suffer from a problem called
> "index bloat", where sparse "holes" get punched in the buckets of
> btree indexes by updates and deletes.  These holes are minimized by
> vacuuming, but the bigger the index gets, the harder that process is
> to maintain.  Smaller indexes suffer less from index bloat, and
> remedying the situation is easier because you can reindex partitions
> independently of each other.  You're not just reducing the query load
> to an Nth, you're reducing the maintenance load.

Thanks for taking your time to explain it in detail.  Though I do not
tend to believe the insert/scan performance benefit is measurable
without trying it, I do see the benefits for maintenance.

>> I have the feeling that using a hash function to distribute rows
>> simply contradicts the basic assumption of when you would think of
>> partitioning your table at all: that is to make sure the most active
>> part of the table and indexes is small enough to be cached in
>> memory.
>
> I think you might be framing this with a specific data pattern in
> mind; not all data distributions have a "most active" or power-law
> distribution of data.

I'm just referring to the first bullet point in the docs: "Query
performance can be improved dramatically in certain situations,
particularly when most of the heavily accessed rows of the table are in
a single partition or a small number of partitions.  The partitioning
substitutes for leading columns of indexes, reducing index size and
making it more likely that the heavily-used parts of the indexes fit in
memory."

I think it does not apply to hash partitioning in the general case.

--
Alex
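To make the pruning and reindexing points concrete, a sketch (assuming
a hypothetical table "orders", hash-partitioned on customer_id into
partitions orders_p0 through orders_p3):

    -- With the partition key in the WHERE clause, the planner can
    -- exclude all but one partition from the plan:
    EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;

    -- A bloated partition can be reindexed independently of the
    -- other partitions:
    REINDEX TABLE orders_p2;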
Re: Error on insert xml
On Wed, Dec 19, 2018 at 1:38 PM Михаил Яремчук wrote:
>
> When I insert this data in the column with type xml, I get an error
> "SSL SYSCALL error: EOF detected" and all current connections are
> dropped.  When I insert a simpler but larger XML, everything is OK.
>
> version: PostgreSQL 9.6.11 on x86_64-pc-linux-gnu (Ubuntu
> 9.6.11-1.pgdg16.04+1), compiled by gcc (Ubuntu
> 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

(-bugs, +general)

Assuming the server is running remotely, maybe you should check the
link's MTU on the client?  Or does the server backend crash when you
run the query?  Did you check the server logs?

Regards,
--
Alex