Re: PostgreSQL upgrade from 9.4.2 to 9.6.12
This may also be of interest to you, but checking official docs as Adrian recommended is best. https://why-upgrade.depesz.com/show?from=9.4.2&to=9.6.12&keywords=
Re: Partitioning an existing table - pg10.6
I have not personally used this, but the write-up seems solid for minimizing downtime by shifting data gradually. Be sure you understand the limitations of partitioning, particularly while you are still on 10.x and not yet on v11, where an update will move a row to a new partition if the partition key is updated. https://www.depesz.com/2019/03/19/migrating-simple-table-to-partitioned-how/
Re: Too slow to create new schema and their tables, functions, triggers.
On Tue, Jul 23, 2019 at 1:36 PM Adrian Klaver wrote:
> On 7/23/19 12:20 PM, PegoraroF10 wrote:
> > We have in a single database 190 identical schemas. Now, when we create a new
> > one, with exactly same structure as the previous ones, it takes 20 or 30
> > minutes to finish. Usual time to finish that script was 30 seconds.

Can you create 200 schemas using your script without the data load step and reproduce the issue? With 19,000 tables on that database, how aggressive is autovacuum?
Re: How to check if a field exists in NEW in trigger
As a note to the original poster, you might want to check out- https://www.postgresql.org/docs/current/citext.html
Re: slow queries on system tables
Are you autovacuuming and analyzing aggressively enough? Is there bloat or other concern for these system tables? I expect it may be a concern based on what I remember about your environment having thousands of tables and perhaps autovacuum not keeping up (re: Too slow to create new schema and their tables, functions, triggers). Can you check results for last (auto)vacuum and analyze? select * from pg_stat_all_tables where relname IN( 'pg_class', 'pg_attribute', 'pg_index' );
Re: Complex filters -> Bad row estimates -> bad query plan
If those conditions that are throwing off the stats are expected to be minimally impactful/filtering few rows, then you can use the one tried-and-true optimizer hint (aside from materialized CTEs, stylized indexes, etc) --- OFFSET 0 at the end of a sub-query.

SELECT *
FROM (
    [your existing query without the sub-selects that are complicated and produce bad estimates]
    OFFSET 0
) AS sub
WHERE [your other conditions that don't produce good estimates]

If there is correlation between field1 and field2, you might also look at CREATE STATISTICS, assuming you are on PG 10 or 11. Before I do any of that, I would try a LEFT JOIN for Table3 and Table4 and then use the where condition "AND 2 = COALESCE( Table3.Status, Table4.Status )" and see if the optimizer likes that option better.
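For that last idea, a rough sketch (the join keys are guesses, since I haven't seen the original query)-

SELECT Table1.*
FROM Table1
LEFT JOIN Table3 ON Table3.table1_id = Table1.id
LEFT JOIN Table4 ON Table4.table1_id = Table1.id
WHERE 2 = COALESCE( Table3.Status, Table4.Status );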
Re: Complex filters -> Bad row estimates -> bad query plan
> I'm thinking the OFFSET 0 create an optimization barrier that prevents the planner from collapsing that sub-query into the top query, and enforces ordering in the query?

That's my understanding. I think it is an optimizer hint by another name. I used to put things in a CTE (which is always materialized prior to v12; v12 changes CTEs to be inlined unless the MATERIALIZED keyword is included), or I would create a temp table if the dataset is expected to contain many rows, so that I can do ANALYZE pg_temp.temp_table; and the optimizer has stats to make good decisions.

Note- Replying to messages with a full quote of the conversation below your comment (aka top-posting) is discouraged on these mailing lists. Please quote only the portion you are responding to.
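To illustrate both approaches (table/column names are made up)-

-- v12+, keep the old always-materialize behavior explicitly
WITH fenced AS MATERIALIZED (
    SELECT id, status
    FROM some_table
    WHERE expensive_flag = true
)
SELECT *
FROM fenced
WHERE status = 2;

-- or a temp table, so the planner has real statistics to work with
CREATE TEMP TABLE tmp_fenced AS
SELECT id, status
FROM some_table
WHERE expensive_flag = true;

ANALYZE pg_temp.tmp_fenced;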
Support for using alias in having clause
Why can I use an alias for group by and order by and not for having? I am just wondering if there is some barrier to it being implemented, or if it just hasn't been. select table_schema || '.' || table_name AS schema_qualified_table_name, count( column_name ) as column_count from information_schema.columns group by schema_qualified_table_name having count( column_name ) > 50 /* this works, but trying column_count > 50 does not */ order by column_count desc;
Re: How to use brin_summarize_range
What is default_statistics_target set to? Or is there a custom statistics value set for this table/column? Perhaps the planner is making sub-optimal choices because it only has a vague idea about the data and the histogram is not very informative. Planning time will increase when the statistics target is increased, but perhaps 250 instead of the default 100 would give *enough* more information to make better choices. Or perhaps your target is already maxed out at 10,000 and then I have no idea why btree would be chosen. Except, if correlation is too low for the optimizer to consider BRIN to be best, perhaps. What does pg_stats say about the column in question?
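If you want to experiment on just that table/column (names below are placeholders), something like-

ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 250;
ANALYZE my_table;

SELECT n_distinct, correlation, most_common_vals
FROM pg_stats
WHERE tablename = 'my_table' AND attname = 'my_column';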
Re: slow queries on system tables
Might the issue be with pg_index or pg_attribute rather than pg_class? Is the query still slow? Do you have the output of explain analyze for use on https://explain.depesz.com/ or the output of EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) for use on http://tatiyants.com/pev/#/plans/new website?
Re: slow queries on system tables
Can you manually execute vacuum analyze on all three involved tables and again share the plan for the same query? If it is significantly improved, it would seem like following the recommendation to tune autovacuum (and analyze) to be more frequent would be prudent. You haven't seemed to change from default vacuum/analyze settings despite the default parameters being suggested only for bare minimum hardware and very light use databases.
Re: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)
On Tue, Aug 27, 2019 at 9:45 PM Laurenz Albe wrote: > Holtgrewe, Manuel wrote: > > Switching off fsync leads to a drastic time improvement but still > > higher wall-clock time for four threads. > > Don't do that unless you are ready to start from scratch with a new > "initdb" in the case of a crash. > > You can do almost as good by setting "synchronous_commit = off", > and that is crash-safe. It seems like it depends on your definition of crash-safe. Data loss can occur but not data corruption, right? Do you know any ballpark for how much difference in performance it makes to turn off synchronous_commit or what type of hardware or usage it would make the biggest (or least) difference?
Re: Query using 'LIKE' returns empty set
You need the wildcard character at front and back.

select * from fish_counts where stream_tribs ilike '%Nehalem%';
Re: Bad Estimate for multi tenant database queries
> > CREATE STATISTICS MT_ReservationDepartureUtc (ndistinct) on "AccountCode", > "DepartureUtc" from "Reservation"."Reservation"; > CREATE STATISTICS MT_ReservationArrivalUtc (ndistinct) on "AccountCode", > "ArrivalUtc" from "Reservation"."Reservation"; > CREATE STATISTICS MT_ReservationNoShowFeeId (ndistinct) on "AccountCode", > "NoShowFeeId" from "Reservation"."Reservation"; > CREATE STATISTICS MT_ReservationTimeSliceDefinitionId (ndistinct) on > "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation"; > > but that didn't help either > Did you try with 'dependencies' for the statistics_kind or only ndistinct? What was default_statistics_target set to at the time you created the extended statistics? I am not sure if that value is used, but I would assume so.
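The ndistinct kind mostly helps GROUP BY estimates; functional dependencies are what help correlated equality filters like these. Something like the below (statistics name is arbitrary), followed by an analyze so the planner actually has the new stats to use-

CREATE STATISTICS mt_reservation_departure_dep (dependencies)
    ON "AccountCode", "DepartureUtc"
    FROM "Reservation"."Reservation";

ANALYZE "Reservation"."Reservation";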
Re: literal vs dynamic partition constraint in plan execution
I think I would expect this behavior with how you have defined the constraints as function results rather than as direct comparisons to the timestamp field. Instead of this-

Check constraints:
> "y2018_mis_ora_check" CHECK (date_part('year'::text, mis_ora) = 2018::double precision)

I would expect this-

Check constraints:
> "y2018_mis_ora_check" CHECK (mis_ora >= make_timestamp(2018, 1, 1, 0, 0, 0::double precision) AND mis_ora < make_timestamp(2019, 1, 1, 0, 0, 0::double precision))
Re: pg full text search very slow for Chinese characters
>My postgres instance is based on docker image postgres:11 and runs on my MacBook Pro i7 16GB.

How much RAM and such did you give to this VM?

>To my surprise, postgres 11 is extremely slow when creating a full text index. I added a column of tsvector type and tried to create an index on that column. Pg could not finish creating a GIN index for a long time and I had to cancel the execution. I then tried to create a partial full text index for 500 rows and it took postgres 2 to 3 minutes to create the index.

Did you customize any config? maintenance_work_mem specifically would be relevant to the time to create an index, and the default value is only 64MB. Especially if you are running on a spinning hard drive and not an SSD, this could be problematic.
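If you haven't raised it, something like the below may make a big difference (the table/column names are just a guess at your setup)-

SET maintenance_work_mem = '512MB';  -- only for this session
CREATE INDEX idx_docs_fts ON docs USING gin (tsv_column);
RESET maintenance_work_mem;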
Re: FW: Re: FW: Re: Shouldn;t this trigger be called?
You can also look at the citext type to avoid the casting.

customer_key integer DEFAULT nextval('customer_key_serial') PRIMARY KEY,
cust_no      smallint NOT NULL UNIQUE,
name         varchar UNIQUE,

Why do you have a surrogate primary key generated by a sequence when you have a natural key of either cust_no or name? Why not just declare the customer number to be the PK? Where does the customer number come from anyway? Using smallint seems potentially short-sighted for future growth, but changing the type later should be minimal work as long as you don't have this customer number denormalized in many places, or use it as the FKey after dropping the customer_key surrogate key.
Re: problems importing from csv
>I am trying to import some data from spreadsheets. Included in the data
>sets are US monetary values. These appear in the CSV file, like this: $1.00
>The column is defined like this: NUMERIC(5,2) NOT NULL.

1) remove all $ characters from the csv before import, OR
2) import into a text field (perhaps in a temp table), remove the $ characters, cast the value and insert into the numeric field

By the way, there may be no benefit to specifying a max value of 99,999.99 if you can foresee a potential need for bigger values. For values that fit, numeric(1000,2) will store numbers in the same bytes as a numeric(5,2) field will. It just won't throw an error if large values are used in the future.
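Option 2 might look something like the below (file path, table, and column names are placeholders for whatever you actually have)-

CREATE TEMP TABLE import_raw (
    item_name text,
    price_raw text
);

\copy import_raw FROM 'data.csv' WITH (FORMAT csv, HEADER true)

INSERT INTO items ( item_name, price )
SELECT item_name, replace( replace( price_raw, '$', '' ), ',', '' )::numeric
FROM import_raw;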
Re: Extend inner join to fetch not yet connected rows also
Just a side note, that the below can be written more simply-

CASE WHEN UserSkills.craftsmanship_id IS NULL THEN FALSE ELSE TRUE END AS has

If you want to be direct-

UserSkills.craftsmanship_id IS NOT NULL AS has
Re: unable to drop index because it does not exists
> Partition key: LIST (date_part('year'::text, mis_ora))

As an aside, you may benefit from switching to range partitioning* depending on how your queries are written. If you have conditions such as "WHERE mis_ora BETWEEN CURRENT_DATE - 30 AND CURRENT_DATE" or similar, then the fact that your partitioning is defined as a function result will mean all partitions get scanned instead of partitions being pruned as early as possible in the process. That's my understanding anyway. If you always include a date_part( 'year', mis_ora ) comparison in your where/join conditions, then you'll likely be just fine. Do as you need.

*eg '01/01/2018' to '01/01/2019' for the 2018 partition, since the upper bound is always exclusive
Re: Autovacuum lock conflict
> - For tables that receive only INSERTs, schedule a regular VACUUM > with "cron" or similar. Unfortunately, PostgreSQL isn't very smart > about vacuuming insert-only tables. > What is the need to vacuum on an insert only table? Does that just maintain the freespace map?
Re: managing primary key conflicts while restoring data to table with existing data
> > If the data is held in common tables(bills, vouchers, etc)then the only > thing I see happening is changing the PK values to an unused value. That > could turn into a nightmare though. Not only that you lose the connection > to the original data source. If the data can be broken out into separate > tables then I could see placing them in their own schema. > > Why not have separate databases for each "restore"? They can run together on one machine still. What is the user doing with this copy of the data?
Re: Arrays and ANY problem
> db=# select name from table_name_ds_tmp where ARRAY[categoryid] = ANY (
> select string_to_array( '200,400', ',')::bigint[] );

Using either of the below instead, I get the proper result. Why doesn't ANY work? I do not know.

select name from table_name_ds_tmp where ARRAY[categoryid] <@ ( select (string_to_array( '200,400', ','))::bigint[] );

select name from table_name_ds_tmp where categoryid = ANY ( ARRAY[ 200, 400 ]::BIGINT[] );

I used-

drop table if exists pg_temp.table_name_ds_tmp;
create temp table table_name_ds_tmp AS (
    SELECT 100::BIGINT AS categoryid, 'one'::VARCHAR AS name
    UNION ALL
    SELECT 200::BIGINT, 'two'::VARCHAR
    UNION ALL
    SELECT 300::BIGINT, 'three'::VARCHAR
    UNION ALL
    SELECT 400::BIGINT, 'four'::VARCHAR
);
Re: Performance on JSONB select
Much of indexing strategy depends on knowing the data, like how many distinct values there are and what the distribution is like. Is JsonBField->>'status' always set? Are those three values mentioned in this query common or rare? Can you re-write this query to avoid using an OR in the where clause? Are you just wanting to add a GIN index for the jsonb paths? Or do you want indexes like below that are a bit stylized to this query?

CREATE INDEX idx_MyTable_status ON MyTable USING btree( (JsonBField->>'status') );
CREATE INDEX idx_MyTable_descartada_date ON MyTable USING btree( To_Date(JsonBField->'descartada'->>'data','yyyy-mm-dd') );
CREATE INDEX idx_MyTable_contrato_date ON MyTable USING btree( To_Date(JsonBField->'contrato'->>'data','yyyy-mm-dd') );
Re: Query Tuning
Both of the below visualizers can help, but require some knowledge about comparing estimated vs actual row estimates, disk sorts vs in memory, etc. Drawing implications about whether your schema needs to change or just the query will take time to master as well. http://tatiyants.com/pev/#/plans/new --data only stored locally on your computer, nice for security but not great for sharing with others and getting help, since you need to share full json output (and query ideally) https://explain.depesz.com/ --URL is unique and sharable for help from others Often you need to provide much more information to get good advice. What version of Postgres are you on? What are stats like for the involved tables? What other indexes are available or might you add? What is the use of this system-- single user or thousands? Heavy writes continually, or load & analysis? Can you share the full query text and output of EXPLAIN ANALYZE on it?
Re: partitions vs indexes
"I would like to convert a table with a primary key into a partitioned setup by a column which is not part of the primary key" That isn't possible. The partition key must be contained by the primary key. That is, the primary key could be site_id, id and you can create hash partition on id or site_id but not created_on. You could drop primary key and foreign keys and implement them via trigger functions as described in this blog series, but it seems questionable- https://www.depesz.com/2018/10/31/foreign-key-to-partitioned-table-part-2/ I do not assume the restriction would be dropped in future releases. I don't know that scanning all the partitions to figure out whether the primary key is violated would be advisable. Which is what the trigger functions described in the blog post has to do, right? It might be noteworthy that partitioning with more than 10-100 partitions is MUCH faster in PG12 than PG11 (up to 4-8 thousand partitions) from testing shared by those working on that code. >
Re: Performance on JSONB select
> My Json has always a status

Why declare the "where" clause when creating the index? It would not seem needed if status is always set, since your index will reference all rows in the table anyway. Thanks for sharing the trick of having the second column in the index determine the key based on the first column. I don't know if/when I might need this, but it is an interesting solution. Generically, it seems like the below-

create index idx_mytable_jsonb_dependent_fields on public.mytable (
    (JsonBField->>'primary_field_to_filter_on'),
    (JsonBField->(JsonBField->>'primary_field_to_filter_on')->>'secondary_field')
);
Re: JSONB maximal length ?
> Hi
>
> By inserting data in a JSONB type column I got the following error message:
>
> ERROR: string too long to represent as jsonb string
> DETAIL: Due to an implementation restriction, jsonb strings cannot exceed 268435455 bytes.
>
> could anyone confirm that there is a size limit for JSONB type fields ?

Have you looked at the bytea datatype? I believe this would allow up to 1GB, which is the max field size, so the maximum (minus some overhead) for a row. Perhaps though, storing files of unusually large size should be done outside of the database.
Re: Declarative Range Partitioning Postgres 11
No, what you want is not possible and probably won't ever be I would expect. Scanning every partition to validate the primary key isn't scalable.
Re: Declarative Range Partitioning Postgres 11
On Mon, Oct 7, 2019 at 5:56 PM Ron wrote: > On 10/7/19 6:17 PM, Michael Lewis wrote: > > No, what you want is not possible and probably won't ever be I would > expect. > > Sure it is. Maybe not the (weird) way that Postgres does partitioning, > but > the legacy RDBMS that I still occasionally maintain has for at least 25 > years had partition key independent of any indexes. > > > Scanning every partition to validate the primary key isn't scalable. > > That's only because of the way Pg implements partitioning. > I can dig that, but since this is a Postgres list and everything I have heard indicates it is not a limitation that is likely to be removed in Postgres, it seems like we are having two different discussions.
Re: Declarative Range Partitioning Postgres 11
On Tue, Oct 8, 2019 at 8:00 AM Shatamjeev Dewan wrote: > Hi Michael, > > > > In this case , I always need to include partition key(date) in primary > key ( if I have a primary key defined on non partition key column e.g id > (in my case), to make it a composite primary key (id, date). This would > allow duplicate id with different date,which is not desirable . > If you are generating the ID with a sequence, there isn't any real world likelihood of conflict, but I do understand your concern in terms of enforcing data integrity. Other than creating a custom stored procedure that functions as a primary key constraint, I don't know of any way around that. Let's take a step back... why do you think you need to partition at all? And why partition by the date/timestamp/timestamptz field? Also, from what I have seen, PG12 is when partitioning really gets performant in terms of more than 10 to 100 partitions, and you can then create FKeys to the partitioned table (not possible in PG11). Also, if your frequent access of the table is by date/timestamptz field, then you might consider a BRIN index if you have high correlation between physical storage and values in that field. That can mitigate the need for partitioning. Our organization will be waiting until next quarter to upgrade to PG12 and then partitioning a few of our largest tables. That is to say, I don't have experience with partitioning in production yet so others may chime in with better advice.
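For example, something as simple as the below (column name assumed) is tiny on disk and cheap to maintain-

CREATE INDEX idx_mytable_created_brin ON my_table USING brin (created_at);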
Re: SELECT returnig a constant
On Tue, Oct 15, 2019 at 8:25 AM Geoff Winkless wrote:
> On Tue, 15 Oct 2019 at 14:35, Ray O'Donnell wrote:
> > On 15/10/2019 14:28, stan wrote:
> > > I used to be able to return a constant value in a SELECT statement in
> > > ORACLE. I need to populate a table for testing, and I was going to do so
> > > like this:
> > >
> > > SELECT
> > >     employee.id ,
> > >     project.proj_no ,
> > >     work_type.type ,
> > >     'rate' 1
> > > FROM employee
> > > CROSS JOIN project
> > > CROSS JOIN work_type;
> > >
> > > This statement works correctly, till I add the last "'rate' 1" line, then it
> > > returns a syntax error.

I would assume you have the value and the alias backwards and you want

SELECT 1 AS "rate"

Both the double quotes around the alias and the AS keyword are optional.
Re: Securing records using linux grou permissions
It sounds like you want row level security- https://www.postgresql.org/docs/9.6/ddl-rowsecurity.html But you will need to define separate roles on the database and ensure that the users are connecting with those separate roles. The db can't magically know about the permissions on the OS side.
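A very rough sketch (table, column, and role names are made up)-

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY documents_owner_only ON documents
    USING (owner_role = current_user);

GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO app_users;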
Re: Postgres 9.6 active-passive HA cluster
"pg_rewind is a tool for synchronizing a PostgreSQL cluster with another copy of the same cluster, after the clusters' timelines have diverged. A typical scenario is to bring an old master server back online after failover as a standby that follows the new master." -https://www.postgresql.org/docs/9.6/app-pgrewind.html >
Re: Vacuum very big table - how the full vacuum works in background/internally?
> > We must know it to avoid disk out problems, and too much off-time. > You may be interested in this extension- https://github.com/reorg/pg_repack
Re: drop database
Your plan to loop over tables and truncate them seems great if you are worried. It seems simple to verify that space is being freed as you go, and also easy to change tactics if the need arises. >
Re: Primary key definition?
> > > CREATE TABLE books ( > > id SERIAL PRIMARY KEY, > > > > Which has the advantage of not having to manually create the sequences. > Will > > this also enforce that the "internally created sequence" will be > initialized > > to a value above the maximum key in use on a pg_restore? > I think you will still run into the same issue if your sequence is not getting the proper value as max(id) or max(id) +1, not sure which is actually needed. You may get some benefits from using IDENTITY rather than the pseudo-type of serial, as described in this blog post- https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/ Still, depending on how you are doing the data restore, you may need something like this to ensure the sequence is updated. select setval( 'table_id_seq', ( select max(id) + 1 from table ) );
Re: Trying to fetch records only if preceded by at least another one
> > But it prints too many records: all moves cross-multiplied with each other. > > As if I have forgotten to add 1 more condition to the JOIN LATERAL > LIMIT 1 inside your lateral should resolve that. Personally, I'd move that condition to EXISTS condition inside WHERE clause instead. Just a style thing as far as I know.
Re: Composite type storage overhead
On Thu, Oct 24, 2019 at 3:35 AM Laiszner Tamás wrote: > Actually, this is not such a unique idea: > https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c > > Thanks for the suggestion to split up the primary key into components. But > even going down this way, packing the components into one superstructure > (composite type) would be beneficial as the same scheme is used across > multiple tables. And we are back at the original problem. > This is probably a completely naive question, but why not store this in a text field?
Re: Trying to fetch records only if preceded by at least another one
On Fri, Oct 25, 2019 at 2:20 PM Alexander Farber wrote: > Thank you Michael - > > On Fri, Oct 25, 2019 at 7:28 PM Michael Lewis wrote: > >> But it prints too many records: all moves cross-multiplied with each >>> other. >>> >>> As if I have forgotten to add 1 more condition to the JOIN LATERAL >>> >> >> LIMIT 1 inside your lateral should resolve that. Personally, I'd move >> that condition to EXISTS condition inside WHERE clause instead. Just a >> style thing as far as I know. >> > > How would you do that with EXISTS? I tried, but my query failed to compile > Ooops. I didn't notice you had selected the previous move ID. In that case, at the end of the lateral you want ORDER BY mid DESC LIMIT 1, or perhaps order by played field and you should be good to go. Particularly if you have an index on (gid, mid) then it should be very fast. If you want to quickly find these "interesting moves" and they are very rare, a partial index would be quite helpful. Highly stylized, but when it is indexing 1-5% of a huge table perhaps, it can be a very handy tool. CREATE INDEX idx_interesting_moves ON words_moves USING btree( played ) WHERE action='play' AND length(letters) = 7 and length(hand)=7;
Re: Declarative Range Partitioning Postgres 11
On Fri, Nov 1, 2019 at 9:22 AM Shatamjeev Dewan wrote: > Hi Michael, > > > > I want to create a partition by year and subpartition by month in postgres > 11 timestamp column. Please advise syntax. > https://www.postgresql.org/docs/11/ddl-partitioning.html The documentation is rather clear with examples like- CREATE TABLE measurement_y2006m02 PARTITION OF measurement FOR VALUES FROM ('2006-02-01') TO ('2006-03-01'); CREATE TABLE measurement_y2006m03 PARTITION OF measurement FOR VALUES FROM ('2006-03-01') TO ('2006-04-01'); Note- Don't partition on function results like date_part because performance will likely suffer greatly. Also note that the top end is always exclusive so the above give a continuous range for those two months. I would hesitate to partition by more than year alone before upgrading to PG v12. The speed improvements for more than 10-100 partitions (max recommended for PG11) is huge in 12.
Re: Are my autovacuum settings too aggressive for this table?
My thinking is the opposite of yours. I consider it important to be very aggressive on autovacuum because it only ever does the required amount of work. If a tiny amount of work is needed, it does only that and is done. Assuming it doesn't cause I/O concerns, do it as often as possible to minimize the growth of bloat and maximize the reuse of the space already allocated to that relation. On the subject of analyze, the statistics are unlikely to be impacted significantly by inserting about 24k rows into a table with almost 20 million already. With default_statistics_target at 100, what are the chances those new rows will even be included in the sample? I don't know the math, but given that each run of analyze does ALL the work each and every time it runs, it seems prudent to do them a little less often than autovacuum anyway. Regardless though, autoanalyze is a small amount of work each time.
Re: Storing a time interval
You certainly could choose to store as tstzrange, but why not use two fields? https://www.postgresql.org/docs/current/rangetypes.html
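For illustration (names are placeholders), either works; the range version makes containment and overlap checks simple-

CREATE TABLE discount_window (
    po_id    int PRIMARY KEY,
    validity tstzrange
);

CREATE TABLE discount_window_alt (
    po_id      int PRIMARY KEY,
    valid_from timestamptz,
    valid_to   timestamptz
);

SELECT * FROM discount_window WHERE validity @> now();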
Re: Storing a time interval
> CREATE TYPE po_dates AS (
>     po_isssued_date     timestamptz,
>     discount_last_date  timestamptz,
>     net_date            timestamptz
> );

What advantage does combining these three values into a custom composite type give you rather than just storing them directly? Are you going to reuse this po_dates type on many tables?
Re: ERROR: there is no unique constraint matching given keys for referenced table "audit_p"
On Mon, Nov 18, 2019 at 10:10 AM Shatamjeev Dewan wrote:
> I am trying to create a foreign key constraint on a table : audit_param in
> postgres 12 which references partitioned table audit_p. is there anyway to
> get rid of this error.
>
> ERROR: there is no unique constraint matching given keys for referenced table "audit_p"

As far as I understand, what you want is not possible. You cannot partition on a timestamp and then foreign key to an object_id natively. You can get around this with triggers- https://www.depesz.com/2018/10/31/foreign-key-to-partitioned-table-part-2/ - but that would not be recommended. Admittedly, I have a pretty nasty head cold, so it may be that someone chimes in with much better insight on your design.

By the way, there are strong recommendations for using timestamp WITH TIME ZONE in nearly all cases. I just wanted to mention it since timestamp without time zone is rarely the best choice.
Re: Adding LIMIT changes PostgreSQL plan from good to a bad one
I try to avoid DISTINCT and use GROUP BY when feasible, as well as avoiding OR conditions. If you combined anon1 and anon2 with UNION ALL, and did an (inner) join instead of left, or even moved all of that to EXISTS, perhaps that gives you better, more consistent performance. Something like this-

SELECT contacts.id
FROM contacts
JOIN contacts__aggregated AS contacts__aggregated_1
    ON contacts__aggregated_1.company_id = contacts.company_id
    AND contacts__aggregated_1.contact_id = contacts.id
    AND contacts__aggregated_1.field_name = 'names'
WHERE contacts.company_id = '\x4c2118ad54397f271b00'
AND EXISTS (
    SELECT
    FROM contacts_values
    WHERE contacts_values.field_id = '\x'
        AND contacts_values.field_name = 'facebook'
        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) = nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara')
        AND contacts_values.company_id = contacts.company_id
        AND contacts_values.id = contacts.id
    UNION ALL
    SELECT
    FROM contacts_values
    WHERE contacts_values.field_id = '\x'
        AND contacts_values.field_name = 'last_name'
        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) = nimble_contact_value_normalize('Ferrara')
        AND contacts_values.company_id = contacts.company_id
        AND contacts_values.id = contacts.id
)
ORDER BY contacts__aggregated_1.value ASC
LIMIT 30 OFFSET 0;
Re: Slow planing...
On Fri, Dec 6, 2019, 3:21 AM Mladen Marinović wrote: > Is there a way to detect why the planing is taking this long? > > The database is a 9.6.1 with 32GB of shared_buffers, and 1GB of > maintanance_work_mem, and machine CPU is below 80% all the time. > What is default_statistics_target set to for your cluster or the involved tables/columns? When raised above the default 100, planning time can increase exponentially it seems, as the planner examines the stats and weighs the many options. >
Re: Query with correlated join having slow performance
I'd suggest re-writing your query to avoid ORs whenever possible. Is this generated by an ORM or subject to change with filters selected in application or can you totally control it on DB side? It may be hugely more performant to simply rewrite this as (almost) the same query twice UNION ALL'd together to separate the productalt1_.alt_pdt_dbky OR productalt1_.orgnl_pdt_dbky conditions.
Re: Counting number of sites with same number of sampling dates
On Wed, Dec 11, 2019 at 1:54 PM Rich Shepard wrote:
> A sampling location table has 28 distinct sites, each site being sampled
> from 1 to 67 times. I'm trying to obtain the number of sites having 1
> sample, 2 samples, ... 67 samples and am not seeing the solution despite
> several alternative queries.
>
> The query,
>
> select site_nbr, count(distinct sampdate) from wrb_hg_cong group by
> site_nbr order by site_nbr;
>
> returns the number of times each site has been sampled, for example:
>
>  site_nbr | count
> ----------+-------
>  10332    |    11
>  10335    |     1
>  10339    |     2
>  10340    |     1
>  10342    |     4
>  10344    |    18
>  10347    |     2
>  10348    |     2
>  10350    |     2
>  10351    |     1
>  10355    |    14
> etc.
>
> I want the number of sites for each number of samples (e.g., how many sites
> with one sampdate, 2 sampdates, etc.). I cannot group by an aggregate such
> as count(distinct sampdate) yet I'm confident the syntax is simple and I'm
> not seeing how to get the number in each group.

Put what you have in a subquery and group/aggregate again.

select sample_count, count( site_nbr )
FROM (
    select site_nbr, count(distinct sampdate) AS sample_count
    from wrb_hg_cong
    group by site_nbr
    order by site_nbr
) sub
group by sample_count;
Re: Race condition while creating a new partition
It looks like you are creating a partition for each minute of the day (when logs get inserted for a given minute at least). Would it be at all reasonable to have an hourly or daily job which creates the partitions ahead of when they are actually needed? If partitions went unused in the recent past, you could also drop those if appropriate to your workflow. I just wonder about doing the work just-in-time via 2+ connections vs doing it from a single process to avoid the chance of deadlock completely.
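A scheduled job could be as simple as a DO block like the below, which pre-creates the next hour's worth of one-minute partitions (the parent table name "logs" and a timestamptz range partition key are assumptions about your setup)-

DO $$
DECLARE
    start_ts timestamptz := date_trunc('hour', now() + interval '1 hour');
    part_ts  timestamptz;
BEGIN
    FOR i IN 0..59 LOOP
        part_ts := start_ts + make_interval(mins => i);
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS logs_%s PARTITION OF logs FOR VALUES FROM (%L) TO (%L)',
            to_char(part_ts, 'YYYYMMDD_HH24MI'),
            part_ts,
            part_ts + interval '1 minute');
    END LOOP;
END $$;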
Re: Partitioned tables and locks
I don't recall the details, but I know v12 included significant enhancements to lock relations later in the process such that when targeting relatively few of the partitions, it can be a major performance boost.
Re: A JOIN question
> > LEFT join mfg_vendor_relationship on > mfg_vendor_relationship.mfg_key = mfg_part.mfg_key > AND > mfg_vendor_relationship.project_key = bom_item.project_key > LEFT join vendor on > mfg_vendor_relationship.vendor_key = vendor.vendor_key > Perhaps I am missing something, but it seems like just a matter of changing inner join to left so you keep what you already have and augment vendor information when it exists. I never use right joins (I re-write to always declare as left) so I am not sure if the right join near the top screws that up. I might consider using a UNION ALL to combine sets where values are NULL with left joins, and use plain (inner) joins for the not nulls.
Re: How can I set a timeout for a locked table in Function ?
I would not think that behavior would extend to lock_timeout based on the explanation on stackexchange. I would assume that the potentially long runtime in this function is mostly in acquiring the lock and not doing the update given the implied primary key in the where clause, so perhaps lock_timeout would fit the need. Or perhaps this is a much-simplified example and the real problem is not apparent. Why take an exclusive lock on an entire table to update a single row? What is this locks table for? Would advisory locks be the proper solution to the root problem perhaps? Just throwing things out there since context was lacking in the original question. https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-LOCK-TIMEOUT >
Re: UPDATE many records
> > I’m thinking it might be worth it to do a “quick” test on 1,000 or so > records (or whatever number can run in a minute or so), watching the > processor utilization as it runs. That should give me a better feel for > where the bottlenecks may be, and how long the entire update process would > take. I’m assuming, of course, that the total time would scale more or less > linearly with the number of records. > I think that depends on how your identify and limit the update to those 1000 records. If it is using a primary key with specific keys in an array, probably close to linear increase because the where clause isn't impactful to the overall execution time. If you write a sub-query that is slow, then you would need to exclude that from the time. You can always run explain analyze on the update and rollback rather than commit.
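For example (table name and criteria made up); the whole thing rolls back, so it is safe to run against real data-

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE my_table
   SET some_column = 'new value'
 WHERE id BETWEEN 1 AND 1000;
ROLLBACK;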
Re: UPDATE many records
On Mon, Jan 6, 2020 at 2:34 PM Mark Zellers wrote: > Just out of curiosity, what kind of trigger are you using, a row level > trigger or a statement level trigger? If you are using a row level > trigger, see if you can achieve your requirements using a statement level > trigger instead. I’m relatively new to Postgres, so there could be some > limit that I’m not aware of, but my understanding is that you have access > to the old and new values of the updated rows in the after statement > trigger. It would likely be much more performant to do your operation once > after the statement is done rather than firing a trigger on every changed > row. > My experience/understanding is that statement level triggers can be a big performance boost, but only for changing *other* tables and not the table that the trigger is on since it is *AFTER* only and can't modify NEW record directly.
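A rough sketch of the statement-level approach with transition tables, in case it is unfamiliar (table/column names are made up; use EXECUTE PROCEDURE instead of EXECUTE FUNCTION on v10 and older)-

CREATE FUNCTION log_changed_rows() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- new_rows holds the new image of every row touched by the statement
    INSERT INTO change_log (row_id, changed_at)
    SELECT n.id, now()
    FROM new_rows n;
    RETURN NULL;
END $$;

CREATE TRIGGER trg_log_changed_rows
AFTER UPDATE ON my_table
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT
EXECUTE FUNCTION log_changed_rows();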
Re: Upgrade PostgreSQL 9.6 to 10.6
On Wed, Jan 8, 2020 at 7:20 AM github kran wrote: > Sorry for the confusion, I tried to install this extension on 9.6 and it > seems to be not working. Does 9.6 PostGreSQL supports logical replication ? > No. See the top of this page with supported versions listed. Prior to v10, pg_logical extension was often used. https://www.postgresql.org/docs/current/logical-replication.html
Re: Upgrade PostgreSQL 9.6 to 10.6
On Wed, Jan 8, 2020 at 8:52 PM github kran wrote: > You are right on RDS but I believe the problem is on Aurora PostgreSQL > where the pglogical throws an error during installation. Are you aware if > this works on Aurora PostGreSQL > It seems like this question should be sent to AWS support for Aurora if you are wanting to upgrade an Aurora 9.6 instance to 10x also in Aurora.
Re: partitioned table
> > when I manually mocking the data into both tables are fine and when I run > the procedure, I get errorcode: 42P10 MSG: there is no unique or exclusion > constraint matching on the CONFLICT specification > > the procedure is > ... > INSERT INTO ecisdrdm.bnft_curr_fact AS prod (bnft_fact_id, > bene_cntry_of_brth_id, bene_cntry_of_rsdc_id, > bene_cntry_of_ctznshp_id, frm_id, svc_ctr_id, actn_dt_in_id, > actn_tm_in_id, src_sys_id, > bnft_hist_actn_id, bene_id, bene_end_dt_id, petnr_app_id, atty_id, > uscis_emp_id, application_id, > rmtr_id, prpr_id, mig_filename) > SELECT stg.bnft_fact_id, stg.bene_cntry_of_brth_id, > stg.bene_cntry_of_rsdc_id, > stg.bene_cntry_of_ctznshp_id, stg.frm_id, stg.svc_ctr_id, > stg.actn_dt_in_id, stg.actn_tm_in_id, stg.src_sys_id, > stg.bnft_hist_actn_id, stg.bene_id, stg.bene_end_dt_id, stg.petnr_app_id, > stg.atty_id, stg.uscis_emp_id, stg.application_id, > stg.rmtr_id, stg.prpr_id, stg.mig_filename > FROM ecisdrdm.stg_bnft_curr_fact stg > ON CONFLICT ("bnft_fact_id") DO UPDATE > SET (bnft_fact_id, bene_cntry_of_brth_id, bene_cntry_of_rsdc_id, > The documentation and the error message explain the issue. "there is no unique or exclusion constraint matching on the CONFLICT specification" "The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error." -https://www.postgresql.org/docs/current/sql-insert.html You have an index, but it is not unique. With partitioning, you cannot create a unique index on a column that is not contained by your partition key. So, you need to re-write to skip the use of ON CONFLICT I expect.
Re: Multiple Aggregations Order
"handle this aggregated data later in code" What is your end goal though? Also, approx how many rows in these tables? Can you share an example query and plan? What version are you using? >
Re: Problem with SqlState=23505 when inserting rows
On Wed, Jan 15, 2020 at 9:41 AM Werner Kuhnle wrote: > I've tried to using the newer definition: > id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY > but that does not solve the problem. > > Is there a way to define id columns to that when the database provides > values, > it recognizes already existing values avoiding conflicts. > You'll need to run something like the below to set the next value to the max current value. You'll just have to figure out the name of the sequence that is automatically created whether you use the pseudo type serial, or the newer IDENTITY option. Both are implemented with a sequence. --set sequence to max ID on a table select setval( 'table_name_id_seq', ( select max(id) + 1 from table_name ) );
Re: temporary data after diskspace error
In terms of preventing this happening again, you might consider setting some reasonable temp_file_limit as default. Alternatively or additionally, you can set up another volume and direct temp files to use that to avoid the server crashing if excessive temp files are used in a moment's time. Then only that one query fails that used too much space. It always bears mentioning that 9.4 is very old and upgrading is recommended to at least v11 that has been out for more than a year. "PostgreSQL 9.4 will stop receiving fixes on February 13, 2020, which is the next planned cumulative update release." --https://www.postgresql.org/about/news/1994/
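For example, in postgresql.conf (the values and path are arbitrary)-

temp_file_limit = '20GB'          # per-session cap; a runaway query errors out instead of filling the disk
temp_tablespaces = 'temp_space'   # after CREATE TABLESPACE temp_space LOCATION '/mnt/scratch/pg_temp';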
Re: Jsonb first level keys statistic
Other than creating a functional index on the keys that you most want to access and include that function call in your query, I don't know that you can. Functional indexes get statistics while partial indexes do not. If a GIN index using path operator collected stats on the frequency of the keys, then you would be all set I think. Perhaps that is a feature that could be implemented. Others will chime in I expect, but I tend to avoid using jsonb for any values that I need to use to significant filter a result set (ON and WHERE clauses).
Re: Exclude constraint on ranges : commutative containment : allow only complete containment
If the new value does not overlap with any existing, allow. If it does overlap, then it must be fully contained by the existing element, or the existing element must be fully contained by it. Else, reject. Is that right?
Re: sensible configuration of max_connections
On Fri, Feb 7, 2020 at 6:29 AM Justin wrote: > WorkMem is the biggest consumer of resources lets say its set to 5 megs > per connection at 1000 connections that 5,000 megs that can be allocated. > Clarification- work_mem is used per operation (sort, hash, etc) and could be many many times with a complicated query, and/or parallel processing enabled. It could be that a single connection uses 10x work_mem or more. https://www.postgresql.org/docs/current/runtime-config-resource.html
Re: Avoiding out of date statistics / planner
It may also be worth noting that it is possible to make autovacuum/analyze more aggressive, perhaps only on the tables that see large changes in data that might result in a statistics issue. If you could share a query, explain analyze output, and pseudo code or at least description of what sort of bulk operations are being done, then more insight could be offered.
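Per-table settings might look like the below (table name and thresholds are only examples)-

ALTER TABLE big_busy_table SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);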
Re: pg_upgrade —link does it remove table bloat
There is more than one type of statistics though. Stats on the distribution of data are easily recreated with an analyze of the table or the whole database. What about the stats on how many rows have been inserted or updated since the last (auto)vacuum, which are used to trigger autovacuum? Are those set back to zero by an upgrade? I would assume usage counts like how many times an index scan has been done would be reset, but if the numbers in pg_stat_user_tables like n_tup_upd or n_tup_del are zero'd out during an upgrade, then it would seem like a manual vacuum would always be a good idea to ensure a table wasn't 99% of the way to needing one and then the stats got reset by upgrading.
Re: Error “cache lookup failed for function”
Given the explicit begin before drop/create, this is a bit of an unexpected gotcha to have any impact on other transactions. Are there other known limitations of DDL in a transaction?
Re: Logical replication lag in seconds
I am very interested in this discussion. We settled on a table with a single timestamp field that a script updates every minute with NOW(). We then check the timestamp of that table on the replica and, assuming the clocks are synced, compute the lag.
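Roughly like this (table name made up; for logical replication the table needs to be part of the publication)-

-- on the primary, created once; a cron job runs the UPDATE every minute
CREATE TABLE replication_heartbeat (ts timestamptz NOT NULL);
INSERT INTO replication_heartbeat VALUES (now());
UPDATE replication_heartbeat SET ts = now();

-- on the replica
SELECT now() - ts AS approximate_lag FROM replication_heartbeat;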
Re: Reset DB stats suggestion pg_stat_reset()
> > We have seen some deadlocks and tempfile count in pg_stat_database view. > We are trying to reset the stats. > Can we use pg_stat_reset() function to reset these stats without any > impact stats of databases. > Please advise the process to reset the stats. > Yes, you could reset stats every month or week or whatever you decide makes sense so you have an understanding of the timeline. Else, knowing that an index has been used X times since who-knows-when is not a very useful piece of information. I have occasionally reset the stats on all indexes on a table when adding or removing an index from that table so that I have an accurate idea of whether index_xyz or index_abc gets used. You can take a snapshot of the stats on the table occasionally and compare prior count to new count if you prefer. I find reset to be cleaner.
Re: Connections dropping while using Postgres backend DB with Ejabberd
work_mem can be used many times per connection given it is per sort, hash, or other operations and as mentioned that can be multiplied if the query is handled with parallel workers. I am guessing the server has 16GB memory total given shared_buffers and effective_cache_size, and a more reasonable work_mem setting might be on the order of 32-64MB. Depending on the type of work being done and how quickly the application releases the db connection once it is done, max connections might be on the order of 4-20x the number of cores I would expect. If more simultaneous users need to be serviced, a connection pooler like pgbouncer or pgpool will allow those connections to be re-used quickly. These numbers are generalizations based on my experience. Others with more experience may have different configurations to recommend. >
Re: information_schema performance in Postgres 12
Have you tried with JIT turned off in PG12? The long running node is a ProjectSet type, which is related to set returning functions. If that is getting evaluated differently in PG12, that may be the issue.
Re: Force WAL cleanup on running instance
I don't know the answer to your stated question. I am curious if you have set wal_level = minimal and if not, if that would be appropriate for your use case and might render your concern a non-issue.
Re: Web users as database users?
On Fri, Sep 20, 2019 at 8:19 AM Tom Lane wrote: > There is a restriction on how many distinct GRANTs you can > issue against any one object --- performance will get bad if the ACL > list gets too large. > Any ballpark numbers here? Are we talking 50 or 8000?
Re: Fwd: PG12 autovac issues
Have you changed fillfactor from the default on this table? If not, I would wonder if reducing it to 50% or even 20% would allow many more HOT updates and thereby reduce bloat. Also, is there any period of lower activity on your system where you could schedule a vacuum freeze daily or weekly? I believe having frozen pages would also mean all the autovacuums would be able to skip more pages and therefore be faster.

> autovacuum_vacuum_cost_delay = 20

This default was changed to 2ms in PG12, so you should most likely reduce that.
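On the fillfactor idea, note that it only applies to newly written pages, so existing pages keep their layout until the table is rewritten (VACUUM FULL, pg_repack, etc)-

ALTER TABLE my_table SET (fillfactor = 50);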
Re: Join help, please
work_type.work_type_key = work_type.work_type_key You've done a cross join.
Re: Join help, please
On Wed, Mar 18, 2020, 12:55 PM Michael Lewis wrote: > work_type.work_type_key = work_type.work_type_key > > You've done a cross join. > You meant to do permitted_work.work_type_key = work_type.work_type_key I expect >
Re: Fwd: PG12 autovac issues
On Thu, Mar 19, 2020 at 9:31 AM Justin King wrote: > On Wed, Mar 18, 2020 at 1:40 PM Michael Lewis wrote: > > > > Do you have default fillfactor set on this table? If not, I would wonder > if reducing it to 50% or even 20% would allow many more HOT updates that > would reduce bloat. > > I don't believe we have a default fillfactor, but I'm still trying to > understand why autovacs would completely stop -- that seems like a > bug. Especially since there was no change between PG10 and PG12 and > this problem never existed there. > Is there any reason to not schedule vacuum freeze for each db daily? Just curious.
Re: Could postgres12 support millions of sequences? (like 10 million)
On Thu, Mar 19, 2020, 5:48 PM David G. Johnston wrote: > However, one other consideration with sequences: do you care that > PostgreSQL will cache/pin (i.e., no release) every single sequence you > touch for the lifetime of the session? (I do not think DISCARD matters here > but I'm just guessing) > Would you expand on this point or is there someplace specific in the documentation on this? >
Re: PG12 autovac issues
> > We haven't isolated *which* table it is blocked on (assuming it is), > but all autovac's cease running until we manually intervene. > > When we get into this state again, is there some other information > (other than what is in pg_stat_statement or pg_stat_activity) that > would be useful for folks here to help understand what is going on? https://www.postgresql.org/docs/current/progress-reporting.html#VACUUM-PROGRESS-REPORTING
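For example-

SELECT a.pid, a.query, p.phase, p.heap_blks_scanned, p.heap_blks_total
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a USING (pid);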
Re: Runtime partition pruning
> > select * from test where id between client_id-10 and client_id+10 and > client_id=?; > > does not (it scans all partitions in parallel) . > Is it expected? > Yes. But the below would work fine I expect since the planner would know a constant range for id. I would be very surprised if the optimizer had some condition rewrite rules to handle just the scenario you show. select * from test where id between ?-10 and ?+10 and client_id=?;
Re: PG 12: Partitioning across a FDW?
Chris, Does it actually need to be a different server and database, or would it be possible to have another storage device added to your existing database and make use of tablespaces to accomplish pseudo-archive of older partitions? Just a thought. >
Re: Out of memory in big transactions after upgrade to 12.2
If you didn't turn it off, you have parallel workers on by default with v12. If work_mem is set high, memory use may be much higher as each node in a complex plan could end up executing in parallel. Also, do you use a connection pooler such as pgbouncer or pgpool? What is max_connections set to? >
Re: Logical replication
There is nothing native to compute this. This was asked a month or so ago. The best the poster came up with was a regular query on master DB which updates a timestamptz field. Assuming master and slave have clocks in sync, it is simple to compute the lag from that. >
Re: GENERATED STORED columns and table rewrites?
Yes, the system will do a full table rewrite to compute the value and store it. Unfortunately, I believe it is an access exclusive lock during that entire time.
Re: Recursive Queries
You don't want recursion, you want pivot table (Excel) behavior to reformat rows into columns. The easiest way to get this data in its raw form would be to group by date and hour of day and compute the count. If you have the option to add extensions in your environment, then you should be able to pivot your data pretty simply. https://www.postgresql.org/docs/12/tablefunc.html
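The raw aggregate might look like the below (table/column names are guesses), and crosstab from tablefunc can then pivot the hours into columns if needed-

SELECT date_trunc('day', event_ts)::date AS day,
       extract(hour FROM event_ts) AS hour_of_day,
       count(*) AS hits
FROM my_log_table
GROUP BY 1, 2
ORDER BY 1, 2;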
Re: possibilities for SQL optimization
> My other thought was to range partition by pixelID + brin index.

I would expect a brin index to be INSTEAD of partitioning. You didn't share buffer hits, which I expect were 100% on the subsequent explain analyze runs, but the index scan may still be faster if the planner knows it only needs to scan a few small indexes on one, or a few, partitions. What sort of growth do you see on this table? Is future scalability a significant concern, or is the problem just that 40-300ms for this select is unacceptable?

https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
Have you tuned effective_io_concurrency? The documentation says "this setting only affects bitmap heap scans" and nearly all the time is there. If it is still set to 1 as default, then increasing to 200 or perhaps more might be prudent when on SSD or other memory-backed storage. You don't even need to change the server config defaults for testing the impact-

set effective_io_concurrency = 200;
/* select query */
reset effective_io_concurrency; /* if doing other things in the same session and wanting to revert to default behavior, else just disconnect */
Re: how to slow down parts of Pg
You say 12.2 is in testing but what are you using now? Have you tuned configs much? Would you be able to implement partitioning such that your deletes become truncates or simply a detaching of the old partition? Generally if you are doing a vacuum full, you perhaps need to tune autovacuum to be more aggressive. Consider pg_repack at least to avoid taking an exclusive lock for the entire duration. If partitioning is not an option, could you delete old records hourly rather than daily? >
Re: how to slow down parts of Pg
Reviewing pg_stat_user_tables will give you an idea of how often autovacuum is cleaning up those tables that "need" that vacuum full on a quarterly basis. You can tune individual tables to have a lower threshold ratio of dead tuples so the system isn't waiting until you have 20% dead rows before vacuuming a table with millions of rows that occupies a GB or more on disk. You might consider changing your nightly analyze to a nightly vacuum analyze, at least for the tables you know can be problematic. The more dense a table is packed, the better cache_hits and other such metrics. Like making dinner, cleanup as you go. One thing that I think is interesting is that the default cost_delay has been updated with PG12 from 20ms down to 2ms such that all things being equal, much much more work is done by autovacuum in a given second. It may be worth taking a look at. Another great thing coming to you in PG12 is the option to do reindex concurrently. Then there's no need for pg_repack on indexes. Good luck sir.
Re: pg11: Query using index, but only for half the partitions
What do the statistics look like for an example table where the index is used vs one where it is not? Is ((instance_id)::text = 'test01'::text) rare for the tables where an index scan is happening and common for the tables where a sequential scan is chosen? How many rows are in these tables generally?
Re: relationship of backend_start, query_start, state_change
If you use a connection pooler, this would likely be expected behavior since the connection is getting reused many times. Else, some app is connected and not closing their connection between queries. At least they aren't idle in transaction though.
Re: Abnormal Growth of Index Size - Index Size 3x large than table size.
Indexes larger than the table may be expected if there are many. It may be prudent to check if they are being used in pg_stat_all_indexes. If there are just a few indexes that are becoming bloated quickly, you'd want to ensure your autovacuum settings are tuned more aggressively, and consider lowering FILLFACTOR on the table to better support heap-only tuple (HOT) updates such that the index isn't touched when other columns are updated in the table. If you are on PG12, you can reindex concurrently, assuming you have the space. If not, you can do the same manually like below-

CREATE INDEX CONCURRENTLY idx_new...
DROP INDEX CONCURRENTLY idx_old...
ALTER INDEX idx_new... RENAME TO idx_old...
Re: Postgres 11.6-2 to 11.7 worth the upgrade?
https://www.postgresql.org/docs/release/11.7/ It doesn't seem like it. Always best to run the most current minor version though.
Re: wal_sender_timeout default
https://www.postgresql.org/docs/11/runtime-config-replication.html >
Re: Abnormal Growth of Index Size - Index Size 3x large than table size.
On Thu, May 7, 2020 at 8:50 AM Ashish Chugh < ashish.ch...@lavainternational.in> wrote > To improve performance and release index space from database, We are > running FULL Vacuum on monthly basis. > > On PostgreSQL website it is not recommended to run FULL Vacuum on > Production Database and this also requires long downtime along with huge > log space requirement. > > > > What are the recommendations regarding vacuum. Can we run FULL Vacuum on > monthly basis or we should be running Online Auto Vacuum instead. > Autovacuum should be tuned to be more aggressive if it is not keeping up. Never turn it off. Decrease scale factor and cost_delay to get more throughput. Perhaps increase the number of workers, particularly if there are multiple databases in the cluster. Be aware that the cost limit is shared among workers so that cost limit may need to be increased when increasing workers or else you will be working on more tables concurrently, but not getting anymore work done in total per minute. Consider customizing parameters on very large tables (100 million rows or more?) to have a smaller scale factor than your new default even. Your goal should be to reach a "steady state" with rows being removed, that space marked as re-usable by autovacuum, and then the new updates/inserts using that space. If you are oscillating between 1GB and 10GB for storing a table as it bloats and then vacuum full is done periodically, then you are doing things wrong. If it hurts to clean up, do it more often and a little at a time. Oh, and an old blog post I read mentioned that autovacuum reserves the full maintenance_work_mem at the start. I don't know if that is changed (fixed) now, but I like to have maintenance_work_mem high for index creation and such, but set autovacuum_work_mem to be lower such that perhaps it has to re-scan some large indexes multiple times to finish its work, but I'm not constantly holding large amounts of memory when doing vacuum on smaller tables.
Re: AutoVacuum and growing transaction XID's
It is trying to do a vacuum freeze. Do you have autovacuum turned off? Any settings changed from default related to autovacuum? https://www.postgresql.org/docs/9.6/routine-vacuuming.html Read 24.1.5. Preventing Transaction ID Wraparound Failures These may also be of help- https://info.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql https://www.2ndquadrant.com/en/blog/managing-freezing/ Note that you need to ensure the server gets caught up, or you risk being locked out to prevent data corruption.
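To see how close the cluster is getting to wraparound, and which tables are responsible-

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY 2 DESC;

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY 2 DESC
LIMIT 20;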
Re: AutoVacuum and growing transaction XID's
autovacuum_naptime being only 5 seconds seems too frequent. A lock_timeout might be 1-5 seconds depending on your system. Usually, DDL can fail and wait a little time rather than lock the table for minutes and have all reads back up behind the DDL. Given you have autovacuum_vacuum_cost_limit set to unlimited (seems very odd), I'm not sure a manual vacuum freeze command on the tables with high age would perform differently. Still, issuing a vacuum freeze and then killing the autovacuum process might be worth trying.
Re: Hash partitioning, what function is used to compute the hash?
The documentation shows it is just a modulus operation. If you partition on object_key % 3 then you will create three partitions for remainder values 0-2 for instance. Afaik, hash partition doesn't have real world expected use cases just yet. List or range is probably what you want to use.
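For what it's worth (PG11+), the bound syntax spells out the modulus and remainder explicitly, and the modulus is applied to a hash of the key; something like-

CREATE TABLE events (
    object_key bigint NOT NULL,
    payload    jsonb
) PARTITION BY HASH (object_key);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 3, REMAINDER 2);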
Re: Hash partitioning, what function is used to compute the hash?
On Mon, May 11, 2020 at 3:13 PM Alvaro Herrera wrote: > On 2020-May-11, Michael Lewis wrote: > > > Afaik, hash partition doesn't have real world expected use cases just > yet. > > I don't think I agree with this assertion. > I didn't mean to be critical at all, or even make a statement of fact. Just sharing my impression. I typically view partitioning from the perspective of multi-tenancy and with the restrictions on primary keys & partition keys, I can't typically use partitioning except for audit logging tables and then range partitions make the most sense there because of doing backups and dropping the oldest data. Perhaps it is just that hash has never been the right tool for my use cases. I'd love to know some real life examples of when hash partitioning was the best option.
Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?
On Thu, May 14, 2020 at 2:20 PM Eduard Rozenberg wrote: > I did verify postgresql.conf has always been properly configured re: > autovacuum: 'autovacuum = on'and 'track_counts = on' > This may be insufficient to keep up if you have large tables. The default scale factor allows for 20% of the rows to be dead before the autovacuum will kick in to mark the space as available for reuse. Assuming you have the I/O capacity and prefer to do a little cleanup more often rather than HUGE cleanup work all at once on rare occasions, it may be ideal to look at turning down the autovacuum_vacuum_scale_factor. You can tweak these settings on large tables only, or increase the autovacuum_vacuum_threshold at the same time to compensate a bit for decreasing the scale factor. You can also look at pg_stat_activity for autovacuums, and if you see that some are running for hours, then probably they are trying to do too much work all at once and waiting too long before tidying up. Also, the default autovacuum_vacuum_cost_delay was changed from 20ms to 2ms with PG12 so that may be worth considering as a best practice even on older versions.
Re: Bug on version 12 ?
Just wonder, have you compared these on the two servers? select * from pg_settings where name = 'DateStyle';