Re: A limit clause can cause a poor index choice

2020-05-19 Thread Michael Lewis
What does pg_stats say about column customer_id? Specifically, how many ndistinct, and what is the sum of the most common values? If you have 1000 distinct customer_id values, and the (default 100) most common values only cover 2% of the total rows, then the optimizer will assume that any given cus

Re: Table partitioning for cloud service?

2020-05-21 Thread Michael Lewis
On Thu, May 21, 2020 at 11:41 AM Adam Brusselback wrote: > As an optimization I just worked on for my database earlier this week, I > decided to logically replicate that table from my main authentication > database into a each cluster, and I replaced all references to the FDW for > read-only quer

Re: Request to help on GIS Query improvement suggestion.

2020-05-22 Thread Michael Lewis
Your indexes and operators are not compatible. You have added a btree index on md5 function result and are not using md5 in your query, and also using LIKE operator not one of the supported ones. I believe it might use a btree operator (plain value, not md5 result) if you are always searching for "

Re: Suggestion to improve query performance of data validation in proc.

2020-05-22 Thread Michael Lewis
On Fri, May 22, 2020 at 2:09 AM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Friday, May 22, 2020, postgann2020 s wrote: > > >> >> We are looking for a better query than "*SELECT 1 FROM >> schema.table_name WHERE column1=structure_id1*" this query for data >> validation. >> > If

Re: Inaccurate (sometimes wildly so) row estimates for simple join

2020-05-22 Thread Michael Lewis
On Fri, May 22, 2020 at 7:27 AM Greg Nolle wrote: > The crux seems to be that test_b does not have an even distribution for > a_id values: it only has records for two of the values in the referenced > table. This is how our real dataset is too and isn’t something we can > really change. > How do

Re: query, probably needs window functions

2020-05-22 Thread Michael Lewis
I believe something like this is what you want. You might be able to do it without a sub-query by comparing the current name value to the lag value and null it out if it's the same. select case when row_number = 1 then id end AS id, case when row_number = 1 then name end as name, phone.number from

Re: Index no longer being used, destroying and recreating it restores use.

2020-06-05 Thread Michael Lewis
Those row estimates are pretty far off. Standard indexes and partial indexes don't get custom statistics created on them, but functional indexes do. I wonder if a small function needs_backup( shouldbebackedup, backupperformed ) and an index created on that function would nicely alleviate the pain.

Re: Index no longer being used, destroying and recreating it restores use.

2020-06-08 Thread Michael Lewis
On Mon, Jun 8, 2020 at 2:34 PM Koen De Groote wrote: > So, this query: > > select * from item where shouldbebackedup=true and > itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order by > filepath asc, id asc limit 100 offset 10400; > > Was made into a function: > > create or repl

Re: Planner misestimation for JOIN with VARCHAR

2020-06-09 Thread Michael Lewis
On Tue, Jun 9, 2020 at 12:34 PM Sebastian Dressler wrote: > - Add an index on top of the whole PK > - Add indexes onto other columns trying to help the JOIN > - Add additional statistics on two related columns > > Another idea I had was to make use of generated columns and hash the PKs > together

Re: Postgres server 12.2 crash with process exited abnormally and possibly corrupted shared memory

2020-06-09 Thread Michael Lewis
On Tue, Jun 9, 2020 at 8:35 AM Ishan Joshi wrote: > I have using postgresql server v12.2 on CentOS Linux release 7.3.1611 > (Core). > > > > My application is working fine with non partition tables but recently we > are trying to adopt partition table on few of application tables. > > So we have

Re: Planner misestimation for JOIN with VARCHAR

2020-06-09 Thread Michael Lewis
> > the join selectivity functions have yet to learn about extended statistics. > That is very interesting to me. So, extended statistics would help to properly estimate the result set coming out of a single table when comparing each of those columns to one or many values, but not when joining up

Re: Logical replication stuck in catchup state

2020-06-09 Thread Michael Lewis
I don't know if it would be relevant to this problem, but you are missing almost 1 full year of bug fixes. 11.4 was released on 20 June last year. Upgrading minor versions asap is recommended. I do see this in the release notes from 11.8 last month ( https://www.postgresql.org/docs/release/11.8/)-

Re: Postgres server 12.2 crash with process exited abnormally and possibly corrupted shared memory

2020-06-10 Thread Michael Lewis
On Wed, Jun 10, 2020 at 12:05 AM Ishan Joshi wrote: > How many rows did these tables have before partitioning? à We starts > test with 0 rows in partition table. > Partitions are far from free and pruning is great but not guaranteed. How many total rows do you currently have or foresee hav

Re: Index no longer being used, destroying and recreating it restores use.

2020-06-15 Thread Michael Lewis
On Tue, Jun 9, 2020 at 6:24 AM Koen De Groote wrote: > Right. In that case, the function I ended up with is this: > > create or replace function still_needs_backup(bool, bool) > returns BOOLEAN as $$ > BEGIN > PERFORM 1 from item where shouldbebackedup=$1 and backupperformed=$2; > IF FO

Re: pg_repack: WARNING: relation must have a primary key or not-null unique keys

2020-06-16 Thread Michael Lewis
On Tue, Jun 16, 2020, 4:52 AM Eugene Pazhitnov wrote: > xbox=> \d herostat >Table "public.herostat" > Indexes: > "herostat_pkey" PRIMARY KEY, btree (xuid, titleid, heroid) INCLUDE > (valfloat) > > WARNING: relation "public.herostat" must have a primary key or not-null > un

Re: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-16 Thread Michael Lewis
On Tue, Jun 16, 2020 at 10:01 AM Jim Hurne wrote: > Other than the increasing elapsed times for the autovacuum, we don't see > any other indication in the logs of a problem (no error messages, etc). > > We're currently using PostgreSQL version 10.10. Our service is JVM-based > and we're using the

Re: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-16 Thread Michael Lewis
On Tue, Jun 16, 2020 at 1:45 PM Jim Hurne wrote: > Thanks Michael, > > Here are our current autovacuum settings: > > autovacuum | on > autovacuum_analyze_scale_factor | 0.1 > autovacuum_analyze_threshold| 50 > autovacuum_freeze_max_age | 2000

Re: Importing a Large .ndjson file

2020-06-18 Thread Michael Lewis
> > I spoke too soon. While this worked fine when there were no indexes > and finished within 10 minutes, with GIN index on the jsonb column, it > is taking hours and still not completing. > It is always recommended to create indexes AFTER loading data. Sometimes it can be faster to drop all index

Re: DISTINCT on jsonb fields and Indexes

2020-06-22 Thread Michael Lewis
On Sun, Jun 21, 2020 at 10:43 PM Sankar P wrote: > I have a table with the schema: > > CREATE TABLE fluent (id BIGSERIAL, record JSONB); > > Then I created a couple of indexes: > 1) CREATE INDEX idx_records ON fluent USING GIN (record); > What about using non-default jsonb_path_ops? > 2) CREA

Re: DISTINCT on jsonb fields and Indexes

2020-06-22 Thread Michael Lewis
> > In the example of "select distinct expression", the planner will never > notice > that that expression has anything to do with an index. > Thanks for that explanation. I assume re-writing as a 'group by' would have no bearing on that planner decision.

Re: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-22 Thread Michael Lewis
On Tue, Jun 16, 2020 at 2:41 PM Michael Lewis wrote: > On Tue, Jun 16, 2020 at 1:45 PM Jim Hurne wrote: > >> Thanks Michael, >> >> Here are our current autovacuum settings: >> >> autovacuum | on >>

Re: n_distinct off by a factor of 1000

2020-06-23 Thread Michael Lewis
> > > > On 23/06/2020 14:42, Klaudie Willis wrote: > > > > > > > I got my first hint of why this problem occurs when I looked at the > > > > statistics. For the column in question, "instrument_ref" the > > > > statistics claimed it to be: > > > > The default_statistics_target=500, and analyze has

Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Michael Lewis
> > >Are you updating *every* row in the table? >> >> No I am using an update like so: UPDATE members SET regdate='2038-01-18' >> WHERE regdate='2020-07-07' >> >> DB=# select count(*) from members where regdate = '2020-07-07'; >> >> count >> >> -- >> >> 17333090 >> >> (1 row) >> >> Just u

Re: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-23 Thread Michael Lewis
On Tue, Jun 23, 2020 at 2:34 PM Jim Hurne wrote: > Sure! Below are more of the details from the same set of logs. Looking at > them myself, I see that there is always some percentage of tuples that are > dead but are not yet removable. And that number increases on every vacuum, > which might expl

Re: Persistent Connections

2020-06-23 Thread Michael Lewis
On Tue, Jun 23, 2020 at 2:29 PM Bee.Lists wrote: > I have an issue with a server (v10) that’s seeing increasing connections > until it’s maxxed-out. > > max_connections for my 4-core server is set to 12. > > I’ve installed pg_stat_activity and pg_stat_statements. > Do you see anything in pg_stat

Re: SQL delete and update at the same time

2020-06-24 Thread Michael Lewis
> > But how can I set the edited vale = 1 on the objects (line segments) that > are not deleted (in the current buffer) at the same time so it won’t be > deleted in the next run with an adjacent buffer? > You might want to create a temporary table to hold unique identifiers of all records that you

Re: n_distinct off by a factor of 1000

2020-06-24 Thread Michael Lewis
On Wed, Jun 24, 2020, 2:35 PM Peter J. Holzer wrote: > Yes, estimating the number of distinct values from a relatively small > sample is hard when you don't know the underlying distribution. It might > be possible to analyze the sample to find the distribution and get a > better estimate. But I'm

Re: n_distinct off by a factor of 1000

2020-06-25 Thread Michael Lewis
On Thu, Jun 25, 2020 at 7:27 AM Pavel Luzanov wrote: > I have tried to increase the statistics target to 5000, and it helps, but > it reduces the error to 100X. Still crazy high. > > > As far as I know, increasing default_statistics_target will not help. [1] > > I have considered these fixes: >

Re: SQL delete and update at the same time

2020-06-25 Thread Michael Lewis
Sorry, I don't know much about postgis at all. I assume you meant to have THEN 1 in your update statement as well. I notice b.fid=l.fid and NOT b.fid=l.fid in the two clauses. How about separate update statements? UPDATE linesegments l set edited = 1 WHERE l.gid IN (SELECT li.gid FROM linesegmen

Re: Logical Replication Issue

2020-06-30 Thread Michael Lewis
Per the release notes, there are some enhancements to logical replication that came after 11.5 like 11.8 particularly related to replication identity full. Do you have a primary key or unique index that is being used for the replication identity?

Re: Different results from identical matviews

2020-07-02 Thread Michael Lewis
Is now() computed at the time the view is defined and not at refresh? If this were a function, I would be more suspicious of that but a plain view, surely not. I hope. >

Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread Michael Lewis
On Mon, Jul 6, 2020 at 5:37 AM Robins Tharakan wrote: > This need came up while reviewing generated SQL, where the need was to > return true when > at least one of two lists had a row. > Generated SQL... yep. That will happen. Manual SQL may be more work, but often has significant reward. If yo

Re: Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread Michael Lewis
Does this give the same result and do the optimization you want? select c1, min(c2) AS c2, min(c3) AS c3, min(c4) AS c4 from t group by c1; >

Re: Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread Michael Lewis
Distinct is a great way to get quick results when writing quick & dirty queries, but I rarely have them perform better than a re-write that avoids the need. It collects a ton of results, orders them, and throws away duplicates in the process. I don't love the idea of that extra work. Did you say yo

Re: Is postgres able to share sorts required by common partition window functions?

2020-07-07 Thread Michael Lewis
On Monday, July 6, 2020, Michael Lewis wrote: > Did you say you have an index on c1? > [...] > I don't know the data, but I assume there may be many rows with the same > c1 value, so then you would likely benefit from getting that distinct set > first like below as your

Re: single table - fighting a seq scan

2020-07-14 Thread Michael Lewis
rows=3832 rows=3870 Your estimate changed very little when you included 100 values vs 200 values. That is interesting to me. What does the below query give you? How many of those 200 values are found in the MCVs list? If n_distinct is low, and most of the values are NOT in the most common value l

Re: Switching Primary Keys to BigInt

2020-07-21 Thread Michael Lewis
Curious- what requires that the unique index be declared a primary key? What advantage does that give you? Just ensuring it isn't null? Side note- EOL for 9.6 is coming next year so just a plug for upgrading when possible, perhaps utilizing pglogical to get to v11 or 12. >

Re: Switching Primary Keys to BigInt

2020-07-22 Thread Michael Lewis
On Wed, Jul 22, 2020 at 12:23 PM Mohamed Wael Khobalatte < mkhobala...@grubhub.com> wrote: > No worries. I suppose the answer to the original question, which is how to > avoid a table scan when adding a primary key constraint to a newly > backfilled column is "there is no way"? Downtime might be a

Re: How does vacuum works in postgresql

2020-07-27 Thread Michael Lewis
Your question is a bit vague, and your referenced article is quite old. It doesn't reference pg_stat_progress_vacuum* even despite 9.6 being out for a while before the article. What version are you using and what questions or problems do you have that you think are related to vacuum? *See below ht

Re: Format generation_expression

2020-07-27 Thread Michael Lewis
Are you caching the definition some other place in the application stack and checking it later to compare? If so, I would likely alter the table and in the same transaction read the definition as recorded in generation_expression from information_schema.columns. Seems simple enough, but not sure of

Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Michael Lewis
On Tue, Jul 28, 2020 at 7:59 AM Shaozhong SHI wrote: > Hi, Gavin, > > PostgreSQL Version - 10.11. PostGIS Version - 2.4.7. Operating System - > Red Hat Enterprise Linux 7.7 . > > That is all I know at the moment. > > As I understand, our IT staff is building another one to sit on Azure. > Please

Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Michael Lewis
> > Many thanks. Is this way correct? I am learning what is meant by 'top > posting'. > Yes. On the subject of your settings, I don't see anything for work_mem, random_page_cost and other commonly tuned parameters. That would be a good start. What sort of machine specs are there for Postgres in

Re: Apparent missed query optimization with self-join and inner grouping

2020-08-01 Thread Michael Lewis
Subqueries are evaluated separately when they include an aggregate or window function as I understand it. I agree that it would be helpful in your case if that outside qual was pushed inside. I assume this query is generated by an ORM and you don't just have the option to simply change it?

Re: How to rebuild index efficiently

2020-08-03 Thread Michael Lewis
> > creating another index concurrently is taking lot of time > Could you increase maintenance_work_mem significantly or is that already quite high?

Re: UUID or auto-increment

2020-08-10 Thread Michael Lewis
UUID are also random and not correlated with time typically, so with a very large table when accessing primarily recent data, hitting an index on a big table will pull random pages into memory instead of primarily the end of the index.

Re: Sizing PostgreSQL VM server sizing

2020-08-12 Thread Michael Lewis
On Tue, Aug 11, 2020 at 10:19 AM Samarendra Sahoo < sahoo.samaren...@gmail.com> wrote: > Dear all, > We are getting ready to install in production and would like to know what > are key considerations and how do we use them to provision VMs for the same? > It is going to be highly dependent on how

Re: hash joins are causing no space left error

2020-08-13 Thread Michael Lewis
On Wed, Aug 12, 2020 at 5:52 PM Ayub M wrote: > > This is PostgreSQL 11 on AWS, there is a mview query in this OLAP > database, the tables involved are huge - 50-100m records on average records > hundreds of columns in most cases. > How many tables and how many partitions each? Can you share an E

Re: Case insensitive query on existing data

2020-08-13 Thread Michael Lewis
create index concurrently lower( jsonb->>'name' ), drop old_index concurrently, ensure that where/on/group by conditions use lower( jsonb->>'name' ), then take lunch. What's your concern with this process? >

Re: Window functions speed

2020-08-19 Thread Michael Lewis
Where's the query? Are you able to run explain analyze so we can see estimates vs actual counts? What version are you using? Can you share explain analyze before and after the window function? I mean, that limit at the top makes me think it is doing a lot less work without the window function, vs

Misestimate when applying condition like id = id

2020-08-20 Thread Michael Lewis
where this estimate change comes from. *Michael Lewis | Database Engineer* *Entrata*

Re: Misestimate when applying condition like id = id

2020-08-20 Thread Michael Lewis
Thanks very much. I tested with a much bigger table and it seems that it is using the default .005 selectivity for this integer column. Cool. Nice to understand even if it was just a typo that had things going off the rails. And thanks for your work 3 years ago for that commit.

Re: Query plan prefers hash join when nested loop is much faster

2020-08-21 Thread Michael Lewis
Your system is preferring sequential scan to using test_result_module_result_id_idx in this case. What type of storage do you use, what type of cache hits do you expect, and what do you have random_page_cost set to? That comes to mind as a significant factor in choosing index scans based on costs.

Re: How bad is using queries with thousands of values for operators IN or ANY?

2020-09-01 Thread Michael Lewis
On Tue, Sep 1, 2020 at 1:22 AM Kyotaro Horiguchi wrote: > FWIW, the attached is the dusted-off version of a part of a stalled > development of mine, which unconditionally(!) creates on-the-fly > statistics on VALUES list. It seems to work for certain cases, > although the planning time increa

Re: How to enumerate partitions from a window function?

2020-09-03 Thread Michael Lewis
It seems like you are maybe wanting this- If the previous row is the same, then get the previous row's run_nr. If it is different, then increment. case when lag( property_A ) over() = property_A and lag( property_B ) over() = property_B then coalesce( lag( run_nr ) over(), 1 ) else lag( run_nr ) o

Re: Bitmap scan seem like such a strange choice when "limit 1"

2020-09-04 Thread Michael Lewis
Index Scan using bigtable_y2020_cars_ref_idx on bigtable_y2020 bigtable (cost=0.57..2966738.51 rows=2873818 width=636) (actual time=0.026..0.026 rows=1 loops=1) Given the system expects to get almost 3 million rows when it should be just 1, it seems like a stats problem to me. How is ndistinct on

Re: Dirty buffers with suppress_redundant_updates_trigger

2020-09-14 Thread Michael Lewis
Just curious, are you doing this in a trigger or in your application code? Either way, I'd think you could use the table record type to compare the temp vs real table values as an entire unit. with cte_indexes as( select * from pg_indexes limit 10 ) select i1.indexdef, i2.tablename from cte_indexe

Re: temp table same name real table

2020-09-30 Thread Michael Lewis
On Wed, Sep 30, 2020, 7:27 AM v.br...@joinsnc.com wrote: > > Hi all, > I have this strange behavior when I use temp table with same name of a > permanent table in a function. > IMO, you are asking for strange behavior when you overload a table name. I expect that one of the people who work down

Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)

2020-09-30 Thread Michael Lewis
On Wed, Sep 30, 2020 at 3:41 PM Adrian Klaver wrote: > On 9/30/20 2:30 PM, Adam Sjøgren wrote: > > Adrian writes: > > > >> I don't have an answer. Not even sure if this is relevant to the > >> problem, but how are the jobs getting into the queue? > > > > Plain INSERTs - often a lot at the same ti

Re: Writing WAL files

2020-10-05 Thread Michael Lewis
> I suggest that in PG12 you can monitor the >> "lag" of a standby server more directly by looking at columns write_lag, >> flush_lag, replay_lag in the pg_stat_replication view. > > > And are those things updated when there are no changes to the master > database? > If so, can anyone make the case

Re: How to update a table with the result of deleting rows in another table

2020-10-06 Thread Michael Lewis
> > Adding the customer id to your returning clause and using update..from > could help: > > with data as ( > delete from orders > where customer_id = > returning customer_id, price > ), total as ( > select customer_id, sum(price) as total_price > from data

Re: Does the work made by vaccum in the current pass is lost when interrupted?

2020-10-09 Thread Michael Lewis
On Thu, Oct 8, 2020, 1:16 PM Jean-Marc Lessard < jean-marc.less...@ultra-ft.com> wrote: > I have a large table (billions of records) which has not been vacuum and > bloated. > > Vacuum scale factor was left at the default. > > > > I ran a vacuum on a DEV system and it makes several passes (scannin

Re: pgbouncer installation example (Step by step)

2020-10-14 Thread Michael Lewis
On Wed, Oct 14, 2020 at 11:08 AM Atul Kumar wrote: > Hi Team, > > Please share a clean example of installing, https://www.pgbouncer.org/install.html > configuring https://www.pgbouncer.org/config.html > and testing pgBouncer. > https://www.pgbouncer.org/usage.html What else do you want to

Re: json_to_recordset() and CTE performance

2020-10-21 Thread Michael Lewis
Version? What is the value for work_mem and other configs that are non-default? I see some estimates that are rather off like - -> Nested Loop (cost=0.26..4.76 rows=100 width=148) (actual time=183.906..388716.550 rows=8935 loops=1) Buffers: shared hit=53877 dirtied=

Re: Bad planner performance for tables with empty tuples when using JIT

2020-10-27 Thread Michael Lewis
On Tue, Oct 27, 2020 at 6:14 AM Jurrie Overgoor < postgresql-mailingl...@jurr.org> wrote: > - I could turn off JIT in the server config, but I'd like to use the JIT feature where it's appropriate! I would do this, until PG14 and you can verify it works in most cases for you. 9.6 to 13 is alread

Re: Large index

2020-11-05 Thread Michael Lewis
On Thu, Nov 5, 2020, 6:52 AM Yambu wrote: > What disadvantage does a large table (30mil records) has over a small > table about 20k records when it comes to querying using an indexed column? > Table with 20k rows will likely fit entirely into shared_buffers and not involve any disk i/o for its u

Re: Temporary tables usage in functions

2020-11-09 Thread Michael Lewis
> > Also may I know if excessive use of temporary tables may cause locks? >> > > Usually there are no problems with locks, but there is a problem with > system tables bloating. Creating and dropping temp tables is expensive like > creating or dropping normal tables. > Dropping a real table require

Re: Execution plan does not use index

2020-11-09 Thread Michael Lewis
On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens wrote: > Adding the tzn.utc_offset results in the fact that the execution plan no > longer considers to use the index on the measurement_value table. Is there > any way the SQL can be rewritten so that the index is used? Or any other > solution so that

Re: Execution plan does not use index

2020-11-10 Thread Michael Lewis
On Tue, Nov 10, 2020 at 1:25 AM Peter Coppens wrote: > Triggered by Michael mentioning subqueries I ended up trying > > explain > select d.short_id,mv.timestamp ,mv.I64_01 > from device d, device_configuration dc, measurement_value mv > where mv.device_id=d.short_id and dc.device_id = d.id

Re: Execution plan does not use index

2020-11-10 Thread Michael Lewis
On Tue, Nov 10, 2020, 3:24 PM Peter Coppens wrote: > Index is not used for the subquery > > explain > select mv_inner.* > from measurement_value AS mv_inner > where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day' > and mv_inner.timestamp < '2020-11-07'::timestamp + interval '1

Re: Execution plan does not use index

2020-11-11 Thread Michael Lewis
On Tue, Nov 10, 2020, 10:51 PM Peter Coppens wrote: > If you disable sequential scan, does it choose the index and what cost > does it show? > > > It chooses the index, but apparently to create some intermediate structure > that then later still needs to be joined on the device_id. Probably > req

Re: Execution plan does not use index

2020-11-11 Thread Michael Lewis
On Wed, Nov 11, 2020, 7:30 AM Peter Coppens wrote: > > > It seems odd to me to not do any basic adjustment of random_page_cost > though. It isn't a magic number that the core team know to be perfect. It > is a baseline that is likely to be quite different for each use case and > server config. Wh

Re: identify partitioning columns and best practices of partitioning in prod enviornments

2020-11-11 Thread Michael Lewis
On Wed, Nov 11, 2020 at 3:58 PM Ron wrote: > On 11/11/20 4:31 PM, Atul Kumar wrote: > > Hi, > > > > I want to about best practices of partitioning in prod environments > > and how to identify partitioning columns. > > It depends on what you want to do. If your purpose is to simplify the > deleti

Re: Is it possible to write a generic UPSERT?

2020-11-12 Thread Michael Lewis
On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer wrote: > I can see how "ON CONFLICT" is very powerful. But that power seems > often a burden for us. We would prefer something that is less manual > effort for the specific use case. Basically, we would like: > INSERT if not exist, and > UP

Re: Encryption with customer provided key in a multi tenant Postgres JSONB DB

2020-11-12 Thread Michael Lewis
On Wed, Nov 11, 2020 at 10:49 PM Saurav Sarkar wrote: > We have a multi tenant application where for each tenant we create > separate tables . So for e.g. if i have 100 tenants then i have 100 tables. > Depending how many tables each client gets and what you expect your tenant growth may be like

Re: INSERT Trigger to check for existing records

2020-11-22 Thread Michael Lewis
If you can modify your insert statement, and live with an extra column in the data, no trigger is needed as best I can figure. Create a unique index over the existing columns, add a "created_on" field and call insert on conflict (unique index) do nothing. This should give the behavior you want.

Re: ERROR : invalid transaction termination : PostgreSQL v12

2020-11-23 Thread Michael Lewis
On Mon, Nov 23, 2020 at 10:03 AM Jagmohan Kaintura wrote: > Hi , > The Block is only failing immediately at First COMMIT only. It's not > supporting COMMIT. I have removed some portion of code before the second > COMMIT. > Please don't top-post on the Postgres lists by the way (reply with all p

Re: ERROR : invalid transaction termination : PostgreSQL v12

2020-11-23 Thread Michael Lewis
On Mon, Nov 23, 2020 at 6:52 PM Jagmohan Kaintura wrote: > It doesn't works putting that block inside additional BEGIN END > > CREATE OR REPLACE PROCEDURE TEST_TRANSACTION( > ) > LANGUAGE 'plpgsql' > SECURITY DEFINER > AS $BODY$ > DECLARE > G_LAST_UPDATE_USER_SYSTEM VARCHAR2(6) :=

Re: Number of parallel workers chosen by the optimizer for parallel append

2020-11-25 Thread Michael Lewis
What have you tried? Changing the relevant cost parameters I assume? Nothing else going on that may be taking up those workers, right? https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-PARALLEL-SETUP-COST

Re: concurrent re-partitioning of declarative partitioned tables

2020-12-02 Thread Michael Lewis
You can not have overlapping partitions that are both attached. Why do you want to merge partitions that you are "done with" instead of just leaving them partitioned by day? Why are you partitioning at all? Are you confident that you need partitions for performance & that the trade-offs are worth

Re: Alter the column data type of the large data volume table.

2020-12-03 Thread Michael Lewis
On Wed, Dec 2, 2020 at 11:53 PM charles meng wrote: > Hi all, > > I have a table with 1.6 billion records. The data type of the primary key > column is incorrectly used as integer. I need to replace the type of the > column with bigint. Is there any ideas for this? > > Solutions that have been tr

Re: Alter the column data type of the large data volume table.

2020-12-03 Thread Michael Lewis
On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard wrote: > On Thu, 3 Dec 2020, Michael Lewis wrote: > > > On Wed, Dec 2, 2020 at 11:53 PM charles meng wrote: > > >> I have a table with 1.6 billion records. The data type of the primary > key > >> column is in

Re: Alter the column data type of the large data volume table.

2020-12-03 Thread Michael Lewis
On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard wrote: > Tell me, please, why > > ALTER TABLE ALTER COLUMN SET DATA TYPE BIGINT > > will not do the job? > > I've found some varchar columns in a couple of tables too small and used > the > above to increase their size. Worked perfectly. > Somethin

Re: Alter the column data type of the large data volume table.

2020-12-04 Thread Michael Lewis
On Fri, Dec 4, 2020 at 9:04 AM Kevin Brannen wrote: > *>From:* Olivier Gautherot > > >>5) If you're brave enough, convert your current table as a partition > (rename it to something like table_hist), duplicate the table model under > the same name as now (adjusting the primary key type) and set

Extended statistics for correlated columns, row estimates when values are not in MCVs list

2020-12-04 Thread Michael Lewis
On version 12.5, I have a query similar to the below where I am getting rather unfortunate row estimates and a sub-optimal index choice as a result. SELECT id FROM messages WHERE client_id = 1234 and site_id = 889977 and message_type_id in ( 59, 62, 102, 162 ) and sent_on > NOW() - INT

Re: User input to queries

2020-12-09 Thread Michael Lewis
On Wed, Dec 9, 2020 at 10:04 AM Rich Shepard wrote: > While I develop the application's GUI I use the database from the command > line (psql). While some queries are self-contained others need user input. > I've not found a search term that locates this information in the > postgres-12 user manua

Re: Performance penalty during logical postgres replication

2020-12-09 Thread Michael Lewis
On Wed, Dec 9, 2020 at 2:21 AM Lars Vonk wrote: > Hi, > > We are doing a logical postgres replication from Postgres 11 to 12. Our > database is around 700GB (8 cpu's, 32 GB). > During the replication process, at some point, we see a huge performance > penalty on a particular table. This table act

Re: User input to queries

2020-12-09 Thread Michael Lewis
What application is taking the user input and needs to include the parameters in the query string?

Re: Extended statistics for correlated columns, row estimates when values are not in MCVs list

2020-12-10 Thread Michael Lewis
On Mon, Dec 7, 2020 at 8:31 AM Tomas Vondra wrote: > What you might try is defining the statistics with only the functional > dependencies. That should consider the column-level correlation even > when the combination of values is not in the MCV. It might make the > "good" estimate worse, but tha

Mitigating impact of long running read-only queries

2020-12-14 Thread Michael Lewis
https://www.postgresql.org/docs/current/sql-set-transaction.html "The DEFERRABLE transaction property has no effect unless the transaction is also SERIALIZABLE and READ ONLY. When all three of these properties are selected for a transaction, the transaction may block when first acquiring its snaps

Re: Autovacuum not functioning for large tables but it is working for few other small tables.

2020-12-16 Thread Michael Lewis
Absolutely check the logs, or do a manual vacuum verbose with setting cost delay and cost limit (and maintenance work mem) the same as the values for auto vacuum runs. It should work out the same and you could time it for a period when the system is more lightly used it applicable. If you have man

Re: Unexpected result count from update statement on partitioned table

2020-12-18 Thread Michael Lewis
On Fri, Dec 18, 2020 at 12:16 PM Tom Lane wrote: > Laurenz Albe writes: > > The subquery is executed twice, and the two executions obviously don't > > return the same results. I am at a loss for an explanation ... > > Yeah, this is a fairly fundamental shortcoming in inheritance_planner(): > it

Re: Is there a good discussion of optimizations?

2020-12-28 Thread Michael Lewis
On Wed, Dec 23, 2020 at 6:56 PM Guyren Howe wrote: > I’d like to put together a good video and writeup about what the… > philosophy behind relational databases is. > > Most folks, in my experience, who use relational databases don’t really > understand the basic theory or even more important the

Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

2020-12-31 Thread Michael Lewis
Why is your fillfactor so low? That seems pretty crazy, especially for a table with only 4 columns that are fixed width. 100 million rows with so little data in each row is not very much at all. You should be looking to other solutions before partitioning I expect. Perhaps a silly question, but do

Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

2020-12-31 Thread Michael Lewis
On Thu, Dec 31, 2020 at 11:18 AM Thorsten Schöning wrote: > Guten Tag Michael Lewis, > am Donnerstag, 31. Dezember 2020 um 18:20 schrieben Sie: > > > Why is your fillfactor so low?[...] > > I've just copied what my GUI-tool pgModeler generated as SQL right > now

Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

2020-12-31 Thread Michael Lewis
My apologies. You are correct. My brain may have already switched to holiday mode. Hopefully others will chime in shortly.

Re: Keep needing to run manual analyze

2021-01-06 Thread Michael Lewis
On Wed, Jan 6, 2021 at 10:29 AM Rob Northcott wrote: > We have an application that uses a Postgres database (currently 9.6). All > the autovacuum/analyze settings are just left at the defaults. > > We’ve had a few instances recently where users have complained of slow > performance and running a

Re: Suggestion: provide a "TRUNCATE PARTITION" command

2021-01-08 Thread Michael Lewis
On Fri, Jan 8, 2021 at 2:36 AM Thomas Kellerer wrote: > Hello, > > I wonder if it made sense to add a "TRUNCATE PARTITION" command to > Postgres? > > Especially during bulk loads it's more efficient to TRUNCATE a partition > if I know I want to replace all rows, rather than doing a DELETE. > > Cu

Re: Suggestion: provide a "TRUNCATE PARTITION" command

2021-01-08 Thread Michael Lewis
On Fri, Jan 8, 2021 at 9:38 AM Thomas Kellerer wrote: > Michael Lewis schrieb am 08.01.2021 um 16:32: > > On Fri, Jan 8, 2021 at 2:36 AM Thomas Kellerer sham...@gmx.net>> wrote: > > > > Hello, > > > > I wonder if it made sense to add a &qu

Re: Suggestion: provide a "TRUNCATE PARTITION" command

2021-01-08 Thread Michael Lewis
On Fri, Jan 8, 2021 at 10:12 AM Thomas Kellerer wrote: > Michael Lewis schrieb am 08.01.2021 um 17:47: > > > For me, it seems too easily error prone such that a single typo in > > > the IN clause may result in an entire partition being removed that > >

Re: Define hash partition for certain column values

2021-01-12 Thread Michael Lewis
On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна wrote: > List or range partitioning isn't suitable for my case. > I am using a column of numeric(20) type as a base for partitioning. The > values of the column are generated randomly. > So there will be too many partitions if I use list partitioning a

<    1   2   3   4   >