RE: Query unable to utilize index without typecast to fixed length character

2023-04-08 Thread msalais
Hi,

 

Your error is the use of quotes around the numeric constant!

You should not quote it, because that turns it into a character constant and causes an implicit conversion.

We must treat any implicit conversion in our queries as a potential problem, and we must absolutely avoid implicit conversions.

 

Best regards

 

Michel SALAIS

Consultant Oracle, PostgreSQL

From: ahi
Sent: Friday, April 7, 2023 09:09
To: Tom Lane
Cc: [email protected]
Subject: Re: Query unable to utilize index without typecast to fixed length character

 

You are right that we should move from character(N) to text; however, the explicit typecast is also required for the numeric column, not just the character one.
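
For context, the kind of predicates this thread is about would look roughly like this (a sketch against the marketplace_sale schema quoted below; the actual query and values are not shown in the thread, and the literals here are placeholders):

-- quoted, untyped constants for both columns (the situation being discussed):
SELECT * FROM marketplace_sale
WHERE contract_address = '0xabc...' AND token_id = '42';

-- the same predicates with explicit casts to the column types:
SELECT * FROM marketplace_sale
WHERE contract_address = '0xabc...'::character(42)
  AND token_id = '42'::numeric;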

 

On Thu, Apr 6, 2023 at 4:50 PM Tom Lane <[email protected]> wrote:

ahi <[email protected]> writes:
> CREATE TABLE public.marketplace_sale (
> log_index integer NOT NULL,
> created_at timestamp with time zone DEFAULT now() NOT NULL,
> updated_at timestamp with time zone DEFAULT now() NOT NULL,
> block_timestamp timestamp with time zone NOT NULL,
> block bigint NOT NULL,
> contract_address character(42) NOT NULL,
> buyer_address character(42) NOT NULL,
> seller_address character(42) NOT NULL,
> transaction_hash character(66) NOT NULL,
> quantity numeric NOT NULL,
> token_id numeric NOT NULL,
  ...

Type character(N) is a hangover from the days of punched cards.
Don't use it.  It has weird semantics concerning trailing spaces,
which are almost never the behavior you actually want, and cause
interoperability issues with type text.  (Text is Postgres' native
string type, meaning that unlabeled string constants will tend to
get resolved to that.)

regards, tom lane
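
A quick illustration of the trailing-space behaviour mentioned above, following the documented semantics of character(N) (expected results noted in comments):

SELECT 'abc'::character(5) = 'abc  '::character(5);  -- true: trailing spaces are ignored when comparing character values
SELECT 'abc  '::character(5)::text;                  -- 'abc': trailing spaces are removed when casting to text
SELECT 'abc'::character(5) || '<';                   -- 'abc<': the padding disappears as soon as text operators get involved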



RE: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT

2023-06-18 Thread msalais
Hi,

Do you really need to do “select *”?

In other words, is it necessary to have all columns in the result?

 

Michel SALAIS

 

From: benoit
Sent: Monday, June 12, 2023 23:35
To: Chris Hoover
Cc: [email protected]
Subject: RE: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT

 

The new index is used, but the read is still 230 MB.

 

https://explain.dalibo.com/plan/b0f28a9e8a136afd

 


From: Chris Hoover <[email protected]>
Sent: Monday, June 12, 2023 22:55
To: benoit
Cc: [email protected]
Subject: Re: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT

 

I normally create my indexes to match the WHERE clause of the query. While technically it should not matter, I find that a lot of the time it does.

 

I would create an index on (status, sender_reference, sent_at) and see if that improves your query performance.
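
A sketch of such an index against the docs table from the query below (the index name is made up here; sent_at could also be declared DESC to match the ORDER BY):

CREATE INDEX docs_status_sender_sent_at_idx
    ON docs (status, sender_reference, sent_at);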

 


 

SELECT * FROM docs
WHERE status IN ('draft', 'sent')
  AND sender_reference IN ('Custom/1175', 'Client/362', 'Custom/280')
ORDER BY sent_at DESC

 

 

 

Thanks,

 

 

Chris Hoover

Senior DBA

AWeber.com

Cell: (803) 528-2269

Email: [email protected]  

 

 





On Jun 12, 2023, at 4:17 PM, benoit <[email protected]> wrote:

 

Hello

 

I have a database with a few 60 GB tables. Table rows are requested with multiple ANY or IN operators. I am not able to find an easy way to make the DB use indexes efficiently. I often hit the index, but see a spike of 200 MB of IO or disk reads.

 

I am using version 13, but will move to 14 soon.

 

I wrote a reproduction script on version 14 with plans included.
https://gist.github.com/benoittgt/ab72dc4cfedea2a0c6a5ee809d16e04d

 

I also have plans on a snapshot of the DB with real data.

- The current query that I am trying to improve:
https://explain.dalibo.com/plan/8b8f6e0he9feb551

  - I added the DB schema + indexes in the query view. As you can see, I have many indexes for testing purposes, to try out what the planner can do.

- The optimized query when I have only one ANY and migrate to UNION ALL for each parameter of the ANY operator:
https://explain.dalibo.com/plan/427gg053d07328ga . The query is as fast as I would like, but it means generating several sub-queries to merge in order to get a fast result (see the sketch after this list).

- The new issue appears when I add another ANY operator to the previously optimized query: big IO/reads again.
https://explain.dalibo.com/plan/e7ha9g637b4eh946
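
For reference, the UNION ALL rewrite described in the second point looks roughly like this (a sketch using the docs table and columns from the query quoted above; the real query and LIMIT are in the linked plans):

(SELECT * FROM docs
 WHERE status = 'sent' AND sender_reference = 'Custom/1175'
 ORDER BY sent_at DESC LIMIT 20)
UNION ALL
(SELECT * FROM docs
 WHERE status = 'sent' AND sender_reference = 'Client/362'
 ORDER BY sent_at DESC LIMIT 20)
ORDER BY sent_at DESC
LIMIT 20;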

 

It seems quite impractical to generate a query for every parameter and then merge the results. I sometimes have 3-4 ANY operators with up to 15 elements in each array.

 

Am I misusing my indexes?

Is there a limitation when using ANY or IN operators followed by an ordered LIMIT?

 

Thanks a lot

 



RE: CTE Inline On TPC-DS Query 95

2024-12-27 Thread msalais
Hi,

The choice between inlining and materializing is not a question of syntax. It matters whether the parent query references the CTE more than once, but that is not a decisive factor...

Sometimes one is better and sometimes the other.
In Oracle, the choice is made by cost; it should be the same for PostgreSQL.
In essence, it is the same question as for views: inlining or materializing...

Michel SALAIS


-----Original Message-----
From: David Rowley
Sent: Tuesday, November 26, 2024 05:03
To: Ba Jinsheng
Cc: Andrei Lepikhov; [email protected]
Subject: Re: CTE Inline On TPC-DS Query 95

On Tue, 26 Nov 2024 at 10:03, Ba Jinsheng  wrote:
> If we specify ``NOT MATERIALIZED`` for the query, the execution time is 
> reduced from 1min to 1s due to the usage of inline CTE. It seems expected as 
> described in the PostgreSQL documentation. However, from the code: 
> https://github.com/postgres/postgres/blob/REL_17_STABLE/src/backend/optimizer/plan/subselect.c#L939,
>  I understand that this query does not contain volatile functions and 
> includes simple predicates. I am wondering whether we can relax this 
> condition checking, to enable inline CTE for such queries? Because it seems 
> the performance benefit is huge on a standard benchmark.

The relevant part of the documentation [1] seems to be "By default, this 
happens if the parent query references the WITH query just once, but not if it 
references the WITH query more than once."
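
For reference, the per-query override mentioned in the quoted question uses the MATERIALIZED / NOT MATERIALIZED keywords on the CTE; a minimal runnable sketch (the generated table and predicate are purely illustrative):

-- Force the CTE to be inlined into the outer query:
WITH big AS NOT MATERIALIZED (
    SELECT id FROM generate_series(1, 1000000) AS t(id)
)
SELECT count(*) FROM big WHERE id % 10 = 0;

-- Force the CTE to be evaluated once and stored:
WITH big AS MATERIALIZED (
    SELECT id FROM generate_series(1, 1000000) AS t(id)
)
SELECT count(*) FROM big WHERE id % 10 = 0;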

If you're proposing that we change the heuristics for when to inline CTEs, then 
I suggest you go and check the archives for why we made this decision. You'll 
need to prove to us why your newly proposed heuristic is better than what we 
have. That's going to take more than one example query to show that your 
heuristics are better.

However, just to save you from wasting any time, I highly doubt an adjustment 
to the CTE inlining heuristics alone is enough.  Just imagine the CTE 
evaluation performed a Seq Scan on a 1 billion row table and found 1 matching 
row. Let's assume the remaining part of the query is very cheap. It's obviously 
going to be better to materialise the CTE so that we perform the Seq Scan once 
rather than twice (or however many times the CTE is referenced). Adjusting the 
heuristics here is just going to make some queries faster at the expense of 
making other queries slower. That's never going to pass community standards.

If you truly wanted to improve things here, the heuristics would need to be 
swapped out with a cost-based decision. I can tell you now, that doing that is 
a very difficult task as it basically requires performing the join search and 
quite a large portion of planning once for each combination of CTE inlined vs 
not-inlined. If you had a query with a dozen or so CTEs, that's going to be a 
very costly thing to plan.

David

[1] https://www.postgresql.org/docs/current/queries-with.html






RE: Reg. Postgres Unique constraint

2024-12-27 Thread msalais
Hi,

 

Does the following query return anything:

 

SELECT user_name, count(*)
FROM your_table
GROUP BY user_name
HAVING count(*) > 1;

 

Best regards

 

Michel SALAIS

 

From: Sajjad Abid
Sent: Wednesday, November 27, 2024 05:28
To: Diomedes Dominguez
Cc: [email protected]
Subject: Re: Reg. Postgres Unique constraint

 

Version of Postgres is 12.9

 

On Wed, Nov 27, 2024 at 9:49 AM Diomedes Dominguez <[email protected]> wrote:

Which version of Postgres are you using? 

 

On Wed, Nov 27, 2024, 00:14, Sajjad Abid <[email protected]> wrote:

A column is character varying(50), and a unique constraint is defined on it, but I found duplicate values in this column for some records:

  user_name character varying(50) 

  CONSTRAINT  user_name_unq   UNIQUE ( user_name )

 

Kindly help me to resolve this issue.

 

-- 

Thanks & Regards, 

Sajjad Abid




 

-- 

Thanks & Regards, 

Sajjad Abid



RE: Vacuum Questions

2025-05-03 Thread msalais
Hi

 

It is not your question, but for such situations you should consider using partitioning.

And closer to your question: I would not disable autovacuum, but it should not be left running with the default settings for such a table.
 

Best regards

 

Michel SALAIS

From: Leo
Sent: Friday, May 2, 2025 16:23
To: [email protected]
Subject: Vacuum Questions

 

I have been working on AWS PostgreSQL RDS for a few years, but I am still not very experienced when it comes to performance issues. Also, RDS is slightly different from pure PostgreSQL.

 

I am trying to comprehend exactly how vacuum works.

 

Here is what I am trying to do.

 

I am purging old records from a table (500 million rows, but I am doing it in 
sets of  50,000,000 with a smaller loop of 100,000).  That works just fine.
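
A rough sketch of the kind of batched purge being described (the table name is taken from the vacuum output below, but the date column and retention predicate are assumptions, not the actual script):

DELETE FROM pc_workflowlog
WHERE ctid IN (
    SELECT ctid
    FROM pc_workflowlog
    WHERE log_date < now() - interval '6 months'  -- hypothetical column
    LIMIT 100000
);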

 

Because of the amount of data/rows deleted, I disabled autovacuum for this table (I want to have control over vacuum; autovacuum does not complete anyway due to the timeout, sizing, etc. settings that I do not want to change system-wide). I will re-enable autovacuum once I am done, of course.
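
The per-table switch being described would look something like this (the exact statements used are not shown in the thread):

ALTER TABLE pc_workflowlog SET (autovacuum_enabled = off);
-- ... purge and manual VACUUM ...
ALTER TABLE pc_workflowlog RESET (autovacuum_enabled);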

 

The issue is when I start vacuuming.  This table has 4 indexes and a PK that I 
worry about.  The PK takes about 30 minutes to vacuum and two of the indexes 
take about an hour each.  The problem comes in for the other 2 indexes - they 
take 12+ hours each to vacuum:

 

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  vacuuming "public.pc_workflowlog"

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  launched 4 parallel vacuum 
workers for index vacuuming (planned: 4)

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"pc_workflowlog_pk" to remove 5000 row versions

DETAIL:  CPU: user: 191.03 s, system: 12.43 s, elapsed: 1711.22 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_u_publicid_g6uqp9lkn6e8" to remove 5000 row versions

DETAIL:  CPU: user: 325.75 s, system: 19.75 s, elapsed: 2674.24 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_n_workflow_2tc9k2hdtry9v" to remove 5000 row versions

DETAIL:  CPU: user: 312.17 s, system: 16.94 s, elapsed: 3097.88 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_n_frozenseti_2kjkbjgf3c6ro" to remove 5000 row versions

DETAIL:  CPU: user: 41187.70 s, system: 216.14 s, elapsed: 42749.36 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index 
"workflowlo_n_userid_14kqw6qdsnndw" to remove 5000 row versions 

DETAIL:  CPU: user: 41280.66 s, system: 216.67 s, elapsed: 42832.16 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  "pc_workflowlog": removed 
5000 row versions in 1129870 pages

 

I've increased max_parallel_maintenance_workers to 8 for the session, and it used 4 parallel workers (one for each index, I assume) to handle it, and the two indexes were done in about an hour. What I am trying to figure out is how to force the other two large indexes to be vacuumed in parallel - a few workers going against a single index. It seems it should be possible; the index size is large enough to qualify, but I have not been able to figure it out yet. Most of the parameters are at their default values.
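
For reference, the parallel degree can also be requested explicitly in the VACUUM command (syntax available since PostgreSQL 13); the workers actually launched are still capped by max_parallel_maintenance_workers and by the number of indexes large enough to qualify, with at most one worker per index:

VACUUM (VERBOSE, PARALLEL 8) pc_workflowlog;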

 

What am I missing?

 

I have a few other questions.  Does vacuum time depend only on the number of dead rows and the size of the table, or does the entire storage allocation (including dead tuples) also affect it?

 

Would it make more sense to drop the two large indexes, purge, vacuum, and recreate the indexes afterwards (I know it needs to be tested)?  The reason I am doing it in stages is to make sure I have enough time to vacuum, but maybe it would not take much longer to vacuum after the complete purge?

 

Lastly, is it better to delete all the rows (500 mil) instead of doing it in 
smaller batches, and vacuum only once?

 

The current size of the table is about 1 TB, and the indexes add another 1.5 TB to it.

 

Truncate is not an option as I am only deleting rows older than 6 months.  The client was not purging for years, but will do it regularly after this cleanup.

 

P.S. This is my very first post here, please advise if it is the wrong channel. 
 Thank you in advance.