Re: Parallel hints in PostgreSQL with consistent perfromance

2024-01-02 Thread mohini mane
On Fri, Dec 29, 2023 at 10:25 AM Jeff Janes  wrote:

>
>
> On Thu, Dec 28, 2023 at 7:47 AM mohini mane 
> wrote:
>
>> Thank you for your response !!
>> I am experimenting with SQL query performance for SELECT queries on large
>> tables and I observed that changing/increasing the degree of parallel hint
>> doesn't give the expected performance improvement.
>>
>
> But you still have addressed the fact that PostgreSQL *does not have
> planner hints*.
>
> Are you using some nonstandard extension, or nonstandard fork?
> * >> I am using pg_hint_plan extension to enforce the parallel execution
> of specific table .*
>
   *  postgres=# load 'pg_hint_plan';*
* LOAD*


> I have executed the SELECT query with 2,4 & 6 parallel degree however
>> every time only 4 workers launched & there was a slight increase in
>> Execution time as well,
>>
>
> Adding an ignored comment to your SQL would not be expected to do
> anything.  So it is not surprising that it does not do anything about
> the number of workers launched.  It is just a comment.  A note to the human
> who is reading the code.
> * >> As I am using ph_hint_plan extension so as expected hints should not
> get ignored by the optimizer .*
>
>> why there is an increase in execution time with parallel degree 6 as
>> compared to 2 or 4?
>>
>
> Those small changes seem to be perfectly compatible with random noise.
> You would need to repeat them dozens of times in random order, and then do
> a statistical test to convince me otherwise.
> * >> I am expecting desired number of parallel workers should get
> allocated as VM has sufficient vCores [16] and with needed session
> parameters 
> [parallel_tuple_cost=0.1,max_parallel_workers_per_gather=6,**max_parallel_workers=8
> and I am using parallel hints like this : * */*+ PARALLEL(A 5 hard) */
> so 5 worker processes should launched this is not happening]*
>
>>


Re: Parallel hints in PostgreSQL with consistent perfromance

2024-01-02 Thread David G. Johnston
On Tue, Jan 2, 2024 at 8:12 AM mohini mane  wrote:

>
> I have executed the SELECT query with 2,4 & 6 parallel degree however
>>> every time only 4 workers launched & there was a slight increase in
>>> Execution time as well,
>>>
>>
>> Adding an ignored comment to your SQL would not be expected to do
>> anything.  So it is not surprising that it does not do anything about
>> the number of workers launched.  It is just a comment.  A note to the human
>> who is reading the code.
>> * >> As I am using ph_hint_plan extension so as expected hints should not
>> get ignored by the optimizer .*
>>
>
Sounds like a bug you should go tell the pg_hint_plan authors about then.

David J.


Re: Parallel hints in PostgreSQL with consistent perfromance

2024-01-02 Thread mohini mane
On Tue, 2 Jan 2024, 21:45 David G. Johnston, 
wrote:

> On Tue, Jan 2, 2024 at 8:12 AM mohini mane 
> wrote:
>
>>
>> I have executed the SELECT query with 2,4 & 6 parallel degree however
 every time only 4 workers launched & there was a slight increase in
 Execution time as well,

>>>
>>> Adding an ignored comment to your SQL would not be expected to do
>>> anything.  So it is not surprising that it does not do anything about
>>> the number of workers launched.  It is just a comment.  A note to the human
>>> who is reading the code.
>>> * >> As I am using pg_hint_plan extension so as expected hints should
>>> not get ignored by the optimizer .*
>>>
>>
> Sounds like a bug you should go tell the pg_hint_plan authors about then.
>
*>> I am getting same results with or without extension [in my case
it's pg_hint_plan] still I will check with the respective team, Thanks .*

>
> David J.
>
>


Questions about "Output" in EXPLAIN ANALYZE VERBOSE

2024-01-02 Thread Jerry Brenner
We are currently on 13.9.  For each of the questions, I'd also like to know
if anything has changed in that area in later releases.
NOTE: We are capturing all explain plans via auto_explain and storing them
in a database table.  One of our longer term goals is to build the
relationship between queries and indexes, so we can tell where each of the
indexes is used and how it is used (or not used).  In the Index Only Scan
example below, we think that there are other queries that use the same
index AND also access JobID and MostRecentModel, but we want to verify that.

*Any node type accessing an index or table*

   - It looks like "Output" includes more than just the columns with
   predicates and/or being accessed or returned in other nodes.  *Has any
   thought been given to adding an additional attribute listing the columns
   that are actually used?*  (While it's possible to do this after getting
   the explain plan, it seems like that information would be available
   internally in Postgres.)


*Index Only Scan*

   - *Is it safe to assume that the columns listed with "Output" in an
   Index Only Scan node are the key columns, in order? * That's what we've
   observed, but I wanted to check if it was safe to make that assumption.
   - NOTE: IMHO, this is a case where showing all of the key columns,
   instead of just the ones that are used, is helpful because the person
   analyzing the query plan doesn't necessarily have direct access to the
   database schema.
   - In this example, policyperi_u_id_1mw8mh83lyyd9 is on
   pc_policyperiod(ID, Retired, JobID, PolicyID, TemporaryBranch,
   MostRecentModel)
   - NOTE: PolicyID is referenced in a node above the Index Only Scan, but
   neither JobID nor MostRecentModel are.

  "Plans": [
{
  "Node Type": "Index Only Scan",
  "Parent Relationship": "Outer",
  "Parallel Aware": false,
  "Scan Direction": "Forward",
  "Index Name":
"policyperi_u_id_1mw8mh83lyyd9",
  "Relation Name": "pc_policyperiod",
  "Schema": "public",
  "Alias": "qroots0_1",
  "Startup Cost": 0.57,
  "Total Cost": 15.90,
  "Plan Rows": 10,
  "Plan Width": 8,
  "Actual Startup Time": 0.234,
  "Actual Total Time": 1.223,
  "Actual Rows": 203,
  "Actual Loops": 1,
 * "Output": ["qroots0_1.id
", "qroots0_1.retired", "qroots0_1.jobid",
"qroots0_1.policyid", "qroots0_1.temporarybranch",
"qroots0_1.mostrecentmodel"],*
  "Index Cond": "((qroots0_1.id = ANY ($4))
AND (qroots0_1.retired = 0) AND (qroots0_1.temporarybranch = false))",

*Index Scan*

   - *Is it safe to assume that the columns listed are all of the columns
   in the table?* (The table has too many columns to verify.)

{
  "Node Type": "Index Scan",
  "Parent Relationship": "Inner",
  "Parallel Aware": false,
  "Scan Direction": "Forward",
  "Index Name": "ppperf10",
  "Relation Name": "pc_policyperiod",
  "Schema": "public",
  "Alias": "groot_1",
  "Startup Cost": 485987.94,
  "Total Cost": 485990.69,
  "Plan Rows": 1,
  "Plan Width": 16,
  "Actual Startup Time": 5.710,
  "Actual Total Time": 5.710,
  "Actual Rows": 0,
  "Actual Loops": 117,
*  "Output": ["groot_1.paymentinstrument_wmic",
"groot_1.cipminretainedpremium_wmic", "groot_1.pendingreindex",
"groot_1.locked", "groot_1.editeffectivedate", "groot_1.invoicingmethod",
"groot_1.archivestate", "groot_1.archiveschemainfo",
"groot_1.prioraddressfk_ext", "groot_1.locationautonumberseq",
"groot_1.csioid_ext", "groot_1.updatetime",
"groot_1.multiproddiscapplied_wmic", "groot_1.paymentdesc_wmic",
"groot_1.id ", "groot_1.singlecheckingpatterncode",
"groot_1.billingmethod", "groot_1.fleetdiscount_wmic",
"groot_1.createuserid", "groot_1.cp_auditwrapuplblty_wmic",
"groot_1.totalcostourshare", "groot_1.allowgapsbefore",
"groot_1.quoteidentifier", "groot_1.quotehidden", "groot_1.orphaned",
"groot_1.beanversion", "groot_1.packagediscount_

Re: Questions about "Output" in EXPLAIN ANALYZE VERBOSE

2024-01-02 Thread Jeff Janes
On Tue, Jan 2, 2024 at 1:29 PM Jerry Brenner  wrote:

> We are currently on 13.9.
>

Why not just use the latest minor release, 13.13?  For security reasons,
that is the only minor release of v13 you should be using anyway.  I think
it is a bit much to hope that people will spend their time for free
researching obsolete minor releases.


> *Any node type accessing an index or table*
>
>- It looks like "Output" includes more than just the columns with
>predicates and/or being accessed or returned in other nodes.
>
> Not in my hands. For SELECTs it just lists the columns that are needed.
Your example is hard to follow because it appears to be just snippets of a
plan, with no example of the query to which it belongs.

Cheers,

Jeff


Re: Questions about "Output" in EXPLAIN ANALYZE VERBOSE

2024-01-02 Thread Tom Lane
Jeff Janes  writes:
> On Tue, Jan 2, 2024 at 1:29 PM Jerry Brenner  wrote:
>> - It looks like "Output" includes more than just the columns with
>> predicates and/or being accessed or returned in other nodes.

> Not in my hands. For SELECTs it just lists the columns that are needed.

It depends.  The planner may choose to tell a non-top-level scan node
to return all columns, in hopes of saving a tuple projection step at
runtime.  That's heuristic and depends on a number of factors, so you
shouldn't count on it happening or not happening.

regards, tom lane




Re: Questions about "Output" in EXPLAIN ANALYZE VERBOSE

2024-01-02 Thread Jerry Brenner
Tom - Thanks for the response.  I guess what I am really looking for is a
simple way to find all of the columns referenced from a given instance of a
table or index from the json file, although it would be even better if it
was easy to differentiate between the columns that came from the index vs
those that could only come from the table. (We may not have direct access
to the database, the indexing may have changed since the plan was captured,
...)  I can see that all of the values in "Index Cond" and "Filter" for the
given "Alias" are relevant, but it's unclear what portion of the values in
"Output" are relevant.  Some instances of "Output" contain a superset of
the values in "Index Cond" and "Filter", including columns that are not
referenced in the query (there are a total of 187 columns in
pc_policyperiod and instances in the plan were all of them show up in
"Output" but only 7 of them are actually referenced), others contain a
mutually exclusive set of values, ...  It would be helpful if there was an
attribute that contained that information.

I now see that groot2.ID is the only column listed in "Output"  from
pc_policy here and it doesn't show up anywhere else in the plan.  It's
actually used in the evaluation of "(hashed SubPlan 3)", but I had to look
at the SQL to figure that out.
NOTE: policy_n_producerco_3e8i0ojsyckhx is an index on
pc_policy(producercodeofserviceid, retired).  It makes sense for reasons
beyond this query to add  id as the last key column to the index.

  "Filter": "((NOT groot_1.assignedrisk) AND
((groot_1.producercodeofrecordid = '10791'::bigint) OR *(hashed SubPlan 3)*
))",
  "Rows Removed by Filter": 0,
  "Shared Hit Blocks": 549472,
  "Shared Read Blocks": 0,
  "Shared Dirtied Blocks": 0,
  "Shared Written Blocks": 0,
  "Local Hit Blocks": 0,
  "Local Read Blocks": 0,
  "Local Dirtied Blocks": 0,
  "Local Written Blocks": 0,
  "Temp Read Blocks": 0,
  "Temp Written Blocks": 0,
  "I/O Read Time": 0.000,
  "I/O Write Time": 0.000,
  "Plans": [
{
  "Node Type": "Index Scan",
  "Parent Relationship": "SubPlan",
  "Subplan Name": "SubPlan 3",
  "Parallel Aware": false,
  "Scan Direction": "Forward",
  "Index Name":
"policy_n_producerco_3e8i0ojsyckhx",
  "Relation Name": "pc_policy",
  "Schema": "public",
  "Alias": "groot2",
  "Startup Cost": 0.56,
  "Total Cost": 484540.46,
  "Plan Rows": 578767,
  "Plan Width": 8,
  "Actual Startup Time": 0.035,
  "Actual Total Time": 490.349,
  "Actual Rows": 546045,
  "Actual Loops": 1,
  *"Output": ["groot2.id
"],*
  "Index Cond":
"((groot2.producercodeofserviceid = '10791'::bigint) AND (groot2.retired =
0))",


Here's the SQL:

SELECT COUNT(*)
FROM (
SELECT *
FROM (
SELECT /* ISNULL:pc_policycontactrole.EffectiveDate:,
ISNULL:pc_policycontactrole.ExpirationDate:,
pc:gw.webservice.pc.pc5000.policysearch.PolicySearchAPI#findPolicies_WMIC;
*/ gRoot.ID col0
FROM pc_policyperiod gRoot
WHERE gRoot.AssignedRisk = $1 AND gRoot.MostRecentModel =
$2 AND gRoot.PolicyID IN
 (
SELECT qRoots0.PolicyID col0
FROM pc_policyperiod qRoots0
WHERE qRoots0.ID = ANY (ARRAY
 (
SELECT qRoots1.BranchID col0
FROM pc_policycontactrole qRoots1
WHERE qRoots1.Subtype = $3 AND
qRoots1.ContactDenorm IN
 (
SELECT qRoots2.ID col0
FROM pc_contact qRoots2
WHERE qRoots2.FirstNameDenorm =
LOWER ($4) AND qRoots2.LastNameDenorm = LOWER ($5) AND qRoots2.Retired = 0)
 AND ( ( ( (qRoots1.EffectiveDate <> qRoots1.ExpirationDate) OR
(qRoots1.EffectiveDate IS NULL) OR (qRoots1.ExpirationDate IS NU