Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?

2023-12-05 Thread Jerry Brenner
Apologies if I'm not using the appropriate Postgres-specific terms.  The
simplest implementation would consider only the information that is
consistent across systems and executions - node types, relation and index
names (not ids), aliases ...  It would ignore bind variable values and
execution statistics.  This would make it easy to find:

   - Queries with different plans (we would then investigate the causes)
   - Queries where the plans changed over time (we would then investigate
   the causes)
   - The aggregated cost of a query plan shared by multiple queries (due to
   differences in the number of items in an IN list, ...) This is useful when
   no single query in the group registers high, but the group of queries
   registers high.
   - Because we use consistent table and index names across environments,
   this would make it possible to find queries with different plans in
   different environments.

In a perfect world, the hash would include the filters, index conds, ...
with the constant values masked out, but I realize that's much more
complicated.  (Without this, we could see plans that are applying different
numbers of predicates against an index mapping to the same hash value, but
it would still be a big improvement.)

As mentioned before, we are currently storing the explain plans in a
database table.  We use a combination of columns and syntax when querying
for the long executions to display some contextual information about the
explain plans (does it have a Materialize node, Init Plan, Sub Plan, ...)
Based on that little contextual information, we can see that there are
multiple plans for some queries. Based on manual investigation, we know
that there other plan differences.  It is very expensive right now to try
to figure out if a plan is changing over time, why some executions are more
expensive than others, ...

Thanks,
Jerry

On Mon, Dec 4, 2023 at 7:29 PM Tom Lane  wrote:

> Michael Paquier  writes:
> > On Mon, Dec 04, 2023 at 09:57:24AM -0500, Tom Lane wrote:
> >> Jerry Brenner  writes:
> >>> Both Oracle and SQL Server have
> >>> consistent hash values for query plans and that makes it easy to
> identify
> >>> when there are multiple plans for the same query.  Does that concept
> exist
> >>> in later releases of Postgres (and is the value stored in the json
> explain
> >>> plan)?
>
> >> No, there's no support currently for obtaining a hash value that's
> >> associated with a plan rather than an input query tree.
>
> > PlannerGlobal includes a no_query_jumble that gets inherited by all
> > its lower-level nodes, so adding support for hashes compiled from
> > these node structures would not be that complicated.  My point is that
> > the basic infrastructure is in place in the tree to be able to do
> > that, and it should not be a problem to even publish the compiled
> > hashes in EXPLAIN outputs, behind an option of course.
>
> Well, yeah, we could fairly easily activate that infrastructure for
> plans, but we haven't.  More to the point, it's not clear to me that
> that would satisfy the OP's request for "consistent" hash values.
> The hashes would vary depending on object OID values, system version,
> possibly endianness, etc.
>
> I'm also wondering exactly what the OP thinks qualifies as different
> plans.  Remembering the amount of fooling-around that's gone on with
> querytree hashes to satisfy various people's ill-defined desires for
> pg_stat_statements aggregation behavior, I'm not really eager to buy
> into the same definitional morass at the plan level.
>
> regards, tom lane
>
>


Include a timestamp in future versions of pg_stat_statements when when a query entered the cache?

2023-12-05 Thread Jerry Brenner
It would be helpful if a timestamp column was added to pg_stat_statements
to denote when a query entered the view.  This would make it easier to tell
how frequently a query is being executed (100,000 times since a specific
timestamp vs 100,000 times since the execution stats were last reset.)

I realize that Postgres is different from SQL Server.  SQL Server has
timestamps for both the time that the query entered the cache and the last
execution.  I assume that adding and maintaining a timestamp for the last
execution would be more difficult and expensive.  Having that additional
information makes it possible for us to find queries that were executed
during a time range that corresponds to a batch process, queries executed
an abnormally high number of times in a short period of time, ...

We are taking hourly snapshot of pg_stat_statements and storing the
information in a database table so we can analyze the database activity in
a given interval.  We are calculating and storing the deltas as part of
that process.  We have to make certain simplifying assumptions due to the
lack of this type of timestamp.  (We can live with these assumptions, but
having the additional timestamp(s) would increase the value of the
information.):

   - If the number of executions increased since the last snapshot, then
   use the difference as the delta. (We assume that the statement was not
   flushed from the cache and then reloaded later in the interval.)
   - If the number of executions remained the same since the last snapshot,
   then the query was not executed in the interval.  (We assume that the
   statement was not flushed from the cache and then reloaded later in the
   interval.)
   - If the number of executions decreased since the last snapshot, then
   the was flushed from the cache at some unknown point in the interval.


Thanks,
Jerry


Re: Include a timestamp in future versions of pg_stat_statements when when a query entered the cache?

2023-12-05 Thread Julien Rouhaud
Hi,

On Tue, Dec 05, 2023 at 06:28:54AM -0800, Jerry Brenner wrote:
> It would be helpful if a timestamp column was added to pg_stat_statements
> to denote when a query entered the view.  This would make it easier to tell
> how frequently a query is being executed (100,000 times since a specific
> timestamp vs 100,000 times since the execution stats were last reset.)

This was actually done a few weeks ago, and will be available with pg 17.  You
can see the 2 new timestamp counters (one for the whole record, one for the
minmax counters only) documentation at
https://www.postgresql.org/docs/devel/pgstatstatements.html