Idea: PostgreSQL equivalent to Oracle's KEEP clause

2023-03-06 Thread Ben Clements
I have an idea/request for enhancement to PostgreSQL (I'm new to PostgreSQL
and this mailing list).

Idea:

There's a technique in Oracle SQL that can be used to simplify aggregation
queries:
*Aggregate on a particular column, but get information from a different
column, using a simple calculated column in the SELECT list.*

--Oracle
--For a given country, what city has the highest population? (where the
country has more than one city)
--Include the city name as a column.
select
   country,
   count(*),
   max(population),
   max(city) keep (dense_rank first order by population desc)
from
   cities
group by
   country
having
   count(*) > 1

As shown above, the following calculated column can bring in the city name,
even though the city name isn't in the GROUP BY:
   max(city) keep (dense_rank first order by population desc)

There are a number of ways to achieve that kind of thing using PostgreSQL.
I want a solution that lets me do it in a calculated column -- all within a
single SELECT query (no subqueries, joins, WITH, etc.).

Could that functionality be added to PostgreSQL?

Related:

   - YouTube - The KEEP clause will KEEP your SQL queries SIMPLE (Oracle)
   
   - Stack Overflow - Explanation of KEEP in Oracle FIRST/LAST
   

   - DBA Stack Exchange - PostgreSQL equivalent to Oracle's MAX(...) KEEP
   (DENSE_RANK FIRST/LAST ORDER BY ...)
   

Thanks,

-Ben


Idea: Add first() and last() aggregate functions to the main release

2023-03-11 Thread Ben Clements
Idea/request for enhancement:

Add the first() and last() aggregate functions to the main release (CREATE
EXTENSION first_last_agg).

Use Case: PostgreSQL equivalent to Oracle's MAX(...) KEEP (DENSE_RANK
FIRST/LAST ORDER BY ...) 
(https://dba.stackexchange.com/a/324646/100880)

GitHub: https://github.com/wulczer/first_last_agg

Reason: As a non-dba, I can’t install additional modules like first_last_agg
.

Thanks,
-Ben


Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause

2023-03-13 Thread Ben Clements
Thanks David.

Similar to your "TOP() and BOTTOM() aggregate" idea, you might find Erwin
Brandstetter's solution using the LAST() aggregate function interesting: (
https://dba.stackexchange.com/a/324646/100880)

If the FIRST_LAST_AGG extension is installed, then we can do something like
this:

SELECT country
 , count(*) AS ct_cities
 , max(population) AS highest_population
 , last(city ORDER BY population, city) AS biggest_city  -- !FROM
 citiesGROUP  BY countryHAVING count(*) > 1;


-Ben

On Mon, Mar 6, 2023 at 9:51 PM David Rowley  wrote:

> On Tue, 7 Mar 2023 at 12:40, Tom Lane  wrote:
> >
> > Ben Clements  writes:
> > > As shown above, the following calculated column can bring in the city
> name,
> > > even though the city name isn't in the GROUP BY:
> > >max(city) keep (dense_rank first order by population desc)
> >
> > You haven't really explained what this does, let alone why it can't
> > be implemented with existing features such as FILTER and ORDER BY.
>
> (It wasn't clear to me until I watched the youtube video.)
>
> Likely KEEP is more flexible than just the given example but I think
> that something similar to the example given could be done by inventing
> a TOP() and BOTTOM() aggregate. Then you could write something like:
>
> select
>country,
>count(*),
>max(population),
>bottom(city, population)
> from
>cities
> group by
>country
> having
>count(*) > 1
>
> the transfn for bottom() would need to remember the city and the
> population for the highest yet seen value of the 2nd arg.  The
> combinefn would need to find the aggregate state with the highest 2nd
> arg value, the finalfn would just spit out the column that's stored in
> the state.  Where this wouldn't work would be if multiple columns were
> required to tiebreak the sort.
>
> You could at least parallelize the aggregation this way. If there were
> to be some form of ORDER BY in the aggregate then no parallelization
> would be possible.  I'd assume since the whole thing can be done with
> a subquery that the entire point of having special syntax for this
> would be because we don't want to pay the price of looking at the
> table twice, i.e. performance must matter, so the ability to have
> parallel aggregates here seems good.
>
> I can't quite think of a way to have parallel query and an arbitrarily
> long list of columns to sort on...
>
> For Ben, we do tend to shy away from copying other RDBMS's extensions
> to the SQL language.  The problem is that copying these can cause
> problems in the future if/when the standard adopts that syntax with
> variations or invents something else that conflicts with the grammar
> that we've added.  One example of something we didn't do was Oracle's
> CONNECT BY.  Eventually, the SQL standard got WITH RECURSIVE to allow
> queries on hierarchical data. Of course, we do have many of our own
> extensions to the standard, so we certainly do make exceptions
> sometimes. So, don't be too surprised that there's some discussion of
> other methods which might make this work which don't involve copying
> what someone else has done.
>
> David
>