How to reply to an existing bug?

2018-05-27 Thread Erwin Brandstetter
I found an existing bug report and have something to add to it.

What's the best way to reply to it? Just using a browser, with no
newsreader installed.

This one:
https://www.postgresql.org/message-id/flat/20170925084522.1442.32786%40wrigleys.postgresql.org#20170925084522.1442.32...@wrigleys.postgresql.org

I want to add a related question on stackoverflow.com
https://stackoverflow.com/questions/50553558/postgresql-add-serial-column-if-not-exists-still-creating-sequences/50557433

And that the same bug has been carried over to IDENTITY columns.

https://dbfiddle.uk/?rdbms=postgres_10&fiddle=ac978b262727fa842aa1a71349a23767

Regards
Erwin


Can functions containing a CTE be PARALLEL SAFE?

2019-10-16 Thread Erwin Brandstetter
[The manual for Postgres 12 says][1]:

> The following operations are always parallel restricted.

>  - Scans of common table expressions (CTEs).
>  - Scans of temporary tables.
>  - ...

Further down on the same [manual page:][2]

> [...] Similarly, functions must be marked PARALLEL RESTRICTED if they
access
> temporary tables, client connection state, cursors, prepared
> statements, or miscellaneous backend-local state which the system
> cannot synchronize across workers. For example, setseed and random are
> parallel restricted for this last reason.

No mention of CTEs.

I searched the list archives and found a statement from Thomas Munro
[here][3]:

> That means that these CTEs can only be scanned in the leader process.

Now I am unsure whether I can use `PARALLEL SAFE` for functions containing
a CTE (while  fulfilling all other criteria)?

Would the new inlining of CTEs in Postgres 12 have any role in this?

I posted a [similar question on dba.SE][4].

Regards
Erwin Brandstetter

  [1]: https://www.postgresql.org/docs/12/parallel-safety.html
  [2]:
https://www.postgresql.org/docs/12/parallel-safety.html#PARALLEL-LABELING
  [3]:
https://www.postgresql.org/message-id/CAEepm=03s4yih+c0pctfki0o8zgq-p4vfcvek5vvxcr6axh...@mail.gmail.com
  [4]: https://dba.stackexchange.com/q/251274/3684


 That means that these CTEs can only be scanned in the
leader process.


Re: Can functions containing a CTE be PARALLEL SAFE?

2019-10-17 Thread Erwin Brandstetter
Makes sense, thanks for the confirmation.
Maybe clarify in the manual?

Regards
Erwin

On Thu, Oct 17, 2019 at 11:20 AM Tom Lane  wrote:

> Erwin Brandstetter  writes:
> >> The following operations are always parallel restricted.
> >> - Scans of common table expressions (CTEs).
>
> > Now I am unsure whether I can use `PARALLEL SAFE` for functions
> containing
> > a CTE (while  fulfilling all other criteria)?
>
> AFAIR, the reason for treating CTEs as parallel restricted is simply to
> guarantee single evaluation of the CTE.  Within a function, that would
> only matter per-function-execution, so I can't see why a function
> containing such a query couldn't be pushed down to workers for execution.
>
> regards, tom lane
>


Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-10 Thread Erwin Brandstetter
 I am looking for stable hash functions producing 8-byte or 4-byte hashes
from long text values in Postgres 10 or later.

There is md5(), the result of which can be cast to uuid. This reliably
produces practically unique, stable 16-byte values. I have usecases where
an 8-byte or even 4-byte hash would be good enough to make collisions
reasonably unlikely. (I can recheck on the full string) - and expression
indexes substantially smaller. I could truncate md5 and cast back and
forth, but that seems like a lot of wasted computation. Are there
suggestions for text hash functions that are
- fast
- keep collisions to a minimum
- stable across major Postgres versions (so expression indexes don't break)
- croptographic aspect is not needed (acceptable, but no benefit)

There is an old post from 2012 by Tom Lane suggesting that hashtext() and
friends are not for users:

https://www.postgresql.org/message-id/24463.1329854466%40sss.pgh.pa.us

Postgres 11 added hashtextextended() and friends to generate bigint hashes.
In a more recent post from 3 months ago, Tom suggests to use it in
user-land - if portability is not needed:

https://www.postgresql.org/message-id/9434.1568839177%40sss.pgh.pa.us

Is pghashlib by Marko Kreen my best option?

https://github.com/markokr/pghashlib

Or the "version-independent hash functions for PostgreSQL" from Peter
Eisentraut:

https://github.com/petere/pgvihash

Neither received updates for a couple of years. Unmaintained? Or obsolete?
And neither is available on most hosted services like RDS or Heroku (which
would be required in come cases).

So what are my best options?

Regards
Erwin


Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-11 Thread Erwin Brandstetter
Thanks for the suggestion. Seems like a good assumption and I have been
using hashtext() in the past. But I am uncertain whether it is the best
option.

Guess Tom's warning in
https://www.postgresql.org/message-id/9434.1568839...@sss.pgh.pa.us about
portability only refers to hashtextextended() and friends not being there
in Postgres 10 or older.

But why are none of these functions documented? Does the project still not
...

> want people to rely on them continuing to have exactly the current
behavior.

I am not complaining, maybe just nobody did the work. But it's also
mentioned in this old thread, that hastext() changed in the past. Is all of
that outdated and we are welcome to use those functions for indexing?
https://www.postgresql.org/message-id/flat/24463.1329854466%40sss.pgh.pa.us#c18e62281dc78f6d64c1a4d41ab8569b
<https://www.postgresql.org/message-id/24463.1329854...@sss.pgh.pa.us>

Filtering with amprocnum = 2 gets functions producing bigint in Postgres 11
or later.  Not sure about the exact meaning of amprocnum, manual says
"Support function number".

Remaining problem either way: no hash function returning bigint for
Postgres 10.

Regards
Erwin

On Tue, Dec 10, 2019 at 11:13 PM Laurenz Albe 
wrote:

> On Tue, 2019-12-10 at 22:11 +0100, Erwin Brandstetter wrote:
> > I am looking for stable hash functions producing 8-byte or 4-byte hashes
> from long text values in Postgres 10 or later.
> >
> > [...]
> >
> > There is an old post from 2012 by Tom Lane suggesting that hashtext()
> and friends are not for users:
> >
> > https://www.postgresql.org/message-id/24463.1329854466%40sss.pgh.pa.us
>
> Changing a hash function would corrupt hash indexes, wouldn't it?
>
> So I'd expect these functions to be pretty stable:
>
> SELECT amp.amproc
> FROM pg_amproc AS amp
>JOIN pg_opfamily AS opf ON amp.amprocfamily = opf.oid
>JOIN pg_am ON opf.opfmethod = pg_am.oid
> WHERE pg_am.amname = 'hash'
>   AND amp.amprocnum = 1;
>
> Or at least there would have to be a fat warning in the release notes
> to reindex hash indexes.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>


Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-11 Thread Erwin Brandstetter
On Tue, Dec 10, 2019 at 11:34 PM Miles Elam 
wrote:

> In terms of "wasted computation", MD5, SHA1, and the others always compute
> the full length before they are passed to a UUID, int, or whatever. It's a
> sunk cost. It's also a minor cost considering many hash algorithms are
> performed in CPU hardware now. All that's left is the truncation and cast,
> which you can't avoid easily.
>
>
> Sure, you could reimplement Java's .hashCode() method by iterating through
> the characters and processing the character codes:
>
> s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1]
>
>
> I don't see how that would beat the CPU-based hashes though unless you
> wrote a C-based extension. Maybe it's better just to embrace the
> user-defined function first and then decide if performance is insufficient
> for your use cases.
>
>
> CREATE EXTENSION IF NOT EXISTS pgcrypto;
>
> CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5')
> RETURNS int8 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 16),
> 'hex'))::bit(64)::int8
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5')
> RETURNS int4 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8),
> 'hex'))::bit(32)::int4
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> SELECT
>
> hash4('something something something'),
>
> hash4('something something something', 'sha1'),
>
> hash8('something something something'),
>
> hash8('something something something', 'sha1');
>
>
>
> Cheers,
>
>
> Miles
>



Thanks for the custom functions! May be useful as fallback.

But I am really looking for standard functions in Postgres first. Those
should be faster and more reliable than writing my own.


Regards

Erwin


Re: Idea: Add first() and last() aggregate functions to the main release

2023-03-11 Thread Erwin Brandstetter
On Sat, 11 Mar 2023 at 23:38, Ben Clements 
wrote:

> Idea/request for enhancement:
>
> Add the first() and last() aggregate functions to the main release (CREATE
> EXTENSION first_last_agg).
>
> Use Case: PostgreSQL equivalent to Oracle's MAX(...) KEEP (DENSE_RANK
> FIRST/LAST ORDER BY ...) 
> (https://dba.stackexchange.com/a/324646/100880)
>
> GitHub: https://github.com/wulczer/first_last_agg
>
> Reason: As a non-dba, I can’t install additional modules like
> first_last_agg.
>


For what it's worth, I support the inclusion of first() and last() in the
main release. It makes queries involving multiple aggregate functions a lot
simpler. Most hosted services only allow official contrib modules.
The module has been around for more than 10 years now. First on
https://www.pgxn.org/dist/first_last_agg, then on
https://github.com/wulczer/first_last_agg. And it's been available from
apt.postgresql.org for a long time now.
Related questions pop up on Stackoverflow every now and then.
(I don't see anything related in the Postgres Todo, yet.)

The implementation in the additional module first_last_agg looks very
simple and straight-forward. Might be an easy task to incorporate into
mainline Postgres. But my C foo is just not good enough to judge whether
the implementation is ideal.

Regards
Erwin


EXCLUDE USING hash(i WITH =)

2019-03-06 Thread Erwin Brandstetter
The manual currently advises:
https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-EXCLUDE

EXCLUDE [ USING *index_method* ] ( *exclude_element* WITH *operator* [, ...
> ] ) *index_parameters* [ WHERE ( *predicate* ) ][...]Although it's
> allowed, there is little point in using B-tree or hash indexes with an
> exclusion constraint, *because this does nothing that an ordinary unique
> constraint doesn't do better*. So in practice the access method will
> always be GiST or SP-GiST.


However, hash indexes do not support UNIQUE:
https://www.postgresql.org/docs/current/indexes-unique.html

Currently, only B-tree indexes can be declared unique.
>

But an exclusion constraint with "USING hash" seems to do exactly that
(more expensively, granted), handling hash collisions gracefully. Demo
(original idea by user FunctorSalad on stackoverflow:
https://stackoverflow.com/questions/47976185/postgresql-ok-to-use-hash-exclude-constraint-for-uniqueness/47976504?noredirect=1#comment96799970_47976504
):

CREATE TABLE exclude_hast_test(
  i int,
  EXCLUDE USING hash(i WITH =)
);

INSERT INTO exclude_hast_test VALUES (213182),(1034649);  --  hashint4()
collision!

More detailed fiddle:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=8a9fc48f74f93f8aed0964f3796a0b04

Would seem particularly attractive for values too large for btree indexes.
An index on a hash value is the recommended workaround, but an exclusion
constraint also handles hash collisions automatically. (Or even for any wide
column to keep index size low.)

Hence my questions:

- Why does an exclusion constraint with "USING hash(i WITH =)" enforce
uniqueness, while we still can't create a "UNIQUE index ... USING hash .."?
- Why would the manual discourage its use? Should I file a documentation
bug?

Regards
Erwin