How to pass table column values to function

2023-02-11 Thread Andrus

Hi!

Table source contains an integer column. Its values should be passed to a 
function that selects data from another table.


I tried

    CREATE OR REPLACE FUNCTION public.TestAddAssetTransactions(dokumnrs int[])
     RETURNS int AS
    $BODY$

    with i1 as (
    INSERT INTO bilkaib (dokumnr)
    select dokumnr from dok where dokumnr in (select * from unnest(dokumnrs))
    returning *
    )

    select count(*) from i1;
    $BODY$ language sql;

    create temp table bilkaib (dokumnr int ) on commit drop;
    create temp table dok (dokumnr serial primary key ) on commit drop;
    create temp table source (dokumnr int ) on commit drop;
    insert into source values (1),(2);

    select TestAddAssetTransactions( (select ARRAY[dokumnr] from source)::int[] )



but got error

> ERROR: more than one row returned by a subquery used as an expression

How can a set of integers be passed to the function? Should a temp table with 
a fixed name be used, or is there a better solution?


Using PostgreSQL 12+.

Andrus.


Re: How to pass table column values to function

2023-02-11 Thread Marcos Pegoraro
On Sat, Feb 11, 2023 at 07:10, Andrus wrote:

> Hi!
>
> Table source contains integer column. Its values should be passed to
> function for selecting data from other table.
>
> I tried
>
> CREATE OR REPLACE FUNCTION
> public.TestAddAssetTransactions(dokumnrs int[])
>  RETURNS int AS
> $BODY$
>
> with i1 as (
> INSERT INTO bilkaib (dokumnr)
> select dokumnr  from dok where dokumnr in (select * from
> unnest(dokumnrs))
> returning *
> )
>
> select count(*) from i1;
> $BODY$ language sql;
>
> create temp table bilkaib (dokumnr int ) on commit drop;
> create temp table dok (dokumnr serial primary key ) on commit drop;
> create temp table source (dokumnr int ) on commit drop;
> insert into source values (1),(2);
>
> select TestAddAssetTransactions( (select ARRAY[dokumnr] from
> source)::int[] )
>
>
> but got error
>
> > ERROR: more than one row returned by a subquery used as an expression
>

Probably you want an array_agg and not an array:

select TestAddAssetTransactions( (select array_agg(dokumnr) from source) )
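
For illustration, a minimal end-to-end sketch of that call (an editorial
sketch, not part of the original mail; it assumes the function from the first
post was created successfully, and it additionally populates dok so the
function's join actually finds rows):

    begin;
    create temp table bilkaib (dokumnr int) on commit drop;
    create temp table dok (dokumnr serial primary key) on commit drop;
    create temp table source (dokumnr int) on commit drop;
    insert into dok default values;            -- dokumnr = 1
    insert into dok default values;            -- dokumnr = 2
    insert into source values (1),(2);
    -- aggregate the column into a single int[] parameter
    select TestAddAssetTransactions((select array_agg(dokumnr) from source));
    -- expected result here: 2 (two rows copied from dok into bilkaib)
    commit;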


Re: How to pass table column values to function

2023-02-11 Thread jian he
On Sat, Feb 11, 2023 at 3:55 PM Andrus  wrote:

> Hi!
>
> Table source contains integer column. Its values should be passed to
> function for selecting data from other table.
>
> I tried
>
> CREATE OR REPLACE FUNCTION
> public.TestAddAssetTransactions(dokumnrs int[])
>  RETURNS int AS
> $BODY$
>
> with i1 as (
> INSERT INTO bilkaib (dokumnr)
> select dokumnr  from dok where dokumnr in (select * from
> unnest(dokumnrs))
> returning *
> )
>
> select count(*) from i1;
> $BODY$ language sql;
>
> create temp table bilkaib (dokumnr int ) on commit drop;
> create temp table dok (dokumnr serial primary key ) on commit drop;
> create temp table source (dokumnr int ) on commit drop;
> insert into source values (1),(2);
>
> select TestAddAssetTransactions( (select ARRAY[dokumnr] from
> source)::int[] )
>
>
> but got error
>
> > ERROR: more than one row returned by a subquery used as an expression
>
> How to pass set of integers to function? Should temp table with fixed name
> used or is there better solution?
>
> Using Postgresql 12+
>
> Andrus.
>
try this:
select TestAddAssetTransactions((select array(select * from source)));
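
As a small editorial aside: both suggested spellings build the same int[] from
the column, so either should work here. For example, with source containing
(1),(2):

    select (select array_agg(dokumnr) from source);   -- {1,2}
    select array(select dokumnr from source);         -- {1,2}

One difference to keep in mind: array_agg over zero rows yields NULL, while
ARRAY(subquery) over zero rows yields the empty array '{}'.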


Re: WHERE col = ANY($1) extended to 2 or more columns?

2023-02-11 Thread Peter J. Holzer
On 2023-02-09 18:35:42 +0100, Dominique Devienne wrote:
> On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston 
> wrote:
> 
> On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys  wrote:
> 
> > On 9 Feb 2023, at 16:41, Dominique Devienne wrote:
> > Now we'd like to do the same for composite keys, and I don't know how to do that.
> 
> This works:
> => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text));
> 
> But you cannot write the right-side of the IN as a single parameter which
> seems to be the primary constraint trying to be conformed to.
> 
> 
> Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind
> the RHS (binary) array
> and do a single exec (single round-trip) to get the matching rows. AFAIK, this
> is the fastest way.
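
For reference, the single-column pattern named in the subject line looks
roughly like this (a sketch only; table and column names are illustrative,
not from the thread):

    PREPARE fetch_rows(int[]) AS
        SELECT * FROM t WHERE id = ANY($1);

    EXECUTE fetch_rows(ARRAY[1, 2, 3]);

The question is how to get the same one-parameter, one-round-trip shape when
the key has two or more columns.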

Planning time is often much less than execution time, so minimizing it
may not give you the fastest results.

For example, I tried to fetch two data points from a medium-sized (3 million
rows) table ((macrobondtimeseries, date) is the primary key here):

wds=> explain (analyze) select macrobondtimeseries, date, value
from facttable_oecd_naq_2018
where (macrobondtimeseries, date) = any(
array [
( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
]::tsd[]
);
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..334945.37 rows=35242 width=34) (actual time=6.194..1618.968 rows=2 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on facttable_oecd_naq_2018  (cost=0.00..330421.17 rows=14684 width=34) (actual time=1054.739..1589.818 rows=1 loops=3)
         Filter: (ROW(macrobondtimeseries, date) = ANY ('{"(naq_mex_b1_gi_cqr_a,2013-01-01)","(naq_lux_p3s13_gpsa_a,1961-01-01)"}'::tsd[]))
         Rows Removed by Filter: 1178191
 Planning Time: 1.833 ms
 JIT:
   Functions: 12
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.026 ms, Inlining 0.000 ms, Optimization 0.948 ms, Emission 12.613 ms, Total 14.586 ms
 Execution Time: 1619.729 ms
(12 rows)

wds=> explain (analyze) select macrobondtimeseries, date, value
from facttable_oecd_naq_2018
where (macrobondtimeseries, date) in
(
( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
);
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on facttable_oecd_naq_2018  (cost=4.13..7.17 rows=2 width=34) (actual time=0.259..0.263 rows=2 loops=1)
   Recheck Cond: ((((macrobondtimeseries)::text = 'naq_mex_b1_gi_cqr_a'::text) AND (date = '2013-01-01'::date)) OR (((macrobond

Re: valgrind a background worker

2023-02-11 Thread Jon Erdman
On 2/10/23 3:05 PM, Tom Lane wrote:
> Jeffrey Walton  writes:
>> On Fri, Feb 10, 2023 at 10:04 AM Tom Lane  wrote:
>>> You have to valgrind the whole cluster AFAIK.  Basically, start
>>> the postmaster under valgrind with --trace-children=yes.
>>> For leak tracking you probably also want
>>> --leak-check=full --track-origins=yes --read-var-info=yes
> 
>> One additional comment... the program in question and PostgreSQL
>> should also be built with -g -O1 per
>> https://valgrind.org/docs/manual/quick-start.html . Otherwise, there's
>> a risk the line information will not be accurate or usable.
> 
> Yeah.  Also, you need to compile Postgres with -DUSE_VALGRIND
> if you want valgrind to have any idea about palloc/pfree.

Thanks much both of you! I'll report back how it goes ;)
--
Jon Erdman (aka StuckMojo)
 PostgreSQL Zealot




Re: WHERE col = ANY($1) extended to 2 or more columns?

2023-02-11 Thread Peter J. Holzer
On 2023-02-11 16:21:49 +0100, Peter J. Holzer wrote:
> On 2023-02-09 18:35:42 +0100, Dominique Devienne wrote:
> > Right. The goal is to (re)use a prepared statement (i.e. plan once), and 
> > bind
> > the RHS (binary) array
> > and do a single exec (single round-trip) to get the matching rows. AFAIK, 
> > this
> > is the fastest way.
>
> Planning time is often much less than execution time, so minimizing it
> may not give you the fastest results.
>
> For example I tried to fetch two data points from a medium sized (3 mio
> rows) table ((macrobondtimeseries, date) is the primary key here):
>
> wds=> explain (analyze) select macrobondtimeseries, date, value
> from facttable_oecd_naq_2018
> where (macrobondtimeseries, date) = any(
> array [
> ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
> ( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
> ]::tsd[]
> );
[Sequential scan]
> ║ Planning Time: 1.833 ms
> ║ Execution Time: 1619.729 ms
> ╚
> (12 rows)
>
> wds=> explain (analyze) select macrobondtimeseries, date, value
> from facttable_oecd_naq_2018
> where (macrobondtimeseries, date) in
> (
> ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
> ( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
> );
[Bitmap index scan]
> ║ Planning Time: 1.414 ms
> ║ Execution Time: 0.330 ms
> ╚
> (10 rows)
>
> The latter is almost 1000 times faster. Saving 1.8 ms on planning time
> doesn't help you if you then waste 1.6 s on execution.
>
> So sometimes it pays off to give the planner a little bit of extra
> information to work on.

And here is a variant which does achieve a good execution
plan with an array:

wds=> explain(analyze) with a as (
select * from unnest (
array [
( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
]::tsd[]
)
)
select macrobondtimeseries, date, value
from a natural join facttable_oecd_naq_2018;
                                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.56..7.17 rows=1 width=44) (actual time=0.075..0.115 rows=2 loops=1)
   ->  Function Scan on unnest  (cost=0.00..0.02 rows=2 width=36) (actual time=0.011..0.012 rows=2 loops=1)
   ->  Index Scan using facttable_oecd_naq_2018_pkey on facttable_oecd_naq_2018  (cost=0.56..3.57 rows=1 width=34) (actual time=0.045..0.045 rows=1 loops=2)
         Index Cond: (((macrobondtimeseries)::text = (unnest.macrobondtimeseries)::text) AND (date = unnest.date))
 Planning Time: 1.061 ms
 Execution Time: 0.172 ms
(6 rows)

(At least with these parameters on this table with this version of
PostgreSQL. Your mileage may vary, of course.)
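
One way to combine this unnest-and-join form with the prepared-statement goal
from upthread might be to bind the whole array as a single parameter - a
sketch only, assuming the composite type tsd has fields matching the key
columns (macrobondtimeseries, date), as the casts above suggest:

    PREPARE fetch_points(tsd[]) AS
        WITH a AS (SELECT * FROM unnest($1))
        SELECT macrobondtimeseries, date, value
        FROM a NATURAL JOIN facttable_oecd_naq_2018;

    EXECUTE fetch_points(ARRAY[
        ('naq_mex_b1_gi_cqr_a' , '2013-01-01'),
        ('naq_lux_p3s13_gpsa_a', '1961-01-01')
    ]::tsd[]);

Whether the planner picks the index-scan plan for the generic prepared plan
would still need to be checked on the real data.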

hp


-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: pg_trgm vs. Solr ngram

2023-02-11 Thread Christian Ramseyer (mx04)

On 10.02.23 04:48, Laurenz Albe wrote:

On Fri, 2023-02-10 at 03:20 +0100, Chris wrote:

In Solr I was using ngrams and customized the
TokenizerFactories until more or less only whitespace was as separator,
while [.:-_\d] remains part of the ngrams. This allows to search for
".12.255/32" or "xzy-eth5.example.org" without any false positives.

It looks like a straight conversion of this method is not possible 



On 10.02.23 04:48, Laurenz Albe wrote:

Here is a hack that you can try: pre-process your strings and replace
symbols with rare characters:

   SELECT show_trgm(translate('127.0.0.1/32', './', 'qx'));

                            show_trgm
   ═════════════════════════════════════════════════════════
    {"  1"," 12",0q0,0q1,127,1x3,27q,"32 ",7q0,q0q,q1x,x32}
   (1 row)

Then you could search like

   WHERE translate(search_string, './', 'qx') LIKE translate('%127.0.0.1/32%', './', 'qx')
     AND search_string LIKE '%127.0.0.1/32%'

The first condition can use a trigram index, and the second filters out
false positives.
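
A sketch of an index that the first condition could then use (illustrative
only; the table name "docs" and column name "search_string" are assumptions,
not from the mail):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- trigram index on the rewritten text, so the LIKE over translate(...)
    -- can be answered from the index
    CREATE INDEX docs_search_string_translated_trgm
        ON docs USING gin (translate(search_string, './', 'qx') gin_trgm_ops);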



Hehe, that is a nifty idea. I went to try this, but then it turned out 
that I was probably overthinking the whole issue already. Using a 
gist_trgm_ops index and % as the operator works perfectly well:

insert into docs (orig) values ('120.2.10.22');
insert into docs (orig) values ('120 2 10 22');
CREATE INDEX iorig ON docs USING GIST (orig gist_trgm_ops);

set enable_seqscan = off;
explain analyze verbose select * from docs where orig like '%.10.22%';

Index Scan using iorig on public.docs  (cost=0.14..8.16 rows=1 width=32) (actual time=0.952..1.018 rows=1 loops=1)
  Output: orig
  Index Cond: (docs.orig ~~ '%.10.22%'::text)
  Rows Removed by Index Recheck: 1

Even though this query has the same trigrams as e.g. '% 10 22%', the 
index recheck takes care of it and only the matching row is returned. 
Excellent. Not quite sure why I was expecting false positives in the 
first place; it would be a pretty stark violation of how % is supposed 
to behave.


Not quite sure how big the performance hit will be from not having the 
optimal trigrams (with punctuation) in the index and rechecking some hits, 
but for now I'll assume it's negligible. Otherwise I'll try the 
translate variant.


Many thanks!







Queries running forever, because of wrong rowcount estimate

2023-02-11 Thread Peter


TL;DR Version:
==
For a table where all rows have been deleted, under certain conditions
the planner assumes that it still contains the values that were
previously present, and will therefore assume that nothing needs to
be added, while in fact everything needs to be added.
-
This is
PostgreSQL 12.13 on amd64-portbld-freebsd13.1, compiled by FreeBSD
clang version 13.0.0 (git@github.com:llvm/llvm-project.git
llvmorg-13.0.0-0-gd7b669b3a303), 64-bit

and if somebody can tell me this has been fixed in a more recent
version, then I am happy.
---

More elaborate Version:
===
When you have a table populated, pg_class may look like this:

 relpages | reltuples
----------+-----------
        1 |         3

Now when you delete all rows, this will not change, and the planner
will now make the -wrong- assumption that the tuples are still
present, and planning may be bad. That is why ANALYZE is now
needed, and after ANALYZE things look like this, and do work again:

 relpages | reltuples 
--+---
1 | 0

But, if you happen to run a VACUUM ANALYZE on that table, then things
will look like this:

 relpages | reltuples 
--+---
0 | 0

And now, for reasons unknown to me, the planner will again make the
assumption that all the previous values are still present in the
table, and planning will be bad. And at that point you can run
ANALYZE as often as you want; it does not help anymore.
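
A small reproduction sketch of the sequence described above (an editorial
illustration, not from the original report; the commented values are the ones
reported above):

    CREATE TABLE t (x int);
    INSERT INTO t SELECT generate_series(1, 3);
    ANALYZE t;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't';  -- 1 | 3

    DELETE FROM t;
    ANALYZE t;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't';  -- 1 | 0, plans work

    VACUUM ANALYZE t;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't';  -- 0 | 0, the problematic state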
---

Full Version:
=

Lets start with the query:

INSERT INTO opcode (opcode)
SELECT DISTINCT
   incoming.opcode AS opcode
  FROM incoming AS incoming
   LEFT JOIN opcode AS opcode USING (opcode)
 WHERE opcode.opcode IS NULL

Table "incoming" will bring millions of rows, but column "opcode"
knows only three distinct values. Others might probably appear, so
this INSERT will add such new value to table "opcode".

So far, no problem.
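
For readability, a guessed minimal schema behind these queries (column types
and keys are assumptions; nothing below is stated in the mail):

    CREATE TABLE opcode    (opcode text PRIMARY KEY);
    CREATE TABLE incoming  (id bigint, opcode text);
    CREATE TABLE tmp_ready (id bigint);    -- used further below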

But what happens if I delete all rows from "opcode" and start the
process afresh? Then the INSERT *should* reinsert all the (three)
values freshly into table "opcode", i.e. there will be *millions*
of rows "WHERE opcode.opcode IS NULL", before the "DISTINCT".

This is also not a problem. Not yet.

Now we decide that not all the rows in table "incoming" are already
processable, so we limit the whole operation to those that are:

INSERT INTO opcode (opcode)
SELECT DISTINCT
   incoming.opcode AS opcode
  FROM incoming AS incoming
   LEFT JOIN opcode AS opcode USING (opcode)
 WHERE opcode.opcode IS NULL
   AND EXISTS (
 SELECT ready.id
   FROM tmp_ready AS ready
  WHERE ready.id = incoming.id
 )

And now this query will run forever and never return:

"Unique  (cost=434008.43..434008.44 rows=1 width=6)"
"  ->  Sort  (cost=434008.43..434008.44 rows=1 width=6)"
"Sort Key: incoming.opcode"
"->  Nested Loop Semi Join  (cost=1058.15..434008.42 rows=1 width=6)"
"  Join Filter: (incoming.id = pairs.ori)"
"  ->  Gather  (cost=1058.15..151427.56 rows=1 width=14)"
"Workers Planned: 3"
"->  Hash Anti Join  (cost=58.15..150427.46 rows=1 
width=14)"
"  Hash Cond: (incoming.opcode = opcode.opcode)"
"  ->  Parallel Seq Scan on incoming  
(cost=0.00..147589.32 rows=741332 width=14)"
"  ->  Hash  (cost=31.40..31.40 rows=2140 width=6)"
"->  Seq Scan on opcode  (cost=0.00..31.40 
rows=2140 width=6)"
"  ->  Seq Scan on tmp_ready ready  (cost=0.00..253869.27 
rows=2296927 width=8)"

The LEFT JOIN is estimated to return only one row (we know there can
actually be millions), and therefore this goes into a Nested Loop that
will read tmp_ready forever.

Putting an index onto tmp_ready helps matters - but that is not the
root cause here.
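
A sketch of that workaround (the column name "id" is taken from the query
above; everything else is assumed):

    CREATE INDEX ON tmp_ready (id);
    ANALYZE tmp_ready;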
The root cause is a wrong estimate: When deleting all rows from table
"opcode", this apparently does not change the planner behaviour. The
planner still thinks that all values from the incoming.opcode column are
already in opcode.opcode.

After several experiments, I finally managed to somehow kick ANALYZE
into producing working statistics, and now it looks like this:

"HashAggregate  (cost=554927.43..554927.44 rows=1 width=6)"
"  Group Key: incoming.opcode"
"  ->  Hash Semi Join  (cost=291554.87..549213.83 rows=2285442 width=6)"
"Hash Cond: (incoming.id = ready.id)"
"->  Hash Anti Join  (cost=1.01..192070.20 rows=2286638 width=14)"
"  Hash Cond: (incoming.opcode = opcode.opcode)"
"  ->  Seq S