How to pass table column values to function
Hi!

Table "source" contains an integer column. Its values should be passed to a
function that selects data from another table.

I tried

    CREATE OR REPLACE FUNCTION public.TestAddAssetTransactions(dokumnrs int[])
    RETURNS int AS
    $BODY$
        with i1 as (
            INSERT INTO bilkaib (dokumnr)
            select dokumnr from dok
            where dokumnr in (select * from unnest(dokumnrs))
            returning *
        )
        select count(*) from i1;
    $BODY$ language sql;

    create temp table bilkaib (dokumnr int) on commit drop;
    create temp table dok (dokumnr serial primary key) on commit drop;
    create temp table source (dokumnr int) on commit drop;
    insert into source values (1),(2);

    select TestAddAssetTransactions( (select ARRAY[dokumnr] from source)::int[] )

but got the error

    ERROR: more than one row returned by a subquery used as an expression

How can a set of integers be passed to the function? Should a temp table with
a fixed name be used, or is there a better solution?

Using PostgreSQL 12+.

Andrus.
Re: How to pass table column values to function
On Sat, Feb 11, 2023 at 07:10, Andrus wrote:
> Hi!
>
> Table "source" contains an integer column. Its values should be passed to
> a function that selects data from another table.
>
> I tried
>
>     CREATE OR REPLACE FUNCTION public.TestAddAssetTransactions(dokumnrs int[])
>     RETURNS int AS
>     $BODY$
>         with i1 as (
>             INSERT INTO bilkaib (dokumnr)
>             select dokumnr from dok
>             where dokumnr in (select * from unnest(dokumnrs))
>             returning *
>         )
>         select count(*) from i1;
>     $BODY$ language sql;
>
>     create temp table bilkaib (dokumnr int) on commit drop;
>     create temp table dok (dokumnr serial primary key) on commit drop;
>     create temp table source (dokumnr int) on commit drop;
>     insert into source values (1),(2);
>
>     select TestAddAssetTransactions( (select ARRAY[dokumnr] from source)::int[] )
>
> but got the error
>
>     ERROR: more than one row returned by a subquery used as an expression

Probably you want an array_agg and not an array:

    select TestAddAssetTransactions( (select array_agg(dokumnr) from source) )
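As a quick way to see the difference with the temp tables from the question:
ARRAY[dokumnr] builds one single-element array per row of "source", so the
scalar subquery returns two rows, while array_agg collapses all rows into a
single array value. A minimal sketch, assuming the function and tables
created above:

    select ARRAY[dokumnr] from source;      -- two rows: {1} and {2}
    select array_agg(dokumnr) from source;  -- one row:  {1,2}

    -- the aggregate form therefore works as a scalar subquery argument
    select TestAddAssetTransactions( (select array_agg(dokumnr) from source) );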
Re: How to pass table column values to function
On Sat, Feb 11, 2023 at 3:55 PM Andrus wrote:
> Hi!
>
> Table "source" contains an integer column. Its values should be passed to
> a function that selects data from another table.
>
> I tried
>
>     CREATE OR REPLACE FUNCTION public.TestAddAssetTransactions(dokumnrs int[])
>     RETURNS int AS
>     $BODY$
>         with i1 as (
>             INSERT INTO bilkaib (dokumnr)
>             select dokumnr from dok
>             where dokumnr in (select * from unnest(dokumnrs))
>             returning *
>         )
>         select count(*) from i1;
>     $BODY$ language sql;
>
>     create temp table bilkaib (dokumnr int) on commit drop;
>     create temp table dok (dokumnr serial primary key) on commit drop;
>     create temp table source (dokumnr int) on commit drop;
>     insert into source values (1),(2);
>
>     select TestAddAssetTransactions( (select ARRAY[dokumnr] from source)::int[] )
>
> but got the error
>
>     ERROR: more than one row returned by a subquery used as an expression
>
> How can a set of integers be passed to the function? Should a temp table
> with a fixed name be used, or is there a better solution?
>
> Using PostgreSQL 12+.
>
> Andrus.

try this:

    select TestAddAssetTransactions((select array(select * from source)));
Re: WHERE col = ANY($1) extended to 2 or more columns?
On 2023-02-09 18:35:42 +0100, Dominique Devienne wrote:
> On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston wrote:
> > On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys wrote:
> > > On 9 Feb 2023, at 16:41, Dominique Devienne wrote:
> > > > Now we'd like to do the same for composite keys, and I don't know
> > > > how to do that.
> >
> > This works:
> >
> > => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text),
> >                                (1, 'one'::text), (2, 'two'::text));
> >
> > But you cannot write the right-side of the IN as a single parameter
> > which seems to be the primary constraint trying to be conformed to.
>
> Right. The goal is to (re)use a prepared statement (i.e. plan once), and
> bind the RHS (binary) array and do a single exec (single round-trip) to
> get the matching rows. AFAIK, this is the fastest way.

Planning time is often much less than execution time, so minimizing it may
not give you the fastest results.

For example I tried to fetch two data points from a medium sized (3 mio rows)
table ((macrobondtimeseries, date) is the primary key here):

    wds=> explain (analyze)
          select macrobondtimeseries, date, value
          from facttable_oecd_naq_2018
          where (macrobondtimeseries, date) = any(
              array [
                  ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
                  ( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
              ]::tsd[]
          );
                                  QUERY PLAN
     Gather  (cost=1000.00..334945.37 rows=35242 width=34) (actual time=6.194..1618.968 rows=2 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       ->  Parallel Seq Scan on facttable_oecd_naq_2018  (cost=0.00..330421.17 rows=14684 width=34) (actual time=1054.739..1589.818 rows=1 loops=3)
             Filter: (ROW(macrobondtimeseries, date) = ANY ('{"(naq_mex_b1_gi_cqr_a,2013-01-01)","(naq_lux_p3s13_gpsa_a,1961-01-01)"}'::tsd[]))
             Rows Removed by Filter: 1178191
     Planning Time: 1.833 ms
     JIT:
       Functions: 12
       Options: Inlining false, Optimization false, Expressions true, Deforming true
       Timing: Generation 1.026 ms, Inlining 0.000 ms, Optimization 0.948 ms, Emission 12.613 ms, Total 14.586 ms
     Execution Time: 1619.729 ms
    (12 rows)

    wds=> explain (analyze)
          select macrobondtimeseries, date, value
          from facttable_oecd_naq_2018
          where (macrobondtimeseries, date) in (
              ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
              ( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
          );
                                  QUERY PLAN
     Bitmap Heap Scan on facttable_oecd_naq_2018  (cost=4.13..7.17 rows=2 width=34) (actual time=0.259..0.263 rows=2 loops=1)
       Recheck Cond: macrobondtimeseries)::text = 'naq_mex_b1_gi_cqr_a'::text) AND (date = '2013-01-01'::date)) OR (((macrobond
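The casts to ::tsd[] above rely on a composite type whose fields match the
two key columns. Its definition is not shown in the thread; a plausible
stand-in, with the field types assumed, would be:

    -- Hypothetical definition of the composite type used in the casts above;
    -- the real field types (e.g. varchar vs. text) may differ.
    CREATE TYPE tsd AS (
        macrobondtimeseries text,
        date                date
    );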
Re: valgrind a background worker
On 2/10/23 3:05 PM, Tom Lane wrote:
> Jeffrey Walton writes:
>> On Fri, Feb 10, 2023 at 10:04 AM Tom Lane wrote:
>>> You have to valgrind the whole cluster AFAIK. Basically, start
>>> the postmaster under valgrind with --trace-children=yes.
>>> For leak tracking you probably also want
>>> --leak-check=full --track-origins=yes --read-var-info=yes
>>
>> One additional comment... the program in question and PostgreSQL
>> should also be built with -g -O1 per
>> https://valgrind.org/docs/manual/quick-start.html . Otherwise, there's
>> a risk the line information will not be accurate or usable.
>
> Yeah. Also, you need to compile Postgres with -DUSE_VALGRIND
> if you want valgrind to have any idea about palloc/pfree.

Thanks much both of you! I'll report back how it goes ;)

-- 
Jon Erdman (aka StuckMojo)
PostgreSQL Zealot
Re: WHERE col = ANY($1) extended to 2 or more columns?
On 2023-02-11 16:21:49 +0100, Peter J. Holzer wrote:
> On 2023-02-09 18:35:42 +0100, Dominique Devienne wrote:
> > Right. The goal is to (re)use a prepared statement (i.e. plan once), and
> > bind the RHS (binary) array and do a single exec (single round-trip) to
> > get the matching rows. AFAIK, this is the fastest way.
>
> Planning time is often much less than execution time, so minimizing it
> may not give you the fastest results.
>
> For example I tried to fetch two data points from a medium sized (3 mio
> rows) table ((macrobondtimeseries, date) is the primary key here):
>
>     wds=> explain (analyze)
>           select macrobondtimeseries, date, value
>           from facttable_oecd_naq_2018
>           where (macrobondtimeseries, date) = any(
>               array [
>                   ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
>                   ( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
>               ]::tsd[]
>           );
>     [Sequential scan]
>      Planning Time: 1.833 ms
>      Execution Time: 1619.729 ms
>     (12 rows)
>
>     wds=> explain (analyze)
>           select macrobondtimeseries, date, value
>           from facttable_oecd_naq_2018
>           where (macrobondtimeseries, date) in (
>               ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
>               ( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
>           );
>     [Bitmap index scan]
>      Planning Time: 1.414 ms
>      Execution Time: 0.330 ms
>     (10 rows)
>
> The latter is almost 1000 times faster. Saving 1.8 ms on planning time
> doesn't help you if you then waste 1.6 s on execution.
>
> So sometimes it pays off to give the planner a little bit of extra
> information to work on.

And here is a variant which does achieve a good execution plan with an array:

    wds=> explain(analyze)
          with a as (
              select * from unnest (
                  array [
                      ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
                      ( 'naq_lux_p3s13_gpsa_a', '1961-01-01' )
                  ]::tsd[]
              )
          )
          select macrobondtimeseries, date, value
          from a natural join facttable_oecd_naq_2018;
                                  QUERY PLAN
     Nested Loop  (cost=0.56..7.17 rows=1 width=44) (actual time=0.075..0.115 rows=2 loops=1)
       ->  Function Scan on unnest  (cost=0.00..0.02 rows=2 width=36) (actual time=0.011..0.012 rows=2 loops=1)
       ->  Index Scan using facttable_oecd_naq_2018_pkey on facttable_oecd_naq_2018  (cost=0.56..3.57 rows=1 width=34) (actual time=0.045..0.045 rows=1 loops=2)
             Index Cond: (((macrobondtimeseries)::text = (unnest.macrobondtimeseries)::text) AND (date = unnest.date))
     Planning Time: 1.061 ms
     Execution Time: 0.172 ms
    (6 rows)

(At least with these parameters on this table with this version of
PostgreSQL. Your mileage may vary, of course.)

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at          |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
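Since the original goal was a prepared statement with the whole key list
bound as a single parameter, the same unnest-and-join shape can be
parameterized. A minimal sketch, assuming the tsd composite type and the
table used above (the statement name is made up):

    -- Bind the complete key list as one tsd[] parameter and join against it,
    -- mirroring the CTE plus NATURAL JOIN query shown above.
    PREPARE fetch_points (tsd[]) AS
        WITH a AS (SELECT * FROM unnest($1))
        SELECT macrobondtimeseries, date, value
        FROM a NATURAL JOIN facttable_oecd_naq_2018;

    EXECUTE fetch_points(
        ARRAY[ ('naq_mex_b1_gi_cqr_a' , '2013-01-01'),
               ('naq_lux_p3s13_gpsa_a', '1961-01-01') ]::tsd[]
    );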
Re: pg_trgm vs. Solr ngram
On 10.02.23 04:48, Laurenz Albe wrote:
> On Fri, 2023-02-10 at 03:20 +0100, Chris wrote:
> > In Solr I was using ngrams and customized the TokenizerFactories until
> > more or less only whitespace was as separator, while [.:-_\d] remains
> > part of the ngrams. This allows to search for ".12.255/32" or
> > "xzy-eth5.example.org" without any false positives. It looks like a
> > straight conversion of this method is not possible
>
> Here is a hack that you can try: pre-process your strings and replace
> symbols with rare characters:
>
>     SELECT show_trgm(translate('127.0.0.1/32', './', 'qx'));
>
>                          show_trgm
>     ═══════════════════════════════════════════════════════
>      {" 1"," 12",0q0,0q1,127,1x3,27q,"32 ",7q0,q0q,q1x,x32}
>     (1 row)
>
> Then you could search like
>
>     WHERE translate(search_string, './', 'qx')
>           LIKE translate('%127.0.0.1/32%', './', 'qx')
>       AND search_string LIKE '%127.0.0.1/32%'
>
> The first condition can use a trigram index, and the second filters out
> false positives.

Hehe, that is a nifty idea.

I went to try this, but then it turned out that I was probably overthinking
the whole issue already. Using a gist_trgm_ops index and % as operator works
perfectly well:

    insert into docs (orig) values ('120.2.10.22');
    insert into docs (orig) values ('120 2 10 22');
    CREATE INDEX iorig ON docs USING GIST (orig gist_trgm_ops);

    set enable_seqscan = off;
    explain analyze verbose select * from docs where orig like '%.10.22%';

    Index Scan using iorig on public.docs  (cost=0.14..8.16 rows=1 width=32) (actual time=0.952..1.018 rows=1 loops=1)
      Output: orig
      Index Cond: (docs.orig ~~ '%.10.22%'::text)
      Rows Removed by Index Recheck: 1

Even though this query has the same trigrams as e.g. '% 10 22%', the index
recheck takes care of it and only the matching row is returned. Excellent,
not quite sure why I was expecting false positives in the first place; it
would be a pretty stark violation of how % is supposed to behave.

Not quite sure how big the performance hit of not having the optimal
trigrams with punctuation in the index and rechecking some hits will be, but
for now I'll assume it's negligible. Otherwise I'll try the translate
variant.

Many thanks!
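For reference, the quoted translate() hack pairs with an expression index so
that the first condition is indexable. A minimal sketch, reusing the
docs(orig) table from the experiment above (index name and index type are
illustrative, not from the thread):

    -- Index the translated text so '.' and '/' become ordinary characters
    -- and stay inside the trigrams.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX iorig_translated
        ON docs USING gist (translate(orig, './', 'qx') gist_trgm_ops);

    SELECT *
      FROM docs
     WHERE translate(orig, './', 'qx') LIKE translate('%127.0.0.1/32%', './', 'qx')
       AND orig LIKE '%127.0.0.1/32%';   -- second condition removes false positives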
Queries running forever, because of wrong rowcount estimate
TL;DR Version:
==============

For a table where all rows have been deleted, under certain conditions the
planner assumes that it still contains the values that were previously
present, and will therefore assume that nothing needs to be added, while in
fact everything needs to be added.

This is

    PostgreSQL 12.13 on amd64-portbld-freebsd13.1, compiled by FreeBSD clang
    version 13.0.0 (g...@github.com:llvm/llvm-project.git
    llvmorg-13.0.0-0-gd7b669b3a303), 64-bit

and if somebody can tell me this has been fixed in a more recent version,
then I am happy.

-----------------------------------------------------------------------

More elaborate Version:
=======================

When you have a table populated, pg_class may look like this:

     relpages | reltuples
    ----------+-----------
            1 |         3

Now when you delete all rows, this will not change, and the planner will now
make the -wrong- assumption that the tuples are still present, and planning
may be bad. That is why ANALYZE is now needed, and after ANALYZE things look
like this, and do work again:

     relpages | reltuples
    ----------+-----------
            1 |         0

But, if you happen to run a VACUUM ANALYZE on that table, then things will
look like this:

     relpages | reltuples
    ----------+-----------
            0 |         0

And now, for reasons unknown to me, the planner will again make the
assumption that all the previous values are still present in the table, and
planning will be bad. And at that point you can run ANALYZE as often as you
want; it does not help anymore.

-----------------------------------------------------------------------

Full Version:
=============

Let's start with the query:

    INSERT INTO opcode (opcode)
    SELECT DISTINCT incoming.opcode AS opcode
      FROM incoming AS incoming
      LEFT JOIN opcode AS opcode USING (opcode)
     WHERE opcode.opcode IS NULL

Table "incoming" will bring millions of rows, but column "opcode" knows only
three distinct values. Others might probably appear, so this INSERT will add
such new values to table "opcode". So far, no problem.

But what happens if I delete all rows from "opcode" and start the process
afresh? Then the INSERT *should* reinsert all the (three) values freshly
into table "opcode", i.e. there will be *millions* of rows "WHERE
opcode.opcode IS NULL" before the "DISTINCT". This is also not a problem.
Not yet.

Now we decide that not all the rows in table "incoming" are already
processable, so we limit the whole operation to those that are:

    INSERT INTO opcode (opcode)
    SELECT DISTINCT incoming.opcode AS opcode
      FROM incoming AS incoming
      LEFT JOIN opcode AS opcode USING (opcode)
     WHERE opcode.opcode IS NULL
       AND EXISTS (
           SELECT ready.id
             FROM tmp_ready AS ready
            WHERE ready.id = incoming.id
       )

And now this query will run forever and never return:

    Unique  (cost=434008.43..434008.44 rows=1 width=6)
      ->  Sort  (cost=434008.43..434008.44 rows=1 width=6)
            Sort Key: incoming.opcode
            ->  Nested Loop Semi Join  (cost=1058.15..434008.42 rows=1 width=6)
                  Join Filter: (incoming.id = pairs.ori)
                  ->  Gather  (cost=1058.15..151427.56 rows=1 width=14)
                        Workers Planned: 3
                        ->  Hash Anti Join  (cost=58.15..150427.46 rows=1 width=14)
                              Hash Cond: (incoming.opcode = opcode.opcode)
                              ->  Parallel Seq Scan on incoming  (cost=0.00..147589.32 rows=741332 width=14)
                              ->  Hash  (cost=31.40..31.40 rows=2140 width=6)
                                    ->  Seq Scan on opcode  (cost=0.00..31.40 rows=2140 width=6)
                  ->  Seq Scan on tmp_ready ready  (cost=0.00..253869.27 rows=2296927 width=8)

The LEFT JOIN is estimated with only one result row (we know there can
actually be millions), and therefore this goes into a Nested Loop that will
read tmp_ready forever. Putting an index onto tmp_ready helps the matter,
but that is not the root cause here.
The root cause is a wrong estimate: when deleting all rows from table
"opcode", this apparently does not change the planner behaviour. The planner
still thinks that all values from the incoming.opcode column are already in
opcode.opcode.

After trying different experiments, I finally managed to somehow kick
ANALYZE into producing working statistics, and now it looks like this:

    HashAggregate  (cost=554927.43..554927.44 rows=1 width=6)
      Group Key: incoming.opcode
      ->  Hash Semi Join  (cost=291554.87..549213.83 rows=2285442 width=6)
            Hash Cond: (incoming.id = ready.id)
            ->  Hash Anti Join  (cost=1.01..192070.20 rows=2286638 width=14)
                  Hash Cond: (incoming.opcode = opcode.opcode)
                  ->  Seq S
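The pg_class behaviour described under "More elaborate Version" can be
reproduced with a sequence along these lines. This is a sketch using a
stand-in "opcode" table; the relpages/reltuples values in the comments are
the ones reported above and will vary with actual table contents:

    -- Watch how relpages/reltuples change around DELETE, ANALYZE and
    -- VACUUM ANALYZE (counter values as described in the message above).
    CREATE TABLE opcode (opcode text PRIMARY KEY);
    INSERT INTO opcode VALUES ('a'), ('b'), ('c');
    ANALYZE opcode;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'opcode';  -- 1 | 3

    DELETE FROM opcode;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'opcode';  -- unchanged: 1 | 3

    ANALYZE opcode;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'opcode';  -- 1 | 0

    VACUUM ANALYZE opcode;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'opcode';  -- 0 | 0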