Re: Slow planning time for simple query

2018-06-17 Thread Andrew Gierth
> "Amit" == Amit Kapila  writes:

 >> Presumably the problem is that the standby isn't authorized to change
 >> the btree index's "entry is dead" bits,

 Amit> I don't see anything like that in the code. We use _bt_killitems
 Amit> to mark the items as dead and neither that function or any of its
 Amit> caller has any such assumption.

See index_fetch_heap:

/*
 * If we scanned a whole HOT chain and found only dead tuples, tell index
 * AM to kill its entry for that TID (this will take effect in the next
 * amgettuple call, in index_getnext_tid).  We do not do this when in
 * recovery because it may violate MVCC to do so.  See comments in
 * RelationGetIndexScan().
 */
if (!scan->xactStartedInRecovery)
    scan->kill_prior_tuple = all_dead;

(this is the only place where kill_prior_tuple can be set to true)

-- 
Andrew (irc:RhodiumToad)



Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes...

2018-10-06 Thread Andrew Gierth
> "Phil" == Phil Endecott  writes:

 Phil> Next I tried
 Phil> to_tsvector('english',convert_from(body::text,'UTF-8')). That
 Phil> doesn't work because convert_from is not immutable. (This is 9.6;
 Phil> maybe that has changed.) Is there a good reason for that?

I would guess because conversions are controlled by the pg_conversion
table which can be modified by create/drop conversion.

 Phil> Maybe because I might change the client encoding?

No, because convert_from converts from the specified encoding to the
server_encoding, not the client_encoding, and the server_encoding can't
be changed except at db creation time.

 Phil> As a hack I tried ALTER FUNCTION to make it immutable,

A better approach is to wrap it in a function of your own which is
declared immutable, rather than hacking the catalogs:

create function from_utf8(bytea) returns text language plpgsql immutable
  as $$ begin return convert_from($1, 'UTF8'); end; $$;

 Phil> and now I get:

 Phil> ERROR:  invalid byte sequence for encoding "UTF8": 0x00

 Phil> Hmm, as far as I'm aware 0x00 is fine in UTF-8; what's that mean?

PG doesn't allow 0x00 in text values regardless of encoding.

 Phil> But actually I'd be more than happy to ignore invalid UTF-8 here,
 Phil> since I'm only using it for text search; there may be some truly
 Phil> invalid UTF-8 in the data. Is there a "permissive" mode for
 Phil> charset conversion?

Unfortunately not.

 Phil> (That error also suggests that the convert_from is not optimising
 Phil> the conversion from UTF-8 to UTF-8 to a no-op.)

Indeed not, because it must validate that the data really is UTF-8
before treating it as such.

 Phil> Anyway: given the problem of creating a text search index over
 Phil> bytea data that contains UTF-8 text, which may include oddities
 Phil> like null bytes, what would you do?

You can search for 0x00 in a bytea using position() or LIKE. What do you
want to do with values that contain null bytes? or values which you
think are supposed to be valid utf8 text but are not?
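
For instance, a quick way to locate the problem rows first (a sketch;
the table and column names here are assumptions):

select id from messages where position('\x00'::bytea in body) > 0;
-- or, using LIKE on bytea:
select id from messages where body like '%' || '\x00'::bytea || '%';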

-- 
Andrew (irc:RhodiumToad)



Re: SELECT UNION into a text[]

2018-10-10 Thread Andrew Gierth
> "David" == David G Johnston  writes:

 >> SELECT ARRAY_AGG(hashed)
 >> FROM words_nouns
 >> WHERE added > TO_TIMESTAMP(1539100913)
 >> UNION
 >> SELECT ARRAY_AGG(hashed)
 >> FROM words_verbs
 >> WHERE added > TO_TIMESTAMP(1539100913);

 David> SELECT array_agg(words) FROM (
 David> SELECT hashed FROM words_nouns
 David> UNION ALL
 David> SELECT hashed FROM words_verbs
 David> ) AS src

Or even better,

SELECT ARRAY(SELECT hashed FROM words_nouns
 UNION
 SELECT hashed FROM words_verbs);

-- 
Andrew (irc:RhodiumToad)



Re: Measuring the Query Optimizer Effect: Turning off the QO?

2019-07-08 Thread Andrew Gierth
> "Tom" == Tom Mercha  writes:

 Tom> Hi All

 Tom> As we know, a query goes through number of stages before it is
 Tom> executed. One of these stages is query optimization (QO).

That's not really true at all. One of the stages is query _planning_,
which takes the (rewritten) query as input and converts it to something
that the executor can take action on. There isn't actually any separate
"optimization" phase.

-- 
Andrew (irc:RhodiumToad)




Re: Measuring the Query Optimizer Effect: Turning off the QO?

2019-07-08 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Two I'd particularly draw your attention to are
 Tom> join_collapse_limit and from_collapse_limit --- if you set both to
 Tom> 1, that'll effectively disable searching for a good join order,
 Tom> causing the join order to match the syntactic structure of the
 Tom> FROM clause. For instance "FROM a,b,c" will always be done by
 Tom> joining a to b first

FROM a,b,c can always be planned in any join order. If you want to force
the join order you have to set join_collapse_limit=1 AND write it as
FROM a JOIN b ON ... JOIN c ON ...

For an example, try:

explain select * from onek o1, tenk1 t, onek o2
 where o1.unique1=t.unique1 and t.unique1=o2.unique1
   and o1.unique2<10 and o2.unique2<10;

which (at least for me) joins o1 and o2 together first even with the
collapse limits set to 1.

-- 
Andrew (irc:RhodiumToad)




Re: SCRAM-SHA-256, is it possible to retrieve enough information from PG server (pg_authid etc) to perform authentication as a client

2019-08-02 Thread Andrew Gierth
> "Vladimir" == Vladimir Soldatov  writes:

 Vladimir> Hi guys,

 Vladimir> I am new to PostgreSQL, so sorry for maybe stupid question. I
 Vladimir> am working on some application implementing Frontend/Backend
 Vladimir> PG protocol and one of the goals - having only "admin" users
 Vladimir> credentials (like postgres user) be able to retrieve enough
 Vladimir> information from PG server (for example, from pg_authid
 Vladimir> table) to perform authentication for any user created in PG
 Vladimir> (without any user interaction, so we don't know the user's
 Vladimir> password).

It's an explicit goal of SCRAM to make it impossible to use the server's
stored authentication data to actually authenticate from a client
(without first breaking the preimage resistance of the hash function).

Specifically, the authentication exchange proves to the server that the
client knows key_c, but the server only stores H(key_c); the server can
validate the client message just by applying the hash function, but the
correct value of key_c can't be determined in advance on the server
without a successful preimage attack on H(key_c).

The right way to allow a privileged user to operate as if they were
someone else is to use SET ROLE or SET SESSION AUTHORIZATION rather than
actually trying to log in as the other user.
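
For example (a sketch; the role name is an assumption):

SET ROLE target_user;    -- act with target_user's privileges
-- ... run statements on that user's behalf ...
RESET ROLE;              -- return to the original login role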

-- 
Andrew (irc:RhodiumToad)




Re: Use of ?get diagnostics'?

2019-09-21 Thread Andrew Gierth
> "Thiemo" == Thiemo Kellner  writes:

 Thiemo> Hi all
 Thiemo> I try to create a function (code at
 Thiemo> https://pastebin.com/mTs18B90)

Paste sites are for IRC, on the mailing list you should always attach
the necessary details to your message.

 Thiemo> using 'get diagnostics' to retrieve the number of affected
 Thiemo> rows. However, it throws

 Thiemo>   the following exception was thrown:
 Thiemo> SQLSTATE: 42703
 Thiemo> column "row_count" does not exist

line 44 of your paste:  V_TEXT := V_TEXT || ROW_COUNT || ' row.';

should be V_ROW_COUNT, I suspect. Likewise line 46.

(The CONTEXT lines of the error message would have identified the
offending line of the function for you.)
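
For reference, the intended pattern looks something like this (a minimal
sketch; the table name is hypothetical, not taken from your paste):

create function touched_rows() returns bigint
  language plpgsql
  as $$
declare
  v_row_count bigint;
begin
  update some_table set updated_at = now();
  get diagnostics v_row_count = row_count;
  return v_row_count;
end;
$$;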

-- 
Andrew (irc:RhodiumToad)




Re: How to get timezone offset in timestamp with time zone AT TIME ZONE output.

2019-09-23 Thread Andrew Gierth
> "Paul" == Paul McGarry  writes:

 Paul> Hi there,
 Paul> Does anyone have a good way of doing:

 Paul> =
 Paul> select '2020-04-04 15:00:00+00'::timestamp with time zone AT TIME ZONE
 Paul> 'Australia/Sydney';
 Paul>   timezone
 Paul> -
 Paul>  2020-04-05 02:00:00

 Paul> select '2020-04-04 16:00:00+00'::timestamp with time zone AT TIME ZONE
 Paul> 'Australia/Sydney';
 Paul>   timezone
 Paul> -
 Paul>  2020-04-05 02:00:00
 Paul> =

 Paul> but with the output including the offset, eg:
 Paul> 2020-04-05 02:00:00+11
 Paul> 2020-04-05 02:00:00+10

This is ugly in some respects but minimizes the possible hazards (since
using a function-local SET clause ensures that the timezone is reset on
exit):

create function ts_to_char(t timestamptz, z text) returns text
  language plpgsql immutable
  set timezone = 'GMT'
  as $$
begin
  perform set_config('timezone', z, true);
  return t::text;
end;
$$;

select ts_to_char(timestamptz '2020-04-04 16:00:00+00', 'Australia/Sydney');
   ts_to_char   
------------------------
 2020-04-05 02:00:00+10

-- 
Andrew (irc:RhodiumToad)




Re: How to get timezone offset in timestamp with time zone AT TIME ZONE output.

2019-09-23 Thread Andrew Gierth
> "Adrian" == Adrian Klaver  writes:

 Adrian> This has come up before and the general suggestion has been to
 Adrian> have a column for a naive(timestamp w/o tz) timestamp and a
 Adrian> column for the timezone.

No, that's usually (not always) backwards, and in any event wouldn't
solve this particular issue.

-- 
Andrew (irc:RhodiumToad)




Re: How to get timezone offset in timestamp with time zone AT TIME ZONE output.

2019-09-24 Thread Andrew Gierth
> "Paul" == Paul McGarry  writes:

 > On Wed, Sep 25, 2019 at 5:44 AM Adrian Klaver wrote:
 >>
 >> "Therefore whatever renders the offset needs to be capable of doing
 >> it per row, independently of the server/session time zone."

The key point here is that storing the timestamp as WITHOUT rather than
WITH time zone doesn't help you solve the problem of outputting the data
with a different time zone offset shown for different rows. (Since
timestamp without time zone values will be output without an offset,
whereas timestamp with time zone values will be output with the offset
corresponding to the server's timezone, which shouldn't be changing
globally between rows - local changes within a function are OK.)

 Paul> Andrew's function seems plausible and I need to find some time to
 Paul> test it, but I am slightly surprised there isn't a native way to
 Paul> get the output, as it seems like something that would be fairly
 Paul> useful when dealing with dates.

The set of functions for dealing with timestamps has, like most of
postgres, largely grown in an unplanned fashion and therefore often has
deficiencies. Also, the fact that we don't follow the spec's semantics
for WITH TIME ZONE (for good reason, the spec can't handle DST
boundaries or historical timezone changes _at all_) complicates choices
of functions and operators to provide.

Possible functions we could add:

  strftime('format', t [,timezone])  -- like the C function
  to_char(t, 'format', timezone) -- 3-arg version of existing to_char

You can do a getOffset(timestamptz,timezone) function like this:

create function getOffset(t timestamptz, zone text)
  returns interval
  language sql immutable
  as $$
select (t at time zone zone) - (t at time zone 'GMT');
$$;

but formatting the interval result as text is a little more challenging
due to needing explicit + signs:

create function getOffsetStr(t timestamptz, zone text)
  returns text
  language sql stable
  as $$
select regexp_replace(to_char(getOffset(t,zone), 'HH24:MI'),
  '^(?!-)', '+');
$$;
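
Example usage (a sketch, reusing the timestamp from earlier in the
thread; the expected result is '+10:00'):

select getOffsetStr(timestamptz '2020-04-04 16:00:00+00', 'Australia/Sydney');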

-- 
Andrew (irc:RhodiumToad)




Re: How to get timezone offset in timestamp with time zone AT TIME ZONE output.

2019-09-24 Thread Andrew Gierth
> "rob" == rob stone  writes:

 rob> You can EXTRACT timezone, timezone_hour and timezone_minute from a
 rob> timestamp. Using 'timezone' returns a value representing seconds
 rob> from UTC.

Yes, but this always outputs a value representing the server timezone;
there's no way (other than the same function set_config trick I gave
above) to make it return a value that represents a different timezone
per row.

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Andrew Gierth
> "Alastair" == Alastair McKinley  writes:

 Alastair> Hi all,

 Alastair> I noticed this strange behaviour whilst trying to write a
 Alastair> function for Postgres 11.5 (PostgreSQL 11.5 on
 Alastair> x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623
 Alastair> (Red Hat 4.8.5-36), 64-bit) and reduced it to this minimal
 Alastair> example. Using a function parameter in the window frame
 Alastair> definition seems to be the cause of the error.

 [...]

 Alastair> This appears to be a bug to me.

Yes, it's a bug, related to function inlining (the select f(3); is not
inlined and therefore works, but the select * from f(3); is being
inlined, but the original Param is somehow making it into the final plan
rather than being substituted with its value). Looking into why.

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> It looks to me that the reason is that query_tree_mutator
 Tom> (likewise query_tree_walker) fails to visit query->windowClause,

I noticed this too. I spent some time looking at what might break if
that was changed (found two places so far, see attached draft patch).

 Tom> which is a bug of the first magnitude if we allow those to contain
 Tom> expressions. Not sure how we've missed that up to now.

I suspect because the partition/order by expressions are actually in the
targetlist instead (with only SortGroupClause nodes in the
windowClause), so only window framing expressions are being missed.

 Tom> Looking at struct Query, it seems like that's not the only
 Tom> questionable omission. We're also not descending into

 Tom> Node   *utilityStmt;    /* non-null if commandType == CMD_UTILITY */

I assume that utility statements are doing any necessary expression
processing themselves...

 Tom> List   *groupClause;/* a list of SortGroupClause's */

There's at least one place that walks this (and the distinct and sort
clauses) explicitly (find_expr_references_walker) but most places just
aren't interested in SortGroupClause nodes given that the actual
expressions are elsewhere.

 Tom> List   *groupingSets;   /* a list of GroupingSet's if present */

Likewise, GroupingSet nodes are not any form of expression, they only
reference the groupClause entries. 

 Tom> List   *distinctClause; /* a list of SortGroupClause's */
 Tom> List   *sortClause; /* a list of SortGroupClause's */

Same goes as for groupClause.

 Tom> List   *rowMarks;   /* a list of RowMarkClause's */

 Tom> Now probably this is never called on utility statements, and maybe
 Tom> there is never a reason for anyone to examine or mutate
 Tom> SortGroupClauses, GroupingSets, or RowMarkClauses, but I'm not
 Tom> sure it's any business of this module to assume that.

I think the logic that query_tree_walker is specifically there to walk
places that might contain _expressions_ is reasonably valid. That said,
the fact that we do have one caller that finds it necessary to
explicitly walk some of the places that query_tree_walker omits suggests
that this decision may have been a mistake.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index dd0a7d8dac..2862c47314 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2217,7 +2217,6 @@ find_expr_references_walker(Node *node,
 		/* query_tree_walker ignores ORDER BY etc, but we need those opers */
 		find_expr_references_walker((Node *) query->sortClause, context);
 		find_expr_references_walker((Node *) query->groupClause, context);
-		find_expr_references_walker((Node *) query->windowClause, context);
 		find_expr_references_walker((Node *) query->distinctClause, context);
 
 		/* Examine substructure of query */
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 18bd5ac903..7f485ae29a 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2292,6 +2292,8 @@ query_tree_walker(Query *query,
 		return true;
 	if (walker(query->havingQual, context))
 		return true;
+	if (walker(query->windowClause, context))
+		return true;
 	if (walker(query->limitOffset, context))
 		return true;
 	if (walker(query->limitCount, context))
@@ -3151,6 +3153,7 @@ query_tree_mutator(Query *query,
 	MUTATE(query->jointree, query->jointree, FromExpr *);
 	MUTATE(query->setOperations, query->setOperations, Node *);
 	MUTATE(query->havingQual, query->havingQual, Node *);
+	MUTATE(query->windowClause, query->windowClause, List *);
 	MUTATE(query->limitOffset, query->limitOffset, Node *);
 	MUTATE(query->limitCount, query->limitCount, Node *);
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index 31a5f702a9..dabd904999 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -485,6 +485,7 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 		case T_FromExpr:
 		case T_OnConflictExpr:
 		case T_SortGroupClause:
+		case T_WindowClause:
 			(void) expression_tree_walker(node,
 		  assign_collations_walker,
 		  (void *) &loccontext);


Re: Possible bug: SQL function parameter in window frame definition

2019-09-28 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> However, we need to fix this in all active branches, and I
 Tom> definitely agree with minimizing the amount of change to back
 Tom> branches. The fact that the minimal change breaks (or exposes an
 Tom> oversight in) assign_collations_walker makes it very plausible
 Tom> that it will also break somebody's third-party code. If we push
 Tom> the API change further we increase the risk of breaking stuff.
 Tom> That seems OK in HEAD but not in back branches.

We could minimize the chance of breakage in a back-patched fix by having
query_tree_walker/mutator iterate the windowClause list itself and
invoke the walker only on offset expressions; is it worth it?

Walkers that follow the recommended code structure should be unaffected;
it only shows up in the collations walker because that treats
expressions as the "default" case and tries to explicitly handle all
non-expression nodes.

-- 
Andrew (irc:RhodiumToad)




Re: Possible bug: SQL function parameter in window frame definition

2019-09-29 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:

 Andrew> We could minimize the chance of breakage in a back-patched fix
 Andrew> by having query_tree_walker/mutator iterate the windowClause
 Andrew> list itself

Here is a draft patch along those lines; the intent of this one is that
no existing walker or mutator should need to change (the change to the
dependency code is basically cosmetic I believe, just avoids walking
some things twice).

Also added some tests.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index dd0a7d8dac..03582781f6 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2214,18 +2214,13 @@ find_expr_references_walker(Node *node,
 			   context->addrs);
 		}
 
-		/* query_tree_walker ignores ORDER BY etc, but we need those opers */
-		find_expr_references_walker((Node *) query->sortClause, context);
-		find_expr_references_walker((Node *) query->groupClause, context);
-		find_expr_references_walker((Node *) query->windowClause, context);
-		find_expr_references_walker((Node *) query->distinctClause, context);
-
 		/* Examine substructure of query */
 		context->rtables = lcons(query->rtable, context->rtables);
 		result = query_tree_walker(query,
    find_expr_references_walker,
    (void *) context,
-   QTW_IGNORE_JOINALIASES);
+   QTW_IGNORE_JOINALIASES |
+   QTW_EXAMINE_SORTGROUP);
 		context->rtables = list_delete_first(context->rtables);
 		return result;
 	}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 18bd5ac903..d063bee271 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2296,6 +2296,33 @@ query_tree_walker(Query *query,
 		return true;
 	if (walker(query->limitCount, context))
 		return true;
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		if (walker((Node *) query->groupClause, context))
+			return true;
+		if (walker((Node *) query->windowClause, context))
+			return true;
+		if (walker((Node *) query->sortClause, context))
+			return true;
+		if (walker((Node *) query->distinctClause, context))
+			return true;
+	}
+	else
+	{
+		/*
+		 * We need to walk the expressions in WindowClause nodes even if we're
+		 * not interested in SortGroupClause nodes.
+		 */
+		ListCell   *lc;
+		foreach(lc, query->windowClause)
+		{
+			WindowClause *wc = lfirst_node(WindowClause, lc);
+			if (walker(wc->startOffset, context))
+				return true;
+			if (walker(wc->endOffset, context))
+				return true;
+		}
+	}
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
 	{
 		if (walker((Node *) query->cteList, context))
@@ -3153,6 +3180,38 @@ query_tree_mutator(Query *query,
 	MUTATE(query->havingQual, query->havingQual, Node *);
 	MUTATE(query->limitOffset, query->limitOffset, Node *);
 	MUTATE(query->limitCount, query->limitCount, Node *);
+
+	if ((flags & QTW_EXAMINE_SORTGROUP))
+	{
+		MUTATE(query->groupClause, query->groupClause, List *);
+		MUTATE(query->windowClause, query->windowClause, List *);
+		MUTATE(query->sortClause, query->sortClause, List *);
+		MUTATE(query->distinctClause, query->distinctClause, List *);
+	}
+	else
+	{
+		/*
+		 * We need to mutate the expressions in WindowClause nodes even if
+		 * we're not interested in SortGroupClause nodes.
+		 */
+		List	   *resultlist;
+		ListCell   *temp;
+
+		resultlist = NIL;
+		foreach(temp, query->windowClause)
+		{
+			WindowClause *wc = lfirst_node(WindowClause, temp);
+			WindowClause *newnode;
+
+			FLATCOPY(newnode, wc, WindowClause);
+			MUTATE(newnode->startOffset, wc->startOffset, Node *);
+			MUTATE(newnode->endOffset, wc->endOffset, Node *);
+
+			resultlist = lappend(resultlist, (Node *) newnode);
+		}
+		query->windowClause = resultlist;
+	}
+
 	if (!(flags & QTW_IGNORE_CTE_SUBQUERIES))
 		MUTATE(query->cteList, query->cteList, List *);
 	else		/* else copy CTE list as-is */
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 0cb931c82c..4b5408fa9b 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -27,6 +27,7 @@
 #define QTW_EXAMINE_RTES_AFTER		0x20	/* examine RTE nodes after their
 			 * contents */
 #define QTW_DONT_COPY_QUERY			0x40	/* do not copy top Query */
+#define QTW_EXAMINE_SORTGROUP		0x80	/* include SortGroupNode lists */
 
 /* callback function for check_functions_in_node */
 typedef bool (*check_function_callback) (Oid func_id, void *context);
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index edc93d5729..d5fd4045f9 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3821,3 +3821,45 @@ SELECT i, b, bool_and(b) OVER w, bool_or(b) OVER w
  5 | t | t| t
 (5 rows)

Re: Possible bug: SQL function parameter in window frame definition

2019-09-29 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Here is a draft patch along those lines; the intent of this one is
 >> that no existing walker or mutator should need to change (the change
 >> to the dependency code is basically cosmetic I believe, just avoids
 >> walking some things twice).

 Tom> Hmm.  I think this is a reasonable direction to go in, but
 Tom> what about groupingSets and rowMarks?

groupingSets ultimately contains nothing but numbers which are
meaningless without reference to the matching groupClause list. So
anything that cares about those is really going to have to process them
in its Query case in the walker function in order to get at both
clauses.

Similarly, rowMarks contains indexes into the rangetable (and no
recursive substructure at all), so it's likewise better processed at the
Query level.

 Tom> Also, in HEAD I'd be inclined to add assertions about utilityStmt
 Tom> being NULL.

Yup.

-- 
Andrew (irc:RhodiumToad)




Re: Arrays and ANY problem

2019-09-30 Thread Andrew Gierth
> "David" == David Salisbury  writes:

 David> I didn't specify the real problem as it's all wrapped up in
 David> layers and I didn't want to post a "can someone write the query
 David> for me". The real problem was I have a table with a string
 David> holding comma separated numbers, and needed to go to a lookup
 David> table and replace each of those numbers with it's correlated
 David> value. So '12,2,10' gets converted to 'twelve,two,ten'.

 David> Tom's "I'd suggest that making his sub-select return a rowset
 David> result rather than an array" was spot on and lead me to
 David> "unnest". For my posted problem this was the simple solution.
 David> Sorry to narrow things down to my specific array method.

 David> select name from table_name_ds_tmp where categoryid = ANY ( select
 David> unnest(string_to_array( '200,400', ',')::bigint[]) );

This won't necessarily preserve the order of elements (it might
sometimes look like it does, but it's fooling you). It also won't handle
duplicate numbers.

The right solution that does preserve order would be:

select name
  from unnest(string_to_array( '200,400', ',')::bigint[])
 with ordinality as u(id,ord)
   join table_name_ds_tmp t on (t.category_id=u.id)
 order by u.ord;

(wrap an ARRAY( ) around that if you need the result as a single array
rather than as rows, or use string_agg(name, ',' order by ord) if you
want a comma-separated string result)

regexp_split_to_table might be a better method than
unnest/string_to_array.

-- 
Andrew (irc:RhodiumToad)




Re: JSON vs. JSONB storage size

2019-10-11 Thread Andrew Gierth
> "Thomas" == Thomas Kellerer  writes:

 Thomas> The table size with jsonb was bigger in general, but the one
 Thomas> with the "integer" value was even bigger than the one with the
 Thomas> "string" storage.

jsonb stores numeric values as "numeric", not as integers or floats, so
the storage needed will depend on the number of decimal digits.

The size results you're seeing are mainly the consequence of the fact
that jsonb stores the whole Numeric datum, varlena header included (and
without packing the header), so there's an extra 4 bytes you might not
have accounted for: 1234567890 is three numeric "digits" (2 bytes each)
plus a 2 byte numeric header (for weight/scale/sign) plus the 4 byte
varlena header, for 12 bytes total, whereas "1234567890" takes only 10
(since the length is encoded in the jsonb value offsets). Furthermore,
there may be up to 3 padding bytes before the numeric value.
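
A quick way to see the per-value difference (a sketch; pg_column_size
reports the size of the whole jsonb datum):

select pg_column_size('{"k": 1234567890}'::jsonb)   as with_number,
       pg_column_size('{"k": "1234567890"}'::jsonb) as with_string;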

I think in your test, the extra 3 bytes is pushing the size of a single
row up to the next multiple of MAXALIGN, so you're getting slightly
fewer rows per page. I don't know what Windows is doing, but on my
system (freebsd amd64) I get 136 rows/page vs. 120 rows/page, which
would make a million rows take 57MB or 65MB. (Your use of
pg_total_relation_size is including the pkey index, which confuses the
results a bit.)

-- 
Andrew (irc:RhodiumToad)




Re: day interval

2019-10-12 Thread Andrew Gierth
> "Abraham" == Abraham, Danny  writes:

 Abraham> Hi
 Abraham> A question on day interval

 Abraham> select date('20191001') - date('20190923');

 Abraham> Will provide sometimes '8' - an integer , but sometimes '8
 Abraham> day' - a string

No, it'll always return an integer. You will only get an interval result
if you subtract timestamps rather than dates, for example if one of the
operands is actually an expression returning a timestamp.

Give an example of an actual expression you used that returned an
interval instead, and we may be able to tell you how to fix it.

-- 
Andrew (irc:RhodiumToad)




Re: day interval

2019-10-12 Thread Andrew Gierth
> "Abraham" == Abraham, Danny  writes:

 Abraham> The problematic code is:
 Abraham> select date(cm_curr_date) - date(CM_DATE) into diff_days from CMS_SYSPRM;

This will always return an integer, unless either the date() cast or the
-(date,date) operator have been redefined or modified.

 Abraham> The fix is:
 Abraham> select date_part ('day', age( date(cm_curr_date), date(CM_DATE))) into diff_days from CMS_SYSPRM;

This doesn't do the same thing, it will give a different result if the
dates differ by a month or more.

 Abraham> The problem:
 Abraham> How to recreate the problem.  (You know - QA).

 Abraham> Tried changing lc_time, timezone and datestyle .. but nothing
 Abraham> seems to work

None of these things can affect data types.

-- 
Andrew (irc:RhodiumToad)




Re: day interval

2019-10-12 Thread Andrew Gierth
> "Abraham" == Abraham, Danny  writes:

 Abraham> Thanks for the clarification.
 Abraham> The problem is still this:
 Abraham> select date('20191001') - date('20190101') ;
 Abraham> in my servers it is always '273'.
 Abraham> In the customer's DB it is '273 days';

Then you need to establish why that is.

For example, try these in psql on the customer's db and show us the
outputs:

\dT *.date
\df *.date

select castsource::regtype,
   casttarget::regtype,
   castfunc::regprocedure,
   castcontext,
   castmethod
  from pg_cast c join pg_type t on (casttarget=t.oid)
 where typname='date';

select oprresult::regtype   
  from pg_operator
   join pg_type t1 on (t1.oid=oprleft)
   join pg_type t2 on (t2.oid=oprright)
 where oprname='-' and t1.typname='date' and t2.typname='date';

-- 
Andrew (irc:RhodiumToad)




Re: SELECT d02name::bytea FROM ... && DBI::Pg

2019-10-12 Thread Andrew Gierth
> "Matthias" == Matthias Apitz  writes:

 Matthias> Hello,

 Matthias> I can SELECT && print a column in hex with:

 Matthias> pos71=# select d02name::bytea from d02ben where d02bnr = '1048313' ;

If your intention is to see the raw bytes of the stored text value, for
example to check the encoding, then you should not do it that way.
Casting from text to bytea does NOT do what you think.

Instead use  convert_to(d02name,'SQL_ASCII')  (yes, always specify
SQL_ASCII regardless of what encoding you think it should be in, since
the actual meaning of SQL_ASCII is "no conversions"). For readability,
you may then want to wrap that as
encode(convert_to(d02name,'SQL_ASCII'),'escape')  which will keep the
ASCII characters but use \nnn escapes for non-ascii.
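
Putting that together for the query quoted above (a sketch of the
suggested approach):

select encode(convert_to(d02name, 'SQL_ASCII'), 'escape')
  from d02ben
 where d02bnr = '1048313';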

-- 
Andrew (irc:RhodiumToad)




Re: Inserting multiple rows wtih a SELECt in the values clause

2019-10-15 Thread Andrew Gierth
> "stan" == stan   writes:

 stan> I suspect this may be because the SELECT in the values clause
 stan> returns multiple rows?

Understand this: VALUES is really just a special form of SELECT that
returns only the specific rows that you tell it to construct. Every
single row returned by a VALUES clause is separately constructed.

i.e. VALUES (...),(...);  will return exactly two rows regardless of
what is inside the (...). VALUES (...);  is always exactly one row.
And so on.

The general form of INSERT is actually:

INSERT INTO table(columns) <query>

where <query> is any valid query returning any number of rows. The use
of VALUES for the <query> is just a convenient shorthand for cases where
the exact number of rows to be inserted, and their content, is known in
advance.

So, if you're inserting some set of rows generated from a query, the
word VALUES should not appear in the top-level statement. What you want
is:

INSERT INTO rate(employee_key, project_key, work_type_key, rate)
SELECT employee.employee_key,
   project.project_key,
   work_type.work_type_key,
   1 as rate
  FROM employee 
 CROSS JOIN project 
 CROSS JOIN work_type;

-- 
Andrew (irc:RhodiumToad)




Re: here does postgres take its timezone information from?

2019-11-17 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >>> Ugh. It doesn't have the old backward compatibility names like
 >>> US/Pacific installed by default, which is a problem if that's what
 >>> initdb picked for your cluster (or you've stored references to any
 >>> of those names in other ways).

 >> One quick fix is to revert the change. Tom thinks this is not reason
 >> to revert. Would it be enough to edit the postgresql.conf to use the
 >> correct "modern" name for US/Pacific (PST?)? In rhar case, an update
 >> note might be sufficient?

 Tom> I think the "official" name of that zone is America/Los_Angeles.
 Tom> But initdb might seize on the US/Pacific alias, if available,

And now you know why I have been saying for so many years that initdb
should use the official names!

-- 
Andrew (irc:RhodiumToad)




Re: here does postgres take its timezone information from?

2019-11-17 Thread Andrew Gierth
>>>>> "Tom" == Tom Lane  writes:

 Tom> Andrew Gierth  writes:
 Tom> I think the "official" name of that zone is America/Los_Angeles.
 Tom> But initdb might seize on the US/Pacific alias, if available,

 >> And now you know why I have been saying for so many years that initdb
 >> should use the official names!

 Tom> [ shrug... ] The installed format doesn't provide any way to
 Tom> distinguish which are the "official" names. They're typically all
 Tom> hardlinks to the same file.

zone.tab / zone1970.tab. Which I first pointed out before 8.2 came
out...

-- 
Andrew.




Re: Encoding/collation question

2019-12-11 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich> I doubt that my use will notice meaningful differences. Since
 Rich> there are only two or three databases in UTF8 and its collation
 Rich> perhaps I'll convert those to LATIN1 and C.

Note that it's perfectly fine to use UTF8 encoding and C collation (this
has the effect of sorting strings in Unicode codepoint order); this is
as fast for comparisons as LATIN1/C is.

For those cases where you need data to be sorted in a
culturally-meaningful order rather than in codepoint order, you can set
collations on specific columns or in individual queries.
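
For example (a sketch; the exact collation name available depends on the
operating system's locale support):

create table people (name text collate "C");
select name from people order by name collate "en_US.utf8";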

-- 
Andrew (irc:RhodiumToad)




Re: Max locks

2019-12-19 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 Peter> max_locks_per_transactions only affects relation locks (also
 Peter> known as heavy weight locks), but pg_locks also shows other
 Peter> kinds of locks.

pg_locks shows exactly two types of locks: "heavy" locks (which are not
merely relation locks but also object, tuple, extension, transaction and
advisory locks), and predicate locks (SIReadLock) which are limited by a
separate configuration parameter (and are not relevant in this case
based on the prior IRC discussion).
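
So to count only the heavyweight locks, something like this is closer
(a sketch):

select count(*) from pg_locks where mode <> 'SIReadLock';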

 Peter> Filter by locktype = 'relation' to get the appropriate view.

This is incorrect (some predicate locks may be tagged 'relation' as
well, and some heavy locks will not be relation locks).

There's also the question of fastpath locks, but I believe there can
only be a small number of these (16?) per backend, so that wouldn't
account for this.

I think what's going on is that the max size of the lock hashtable isn't
strictly enforced; it'll add enough memory space for the configured
number of locks to the total size of the shared memory segment, but it
won't actually report an error until shared memory is actually
exhausted, and it's possible that there may be unused space.
(Performance may degrade if there are more locks than the configured
maximum, because the hash table will have been sized for that maximum
and can't be grown.) See comments for ShmemInitHash.

-- 
Andrew (irc:RhodiumToad)




Re: Are PostgreSQL functions that return sets or tables evaluated lazily or eagerly?

2020-01-03 Thread Andrew Gierth
> "Gerald" == Gerald Britton  writes:

 Gerald> That leads me to ask:

 Gerald> If (and under what circumstances) PostgreSQL evaluates
 Gerald> functions lazily (returning rows as requested by the caller) or
 Gerald> eagerly (evaluation all rows before returning the first one)?

This isn't trivial to answer because it depends significantly on the
language the function is written in and how it was called. The first
matching rule below controls what happens.


1. LANGUAGE SQL with inlining

Table functions in language SQL are candidates for inlining, see
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions

If an SQL-language function is inlined, then it behaves exactly as
though the function body was written inline, which means it is evaluated
as lazily as the query plan otherwise permits (for example, if there's a
non-indexable ORDER BY clause, then clearly all the values have to be
fetched before any are returned).
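
A minimal sketch of an inlinable SQL table function (the table and
column names are assumptions):

create function active_items() returns setof items
  language sql stable
  as $$ select * from items where active $$;

select * from active_items() limit 10;  -- planned as if the body were written inline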


2. Table function called in the FROM clause

Table functions in the FROM clause, e.g. SELECT ... FROM myfunc();
are always evaluated eagerly.


3. LANGUAGE SQL without inlining, in the select-list

If the final statement of an SQL function is a plain select with no
wCTEs, then it is evaluated lazily: the first fetch will execute
everything up to the first row of the final select, and subsequently one
row will be fetched at a time. If the final statement is a DML statement
with a RETURNING clause, or contains wCTEs, then it is evaluated
eagerly.


4. LANGUAGE C / INTERNAL

C-language functions (and therefore internal functions too) can choose
whether to use value-per-call mode or materialize mode. Materialize mode
is always "eager", but value-per-call mode is sometimes still eager (as
in case 2 above); it can only be lazy if no preceding rule forced it to
be otherwise.

Most built-in table functions use value-per-call mode (a notable
exception being the functions in the tablefunc module).


5. LANGUAGE PLPGSQL, PLPERL, PLTCL

Plpgsql, plperl, and pltcl functions are always evaluated eagerly.


6. LANGUAGE plpython

Plpython functions that return an iterator run in value-per-call mode,
with a "next" call on the iterator for each row. To what extent this is
a lazy or eager evaluation depends on the python code.


7. Other PL languages

For non-core PL languages the documentation or source code may indicate
whether the language uses materialize mode or value-per-call mode. (Most
languages are probably not well equipped to do value-per-call mode. One
that does allow it is pl/lua, which runs table functions as coroutines.)


-- 
Andrew (irc:RhodiumToad)




Re: When should parameters be passed as text v. binary?

2020-01-04 Thread Andrew Gierth
> "Paula" == Paula Kirsch  writes:

 Paula> I'm just trying to understand the trade-offs between sending
 Paula> everything always as text, all integer parameters as binary,
 Paula> floats as binary, etc.

For passing data from client to server, there's no particular reason not
to use the binary format for any data type that you understand (and
where you're passing the data type oid explicitly in the query, rather
than just leaving it as unknown).

For results, things are harder, because libpq is currently
all-or-nothing about result type formats, and if you start using
extension types then not all of them even _have_ a binary format. And to
decode a binary result you need to know the type, and have code to
handle every specific type's binary format.

-- 
Andrew (irc:RhodiumToad)




Re: \COPY to accept non UTF-8 chars in CHAR columns

2020-03-27 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 Thomas> Something like this approach might be useful for fixing the CSV file:

 Thomas> https://codereview.stackexchange.com/questions/185821/convert-a-mix-of-latin-1-and-utf-8-to-proper-utf-8

Or:

perl -MEncode -pe '
 use bytes;
 sub c { decode("UTF-8",shift,sub { decode("windows-1252", chr(shift)) }); }
 s/([\x80-\xFF]+)/encode("UTF-8",c($1))/eg' <infile >outfile

-- 
Andrew (irc:RhodiumToad)




Re: \COPY to accept non UTF-8 chars in CHAR columns

2020-03-27 Thread Andrew Gierth
> "Rory" == Rory Campbell-Lange  writes:

 Rory> Or:

 Rory> iconv -f WINDOWS-1252 -t UTF-8 -c < tempfile2 > tempfile3

No. That's just a conversion of win1252 to utf8 without regard for any
UTF8 that might already be present in the input. Any such input will end
up double-encoded, requiring further work to fix.

-- 
Andrew (irc:RhodiumToad)




Re: \COPY to accept non UTF-8 chars in CHAR columns

2020-03-28 Thread Andrew Gierth
> "Matthias" == Matthias Apitz  writes:

 Matthias>   i.e. 0xc3 is translated to 0xc383 and the 2nd half, the
 Matthias>   0xbc to 0xc2bc, both translations have nothing to do with
 Matthias>   the original split 0xc3bc, and perhaps in this case it
 Matthias>   would be better to spill out a blank 0x40 for each of the
 Matthias>   bytes which formed the 0xc3bc.

If the only malformed sequences are there as a result of splitting up
valid sequences, then you could do something like convert all invalid
sequences to (sequences of) noncharacters, then once the data is
imported, fix it up by adjusting how the data is split and regenerating
the correct sequence (assuming your application allows this).

For example you could encode an arbitrary byte xy as a sequence of two
codepoints U+FDDx U+FDEy (the range FDD0-FDEF are all defined as
noncharacters).

-- 
Andrew (irc:RhodiumToad)




Re: Compiling C Extension Functions against PostgreSQL 12

2020-05-03 Thread Andrew Gierth
> "TalGloz" == TalGloz   writes:

 TalGloz> Which is good, my seal_mean_cxx_v2.so being created and copied
 TalGloz> to /usr/pgsql-12/lib/. But right after that I get this and it
 TalGloz> doesn't seem to effect my seal_mean_cxx_v2.so library:

What's happening here is that it's attempting to build LLVM bitcode
files for your library for the benefit of the JIT compiler that exists
in recent postgres versions - without these files, your extension
functions cannot be inlined into generated code.

This requires that your code be compilable using clang (as well as gcc
if that's what was used to build PG itself), and there's clearly some
disagreement going on between clang and your system header files that's
causing the failure.

I didn't see an easy way of disabling bitcode emission for a module,
though I think that has been discussed before.

-- 
Andrew (irc:RhodiumToad)




Re: How to declare PG version for compiling extensions.

2018-10-23 Thread Andrew Gierth
> "GPT" == GPT   writes:

 GPT> Unfortunately, I had not installed the following package:
 GPT> "postgresql-server-dev-11"

 GPT> By the way, shouldn't a warning message appear while trying to run:

 GPT> `PATH=/usr/lib/postgresql/11/bin:$PATH make USE_PGXS=1`

 GPT> warning the user that some files are missing.

It's more reliable to do (if the makefile is correctly written):

make USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/11/bin/pg_config

By specifying the pg_config binary explicitly rather than relying on the
PATH, you avoid the chance of picking up an incorrect copy by mistake.
This is why the standard form for pgxs makefiles has an assignment for
PG_CONFIG before the PGXS assignment line; the command-line option
overrides it.

-- 
Andrew (irc:RhodiumToad)



Re: Shell Command within function

2018-10-26 Thread Andrew Gierth
> "Mike" == Mike Martin  writes:

 Mike> Is this possible?

In an untrusted pl language, yes, but it's a bad idea because:

 Mike> I have a script which imports csvlogs into a table, and it would
 Mike> be useful to truncate the log files after import

If you had an error after truncating the file but before committing the
transaction, then the imported data would be gone, but the file would be
gone too, so you'd have lost it.

Things like this need to be handled _after_ commit, not before, so a
function is the wrong place for it.

-- 
Andrew (irc:RhodiumToad)



Re: Broken postgres links need to find callers

2018-10-31 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich>   I managed to mess up postgresql-10.3 on this Slackware-14.2
 Rich> desktop server/workstation. It worked OK until I tried adding
 Rich> access to an another application.

 Rich> waiting for server to start2018-10-31 10:02:01.312 PDT [1285] FATAL:
 Rich> could not open directory "/usr/share/postgresql-10.2/timezonesets": No such
 Rich> file or directory 2018-10-31 10:02:01.312 PDT [1285] HINT: This may indicate
 Rich> an incomplete PostgreSQL installation, or that the file
 Rich> "/usr/lib/postgresql/10.3/bin/postgres" has been moved away from its proper
 Rich> location.

Is there a pg_config binary in /usr/lib/postgresql/10.3/bin/ and if so,
what is the output of  /usr/lib/postgresql/10.3/bin/pg_config --sharedir

Also what is the output of  /usr/lib/postgresql/10.3/bin/postgres -V

The most plausible explanation I can see for what you're seeing there is
that what you have as /usr/lib/postgresql/10.3/bin/postgres is not
actually the 10.3 binary but rather the 10.2 one. There should be no
symlinks involved there - the path that is reported in the error message
is the one that the postgres binary actually did try to open.

-- 
Andrew (irc:RhodiumToad)



Re: Broken postgres links need to find callers

2018-10-31 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich>   Yes, pg_config is present but pointing to the wrong directory:
 Rich> # /usr/lib/postgresql/10.3/bin/pg_config --sharedir
 Rich> /usr/share/postgresql-10.2

What this says is that you somehow have a pg 10.3 binary which has been
compiled with ./configure --datadir=/usr/share/postgresql-10.2

which seems, to say the least, somewhat odd.

pg_config isn't used by the postgres binary to find paths, so "fixing"
it wouldn't help. The same paths that were compiled into pg_config are
compiled into the postgres binary, and pg_config and postgres contain
the same relocation logic.

-- 
Andrew (irc:RhodiumToad)



Re: recursion in plpgsql

2018-11-07 Thread Andrew Gierth
> "David" == David Gauthier  writes:

 David> Tom: I seem to remember (but am probably wrong) that cursors
 David> were locally scoped (or could be made so). This was several
 David> years ago with an earlier v8 version. Was that sort of thing
 David> around back then ?

There are two distinct objects here being called "cursor": one is the
plpgsql variable, which is locally scoped, and the other is the actual
open portal, which must have a unique name within the session.

By default, plpgsql explicit bound cursors (but not plain "refcursor"
variables) take their portal name from the plpgsql variable name, and
hence don't work recursively by default. This is a convenience so that
code outside the function can use the same name to refer to the open
portal.

However, plpgsql cursor variables (whether declared bound or unbound)
can be assigned a text value or NULL _before_ being opened, and if so,
that value will be used for the portal name, or if NULL, a name of
"" will be uniquely generated. (_After_ the open, the
variable's text value is the actually assigned portal name.) Unbound
refcursor variables default to NULL, so they are assigned unique portal
names on opening.

So in your example, adding

child_node_curr := NULL;

immediately before the OPEN statement should be sufficient.
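
For illustration, a self-contained sketch of that pattern (the table and
column names are assumptions, not taken from your function):

create function walk_tree(p_parent integer) returns setof integer
  language plpgsql
  as $$
declare
  child_node_curr cursor (p integer) for
    select id from nodes where parent_id = p;
  v_id integer;
begin
  child_node_curr := NULL;   -- force a unique portal name per invocation
  open child_node_curr(p_parent);
  loop
    fetch child_node_curr into v_id;
    exit when not found;
    return next v_id;
    return query select * from walk_tree(v_id);
  end loop;
  close child_node_curr;
end;
$$;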

-- 
Andrew (irc:RhodiumToad)



Re: Largest & Smallest Functions

2018-11-07 Thread Andrew Gierth
> "Ken" == Ken Tanzer  writes:

 Ken> Hi. Building on the [type]_larger and _smaller functions (and
 Ken> lifting from the documentation), I put together a couple of
 Ken> functions that will take any number of arguments:

 Ken> CREATE FUNCTION largest(VARIADIC anyarray) RETURNS anyelement AS
 Ken> $$ SELECT max($1[i]) FROM generate_subscripts($1, 1) g(i); $$
 Ken> LANGUAGE SQL IMMUTABLE;

 Ken> So far so good. I can do smallest(2,4,7), etc. But to take this a
 Ken> convenient step further, sometimes I want the smallest or largest
 Ken> from values already in an array. So I can create these functions:

But you don't need to create more functions, because you can do this:

select largest(variadic array[1,2,3]);
 largest 
---------
       3

 Ken> So here's my questions:

 Ken> 1) Is there any way to collapse those four functions into two? (Or
 Ken> is there a better way to go about this?)

See above

 Ken> 2) Is there any particular reason functions like that aren't built
 Ken> into Postgres? They seem like they would be useful. (Or maybe I
 Ken> missed them?)

As already pointed out, greatest() and least() exist (though they were
added before VARIADIC was, so they don't use it)

 Ken> 3) Bonus question--how come all the existing _larger and _smaller
 Ken> functions are specific to each data type, as opposed to more
 Ken> general smaller/larger functions?

Because it saves looking up the type comparison function and doing an
indirect call.

-- 
Andrew (irc:RhodiumToad)



Re: Largest & Smallest Functions

2018-11-07 Thread Andrew Gierth
> "Pavel" == Pavel Stehule  writes:

 Pavel> The variadic parameters should not be a arrays - can be of "any"
 Pavel> type. But this functionality is available only for C language
 Pavel> functions.

You mean (VARIADIC "any")?  - that is not actually restricted to C
language functions, any pl/* handler can choose to support it (it's just
that all the built-in ones don't).

-- 
Andrew (irc:RhodiumToad)



Re: Full list of operations that constitute a "maintenance" operation?

2018-11-09 Thread Andrew Gierth
> "Lance" == Lance Luvaul  writes:

 Lance> Hi all, I've read on the Postgres documentation for
 Lance> 'maintenance_work_mem' that VACUUM, CREATE INDEX, and ALTER
 Lance> TABLE ADD FOREIGN KEY are considered maintenance operations, but
 Lance> are there others? For example I use ALTER TABLE ADD COLUMN and
 Lance> ALTER TABLE SET LOGGED in my scripts... are they maintenance
 Lance> operations that would cause a maintenance_work_mem-sized chunk
 Lance> of memory (or more than 1 such chunk) to be allocated?

There are essentially two classes of operations that use
maintenance_work_mem:

1. Specific operations: non-full VACUUM (for the deleted tid list),
CLUSTER (for sorting the table content), ALTER ... ADD FOREIGN KEY (for
the validation query, which might want to use sorts or a hashjoin).
(It's not impossible, though it should be rare, for an FK validation to
use two chunks of maintenance_work_mem - with the right table sizes and
data types, the validation query could plan as a merge anti-join with
explicit sorts on both sides.)

2. Anything that (re)builds an index for any reason. This includes a
large set of operations: CREATE INDEX and REINDEX are obvious, likewise
VACUUM FULL and CLUSTER, but also any form of ALTER TABLE that rewrites
the heap (which _includes_ SET LOGGED, but does not include adding a
column with no default, or in pg11+ adding a column with a default). It
also includes non-concurrent refresh of a materialized view. (TRUNCATE
also technically rewrites the table heap, but since the new heap is
empty, memory consumption during reindex is not an issue.) ALTER TABLE
to add a unique, primary key or exclusion constraint also creates an
index to implement the constraint with (unless USING INDEX was used to
adopt an existing index) so that counts too.

-- 
Andrew (irc:RhodiumToad)



Re: query patterns for multipass aggregating

2018-11-11 Thread Andrew Gierth
> "Rob" == Rob Nikander  writes:

 Rob> I want a query to list items, with their colors and images. Each
 Rob> result row is an item, so the colors and images must be aggregated
 Rob> into arrays or json.

 Rob> If there were only one other table, it’s a simple join and group…

 Rob> select items.*, array_agg(color_name)
 Rob> from items join colors on items.id = colors.item_id
 Rob> group by items.id

Method 1:

select items.*, c.colors, i.images
  from items
   left join (select item_id, array_agg(color_name) as colors
from colors
   group by item_id) c
 on c.item_id=items.id
   left join (select item_id, array_agg(image) as images
from images
   group by item_id) i
 on i.item_id=items.id;

Method 2:

select items.*, c.colors, i.images
  from items
   left join lateral (select array_agg(c0.color_name) as colors
from colors c0
   where c0.item_id=items.id) c
 on true
   left join lateral (select array_agg(i0.image) as images
from images i0
   where i0.item_id=items.id) i
 on true;

Unfortunately, the planner isn't smart enough yet to know that these two
are equivalent, so they generate different sets of possible query plans.
Method 1 gets plans that work well if the entire items table is being
selected, since it will read the whole of the images and colors tables
in one go, and it will also get plans that work well for reading a
_single_ item selected by WHERE items.id=? because equivalence-class
processing will propagate a copy of that condition down to below the
grouping clauses. It will _not_ get a good plan for reading any other
small subset of items (e.g. selected by other criteria); for this you
need method 2, which in turn doesn't get very good plans when you fetch
the whole items table.

Don't be tempted to use CTEs for the subqueries in either plan; that'll
only make it much worse.

-- 
Andrew (irc:RhodiumToad)



Re: query patterns for multipass aggregating

2018-11-11 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:

 Andrew> Unfortunately, the planner isn't smart enough yet to know that
 Andrew> these two are equivalent,

oops, I edited the second one before posting in a way that made them not
be equivalent: adding a "group by x0.item_id" in both subqueries in
method 2 makes them equivalent again. Without that, the result differs
slightly if there are no matching color or image rows (NULL vs. empty
array).

-- 
Andrew (irc:RhodiumToad)



Re: query patterns for multipass aggregating

2018-11-11 Thread Andrew Gierth
> "Ondřej" == Ondřej Bouda  writes:

 Ondřej> What about subqueries?

 Ondřej> SELECT
 Ondřej> items.*,
 Ondřej> (SELECT array_agg(color_name) FROM colors WHERE item_id =
 Ondřej> items.id) AS color_names,
 Ondřej> (SELECT array_agg(image_file) FROM images WHERE item_id =
 Ondřej> items.id) AS image_files
 Ondřej> FROM items

 Ondřej> According to my experience, not only the code is readable (no
 Ondřej> questions about how many rows are there for each item), but it
 Ondřej> also leads to a better query plan.

This is (generally speaking) no improvement over the LATERAL method I
showed, and is less flexible (for example it's very painful to return
more than one value from the subqueries).

-- 
Andrew (irc:RhodiumToad)



Re: Move cluster to new host, upgraded version

2018-11-11 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich> My current desktop server/workstation is running version 10.5.
 Rich> I'm configuring a replacement desktop and have installed version
 Rich> 11.1 on it. To copy all databases from the 10.5 version to the
 Rich> 11.1 version I assume that I should do a pg_dumpall on the
 Rich> current host and read in that file on the replacement host. Is
 Rich> this the proper procedure?

The most reliable and preferred procedure is to use the _new_ version's
pg_dumpall, for example by allowing access to the old host from the new
one (possibly using an ssh port forward), or (on OSes that make it easy
to do package installs of multiple versions) to install the new pg_dump
and pg_dumpall on the old system.

Using the old version's pg_dumpall also generally speaking works, but
there may be occasional rough edges.

-- 
Andrew (irc:RhodiumToad)



Re: Debian : No echo after pg_dump | psql

2018-11-29 Thread Andrew Gierth
> "Moreno" == Moreno Andreo  writes:

 Moreno> The command I'm using is
 Moreno> root@x:~# pg_dump -v -C -h  -p 6543 -U postgres
 Moreno>  | psql -h localhost -p 6543 -U postgres 
 Moreno> It presents a double password prompt after I run it:
 Moreno> Password: Password for user postgres:

This is going to prompt once for the remote host's password and twice
for the local one (because -C), and the concurrently-running commands
are going to be fighting over access to the terminal to do it. Best
avoided by using pgpass or non-password-based auth methods.

More seriously, you're misunderstanding how -C works. When you use -C,
the database you specify to psql (or pg_restore) is NOT the database
you're restoring into - the restored db will ALWAYS have the same name
as it had when dumped (if that's not what you want then don't use -C).
Instead, the database you specify to psql or pg_restore is the database
to connect to to issue the CREATE DATABASE command, which should usually
be 'postgres'. That explains this bit:

 Moreno> If I create database (just database, not schema) on target
 Moreno> machine, I receive the error "database  already exists" but
 Moreno> the dump goes on If I don't create it, I receive the error
 Moreno> "database  does not exist" and processing aborts.

-- 
Andrew (irc:RhodiumToad)



Re: Transition Tables doesn´t have OID

2018-12-01 Thread Andrew Gierth
> "PegoraroF10" == PegoraroF10   writes:

 PegoraroF10> I would like to user OID value because we change our
 PegoraroF10> primary keys, sometimes, OID doesn´t.

"oid" as a special system column and the WITH OIDS option when creating
tables are being removed in pg12 (having been deprecated for something
like 12 years now), you might want to start thinking about alternatives.

 PegoraroF10> So, there is a way to get OID on transition tables ?

Well, arguably it is an oversight in the implementation of transition
tables that they were not added to the various places in the parser that
treat "oid" as a system column name. However, I not sure that you'll get
any agreement to fix that in light of the demise of "oid" as mentioned
above. (Not least because backpatching it could be dangerous in that it
could break queries that now work, by making "oid" an ambiguous column
reference.)

I tried various workarounds, but they were defeated by the fact that
evaluation of a whole-row Var does not copy the oid value (if any).
(I'm not sure if this ever really worked, so calling it a bug may be a
bit of a stretch.)

-- 
Andrew (irc:RhodiumToad)



Re: using a function in where

2018-12-02 Thread Andrew Gierth
> "Glenn" == Glenn Schultz  writes:

 Glenn> All,
 Glenn> I am using the function below to convert a continuous variable
 Glenn> to a binned value. Sometimes a value other than zero is passed
 Glenn> through the query. For example -.5 result value is passed to the
 Glenn> query result. The basic of the query is below.

 Glenn> select
 Glenn> incentivebin(wac, rate, .25)
 Glenn> from
 Glenn> my_table
 Glenn> where incentivebin(was, rate, .25) = 0

Those two calls to incentivebin() have different parameters - is that an
error in the original query, or a typo in transcribing it to email?

-- 
Andrew (irc:RhodiumToad)



Re: Query never completes with an OR condition

2018-12-03 Thread Andrew Gierth
> "Kenneth" == Kenneth Marshall  writes:

 Kenneth> The individual queries run as expected, but when the OR
 Kenneth> condition is added, it never finishes.

http://blog.rhodiumtoad.org.uk/2017/01/22/performance-issues-with-ored-conditions/

-- 
Andrew (irc:RhodiumToad)



Re: != ANY(array) does not behave as expected

2018-12-07 Thread Andrew Gierth
> "Chris" == Chris Wilson  writes:

 Chris> However, if we try to invert it by using the != operator, then
 Chris> we get unexpected results:

Mr. De Morgan would like a word.
https://en.wikipedia.org/wiki/De_Morgan%27s_laws

In short, if you have a condition of the form (a OR b) and you want to
negate it, then you find that:

NOT (a OR b)  is equivalent to  (NOT a) AND (NOT b)

Since  x = ANY (array[1,2])  is equivalent to (x = 1) OR (x = 2), then
the negation would be (x != 1) AND (x != 2), not OR.

Which can be conveniently expressed as  x != ALL (array[1,2]).

So just as you interchange AND and OR when inverting the sense of a
condition, you also interchange ALL and ANY for exactly the same
reasons.
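
For example (a quick sketch using a VALUES list):

select x from (values (1),(2),(3)) v(x) where x != all (array[1,2]);
-- returns only the row with x = 3; with != ANY instead, all three rows
-- come back, since every x differs from at least one element of the array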

 Chris> expression NOT IN (subquery)

https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_NOT_IN

 Chris> And is it a bug that one can't use unnest in a NOT IN expression
 Chris> in the WHERE clause?

No.

-- 
Andrew (irc:RhodiumToad)



Re: Code for getting particular day of week number from month

2018-12-12 Thread Andrew Gierth
> "Mike" == Mike Martin  writes:

 Mike> Hi

 Mike> For a particular sequence I needed to do (schedule 2nd monday in
 Mike> month for coming year) I created the following query

That doesn't look like the best way - you're generating and discarding a
lot of rows.

"second monday in month X" can be expressed as follows:

"second monday in X" = "(first monday in X) + 1 week"

"first monday in X"
  = "first day of X + N days, where N is (dow(Monday) - dow(1st))
 reduced to 0..6 mod 7"

i.e. if the month starts on Monday, N=0
  .. on Tuesday, N = 6   (1 - 2 == 6 mod 7)
  .. on Wednesday, N = 5  etc.

So:

select to_char(d, 'Day DD/MM/YYYY')
  from (select month
   + interval '1 week'
   + ((1 - extract(dow from month)::integer + 7) % 7)
 * interval '1 day'
   as d
  from generate_series(timestamp '2018-12-01',
   timestamp '2020-12-01',
   interval '1 month') month) s;

-- 
Andrew (irc:RhodiumToad)



Re: How to build a btree index with integer values on jsonb data?

2018-12-13 Thread Andrew Gierth
> "Johann" == Johann Spies  writes:

 Johann> How can I transform the following definition to index pubyear
 Johann> as integer and not text?

 Johann> CREATE INDEX pubyear_idx
 Johann> ON some_table_where_data_field_is_of_type_jsonb USING btree
 Johann> ((data -> 'REC'::text) -> 'static_data'::text) ->
 Johann> 'summary'::text) -> 'pub_info'::text) ->> '@pubyear'::text) COLLATE
 Johann> pg_catalog."default");

 Johann> While I can cast the value in a SELECT statement to integer I
 Johann> have been able to do the same while creating the index.

Laurenz' answer was almost correct, just got the position of the parens
wrong.

When you use an expression in an index, the outermost level of the
expression must either be (syntactically) a function call, or it must
have parens around its _outermost_ level.

You can simplify selecting from nested json using #>> in place of the ->
and ->> operators. (x #>> array['foo','bar']) is equivalent to doing
((x -> 'foo') ->> 'bar')

So:

CREATE INDEX pubyear_idx
ON some_table_where_data_field_is_of_type_jsonb USING btree
   (
((data #>> 
array['REC','static_data','summary','pub_info','@pubyear'])::integer)
   );

Note the ::integer is inside the parens that define the column value
within the outermost ( ) which enclose the column _list_.

-- 
Andrew (irc:RhodiumToad)



Re: Format an Update with calculation

2018-12-18 Thread Andrew Gierth
> "Bret" == Bret Stern  writes:

 Bret> My statement below updates the pricing no problem, but I want it
 Bret> to be formatted with 2 dec points eg (43.23).

UPDATE ...
   SET suggested_retail_price = round(suggested_retail_price*1.13, 2)
 WHERE ...

assuming suggested_retail_price is already of type "numeric".

You could also have given the column a type of numeric(18,2) or similar,
which would round all assigned values to 2 places.

-- 
Andrew (irc:RhodiumToad)



Re: jsonb : find row by array object attribute

2018-12-30 Thread Andrew Gierth
> "Rory" == Rory Campbell-Lange  writes:

 Rory> and data like this:
 Rory> j 
 Rory> --
 Rory>  {"a": 1, "people": [{"id": 2002}, {"id": 2004}]}
 Rory>  {"a": 3, "people": [{"id": 2003}, {"id": 2005}]}
 Rory> (2 rows)

 Rory> I'd like to be able to find any row with a particular people id
 Rory> attribute.

where j @> '{"people":[{"id":2003}]}'

(meaning: j contains a key "people" whose value is an array containing
an element {"id":2003})

Since @> can use GIN indexes, you should usually try and find a search
condition using it before resorting to playing with -> or expanding out
array values.
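
For example (a sketch, assuming the jsonb column is "j" in table "t"):

create index on t using gin (j jsonb_path_ops);

(jsonb_path_ops indexes are smaller and faster but support only @>; a plain
gin index on j also works and supports the other jsonb operators as well.)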

-- 
Andrew (irc:RhodiumToad)



Re: jsonb : find row by array object attribute

2018-12-30 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:
>>>>> "Rory" == Rory Campbell-Lange  writes:

 Rory> and data like this:
 Rory> j 
 Rory> --
 Rory> {"a": 1, "people": [{"id": 2002}, {"id": 2004}]}
 Rory> {"a": 3, "people": [{"id": 2003}, {"id": 2005}]}
 Rory> (2 rows)

 Rory> I'd like to be able to find any row with a particular people id
 Rory> attribute.

 Andrew> where j @> '{"people":[{"id":2003}]}'

 Andrew> (meaning: j contains a key "people" whose value is an array
 Andrew> containing an element {"id":2003})

or to be more precise: j is an object containing a key "people" whose
value is an array containing an element which is an object containing a
key "id" with value 2003

i.e. {"a":3, "people": [{"id":2003,"blah":123},{"id":2004}]}  would
match the condition too.

-- 
Andrew (irc:RhodiumToad)



Re: Can't quote_literal with COPY FROM PROGRAM

2018-12-31 Thread Andrew Gierth
> "Mark" == Mark Mikulec  writes:

 Mark> To be honest this whole affair with COPY FROM program seems like
 Mark> a bug to me though.

Perhaps you're misunderstanding what COPY FROM PROGRAM is actually for.
Its purpose is to do exactly what COPY does, that is to say, take as
input a file in either PG's tab-delimited format or in CSV format, break
it into records and fields, and insert the data into a table. Note that
JSON is not a supported input format for COPY, though of course JSON
_values_ can appear as data within a field inside either the
tab-delimited or CSV formats. COPY FROM PROGRAM simply does COPY but
with the input (whether in tab or CSV format) taken from the output of
the program rather than a file.

In tab-delimited format, the delimiter can be changed to something other
than a tab, but the escape character is fixed as \ and the characters
NL, CR, \, and the delimiter character are required to be escaped. Thus,
any literal \ in the data MUST be escaped as \\ before passing the data
to COPY in this mode. In CSV mode, CSV quoting and escaping rules are
followed.
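
For example, a sketch (table name and program invocation invented):

COPY mytable FROM PROGRAM 'myexport --csv' WITH (FORMAT csv);

expects the program's stdout to follow normal CSV quoting and escaping
rules, exactly as COPY FROM a file would.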

It's not COPY's job to read a single datum, whether in JSON format or
anything else.

-- 
Andrew (irc:RhodiumToad)



Re: Function `set_config` doesn't work in with query?

2019-01-04 Thread Andrew Gierth
> "Zexuan" == Zexuan Luo  writes:

 Zexuan> For instance:
 Zexuan> ```
 Zexuan> with t as (
 Zexuan> select set_config('blah', '1', false)
 Zexuan> )
 Zexuan> select current_setting('blah');

A CTE containing a SELECT query which is not referenced anywhere will
not be executed, even if it contains volatile functions. (CTEs
containing INSERT/UPDATE/DELETE that are not referenced _will_ still be
executed.)

-- 
Andrew (irc:RhodiumToad)



Re: Immutable way to cast timestamp TEXT to DATE? (for index)

2019-01-04 Thread Andrew Gierth
> "Ken" == Ken Tanzer  writes:

 Ken> Hi. I've got a text field in a table that holds this style of
 Ken> timestamp:

 Ken> 2014-10-23T00:00:00

You can't make this a field of type "timestamp" rather than text?

Casts from text to either date or timestamp are mutable because they
depend on the current DateStyle value (_you_ might know that your values
are always ISO format, but the code doesn't). You can't even avoid this
with to_date or to_timestamp and a fixed format, because those functions
are also mutable since some of the format options are locale-dependent
(again, you might know that you're not using those, but the code
doesn't).

If the column type were timestamp, though, then you could do comparisons
directly, or cast the value to "date" (which is an immutable cast).

If you absolutely can't change the column type, then one option would be
to do your own fixed-format date parsing function (and label it
immutable), e.g.

create function iso_timestamp(text)
  returns timestamp without time zone
  as $$ select case when $1 ~ '^\d\d\d\d-?\d\d-?\d\dT\d\d:?\d\d:?\d\d$'
then $1::timestamp
else null end $$
  set DateStyle = 'ISO,YMD'
  language sql immutable strict;

or

create function iso_date(text)
  returns date
  as $$ select case when $1 ~ '^\d\d\d\d-?\d\d-?\d\d(?![^T])'
then substring($1 from '^\d\d\d\d-?\d\d-?\d\d')::date
else null end $$
  set DateStyle = 'ISO,YMD'
  language sql immutable strict;

-- 
Andrew (irc:RhodiumToad)



Re: Immutable way to cast timestamp TEXT to DATE? (for index)

2019-01-04 Thread Andrew Gierth
> "Ken" == Ken Tanzer  writes:

 >> If you absolutely can't change the column type, then one option
 >> would be to do your own fixed-format date parsing function (and
 >> label it immutable), e.g.
 >> 
 >> create function iso_timestamp(text)
 >> returns timestamp without time zone
 >> as $$ select case when $1 ~ '^\d\d\d\d-?\d\d-?\d\dT\d\d:?\d\d:?\d\d$'
 >> then $1::timestamp
 >> else null end $$
 >> set DateStyle = 'ISO,YMD'
 >> language sql immutable strict;
 >> 
 >> or
 >> 
 >> create function iso_date(text)
 >> returns date
 >> as $$ select case when $1 ~ '^\d\d\d\d-?\d\d-?\d\d(?![^T])'
 >> then substring($1 from '^\d\d\d\d-?\d\d-?\d\d')::date
 >> else null end $$
 >> set DateStyle = 'ISO,YMD'
 >> language sql immutable strict;

 Ken> Yeah, I thought I might have to do that, but when I create that
 Ken> index it still doesn't seem to use the index for queries.

It won't use the index unless you use the same function in the query
too.

i.e.

CREATE INDEX ON ... (iso_date("Service_Date"));

SELECT * FROM ... WHERE iso_date("Service_Date") BETWEEN ... AND ...;

-- 
Andrew (irc:RhodiumToad)



Re: Adding LEFT JOIN to a query has increased execution time 10 times

2019-01-05 Thread Andrew Gierth
> "AF" == Alexander Farber  writes:

 AF> Here are the only modified settings in postgresql.conf:

 AF> max_connections = 120 # (change requires restart)
 AF> work_mem = 8MB # min 64kB
 AF> maintenance_work_mem = 128MB # min 1MB

 AF> 90% of the backend source code are JSON-emitting stored functions
 AF> and there is one function which is the main core of the game and is
 AF> a SELECT query over 7 tables.

 AF> It is called for every Websocket-connected client and delivers a
 AF> JSON list of active games for the player.

 AF> Until recently the query needed 1-2 seconds for completion,

That seems slow in itself, even before adding the extra join - the
explain suggests that you're both short on indexes and you're getting
pretty bad plans, possibly due to exceeding join_collapse_limit.

(You might try increasing that in your config, along with
from_collapse_limit; the default values are a legacy of the days when
CPUs were much slower and planning time more of an issue.)

 AF> but after I have added a LEFT JOIN with the following table, the
 AF> query takes 7-10 seconds for completion and makes the game
 AF> unpleasant to play:

 AF> # \d words_geoip;
 AF>  Table "public.words_geoip"
 AF>  Column |   Type   | Collation | Nullable | Default
 AF> +--+---+--+-
 AF>  block  | inet |   | not null |
 AF>  lat| double precision |   |  |
 AF>  lng| double precision |   |  |
 AF> Indexes:
 AF> "words_geoip_pkey" PRIMARY KEY, btree (block)

And here's yet another missing index, resulting in your query having to
process and discard 27 million rows in the course of generating a result
of only 9 rows:

  Join Filter: (CASE WHEN (g.player1 = 5) THEN u2.ip ELSE u1.ip END << i2.block)
  Rows Removed by Join Filter: 27660682

(you probably wanted <<= rather than << as that comparison, if there's
any chance your geoip table might have entries for single IPs)

Fortunately, this being pg10, you can use either of these indexes:

CREATE INDEX ON words_geoip USING gist (block inet_ops);

or

CREATE INDEX ON words_geoip USING spgist (block);

As for the rest of the query, here are places you could probably
work on:

 AF> LEFT JOIN words_moves m ON m.gid = g.gid
 AF> AND NOT EXISTS (SELECT 1
 AF> FROM words_moves m2
 AF> WHERE m2.gid = m.gid
 AF> AND m2.played > m.played)

What you're asking for here is that the words_moves row that you're
joining not have a matching row with a larger "played" value. You can do
this far more efficiently with a lateral join, given the right index.
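
Something along these lines (a sketch only, not tested against your schema):

LEFT JOIN LATERAL
  (SELECT mm.*
 FROM words_moves mm
WHERE mm.gid = g.gid
ORDER BY mm.played DESC
LIMIT 1) m ON true

together with an index on words_moves (gid, played). This picks out just
the latest move for each game rather than filtering out all the others.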

 AF> LEFT JOIN words_social s1 ON s1.uid = 5
 AF> AND NOT EXISTS (SELECT 1
 AF> FROM words_social s
 AF> WHERE s1.uid = s.uid
 AF> AND s.stamp > s1.stamp)
 AF> LEFT JOIN words_social s2 ON s2.uid = (CASE WHEN g.player1 = 5 THEN
 AF> g.player2 ELSE g.player1 END)
 AF> AND NOT EXISTS (SELECT 1
 AF> FROM words_social s
 AF> WHERE s2.uid = s.uid
 AF> AND s.stamp > s2.stamp)

Similar considerations apply to both of the above.

 AF> WHERE 5 IN (g.player1, g.player2)
 AF> AND (g.finished IS NULL OR g.finished > CURRENT_TIMESTAMP - INTERVAL '1
 AF> day');

This WHERE clause could be written as

  WHERE 5 IN (g.player1, g.player2)
AND coalesce(g.finished,'infinity') > (current_timestamp - interval '1 day')

and you could then create the following indexes,

CREATE INDEX ON words_games (player1, coalesce(finished,'infinity'));
CREATE INDEX ON words_games (player2, coalesce(finished,'infinity'));

which should get you a BitmapOr plan for that condition.

 AF> I have also asked my question at [dba.stack]

If you ask questions like this on the IRC channel (#postgresql on
chat.freenode.net - see http://freenode.net for info or web-based client
access), you can usually get feedback in real time (I rarely answer
performance questions in email because getting responses just takes too
long). You may have to be patient.

-- 
Andrew (irc:RhodiumToad)



Re: Adding LEFT JOIN to a query has increased execution time 10 times

2019-01-06 Thread Andrew Gierth
> "Alexander" == Alexander Farber  writes:

 Alexander> Good evening, thank you for the useful hints!

 Alexander> With the further help of the IRC folks the query has been
 Alexander> optimized (7-10 seconds -> 0.3 second)

0.3 MILLIseconds, actually.

(You chanced not to catch me around on IRC, but I see that didn't
matter.)

 Alexander> by adding the following indices:

 Alexander> CREATE INDEX ON words_games (player1, COALESCE(finished, 'INFINITY'));
 Alexander> CREATE INDEX ON words_games (player2, COALESCE(finished, 'INFINITY'));

 Alexander> CREATE INDEX ON words_moves (gid, played DESC);
 Alexander> CREATE INDEX ON words_social (uid, stamp DESC);

I'm not a big fan of using DESC on indexes; it's almost never needed,
because any btree index can be scanned in reverse. (You only actually
need it if you're mixing ASC/DESC orderings in an ORDER BY and want an
index that matches it.)

 Alexander> Also I have increased the following parameters in
 Alexander> postgresql.conf -

 Alexander> from_collapse_limit = 24
 Alexander> join_collapse_limit = 24

Ironically, I think these settings don't affect the query now since
removing the EXISTS conditions (which count as joins) means there are
now less than 8 joined tables. But keeping them high is probably a good
idea so that you don't get problems if you ever add another join or two.

 Alexander> Now the whole query looks as following and the EXPLAIN
 Alexander> output pasted is below -

Just for future reference, when you include explain output in your email
(which you should, for the benefit of the archives - paste sites and
explain.depesz.com have limited lifetimes), it's best if you can make
sure your email client doesn't word-wrap them into near-unreadability.

 Alexander> I was told that it still could be improved (by rearranging
 Alexander> WHERE clauses?)

Maybe, but once you're down to sub-millisecond execution times, further
optimization is usually only worthwhile for very heavily executed
queries.

-- 
Andrew (irc:RhodiumToad)



Re: Not sure which part of the query needs optimization

2019-01-07 Thread Andrew Gierth
> "Alexander" == Alexander Farber  writes:

 Alexander> Good afternoon,

 Alexander> for each visitor of my website I generate a JSON list of 30
 Alexander> top players ( https://slova.de/words/top.php ), who played
 Alexander> in the past week, with their average scores and average time
 Alexander> between moves.

 -> Parallel Seq Scan on words_moves m_1
 (cost=0.00..73600.05 rows=8666 width=16)
 (actual time=0.761..130.844 rows=11074 loops=3)
  Filter: (played > (CURRENT_TIMESTAMP - '7 days'::interval))
  Rows Removed by Filter: 492241

This is telling you that an index on words_moves(played) would likely
help.
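
i.e. something like:

CREATE INDEX ON words_moves (played);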

But the real hot point in the query is here (reformatted for clarity):

 -> Aggregate  (cost=6097.83..6097.84 rows=1 width=32)
   (actual time=19.401..19.401 rows=1 loops=30)
-> Bitmap Heap Scan on words_moves
(cost=33.97..6093.45 rows=1748 width=4)
(actual time=1.680..18.153 rows=15011 loops=30)
Recheck Cond: (uid = u.uid)
Heap Blocks: exact=216312
   -> Bitmap Index Scan on words_moves_uid_idx
   (cost=0.00..33.54 rows=1748 width=0)
   (actual time=0.979..0.979 rows=15011 loops=30)
   Index Cond: (uid = u.uid)

(the explain.depesz.com view points this out with an orange highlight)

This corresponds to this subquery:

  (SELECT ROUND(AVG(score), 1)
 FROM words_moves
WHERE uid = u.uid) AS avg_score,

The basic issue here is that you're calculating an average over around
15k rows per user, for each user in the query result (so 30 times,
making 450k rows). You can see from the "Heap Blocks" stat that this is
having to scan a lot of data; it's taking on average 19.4ms per user,
but multiply that by 30 users and you get ~580ms total, or about 70% of
the total execution time.

The obvious thing to do is to keep a computed average score for each
user - either in a separate table which you update based on changes to
words_moves, which you could do with a trigger, or using a materialized
view which you refresh at suitable intervals (this has the drawback that
the data will not be immediately up-to-date).
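
The materialized view variant would be a sketch along these lines (the view
name is invented):

CREATE MATERIALIZED VIEW words_avg_score AS
  SELECT uid, ROUND(AVG(score), 1) AS avg_score
FROM words_moves
   GROUP BY uid;

CREATE UNIQUE INDEX ON words_avg_score (uid);

-- then periodically:
REFRESH MATERIALIZED VIEW CONCURRENTLY words_avg_score;

and the per-user subquery becomes a simple join against words_avg_score.
(The unique index is what allows REFRESH ... CONCURRENTLY.)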

Combining these two changes should get you to under 100ms, maybe.

-- 
Andrew (irc:RhodiumToad)



Re: How to always run UPDATE FROM despite missing records in the source table?

2019-01-11 Thread Andrew Gierth
> "Alexander" == Alexander Farber  writes:

 Alexander> However the following query does not work as intended and
 Alexander> does not update any fields, because there is no matching
 Alexander> block in the geoip table found:

 Alexander> UPDATE users u SET
 Alexander> visited = now(),  -- HOW TO ALWAYS UPDATE THIS FIELD?
 Alexander> ip  = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
 Alexander> lat = i.lat,
 Alexander> lng = i.lng
 Alexander> FROM geoip i
 Alexander> WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

 Alexander> The field visited and ip however should be always updated -
 Alexander> regardless if the block was found or not.

 Alexander> Kind of LEFT JOIN, but for UPDATE - how to achieve this please?

It can be done like this (this assumes you want to preserve the previous
values of u.lat/u.lng if the block was not found; if you want to set
them to null instead, then remove the coalesce() calls):

UPDATE users u
   SET visited = now(),
   ip = v.ip,
   lat = coalesce(i.lat, u.lat),
   lng = coalesce(i.lng, u.lng)
  FROM (VALUES ('20.20.20.20'::inet)) v(ip)
   LEFT JOIN geoip i ON (v.ip <<= i.block)
 WHERE u.uid = 2;

 Alexander> But that would run the same subquery twice (correct?) and my
 Alexander> geoip table is already slow with 3073410 records

Slow even with a gist or spgist index? what does the explain analyze
look like?

(You could also try using the ip4r module; I've not done any serious
benchmarking to see if it's faster than the built-in index types, though
it has some theoretical advantages due to not being restricted to CIDR
ranges. In pg versions before the built-in inet type got a gist index
method, ip4r was _the_ way to do ip block lookups for geoip etc.)

-- 
Andrew (irc:RhodiumToad)



Re: How to always run UPDATE FROM despite missing records in the source table?

2019-01-13 Thread Andrew Gierth
> "Alexander" == Alexander Farber  writes:

 Alexander> even though I wonder what is the (VALUES
 Alexander> ('20.20.20.20'::inet)) v(ip) construct there, some temporary
 Alexander> table which is then LEGT JOINed to the geoip table?

The SQL spec calls it a <table value constructor>. The VALUES clause
behaves like a SELECT that returns a fixed number of rows (1 or more)
whose columns contain the results of the specified expressions. The
v(ip) part is just a table and column alias (I omitted the optional AS
keyword out of long habit) to name the constructed table.

-- 
Andrew (irc:RhodiumToad)



Re: lost "left join"

2019-01-16 Thread Andrew Gierth
> "Олег" == Олег Самойлов  writes:

 Олег> Hi, all.
 Олег> I got some mystic behaviour of PostgreSQL, perhaps this is a bug.

Feature, actually.

 Олег> But things begin be strange if I add validation by time.

 Олег> => explain select * from node as parent left join link on
 Олег> parent.node_id=link.parent left join node as child on
 Олег> link.child=child.node_id where parent.node_id=1 and current_date
 Олег> <@ parent.valid and current_date <@ link.valid and current_date
 Олег> <@ child.valid;

The problem here is that (for example) child.valid is null if there was
no matching child row in the join, and the planner knows that x <@ NULL
is not true (since the chosen <@ operator is defined as strict), and
therefore it knows that the left join is unnecessary and can be reduced
to an inner join.

At least 90% of the time when you refer to values from the nullable side
of a join in a WHERE clause, you're making a mistake (e.g. the condition
should have been in the ON clause instead). The other 10% or less of the
time, you have to make sure you use non-strict conditions, i.e. take
account of the fact that the values might be null.
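
In your example, moving the conditions into the ON clauses (a sketch based
on your query) would look like:

select *
  from node as parent
   left join link on parent.node_id = link.parent
  and current_date <@ link.valid
   left join node as child on link.child = child.node_id
 and current_date <@ child.valid
 where parent.node_id = 1
   and current_date <@ parent.valid;

which keeps the joins as true left joins: parent rows survive even when
there is no currently-valid link or child.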

-- 
Andrew (irc:RhodiumToad)



Re: Weird behaviour of ROLLUP/GROUPING

2019-01-16 Thread Andrew Gierth
> "Guillaume" == Guillaume Lelarge  writes:

 Guillaume>   CASE grouping(to_char(b, 'MM'))

 Guillaume> ERROR:  arguments to GROUPING must be grouping expressions of the
 Guillaume> associated query level
 Guillaume> LINE 3: CASE grouping(to_char(b, 'MM')) WHEN 1 THEN 'some date' ...

 Guillaume> AFAICT, both queries should behave the same, though their
 Guillaume> actual behaviours are quite opposite. Working fine for the
 Guillaume> first, erroring out on the second.

 Guillaume> Does anyone has any idea what's going on here?

Not yet. But I will find out, since it seems to be a bug.

-- 
Andrew (irc:RhodiumToad)



Re: Weird behaviour of ROLLUP/GROUPING

2019-01-16 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth  writes:
>>>>> "Guillaume" == Guillaume Lelarge  writes:

 Guillaume> CASE grouping(to_char(b, 'MM'))

 Guillaume> ERROR:  arguments to GROUPING must be grouping expressions of the
 Guillaume> associated query level
 Guillaume> LINE 3: CASE grouping(to_char(b, 'MM')) WHEN 1 THEN 'some date' ...

 Guillaume> AFAICT, both queries should behave the same, though their
 Guillaume> actual behaviours are quite opposite. Working fine for the
 Guillaume> first, erroring out on the second.

 Guillaume> Does anyone has any idea what's going on here?

 Andrew> Not yet. But I will find out, since it seems to be a bug.

It is definitely a bug, to do with assignment of collations. It
specifically happens when you use GROUPING which contains any
subexpression of a collatable type, inside a CASE expr WHEN clause,
since that assigns collations to the expression much earlier in parsing
than the rest of the query, so the code that validates GROUPING ends up
trying to compare an expression which has had collations assigned to it
to one which has not, and so it thinks they differ.

I will see about fixing this, somehow.

-- 
Andrew (irc:RhodiumToad)



Re: Question about array_to_string()'s behavior and the behavior might be changed in the future or not

2019-01-16 Thread Andrew Gierth
> "Egashira" == Egashira, Yusuke  writes:

 Egashira> Hi, 
 Egashira> I have a question about array_to_string(). 

 Egashira> I think array_to_string() concatenates always array elements
 Egashira> from the "beginning" of the array elements and this rule will
 Egashira> not be changed on the future minor releases.

 Egashira> Is my understanding correct? 

The order of elements in the original array will always be preserved in
the string output, yes.
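
e.g.

select array_to_string(array['b','a','c'], ',');  -- always 'b,a,c'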

 Egashira> For example, I understand that array_to_string() "never"
 Egashira> output like following result now and in the future, is my
 Egashira> understanding correct?

Yes.

-- 
Andrew (irc:RhodiumToad)



Re: Weird behaviour of ROLLUP/GROUPING

2019-01-16 Thread Andrew Gierth
> "Guillaume" == Guillaume Lelarge  writes:

 >> I will see about fixing this, somehow.

 Guillaume> Thanks a lot.

I've committed a fix (to all supported branches, since this bug actually
precedes the addition of GROUPING SETS and can be triggered with a
simple GROUP BY if you try hard enough). The regression test says it
works now, but it'd be good if you could try it again on REL_11_STABLE
(at commit e74d8c5085 or later) to check that it fixes your case.

-- 
Andrew (irc:RhodiumToad)



Re: commit within a procedure loop - cannot commite with subtransaction

2019-01-21 Thread Andrew Gierth
> "andyterry" == andyterry   writes:

 andyterry> Hi,

 andyterry> Using a procedure for the first time to run some processing
 andyterry> for each row in a table, generating output to a target
 andyterry> table. The following works without COMMIT the example below
 andyterry> gives:

 andyterry> INFO: Error Name:cannot commit while a subtransaction is active
 andyterry> INFO: Error State:2D000

What client are you using to execute this? Some clients may insert
SAVEPOINT statements (which create subtransactions) behind your back
(e.g. psql with \set ON_ERROR_ROLLBACK does this) in order to recover
from errors without aborting the whole transaction.

(turn on log_statement=all in your config, or for the user you're
executing this as, and look for the command in the server log)

-- 
Andrew (irc:RhodiumToad)



Re: Implementing an expanded object in C

2019-01-27 Thread Andrew Gierth
> "Michel" == Michel Pelletier  writes:

 Michel> Replying to my own problem here, I guess this was a situation
 Michel> where explaining it in detail revealed the problem to me. By
 Michel> specifying my type is 'passedbyvalue'

That cannot possibly be appropriate.

 Michel> CREATE TYPE matrix (
 Michel> internallength = 8,

Your type is clearly not a fixed-length type, because fixed-length types
cannot have expanded datums. A fixed-length type must contain its entire
representation within the fixed length - it is not allowed to be a
pointer to something else.

-- 
Andrew (irc:RhodiumToad)



Re: Displaying Comments in Views

2019-01-28 Thread Andrew Gierth
> "Susan" == Susan Hurst  writes:

 Susan> What is the trick for displaying column comments in views? The
 Susan> query below works as expected when the table_schema includes
 Susan> tables, however it shows nothing when the table_schema contains
 Susan> only views. I tried putting the query into an inline statement
 Susan> as a column selection in a wrapper query...I got all the
 Susan> table/column data but the comment column values were all null.

 Susan>   from pg_catalog.pg_statio_all_tables  st

That is the wrong place to look for the purposes of this query, since as
the name implies it only shows tables (and not views, since
non-materialized views don't have or need I/O statistics). Also, it's
_NOT_ the place to look when you just want a list of tables in the db or
to look up tables by oid; use pg_class for that.

I'd have gone with something along the lines of:

select n.nspname as table_schema,
   c.relname as table_name,
   a.attname as column_name,
   pd.description as description
  from pg_class c
   join pg_namespace n on (n.oid=c.relnamespace)
   join pg_attribute a on (a.attrelid=c.oid
   and a.attnum > 0
   and not a.attisdropped)
   join pg_description pd
 on (pd.classoid='pg_class'::regclass
 and pd.objoid=c.oid
 and pd.objsubid=a.attnum)
 where n.nspname = 'devops'
 order by n.nspname, c.relname, a.attname;

-- 
Andrew (irc:RhodiumToad)



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-28 Thread Andrew Gierth
> "pabloa98" == pabloa98   writes:

 pabloa98> the table baseline_denull has 1765 columns,

Uhh...

#define MaxHeapAttributeNumber  1600/* 8 * 200 */

Did you modify that?

(The back of my envelope says that on 64bit, the largest usable t_hoff
would be 248, of which 23 is fixed overhead leaving 225 as the max null
bitmap size, giving a hard limit of 1800 for MaxTupleAttributeNumber and
1799 for MaxHeapAttributeNumber. And the concerns expressed in the
comments above those #defines would obviously apply.)

-- 
Andrew (irc:RhodiumToad)



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread Andrew Gierth
> "pabloa98" == pabloa98   writes:

 pabloa98> I did not modify it.

Then how did you create a table with more than 1600 columns? If I try
and create a table with 1765 columns, I get:

ERROR:  tables can have at most 1600 columns

-- 
Andrew (irc:RhodiumToad)



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread Andrew Gierth
> "pabloa98" == pabloa98   writes:

 pabloa98> I found this article:

 pabloa98> https://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux

Those instructions contain obvious errors.

 pabloa98> It seems I should modify: uint8 t_hoff;
 pabloa98> and replace it with something like: uint32 t_hoff; or uint64 t_hoff;

At the very least, that ought to be uint16 t_hoff; since there is never
any possibility of hoff being larger than 32k since that's the largest
allowed pagesize. However, if you modify that, it's then up to you to
ensure that all the code that assumes it's a uint8 is found and fixed.
I have no idea what else would break.

-- 
Andrew (irc:RhodiumToad)



Re: Help : Update and insert record based on several value in the parameter

2019-01-29 Thread Andrew Gierth
> "Hengky" == Hengky Lie  writes:

 Hengky> Hi,

 Hengky> I want to create a function to update my table (flag='9') and
 Hengky> insert new record (flag='0') with the rfidnumber specified in a
 Hengky> parameter.

rfidnumber is stored as text/varchar? if it's something else, change the
suggestions below accordingly.

 Hengky> This parameter may have several value seperated by space (ie.
 Hengky> 11 22 33 44)

update ... and rfidnumber = any (string_to_array(znumber,' '))

Or, do the split just once:

DECLARE
  z_ids text[] := string_to_array(znumber, ' ');
BEGIN
  update ... where flag='0' and rfidnumber =any (z_ids);
  insert into ...
  select localtimestamp, '0', id from unnest(z_ids) as u(id);

-- 
Andrew (irc:RhodiumToad)



Re: Date calculation

2019-01-31 Thread Andrew Gierth
> "Ron" == Ron   writes:

 Ron> Hi,
 Ron> v9.6.6

 Ron> Is there a built in function to calculate, for example, next
 Ron> Sunday?

No, but such things aren't hard to calculate using the available
primitives.

To get "next Xday", for example, you can add 7 days and then do
"previous or current Xday". In turn, "previous or current Xday" can be
done by subtracting (X-Monday), doing date_trunc 'week', and adding
(X-Monday) again.

select current_date,
   date_trunc('week', (current_date + 7 - 6)::timestamp)::date + 6; 
 current_date |  ?column?  
--+
 2019-01-31   | 2019-02-03
(1 row)

If you do this sort of thing a lot, then define your own functions for
it:

-- create this to override the cast to timestamptz that otherwise messes
-- things up:
create function date_trunc(text,date)
  returns date language sql immutable
  as $f$
select date_trunc($1, $2::timestamp)::date;
$f$;

-- perfect hash function for weekday names, with Monday=0
-- (accepts upper, lower or mixed case)
create function dayno(text)
  returns integer
  language sql immutable
  as $f$
select (( ((ascii(substring($1 from 3)) & 22)*10)
  # (ascii($1) & 23) )*5 + 2) % 7;
$f$;

create function next_dow(start_date date, day_name text)
  returns date language sql immutable
  as $f$
select date_trunc('week', (start_date + 7 - dayno(day_name)))
   + dayno(day_name);
$f$;

select current_date,
   next_dow(current_date, 'Thursday'),
   next_dow(current_date, 'Friday');
 current_date |  next_dow  |  next_dow  
--++
 2019-01-31   | 2019-02-07 | 2019-02-01

-- 
Andrew (irc:RhodiumToad)



Re: Date calculation

2019-01-31 Thread Andrew Gierth
> "Bruce" == Bruce Momjian  writes:

 Bruce> Oh, right, you want date, so use:

 Bruce> SELECT date_trunc('week', CURRENT_DATE) + '6 days';

Three major things wrong with this:

1. If you do this on Sunday, it gives you the current day not the _next_
Sunday.

2. If you try and do this for other days of the week it doesn't work at
all, instead giving you the specified day of the current week whether or
not it's before or after the current day.

3. It's letting PG cast the date to a timestamptz, which is inefficient,
possibly incorrect, and mutable; you want to force it to cast to
timestamp without timezone instead. (A good rule of thumb is that you
virtually never want to cast dates to timestamptz; the natural cast from
date is to timestamp _without_ timezone.)
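
That is, point 3 alone would be addressed by writing (sketch):

SELECT date_trunc('week', CURRENT_DATE::timestamp)::date + 6;

so that the date_trunc(text, timestamp) form is used; the day-of-week
arithmetic from my earlier message is still needed to fix points 1 and 2.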

-- 
Andrew (irc:RhodiumToad)



Re: Unused files in the database directory after crashed VACUUM FULL

2019-02-10 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Hannes Erven  writes:
 >> I've just had a "VACUUM FULL " crash due to 100% disk usage.
 >> Clearly my fault, I was expecting the new table to be small enough.

 Tom> What do you mean by "crash" exactly? A normal transactional
 Tom> failure should've cleaned up orphaned files. I suppose if the
 Tom> kernel decided to kill -9 the vacuum process, that wouldn't happen
 Tom> --- but that's not the usual response to out-of-disk-space.

The failure mode of this kind we see reported most often is getting a
PANIC failure from inability to extend the WAL.

-- 
Andrew (irc:RhodiumToad)



Re: Subquery to select max(date) value

2019-02-12 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich> Will probably try DISTINCT ON, too, if that makes it simpler or
 Rich> faster.

You want LATERAL.

-- 
Andrew (irc:RhodiumToad)



Re: Subquery to select max(date) value

2019-02-13 Thread Andrew Gierth
> "Adrian" == Adrian Klaver  writes:

 Adrian> Close to your last posted query. person_id 2 and 3 have NULL
 Adrian> values for activities data as there is no record for 2 and 3 is
 Adrian> out of the date range.:

 Adrian>  select
 Adrian>p.person_id,
 Adrian>p.desc_fld,
 Adrian>a.next_contact
 Adrian> from
 Adrian>people as p
 Adrian>LEFT JOIN (
 Adrian>SELECT
 Adrian>DISTINCT ON (person_id)
 [...]
 Adrian>) a USING (person_id)
 Adrian> ;

DISTINCT ON with no matching ORDER BY at the _same_ query level is
non-deterministic.

Also DISTINCT ON isn't efficient. Consider instead something along the
lines of:

select p.*,
   a.* -- for illustration
  from people p
   join lateral (select *
   from activities a1
  where a1.person_id = p.person_id
and a1.next_contact > '2018-12-31'
and a1.next_contact <= 'today'
  order by a1.next_contact desc
  limit 1) a
 on true;

(make sure to have an index on activities(person_id,next_contact))

-- 
Andrew (irc:RhodiumToad)



Re: Size estimation of postgres core files

2019-02-15 Thread Andrew Gierth
> "Jeremy" == Jeremy Finzel  writes:

 Jeremy> Yes Linux. This is very helpful, thanks. A follow-up question -
 Jeremy> will it take postgres a really long time to crash (and
 Jeremy> hopefully recover) if I have say 1T of RAM because it has to
 Jeremy> write that all out to a core file first?

It doesn't write out all of RAM, only the amount in use by the
particular backend that crashed (plus all the shared segments attached
by that backend, including the main shared_buffers, unless you disable
that as previously mentioned).

And yes, it can take a long time to generate a large core file.

-- 
Andrew (irc:RhodiumToad)



Re: Subquery to select max(date) value

2019-02-15 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich> I found a couple of web pages describing the lateral join yet
 Rich> have not correctly applied them. The manual's page did not help
 Rich> me get the correct syntax, either. Think I'm close, however:

 Rich> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, a.next_contact
 Rich> from people as p, organizations as o
 Rich> lateral
 Rich> (select a.next_contact

LATERAL (SELECT ...)   is syntactically like (SELECT ...) in that it
comes _after_ a "," in the from-clause or after a [LEFT] JOIN keyword.
Don't think of LATERAL as being a type of join, think of it as
qualifying the (SELECT ...) that follows.

 Rich> from activities as a
 Rich> where a.next_contact is not null and a.next_contact <= 'today' and
 Rich>   a.next_contact > '2018-12-31'

You'd want a condition here that references the "people" table; the
whole point of LATERAL is that it opens up the scope of column
references in the subquery to include those tables which are to its left
in the from-clause.

 Rich> order by person_id,next_contact);

and I'm guessing you want that ordered by next_contact alone, possibly
with LIMIT 1 to get just the nearest following next_contact time.

-- 
Andrew (irc:RhodiumToad)



Re: Subquery to select max(date) value

2019-02-15 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich> Using LIMIT 1 produces only the first returned row. This
 Rich> statement (using max() for next_contact) produces no error
 Rich> message, but also no results so I killed the process after 30
 Rich> seconds. Without a syntax error for guidance I don't know how to
 Rich> proceed. I've not before run 'explain' on a query. Would that be
 Rich> appropriate here?

Yes.

 Rich> select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name
 Rich> from people as p, organizations as o,
 Rich> lateral
 Rich> (select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name,
 Rich> max(a.next_contact)
 Rich> from people as p, organizations as o, activities as a
 Rich> where a.next_contact > '2018-12-31' and
 Rich>   a.next_contact <= 'today' and
 Rich>   a.next_contact is not null
 Rich> group by p.person_id, o.org_name, a.next_contact
 Rich> order by p.person_id, o.org_name, a.next_contact) sq;

The problem here is that you have no join conditions at all, so the
result set of this query is massive. And you've duplicated many tables
inside the subquery which is not necessary or appropriate.

select p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, sq.*
  from people as p
   join organizations as o on p.organization_id=o.id   -- OR WHATEVER
   cross join
 lateral (select a.next_contact
from activities as a
   where a.person_id=p.person_id --VERY IMPORTANT
 and a.next_contact > '2018-12-31'
 and a.next_contact <= 'today'
 and a.next_contact is not null
   order by a.next_contact DESC
   limit 1) sq;

Ordering by DESC with a limit 1 is used to get the max next_contact
value rather than the smallest; this is similar to max(), but makes it
trivial to also access the other columns of the _same_ activities row
which is being selected.

-- 
Andrew (irc:RhodiumToad)



Re: HAVING query structured wrong

2019-02-18 Thread Andrew Gierth
> "Chuck" == Chuck Martin  writes:

 Chuck> I am trying to create a query that returns all transactions for
 Chuck> each person who has a balance over a given amount. I thought
 Chuck> HAVING was the answer, but if so, I'm mis-using it. This query
 Chuck> returns only transactions exceeding the given amount rather than
 Chuck> transactions for people whose balance is over the amount:
 [snip]
 Chuck> Since that returned the wrong set of records, I created another
 Chuck> that returns the correct set of people with balances over the
 Chuck> given amount. But I can't figure out how to use this to get all
 Chuck> the transactions for people returned by this query:

 Chuck> SELECT case_pkey
 Chuck> FROM trans,ombcase,status
 Chuck> WHERE case_fkey = case_pkey
 Chuck> AND status_fkey = status_pkey
 Chuck> AND statusopen = 1
 Chuck> AND transistrust <> 1
 Chuck> AND transcleared <> 1
 Chuck> GROUP BY case_pkey
 Chuck> HAVING sum(transamount) >= 50

 Chuck> ORDER BY case_pkey

 Chuck> So how do I get all transactions for each case_pkey?

You can join the result of any subquery as if it were a table, either
with or without using a CTE:

SELECT ...
  FROM (select case_pkey from ... having ...) AS cases,
   trans
 WHERE trans.case_fkey = cases.case_pkey;

(incidentally, please qualify all the column references in your query
with a table name or alias, otherwise people reading your code have no
idea which column is supposed to be in which table)

or with a CTE,

WITH cases AS (select ... from ... having ...)
SELECT ...
  FROM cases, trans
 WHERE trans.case_fkey = cases.case_pkey;

There's also a third method with window functions instead of GROUP BY,
which is to do something like

SELECT ...
  FROM (select ...,
   sum(transamount) over (partition by case_pkey) as total_amt
  from ...) s
 WHERE total_amt > 50;

-- 
Andrew (irc:RhodiumToad)



Re: RECURSIVE allowed only ONCE in a CTE

2019-02-22 Thread Andrew Gierth
> "Jitendra" == Jitendra Loyal  writes:

 Jitendra> I find that the RECURSIVE can be used only once in a CTE.

RECURSIVE can be specified only once, but it applies to all CTEs at that
level. That is to say, RECURSIVE qualifies the preceding WITH, _not_ the
following CTE.

Note that just specifying RECURSIVE doesn't mean that any CTE is
recursive, it simply changes the name scoping rules such that CTEs _can_
be recursive. (Without it, a CTE's name is not in scope in its own body
so recursion is impossible.)
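
For example, this is fine even though only one of the two CTEs actually
recurses:

with recursive
  a as (select 1 as n),
  b as (select 1 as n
 union all
select n + 1 from b where n < 5)
select * from a, b;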

-- 
Andrew (irc:RhodiumToad)



Re: crosstab function

2019-02-26 Thread Andrew Gierth
> "Martin" == Martin Mueller  writes:

 Martin> I run Postgres 10.5. I understand that there is something
 Martin> called tablefunc and it includes a crosstab function. On Stack
 Martin> Overflow I learn that you import this function. But from where
 Martin> and how? The Postgres documentation is quite clear and
 Martin> intelligible to a retired English professor like me, but there
 Martin> is nothing in the Postgres documentation about how to do the
 Martin> import, and a search for ‘import modules’ yields nothing.

 Martin> I tried to emulate a Stack overflow query that does what I want
 Martin> to do, but got an error message saying that the crosstab
 Martin> function doesn’t exist. I tried

 Martin> CREATE EXTENSION IF NOT EXISTS tablefunc;

 Martin> but it did nothing. It doesn’t seem to work as import
 Martin> statements in Python do

CREATE EXTENSION causes the functions to be defined in the current
database; that's the only "import" that is needed. In psql, you can do
this:

\df+ *.crosstab*

to see what functions of that name are defined and what schema they are
in (should be "public" by default). Likewise  \dx  lists installed
extensions in the current database.

Remember that you need to do the CREATE EXTENSION command actually in
the database in which you want to use the functions, not in any other
database.

Errors about functions not existing are usually caused by a problem with
the number or type of parameters, since function names aren't unique
(overloaded functions are allowed). You didn't tell us the actual error
you got, so we can't say exactly what the problem is there.

-- 
Andrew (irc:RhodiumToad)



Re: Overloaded && operator from intarray module prevents index usage.

2019-02-28 Thread Andrew Gierth
> "Thomas" == Thomas Kellerer  writes:

[intarray woes]

 Thomas> Is this expected behaviour? Is this caused by the Postgres core
 Thomas> (e.g. the optimizer to taking the opclass into account) or is
 Thomas> it a "problem" in the way the intarray module defines its
 Thomas> operators?

It's basically a conflict between intarray (which is historically older)
and the built-in array indexing support.

The reason it happens is that the operator resolution logic matches an
(integer[] && integer[]) operator in preference to (anyarray && anyarray)
regardless of their relative position on the search_path. This
resolution happens before anything is known about any indexes that might
be applicable. Then later, at planning time, an index is chosen based on
the operator, not the reverse.

My own recommendation for most cases is to never install intarray on the
search path, and invoke its functions via explicit qualification or wrap
them in your own functions.
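
For example (a sketch; the extension is assumed to have been installed into
its own schema, and the table/column names are invented):

CREATE SCHEMA intarray;
CREATE EXTENSION intarray SCHEMA intarray;

-- the built-in anyarray && is used by default; ask for the intarray one
-- explicitly only when you actually want it:
SELECT * FROM tbl WHERE cols OPERATOR(intarray.&&) ARRAY[1,2];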

-- 
Andrew (irc:RhodiumToad)



Re: Overloaded && operator from intarray module prevents index usage.

2019-03-01 Thread Andrew Gierth
> "Ron" == Ron   writes:

 Ron> Arrays are -- by definition -- not atomic, and so they
 Ron> fundamentally break the model that relational databases are
 Ron> founded upon.  If you want to be a good database designer, don't
 Ron> use arrays.

"In theory there is no difference between theory and practice, but in
practice there is."

Sometimes a good database designer has to tell the theoreticians where
to get off, and do something more pragmatic.

-- 
Andrew (irc:RhodiumToad)



Re: Camel case identifiers and folding

2019-03-16 Thread Andrew Gierth
> "Morris" == Morris de Oryx  writes:

 Morris> UUIDs as a type are an interesting case in Postgres. They're
 Morris> stored as a large numeric for efficiency (good!), but are
 Morris> presented by default in the 36-byte format with the dashes.
 Morris> However, you can also search using the dashes 32-character
 Morris> formatand it all works. Case-insensitively.

That works because UUIDs have a convenient canonical form (the raw
bytes) which all input is converted to before comparison.

Text is ... not like this.

Even citext is really only a hack - it assumes that comparisons can be
done by conversion to lowercase, which may work well enough for English
but I'm pretty sure it does not correctly handle the edge cases in, for
example, German (consider 'SS', 'ss', 'ß') or Greek (final sigma). Doing
it better would require proper application of case-folding rules, and
even that would require handling of edge cases (the Unicode case folding
algorithm is designed to be language-independent, which means that it
breaks for Turkish without special-case exceptions).

-- 
Andrew (irc:RhodiumToad)



Re: printing JsonbPair values of input JSONB on server side?

2019-03-18 Thread Andrew Gierth
> "T" == T L  writes:

 T> //Problem line!!!
 T> //elog(NOTICE, "print_kv_pair(): k = %s, v = %s",
 T>  ptr-> key.val.string.val, numeric_out(ptr->value.val.numeric));

string.val isn't a C string (notice the "not null terminated" comment in
the structure definition), and you can't call numeric_out like that.
Either of those would crash it.

You could use pnstrdup to get a valid C string, and use
DatumGetCString(DirectFunctionCall1(
  numeric_out,
  NumericGetDatum(ptr->value.val.numeric)))

to get the numeric value as a C string.

-- 
Andrew (irc:RhodiumToad)



Re: printing JsonbPair values of input JSONB on server side?

2019-03-18 Thread Andrew Gierth
> "T" == T L  writes:

 T> //problem lines!!! //either elog crashes pg server
 T> char *buf = pnstrdup(ptr->key.val.string.val,
 T>  ptr-> key.val.string.len);
 T> elog(NOTICE, "print_kv_pair(): k = %s",
 T> (ptr->key).val.string.val);  //debug

It doesn't help to make a null-terminated copy of the string if you're
then just going to try and print the original.

elog(NOTICE, "print_kv_pair(): k = %s", buf);

 T> elog(NOTICE, "print_kv_pair(): v = %s",
 T> DatumGetCString(DirectFunctionCall1(numeric_out,
 T> NumericGetDatum(ptr->value.val.numeric))) ); //debug

That should work, _provided_ that value.type == jbvNumeric - did you
consider checking that first?

-- 
Andrew (irc:RhodiumToad)



Re: printing JsonbPair values of input JSONB on server side?

2019-03-18 Thread Andrew Gierth
> "T" == T L  writes:

 T> Below is my test. It prints a strange character instead of "a"; and
 T> says that the value isn't numeric.

Yeah, there's plenty else wrong with your code.

Did you look at how JsonbToCStringWorker does it? that looks like the
best example I can find on a quick scan.

-- 
Andrew (irc:RhodiumToad)



Re: FreeBSD 12 and Postgres build failures

2019-03-19 Thread Andrew Gierth
> "Karl" == Karl Denninger  writes:

 Karl> Both 10.7 and 11.2 fail on FreeBSD 12-STABLE if --with-openssl is
 Karl> part of the configure string.

Nope, they work fine for me, running under 12-stable as of a couple of
weeks ago.

-- 
Andrew (irc:RhodiumToad)



Re: FreeBSD 12 and Postgres build failures

2019-03-19 Thread Andrew Gierth
> "Karl" == Karl Denninger  writes:

 Karl> That was on a bare, new grab of the source from your repository and a
 Karl> straight-up run of "./configure --with-openssl --with-perl", then "gmake".

That's a configure line that has never worked on FreeBSD:

% ./configure --with-openssl --with-perl
[...]
configure: error: readline header not found
If you have libedit already installed, see config.log for details on the
failure.  It is possible the compiler isn't looking in the proper directory.
Use --without-readline to disable libedit support.
%

(If you don't use --without-readline then you must add
--with-includes=/usr/local/include and --with-libs=/usr/local/lib to
pick up either ports readline or ports libedit. Or you might be able to
use --with-includes=/usr/include/edit to get base system libedit, but
that's not really the recommended method. Besides libedit sucks for
actual use so you want readline anyway.)

If you didn't explicitly specify any of those, but configure found a
readline to use anyway, then it means you have defined some non-standard
compiler options or have messed with the include paths in some other
way, which could be relevant to your problem.

-- 
Andrew (irc:RhodiumToad)



Re: FreeBSD 12 and Postgres build failures

2019-03-19 Thread Andrew Gierth
> "Karl" == Karl Denninger  writes:

 >> That's a configure line that has never worked on FreeBSD:
 >> 
 >> % ./configure --with-openssl --with-perl
 >> [...]
 >> configure: error: readline header not found
 >> If you have libedit already installed, see config.log for details on the
 >> failure.  It is possible the compiler isn't looking in the proper directory.
 >> Use --without-readline to disable libedit support.
 >> %

 Karl> Uh, it has worked on all my FreeBSD systems for a very long time;

Then you have done something non-standard, because "cc" on FreeBSD does
not look in /usr/local/include or /usr/local/lib without being told to
do so, and the configure checks for readline will fail without that.
Possible explanations include having a "gcc" (without a specific version
suffix) on the $PATH (gcc will be chosen over cc in that case), or
having configured CFLAGS or CPPFLAGS in the environment to ./configure.

You might post your full output of ./configure rather than leaving us to
guess at it.

-- 
Andrew (irc:RhodiumToad)



Re: FreeBSD 12 and Postgres build failures

2019-03-19 Thread Andrew Gierth
> "Karl" == Karl Denninger  writes:

 Karl> "cc" on FreeBSD is now (and has been for quite some time) clang:

 Karl> If it will compile Postgres now it definitely would not when it
 Karl> was first shifted to;

People have been building PG with clang since at least as far back as
clang 3.1, and the FreeBSD 10.0 (which I believe was the first FreeBSD
release where clang was the default) package builds of PG were built
with clang 3.3. (Now, of course, clang is _required_ if you want to try
out the new JIT features.)

Building with gcc on FreeBSD is not the default on a clean system, and
as we've recently seen with another report, there are issues with mixing
the gcc and llvm toolchains on some architectures that are not in any
way PG's problem to solve. (Though the fact that we prefer gcc to cc in
configure if both are found is a decision that should probably be
revisited, given that the reasons for that preference are mostly
consigned to the graveyard of commercial Unix variants.)

You might also have considered that the fact that package builds exist
for pg 10 and 11 on FreeBSD 12 is a demonstration that building them is,
in fact, possible on a clean system...

 Karl> root@NewFS:/home/karl # which gcc
 Karl> /usr/local/bin/gcc

I had not previously noticed that the lang/gcc metaport (which I have
never installed, though I have lang/gcc8 installed at the moment)
installs a symlink to the selected gcc version as just "gcc". That
explains a lot.

-- 
Andrew (irc:RhodiumToad)



Re: Subquery to select max(date) value

2019-03-28 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich> Tried this and did not do it correctly. Should there be two
 Rich> 'order by', one in the sub-query, the other in the top-level
 Rich> query?

Yes.

 Rich> This does not return the desired order:

 Rich> select p.person_id, p.lname, p.fname, p.direct_phone, p.active, o.org_name, sq.*
 Rich> from people as p
 Rich>  join organizations as o on p.org_id = o.org_id
 Rich>  cross join
 Rich>  lateral
 Rich>  (select a.next_contact
 Rich>  from activities as a
 Rich>  where a.person_id = p.person_id and
 Rich>  p.active='True' and
 Rich>  a.next_contact is not null
 Rich>  order by a.next_contact DESC
 Rich>  limit 1) sq
 Rich>  order by sq.next_contact DESC;

That query seems correct assuming you want the result in descending
order of next_contact. How did the actual result differ from your
expectation?

-- 
Andrew (irc:RhodiumToad)




Re: Postgres comparison bugfixes between arbitrary versions

2019-04-05 Thread Andrew Gierth
> "Thomas" == Thomas Kellerer  writes:

 Thomas> Hello,

 Thomas> some time ago someone published a website where it was possible
 Thomas> to select two arbitrary Postgres version and then see a list of
 Thomas> Bugfixes (and features) that are missing in the older version
 Thomas> of the two.

why-upgrade.depesz.com

-- 
Andrew (irc:RhodiumToad)




Re: Computed index on transformation of jsonb key set

2019-04-27 Thread Andrew Gierth
> "Steven" == Steven Schlansker  writes:

 Steven> I figured I'd end up with significantly better storage and
 Steven> performance characteristics if I first compute a uuid[] value
 Steven> and build the GIN over that, and use the array operator class
 Steven> instead. Additionally, this eliminates possible confusion about
 Steven> uuid casing (text is case sensitive, uuid is not) and this has
 Steven> already caused at least one bug in our application.

 Steven> I attempted to optimize a query like:
 Steven> select * from tbl where array(select jsonb_object_keys(mapData)::uuid)
 Steven> && array['320982a7-cfaa-572a-b5ea-2074d7f3b014'::uuid];

Obvious solution:

create function uuid_keys(mapData jsonb) returns uuid[]
  language plpgsql immutable strict
  as $$
begin
  return array(select jsonb_object_keys(mapData)::uuid);
end;
  $$;

create index on tbl using gin (uuid_keys(mapData));

select * from tbl where uuid_keys(mapData) && array[...];

-- 
Andrew (irc:RhodiumToad)



