Dynamically filtering a CTE?

2018-04-19 Thread W. Trevor King
I have a slow ‘WITH RECURSIVE’ CTE like:

  CREATE VIEW ancestors AS
  WITH RECURSIVE _ancestors(descendant, ancestors) AS (
  SELECT
item.id AS id,
ARRAY[item.ancestor_id] AS ancestors
  FROM items AS item
UNION ALL
  SELECT
child.id AS id
child.ancestors || ancestor.ancestor_id AS ancestors
  FROM _ancestors AS child
  JOIN items as ancestor
ON child.ancestors[array_length(child.ancestors, 1)] = ancestor.id
  )
  SELECT *
  FROM _ancestors
  WHERE child.ancestors[array_length(child.ancestors, 1)] IS NULL;

I'll usually only need a few rows, so I'm being bitten by PostgreSQL's
CTE optimization fence [1].  I'm looking to optimize it by limiting
the dynamically limiting the number of rows in the initial query, as
if it had been:

  WITH RECURSIVE _ancestors(id, ancestors) AS (
  SELECT
item.id AS id,
ARRAY[item.ancestor_id] AS ancestors
  FROM items AS item
  WHERE {your condition here}
UNION ALL
  …
  )

My initial thought was to create a function which accepted a WITH
clause as an argument.  Something like:

  CREATE OR REPLACE FUNCTION ancestors(condition)
RETURNS TABLE(id integer, ancestors integer[]) AS
  $$
WITH RECURSIVE _ancestors(id, ancestors) AS (
SELECT
  item.id AS id,
  ARRAY[item.ancestor_id] AS ancestors
FROM items AS item
WHERE condition
  UNION ALL
…
)
…
  $$ LANGUAGE SQL;

or with ‘WHERE condition(item)’.  But I couldn't find a way to define
an argument that was a where condition [2] or a record→boolean
function [3].  I could probably use PREPARE/EXECUTE [4] to dynamically
construct the WHERE statement, but that looks like it may have its own
optimization issues and there's no way to stash it for use in
subsequent sessions.  Perhaps a function to run the PREPARE?  Is there
an idiomatic way to approach this problem?

Thanks,
Trevor

[1]: https://www.postgresql.org/message-id/201209191305.44674...@kavod.com
[2]: https://www.postgresql.org/docs/10/static/sql-select.html#SQL-WHERE
[3]: https://www.postgresql.org/docs/10/static/sql-createfunction.html
[4]: https://www.postgresql.org/docs/10/static/sql-prepare.html

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


signature.asc
Description: OpenPGP digital signature


Re: Dynamically filtering a CTE?

2018-04-20 Thread W. Trevor King
On Thu, Apr 19, 2018 at 05:28:00PM -0700, David G. Johnston wrote:
> On Thursday, April 19, 2018, W. Trevor King wrote:
> > Is there an idiomatic way to approach this problem?
>
> I would use pl/pgsql as the language and build a query using a
> combination of text literals and the format() function - invoking
> via pl/pgsql's EXECUTE command.

That works.  I've ended up with:

  CREATE OR REPLACE FUNCTION ancestors(condition text)
RETURNS TABLE(id integer, ancestors integer[]) AS
  $$
  BEGIN
  RETURN QUERY EXECUTE format('
WITH RECURSIVE _ancestors(id, ancestors) AS (
SELECT
  item.id AS id,
  ARRAY[item.ancestor_id] AS ancestors
FROM items AS item
%s
  UNION ALL
SELECT
  descendant.id AS id,
  descendant.ancestors || ancestor.ancestor_id AS ancestors
FROM _ancestors AS descendant
JOIN items as ancestor
  ON descendant.ancestors[array_length(descendant.ancestors, 1)] = 
ancestor.id
)
SELECT
  id,
  ancestors[1:array_length(ancestors, 1) - 1] AS ancestors -- drop the 
trailing NULL
FROM _ancestors
WHERE ancestors[array_length(ancestors, 1)] IS NULL -- remove non-terminal 
recursion
', condition);
  END
  $$ LANGUAGE plpgsql STABLE;

which you can use like:

  SELECT * FROM ancestors('WHERE item.id = 62324721');

or (without filtering, for the full, slow CTE):

  SELECT * FROM ancestors('');

Thanks,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


signature.asc
Description: OpenPGP digital signature


Re: Dynamically filtering a CTE?

2018-04-20 Thread W. Trevor King
On Fri, Apr 20, 2018 at 09:33:22AM -0700, David G. Johnston wrote:
> On Fri, Apr 20, 2018 at 9:22 AM, W. Trevor King wrote:
> > format('
> > WITH RECURSIVE _ancestors(id, ancestors) AS (
> > SELECT
> >   item.id AS id,
> >   ARRAY[item.ancestor_id] AS ancestors
> > FROM items AS item
> > %s
> > ​[...]​
> >
> > ', condition);
> >
> >   SELECT * FROM ancestors('WHERE item.id = 62324721');
> 
> ​Just keep in mind that this opens up a huge SQL-injection hole in
> your database.  Depending on how its called you might want to
> validation the input text for both whitelist and blacklist items
> before executing it.

I'm not calling it on user-supplied conditions, but yeah, if I were it
would certainly need some guards.  Unfortunately, neither format [1]
nor USING [2,3] seem to have auto-quoting for “make sure this is just
a WHERE condition [4] without side-effects” ;).  I think we'd need a
WHERE-condition data type to support that, just like we'd need a
WHERE-condition data type (or a function data type) to support my
initial idea [5]:

  CREATE OR REPLACE FUNCTION ancestors(condition WHERE-condition-type)
  RETURNS TABLE(id integer, ancestors integer[]) AS
$$
  WITH RECURSIVE _ancestors(id, ancestors) AS (
  SELECT
item.id AS id,
ARRAY[item.ancestor_id] AS ancestors
  FROM items AS item
  WHERE condition -- or, with a function type, condition(item)
UNION ALL
  …
  )
  …
$$ LANGUAGE SQL;

And even if you had a WHERE-condition data type, enforcing the
no-side-effects constraint would be tricky.

Things like blacklisting condition text with semicolons, etc. might
help against unintentional typos, although they seem too easily
avoided to be relied on against potentially malicious user input.
Parsing the condition text as WHERE-clause SQL to look for dangerous
constructs might be strong enough, but seems like a lot of work and
something I'm likely to get wrong if I tried ;).  For now, I'm just
making sure I'm not allowing untrusted users to provide condition
text.

Thanks,
Trevor

[1]: 
https://www.postgresql.org/docs/10/static/functions-string.html#FUNCTIONS-STRING-FORMAT
[2]: 
https://www.postgresql.org/docs/10/static/plpgsql-control-structures.html#id-1.8.8.8.3.4
[3]: 
https://www.postgresql.org/docs/10/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
[4]: https://www.postgresql.org/docs/10/static/sql-select.html#SQL-WHERE
[5]: https://www.postgresql.org/message-id/2018042055.GL27577%40valgrind.us

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


signature.asc
Description: OpenPGP digital signature