Currently, the planner can remove useless left joins if the join
condition cannot match more than one RHS row, and the RHS rel is not
referenced above the join.  I'd like to propose a similar optimization
for inner joins.

For an inner join to be removable, we need to make sure that the join
condition matches exactly one (no more, no less) RHS row.  We can
leverage foreign key constraints to achieve that.  But we need to be
careful about the case where the referencing relation's foreign key
columns are null.  We can inject an IS NOT NULL filter for any foreign
key columns that are not defined as NOT NULL.

Here is an example to show this removal:

CREATE TABLE users (id int primary key, name text);
CREATE TABLE orders (id int, user_id int references users(id), amount int);

EXPLAIN (COSTS OFF)
SELECT o.* FROM orders o JOIN users u ON o.user_id = u.id;
           QUERY PLAN
---------------------------------
 Seq Scan on orders o
   Filter: (user_id IS NOT NULL)
(2 rows)

One interesting aspect of this patch is that we can actually allow the
usage of the referenced key columns in ECs even above the join, which
can help bridge multi-hop joins.  This is pretty useful for the
brand-new graph queries.

As an example, consider:

CREATE TABLE knows (
    edge_id int primary key,
    src_id int not null references users(id),
    dst_id int not null references users(id),
    creation_date date
);

CREATE PROPERTY GRAPH social_graph
    VERTEX TABLES (users)
    EDGE TABLES (knows
        SOURCE KEY (src_id) REFERENCES users(id)
        DESTINATION KEY (dst_id) REFERENCES users(id)
    );

Suppose we only want to find the creation dates of a 3-hop connection
chain:

EXPLAIN (COSTS OFF)
SELECT e1_date, e2_date, e3_date
FROM GRAPH_TABLE (social_graph
    MATCH (u1 IS users)
          -[e1 IS knows]-> (u2 IS users)
          -[e2 IS knows]-> (u3 IS users)
          -[e3 IS knows]-> (u4 IS users)
    COLUMNS (
        e1.creation_date AS e1_date,
        e2.creation_date AS e2_date,
        e3.creation_date AS e3_date
    )
);
                     QUERY PLAN
----------------------------------------------------
 Hash Join
   Hash Cond: (knows_1.dst_id = knows_2.src_id)
   ->  Hash Join
         Hash Cond: (knows.dst_id = knows_1.src_id)
         ->  Seq Scan on knows
         ->  Hash
               ->  Seq Scan on knows knows_1
   ->  Hash
         ->  Seq Scan on knows knows_2
(9 rows)

As a result, all user nodes are removed, and this query is reduced
from a 7-table join down to a 3-table join.

Please see the draft commit message in the attached patch for the
implementation details.  Any thoughts and feedback are very welcome.

- Richard

Attachment: v1-0001-Remove-inner-joins-based-on-foreign-keys.patch
Description: Binary data

Reply via email to