GitHub user my-ship-it closed a discussion: Enable answer query using Materialized View for external table.
### Description Currently, for inner table, planner knows to use MV to speed up query, for example: ``` create table tbl_heap(c1 int, c2 int); insert into tbl_heap select generate_series(1,1000000); create materialized view mv as select c1, c2 from tbl_heap where c1 > 10 and c2 < 20; analyze mv; set optimizer = off; set enable_answer_query_using_materialized_views = on; postgres=# explain select c1, c2 from tbl_heap where c1 > 10 and c2 < 20; QUERY PLAN ---------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1.03 rows=1 width=8) -> Seq Scan on mv (cost=0.00..1.01 rows=1 width=8) Optimizer: Postgres query optimizer (3 rows) ``` But for external table, it doesn't work ``` create external table tbl_ext(c1 int, c2 int) location('gpfdist://localhost:8080/tbl_ext') format 'csv'; create materialized view mv_ext as select c1, c2 from tbl_ext where c1 > 10 and c2 < 20; analyze mv_ext; set optimizer = off; set enable_answer_query_using_materialized_views = on; postgres=# explain select c1, c2 from tbl_ext where c1 > 10 and c2 < 20; QUERY PLAN ---------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..246940.11 rows=17320508 width=8) -> Foreign Scan on tbl_ext (cost=0.00..16000.00 rows=5773503 width=8) Filter: ((c1 > 10) AND (c2 < 20)) Optimizer: Postgres query optimizer (4 rows) ``` The reason why the replacement for external doesn't work is that, if external table changes, planner doesn't know that, and replacement of MV would produce wrong results. But for some cases, access to external table(for example, Iceberg, Hudi, and etc) is much slower than inner table(maybe 10X slower or more). If user knows there is no data changes for external table, we could speed up query against external table, which is a big win in this scenario. Another approach is that user load external table into inner table for analysis, but there is latency, and loading process is also time consuming. How about we introduce a GUC to control the behavior? If the user is 100% sure that the external table data has not been modified, they could turn on the GUC and speed up query. ### Use case/motivation _No response_ ### Related issues _No response_ ### Are you willing to submit a PR? - [ ] Yes I am willing to submit a PR! GitHub link: https://github.com/apache/cloudberry/discussions/693 ---- This is an automatically sent email for dev@cloudberry.apache.org. To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cloudberry.apache.org For additional commands, e-mail: dev-h...@cloudberry.apache.org