github-actions[bot] commented on PR #61741: URL: https://github.com/apache/doris/pull/61741#issuecomment-4154004550
## Code Review Summary **PR: [fix](fe) Fix Ranger column-level privilege bypass when CTE combined with JOIN** ### Overall Assessment This is a well-targeted security fix for a privilege bypass vulnerability (Issue #61631). The approach is sound: when a CTE is materialized (not inlined due to `inlineCTEReferencedThreshold`), `LogicalCTEConsumer` is a leaf node, so `CheckPrivileges` (which inherits `ColumnPruning`'s traversal) never reaches the underlying tables in the CTE producer subtree. The fix explicitly traverses the producer plan when visiting a CTE consumer. ### Critical Checkpoint Conclusions **1. Goal and correctness:** The goal is to prevent column-level privilege bypass when CTEs are used in JOIN queries (causing materialization). The fix correctly stores the CTE producer plan in `StatementContext` during `RewriteCteChildren.visitLogicalCTEAnchor` and retrieves it in `CheckPrivileges.visitLogicalCTEConsumer` to traverse for privilege checks. Tests prove the fix works for both positive (allowed) and negative (denied) cases. **2. Modification minimality:** The change is focused and minimal: 4 files, 68 additions. The approach reuses existing infrastructure (`StatementContext`, `PruneContext`, visitor pattern) without introducing new abstractions. **3. Concurrency:** No concurrency concerns. `CheckPrivileges` is instantiated fresh per use (`CheckPrivileges::new` in `Rewriter.java:505`). The `checkedCteIds` instance field is thread-local to each instance. `StatementContext` is per-statement and not shared across threads during the rewrite phase. **4. Lifecycle management:** The `cteProducerPlansForPrivCheck` map on `StatementContext` stores immutable plan references (Nereids plans are immutable trees). No lifecycle issues or circular references. The map lives as long as the `StatementContext` (per-statement scope). **5. Execution ordering:** Verified correct. `RewriteCteChildren.visitLogicalCTEAnchor` stores the producer plan at lines 83-84 **before** recursing into the consumer side. `CheckPrivileges` runs as part of `CTE_CHILDREN_REWRITE_JOBS_BEFORE_SUB_PATH_PUSH_DOWN` inside the consumer-side rewriting. The producer plan is guaranteed to be in the map when `visitLogicalCTEConsumer` looks it up. **6. Pre-rewrite producer plan correctness:** The stored producer plan is `cteAnchor.child(0)` before the producer undergoes further rewriting. This is correct because: (a) Nereids plans are immutable, so the reference remains valid, and (b) the pre-rewrite plan contains all `LogicalCatalogRelation` nodes (the post-rewrite version can only restructure or eliminate relations, never add new ones). Checking all referenced tables is the correct security posture. **7. `producerPlan.getOutputExprIdBitSet()` as requiredSlotsIds:** Passing all output columns as "required" is correct. The CTE definition explicitly references those columns, and the user needs privilege for every column they reference, regardless of downstream usage. This is consistent with how inlined CTEs behave (column pruning propagates all CTE-body columns to the scan). **8. Edge cases verified:** - **Nested CTEs**: The recursive `accept` call naturally discovers nested `LogicalCTEConsumer` nodes. Anchor ordering ensures outer CTEs' producers are stored first. - **Partial inlining**: Inlined CTEs have tables directly in the plan tree (normal check). Non-inlined CTEs use the `cteProducerPlansForPrivCheck` map. - **Recursive CTEs**: `LogicalWorkTableReference` (self-reference) is not a `LogicalCatalogRelation`, so it's safely skipped. - **`checkedCteIds` guard**: Correctly prevents redundant checking when multiple consumers share the same producer. **9. Functionally parallel code paths:** No other code paths need this fix. The `privChecked` flag on `StatementContext` correctly prevents double-checking in the view permission passthrough scenario. **10. Configuration items:** No new configuration items added. None needed. **11. Incompatible changes:** None. No storage format changes, no function symbol changes. **12. Test coverage:** Tests cover 4 scenarios: CTE+LEFT JOIN with full privilege (pass), CTE+LEFT JOIN with restricted column (denied), CTE+LEFT JOIN with only allowed columns (pass), CTE+INNER JOIN with restricted column (denied). Tests reuse the existing test infrastructure and privilege mock framework. **13. Observability:** Not applicable for this change (privilege denial throws `AnalysisException` with descriptive messages from the existing `checkColsPriv` implementation). **14. Performance:** Negligible impact. The producer plan traversal happens once per CTE (guarded by `checkedCteIds`), and only for materialized (non-inlined) CTEs. ### Minor Observations (Non-blocking) - **Dead code in `ColumnPruning`**: The base class `ColumnPruning` has `private` methods `computeUsedColumns` (line 556) and `checkColumnPrivileges` (line 572) that appear to be dead code -- they are never called from within `ColumnPruning` itself. `CheckPrivileges` has its own copies of these methods. This is a pre-existing issue, not introduced by this PR, and could be cleaned up separately. ### Verdict **No issues found.** The fix is correct, minimal, well-tested, and handles edge cases properly. The security vulnerability is properly addressed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
