gortiz commented on code in PR #13733:
URL: https://github.com/apache/pinot/pull/13733#discussion_r1752031714

##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java:
##########
@@ -256,4 +262,66 @@ private Map<String, String> consolidateMetadata(Map<String, String> customProper
   public void cancel(long requestId) {
     _opChainScheduler.cancel(requestId);
   }
+
+  public StagePlan explainQuery(
+      WorkerMetadata workerMetadata, StagePlan stagePlan, Map<String, String> requestMetadata) {
+
+    if (!workerMetadata.isLeafStageWorker()) {
+      LOGGER.debug("Explain query on intermediate stages is a NOOP");
+      return stagePlan;
+    }
+    long requestId = Long.parseLong(requestMetadata.get(CommonConstants.Query.Request.MetadataKeys.REQUEST_ID));
+    long timeoutMs = Long.parseLong(requestMetadata.get(CommonConstants.Broker.Request.QueryOptionKey.TIMEOUT_MS));
+    long deadlineMs = System.currentTimeMillis() + timeoutMs;
+
+    StageMetadata stageMetadata = stagePlan.getStageMetadata();
+    Map<String, String> opChainMetadata = consolidateMetadata(stageMetadata.getCustomProperties(), requestMetadata);
+
+    if (PipelineBreakerExecutor.hasPipelineBreakers(stagePlan)) {
+      // TODO: Support pipeline breakers before merging this feature.
+      LOGGER.error("Pipeline breaker is not supported in explain query");
+      return stagePlan;
+    }

Review Comment:
The main problem is that, to get the exact physical plan in this case, we would need to actually execute the pipeline breaker part. For example, in a query like:

```sql
select whatever from table1 where col1 in (select something from table2 where col2 = cte)
```

the actual physical plan depends on the result of `(select something from table2 where col2 = cte)`. Assuming that subquery evaluates to [100, 200, 300], the query would be:

```sql
select whatever from table1 where col1 in (100, 200, 300)
```

which could, for example, use an inverted index.

I considered generating a random set of values of the `col1` type, but that could produce incorrect plans.
For example, imagine we randomly generate the set of values `1, 2`. The query we would use for the explain would be:

```sql
select whatever from table1 where col1 in (1, 2)
```

Now imagine that the values of `col1` range from 100 to 200. That query would be explained as `ALL_SEGMENTS_PRUNED_ON_SERVER`, when the actual plan would probably be different.
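
To make the pruning concern concrete, here is a minimal sketch of value-range pruning for an `IN` predicate. It is not Pinot's actual pruner: `MinMaxPruningSketch`, `SegmentRange`, `canPrune` and `allPruned` are hypothetical names, and the sketch assumes the server only consults per-segment min/max metadata for `col1`.

```java
import java.util.List;

// Hypothetical sketch, not Pinot's actual pruner: decides whether segments can be
// skipped for `col1 IN (...)` using only assumed min/max metadata for col1.
final class MinMaxPruningSketch {

  // Assumed stand-in for the per-segment min/max metadata of col1.
  static final class SegmentRange {
    final long min;
    final long max;

    SegmentRange(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  // A segment can be pruned when no IN value falls inside [min, max].
  static boolean canPrune(SegmentRange segment, List<Long> inValues) {
    for (long value : inValues) {
      if (value >= segment.min && value <= segment.max) {
        return false;
      }
    }
    return true;
  }

  // True when every segment is pruned, i.e. the case that would be
  // reported as ALL_SEGMENTS_PRUNED_ON_SERVER.
  static boolean allPruned(List<SegmentRange> segments, List<Long> inValues) {
    return segments.stream().allMatch(segment -> canPrune(segment, inValues));
  }

  public static void main(String[] args) {
    // Two segments whose col1 values together cover 100 to 200.
    List<SegmentRange> segments =
        List.of(new SegmentRange(100, 150), new SegmentRange(151, 200));

    // Randomly generated placeholder values: everything is pruned.
    System.out.println(allPruned(segments, List.of(1L, 2L))); // prints true

    // Values the subquery would really return: segments must be scanned.
    System.out.println(allPruned(segments, List.of(100L, 200L, 300L))); // prints false
  }
}
```

With the placeholder values the explain output would describe a fully pruned query, while the real subquery result keeps the segments alive and leads to a different plan.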