mcvsubbu commented on a change in pull request #6886: URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r628637817
########## File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java ########## @@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() { DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable(); mainThreadTimer.stop(); - long totalThreadCpuTimeNs = - intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs(); + long endWallClockTimeNs = System.nanoTime(); + + long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs(); + long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs(); + long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs; + long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs; + /* + * It's possible that totalThreadCpuTimeNs < totalWallClockTimeNs even if server launch multiple threads to process + * a query, this is because system activities time such as OS paging, GC, context switching are not captured by + * totalThreadCpuTimeNs. For example, let's divide query processing into 4 phases. + * - phase 1: single thread preparing. Time used: T1 + * - phase 2: N threads processing segments in parallel, each thread use time T2 + * - phase 3: GC/OS paging. Time used: T3 + * - phase 4: single thread merging intermediate results blocks. Time used: T4 + * + * Then we have following equations: + * - singleThreadCpuTimeNs = T1 + T4 + * - multipleThreadCpuTimeNs = T2 * N + * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3 + * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N + * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1) + * + * Thus: + * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs + * = totalThreadCpuTimeNsWithoutSystemActivities + T3 + * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1) + * = totalWallClockTimeNs + T2 * (N - 1) + * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1) + */ + if (totalThreadCpuTimeNs < totalWallClockTimeNs) { Review comment: The only (unlikely) case that we should be careful about is if total turns out to be negative for some reason. That means the wall clock was probably moving a lot slower than the measured thread times. This indicates some fault as far as I can see. I don't think it can happen in a properly functioning system. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org