[GitHub] [pinot] mqliang commented on a change in pull request #8433: Adjust comparison of numDocsScanned

GitBox Tue, 29 Mar 2022 17:16:53 -0700


mqliang commented on a change in pull request #8433:
URL: https://github.com/apache/pinot/pull/8433#discussion_r838004763




##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/SqlResultComparator.java
##########
@@ -381,22 +384,22 @@ private static boolean 
areNumEntriesScannedInFilterEqual(JsonNode actual, JsonNo
     return true;
   }
 
-  private static boolean areNumEntriesScannedPostFilterEqual(JsonNode actual, 
JsonNode expected) {
+  private static boolean isNumEntriesScannedPostFilterBetter(JsonNode actual, 
JsonNode expected) {
     long actualNumEntriesScannedPostFilter = 
actual.get(FIELD_NUM_ENTRIES_SCANNED_POST_FILTER).asLong();
     long expectedNumEntriesScannedPostFilter = 
expected.get(FIELD_NUM_ENTRIES_SCANNED_POST_FILTER).asLong();
-    if (actualNumEntriesScannedPostFilter != 
expectedNumEntriesScannedPostFilter) {
+    if (actualNumEntriesScannedPostFilter > 
expectedNumEntriesScannedPostFilter) {
       LOGGER.error("The numEntriesScannedPostFilter don't match! Actual: {}, 
Expected: {}",

Review comment:
       The numEntriesScannedPostFilter is worse

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/SqlResultComparator.java
##########
@@ -126,29 +126,32 @@ public static boolean areEqual(JsonNode actual, JsonNode 
expected, String query)
      */
     if (expected.has(FIELD_IS_SUPERSET) && 
expected.get(FIELD_IS_SUPERSET).asBoolean(false)) {
       return areElementsSubset(actualElementsSerialized, 
expectedElementsSerialized);
-    } else {
-      if (!areLengthsEqual(actual, expected)) {
-        return false;
-      }
-      /*
-       * Pinot server do some early termination optimization (process in 
parallel and early return if get enough
-       * documents to fulfill the LIMIT and OFFSET requirement) for queries:
-       * - selection without order by
-       * - selection with order by (by sorting the segments on min-max value)
-       * - DISTINCT queries.
-       * numDocsScanned is non-deterministic for those queries so 
numDocsScanned comparison should be skipped.
-       *
-       * NOTE: DISTINCT queries are modeled as non-selection queries during 
query processing, but all DISTINCT
-       * queries are selection queries during Calcite parsing (DISTINCT 
queries are selection queries with
-       * selectNode.getModifierNode(SqlSelectKeyword.DISTINCT) != null).
-       */
-      if (!isSelectionQuery(query) && !areNumDocsScannedEqual(actual, 
expected)) {
-        return false;
-      }
-      return isOrderByQuery(query) ? areOrderByQueryElementsEqual(actualRows, 
expectedRows, actualElementsSerialized,
-          expectedElementsSerialized, query)
-          : areNonOrderByQueryElementsEqual(actualElementsSerialized, 
expectedElementsSerialized);
     }
+    if (!areLengthsEqual(actual, expected)) {
+      return false;
+    }
+    boolean areResultsEqual = isOrderByQuery(query) ? 
areOrderByQueryElementsEqual(actualRows, expectedRows,
+        actualElementsSerialized, expectedElementsSerialized, query)
+        : areNonOrderByQueryElementsEqual(actualElementsSerialized, 
expectedElementsSerialized);
+    /*
+     * Pinot servers implement early termination optimization (process in 
parallel and early return if we get enough
+     * documents to fulfill the LIMIT and OFFSET requirement) for queries for 
the following cases:
+     * - selection without order by
+     * - selection with order by (by sorting the segments on min-max value)
+     * - DISTINCT queries.
+     * numDocsScanned is non-deterministic for those queries so numDocsScanned 
comparison should be skipped.
+     *
+     * In other cases, we accept if numDocsScanned is better than the expected 
one, since that is indicative of
+     * performance improvement.
+     *
+     * NOTE: DISTINCT queries are modeled as non-selection queries during 
query processing, but all DISTINCT
+     * queries are selection queries during Calcite parsing (DISTINCT queries 
are selection queries with
+     * selectNode.getModifierNode(SqlSelectKeyword.DISTINCT) != null).
+     */
+    if (areResultsEqual && !isSelectionQuery(query) && 
!isNumDocsScannedBetter(actual, expected)) {
+      return false;
+    }
+    return areResultsEqual;

Review comment:
       ```
   if ( !isSelectionQuery(query) && !isNumDocsScannedBetter(actual, expected)) {
         return false;
   }
   return areResultsEqual;
   ```

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/SqlResultComparator.java
##########
@@ -369,10 +372,10 @@ private static boolean areNumServersQueriedEqual(JsonNode 
actual, JsonNode expec
     return true;
   }
 
-  private static boolean areNumEntriesScannedInFilterEqual(JsonNode actual, 
JsonNode expected) {
+  private static boolean isNumEntriesScannedInFilterBetter(JsonNode actual, 
JsonNode expected) {
     long actualNumEntriesScannedInFilter = 
actual.get(FIELD_NUM_ENTRIES_SCANNED_IN_FILTER).asLong();
     long expectedNumEntriesScannedInFilter = 
expected.get(FIELD_NUM_ENTRIES_SCANNED_IN_FILTER).asLong();
-    if (actualNumEntriesScannedInFilter != expectedNumEntriesScannedInFilter) {
+    if (actualNumEntriesScannedInFilter > expectedNumEntriesScannedInFilter) {
       LOGGER
           .error("The numEntriesScannedInFilter don't match! Actual: {}, 
Expected: {}", actualNumEntriesScannedInFilter,

Review comment:
       The numEntriesScannedInFilter is worse




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] mqliang commented on a change in pull request #8433: Adjust comparison of numDocsScanned

Reply via email to