richardstartin commented on PR #10043: URL: https://github.com/apache/pinot/pull/10043#issuecomment-1371979974
The compatibility verifier is failing because `numEntriesScannedInFilter` increases, even though `timeUsedMs` decreases by a large factor in each case. For example, `numEntriesScannedInFilter` increased from 12 to 69, while `timeUsedMs` decreased from 20ms to 3ms.

```
2023/01/05 04:11:35.051 ERROR [QueryOp] [main] Comparison FAILED: Line: 23, query: 'SELECT longDimSV1, doubleDimSV1 from FeatureTest3 WHERE doubleDimSV1 > 99 AND generationNumber = 1 LIMIT 1000', actual response: {"resultTable":{"dataSchema":{"columnNames":["longDimSV1","doubleDimSV1"],"columnDataTypes":["LONG","DOUBLE"]},"rows":[[270,99.62],[268,99.08],[183,99.11],[286,99.1]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":60,"numSegmentsProcessed":4,"numSegmentsMatched":3,"numConsumingSegmentsQueried":3,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"numDocsScanned":4,"numEntriesScannedInFilter":69,"numEntriesScannedPostFilter":8,"numGroupsLimitReached":false,"totalDocs":1200,"timeUsedMs":3,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":1672891889033,"explainPlanNumEmptyFilterSegments":0,"numSegmentsPrunedByBroker":0,"numRowsResultSet":4,"numSegmentsPrunedByLimit":0,"numSegmentsPrunedByValue":54,"explainPlanNumMatchAllFilterSegments":0,"numSegmentsPrunedByServer":56,"numSegmentsPrunedInvalid":0}, expected response:
{"resultTable":{"dataSchema":{"columnNames":["longDimSV1","doubleDimSV1"],"columnDataTypes":["LONG","DOUBLE"]},"rows":[[286,99.1],[183,99.11],[270,99.62],[268,99.08]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":17,"numSegmentsProcessed":3,"numSegmentsMatched":3,"numConsumingSegmentsQueried":3,"numDocsScanned":4,"numEntriesScannedInFilter":12,"numEntriesScannedPostFilter":8,"numGroupsLimitReached":false,"totalDocs":300,"timeUsedMs":20,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":1641415527398,"numRowsResultSet":4}
```

I think this is because filter bitmaps are pushed down to `SVScanDocIdIterator.applyAnd`, so only the already-filtered doc ids count towards the "cost" of the scan. They are not pushed down to the range filter, so even though the range filter is faster than scanning, it has to consider more rows before filtering them out, and it reports a higher `numEntriesScannedInFilter`.
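To make the accounting difference concrete, here is a toy sketch of the two counting strategies. It is not Pinot's code: `FilterCostSketch`, `scanWithPushdown`, and `rangeWithoutPushdown` are hypothetical names, `java.util.BitSet` stands in for the filter bitmap, and the linear loop in the second method only models "considers every row"; a real range index does not evaluate rows one by one, which is why it can be faster despite the larger counter.

```java
import java.util.BitSet;

public class FilterCostSketch {
    /** Matching doc ids plus the number of entries the filter reported as examined. */
    public static final class FilterResult {
        public final BitSet matches;
        public final long entriesScanned;

        FilterResult(BitSet matches, long entriesScanned) {
            this.matches = matches;
            this.entriesScanned = entriesScanned;
        }
    }

    /** Scan with the upstream bitmap pushed down: only candidate docs are read and counted. */
    public static FilterResult scanWithPushdown(double[] values, BitSet candidates, double lowerBound) {
        BitSet matches = new BitSet(values.length);
        long scanned = 0;
        for (int docId = candidates.nextSetBit(0); docId >= 0; docId = candidates.nextSetBit(docId + 1)) {
            scanned++;
            if (values[docId] > lowerBound) {
                matches.set(docId);
            }
        }
        return new FilterResult(matches, scanned);
    }

    /** Range filter without pushdown: every doc is considered, then intersected with the bitmap. */
    public static FilterResult rangeWithoutPushdown(double[] values, BitSet candidates, double lowerBound) {
        BitSet matches = new BitSet(values.length);
        long scanned = 0;
        for (int docId = 0; docId < values.length; docId++) {
            scanned++; // counted even if the doc is later removed by the AND
            if (values[docId] > lowerBound) {
                matches.set(docId);
            }
        }
        matches.and(candidates); // the other predicate is applied only after the fact
        return new FilterResult(matches, scanned);
    }
}
```

Both methods return the same matching doc ids, but the second reports a larger `entriesScanned` whenever the candidate bitmap is smaller than the segment. That is the shape of the 12 versus 69 discrepancy described above, under the stated assumptions.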