Jibing-Li commented on code in PR #26435:
URL: https://github.com/apache/doris/pull/26435#discussion_r1390572530
##########
fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java:
##########
@@ -194,22 +226,66 @@ protected Pair<List<Long>, Long> calcActualSampleTablets() {
long tabletId = ids.get(seekTid);
sampleTabletIds.add(tabletId);
actualSampledRowCount += baseIndex.getTablet(tabletId).getRowCount(true);
+ if (actualSampledRowCount >= sampleRows && !forPartitionColumn) {
+ enough = true;
+ break;
+ }
}
-
totalRows += p.getBaseIndex().getRowCount();
totalTablet += ids.size();
+ if (enough) {
+ break;
+ }
}
// all hit, direct full
if (totalRows < sampleRows) {
// can't fill full sample rows
sampleTabletIds.clear();
- } else if (sampleTabletIds.size() == totalTablet) {
- // TODO add limit
+ } else if (sampleTabletIds.size() == totalTablet && !enough) {
sampleTabletIds.clear();
- } else if (!sampleTabletIds.isEmpty()) {
- // TODO add limit
}
return Pair.of(sampleTabletIds, actualSampledRowCount);
}
+
+ /**
+ * For ordinary column (neither key column nor partition column), need to limit sample size to user specified value.
+ * @return Return true when need to limit.
+ */
+ protected boolean needLimit() {
+ // Key column is sorted, use limit will cause the ndv not accurate enough, so skip key columns.
+ if (col.isKey()) {
Review Comment:
In this non-sorted case, col.isKey() will always return false, so there is no
need to handle this case separately.
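
For readers following the hunk above, the early-exit sampling loop can be sketched in isolation. This is a minimal standalone sketch, not the actual OlapAnalysisTask code: the class `SampleSketch`, the method `pickTablets`, the `Result` holder, and the `long[][]` row-count input are all hypothetical stand-ins, and the sketch walks every tablet in order rather than seeking as the real method does. An empty id list is read here as "scan everything", following the "all hit, direct full" comment in the diff.

```java
import java.util.ArrayList;
import java.util.List;

public class SampleSketch {
    /** Chosen tablet ids plus the number of rows they cover (hypothetical holder). */
    static final class Result {
        final List<Long> tabletIds = new ArrayList<>();
        long sampledRows = 0;
    }

    /**
     * rowsPerTablet[p][t] is the row count of tablet t in partition p.
     * Tablet ids are synthesized as a running counter; this is an assumption
     * of the sketch, not how Doris assigns ids.
     */
    static Result pickTablets(long[][] rowsPerTablet, long sampleRows, boolean forPartitionColumn) {
        Result result = new Result();
        long totalRows = 0;
        long totalTablets = 0;
        long nextId = 0;
        boolean enough = false;
        for (long[] partition : rowsPerTablet) {
            for (int t = 0; t < partition.length; t++) {
                result.tabletIds.add(nextId + t);
                result.sampledRows += partition[t];
                // Stop as soon as the requested sample size is reached,
                // except for partition columns, which keep visiting partitions.
                if (result.sampledRows >= sampleRows && !forPartitionColumn) {
                    enough = true;
                    break;
                }
            }
            nextId += partition.length;
            for (long rows : partition) {
                totalRows += rows;
            }
            totalTablets += partition.length;
            if (enough) {
                break;
            }
        }
        if (totalRows < sampleRows) {
            // Table is smaller than the requested sample: fall back to a full scan.
            result.tabletIds.clear();
        } else if (result.tabletIds.size() == totalTablets && !enough) {
            // Every tablet was picked without an early stop: also a full scan.
            result.tabletIds.clear();
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] table = {{100, 100}, {100, 100}};
        Result r = pickTablets(table, 250, false);
        System.out.println(r.tabletIds + " rows=" + r.sampledRows);
    }
}
```

With four tablets of 100 rows each and a target of 250 rows, the loop stops after the third tablet. The `!enough` guard in the second branch matters when the early stop lands on the last tablet visited: without it, a sample that exactly covers the visited tablets would be discarded as if it were a full scan.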