[GitHub] [doris] yiguolei commented on a diff in pull request #10420: [improvement]Do not lazily read dict encoded columns

GitBox Sat, 25 Jun 2022 01:42:17 -0700


yiguolei commented on code in PR #10420:
URL: https://github.com/apache/doris/pull/10420#discussion_r906655600



##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -704,61 +705,48 @@ void SegmentIterator::_vec_init_lazy_materialization() {
     //   When output block to query layer, delete column can be skipped.
     //  _schema.column_ids() stands for storage layer block schema, so it 
contains delete columnid
     //  we just regard delete column as common pred column here.
-    if (_schema.column_ids().size() > pred_column_ids.size()) {
-        for (auto cid : _schema.column_ids()) {
-            if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
-                FieldType type = _schema.column(cid)->type();
-
-                // todo(wb) maybe we can make read char type faster
-                // todo(wb) support map/array type
-                // todo(wb) consider multiple integer columns cost, such as 
1000 columns, maybe lazy materialization faster
-                if (!_lazy_materialization_read &&
-                    (_is_need_vec_eval ||
-                     _is_need_short_eval) && // only when pred exists, we need 
to consider lazy materialization
-                    (type == OLAP_FIELD_TYPE_HLL || type == 
OLAP_FIELD_TYPE_OBJECT ||
-                     type == OLAP_FIELD_TYPE_VARCHAR || type == 
OLAP_FIELD_TYPE_CHAR ||
-                     type == OLAP_FIELD_TYPE_STRING || type == 
OLAP_FIELD_TYPE_BOOL ||
-                     type == OLAP_FIELD_TYPE_DATE || type == 
OLAP_FIELD_TYPE_DATETIME ||
-                     type == OLAP_FIELD_TYPE_DECIMAL)) {
-                    _lazy_materialization_read = true;
+    for (size_t i = 0; i < _schema.num_column_ids(); ++i) {
+        auto cid = _schema.column_id(i);
+        FieldType type = _schema.column(cid)->type();
+        if (!_is_pred_column[cid]) {
+            _non_predicate_columns.emplace_back(cid);
+            switch (type) {
+            case OLAP_FIELD_TYPE_VARCHAR:
+            case OLAP_FIELD_TYPE_CHAR:
+            case OLAP_FIELD_TYPE_STRING: {
+                // if a string column is all dict encoding in one segment, 
it's almost same as
+                // an int32_t column, it can be read together with predicate 
columns.
+                if (config::enable_low_cardinality_optimize &&
+                    _column_iterators[cid]->is_all_dict_encoding()) {
+                    pred_column_ids.insert(cid);

Review Comment:
   A litter confusing, we should distinguish first read columns and predicate 
columns. Here, the string column (all dict encoding) is not a predicate column 
but you add it to pred_column_ids. The code is very confusing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

[GitHub] [doris] yiguolei commented on a diff in pull request #10420: [improvement]Do not lazily read dict encoded columns

Reply via email to