[GitHub] [incubator-doris] yinzhijian commented on a diff in pull request #9433: [feature-wip](parquet-vec) Support parquet scanner in vectorized engine

GitBox Thu, 12 May 2022 02:14:57 -0700


yinzhijian commented on code in PR #9433:
URL: https://github.com/apache/incubator-doris/pull/9433#discussion_r871144364



##########
fe/fe-core/src/main/java/org/apache/doris/load/Load.java:
##########
@@ -1044,30 +1047,61 @@ private static void initColumns(Table tbl, List<ImportColumnDesc> columnExprs,
         if (!needInitSlotAndAnalyzeExprs) {
             return;
         }
-
+        Set<String> exprArgsColumns = Sets.newTreeSet(String.CASE_INSENSITIVE_ORDER);
+        for (ImportColumnDesc importColumnDesc : copiedColumnExprs) {
+            if (importColumnDesc.isColumn()) {
+                continue;
+            }
+            List<SlotRef> slots = Lists.newArrayList();
+            importColumnDesc.getExpr().collect(SlotRef.class, slots);
+            for (SlotRef slot : slots) {
+                String slotColumnName = slot.getColumnName();
+                exprArgsColumns.add(slotColumnName);
+            }
+        }
+        Set<String> excludedColumns = Sets.newTreeSet(String.CASE_INSENSITIVE_ORDER);
         // init slot desc add expr map, also transform hadoop functions
         for (ImportColumnDesc importColumnDesc : copiedColumnExprs) {
             // make column name case match with real column name
             String columnName = importColumnDesc.getColumnName();
             String realColName;
-            if (tbl.getColumn(columnName) == null || importColumnDesc.getExpr() == null) {
+            if (tblColumn == null || importColumnDesc.getExpr() == null) {
                 realColName = columnName;
             } else {
-                realColName = tbl.getColumn(columnName).getName();
+                realColName = tblColumn.getName();
             }
             if (importColumnDesc.getExpr() != null) {
                 Expr expr = transformHadoopFunctionExpr(tbl, realColName, importColumnDesc.getExpr());
                 exprsByName.put(realColName, expr);
             } else {
                 SlotDescriptor slotDesc = analyzer.getDescTbl().addSlotDescriptor(srcTupleDesc);
-                slotDesc.setType(ScalarType.createType(PrimitiveType.VARCHAR));
+                // only support parquet format now
+                if (useVectorizedLoad  && formatType == TFileFormatType.FORMAT_PARQUET
+                        && tblColumn != null) {
+                    // in vectorized load
+                    if (exprArgsColumns.contains(columnName)) {

Review Comment:
   A stays VARCHAR. Suppose A is DATETIME in the schema but is used as an INT
   in the expression: it is then unclear whether to use the schema type or the
   type inferred from the expression.

   B follows the exprsByName logic. Assuming B is INT in the schema, its
   expression becomes cast((a+1) as int).
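
   The rule described above can be sketched as follows. This is only an
   illustration under stated assumptions, not the code in the PR: the class
   `SlotTypeSketch`, the method `chooseSlotType`, and the use of plain strings
   for types are all hypothetical, and the VARCHAR fallback for columns missing
   from the schema is an assumption, since the quoted hunk is cut off before
   that branch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SlotTypeSketch {
    /**
     * Hypothetical helper: picks the source slot type for a plain load column.
     * A column that is an argument of some expression stays VARCHAR, because
     * the schema type and the type the expression expects may disagree.
     * Columns missing from the schema also fall back to VARCHAR (assumption).
     */
    static String chooseSlotType(String columnName,
                                 Map<String, String> schemaTypes,
                                 Set<String> exprArgsColumns) {
        String schemaType = schemaTypes.get(columnName);
        if (schemaType == null || exprArgsColumns.contains(columnName)) {
            return "VARCHAR";
        }
        return schemaType;
    }

    public static void main(String[] args) {
        // Case-insensitive lookups, mirroring String.CASE_INSENSITIVE_ORDER in the diff.
        Map<String, String> schemaTypes = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        schemaTypes.put("A", "DATETIME");
        schemaTypes.put("B", "INT");
        schemaTypes.put("C", "BIGINT");

        // Load statement assumed for illustration: columns (A, C, B = A + 1).
        Set<String> exprArgsColumns = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        exprArgsColumns.add("a"); // A is referenced by the expression for B

        System.out.println("A -> " + chooseSlotType("A", schemaTypes, exprArgsColumns));
        System.out.println("C -> " + chooseSlotType("C", schemaTypes, exprArgsColumns));
    }
}
```

   In this sketch B gets no source slot of its own: it is an expression column,
   so it goes through exprsByName and is cast to its schema type, as described
   above.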


