Re: [PR] [feature](datalake) Add BucketShuffleJoin support for bucketed hive tables [doris]

via GitHub Thu, 07 Dec 2023 03:21:07 -0800


Nitin-Kashyap commented on code in PR #27784:
URL: https://github.com/apache/doris/pull/27784#discussion_r1418803158



##########
fe/fe-core/src/main/java/org/apache/doris/planner/external/HiveScanNode.java:
##########
@@ -423,4 +430,37 @@ protected TFileCompressType getFileCompressType(FileSplit fileSplit) throws User
         }
         return compressType;
     }
+
+    @Override
+    public DataPartition constructInputPartitionByDistributionInfo() {
+        if (hmsTable.isBucketedTable()) {
+            DistributionInfo distributionInfo = hmsTable.getDefaultDistributionInfo();
+            if (!(distributionInfo instanceof HashDistributionInfo)) {
+                return DataPartition.RANDOM;
+            }
+            List<Column> distributeColumns = ((HiveExternalDistributionInfo) distributionInfo).getDistributionColumns();
+            List<Expr> dataDistributeExprs = Lists.newArrayList();
+            for (Column column : distributeColumns) {
+                SlotRef slotRef = new SlotRef(desc.getRef().getName(), column.getName());
+                dataDistributeExprs.add(slotRef);
+            }
+            return DataPartition.hashPartitioned(dataDistributeExprs, THashType.SPARK_MURMUR32);
+        }
+
+        return DataPartition.RANDOM;
+    }
+
+    public HMSExternalTable getHiveTable() {
+        return hmsTable;
+    }
+
+    @Override
+    public THashType getHashType() {
+        if (hmsTable.isBucketedTable()
+                && hmsTable.getDefaultDistributionInfo() instanceof HashDistributionInfo) {

Review Comment:
   hmsTable.isBucketedTable() shall return true only when the table was 
generated by Spark; I shall remove the Spark-specific check once we add Hive 
support as well.
   
   
![image](https://github.com/apache/doris/assets/66766227/4a0b10b2-c528-4447-8fa3-58efd4b18bf2)
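   The gating described in this comment can be pictured with a minimal, 
self-contained sketch. It is not the Doris implementation: `BucketWriter`, 
`HashType`, and both methods are hypothetical stand-ins, and how Doris 
actually detects Spark-written buckets is not shown in this hunk. The only 
point is that `SPARK_MURMUR32` is selected solely when the table counts as 
bucketed (currently: bucketed by Spark) and is hash-distributed.

```java
import java.util.Optional;

public class BucketGateSketch {
    /** Hypothetical: which engine wrote the bucketed files, if any. */
    enum BucketWriter { NONE, SPARK, HIVE }

    /** Hypothetical stand-in for THashType; only the value named in the diff is modeled. */
    enum HashType { SPARK_MURMUR32 }

    /** Assumption taken from the comment: "bucketed" currently means "bucketed by Spark". */
    static boolean isBucketedTable(BucketWriter writer) {
        return writer == BucketWriter.SPARK;
    }

    /**
     * Returns the hash type usable for bucket shuffle, or empty when the
     * scan should fall back to random partitioning.
     */
    static Optional<HashType> hashTypeFor(BucketWriter writer, boolean hashDistributed) {
        if (isBucketedTable(writer) && hashDistributed) {
            return Optional.of(HashType.SPARK_MURMUR32);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A Spark-bucketed, hash-distributed table qualifies.
        System.out.println(hashTypeFor(BucketWriter.SPARK, true));
        // A Hive-bucketed table does not qualify until Hive support is added.
        System.out.println(hashTypeFor(BucketWriter.HIVE, true));
    }
}
```

   Under this sketch, adding Hive support would mean relaxing 
`isBucketedTable` and returning a Hive-compatible hash type for Hive-written 
tables, rather than reusing the Spark one.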
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

