morningman commented on a change in pull request #4163:
URL: https://github.com/apache/incubator-doris/pull/4163#discussion_r460096973



##########
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/SparkResource.java
##########
@@ -144,6 +146,12 @@ public SparkResource getCopiedResource() {
         return new SparkResource(name, Maps.newHashMap(sparkConfigs), workingDir, broker, brokerProperties);
     }
 
+    public SparkRepository getRemoteRepository() {
+        String remoteRepositoryPath = workingDir + "/" + Config.cluster_id + "/" + SparkRepository.REPOSITORY_DIR;

Review comment:
       Use `Catalog.getClusterId()` instead of `Config.cluster_id`.
   
  And how about adding the `SparkResource`'s name to the path? For example:
   
   `/path/to/work/dir/cluster_id/__spark_repository__my_resource_name/`
   
  No need to add an additional directory level.
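  
  A minimal, self-contained sketch of the path layout this suggests; `RepositoryPathSketch`, `remoteRepositoryPath(...)`, and the sample values are hypothetical stand-ins, not code from this PR:
  
  ```java
  // Hypothetical sketch of folding the resource name into the repository
  // path, per the comment above. Names here are stand-ins, not the PR's
  // actual implementation.
  public class RepositoryPathSketch {
      // assumed prefix, mirroring "__spark_repository__" in the example path
      private static final String REPOSITORY_PREFIX = "__spark_repository__";
  
      static String remoteRepositoryPath(String workingDir, int clusterId, String resourceName) {
          return workingDir + "/" + clusterId + "/" + REPOSITORY_PREFIX + resourceName;
      }
  
      public static void main(String[] args) {
          // prints: /path/to/work/dir/123456/__spark_repository__my_resource_name
          System.out.println(remoteRepositoryPath("/path/to/work/dir", 123456, "my_resource_name"));
      }
  }
  ```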
   

##########
File path: fe/fe-core/src/main/java/org/apache/doris/load/loadv2/SparkEtlJobHandler.java
##########
@@ -70,15 +72,16 @@
 public class SparkEtlJobHandler {
     private static final Logger LOG = LogManager.getLogger(SparkEtlJobHandler.class);
 
-    private static final String APP_RESOURCE_NAME = "palo-fe.jar";
     private static final String CONFIG_FILE_NAME = "jobconfig.json";
-    private static final String APP_RESOURCE_LOCAL_PATH = PaloFe.DORIS_HOME_DIR + "/lib/" + APP_RESOURCE_NAME;
     private static final String JOB_CONFIG_DIR = "configs";
     private static final String ETL_JOB_NAME = "doris__%s";
     // 5min
     private static final int GET_APPID_MAX_RETRY_TIMES = 300;
     private static final int GET_APPID_SLEEP_MS = 1000;
 
+    // cluster_id -> lock
+    private static final ConcurrentMap<Integer, Object> DPP_LOCK_MAP = Maps.newConcurrentMap();

Review comment:
      Use the `SparkResource`'s name as the key, not the cluster id; the cluster id is the same for every resource.
  
  And I think this lock map is unnecessary. You can just call `resource.prepareDppArchive()`.
  
  And `prepareDppArchive()` is protected by `synchronized`, so it already serializes concurrent callers.
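  
  A self-contained sketch of the locking pattern described here, using a hypothetical stand-in type (`ResourceSketch` is not the PR's `SparkResource`): a `synchronized` method on the resource instance serializes concurrent preparation, which is what makes the static lock map redundant:
  
  ```java
  // Hypothetical stand-in for SparkResource, illustrating why a
  // ConcurrentMap of locks is unnecessary: the instance itself is
  // the lock, via a synchronized method.
  class ResourceSketch {
      private Object archive; // stands in for SparkRepository.SparkArchive
  
      // Concurrent jobs sharing one resource serialize on the instance
      // itself; jobs on different resources never contend.
      public synchronized Object prepareDppArchive() {
          if (archive == null) {
              archive = new Object(); // upload/verify the archive once
          }
          return archive;
      }
  }
  ```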
   

##########
File path: fe/fe-core/src/main/java/org/apache/doris/common/Config.java
##########
@@ -509,12 +509,29 @@
     @ConfField(mutable = true, masterOnly = true)
     public static int hadoop_load_default_timeout_second = 86400 * 3; // 3 day
 
+    // Configurations for spark load
+    /**
+     * Default spark dpp version
+     */
+    public static String spark_dpp_version = "1_0_0";

Review comment:
      ```suggestion
      @ConfField
      public static String spark_dpp_version = "1_0_0";
      ```
   
   All configs must have the `@ConfField` annotation.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/load/loadv2/SparkEtlJobHandler.java
##########
@@ -255,6 +295,17 @@ public void killEtlJob(SparkAppHandle handle, String appId, long loadJobId, Spar
         }
     }
 
+    private SparkRepository.SparkArchive prepareDppArchive(SparkResource resource) throws LoadException {
+        // remote archive
+        SparkRepository.SparkArchive archive = null;
+        SparkRepository repository = resource.getRemoteRepository();

Review comment:
      Each `SparkResource` has one and only one `SparkRepository`.
  So why not just call `resource.prepareDppArchive()` and have it return the `SparkRepository.SparkArchive`?
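  
  A self-contained sketch of the simplification suggested here, again with hypothetical stand-in types rather than the PR's real classes: the handler delegates to the resource, which owns its single repository and archive:
  
  ```java
  // Hypothetical stand-ins; ArchiveSketch mirrors SparkRepository.SparkArchive
  // and SparkResourceSketch mirrors SparkResource.
  class ArchiveSketch {}
  
  class SparkResourceSketch {
      private final ArchiveSketch archive = new ArchiveSketch();
  
      // one repository/archive per resource, guarded by the resource's
      // own lock (see the synchronized sketch in the earlier comment)
      public synchronized ArchiveSketch prepareDppArchive() {
          return archive;
      }
  }
  
  class EtlJobHandlerSketch {
      // the handler neither looks up the repository nor takes a lock;
      // it just asks the resource for its prepared archive
      ArchiveSketch prepareDppArchive(SparkResourceSketch resource) {
          return resource.prepareDppArchive();
      }
  }
  ```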




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
