snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572273541



##########
File path: 
pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider adding encryption for the following configs
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName != null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out, since at least one mode of authentication info is required
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient for the specified file system, creating it if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       I think that we should throw an exception instead of creating the file system in Azure. We can assume that the blob container is created beforehand, because the file system may need to be configured differently depending on the requirements of Pinot users. Also, I don't think that Pinot should manage the life cycle of the deep storage layer; it should simply read/write data based on the configuration.
   
   Auto-managing containers would be handy if we needed to use multiple containers, but in our case we just need a single container with the ADLS Gen2 feature enabled. Also, please double-check the Google Cloud Storage and AWS S3 implementations; we should keep the behavior the same.
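   
   For illustration, here is a minimal fail-fast sketch of what I mean, as a method inside `ADLSGen2PinotFS`. The method name `getClientForExistingFileSystem` is hypothetical, and it assumes the Azure SDK's `DataLakeFileSystemClient.exists()` is available:
   
   ```java
   /**
    * Returns the DataLakeFileSystemClient for the specified file system,
    * throwing if the file system (blob container) does not exist.
    */
   private DataLakeFileSystemClient getClientForExistingFileSystem(DataLakeServiceClient serviceClient,
       String fileSystemName) {
     DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     // Fail fast: the container must be provisioned beforehand, mirroring the
     // GCS/S3 plugins, which do not create buckets on Pinot's behalf.
     if (!fileSystemClient.exists()) {
       throw new IllegalStateException("File system (blob container) '" + fileSystemName
           + "' does not exist; please create it before starting Pinot.");
     }
     return fileSystemClient;
   }
   ```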



