I've built Spark 3.0.0-preview2 with the -Phadoop-3.2 profile and deployed it on Kubernetes.
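For reference, my launch looks roughly like this (I'm using spark-shell here for brevity; the master URL, account, and container names below are placeholders, not my real ones, and I've omitted the credentials configuration):

```shell
# K8S_MASTER, ACCOUNT and CONTAINER are placeholder names.
bin/spark-shell \
  --master k8s://https://K8S_MASTER:6443 \
  --packages org.apache.hadoop:hadoop-azure:3.2.0,org.apache.hadoop:hadoop-azure-datalake:3.2.0

# Inside the shell, a Blob-endpoint URL works:
#   spark.read.parquet("wasb://CONTAINER@ACCOUNT.blob.core.windows.net/path/file.parquet")
# but the Data Lake Gen2 endpoint fails with the REST-version error:
#   spark.read.parquet("abfs://CONTAINER@ACCOUNT.dfs.core.windows.net/path/file.parquet")
```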
I launch Spark with a switch to pull in the relevant Hadoop/Azure dependencies, --packages org.apache.hadoop:hadoop-azure:3.2.0,org.apache.hadoop:hadoop-azure-datalake:3.2.0, and can see that com.microsoft.azure#azure-storage;7.0.0 is indeed pulled in. I can see files using a blob.core.windows.net URL, but a dfs.core.windows.net URL throws an exception saying "The specified Rest Version is Unsupported". Using tcpdump, I can see that my client is sending x-ms-version: 2017-07-29 in its HTTP headers.

If I upgrade to azure-storage:8.6.0, the HTTP headers show x-ms-version: 2019-02-02 and the job gets slightly further, but reading the Parquet file now fails with "Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual UNSPECIFIED". This is not overly surprising, as I am shoe-horning in a binary that Hadoop was unprepared for; I did this only to demonstrate that this version of the library seems able to talk to Azure, since its REST version is more recent.

Does anybody have any ideas on how I can talk to Azure? [Note: for various non-technical reasons, I cannot use HDInsight or Databricks.]

Kind regards,

Phillip
