UFMurphy commented on issue #7570: URL: https://github.com/apache/iceberg/issues/7570#issuecomment-2088572698
Hi @dramaticlly,

In your {SPARK_HOME}/conf directory, you should see a file called spark-defaults.conf. At the bottom of that file, add your configuration properties, one per line, in the form: NAME VALUE. Here is a copy of the one I use locally:

spark.master spark://master:7077
spark.sql.caseSensitive true
spark.executor.memory 5g
spark.executor.cores 2
spark.driver.memory 2g
spark.ui.port 4042
spark.executor.heartbeatInterval 10000000
spark.network.timeout 10000000
spark.local.dir /tmp/spark
spark.hadoop.fs.s3a.access.key xxxxxxxxxxxxxx
spark.hadoop.fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3a.block.size 512M
spark.hadoop.fs.s3a.committer.name directory
spark.hadoop.fs.s3a.committer.magic.enabled false
spark.hadoop.fs.s3a.committer.staging.abort.pending.uploads true
spark.hadoop.fs.s3a.committer.staging.conflict-mode append
spark.hadoop.fs.s3a.committer.staging.tmp.path /tmp/staging
spark.hadoop.fs.s3a.committer.staging.unique-filenames true
spark.hadoop.fs.s3a.committer.threads 2048
spark.hadoop.fs.s3a.connection.establish.timeout 5000
spark.hadoop.fs.s3a.connection.maximum 8192
spark.hadoop.fs.s3a.connection.ssl.enabled false
spark.hadoop.fs.s3a.connection.timeout 200000
spark.hadoop.fs.s3a.endpoint http://localminio:9000/
spark.hadoop.fs.s3a.fast.upload true
spark.hadoop.fs.s3a.fast.upload.active.blocks 2048
spark.hadoop.fs.s3a.fast.upload.buffer disk
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.max.total.tasks 2048
spark.hadoop.fs.s3a.multipart.size 512M
spark.hadoop.fs.s3a.multipart.threshold 512M
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.secret.key xxxxxxxxxxxx
spark.hadoop.fs.s3a.socket.recv.buffer 65536
spark.hadoop.fs.s3a.socket.send.buffer 65536
spark.hadoop.fs.s3a.threads.max 2048
spark.worker.cleanup.enabled true
spark.jars.packages org.postgresql:postgresql:42.5.1,org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk-bom:1.12.50

Then restart the cluster and it *should* work.

On Sun, Apr 28, 2024 at 4:10 PM YannickLecroart21 ***@***.***> wrote:

> @dramaticlly <https://github.com/dramaticlly>
>
> Thank you for your suggestion. I wasn't able to properly set the
> environment variables using that specific approach. But it got me thinking
> about other ways. I got it working by setting the aws credentials as
> environment variables in the conf/spark-env.sh file. These Jupyter
> notebooks throw a wrench into everything ;-). Thanks again!
>
> How did you do that please?
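For reference on the spark-env.sh route mentioned in the quoted reply: a minimal sketch of what {SPARK_HOME}/conf/spark-env.sh can look like. The values are placeholders, not anything from this thread; AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard names the AWS SDK's environment-variable credentials provider looks for.

# {SPARK_HOME}/conf/spark-env.sh (sourced by Spark's launch scripts at startup)
# Placeholder credentials; replace with your own, or inject them from a
# secrets manager rather than committing them to the file.
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxx

S3A's default credential chain includes an environment-variable provider, so with these exported on each node you should be able to drop the fs.s3a.access.key and fs.s3a.secret.key lines from spark-defaults.conf.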
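And since the quoted reply mentions Jupyter notebooks throwing a wrench into things: if the notebook builds its own SparkSession, spark-defaults.conf may not be what that session ends up using, so a common fallback is to set the same S3A properties programmatically. A minimal PySpark sketch, assuming hadoop-aws is already on the classpath; the bucket name and credentials are placeholders, and the endpoint is the MinIO one from the config above:

from pyspark.sql import SparkSession

# Minimal sketch: mirror the S3A settings from spark-defaults.conf in code.
# Credentials below are placeholders, not values from this issue.
spark = (
    SparkSession.builder
    .appName("s3a-minio-example")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localminio:9000/")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
    .getOrCreate()
)

# Quick check that the session can reach the object store
# (the bucket and path are hypothetical):
spark.read.text("s3a://my-bucket/some/path").show()

One caveat: classpath-level settings such as spark.jars.packages generally have to be in place before the JVM starts, so those still belong in spark-defaults.conf or the kernel's launch arguments rather than in builder.config().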