cee-shubham commented on issue #19:
URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2378455568

   I want to create Iceberg tables using pyiceberg and store them in MinIO. For this I have created Docker containers for the following services: nessie, minio, dremio.
   Earlier I was using PySpark and was able to create tables with this code:
   import pyspark
   from pyspark.sql import SparkSession
   import os
   
   ## DEFINE SENSITIVE VARIABLES
   NESSIE_URI = "http://nessie:19120/api/v1"
   MINIO_ACCESS_KEY = "my_access_key"
   MINIO_SECRET_KEY = "my_secret_access_key"
   
   
   
   conf = (
       pyspark.SparkConf()
           .setAppName('app_name')
                #packages
           .set('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.67.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
                #SQL Extensions
           .set('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions')
                #Configuring Catalog
           .set('spark.sql.catalog.nessie', 'org.apache.iceberg.spark.SparkCatalog')
           .set('spark.sql.catalog.nessie.uri', NESSIE_URI)
           .set('spark.sql.catalog.nessie.ref', 'main')
           .set('spark.sql.catalog.nessie.authentication.type', 'NONE')
           .set('spark.sql.catalog.nessie.catalog-impl', 'org.apache.iceberg.nessie.NessieCatalog')
           .set('spark.sql.catalog.nessie.warehouse', 's3a://warehouse')
           .set('spark.sql.catalog.nessie.s3.endpoint', 'http://minio:9000')
           .set('spark.sql.catalog.nessie.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
                #MINIO CREDENTIALS
           .set('spark.hadoop.fs.s3a.access.key', MINIO_ACCESS_KEY)
           .set('spark.hadoop.fs.s3a.secret.key', MINIO_SECRET_KEY)
   )
   
   ## Start Spark Session
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   
   ## LOAD A CSV INTO AN SQL VIEW
   csv_df = spark.read.format("csv").option("header", "true").load("../datasets/df_open_2023.csv")
   csv_df.createOrReplaceTempView("csv_open_2023")
   
   ## CREATE AN ICEBERG TABLE FROM THE SQL VIEW
   spark.sql("CREATE TABLE IF NOT EXISTS nessie.df_open_2023 USING iceberg AS SELECT * FROM csv_open_2023").show()
   
   ## QUERY THE ICEBERG TABLE
   spark.sql("SELECT * FROM nessie.df_open_2023 limit 10").show()
   
   Please tell me how to do this with pyiceberg.
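   I imagine something roughly like the sketch below, but I am not sure of the catalog configuration. Everything here is an assumption, not a verified recipe: it guesses that a recent Nessie exposes an Iceberg REST endpoint pyiceberg can talk to (the `/iceberg/main` URI targeting the `main` branch), and it invents a `default` namespace; the CSV path and credentials are the ones from my Spark example.

   ```python
   ## HYPOTHETICAL pyiceberg sketch -- the REST URI, namespace name,
   ## and the idea that Nessie serves an Iceberg REST catalog at this
   ## path are all assumptions.
   CATALOG_PROPS = {
       "uri": "http://nessie:19120/iceberg/main",   # assumed Nessie Iceberg REST endpoint, 'main' branch
       "s3.endpoint": "http://minio:9000",          # MinIO endpoint from the Spark config
       "s3.access-key-id": "my_access_key",
       "s3.secret-access-key": "my_secret_access_key",
   }

   def create_table_from_csv(path="../datasets/df_open_2023.csv"):
       # Imports are kept local so the configuration above can be
       # inspected without pyiceberg/pyarrow installed.
       import pyarrow.csv as pacsv
       from pyiceberg.catalog import load_catalog

       catalog = load_catalog("nessie", **CATALOG_PROPS)
       catalog.create_namespace_if_not_exists("default")  # 'default' namespace is assumed

       arrow_table = pacsv.read_csv(path)  # load the CSV into an Arrow table
       table = catalog.create_table("default.df_open_2023", schema=arrow_table.schema)
       table.append(arrow_table)           # write the rows as an Iceberg snapshot
       return table.scan(limit=10).to_arrow()
   ```

   If that direction is wrong (for example, if I should use a different catalog type or the Nessie REST API directly), please point me to the right configuration.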


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

