kevinjqliu commented on code in PR #2491:
URL: https://github.com/apache/iceberg-python/pull/2491#discussion_r2366289869


##########
.gitignore:
##########
@@ -37,13 +37,6 @@ coverage.xml
 bin/
 .vscode/
 
-# Hive/metastore files
-metastore_db/
-
-# Spark/metastore files
-spark-warehouse/
-derby.log

Review Comment:
   No longer needed now that we don't run Spark and the metastore locally.



##########
Makefile:
##########
@@ -37,7 +37,7 @@ endif
 ifeq ($(KEEP_COMPOSE),1)
  CLEANUP_COMMAND = echo "Keeping containers running for debugging (KEEP_COMPOSE=1)"
 else
-  CLEANUP_COMMAND = docker compose -f dev/docker-compose-integration.yml down -v --remove-orphans 2>/dev/null || true
+  CLEANUP_COMMAND = docker compose -f dev/docker-compose-integration.yml down -v --remove-orphans --timeout 0 2>/dev/null || true

Review Comment:
   Don't wait for `docker compose down` to shut containers down gracefully; `--timeout 0` makes cleanup more responsive.
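   
   As a hedged illustration, here is roughly what that cleanup amounts to when driven from Python (the compose file path and flags come from the Makefile above; running it via `subprocess` is purely for demonstration):
   
   ```python
   # Sketch of the Makefile's CLEANUP_COMMAND. "--timeout 0" skips the default
   # ~10s graceful-shutdown wait per container, so teardown returns immediately.
   import subprocess

   subprocess.run(
       [
           "docker", "compose",
           "-f", "dev/docker-compose-integration.yml",
           "down", "-v", "--remove-orphans", "--timeout", "0",
       ],
       check=False,  # mirrors the trailing "|| true": cleanup never fails the build
   )
   ```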



##########
dev/docker-compose-integration.yml:
##########
@@ -26,15 +26,13 @@ services:
       - rest
       - hive
       - minio
-    volumes:
-      - ./warehouse:/home/iceberg/warehouse
     environment:
       - AWS_ACCESS_KEY_ID=admin
       - AWS_SECRET_ACCESS_KEY=password
       - AWS_REGION=us-east-1
     ports:
-      - 8888:8888
-      - 8080:8080

Review Comment:
   Removed port 8888, which was previously used for notebooks.
   Replaced the Spark master web UI port (8080) with the Spark application UI port (4040).
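   
   Assuming the compose file now maps `4040:4040` (the replacement line is not shown in the truncated hunk above), a quick hypothetical smoke check for the Spark app UI could look like:
   
   ```python
   # Hypothetical check that the Spark application UI is reachable after
   # `docker compose up`; assumes a 4040:4040 port mapping on localhost.
   import urllib.request

   try:
       with urllib.request.urlopen("http://localhost:4040", timeout=5) as resp:
           print("Spark app UI reachable:", resp.status)
   except OSError as exc:
       print("Spark app UI not reachable:", exc)
   ```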



##########
Makefile:
##########
@@ -18,7 +18,7 @@
 # Configuration Variables
 # ========================
 
-PYTEST_ARGS ?= -v  # Override with e.g. PYTEST_ARGS="-vv --tb=short"
+PYTEST_ARGS ?= -v -x  # Override with e.g. PYTEST_ARGS="-vv --tb=short"

Review Comment:
   Added `-x` so the run exits at the first failing test (e.g. after a Ctrl-C) instead of continuing through the suite:
   
   ```
     -x, --exitfirst       exit instantly on first error or failed test.
   ```
   https://docs.pytest.org/en/6.2.x/reference.html#command-line-flags
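   
   For reference, the same stop-on-first-failure behavior can be reproduced programmatically with `pytest.main` (the test path here is illustrative):
   
   ```python
   # Illustrative only: invoke pytest with -x from Python; the run stops at
   # the first failing test instead of continuing through the suite.
   import sys

   import pytest

   # Equivalent to `pytest -v -x tests/integration` on the command line.
   sys.exit(pytest.main(["-v", "-x", "tests/integration"]))
   ```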



##########
dev/Dockerfile:
##########
@@ -36,25 +36,38 @@ ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$
 RUN mkdir -p ${HADOOP_HOME} && mkdir -p ${SPARK_HOME} && mkdir -p /home/iceberg/spark-events
 WORKDIR ${SPARK_HOME}
 
-# Remember to also update `tests/conftest`'s spark setting
 ENV SPARK_VERSION=3.5.6
-ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_2.12
-ENV ICEBERG_VERSION=1.9.1
+ENV SCALA_VERSION=2.12
+ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_${SCALA_VERSION}
+ENV ICEBERG_VERSION=1.9.2
 ENV PYICEBERG_VERSION=0.10.0
+ENV HADOOP_VERSION=3.3.4
+ENV AWS_SDK_VERSION=1.12.753

Review Comment:
   Copied over from `tests/conftest`; these dependencies were originally downloaded client side.
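   
   For context, a hedged sketch of the kind of client-side download these pins replace (a reconstruction using the standard Maven2 layout, not the exact `tests/conftest` code):
   
   ```python
   # Rough sketch: fetch the Iceberg Spark runtime jar at test-session startup,
   # driven by the same version pins now baked into the Dockerfile.
   import urllib.request

   ICEBERG_VERSION = "1.9.2"
   ICEBERG_SPARK_RUNTIME_VERSION = "3.5_2.12"

   jar = f"iceberg-spark-runtime-{ICEBERG_SPARK_RUNTIME_VERSION}-{ICEBERG_VERSION}.jar"
   url = (
       "https://repo1.maven.org/maven2/org/apache/iceberg/"
       f"iceberg-spark-runtime-{ICEBERG_SPARK_RUNTIME_VERSION}/{ICEBERG_VERSION}/{jar}"
   )
   urllib.request.urlretrieve(url, jar)  # downloaded client side, per test session
   ```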



##########
tests/integration/test_add_files.py:
##########
@@ -503,7 +503,7 @@ def test_add_files_to_partitioned_table_fails_with_lower_and_upper_mismatch(
 
 @pytest.mark.integration
 def test_add_files_snapshot_properties(spark: SparkSession, session_catalog: Catalog, format_version: int) -> None:
-    identifier = f"default.unpartitioned_table_v{format_version}"
+    identifier = f"default.snapshot_properties_v{format_version}"

Review Comment:
   Renamed to avoid a name conflict with another test above:
   
   
https://github.com/apache/iceberg-python/blob/6935b4115557a0bf225de6d86dbfbbdb39a33618/tests/integration/test_add_files.py#L160-L161



##########
dev/spark-defaults.conf:
##########
@@ -16,20 +16,35 @@
 #
 
 spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
+
+# Configure Iceberg REST catalog
 spark.sql.catalog.rest                 org.apache.iceberg.spark.SparkCatalog
 spark.sql.catalog.rest.type            rest
 spark.sql.catalog.rest.uri             http://rest:8181
 spark.sql.catalog.rest.io-impl         org.apache.iceberg.aws.s3.S3FileIO
 spark.sql.catalog.rest.warehouse       s3://warehouse/rest/
 spark.sql.catalog.rest.s3.endpoint     http://minio:9000
+spark.sql.catalog.rest.cache-enabled   false
+
+# Configure Iceberg Hive catalog
 spark.sql.catalog.hive                 org.apache.iceberg.spark.SparkCatalog
 spark.sql.catalog.hive.type            hive
-spark.sql.catalog.hive.uri             http://hive:9083
+spark.sql.catalog.hive.uri             thrift://hive:9083
 spark.sql.catalog.hive.io-impl         org.apache.iceberg.aws.s3.S3FileIO
 spark.sql.catalog.hive.warehouse       s3://warehouse/hive/
 spark.sql.catalog.hive.s3.endpoint     http://minio:9000
+
+# Configure Spark's default session catalog (spark_catalog) to use Iceberg backed by the Hive Metastore
+spark.sql.catalog.spark_catalog        org.apache.iceberg.spark.SparkSessionCatalog
+spark.sql.catalog.spark_catalog.type   hive
+spark.sql.catalog.spark_catalog.uri    thrift://hive:9083
+spark.hadoop.fs.s3a.endpoint           http://minio:9000
+spark.sql.catalogImplementation        hive
+spark.sql.warehouse.dir                s3a://warehouse/hive/

Review Comment:
   spark_catalog is primarily used by the `test_migrate_table` test. It calls `<catalog>.system.snapshot`, which requires spark_catalog:
   
   
https://github.com/apache/iceberg-python/blob/6935b4115557a0bf225de6d86dbfbbdb39a33618/tests/integration/test_hive_migration.py#L73
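   
   A minimal hedged sketch of that call (table names are hypothetical; it assumes the `spark-defaults.conf` above is in effect so `spark_catalog` resolves to an Iceberg `SparkSessionCatalog`):
   
   ```python
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.getOrCreate()

   # Iceberg's snapshot procedure creates an Iceberg table from an existing
   # Hive table; it is resolved through spark_catalog because the source
   # lives in Spark's default session catalog. Names are illustrative.
   spark.sql(
       "CALL spark_catalog.system.snapshot("
       "source_table => 'default.hive_source_table', "
       "table => 'default.iceberg_snapshot_table')"
   )
   ```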



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

