kevinjqliu commented on code in PR #2491:
URL: https://github.com/apache/iceberg-python/pull/2491#discussion_r2366289869
##########
.gitignore:
##########
@@ -37,13 +37,6 @@ coverage.xml
bin/
.vscode/
-# Hive/metastore files
-metastore_db/
-
-# Spark/metastore files
-spark-warehouse/
-derby.log
Review Comment:
No longer needed, since we no longer run Spark and the metastore locally.
##########
Makefile:
##########
@@ -37,7 +37,7 @@ endif
ifeq ($(KEEP_COMPOSE),1)
CLEANUP_COMMAND = echo "Keeping containers running for debugging (KEEP_COMPOSE=1)"
else
- CLEANUP_COMMAND = docker compose -f dev/docker-compose-integration.yml down -v --remove-orphans 2>/dev/null || true
+ CLEANUP_COMMAND = docker compose -f dev/docker-compose-integration.yml down -v --remove-orphans --timeout 0 2>/dev/null || true
Review Comment:
Don't wait for containers to shut down gracefully; `--timeout 0` makes `docker compose down` more responsive.
##########
dev/docker-compose-integration.yml:
##########
@@ -26,15 +26,13 @@ services:
- rest
- hive
- minio
- volumes:
- - ./warehouse:/home/iceberg/warehouse
environment:
- AWS_ACCESS_KEY_ID=admin
- AWS_SECRET_ACCESS_KEY=password
- AWS_REGION=us-east-1
ports:
- - 8888:8888
- - 8080:8080
Review Comment:
Removed port 8888, which was previously used for notebooks.
Replaced the Spark master web UI port (8080) with the Spark application UI port (4040).
##########
Makefile:
##########
@@ -18,7 +18,7 @@
# Configuration Variables
# ========================
-PYTEST_ARGS ?= -v # Override with e.g. PYTEST_ARGS="-vv --tb=short"
+PYTEST_ARGS ?= -v -x # Override with e.g. PYTEST_ARGS="-vv --tb=short"
Review Comment:
Added `-x` so pytest exits immediately on the first failure (e.g. when a run is interrupted with ctrl-c):
```
-x, --exitfirst exit instantly on first error or failed test.
```
https://docs.pytest.org/en/6.2.x/reference.html#command-line-flags
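A minimal sketch (plain Python, not pytest itself) of what `--exitfirst` means: run tests in order and stop at the first failure instead of continuing through the rest of the suite.

```python
# Illustrative model of pytest's -x / --exitfirst behavior.
def run_exitfirst(tests):
    results = []
    for name, fn in tests:
        try:
            fn()
            results.append((name, "passed"))
        except AssertionError:
            results.append((name, "failed"))
            break  # -x behavior: bail out immediately on first failure
    return results

def failing():
    assert False

tests = [
    ("test_a", lambda: None),
    ("test_b", failing),
    ("test_c", lambda: None),  # never runs when -x stops the session
]
print(run_exitfirst(tests))  # [('test_a', 'passed'), ('test_b', 'failed')]
```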
##########
dev/Dockerfile:
##########
@@ -36,25 +36,38 @@ ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$
RUN mkdir -p ${HADOOP_HOME} && mkdir -p ${SPARK_HOME} && mkdir -p /home/iceberg/spark-events
WORKDIR ${SPARK_HOME}
-# Remember to also update `tests/conftest`'s spark setting
ENV SPARK_VERSION=3.5.6
-ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_2.12
-ENV ICEBERG_VERSION=1.9.1
+ENV SCALA_VERSION=2.12
+ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_${SCALA_VERSION}
+ENV ICEBERG_VERSION=1.9.2
ENV PYICEBERG_VERSION=0.10.0
+ENV HADOOP_VERSION=3.3.4
+ENV AWS_SDK_VERSION=1.12.753
Review Comment:
Copied over from `tests/conftest`; these jars were originally downloaded on the client side.
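A hedged sketch of how these version pins could compose into the Maven Central URLs the image downloads at build time (the exact `RUN` commands in `dev/Dockerfile` may differ):

```python
# Version pins mirrored from the Dockerfile diff above.
SCALA_VERSION = "2.12"
ICEBERG_SPARK_RUNTIME_VERSION = f"3.5_{SCALA_VERSION}"
ICEBERG_VERSION = "1.9.2"
HADOOP_VERSION = "3.3.4"
AWS_SDK_VERSION = "1.12.753"

MAVEN = "https://repo1.maven.org/maven2"
jars = {
    "iceberg-spark-runtime": (
        f"{MAVEN}/org/apache/iceberg/"
        f"iceberg-spark-runtime-{ICEBERG_SPARK_RUNTIME_VERSION}/{ICEBERG_VERSION}/"
        f"iceberg-spark-runtime-{ICEBERG_SPARK_RUNTIME_VERSION}-{ICEBERG_VERSION}.jar"
    ),
    "hadoop-aws": (
        f"{MAVEN}/org/apache/hadoop/hadoop-aws/{HADOOP_VERSION}/"
        f"hadoop-aws-{HADOOP_VERSION}.jar"
    ),
    "aws-java-sdk-bundle": (
        f"{MAVEN}/com/amazonaws/aws-java-sdk-bundle/{AWS_SDK_VERSION}/"
        f"aws-java-sdk-bundle-{AWS_SDK_VERSION}.jar"
    ),
}
for name, url in jars.items():
    print(name, url)
```

Baking the downloads into the image (rather than fetching them client side in `tests/conftest`) keeps test startup fast and the versions reproducible.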
##########
tests/integration/test_add_files.py:
##########
@@ -503,7 +503,7 @@ def test_add_files_to_partitioned_table_fails_with_lower_and_upper_mismatch(
@pytest.mark.integration
def test_add_files_snapshot_properties(spark: SparkSession, session_catalog: Catalog, format_version: int) -> None:
- identifier = f"default.unpartitioned_table_v{format_version}"
+ identifier = f"default.snapshot_properties_v{format_version}"
Review Comment:
Renamed to avoid a table-name conflict with another test above:
https://github.com/apache/iceberg-python/blob/6935b4115557a0bf225de6d86dbfbbdb39a33618/tests/integration/test_add_files.py#L160-L161
##########
dev/spark-defaults.conf:
##########
@@ -16,20 +16,35 @@
#
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
+
+# Configure Iceberg REST catalog
spark.sql.catalog.rest org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest.type rest
spark.sql.catalog.rest.uri http://rest:8181
spark.sql.catalog.rest.io-impl org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.rest.warehouse s3://warehouse/rest/
spark.sql.catalog.rest.s3.endpoint http://minio:9000
+spark.sql.catalog.rest.cache-enabled false
+
+# Configure Iceberg Hive catalog
spark.sql.catalog.hive org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hive.type hive
-spark.sql.catalog.hive.uri http://hive:9083
+spark.sql.catalog.hive.uri thrift://hive:9083
spark.sql.catalog.hive.io-impl org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.hive.warehouse s3://warehouse/hive/
spark.sql.catalog.hive.s3.endpoint http://minio:9000
+
+# Configure Spark's default session catalog (spark_catalog) to use Iceberg backed by the Hive Metastore
+spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
+spark.sql.catalog.spark_catalog.type hive
+spark.sql.catalog.spark_catalog.uri thrift://hive:9083
+spark.hadoop.fs.s3a.endpoint http://minio:9000
+spark.sql.catalogImplementation hive
+spark.sql.warehouse.dir s3a://warehouse/hive/
Review Comment:
spark_catalog is primarily used by the `test_migrate_table` test, which calls `<catalog>.system.snapshot` and therefore requires spark_catalog:
https://github.com/apache/iceberg-python/blob/6935b4115557a0bf225de6d86dbfbbdb39a33618/tests/integration/test_hive_migration.py#L73
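A hedged sketch of the kind of SQL such a migration test issues. The procedure name (`system.snapshot`) and its `source_table`/`table` arguments come from Iceberg's Spark procedures; the catalog and table names here are hypothetical, not copied from the test.

```python
# Build the CALL statement for Iceberg's snapshot procedure, which
# creates an Iceberg table from an existing (e.g. Hive) table without
# rewriting the source data files. A SparkSession would run this via
# spark.sql(...); here we only construct the string.
def snapshot_call(catalog: str, source_table: str, dest_table: str) -> str:
    return (
        f"CALL {catalog}.system.snapshot("
        f"source_table => '{source_table}', table => '{dest_table}')"
    )

sql = snapshot_call("hive", "spark_catalog.default.src_tbl", "default.dst_tbl")
print(sql)
```

Because the procedure resolves tables through Spark's session catalog, `spark_catalog` must be wired to the same Hive Metastore, which is what the `spark-defaults.conf` additions above provide.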
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]