Ferdinanddb commented on code in PR #14108: URL: https://github.com/apache/iceberg/pull/14108#discussion_r2361311963
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -2645,4 +2645,50 @@ public boolean matches(RewriteFileGroup argument) {
       return groupIDs.contains(argument.info().globalIndex());
     }
   }
+
+  @TestTemplate
+  public void testZOrderWithDateColumn() {
+    spark.conf().set("spark.sql.ansi.enabled", "false");

Review Comment:
@ronkapoor86 OK, that is weird. I cloned the repo, made the same change as your PR in `SparkZOrderUDF.java`, built the JAR, then executed the following code:

```python
from pyspark.sql import SparkSession

catalog_name = "biglakeCatalog"

spark: SparkSession = (
    SparkSession.builder.appName("Richfox Data Loader")
    .master("local[12]")
    .config("spark.driver.memory", "18g")
    .config("spark.jars.ivy", "/tmp/.ivy_spark")
    .config(
        "spark.jars",
        "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.7/postgresql-42.7.7.jar,"
        # Released runtime, replaced below by the locally built JAR that contains the PR change:
        # "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-4.0_2.13/1.10.0/iceberg-spark-runtime-4.0_2.13-1.10.0.jar,"
        "/home/mypath/work/perso/iceberg/spark/v4.0/spark-runtime/build/libs/iceberg-spark-runtime-4.0_2.13-1d558a9.dirty.jar,"
        "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-gcp-bundle/1.10.0/iceberg-gcp-bundle-1.10.0.jar,"
        "https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/3.1.7/gcs-connector-3.1.7-shaded.jar,"
        "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/3.3.6/hadoop-common-3.3.6.jar",
    )
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    # Iceberg REST catalog (BigLake) settings
    .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog")
    .config(f"spark.sql.catalog.{catalog_name}.type", "rest")
    .config(f"spark.sql.catalog.{catalog_name}.uri", "https://biglake.googleapis.com/iceberg/v1beta/restcatalog")
    .config(f"spark.sql.catalog.{catalog_name}.warehouse", "gs://some bucket")
    .config(f"spark.sql.catalog.{catalog_name}.header.x-goog-user-project", "some project")
    .config(f"spark.sql.catalog.{catalog_name}.rest.auth.type", "org.apache.iceberg.gcp.auth.GoogleAuthManager")
    .config(f"spark.sql.catalog.{catalog_name}.io-impl", "org.apache.iceberg.gcp.gcs.GCSFileIO")
    .config(f"spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled", "false")
    # GCS connector settings
    .config("spark.hadoop.fs.gs.project.id", "some project")
    .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.fs.gs.auth.type", "APPLICATION_DEFAULT")
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .getOrCreate()
)

# Z-order rewrite over a STRING, an INT and a DATE column
spark.sql("""--sql
    CALL biglakeCatalog.system.rewrite_data_files(
        table => 'biglakeCatalog.silver.cumu_adj_factors_daily',
        strategy => 'sort',
        sort_order => 'zorder(ticker,sec_id,trade_date)',
        options => map('rewrite-all', 'true', 'target-file-size-bytes', '536870912', 'max-concurrent-file-group-rewrites', '5')
    );
""").show()

# Output:
# +--------------------------+----------------------+---------------------+-----------------------+--------------------------+
# |rewritten_data_files_count|added_data_files_count|rewritten_bytes_count|failed_data_files_count|removed_delete_files_count|
# +--------------------------+----------------------+---------------------+-----------------------+--------------------------+
# |                         1|                     1|                 1998|                      0|                         0|
# +--------------------------+----------------------+---------------------+-----------------------+--------------------------+
```

where:
- `ticker` is a STRING column
- `sec_id` is an INT column
- `trade_date` is a DATE column

And it works fine, as you can see. Or am I missing something?
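For anyone who wants to repeat this check against other tables or column combinations, here is a minimal sketch that builds the same `CALL` statement from parameters. The helper name `build_zorder_rewrite_sql` and its arguments are hypothetical and not part of Iceberg; the only thing taken from the comment above is the shape of the `rewrite_data_files` call. It does no quoting or escaping, so it is only suitable for trusted identifiers.

```python
def build_zorder_rewrite_sql(catalog, table, zorder_columns, options=None):
    """Build a rewrite_data_files CALL statement with a zorder sort (hypothetical helper)."""
    # Guard chosen for this sketch: refuse to build a zorder() with no columns.
    if not zorder_columns:
        raise ValueError("zorder_columns must contain at least one column name")

    sort_order = "zorder(" + ",".join(zorder_columns) + ")"
    args = [
        f"table => '{catalog}.{table}'",
        "strategy => 'sort'",
        f"sort_order => '{sort_order}'",
    ]
    if options:
        # Render options as the map('key', 'value', ...) form used in the call above.
        pairs = ", ".join(f"'{key}', '{value}'" for key, value in options.items())
        args.append(f"options => map({pairs})")

    return f"CALL {catalog}.system.rewrite_data_files(" + ", ".join(args) + ")"


# Same table and columns as in the comment above.
sql = build_zorder_rewrite_sql(
    "biglakeCatalog",
    "silver.cumu_adj_factors_daily",
    ["ticker", "sec_id", "trade_date"],
    {"rewrite-all": "true"},
)
# With a configured session: spark.sql(sql).show()
```

Swapping the column list (for example, dropping `trade_date`) makes it easy to compare runs with and without the DATE column on the same JAR.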
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org