LDVSOFT opened a new issue, #13571: URL: https://github.com/apache/iceberg/issues/13571
## Description

The packaged `iceberg-spark-runtime-‹sparkApi›_‹scalaAbi›` artifacts are shadow/fat JARs with relocated dependencies, meant to be used against Spark deployments without conflicts. Unfortunately, the way this is done doesn't handle all resources properly, such as:

* `META-INF/services/` files are not relocated, in either the offered interface (the file name) or the implementation names (the file contents).
* `META-INF/maven/` files contain the Maven POMs of the included projects, even though they have a chance to overwrite the same resources from the proper unshaded JARs on the same classpath.
* `META-INF/native-image` and `META-INF/proguard` configuration drop-ins (if one is brave enough to use Spark with these tools) mention the to-be-shaded classes.
* Annotation dependencies, such as IntelliJ and Yetus annotations, are included as-is. Those probably don't need to be shaded, but would probably be better left out.
* Data resources aren't relocated, like `mozilla/public-suffix-list.txt` (_it's probably from the Hadoop client libraries_).

For example, in `org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1`:

1. `META-INF/services/com.fasterxml.jackson.databind.Module` contains `com.fasterxml.jackson.datatype.jsr310.JavaTimeModule`. _In theory_ this could conflict with a non-shaded Jackson (if the unshaded module is absent), or fail to work with the shaded Jackson's SPI by not being visible to the shaded `findAndRegisterModules()`. Explicit module loading should work just fine.
2. `META-INF/services/java.time.chrono.Chronology` contains several classes from the `org.threeten.extra.chrono` package. In its current form, this service file has no effect if the original package isn't present, and if it is present it's a useless duplicate. Direct class usage is unaffected.

### Thoughts

It's probably an underconfiguration: I see you are using the Gradle Shadow plugin, which might lack the ability to rewrite resource files.
However, some projects, like the AWS SDK for Java v2, seem to somehow edit service files in third-party dependencies properly (see `software.amazon.awssdk:third-party-jackson-core:2.31.16` for example). Rewriting all files that reference shaded packages might be worth it, compared to manually analysing whether each one should be kept and whether it needs alteration.

Also, this doesn't seem to show up in testing or production, so it's not critical, but it does cause my tooling that verifies classpath integrity to raise alerts, and previously some of those alerts did lead to production errors. Fortunately, the bad ones usually come from duplicate `.class` files, which seem to be properly covered here, or from annotation definitions.

## Priority

Probably low, as I guess it doesn't cause any behavior changes. However, if you know of any weird things happening, this might be one of the causes.

## Affected versions

I was trying to use the mentioned Iceberg 1.7.1 for Spark 3.5, but I've also poked into the Iceberg 1.9.1 artifact and it's the same.
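
To make the service-file cases from the description easier to reproduce, here is a minimal sketch of the kind of check involved. It is not the tooling mentioned above. The function name `dangling_service_entries` is hypothetical, and the check is only a heuristic: it lists `META-INF/services/` entries whose provider class is not present in the same JAR, which is what an unrelocated service file looks like once the classes themselves have been relocated.

```python
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def dangling_service_entries(jar):
    """Return (service, provider) pairs whose provider class is not in the jar.

    `jar` is a path or a file-like object accepted by zipfile.ZipFile.
    Hypothetical helper for illustration; a provider may legitimately live
    in another jar, so hits are candidates to review, not proven bugs.
    """
    dangling = []
    with zipfile.ZipFile(jar) as zf:
        names = set(zf.namelist())
        for name in sorted(names):
            # Only look at service descriptor files, not directory entries.
            if not name.startswith(SERVICES_PREFIX) or name.endswith("/"):
                continue
            service = name[len(SERVICES_PREFIX):]
            for raw in zf.read(name).decode("utf-8").splitlines():
                # Service files allow '#' comments and blank lines.
                provider = raw.split("#", 1)[0].strip()
                if not provider:
                    continue
                class_entry = provider.replace(".", "/") + ".class"
                if class_entry not in names:
                    dangling.append((service, provider))
    return dangling
```

Calling `dangling_service_entries(...)` with the path to a locally downloaded runtime JAR returns the list of suspicious pairs. An empty list means every listed provider resolves to a class inside the same JAR.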