LDVSOFT opened a new issue, #13571:
URL: https://github.com/apache/iceberg/issues/13571

   ## Description
   
   The packaged `iceberg-spark-runtime-‹sparkApi›_‹scalaAbi›` artifacts are 
shadow/fat JARs with relocated dependencies, so that they can be used against 
Spark deployments without conflicts. Unfortunately, the relocation doesn't 
handle all resources properly, such as:
   * `META-INF/services/` files: neither the offered interface (the file name) 
nor the implementation names (the file contents) are relocated.
   * `META-INF/maven/` files contain Maven POMs for the included projects, so 
they can override the same resources from the proper unshaded JARs on the same 
classpath.
   * `META-INF/native-image` & `META-INF/proguard` configuration drop-ins 
still mention the original names of the relocated classes (this matters only if 
one is brave enough to use Spark with these tools).
   * Annotation dependencies, such as IntelliJ & Yetus annotations, are 
included as-is. Those probably don't need to be shaded, but it would probably 
be better not to include them at all.
   * Data resources that aren't relocated, like 
`mozilla/public-suffix-list.txt` (_it's probably from Hadoop client libraries_).
   
   For example, in `org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1`:
   1. `META-INF/services/com.fasterxml.jackson.databind.Module` contains 
`com.fasterxml.jackson.datatype.jsr310.JavaTimeModule`. _In theory_ this could 
conflict with a non-shaded Jackson on the classpath (if the unshaded module is 
absent, the service file points at a class that doesn't exist). It also doesn't 
work with the shaded Jackson's SPI, because the module isn't discoverable by 
the shaded `findAndRegisterModules()`. Explicit module registration should 
still work just fine.
   2. `META-INF/services/java.time.chrono.Chronology` contains several classes 
from the `org.threeten.extra.chrono` package. In its current form, this service 
file has no effect if the original package isn't present, and is a useless 
duplicate if it is. Direct class usage is unaffected.
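
   The service entries described above can be listed with a few lines of Java. 
This is only a minimal sketch: the class name `ServiceFileAudit` and its 
methods are hypothetical, and it uses nothing beyond the JDK's 
`java.util.jar.JarFile`. It prints every `interface -> provider` pair found 
under `META-INF/services/`, so unrelocated names can be spotted by eye.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Iceberg or the Shadow plugin.
public class ServiceFileAudit {
    private static final String SERVICES_DIR = "META-INF/services/";

    /** Strips a trailing '#' comment and surrounding whitespace from one line. */
    static String providerName(String line) {
        int hash = line.indexOf('#');
        return (hash >= 0 ? line.substring(0, hash) : line).trim();
    }

    /** Returns one "interface -> provider" string per provider declared in the jar. */
    static List<String> listProviders(JarFile jar) throws IOException {
        List<String> result = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            String name = entry.getName();
            if (entry.isDirectory() || !name.startsWith(SERVICES_DIR)) {
                continue;
            }
            // The file name is the service interface; each line is a provider.
            String serviceInterface = name.substring(SERVICES_DIR.length());
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String provider = providerName(line);
                    if (!provider.isEmpty()) {
                        result.add(serviceInterface + " -> " + provider);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the runtime jar to inspect.
        try (JarFile jar = new JarFile(args[0])) {
            listProviders(jar).forEach(System.out::println);
        }
    }
}
```

   Any printed line whose interface or provider still starts with an original 
package (for example `com.fasterxml.jackson.`) is a candidate for the problem 
described here.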
   
   ### Thoughts
   
   It's probably a configuration gap: I see you are using the Gradle Shadow 
plugin, which might lack the functionality to rewrite resource files. However, 
some projects, like AWS SDK for Java v2, somehow seem to properly edit service 
files in third-party dependencies (see 
`software.amazon.awssdk:third-party-jackson-core:2.31.16` for example). 
Rewriting all files that reference shaded packages might be worth it compared 
to manually analyzing whether each one should be kept and whether it needs 
alteration.
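
   To make "rewriting" concrete: both the service file name and each provider 
line would be passed through the same package relocation that is applied to 
the classes. The following is only an illustrative sketch, not how the Shadow 
plugin or the AWS SDK build does it. The class `ServiceFileRelocation` and the 
target prefix `org.example.shaded` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of relocating names found in service files.
public class ServiceFileRelocation {

    /**
     * Applies the first matching package relocation to a fully qualified name.
     * Keys are original packages, values are their relocated counterparts.
     */
    static String relocate(String className, Map<String, String> relocations) {
        for (Map.Entry<String, String> r : relocations.entrySet()) {
            if (className.startsWith(r.getKey() + ".")) {
                return r.getValue() + className.substring(r.getKey().length());
            }
        }
        return className; // not covered by any relocation: leave untouched
    }

    public static void main(String[] args) {
        Map<String, String> relocations = new LinkedHashMap<>();
        // Assumed relocation rule; the real prefix used by a build may differ.
        relocations.put("com.fasterxml.jackson", "org.example.shaded.com.fasterxml.jackson");

        // The file name (service interface) and its content (provider) are both rewritten.
        System.out.println(relocate("com.fasterxml.jackson.databind.Module", relocations));
        System.out.println(relocate("com.fasterxml.jackson.datatype.jsr310.JavaTimeModule", relocations));
        // JDK interfaces are outside every relocation and stay as they are.
        System.out.println(relocate("java.time.chrono.Chronology", relocations));
    }
}
```

   With such a rule, a JDK-owned interface like `java.time.chrono.Chronology` 
keeps its file name, and only the provider lines inside it would change if 
their package is relocated.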
   
   Also, this doesn't seem to show up in testing or production, so it's not 
critical, but it does cause my tooling that verifies classpath integrity to 
raise alerts, and previously some alerts of this kind did lead to production 
errors. Fortunately, the bad ones usually come from duplicate `.class` files, 
which seem to be properly covered here, or are annotation definitions.
   
   ## Priority
   
   Probably low, as I guess it doesn't cause any behavior changes. However, if 
you know of any weird things happening, this might be one of the causes.
   
   ## Affected versions
   
   I was trying to use the mentioned Iceberg 1.7.1 for Spark 3.5, but I've also 
poked into the Iceberg 1.9.1 artifact and it's the same.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

