kevinjqliu opened a new pull request, #16215: URL: https://github.com/apache/iceberg/pull/16215
Fix LICENSE and NOTICE compliance for all spark-runtime shadow JARs (v3.4, v3.5, v4.0, v4.1) to accurately represent bundled contents per [ASF licensing policy](https://infra.apache.org/licensing-howto.html). An audit of the shadow JAR contents revealed several Category B dependencies with missing full license texts, missing NOTICE propagation, and undeclared Apache-licensed transitive dependencies.

## Build and verify

```bash
# Build shadow JARs (all versions)
./gradlew -DsparkVersions=3.4,3.5,4.0,4.1 \
  :iceberg-spark:iceberg-spark-runtime-3.4_2.12:shadowJar \
  :iceberg-spark:iceberg-spark-runtime-3.5_2.12:shadowJar \
  :iceberg-spark:iceberg-spark-runtime-4.0_2.13:shadowJar \
  :iceberg-spark:iceberg-spark-runtime-4.1_2.13:shadowJar -x test
```

---

## LICENSE changes

All four versions (v3.4, v3.5, v4.0, v4.1) receive the same set of additions unless noted.

- **FastDoubleParser (MIT)**: **Required.** Shaded into Jackson Core at `com/fasterxml/jackson/core/io/doubleparser/`. MIT is a Category A license per [resolved.html](https://www.apache.org/legal/resolved.html), but its terms still require the full license text to accompany the bundled code.
  ```bash
  jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep FastDouble
  # META-INF/FastDoubleParser-LICENSE
  # META-INF/FastDoubleParser-NOTICE
  # org/apache/iceberg/shaded/com/fasterxml/jackson/core/io/doubleparser/FastDoubleMath.class
  ```
  Upstream: https://github.com/wrandelshofer/FastDoubleParser/blob/main/LICENSE
- **fast_float (MIT, bundled by FastDoubleParser)**: **Required.** Transitively included. MIT license full text required.
  Upstream: https://github.com/fastfloat/fast_float/blob/main/LICENSE-MIT
- **bigint (BSD 2-Clause, bundled by FastDoubleParser)**: **Required.** Transitively included. BSD license full text required.
  Upstream: https://github.com/tbuktu/bigint/blob/master/LICENSE
- **JCTools (Apache 2.0, via Netty)**: Not strictly required (Apache-licensed) but declared for completeness, consistent with the aws-bundle/gcp-bundle convention.
  136 classes shaded at `io/netty/util/internal/shaded/org/jctools/`.
  ```bash
  jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c jctools
  # 136
  ```
  Upstream: https://github.com/JCTools/JCTools/blob/master/LICENSE
- **Mozilla Public Suffix List (MPL 2.0, via Apache HttpComponents)**: **Required.** Category B license requires full text. Data file embedded at `org/publicsuffix/list/effective_tld_names.dat`.
  ```bash
  jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep publicsuffix
  # org/publicsuffix/list/effective_tld_names.dat
  ```
  Upstream: https://mozilla.org/MPL/2.0/
- **Eclipse Collections: full EPL-1.0 + EDL-1.0 text added** (all versions). **Required.** Previously only referenced by URL. Category B licenses require full text in LICENSE. 5,664 classes at `org/eclipse/collections/`.
  ```bash
  jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "org/eclipse/collections"
  # 5664
  ```
  Upstream: https://github.com/eclipse/eclipse-collections/blob/master/LICENSE-EPL-1.0.txt
- **JTS Topology Suite: full EPL-2.0 text added** (all versions). **Required.** Previously only referenced by URL. Category B license requires full text. 795 classes at `org/locationtech/jts/`.
  ```bash
  jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "org/locationtech/jts"
  # 795
  ```
  Upstream: https://github.com/locationtech/jts/blob/master/LICENSE_EPL

### v4.1 only: reorder entries

Existing entries for Project Nessie, Eclipse MicroProfile OpenAPI, Eclipse Collections, Apache Datasketches, and JTS were reordered to group Apache-licensed entries together, followed by Category B entries with full license texts.

---

## NOTICE changes

All four versions receive the same additions:

- **Jackson JSON Processor NOTICE**: **Required.** Propagation of upstream NOTICE per ASF policy.
  Includes FastDoubleParser copyright attribution.
  ```bash
  # Jackson is bundled in the JAR (2326 entries)
  jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "com/fasterxml/jackson"
  ```
  Upstream NOTICE: https://github.com/FasterXML/jackson-core/blob/2.x/NOTICE
- **Apache DataSketches NOTICE**: **Required.** Upstream NOTICE contents must be reproduced. DataSketches is an Apache project and ships a NOTICE file. Previously missing from all versions.
  ```bash
  # DataSketches is bundled in the JAR (534 entries)
  jar tf spark/v4.1/spark-runtime/build/libs/iceberg-spark-runtime-4.1_2.13-1.11.0-SNAPSHOT.jar | grep -c "datasketches"
  ```
  Upstream NOTICE: https://github.com/apache/datasketches-java/blob/master/NOTICE
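
The per-dependency `jar tf | grep` checks in this description all target the v4.1 JAR. As a rough sketch, the same checks could be repeated for every runtime JAR in one pass. This is not part of the PR: `check_listing`, `PATTERNS`, and `VERSION` are hypothetical names, and the v3.4, v3.5, and v4.0 JAR paths are assumed to mirror the v4.1 path quoted in the checks.

```bash
#!/usr/bin/env bash
# Sketch only: repeat the bundled-content checks for each runtime JAR.
# Assumption: the v3.4/v3.5/v4.0 JAR paths follow the same layout as v4.1.

VERSION="1.11.0-SNAPSHOT"

# Path fragments taken from the grep commands in this description.
PATTERNS="FastDouble jctools publicsuffix org/eclipse/collections org/locationtech/jts com/fasterxml/jackson datasketches"

# Reads a JAR listing on stdin, prints "<count> <pattern>" for each pattern,
# and returns non-zero if any pattern matches no entry.
check_listing() {
  local listing
  local status=0
  local pattern
  local count
  listing="$(cat)"
  for pattern in $PATTERNS; do
    # grep -c prints 0 and exits 1 when nothing matches, hence "|| true".
    count="$(printf '%s\n' "$listing" | grep -c -- "$pattern" || true)"
    printf '%s %s\n' "$count" "$pattern"
    [ "$count" -gt 0 ] || status=1
  done
  return "$status"
}

for pair in 3.4_2.12 3.5_2.12 4.0_2.13 4.1_2.13; do
  spark="${pair%%_*}"  # e.g. "4.1" from "4.1_2.13"
  jar="spark/v${spark}/spark-runtime/build/libs/iceberg-spark-runtime-${pair}-${VERSION}.jar"
  if [ -f "$jar" ]; then
    echo "== $jar"
    jar tf "$jar" | check_listing || echo "missing expected entries in $jar"
  else
    echo "skipping $jar (not built)"
  fi
done
```

The counts will likely differ between Spark versions, so the sketch only flags patterns with zero matches rather than comparing against the v4.1 numbers.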
