wirybeaver commented on issue #4632: URL: https://github.com/apache/datafusion-comet/issues/4632#issuecomment-4720970848
Potential solution for the Arrow C Data class conflict seen in the packaged smoke test:

The failure does not look solvable by jar ordering. Today the packaged Comet jar can expose classes named `org.apache.arrow.c.*`, but their bytecode has been rewritten to use Comet's relocated Arrow allocator/types under `org.apache.comet.shaded.arrow.*`. That produces a hybrid ABI:

- If Comet's jar wins class loading, Lance Spark loads Comet-mutated `org.apache.arrow.c.*` classes and normal Lance/Arrow calls fail with `NoSuchMethodError`.
- If Lance Spark's Arrow C Data jar wins class loading, Comet loads normal upstream `org.apache.arrow.c.*` classes and Comet calls that pass shaded Arrow allocator/types fail with `NoSuchMethodError`.

The clean boundary should be that Comet never publishes mutated classes under the upstream Arrow package name:

1. Fully relocate Comet's Arrow C Data classes into Comet's shaded namespace, for example `org.apache.comet.shaded.arrow.c.*`.
2. Ensure the packaged `comet-spark` jar does not contain `org/apache/arrow/c/**` classes rewritten against shaded Arrow types.
3. Update Comet JNI/native class lookups that currently reference `org/apache/arrow/c/...` to use the shaded class names in packaged Comet. If needed for dev/test compatibility, use a small dual-lookup helper that tries shaded first and then unshaded.
4. Let Lance Spark keep using the normal upstream Arrow namespace: `org.apache.arrow.c.*` and `org.apache.arrow.memory.*`.
5. Add a packaging regression test that asserts the packaged Comet jar contains shaded Arrow C Data classes and does not contain Comet-mutated upstream `org/apache/arrow/c/**` classes.
6. Add an integration smoke test with both packaged Comet and Lance Spark jars on the same Spark classpath.

I would avoid solving this with Spark classloader ordering or Lance Spark changes.
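As a rough illustration of the packaging regression check in step 5, here is a minimal sketch in Python. It is not Comet's actual test. The function names `audit_jar` and `check_jar` are hypothetical, the shaded prefix is the example namespace suggested above, and "Comet-mutated" is approximated by searching each upstream-named class file for the shaded package path, which would appear in its constant pool if the bytecode had been rewritten.

```python
import zipfile

# Upstream Arrow C Data package, in jar-entry (slash) form.
UPSTREAM_PREFIX = "org/apache/arrow/c/"
# Assumed relocation target, taken from the example namespace in step 1.
SHADED_PREFIX = "org/apache/comet/shaded/arrow/c/"
# Byte marker for any reference to Comet's relocated Arrow packages.
SHADED_MARKER = b"org/apache/comet/shaded/arrow/"


def audit_jar(jar_path):
    """Return (shaded_classes, mutated_upstream_classes) found in the jar."""
    shaded, mutated = [], []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            if name.startswith(SHADED_PREFIX):
                shaded.append(name)
            elif name.startswith(UPSTREAM_PREFIX):
                # Class and descriptor names are stored as plain strings in
                # the constant pool, so a byte search is a cheap heuristic
                # for "rewritten against shaded Arrow types".
                if SHADED_MARKER in jar.read(name):
                    mutated.append(name)
    return shaded, mutated


def check_jar(jar_path):
    """Return a list of human-readable problems; empty means the jar passes."""
    shaded, mutated = audit_jar(jar_path)
    problems = []
    if not shaded:
        problems.append("no classes found under " + SHADED_PREFIX)
    if mutated:
        problems.append(
            "upstream-named classes reference shaded Arrow: " + ", ".join(mutated)
        )
    return problems
```

A CI step could call `check_jar` on the packaged `comet-spark` jar and fail the build if the returned list is non-empty. A stricter variant, matching step 2 literally, would reject any `.class` entry under `org/apache/arrow/c/` regardless of its contents.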
The durable rule should be: Comet's shaded Arrow universe uses shaded package names consistently, and Lance Spark's upstream Arrow universe remains untouched.
