wirybeaver commented on issue #4632:
URL: 
https://github.com/apache/datafusion-comet/issues/4632#issuecomment-4720970848

   Potential solution for the Arrow C Data class conflict seen in the packaged 
smoke test:
   
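   The failure does not look solvable by jar ordering. Today the packaged Comet 
jar can expose classes named `org.apache.arrow.c.*`, but their bytecode has 
been rewritten to use Comet's relocated Arrow allocator/types under 
`org.apache.comet.shaded.arrow.*`. That produces a hybrid ABI: upstream class 
names whose method signatures reference shaded types. Whichever copy is 
loaded, only one of the two callers finds the signatures it was compiled 
against: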
   
   - If Comet's jar wins class loading, Lance Spark loads Comet-mutated 
`org.apache.arrow.c.*` classes and normal Lance/Arrow calls fail with 
`NoSuchMethodError`.
   - If Lance Spark's Arrow C Data jar wins class loading, Comet loads normal 
upstream `org.apache.arrow.c.*` classes and Comet calls that pass shaded Arrow 
allocator/types fail with `NoSuchMethodError`.
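
To confirm which of the two cases above applies on a given classpath, it can help to print where a class was actually loaded from. The following is a minimal diagnostic sketch, not part of Comet or Lance Spark: the class name `ClassOrigin` and its method are hypothetical, and it assumes the thread context classloader is the one Spark uses for the code under test, which may not hold in every deployment.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports which jar supplied a class. */
public final class ClassOrigin {
    private ClassOrigin() {}

    /** Returns the code source location of the named class, or a placeholder string. */
    public static String originOf(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            // initialize=false: we only want to locate the class, not run static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e + ">";
        }
    }

    public static void main(String[] args) {
        // org.apache.arrow.c.Data is the upstream Arrow C Data entry point.
        String name = args.length > 0 ? args[0] : "org.apache.arrow.c.Data";
        System.out.println(name + " <- " + originOf(name));
    }
}
```

Run with both jars on the classpath, the printed location shows whether the Comet jar or the Arrow C Data jar won for `org.apache.arrow.c.Data`.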
   
   The clean boundary should be that Comet never publishes mutated classes 
under the upstream Arrow package name:
   
   1. Fully relocate Comet's Arrow C Data classes into Comet's shaded 
namespace, for example `org.apache.comet.shaded.arrow.c.*`.
   2. Ensure the packaged `comet-spark` jar does not contain 
`org/apache/arrow/c/**` classes rewritten against shaded Arrow types.
   3. Update Comet JNI/native class lookups that currently reference 
`org/apache/arrow/c/...` to use the shaded class names in packaged Comet. If 
needed for dev/test compatibility, use a small dual-lookup helper that tries 
shaded first and then unshaded.
   4. Let Lance Spark keep using the normal upstream Arrow namespace: 
`org.apache.arrow.c.*` and `org.apache.arrow.memory.*`.
   5. Add a packaging regression test that asserts the packaged Comet jar 
contains shaded Arrow C Data classes and does not contain Comet-mutated 
upstream `org/apache/arrow/c/**` classes.
   6. Add an integration smoke test with both packaged Comet and Lance Spark 
jars on the same Spark classpath.
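
The dual-lookup idea in step 3 can be sketched as follows. This is only an illustration of the shaded-first ordering written in Java: the lookups the proposal refers to are JNI/native class lookups, and the helper name `ArrowCClassLookup`, its methods, and the exact shaded prefix (taken from the example in step 1) are assumptions rather than existing Comet code.

```java
import java.util.Optional;

/** Hypothetical helper: resolve an Arrow C Data class, shaded name first. */
public final class ArrowCClassLookup {
    // Prefixes follow the example names used in the proposal; treat them as assumptions.
    static final String SHADED_PREFIX = "org.apache.comet.shaded.arrow.c.";
    static final String UPSTREAM_PREFIX = "org.apache.arrow.c.";

    private ArrowCClassLookup() {}

    /** Returns the first candidate class name that the loader can resolve, if any. */
    public static Optional<Class<?>> firstLoadable(ClassLoader loader, String... candidates) {
        for (String name : candidates) {
            try {
                return Optional.of(Class.forName(name, false, loader));
            } catch (ClassNotFoundException e) {
                // Not present under this name; fall through to the next candidate.
            }
        }
        return Optional.empty();
    }

    /** Packaged Comet should hit the shaded name; dev/test builds may fall back to upstream. */
    public static Optional<Class<?>> arrowC(ClassLoader loader, String simpleName) {
        return firstLoadable(loader, SHADED_PREFIX + simpleName, UPSTREAM_PREFIX + simpleName);
    }
}
```

Trying the shaded name first matters: in a packaged deployment next to Lance Spark, the upstream name would resolve to Lance's unmodified classes, which is exactly the mismatch this proposal is trying to avoid.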
   
   I would avoid solving this with Spark classloader ordering or Lance Spark 
changes. The durable rule should be: Comet's shaded Arrow universe uses shaded 
package names consistently, and Lance Spark's upstream Arrow universe remains 
untouched.
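
The packaging rule above can be enforced mechanically by inspecting the jar's entries. Below is a minimal sketch of such a check, assuming the shaded path `org/apache/comet/shaded/arrow/c/` from the example in step 1; the class name `ShadedArrowJarCheck` is hypothetical. It is deliberately stricter than step 5: it rejects any class under `org/apache/arrow/c/` instead of trying to detect whether a class was rewritten.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical packaging check for the Comet jar layout described above. */
public final class ShadedArrowJarCheck {
    static final String UPSTREAM_DIR = "org/apache/arrow/c/";
    static final String SHADED_DIR = "org/apache/comet/shaded/arrow/c/"; // assumed relocation target

    private ShadedArrowJarCheck() {}

    /** Lists .class entries in the jar whose path starts with the given directory prefix. */
    public static List<String> classesUnder(Path jar, String dir) {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.stream()
                    .map(e -> e.getName())
                    .filter(n -> n.startsWith(dir) && n.endsWith(".class"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path jar = Paths.get(args[0]); // path to the packaged comet-spark jar
        List<String> leaked = classesUnder(jar, UPSTREAM_DIR);
        List<String> shaded = classesUnder(jar, SHADED_DIR);
        if (!leaked.isEmpty()) {
            throw new AssertionError("classes published under upstream Arrow C package: " + leaked);
        }
        if (shaded.isEmpty()) {
            throw new AssertionError("no shaded Arrow C Data classes found under " + SHADED_DIR);
        }
        System.out.println("OK: " + shaded.size() + " shaded Arrow C Data classes, none under " + UPSTREAM_DIR);
    }
}
```

A check like this would run against the final packaged artifact, not the module's compiled classes, since the relocation only happens at shading time.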
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

