gada121982 opened a new issue, #10825:
URL: https://github.com/apache/gravitino/issues/10825

   ### Version
   
   main branch
   
   ### Describe what's wrong
   
   When `GravitinoSparkPlugin` is registered in a Spark session, Spark SQL's 
built-in DataSource shortcut syntax — `SELECT * FROM <format>.\`<path>\`` — 
fails with a server-side `IllegalArgumentException` instead of falling back to 
Spark's DataSource resolver.
   
   The shortcut applies to all built-in Spark formats (`parquet`, `csv`, 
`json`, `orc`, `text`, `avro`, `binaryFile`). Spark parses 
`<format>.\`<path>\`` as a multipart identifier whose first part is the format 
name. The Gravitino Spark Connector's `BaseCatalog` intercepts the lookup and 
forwards it to the Gravitino server. The server-side authorization filter then 
calls `MetadataObjects.of(TABLE, names)` which requires `names.length == 3` 
(catalog.schema.table); the multipart identifier has only 2 parts, so the check 
throws.
   
   Before the plugin is registered, the same query works in vanilla Spark 
because the analyzer's DataSource shortcut resolver handles the format name 
directly.
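
   To make the length mismatch concrete, here is a minimal, self-contained sketch (hypothetical class and method names, not the actual Gravitino sources) of why the server-side precondition rejects the shortcut: a TABLE metadata object must have exactly 3 name parts (catalog.schema.table), while the parsed shortcut has only 2:

```java
import java.util.Arrays;
import java.util.List;

public class ShortcutLengthSketch {

    // Stand-in for the precondition inside MetadataObjects.of(TABLE, names):
    // a TABLE metadata object must be catalog.schema.table, i.e. 3 parts.
    static void requireTableNameLength(List<String> names) {
        if (names.size() != 3) {
            throw new IllegalArgumentException(
                "If the type is TABLE, the length of names must be 3");
        }
    }

    public static void main(String[] args) {
        // A regular table reference resolves to 3 parts and passes.
        requireTableNameLength(Arrays.asList("catalog", "schema", "table"));

        // The shortcut SELECT * FROM parquet.`<path>` is parsed by Spark into
        // a 2-part multipart identifier: [format, path] -- so the check throws.
        List<String> shortcut = Arrays.asList("parquet", "s3a://bucket/file.parquet");
        try {
            requireTableNameLength(shortcut);
            System.out.println("unexpectedly passed");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

   Any fix presumably has to happen before this check: either the connector declines the lookup (so Spark's analyzer falls back to its file-source resolver) or the server tolerates non-3-part names for this case.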
   
   ### Error message and/or stacktrace
   
   \`\`\`
   Authorization failed due to system internal error. Please contact administrator.
   java.lang.IllegalArgumentException: If the type is TABLE, the length of names must be 3
       at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
       at org.apache.gravitino.MetadataObjects.of(MetadataObjects.java:116)
       at org.apache.gravitino.MetadataObjects.parse(MetadataObjects.java:184)
       at org.apache.gravitino.MetadataObjects.of(MetadataObjects.java:93)
       at org.apache.gravitino.utils.NameIdentifierUtil.toMetadataObject(NameIdentifierUtil.java:628)
       at org.apache.gravitino.server.authorization.expression.AuthorizationExpressionEvaluator...
       ...
       at org.apache.gravitino.spark.connector.catalog.BaseCatalog.loadGravitinoTable(BaseCatalog.java:437)
       at org.apache.gravitino.spark.connector.catalog.BaseCatalog.loadTable(BaseCatalog.java:245)
       at org.apache.spark.sql.connector.catalog.CatalogV2Util\$.getTable(CatalogV2Util.scala:363)
       at org.apache.spark.sql.catalyst.analysis.Analyzer\$ResolveRelations\$...
   \`\`\`
   
   ### How to reproduce
   
   1. Run a Gravitino server (main branch) with a metalake that has authorization enabled.
   2. Start a Spark 3.5 session with:
      \`\`\`
      spark.plugins=org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin
      spark.sql.gravitino.uri=<server-uri>
      spark.sql.gravitino.metalake=<metalake>
      \`\`\`
   3. Have any file readable by Spark at some path (local, s3a, gvfs, hdfs, etc.).
   4. Run:
      \`\`\`sql
      SELECT * FROM parquet.\`s3a://bucket/file.parquet\`;
      -- or csv, json, orc, text, avro — same failure
      \`\`\`
   
   Expected: Spark reads the file via \`HadoopFsRelation\`.
   
   Actual: \`IllegalArgumentException\` "If the type is TABLE, the length of names must be 3".
   
   ### Additional context
   
   The shortcut syntax is a long-standing public Spark API (since 2.0). 
Workarounds today are: (a) \`CREATE TEMPORARY VIEW ... USING <format> OPTIONS 
(path=...)\`, or (b) use the DataFrame API 
\`spark.read.format(...).load(path)\`. Both bypass the Gravitino catalog 
registration and succeed.
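
   For concreteness, workaround (a) looks roughly like this (the view name, bucket, and path below are placeholders for illustration):

```sql
-- Declare the file as a temporary view with an explicit USING clause;
-- resolution never goes through the Gravitino catalog, so it succeeds.
CREATE TEMPORARY VIEW tmp_parquet
USING parquet
OPTIONS (path 's3a://bucket/file.parquet');

SELECT * FROM tmp_parquet;
```

   Workaround (b) is the equivalent \`spark.read.format("parquet").load(path)\` call, which likewise bypasses catalog resolution.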
   
   Discovered against Gravitino 1.2.0 in a Spark Connect + Kyuubi shared engine 
setup using GVFS paths, but the bug is format-agnostic and path-agnostic — any 
built-in format + any path reproduces it.
