gada121982 opened a new issue, #10825:
URL: https://github.com/apache/gravitino/issues/10825
### Version
main branch
### Describe what's wrong
When `GravitinoSparkPlugin` is registered in a Spark session, Spark SQL's
built-in DataSource shortcut syntax — `SELECT * FROM <format>.\`<path>\`` —
fails with a server-side `IllegalArgumentException` instead of falling back to
Spark's DataSource resolver.
The shortcut applies to all built-in Spark formats (`parquet`, `csv`,
`json`, `orc`, `text`, `avro`, `binaryFile`). Spark parses
`<format>.\`<path>\`` as a multipart identifier whose first part is the format
name. The Gravitino Spark Connector's `BaseCatalog` intercepts the lookup and
forwards it to the Gravitino server. The server-side authorization filter then
calls `MetadataObjects.of(TABLE, names)` which requires `names.length == 3`
(catalog.schema.table); the multipart identifier has only 2 parts, so the check
throws.
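The failing check can be reproduced in isolation. The sketch below only mirrors the Guava-style precondition visible in the stack trace; it is not Gravitino's actual `MetadataObjects` code, and the helper name is made up for illustration:

\`\`\`java
public class TablePreconditionDemo {
    // Approximates the precondition seen in the stack trace:
    // MetadataObjects.of(TABLE, names) requires exactly 3 name parts
    // (catalog.schema.table). The 2-part DataSource shortcut identifier
    // (format, path) trips this check.
    static void checkTableNames(String[] names) {
        if (names.length != 3) {
            throw new IllegalArgumentException(
                "If the type is TABLE, the length of names must be 3");
        }
    }
}
\`\`\`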
Before the plugin is registered, the same query works in vanilla Spark
because the analyzer's DataSource shortcut resolver handles the format name
directly.
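One possible connector-side mitigation is to detect the shortcut pattern before forwarding the lookup, and signal "no such table" so Spark's analyzer falls back to its DataSource resolver. This is only a sketch under assumptions: the helper name, the format list, and the idea of short-circuiting in `BaseCatalog.loadTable` are all hypothetical, not existing Gravitino API:

\`\`\`java
import java.util.Locale;
import java.util.Set;

public class ShortcutGuard {
    // Spark's built-in file formats accepted by the <format>.`<path>` shortcut.
    private static final Set<String> BUILTIN_FORMATS =
        Set.of("parquet", "csv", "json", "orc", "text", "avro", "binaryfile");

    // True when a multipart identifier looks like the DataSource shortcut:
    // exactly two parts, the first being a built-in format name. A connector
    // could use such a check to throw NoSuchTableException early instead of
    // forwarding the 2-part identifier to the Gravitino server.
    static boolean isDataSourceShortcut(String[] nameParts) {
        return nameParts.length == 2
            && BUILTIN_FORMATS.contains(nameParts[0].toLowerCase(Locale.ROOT));
    }
}
\`\`\`

A false result for ordinary `schema.table` identifiers keeps normal Gravitino lookups unaffected; only the format-prefixed 2-part pattern is diverted back to Spark.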
### Error message and/or stacktrace
\`\`\`
Authorization failed due to system internal error. Please contact administrator.
java.lang.IllegalArgumentException: If the type is TABLE, the length of names must be 3
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
    at org.apache.gravitino.MetadataObjects.of(MetadataObjects.java:116)
    at org.apache.gravitino.MetadataObjects.parse(MetadataObjects.java:184)
    at org.apache.gravitino.MetadataObjects.of(MetadataObjects.java:93)
    at org.apache.gravitino.utils.NameIdentifierUtil.toMetadataObject(NameIdentifierUtil.java:628)
    at org.apache.gravitino.server.authorization.expression.AuthorizationExpressionEvaluator...
    ...
    at org.apache.gravitino.spark.connector.catalog.BaseCatalog.loadGravitinoTable(BaseCatalog.java:437)
    at org.apache.gravitino.spark.connector.catalog.BaseCatalog.loadTable(BaseCatalog.java:245)
    at org.apache.spark.sql.connector.catalog.CatalogV2Util$.getTable(CatalogV2Util.scala:363)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$...
\`\`\`
### How to reproduce
1. Start a Gravitino server (main branch) with a metalake that has authorization enabled.
2. Start a Spark 3.5 session configured with:
\`\`\`
spark.plugins=org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin
spark.sql.gravitino.uri=<server-uri>
spark.sql.gravitino.metalake=<metalake>
\`\`\`
3. Make any file accessible to Spark (local, s3a, gvfs, hdfs, etc.).
4. Run:
\`\`\`sql
SELECT * FROM parquet.\`s3a://bucket/file.parquet\`;
-- or csv, json, orc, text, avro — same failure
\`\`\`
Expected: Spark reads the file via \`HadoopFsRelation\`.
Actual: \`IllegalArgumentException\` "If the type is TABLE, the length of names must be 3".
### Additional context
The shortcut syntax is a long-standing public Spark API (since 2.0).
Workarounds today are: (a) \`CREATE TEMPORARY VIEW ... USING <format> OPTIONS
(path=...)\`, or (b) use the DataFrame API
\`spark.read.format(...).load(path)\`. Both bypass the Gravitino catalog
registration and succeed.
Discovered against Gravitino 1.2.0 in a Spark Connect + Kyuubi shared engine
setup using GVFS paths, but the bug is format-agnostic and path-agnostic — any
built-in format + any path reproduces it.