gada121982 opened a new pull request, #10826: URL: https://github.com/apache/gravitino/pull/10826
### What changes were proposed in this pull request?

Short-circuit `BaseCatalog.loadTable` and `BaseCatalog.tableExists` with `NoSuchTableException` when the identifier's namespace is a single built-in Spark DataSource format name (`parquet`, `csv`, `json`, `orc`, `text`, `avro`, `binaryfile`). When the connector declines these lookups, the Spark analyzer falls back to its DataSource shortcut resolver and builds a `HadoopFsRelation` directly, which is the same path vanilla Spark takes. (A rough sketch of the check is included after the testing notes below.)

Lookups against regular Gravitino catalogs and tables are unaffected: they arrive with a namespace that references a registered catalog, not a format name, so they continue through the existing `loadGravitinoTable` path and the usual authorization filter.

### Why are the changes needed?

`` SELECT * FROM parquet.`path` `` is a long-standing Spark built-in syntax. Today, when `GravitinoSparkPlugin` is registered, the analyzer sends `(namespace=[parquet], name=<path>)` to `BaseCatalog.loadTable`, which forwards it to the server. The server-side authorization filter calls `MetadataObjects.of(TABLE, names)`, which requires `names.length == 3` (catalog.schema.table), so every such query fails with `IllegalArgumentException: If the type is TABLE, the length of names must be 3`. Users expect Spark's built-in shortcut to keep working when the connector is installed; today it does not.

Fix: #10825

### Does this PR introduce _any_ user-facing change?

Yes. After this PR, the following queries work when `GravitinoSparkPlugin` is enabled:

```sql
SELECT * FROM parquet.`s3a://bucket/file.parquet`;
SELECT * FROM csv.`/data/file.csv`;
SELECT * FROM json.`hdfs:///events.json`;
SELECT * FROM orc.`/data/file.orc`;
SELECT * FROM text.`/data/file.txt`;
SELECT * FROM avro.`/data/file.avro`;
SELECT * FROM binaryFile.`/data/folder/`;
```

Previously, all of these failed at the server-side 3-part-name assertion. There are no API or property-key changes, and no behavior change for user Gravitino tables.

### How was this patch tested?

- New `TestBaseCatalog` covering six cases for the new `isBuiltinDataSourceReference` helper: built-in formats recognized, case-insensitive match, regular schema namespaces ignored, multipart namespaces ignored, empty namespaces ignored, unknown formats ignored.
- `./gradlew :spark-connector:spark-common:test --tests org.apache.gravitino.spark.connector.catalog.TestBaseCatalog` → all tests pass.
- `./gradlew :spark-connector:spark-common:spotlessCheck` → clean.
- End-to-end manual verification on Spark 3.5.8 + Gravitino 1.2.0 with GVFS: `` SELECT * FROM parquet.`gvfs://fileset/.../file.parquet` `` now returns rows, while `SELECT * FROM <gravitino_iceberg_cat>.db.t` remains unaffected.
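For readers skimming without the diff, here is a minimal, hypothetical Java sketch of the short-circuit described above. Apart from `isBuiltinDataSourceReference`, which is the helper named in the testing notes, the names here (`BuiltinFormatShortCircuitSketch`, `BUILTIN_FORMATS`, `loadGravitinoTable`) are illustrative assumptions, and the merged connector code may differ:

```java
// A minimal, hypothetical sketch only. The class name, BUILTIN_FORMATS constant, and the
// loadGravitinoTable stand-in are assumptions; only isBuiltinDataSourceReference matches
// the helper named in this PR, and the merged connector code may differ.
import java.util.Locale;
import java.util.Set;
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.Table;

abstract class BuiltinFormatShortCircuitSketch {

  // Built-in Spark DataSource format names that can appear as the "namespace"
  // of a format.`path` shortcut query.
  private static final Set<String> BUILTIN_FORMATS =
      Set.of("parquet", "csv", "json", "orc", "text", "avro", "binaryfile");

  // True when the identifier looks like parquet.`/some/path`: a single-level
  // namespace whose only element is a built-in format name (case-insensitive).
  static boolean isBuiltinDataSourceReference(Identifier ident) {
    String[] namespace = ident.namespace();
    return namespace.length == 1
        && BUILTIN_FORMATS.contains(namespace[0].toLowerCase(Locale.ROOT));
  }

  // tableExists would apply the same guard and simply return false for these identifiers.
  public Table loadTable(Identifier ident) throws NoSuchTableException {
    // Decline the lookup so Spark's analyzer falls back to its built-in DataSource
    // shortcut resolver instead of sending a 2-part name to the Gravitino server.
    if (isBuiltinDataSourceReference(ident)) {
      throw new NoSuchTableException(ident.namespace()[0], ident.name());
    }
    return loadGravitinoTable(ident);
  }

  // Stand-in for the existing Gravitino-backed lookup path.
  protected abstract Table loadGravitinoTable(Identifier ident) throws NoSuchTableException;
}
```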

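A correspondingly hypothetical JUnit-style outline of the six helper cases listed under testing; the test method names and the `check` helper are assumptions, and the real `TestBaseCatalog` may be organized differently:

```java
// Hypothetical outline of the six cases described in the testing notes; not the actual
// TestBaseCatalog. It exercises the sketch class shown above.
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.apache.spark.sql.connector.catalog.Identifier;
import org.junit.jupiter.api.Test;

class BuiltinDataSourceReferenceSketchTest {

  private static boolean check(String[] namespace, String name) {
    return BuiltinFormatShortCircuitSketch.isBuiltinDataSourceReference(
        Identifier.of(namespace, name));
  }

  @Test
  void builtinFormatsAreRecognized() {
    assertTrue(check(new String[] {"parquet"}, "/data/file.parquet"));
    assertTrue(check(new String[] {"csv"}, "/data/file.csv"));
  }

  @Test
  void matchIsCaseInsensitive() {
    assertTrue(check(new String[] {"Parquet"}, "/data/file.parquet"));
    assertTrue(check(new String[] {"binaryFile"}, "/data/folder/"));
  }

  @Test
  void regularSchemaNamespacesAreIgnored() {
    assertFalse(check(new String[] {"db"}, "my_table"));
  }

  @Test
  void multipartNamespacesAreIgnored() {
    assertFalse(check(new String[] {"catalog", "parquet"}, "t"));
  }

  @Test
  void emptyNamespacesAreIgnored() {
    assertFalse(check(new String[] {}, "t"));
  }

  @Test
  void unknownFormatsAreIgnored() {
    assertFalse(check(new String[] {"xml"}, "/data/file.xml"));
  }
}
```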