ametel01 opened a new issue, #23171:
URL: https://github.com/apache/datafusion/issues/23171
### Describe the bug
When importing a Substrait physical plan containing `ReadRel.LocalFiles`,
DataFusion builds a Parquet scan against `ObjectStoreUrl::local_filesystem()`
and copies the plan-provided `UriPath` / `UriPathGlob` / `UriFile` /
`UriFolder` value into `ObjectMeta.location` without a host-supplied root or
object-store policy check.
In embedding hosts that accept Substrait physical plans from lower-trust
callers, this can allow an imported plan to read process-local Parquet files
outside the host's intended dataset roots.
Relevant code path on current `main`:
- `datafusion/substrait/src/physical_plan/consumer.rs`:
  `FileScanConfigBuilder::new(ObjectStoreUrl::local_filesystem(), ...)`
- `datafusion/substrait/src/physical_plan/consumer.rs`: cloned Substrait
  path becomes `ObjectMeta { location: path.into(), ... }`
- the configured scan is returned as `DataSourceExec::from_data_source(...)`
### To Reproduce
1. Import a Substrait physical plan using `ReadRel.LocalFiles` for a Parquet
   read.
2. Set the file path in the serialized plan to a local path selected by the
   plan submitter.
3. Execute the returned physical plan in a host process that accepts the
   imported plan.
I am intentionally not including a full payload in the public issue. The
static source path above is enough to identify the behavior.
### Expected behavior
Imported physical plans should not be able to directly choose arbitrary
process-local filesystem paths unless the embedding host explicitly supplies
that policy. Possible fixes include rejecting absolute/traversing paths,
requiring an allowlisted root or object-store binding during physical plan
import, or resolving imported file references through registered
catalog/object-store policy rather than hard-coding the local filesystem.
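
As a rough illustration of the "reject absolute/traversing paths and require an allowlisted root" option, here is a minimal sketch using only the Rust standard library. The helper `resolve_under_root` and the root `/srv/datasets` are hypothetical names for this example, not existing DataFusion APIs. It assumes Unix-style paths and that any URI scheme has already been stripped from the plan-provided value. The check is purely lexical, so it does not defend against symlinks inside the root.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not a DataFusion API): resolve a plan-supplied
/// relative path against a host-chosen root, rejecting anything that could
/// lexically escape that root.
pub fn resolve_under_root(root: &Path, plan_path: &str) -> Result<PathBuf, String> {
    let mut resolved = root.to_path_buf();
    let mut saw_component = false;

    for component in Path::new(plan_path).components() {
        match component {
            // Ordinary file or directory name: append it under the root.
            Component::Normal(part) => {
                resolved.push(part);
                saw_component = true;
            }
            // A leading `.` is harmless and is skipped.
            Component::CurDir => {}
            // `..` could climb out of the root, so reject it outright.
            Component::ParentDir => {
                return Err(format!("path traversal rejected: {}", plan_path));
            }
            // Absolute paths (or Windows prefixes) bypass the root entirely.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute path rejected: {}", plan_path));
            }
        }
    }

    if !saw_component {
        return Err("empty path rejected".to_string());
    }
    Ok(resolved)
}
```

Under these assumptions, a host would call something like `resolve_under_root(Path::new("/srv/datasets"), plan_path)` before constructing the file location, and fail the import on `Err`. A production fix would more likely route the reference through a registered object store, as suggested above, rather than rely on a lexical check alone.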
### Additional context
This came from a local security review of Apache DataFusion at revision
`38269f9c0cf1a80897aee588ea2daebe0aba4f6b`. The impact depends on an embedding
host exposing Substrait physical plan import across a trust boundary;
DataFusion itself does not ship a standalone server/auth boundary in this
repository.