ametel01 opened a new issue, #23171:
URL: https://github.com/apache/datafusion/issues/23171

   ### Describe the bug
   
   When importing a Substrait physical plan containing `ReadRel.LocalFiles`, 
DataFusion builds a Parquet scan against `ObjectStoreUrl::local_filesystem()` 
and copies the plan-provided `UriPath` / `UriPathGlob` / `UriFile` / 
`UriFolder` value into `ObjectMeta.location` without a host-supplied root or 
object-store policy check.
   
   In embeddings that accept Substrait physical plans from lower-trust callers, 
this can allow the imported plan to select process-local Parquet files outside 
the host's intended dataset roots.
   
   Relevant code path on current `main`:
   
   - `datafusion/substrait/src/physical_plan/consumer.rs`: 
`FileScanConfigBuilder::new(ObjectStoreUrl::local_filesystem(), ...)`
   - `datafusion/substrait/src/physical_plan/consumer.rs`: cloned Substrait 
path becomes `ObjectMeta { location: path.into(), ... }`
   - the configured scan is returned as `DataSourceExec::from_data_source(...)`
   
   ### To Reproduce
   
   1. Import a Substrait physical plan using `ReadRel.LocalFiles` for a Parquet 
read.
   2. Set the file path in the serialized plan to a local path selected by the 
plan submitter.
   3. Execute the returned physical plan in a host process that accepts the 
imported plan.
   
   I am intentionally not including a full payload in this public issue. The 
code path listed above is enough to identify the behavior.
   
   ### Expected behavior
   
   Imported physical plans should not be able to directly choose arbitrary 
process-local filesystem paths unless the embedding host explicitly supplies 
that policy. Possible fixes include rejecting absolute/traversing paths, 
requiring an allowlisted root or object-store binding during physical plan 
import, or resolving imported file references through registered 
catalog/object-store policy rather than hard-coding the local filesystem.
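
   As a rough illustration of the first two options (rejecting 
absolute/traversing paths and requiring an allowlisted root), here is a 
minimal sketch using only the Rust standard library. The function name 
`resolve_under_root` and the idea of passing a host-supplied `root` are 
assumptions made for this example; they are not existing DataFusion APIs, and 
a real fix would likely need to handle URI schemes, globs, and symlinks as 
well.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not part of DataFusion): map a plan-provided path
/// onto a host-supplied root, rejecting anything that could escape it.
///
/// This is a purely lexical check. It does not touch the filesystem, so it
/// does not detect symlinks that point outside `root`.
fn resolve_under_root(root: &Path, plan_path: &str) -> Result<PathBuf, String> {
    let mut resolved = root.to_path_buf();

    for component in Path::new(plan_path).components() {
        match component {
            // Ordinary path segment: append it below the root.
            Component::Normal(part) => resolved.push(part),
            // "." does not change the location, so skip it.
            Component::CurDir => {}
            // ".." could climb out of the root, so reject the whole path.
            Component::ParentDir => {
                return Err(format!("path traversal rejected: {plan_path}"));
            }
            // A leading "/" (or a Windows drive/UNC prefix) means the plan
            // is trying to pick an absolute location.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute path rejected: {plan_path}"));
            }
        }
    }

    // An empty or "."-only path would resolve to the root itself.
    if resolved == root {
        return Err(String::from("empty path rejected"));
    }

    Ok(resolved)
}
```

   With a root of `/data/warehouse`, a plan path such as 
`sales/2024.parquet` would resolve below the root, while `/etc/passwd` or 
`../secrets.parquet` would be rejected before any `ObjectMeta` is built. The 
design choice here is to fail closed: anything that is not a plain relative 
segment is treated as an error rather than normalized.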
   
   ### Additional context
   
   This came from a local security review of Apache DataFusion at revision 
`38269f9c0cf1a80897aee588ea2daebe0aba4f6b`. The impact depends on an embedding 
host exposing Substrait physical plan import across a trust boundary; 
DataFusion itself does not ship a standalone server/auth boundary in this 
repository.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

