andygrove opened a new pull request, #1588:

URL: https://github.com/apache/datafusion-ballista/pull/1588
# Which issue does this PR close?

<!-- No tracking issue; purely a docs improvement. -->

Closes #.

# Rationale for this change

The contributor guide currently has only a single bullet about the Python bindings under `code-organization.md`. That bullet links to `python/src/context.rs`, a file that no longer exists (the current files are `lib.rs`, `cluster.rs`, and `utils.rs`). There is no contributor-facing explanation of how the wheel actually works, even though the design is non-obvious. The Python package depends on `datafusion-python` and intercepts `SessionContext` via a metaclass to return a `DistributedDataFrame`. It only crosses into Ballista at execution time, when it serializes the locally built logical plan and ships it to a fresh `SessionContext::remote_with_state`. The known limitations listed in `python/README.md` are direct consequences of that design, but the connection isn't documented anywhere.

# What changes are included in this PR?

- New `docs/source/contributors-guide/python-client.md` describing the crate/package layout, the metaclass + bridge mechanism, the cluster lifecycle helpers (`BallistaScheduler`, `BallistaExecutor`, `setup_test_cluster`), and how each documented limitation maps back to a specific piece of the design. It cross-links to `architecture.md` and the relevant tracking issues (#1142, #173).
- Fixed the broken `python/src/context.rs` link in `code-organization.md` and pointed the PyBallista section at the files that exist today, plus the new design page.
- Added the new page to the contributors-guide toctree in `docs/source/index.rst`.

This PR only touches contributor guide content; the user guide is being improved separately.

# Are there any user-facing changes?

No code changes; documentation only.
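The bridge mechanism described in the rationale (intercept `SessionContext`, return a `DistributedDataFrame`, cross over only at execution time) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PyBallista implementation. `SessionContext` and `DataFrame` here are toy stand-ins that only record plan steps; they are not the `datafusion-python` classes. `DistributingMeta`, `DistributedSessionContext`, the `submit` callback and `fake_scheduler` are hypothetical names. JSON stands in for the real logical-plan serialization and for the hand-off to `SessionContext::remote_with_state`.

```python
import functools
import json
from typing import Any, Callable, Dict, List


# Toy stand-ins (NOT the datafusion-python classes): they only record plan steps.
class DataFrame:
    def __init__(self, plan: List[Dict[str, Any]]):
        self.plan = plan

    def filter(self, predicate: str) -> "DataFrame":
        return DataFrame(self.plan + [{"op": "filter", "predicate": predicate}])

    def limit(self, n: int) -> "DataFrame":
        return DataFrame(self.plan + [{"op": "limit", "n": n}])


class SessionContext:
    def sql(self, query: str) -> DataFrame:
        return DataFrame([{"op": "sql", "query": query}])


class DistributedDataFrame:
    """Wraps a local DataFrame; plan building stays on the client."""

    def __init__(self, inner: DataFrame, submit: Callable[[bytes], Any]):
        self._inner = inner
        self._submit = submit

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not defined here, e.g. filter/limit.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def rewrap(*args: Any, **kwargs: Any) -> Any:
            result = attr(*args, **kwargs)
            # Keep chained transformations inside the distributed wrapper.
            if isinstance(result, DataFrame):
                return DistributedDataFrame(result, self._submit)
            return result

        return rewrap

    def collect(self) -> Any:
        # The only point that leaves the client: serialize the locally
        # built plan and hand it to the (hypothetical) remote submitter.
        payload = json.dumps(self._inner.plan).encode()
        return self._submit(payload)


def _return_distributed(method: Callable) -> Callable:
    @functools.wraps(method)
    def wrapper(self, *args: Any, **kwargs: Any) -> Any:
        result = method(self, *args, **kwargs)
        if isinstance(result, DataFrame):
            return DistributedDataFrame(result, self.submit)
        return result

    return wrapper


class DistributingMeta(type):
    """Hypothetical metaclass: wraps public methods inherited from the direct
    bases so that any DataFrame they return becomes a DistributedDataFrame."""

    def __new__(mcls, name, bases, namespace):
        for base in bases:
            for attr_name, attr in vars(base).items():
                if (callable(attr) and not attr_name.startswith("_")
                        and attr_name not in namespace):
                    namespace[attr_name] = _return_distributed(attr)
        return super().__new__(mcls, name, bases, namespace)


class DistributedSessionContext(SessionContext, metaclass=DistributingMeta):
    def __init__(self, submit: Callable[[bytes], Any]):
        self.submit = submit


def fake_scheduler(payload: bytes) -> List[str]:
    # Stand-in for the remote side: decode the plan and list its steps.
    return [step["op"] for step in json.loads(payload)]


ctx = DistributedSessionContext(submit=fake_scheduler)
df = ctx.sql("SELECT * FROM t").filter("a > 1").limit(10)
print(type(df).__name__)  # DistributedDataFrame
print(df.collect())       # ['sql', 'filter', 'limit']
```

In this sketch, chaining `filter` and `limit` never calls `submit`. Only `collect()` serializes the accumulated plan and crosses the client boundary, which mirrors the "only crosses into Ballista at execution time" behavior described in the rationale.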
