timsaucer opened a new pull request, #1505: URL: https://github.com/apache/datafusion-python/pull/1505
# Which issue does this PR close?

Part of #1394 (implements PR 4 of the plan).

# Rationale for this change

#1394 tracks making datafusion-python legible to AI coding assistants without breaking the experience for humans browsing the docs. Earlier PRs shipped the repo-root `SKILL.md` (#1497), enriched module docstrings and doctests (#1498), added a README section pointing agents at the skill (#1503), and rewrote the TPC-H examples in idiomatic DataFrame form (#1504).

This PR fills in the docs-site layer: a machine-readable entry point for LLM tooling, a short human-written page explaining how to wire up an AI assistant, and two contributor-facing skills that agents working on this repo can pick up. It also relocates the pattern demos that #1504 removed from the TPC-H queries (CASE filtering, array-based membership, UDF-vs-expression predicates, `array_agg` with filter) into the common-operations docs, so those teaching examples still live somewhere concrete.

# What changes are included in this PR?

- `docs/source/llms.txt`: an [llmstxt.org](https://llmstxt.org) entry point, copied verbatim to the site root via `html_extra_path`. Categorized links to the skill, user guide, DataFrame API reference, and TPC-H examples.
- `docs/source/ai-coding-assistants.rst`: a short human-written page mirroring the README section added in #1503. Explains what the skill is, how to install it (`npx skills add apache/datafusion-python` or a manual `AGENTS.md` / `CLAUDE.md` pointer), and what it covers. Wired into the User Guide toctree.
- `.ai/skills/write-dataframe-code/SKILL.md`: a contributor skill layered on top of the repo-root `SKILL.md`. Adds a TPC-H pattern index (which query demonstrates which API), the plan-comparison diagnostic workflow for translating SQL to DataFrame form, and the project-specific docstring conventions.
- `.ai/skills/audit-skill-md/SKILL.md`: a contributor skill that cross-references `SKILL.md` against the current public Python surface (functions module, `DataFrame`, `Expr`, `SessionContext`, package-root re-exports) and reports new APIs needing coverage and stale mentions. Diff-only; does not auto-edit.
- `AGENTS.md` (symlinked as `CLAUDE.md`): lists the three contributor skills and documents the plan-comparison diagnostic workflow.
- `docs/source/user-guide/common-operations/expressions.rst`: adds a "Testing membership in a list" section comparing `|`-compound filters, `in_list`, and `array_position` / `make_array`, plus a "Conditional expressions" section contrasting switched and searched `case`.
- `docs/source/user-guide/common-operations/udf-and-udfa.rst`: adds a "When not to use a UDF" subsection showing the compound-OR predicate that replaces a Python-side UDF for disjunctive bucket filters (the Q19 case).
- `docs/source/user-guide/common-operations/aggregations.rst`: adds a "Building per-group arrays" subsection covering `array_agg(filter=..., distinct=True)` with `array_length` and `array_element` for the single-value-per-group pattern (the Q21 case).
- `examples/array-operations.py`: a runnable end-to-end walkthrough of the membership and `array_agg` patterns. Linked from `examples/README.md`.

Verified with `pre-commit run --all-files` and `sphinx-build -W --keep-going` against the full docs tree.

# Are there any user-facing changes?

Yes, docs-only:

- New docs-site page: `ai-coding-assistants.html`, reachable from the User Guide sidebar.
- New docs-site asset: `llms.txt` served at the site root (`datafusion.apache.org/python/llms.txt`).
- New common-operations content (membership tests, conditional expressions, UDF guidance, `array_agg` patterns).
- New example file `examples/array-operations.py`.

No public Python API is added, changed, or removed.

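
For readers unfamiliar with how `llms.txt` ends up at the site root, the fragment below is a minimal sketch of the relevant Sphinx setting. It assumes `conf.py` sits in `docs/source/` next to `llms.txt`; the value shown is illustrative and may not match the exact list in this PR's `conf.py`.

```python
# Sketch of the relevant part of docs/source/conf.py (assumed layout).
# html_extra_path is a standard Sphinx option: every file or directory listed
# here is copied as-is into the root of the built HTML output.
# Paths are interpreted relative to the directory that contains conf.py.
html_extra_path = ["llms.txt"]  # assumed value; the real conf.py may list more entries
```

With a setting like this, `sphinx-build` would place `llms.txt` beside `index.html` in the output directory, which is what makes it reachable at the site-root URL listed above.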
🤖 Generated with [Claude Code](https://claude.com/claude-code)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
