Hi everyone,

First, I want to give credit where it is due: I am very glad our ongoing
discussions about automated code quality on this mailing list directly led
to the community taking action to formalize agent instructions. Thank you
to Wenchen for opening PR 54899 (SPARK-56074) to introduce AGENTS.md and
CLAUDE.md to the repository.

I intentionally paused my replies to this thread over the last few weeks. I
knew that arguing theory wouldn't get us anywhere, so I decided to wait and
use this live PR as an experiment to definitively answer the questions
raised by Dongjoon, Jungtaek, and Holden.

Dongjoon and Jungtaek, you both mentioned that our manual,
human-in-the-loop review process is enough to catch bad code, and that
active PMC members using these productivity tools aren't making mistakes.
Let's look at the actual data from PR 54899, which was recently merged and
cherry-picked.

This PR was incredibly small: exactly 83 lines of changes, with 76
additions and 7 deletions. It was highly visible and manually reviewed by
17 of our most senior core and PMC members, including Dongjoon,
steveloughran, zhengruifeng, szehon-ho, HeartSaVioR, and others.

Despite 17 senior reviewers closely analyzing a tiny 83-line text file, it
slipped right through and shipped with critical structural bugs that
actively break the very automated tools the file was designed to guide:

1. The Dead-End Loop: While the file does contain some inline SBT commands
higher up, the reference links section at the bottom explicitly tells
automated tools to read docs/building-spark.md and look for the "Running
Individual Tests" section to figure out how to test the code. However, if
you actually look at that section in the documentation, it does not contain
the execution commands; it just redirects you to the developer-tools.html
web page. We just sent every automated tool into a dead-end reading loop.

2. Missing Inline Scripts and Delegation: The PR's stated goal was to
provide "inline build/test commands" rather than just linking to docs. Yet
the configuration completely omits the critical dev/connect-gen-protos.sh
script required for Spark Connect testing. Instead, it delegates
instructions to a subdirectory README
(sql/connect/common/src/main/protobuf/). This directly contradicts the
PR's own architectural goal, forcing tools to hunt through the directory
tree for execution paths rather than giving them the actionable command
upfront.

If 17 of our most experienced PMC members missed these structural bugs in
an 83-line plain-text file, how are we going to catch them when
contributors start submitting 1,500-5,000-line PRs touching the core
Catalyst optimizer?

Human reviewers read code like humans; we simply do not catch the
structural issues that trip up automated systems. A recent ETH Zurich study
(arXiv 2602.11988 <https://arxiv.org/abs/2602.11988>) published in February
demonstrated exactly this: feeding automated tools bad or unnecessary
context files increases inference costs by over 20% and reduces task
success rates by 3%.

Holden, you mentioned wanting to wait until we are actually impacted by a
flood of automated slop before implementing checks. Unfortunately, we no
longer have that time. On March 31, 2026, Anthropic accidentally leaked
over 512,000 lines of their Claude Code TypeScript source via an npm source
map error. The whole world now has the blueprint to build highly
autonomous tools. Furthermore, Anthropic just ran a massive usage promotion
doubling Claude token limits, which ended on March 28, and committers are
utilizing generous quotas on the Google Antigravity Ultra plan.

In the next few months, we are going to see an absolute flood of
machine-generated code hitting our queues. If we do not add the AIV gate
now, I guarantee that within a year our codebase will be riddled with these
invisible, machine-breaking bugs.

Jungtaek, to answer your concerns about accuracy and false positives, the
AIV Gate uses deterministic AST parsing rather than subjective guessing. It
acts as an objective linter, catching the structural errors that humans
miss, such as missing inline bash blocks or dead-end references.
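To make the idea concrete, here is a minimal sketch of the kind of
deterministic check I have in mind. The function name, rule names, and
regex heuristics are mine and purely illustrative; a production gate would
walk a real Markdown AST via a parser library rather than pattern-match,
but this shows how "missing inline shell block" and "link out to another
doc" become mechanical, objective findings:

```python
import re

def check_agent_file(text):
    """Illustrative structural checks for an AGENTS.md-style file.

    Returns a list of finding strings; the rule names are hypothetical.
    """
    findings = []

    # Rule 1: the file should carry at least one inline shell code block,
    # rather than only linking out to other documents for commands.
    if not re.search(r"```(?:bash|sh|shell)\b", text):
        findings.append("missing-inline-shell-block")

    # Rule 2: flag links that point at other local .md docs; each one is a
    # hop an automated tool must follow, and a candidate dead-end reference.
    for target in re.findall(r"\]\(([^)#]+\.md)[^)]*\)", text):
        findings.append(f"doc-redirect:{target}")

    return findings
```

In shadow mode, findings like these would simply be attached to the PR as
labels; nothing would ever block a merge.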

As a compromise that ensures zero disruption to the current workflow, I
propose we implement this in a phased approach:

 Phase 1 (Shadow Mode): We deploy the AIV Gate as a non-blocking CI job. It
will simply flag PRs and append a JIRA label for "Automated Slop" or
"Structural Error" based on its AST parsing. This gives us hard data on
accuracy and volume without blocking a single merge.

 Phase 2 (Active Enforcement): Once the PMC reviews the Phase 1 data and we
agree the false-positive rate is near zero, we graduate the gate to an
active check that blocks violating code.

Let's turn on Phase 1 and let the data speak for itself.

Regards,
Viquar Khan
