Sorry, at what point would the AIV gate have helped with PR #54899? It's hard
to understand why you chose that PR, because it is arguably not a code change
but an INFRA change, and I don't see how the proposal would ever work with
such a change.

I believe I have provided the same input twice now, and I think it still
stands. If you want to prove your project, you should go over a bunch of
existing PRs yourself instead of pushing the community to give it a try. That
exercise wouldn't be limited to Apache Spark - you are narrowing the scope of
your project's potential benefit, since if it is useful at all, it should not
be useful only for Apache Spark. If you went over hundreds to thousands of
PRs and surfaced dozens of real issues, you could get traction from many OSS
communities (not just Apache Spark, and not just ASF projects). I don't
really understand the direction; why don't you draw a bigger picture?


On Thu, Apr 2, 2026 at 4:29 PM vaquar khan <[email protected]> wrote:

> Hi everyone,
>
> First, I want to give credit where it is due: I am very glad that our
> ongoing discussions about automated code quality on this mailing list
> directly led the community to take action and formalize agent instructions.
> Thank you to Wenchen for opening PR 54899 (SPARK-56074) to introduce
> AGENTS.md and CLAUDE.md to the repository.
>
> I intentionally paused my replies to this thread over the last few weeks.
> I knew that arguing theory wouldn't get us anywhere, so I decided to wait
> and use this live PR as an experiment to definitively answer the questions
> raised by Dongjoon, Jungtaek, and Holden.
>
> Dongjoon and Jungtaek, you both mentioned that our manual,
> human-in-the-loop review process is enough to catch bad code, and that
> active PMC members using these productivity tools aren't making mistakes.
> Let's look at the actual data from PR 54899, which was recently merged and
> cherry-picked.
>
> This PR was incredibly small: exactly 83 lines of changes, with 76
> additions and 7 deletions. It was highly visible and manually reviewed by
> 17 of our most senior core and PMC members, including Dongjoon,
> steveloughran, zhengruifeng, szehon-ho, HeartSaVioR, and others.
>
> Despite being heavily analyzed by 17 senior reviewers, this tiny 83-line
> text file slipped right through and shipped with critical structural bugs
> that actively break the very automated tools it was designed to guide:
>
> 1. The Dead-End Loop: While the file does contain some inline SBT commands
> higher up, the reference links section at the bottom explicitly tells
> automated tools to read docs/building-spark.md and look for the "Running
> Individual Tests" section to figure out how to test the code. However, if
> you actually look at that section of the documentation, it does not contain
> the execution commands; it just redirects you to the developer-tools.html
> web page. We have just sent every automated tool down a dead-end chain of
> references.
>
> 2. Missing Inline Scripts & Delegation: The PR's stated goal was to
> provide "inline build/test commands" rather than just linking to docs.
> Yet the configuration completely omits the critical
> dev/connect-gen-protos.sh script required for Spark Connect testing.
> Instead, it delegates instructions to a subdirectory README
> (sql/connect/common/src/main/protobuf/). This directly contradicts the
> PR's own architectural goal, forcing tools to hunt through the directory
> tree for execution paths rather than giving them the actionable command
> upfront.
>
> If 17 of our most experienced PMC members missed these structural bugs in
> an 83-line plain-text file, how are we going to catch them when
> contributors start submitting 1,500-5,000-line PRs touching the core
> Catalyst optimizer?
>
> Human reviewers read code like humans; we simply do not catch the
> structural issues that trip up automated systems. A recent ETH Zurich study
> (arxiv 2602.11988 <https://arxiv.org/abs/2602.11988>) published in
> February showed exactly this: feeding automated tools bad or unnecessary
> context files increases inference costs by over 20% and reduces task
> success rates by 3%.
>
> Holden, you mentioned wanting to wait until we are actually impacted by a
> flood of automated slop before implementing checks. Unfortunately, we no
> longer have that time. On March 31, 2026, Anthropic accidentally leaked
> over 512,000 lines of their Claude Code TypeScript source via an npm source
> map error. The whole world now has the blueprint to build highly autonomous
> tools. Furthermore, Claude just ran a massive usage promotion doubling
> token limits that ended on March 28, and committers are utilizing generous
> quotas on the Google Antigravity Ultra plan.
>
> In the next few months, we are going to see an absolute flood of
> machine-generated code hitting our queues. If we do not add the AIV Gate
> now, I guarantee that within the next year our entire codebase will be
> riddled with these invisible, machine-breaking bugs.
>
> Jungtaek, to answer your concerns about accuracy and false positives, the
> AIV Gate uses deterministic AST parsing rather than subjective guessing. It
> acts as an objective linter, catching structural errors like missing inline
> bash blocks or dead-end references that humans miss.
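>
> To make the "objective linter" idea concrete, here is a minimal sketch in
> Python of the kind of deterministic check I have in mind. To be clear, this
> is illustrative only: the function name and heuristics below are my own
> invention for this email, not the actual AIV Gate implementation.

```python
import re

# Build the fence marker programmatically to avoid embedding a literal
# triple-backtick fence inside this example.
TICKS = "`" * 3
FENCED_BASH = re.compile(TICKS + r"(?:bash|sh|shell)\n.*?" + TICKS, re.DOTALL)
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def lint_agent_file(text):
    """Deterministic structural checks for an agent-instruction file.

    No model involved, just parsing:
    - warn if the file links out to docs but contains no inline fenced
      bash block (tools get references instead of runnable commands);
    - warn about links to rendered .html pages, which dead-end an
      automated reader working inside a source checkout.
    """
    issues = []
    links = MD_LINK.findall(text)
    if links and not FENCED_BASH.search(text):
        issues.append("no inline bash commands; only doc references")
    for label, target in links:
        if target.endswith(".html"):
            issues.append(f"dead-end reference: [{label}]({target})")
    return issues
```

> Run against a snippet that only links out to a rendered HTML page, this
> flags both a missing inline command and a dead-end reference; a file that
> carries a fenced bash block with the actual command passes cleanly. The
> real gate would use a proper Markdown AST rather than regexes, but the
> point is the same: the checks are mechanical and reproducible.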
>
> As a compromise, and to ensure zero disruption to the current workflow, I
> propose we implement this in a phased approach:
>
>  Phase 1 (Shadow Mode): We deploy the AIV Gate as a non-blocking CI job.
> It will simply flag PRs and append a JIRA label for "Automated Slop" or
> "Structural Error" based on its AST parsing. This gives us hard data on
> accuracy and volume without blocking a single merge.
>
>  Phase 2 (Active Enforcement): Once the PMC reviews the Phase 1 data and
> we agree the false-positive rate is near zero, we graduate the gate to an
> active check that blocks violating code.
>
> Let's turn on Phase 1 and let the data speak for itself.
>
> Regards,
> Viquar Khan
>
