masumi-ryugo commented on issue #21998:
URL: https://github.com/apache/datafusion/issues/21998#issuecomment-4378382425

   Thanks @alamb and @comphead.
   
   @alamb — your "fuzz can use only public APIs / doesn't need to live in this 
repo" framing is exactly the lighter-touch model I'd default to, so that's 
helpful to hear.
   
   @comphead: agreed that SQL-aware query generation (SQLsmith-style: generate 
thousands of valid-ish queries covering joins, built-ins, etc., then assert that 
the parser/analyzer don't panic and meet basic perf bounds) is more valuable 
per input than byte-level fuzzing on a mature SQL surface. I'd treat it as 
complementary to byte-level corpus mutation rather than a replacement, since 
the two find different bugs: a grammar generator finds bugs in the 
*combinations* the grammar can express, while a corpus mutator finds bugs in 
malformed or edge-case bytes near the corpus. They're different enough harnesses 
that I think they're best authored by separate people. I'm not the right 
person for the grammar-generator one (no SQLsmith experience), so I'll scope my 
own follow-through to the byte-level side only.
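
   To make the distinction concrete, here is a minimal, hypothetical Rust sketch (not code from either repo). `generate_query`, `mutate`, `parse_stub`, `survives`, and the toy `Rng` are illustrative names, and `parse_stub` is only a stand-in for the real parser entry point. Both input strategies feed the same oracle: returning `Err` is fine, panicking is not.

```rust
use std::panic;

/// Tiny deterministic PRNG (xorshift64) so runs are reproducible without
/// external crates. The seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    /// Pseudo-random index in `0..n` (n must be > 0).
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

/// Grammar-style generation: every output is well-formed text, and the
/// variation lies in which clauses get combined.
fn generate_query(rng: &mut Rng) -> String {
    const COLS: [&str; 3] = ["a", "b", "count(*)"];
    const TABLES: [&str; 2] = ["t1", "t2"];
    let mut q = format!("SELECT {} FROM {}", COLS[rng.below(3)], TABLES[rng.below(2)]);
    if rng.below(2) == 0 {
        q.push_str(" JOIN t3 ON t3.id = a");
    }
    if rng.below(2) == 0 {
        q.push_str(" WHERE b > 1");
    }
    q
}

/// Byte-level mutation: overwrite one byte of a seed input. The result may
/// be malformed SQL or even invalid UTF-8, which is the point.
fn mutate(seed: &[u8], rng: &mut Rng) -> Vec<u8> {
    let mut out = seed.to_vec();
    if !out.is_empty() {
        let i = rng.below(out.len());
        out[i] = (rng.next() & 0xff) as u8;
    }
    out
}

/// Hypothetical stand-in for the parser under test (an assumption made so
/// the sketch compiles on its own). It rejects invalid UTF-8 and otherwise
/// just counts whitespace-separated tokens.
fn parse_stub(input: &[u8]) -> Result<usize, String> {
    let text = std::str::from_utf8(input).map_err(|e| e.to_string())?;
    Ok(text.split_whitespace().count())
}

/// Shared oracle for both strategies: an `Err` is acceptable, a panic is a bug.
fn survives(input: &[u8]) -> bool {
    panic::catch_unwind(|| {
        let _ = parse_stub(input);
    })
    .is_ok()
}
```

   A driver would loop over `generate_query` outputs and their `mutate`d variants, calling `survives` on each. With a real parser in place of `parse_stub`, any `false` is a crash worth minimizing and reporting.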
   
   Given the two narrower options I floated:
   
   **Going to start with (a) — wrap `datafusion-sqlparser-rs/fuzz` under 
OSS-Fuzz.** It's the least intrusive: zero code change in this repo or in 
`datafusion-sqlparser-rs`, just a new `projects/datafusion-sqlparser-rs/` 
directory in `google/oss-fuzz` that drives the existing honggfuzz harness. 
Concrete next step before any PR: I'll open a small issue on 
`apache/datafusion-sqlparser-rs` asking for `primary_contact` + `auto_ccs` 
Google-account emails (OSS-Fuzz needs them for crash notifications and there's 
no PMC alias path — same prerequisite I hit on `apache/arrow-rs#5332`). Once I 
have those I can send the `google/oss-fuzz` PR; it's mechanical. I'll 
cross-link both back here.
   
   **Parking (b) — corpus-mutation harness in `datafusion/sql/tests`.** Won't 
touch this repo's code without an explicit go from @2010YOUY01, since this one 
does require an in-repo crate and that's the part you sounded skeptical about. 
Happy to drop it entirely if you'd rather, or revisit once (a) has produced (or 
not produced) anything interesting to bring back.
   
   No code action from me until the contact-email issue on 
`datafusion-sqlparser-rs` lands. Will report back when that's filed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

