LucaCappelletti94 commented on PR #2352: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2352#issuecomment-4634902553
<img width="2380" height="700" alt="image" src="https://github.com/user-attachments/assets/0827b534-d630-4df0-8223-85605ef07132" /> I implemented the suggested approach: the caches now store a `Copy` marker (`RecursionLimitExceeded` vs generic error) instead of the full `ParserError`, and the message is regenerated on demand on a cache hit. Re-ran the same scaling measurement on both commits, entry counts are identical in every regime, so only the stored value changed. | | old (cached `ParserError`) | new (`Copy` marker) | |---|---|---| | per-entry payload | 40 B (8 key + 32 `ParserError`) | 9 B (8 key + 1 marker) | | heap strings | nested peaks 1863 B, wide plateaus 598 B | always 0 | | nested cap (N >= 50) | 3880 B | 873 B (0.22x) | | wide N=2000 (12 KB sql) | 159598 B, 13.30x input | 35775 B, 2.98x input | | valid SQL | 0 | 0 | The caches are now completely string-free and the worst-case overhead drops from ~13.3x to ~3x of the input size. One side effect: when a position is parsed twice, the regenerated error reports the expression start instead of the original deeper error, which changes the message for one test case (`SELECT MAX(interval) FROM tbl` on MsSql), I updated the test accordingly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
