LucaCappelletti94 commented on PR #2352:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/2352#issuecomment-4634902553

   <img width="2380" height="700" alt="image" 
src="https://github.com/user-attachments/assets/0827b534-d630-4df0-8223-85605ef07132";
 />
   
   I implemented the suggested approach: the caches now store a `Copy` marker 
(`RecursionLimitExceeded` vs generic error) instead of the full `ParserError`, 
and the message is regenerated on demand on a cache hit. Re-ran the same 
scaling measurement on both commits, entry counts are identical in every 
regime, so only the stored value changed.
   
   | | old (cached `ParserError`) | new (`Copy` marker) |
   |---|---|---|
   | per-entry payload | 40 B (8 key + 32 `ParserError`) | 9 B (8 key + 1 
marker) |
   | heap strings | nested peaks 1863 B, wide plateaus 598 B | always 0 |
   | nested cap (N >= 50) | 3880 B | 873 B (0.22x) |
   | wide N=2000 (12 KB sql) | 159598 B, 13.30x input | 35775 B, 2.98x input |
   | valid SQL | 0 | 0 |
   
   The caches are now completely string-free and the worst-case overhead drops 
from ~13.3x to ~3x of the input size. One side effect: when a position is 
parsed twice, the regenerated error reports the expression start instead of the 
original deeper error, which changes the message for one test case (`SELECT 
MAX(interval) FROM tbl` on MsSql), I updated the test accordingly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to