avantgardnerio commented on issue #22723:
URL: https://github.com/apache/datafusion/issues/22723#issuecomment-4615716656

   @2010YOUY01 , re: issue 1 - PTAL at the actual [issue for the DataFusion 51 
regression](https://github.com/apache/datafusion/issues/22739) that motivated 
this whole path. I should have lead with that, and I think it answers a few of 
your questions. Instead of doing that, I just cranked `HEADROOM` down to `5` 
and filed a bug for the first thing it hit (NLJ).
   
   Addressing your points (as I perceive them) not addressed by the above:
   1. This conflates integration tests with memory accounting tests. i.e. the 
operator is at fault, not a random SLT. I would argue that most of our coverage 
is through SLTs so adding it here buys regression testing "for free". Normally 
that might be bad, because when an integration test breaks it's hard to debug, 
but this tracker usually points to the exact call stack of the bug. I think 
accounting regressions are real regressions and thus belong in the regression 
suite.
   2. "small memory-limited queries" - I think the question is twofold: 1. what 
is "small"? and 2. how does it scale? 700K in a test might be "small", but if 
that multiplies out in a prod environment to be 7GB, then it's big. A possible 
solution: we can define both the existing relative `HEADROOM` and an 
`ABS_HEADROOM` and it doesn't fire unless it crosses both. I would lightly push 
back though because that means we specifically need to run all these with large 
amounts of data, which means we don't get the benefit of the existing suite.
   
   I think your other concerns (difficulty of tracking, size required to track, 
etc) are all addressed in the other issue. I hope I was able to give a proper 
response after I put in proper effort.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to