2010YOUY01 commented on issue #22723: URL: https://github.com/apache/datafusion/issues/22723#issuecomment-4618433648
> [@2010YOUY01](https://github.com/2010YOUY01) , re: issue 1 - PTAL at the actual [issue for the DataFusion 51 regression](https://github.com/apache/datafusion/issues/22739) that motivated this whole path. I should have lead with that, and I think it answers a few of your questions. Instead of doing that, I just cranked `HEADROOM` down to `5` and filed a bug for the first thing it hit (NLJ). > > Addressing your points (as I perceive them) not addressed by the above: > > 1. This conflates integration tests with memory accounting tests. i.e. the operator is at fault, not a random SLT. I would argue that most of our coverage is through SLTs so adding it here buys regression testing "for free". Normally that might be bad, because when an integration test breaks it's hard to debug, but this tracker usually points to the exact call stack of the bug. I think accounting regressions are real regressions and thus belong in the regression suite. > 2. "small memory-limited queries" - I think the question is twofold: 1. what is "small"? and 2. how does it scale? 700K in a test might be "small", but if that multiplies out in a prod environment to be 7GB, then it's big. A possible solution: we can define both the existing relative `HEADROOM` and an `ABS_HEADROOM` and it doesn't fire unless it crosses both. I would lightly push back though because that means we specifically need to run all these with large amounts of data, which means we don't get the benefit of the existing suite. > > I think your other concerns (difficulty of tracking, size required to track, etc) are all addressed in the other issue. I hope I was able to give a proper response after I put in proper effort. Thanks for raising these points. I agree this is an important issue to solve, but I think the proposed approach may be difficult to maintain with the current memory accounting model. Here are a few thoughts on: 1. why it does not work reliably today 2. what we might need before it can work well - The current operator-level memory accounting is intentionally incomplete. It mostly tracks buffers that can grow without a fixed bound, such as aggregation hash tables. It generally treats one in-flight batch per operator per partition as constant overhead and leaves it untracked. This can still become large when the operator pipeline is long or when the target partition count is high. Your validation approach would be valid if all of this memory were accounted for, but today's implementation trades completeness for simplicity. - If we do not fix that first, the `HEADROOM` + `ABS_HEADROOM` model is wrong. It could produce false positives and make the tests hard to maintain. - Given this specific bug in the issue, we have to first understand all low level detail in all related operators, then do estimation to figure out if it's false positive or not. Below is the extra investigation to do for bug report triage on this issue, I believe this is very hard to do. ``` Memory limit is 150K, measured allocation 770K - generate_series holds one temporary batch: 8K rows * 8 bytes = 64KB - Current probe side batch: 64KB - NLJ holds a batch for evaluating the filter (v1, v2, 0), plus a result batch: roughly 8K rows * 8 bytes * 4 - All buffered build side batch: 150KB --> Corrected memory estimation: 534KB ``` - I agree that catching regressions in SLTs is valuable. But given the assumptions above, the easiest thing we can do for now may be to compare memory usage between commits. That would still let us detect regressions when a PR accidentally increases memory allocation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
