richardstartin commented on pull request #7927: URL: https://github.com/apache/pinot/pull/7927#issuecomment-998475535
> On a separate note, I saw that originally there were some optimizations piggy-backed to this PR. We should avoid doing that. Each PR should only focus on one feature, one bug fix, etc. Any optimization or refactoring should go in a separate PR. That might take a little longer for the author, but it benefits all of us in the long run. There were no optimisations in this PR, there were some fixes for concurrency bugs (e.g. it's incorrect to use an increment operator on a volatile variable) so I'm not sure what you're referring to. > One example of this issue happened last week when we spent a long time finding the root cause for the chunk index writer bug. The problematic optimization was added in a PR with title Add MV raw forward index and MV BYTES data type. I'm not sure this is the best place to be discussing this but you are implying that the change you mentioned was unrelated to the PR it was made in, but it wasn't. The purpose of the change, as has been discussed, is that the particular class creates very large buffers. It was included in that PR because its use for MV BYTES columns exacerbates this by multiplying what is already an overestimate of the buffer size by the maximum number of elements in the column. So the change was a mitigation to the worst case. > The table having performance problem didn't have any multi-value columns or byte data type. So we couldn't easily tell if this commit is related or not. And there were a lot of commits we needed to examine. If we had a separate PR for that optimization, we could've found the root cause much easier! Looking at commits to figure out what changed isn't an efficient diagnostic technique. Had you instead used a profiler you would have found the process was spending a large amount of time in the syscalls `mmap` and `munmap`; looking at the git blame for each Pinot stack frame above those syscalls would have found the culprit in O(stack depth) time rather than O(lines of code changed). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org