[GitHub] [pinot] richardstartin edited a comment on pull request #7927: Fix realtime ingestion when an entire batch of messages is filtered out

GitBox Mon, 20 Dec 2021 21:12:40 -0800


richardstartin edited a comment on pull request #7927:
URL: https://github.com/apache/pinot/pull/7927#issuecomment-998475535



   > On a separate note, I saw that originally there were some optimizations 
piggy-backed to this PR. We should avoid doing that. Each PR should only focus 
on one feature, one bug fix, etc. Any optimization or refactoring should go in 
a separate PR. That might take a little longer for the author, but it benefits 
all of us in the long run.
   
   There were no optimisations in this PR, there were some fixes for 
concurrency bugs (e.g. it's incorrect to use an increment operator on a 
volatile variable) so I'm not sure what you're referring to. 
   
   > One example of this issue happened last week when we spent a long time 
finding the root cause for the chunk index writer bug. The problematic 
optimization was added in a PR with title Add MV raw forward index and MV BYTES 
data type. 
   
   I'm not sure this is the best place to be discussing this but you are 
implying that the change you mentioned was unrelated to the PR it was made in, 
but it wasn't. The purpose of the change, as has been discussed, is that the 
particular class creates very large buffers. It was included in that PR because 
its use for MV BYTES columns exacerbates this by multiplying what is already an 
overestimate of the buffer size by the maximum number of elements in the 
column. So the change was a mitigation to the worst case made worse by that PR.
   
   > The table having performance problem didn't have any multi-value columns 
or byte data type. So we couldn't easily tell if this commit is related or not. 
And there were a lot of commits we needed to examine. If we had a separate PR 
for that optimization, we could've found the root cause much easier!
   
   Looking at commits to figure out what changed isn't an efficient diagnostic 
technique. Had you instead used a profiler you would have found the process was 
spending a large amount of time in the syscalls `mmap` and `munmap`; looking at 
the git blame for each Pinot stack frame above those syscalls would have found 
the culprit in O(stack depth) time rather than O(lines of code changed). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] richardstartin edited a comment on pull request #7927: Fix realtime ingestion when an entire batch of messages is filtered out

Reply via email to