sajjad-moradi opened a new issue #7929:
URL: https://github.com/apache/pinot/issues/7929


   We found that during segment completion, the segment build takes a long time. For one table, building an immutable segment now takes more than half an hour, while it used to take around one minute. After some investigation, it turned out that the root cause is this PR: https://github.com/apache/pinot/pull/7595
   More specifically, the issue is in the refactoring of `BaseChunkSVForwardIndexWriter`. Previously, a single in-memory byte buffer was reused to compress each chunk before writing its contents to the index file:
   ```java
         // Compress the chunk into the reusable in-memory buffer, write the
         // compressed bytes to the index file at the current offset, then
         // clear the buffer so it can be reused for the next chunk.
         sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
         _dataFile.write(_compressedBuffer, _dataOffset);
         _compressedBuffer.clear();
   ```
   After the chunk is written, the byte buffer is cleared and the same object is reused in the next `writeChunk` call.
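   For illustration, here is a minimal, self-contained sketch of that reusable-buffer pattern (the class and method names are hypothetical, not Pinot's actual API, and `compress` is a placeholder for the real chunk compressor):
   ```java
   import java.io.IOException;
   import java.nio.ByteBuffer;
   import java.nio.channels.FileChannel;

   class ReusableBufferChunkWriter {
     private final FileChannel _dataFile;
     private final ByteBuffer _compressedBuffer;  // allocated once, recycled for every chunk
     private long _dataOffset;

     ReusableBufferChunkWriter(FileChannel dataFile, int maxCompressedChunkSize) {
       _dataFile = dataFile;
       _compressedBuffer = ByteBuffer.allocateDirect(maxCompressedChunkSize);
     }

     void writeChunk(ByteBuffer chunk) throws IOException {
       compress(chunk, _compressedBuffer);            // fill the reusable buffer
       _compressedBuffer.flip();
       _dataOffset += _dataFile.write(_compressedBuffer, _dataOffset);
       _compressedBuffer.clear();                     // ready for the next chunk
     }

     private void compress(ByteBuffer in, ByteBuffer out) {
       out.put(in);  // placeholder: real code would run the chunk compressor here
     }
   }
   ```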
   After the refactoring, the reusable byte buffer is gone. In every `writeChunk` call, a small part of the index file is memory-mapped into a new `MappedByteBuffer`, and the chunk data is compressed directly into that mapped byte buffer, which in turn is automatically flushed to the index file.
   ```java
       // Map a small region of the index file at the current offset, compress the
       // chunk directly into the mapped buffer, and close the mapping right after;
       // a brand-new mapping is created for every chunk.
       int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
       try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
           maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {
         ByteBuffer view = compressedBuffer.toDirectByteBuffer(0, maxCompressedSize);
         sizeWritten = _chunkCompressor.compress(_chunkBuffer, view);
       }
   ```
   This may look better since it no longer needs an extra byte buffer for compression, but because each chunk is very small - 1000 values * data type size (8 bytes for long), i.e. about 8 KB - memory mapping degrades performance [1].
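   The overhead is easy to reproduce outside Pinot. The following rough timing sketch (plain NIO, assumed sizes of 1000 chunks of ~8 KB each; not a rigorous benchmark) contrasts mapping a fresh region per chunk with reusing one direct buffer:
   ```java
   import java.nio.ByteBuffer;
   import java.nio.MappedByteBuffer;
   import java.nio.channels.FileChannel;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;

   public class ChunkWriteTiming {
     static final int CHUNK_SIZE = 1000 * Long.BYTES;  // ~8 KB, matching the chunk size above
     static final int NUM_CHUNKS = 1000;

     public static void main(String[] args) throws Exception {
       ByteBuffer chunk = ByteBuffer.allocateDirect(CHUNK_SIZE);

       // Variant 1: map a brand-new small region of the file for every chunk.
       Path mappedFile = Files.createTempFile("mapped", ".idx");
       try (FileChannel ch = FileChannel.open(mappedFile,
           StandardOpenOption.READ, StandardOpenOption.WRITE)) {
         long start = System.nanoTime();
         long offset = 0;
         for (int i = 0; i < NUM_CHUNKS; i++) {
           MappedByteBuffer out = ch.map(FileChannel.MapMode.READ_WRITE, offset, CHUNK_SIZE);
           chunk.rewind();
           out.put(chunk);
           offset += CHUNK_SIZE;
         }
         System.out.println("per-chunk mmap: " + (System.nanoTime() - start) / 1_000_000 + " ms");
       }

       // Variant 2: write every chunk through one reusable direct buffer.
       Path bufferedFile = Files.createTempFile("buffered", ".idx");
       try (FileChannel ch = FileChannel.open(bufferedFile, StandardOpenOption.WRITE)) {
         ByteBuffer reusable = ByteBuffer.allocateDirect(CHUNK_SIZE);
         long start = System.nanoTime();
         long offset = 0;
         for (int i = 0; i < NUM_CHUNKS; i++) {
           chunk.rewind();
           reusable.clear();
           reusable.put(chunk).flip();
           offset += ch.write(reusable, offset);
         }
         System.out.println("reused buffer:  " + (System.nanoTime() - start) / 1_000_000 + " ms");
       }
     }
   }
   ```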
   We experimented with the segments of the problematic table and found that even on SSD, the segment build takes more than 30% longer. On HDD it's much worse, more than 30x slower (about one minute with the interim byte buffer vs. more than 30 minutes with memory mapping).
   
   [1] From the Oracle documentation for `FileChannel#map`:
   
   > For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory.
   
   https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode,%20long,%20long)
   
   

