sajjad-moradi opened a new issue #7929: URL: https://github.com/apache/pinot/issues/7929
We found out that during segment completion, the segment build takes a long time. For one table it now takes more than half an hour to build an immutable segment, while it used to take around one minute. After some investigation, it turned out that the root cause is this PR: https://github.com/apache/pinot/pull/7595. More specifically, the issue is in the refactoring of `BaseChunkSVForwardIndexWriter`, where we used to have a separate in-memory byte buffer to compress each chunk and then write its contents to the index file:

```java
sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
_dataFile.write(_compressedBuffer, _dataOffset);
_compressedBuffer.clear();
```

After writing the chunk, the byte buffer gets cleared and the same object is reused in the next `writeChunk` call. After the refactoring, the reusable byte buffer is gone: on every `writeChunk` call, a small part of the index file gets memory mapped into a new `MappedByteBuffer`, and the chunk data is compressed into that mapped byte buffer, which in turn automatically gets written to the index file.

```java
int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset, maxCompressedSize,
    ByteOrder.BIG_ENDIAN, "forward index chunk")) {
  ByteBuffer view = compressedBuffer.toDirectByteBuffer(0, maxCompressedSize);
  sizeWritten = _chunkCompressor.compress(_chunkBuffer, view);
}
```

This may look better since it doesn't need an extra byte buffer for compression, but because each chunk is very small - 1000 entries times the data type size (8 bytes for `long`) - memory mapping degrades performance [1]. We experimented with the segments of the problematic table, and it turned out that even on SSD the build takes more than 30% longer. On HDD it's much worse: more than 30x slower (about one minute using the interim byte buffer vs. more than 30 minutes using memory mapping).

[1] From the Oracle documentation: "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory."
https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode,%20long,%20long)
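For reference, a minimal sketch of the pre-#7595 pattern described above: allocate one direct byte buffer sized for the worst-case compressed chunk, reuse it for every chunk, and write the compressed bytes with a positional `FileChannel` write instead of mapping a new file region per chunk. The class and method names here (`ReusableBufferChunkWriter`, the `ChunkCompressor` stand-in interface) are illustrative, not Pinot's actual API; only the `compress`/`maxCompressedSize` calls mirror the snippets quoted above, and the compressor is assumed to return the compressed size with the output buffer filled from position 0.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Stand-in for the compressor used in the quoted code; only the two methods used here.
interface ChunkCompressor {
  int maxCompressedSize(int uncompressedSize);
  int compress(ByteBuffer in, ByteBuffer out) throws IOException;
}

// Illustrative writer: one direct buffer allocated once and reused across
// writeChunk calls, with positional FileChannel writes instead of per-chunk mmap.
final class ReusableBufferChunkWriter {
  private final FileChannel _dataChannel;
  private final ChunkCompressor _chunkCompressor;
  private final ByteBuffer _compressedBuffer; // reused for every chunk
  private long _dataOffset;

  ReusableBufferChunkWriter(FileChannel dataChannel, ChunkCompressor chunkCompressor, int maxChunkSize) {
    _dataChannel = dataChannel;
    _chunkCompressor = chunkCompressor;
    // Sized once for the worst case, so there is no per-chunk allocation or mapping.
    _compressedBuffer = ByteBuffer.allocateDirect(chunkCompressor.maxCompressedSize(maxChunkSize));
  }

  void writeChunk(ByteBuffer chunkBuffer) throws IOException {
    int sizeToWrite = _chunkCompressor.compress(chunkBuffer, _compressedBuffer);
    // Assumed compressor contract: output starts at position 0, so set the read window explicitly.
    _compressedBuffer.position(0).limit(sizeToWrite);
    // Positional write of a few KB is cheap; mapping a region of that size is not.
    while (_compressedBuffer.hasRemaining()) {
      _dataOffset += _dataChannel.write(_compressedBuffer, _dataOffset);
    }
    _compressedBuffer.clear();
  }
}
```

This keeps exactly one allocation per writer for the whole segment build, which is consistent with the Oracle guidance cited above that mapping only pays off for relatively large files.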