klsince commented on code in PR #12945: URL: https://github.com/apache/pinot/pull/12945#discussion_r1591624294
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java:
##########
@@ -70,23 +68,32 @@ public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType
    * @param maxLength length of longest entry (in bytes)
    * @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
    * @param writerVersion writer format version
+   * @param targetMaxChunkSizeBytes target max chunk size in bytes, applicable only for V4 or when
+   *                                deriveNumDocsPerChunk is true
+   * @param targetDocsPerChunk target number of docs per chunk
    * @throws IOException
    */
   public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType, String column,
-      int totalDocs, DataType valueType, int maxLength, boolean deriveNumDocsPerChunk, int writerVersion)
+      int totalDocs, DataType valueType, int maxLength, boolean deriveNumDocsPerChunk, int writerVersion,
+      int targetMaxChunkSizeBytes, int targetDocsPerChunk)
       throws IOException {
     File file = new File(baseIndexDir, column + V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION);
-    int numDocsPerChunk = deriveNumDocsPerChunk ? getNumDocsPerChunk(maxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+    int numDocsPerChunk =
+        deriveNumDocsPerChunk ? getNumDocsPerChunk(maxLength, targetMaxChunkSizeBytes) : targetDocsPerChunk;
+
+    // For columns with very small max value, target chunk size should also be capped to reduce memory during read
+    int dynamicTargetChunkSize =
+        ForwardIndexUtils.getDynamicTargetChunkSize(maxLength, targetDocsPerChunk, targetMaxChunkSizeBytes);

Review Comment:
   Should this method take `numDocsPerChunk` instead of `targetDocsPerChunk` here? Alternatively, we could check `deriveNumDocsPerChunk`: if it is true, derive `dynamicTargetChunkSize` as well; otherwise, use `targetMaxChunkSizeBytes` directly.
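   To make the two options concrete, here is a rough sketch rather than a proposed patch. The `getDynamicTargetChunkSize` below is a hypothetical stand-in that simply caps `maxLength * docsPerChunk` at `targetMaxChunkSizeBytes`; the actual `ForwardIndexUtils` method may behave differently. `ChunkSizeSketch`, `option1` and `option2` are illustrative names only, and all inputs are assumed to be positive.

```java
final class ChunkSizeSketch {
  // Hypothetical stand-in for ForwardIndexUtils.getDynamicTargetChunkSize (an assumption,
  // not the real implementation): caps the worst-case chunk payload at the configured max.
  static int getDynamicTargetChunkSize(int maxLength, int docsPerChunk, int targetMaxChunkSizeBytes) {
    // Use long arithmetic so that maxLength * docsPerChunk cannot overflow an int.
    long worstCaseBytes = (long) maxLength * docsPerChunk;
    return (int) Math.min(worstCaseBytes, (long) targetMaxChunkSizeBytes);
  }

  // Option 1: pass the resolved numDocsPerChunk instead of targetDocsPerChunk.
  static int option1(int maxLength, int numDocsPerChunk, int targetMaxChunkSizeBytes) {
    return getDynamicTargetChunkSize(maxLength, numDocsPerChunk, targetMaxChunkSizeBytes);
  }

  // Option 2: derive the chunk size only when deriveNumDocsPerChunk is true;
  // otherwise fall back to targetMaxChunkSizeBytes unchanged.
  static int option2(boolean deriveNumDocsPerChunk, int maxLength, int numDocsPerChunk,
      int targetMaxChunkSizeBytes) {
    return deriveNumDocsPerChunk
        ? getDynamicTargetChunkSize(maxLength, numDocsPerChunk, targetMaxChunkSizeBytes)
        : targetMaxChunkSizeBytes;
  }
}
```

   Under these assumptions, the two options differ only when `deriveNumDocsPerChunk` is false: option 1 still caps the chunk size for short values, while option 2 leaves `targetMaxChunkSizeBytes` as configured.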