ChrisHegarty commented on code in PR #14426:
URL: https://github.com/apache/lucene/pull/14426#discussion_r2027373074

##########
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##########
@@ -130,4 +134,56 @@ public KnnVectorsReader getMergeInstance() {
    * <p>The default implementation is empty
    */
   public void finishMerge() throws IOException {}
+
+  /** A string representing the off-heap category for quantized vectors. */
+  public static final String QUANTIZED = "QUANTIZED";
+
+  /** A string representing the off-heap category for the HNSW graph. */
+  public static final String HNSW_GRAPH = "HNSW_GRAPH";
+
+  /** A string representing the off-heap category for raw vectors. */
+  public static final String RAW = "RAW";
+
+  /**
+   * Returns the desired size of off-heap memory the given field. This size can be used to help
+   * determine the memory requirements for optimal search performance, which can be greatly affected
+   * by page faults when not enough memory is available.
+   *
+   * <p>For reporting purposes, the backing off-heap index structures are broken into three
+   * categories: 1. {@link #RAW}, 2. {@link #HNSW_GRAPH}, and 3. {@link #QUANTIZED}. The returned
+   * map will have zero or one entry for each of these categories.
+   *
+   * <p>The long value is the size in bytes of the off-heap space needed if the associated index
+   * structure were to be fully loaded in memory. While somewhat analogous to {@link
+   * Accountable#ramBytesUsed()} (which reports actual on-heap memory usage), the metrics reported
+   * by this method are not actual usage but rather the amount of available memory needed to fully
+   * load the index into memory, rather than an actual RAM usage requirement.
+   *
+   * <p>To determine the total desired off-heap memory size for the given field:
+   *
+   * <pre>{@code
+   * getOffHeapByteSize(field).values().stream().mapToLong(Long::longValue).sum();
+   * }</pre>
+   *
+   * @param fieldInfo the fieldInfo
+   * @return a map of the desired off-heap memory requirements by category
+   * @lucene.experimental
+   */
+  public abstract Map<String, Long> getOffHeapByteSize(FieldInfo fieldInfo);

Review Comment:
While it's convenient to reuse Maps, an alternative here is to encapsulate the categorised stats in a container, which we could enrich later with information about whether the index is actually resident in memory.

```
public abstract OffHeapStats getOffHeapByteSize(FieldInfo fieldInfo);

record OffHeapStats(String field, long raw, long hnswGraph, long quantized) { }
```
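
To make the trade-off in the review comment concrete, here is a minimal, self-contained sketch of how such a container might look next to the map form. Only the `OffHeapStats` record signature comes from the comment; the class name `OffHeapStatsSketch`, the `total()` helper, the `fromMap` adapter, and the sample sizes are hypothetical and are not part of the PR or of Lucene. It assumes Java 16 or later for records.

```java
import java.util.Map;

public class OffHeapStatsSketch {

  // Category keys mirroring the constants added to KnnVectorsReader in the diff.
  static final String RAW = "RAW";
  static final String HNSW_GRAPH = "HNSW_GRAPH";
  static final String QUANTIZED = "QUANTIZED";

  /** The container proposed in the review comment, with two hypothetical helpers. */
  record OffHeapStats(String field, long raw, long hnswGraph, long quantized) {

    /** Hypothetical helper: sums all categories instead of streaming over map values. */
    long total() {
      return raw + hnswGraph + quantized;
    }

    /** Hypothetical adapter from the map form; a missing category counts as zero bytes. */
    static OffHeapStats fromMap(String field, Map<String, Long> sizes) {
      return new OffHeapStats(
          field,
          sizes.getOrDefault(RAW, 0L),
          sizes.getOrDefault(HNSW_GRAPH, 0L),
          sizes.getOrDefault(QUANTIZED, 0L));
    }
  }

  public static void main(String[] args) {
    // Made-up sizes in bytes for an unquantized field, so QUANTIZED is absent from the map.
    Map<String, Long> sizes = Map.of(RAW, 4096L, HNSW_GRAPH, 1024L);
    OffHeapStats stats = OffHeapStats.fromMap("vector", sizes);
    System.out.println(stats + " total=" + stats.total());
  }
}
```

One consequence of this shape is that the categories become fixed at compile time: a typed accessor such as `stats.hnswGraph()` replaces a string lookup, but a format that introduces a new off-heap structure would need a new record component, whereas the map form could carry an extra key without an API change.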