ChrisHegarty commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1611648786
I ran @mayya-sharipova's exact same benchmark/test on my machine. Here are the results.

### Test environment

- Dataset:
  - [nq](https://huggingface.co/datasets/BeIR/nq) dataset with `text` field embedded with OpenAI `text-embedding-ada-002` model, 1536 dims
- [KnnGraphTester](https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java)
  - maxConn: 16, beamWidthIndex: 100
- Linux, x86_64 11th Gen Intel Core i5-11400 @ 2.60GHz
  - AVX 512
- JDK 20.0.1

### Result

| Panama (bits) | dims | time (secs) |
| ------------- | ---- | ----------- |
| No            | 1024 | 3136        |
| Yes (512)     | 1536 | 2633        |

So the test run with 1536 dims and Panama enabled at AVX 512 was 503 secs (or ~16%) faster than the run with 1024 dims and no Panama.

### Test1

- Lucene 9.7.0
- Panama Vector API **not** enabled
- vector dims=1024 (OpenAI vectors truncated to the first 1024 dims)
- Results: Indexed 2680961 documents in 3136s

<details>
<summary>Details</summary>

```
davekim$ time /home/chegar/binaries/jdk-20.0.1/bin/java -cp lucene-9.7.0/modules/*:/home/chegar/git/lucene/lucene/core/build/classes/java/test -Xmx16g -Xms16g org.apache.lucene.util.hnsw.KnnGraphTester -dim 1024 -ndoc 2680961 -reindex -docs vector_search-open_ai_vectors_1024-vectors_dims1024.bin -maxConn 16 -beamWidthIndex 100
creating index in vector_search-open_ai_vectors_1024-vectors_dims1024.bin-16-100.index
Jun 28, 2023 1:44:34 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 20; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
MS 0 [2023-06-28T12:44:34.340877459Z; main]: initDynamicDefaults maxThreadCount=4 maxMergeCount=9
IFD 0 [2023-06-28T12:44:34.355786340Z; main]: init: current segments file is "segments"; deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@7e9a5fbe
IFD 0 [2023-06-28T12:44:34.358595927Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T12:44:34.359321686Z; main]: now checkpoint "" [0 segments ; isCommit = false]
IFD 0 [2023-06-28T12:44:34.359380405Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T12:44:34.360606701Z; main]: 0 ms to checkpoint
IW 0 [2023-06-28T12:44:34.361060247Z; main]: init: create=true reader=null
IW 0 [2023-06-28T12:44:34.367050357Z; main]: dir=MMapDirectory@/home/chegar/git/lucene-vector-bench/vector_search-open_ai_vectors_1024-vectors_dims1024.bin-16-100.index lockFactory=org.apache.lucene.store.NativeFSLockFactory@46238e3f index= version=9.7.0 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer ramBufferSizeMB=1994.0 maxBufferedDocs=-1 mergedSegmentWarmer=null delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy commit=null openMode=CREATE similarity=org.apache.lucene.search.similarities.BM25Similarity mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=4, maxMergeCount=9, ioThrottle=true codec=Lucene95 infoStream=org.apache.lucene.util.PrintStreamInfoStream mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022208E12, noCFSRatio=0.1, deletesPctAllowed=20.0 readerPooling=true perThreadHardLimitMB=1945 useCompoundFile=false commitOnClose=true indexSort=null checkPendingFlushOnUpdate=true softDeletesField=null maxFullFlushMergeWaitMillis=500 leafSorter=null eventListener=org.apache.lucene.index.IndexWriterEventListener$1@6c9f5c0d writer=org.apache.lucene.index.IndexWriter@de3a06f
IW 0 [2023-06-28T12:44:34.367221110Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
Jun 28, 2023 1:44:34 PM org.apache.lucene.util.VectorUtilProvider lookup
WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
DWPT 0 [2023-06-28T12:53:31.591056430Z; main]: flush postings as segment _0 numDocs=460521
IW 0 [2023-06-28T12:53:31.591842896Z; main]: 0 ms to write norms
IW 0 [2023-06-28T12:53:31.592260907Z; main]: 0 ms to write docValues
IW 0 [2023-06-28T12:53:31.592370750Z; main]: 0 ms to write points
IW 0 [2023-06-28T12:53:32.987321518Z; main]: 1394 ms to write vectors
IW 0 [2023-06-28T12:53:32.997512174Z; main]: 10 ms to finish stored fields
IW 0 [2023-06-28T12:53:32.997693539Z; main]: 0 ms to write postings and finish vectors
IW 0 [2023-06-28T12:53:32.998159715Z; main]: 0 ms to write fieldInfos
DWPT 0 [2023-06-28T12:53:32.999257618Z; main]: new segment has 0 deleted docs
DWPT 0 [2023-06-28T12:53:32.999365945Z; main]: new segment has 0 soft-deleted docs
DWPT 0 [2023-06-28T12:53:33.000456314Z; main]: new segment has no vectors; no norms; no docValues; no prox; freqs
DWPT 0 [2023-06-28T12:53:33.000586334Z; main]: flushedFiles=[_0_Lucene95HnswVectorsFormat_0.vem, _0.fdm, _0_Lucene95HnswVectorsFormat_0.vec, _0.fdx, _0_Lucene95HnswVectorsFormat_0.vex, _0.fdt, _0.fnm]
DWPT 0 [2023-06-28T12:53:33.000673681Z; main]: flushed codec=Lucene95
DWPT 0 [2023-06-28T12:53:33.001725500Z; main]: flushed: segment=_0 ramUsed=1,945.017 MB newFlushedSize=1,824.658 MB docs/MB=252.388
DWPT 0 [2023-06-28T12:53:33.002919290Z; main]: flush time 1412.932331 ms
IW 0 [2023-06-28T12:53:33.004048349Z; main]: publishFlushedSegment seg-private updates=null
IW 0 [2023-06-28T12:53:33.004702334Z; main]: publishFlushedSegment _0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0, source=flush, timestamp=1687956813001, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1qx5zulv7rcv8o0t4f62zfjjz
BD 0 [2023-06-28T12:53:33.006074639Z; main]: finished packet delGen=1 now completedDelGen=1
IW 0 [2023-06-28T12:53:33.007517182Z; main]: publish sets newSegment delGen=1 seg=_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0, source=flush, timestamp=1687956813001, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1qx5zulv7rcv8o0t4f62zfjjz
IFD 0 [2023-06-28T12:53:33.007718974Z; main]: now checkpoint "_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0, source=flush, timestamp=1687956813001, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1qx5zulv7rcv8o0t4f62zfjk0" [1 segments ; isCommit = false]
IFD 0 [2023-06-28T12:53:33.008114732Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T12:53:33.008168685Z; main]: 0 ms to checkpoint
MP 0 [2023-06-28T12:53:33.010309939Z; main]: seg=_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0, source=flush, timestamp=1687956813001, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1qx5zulv7rcv8o0t4f62zfjk0 size=1824.659 MB
MP 0 [2023-06-28T12:53:33.010610953Z; main]: findMerges: 1 segments
MP ...
Indexed 2680961 documents in 3136s
```

</details>

### Test2

- Lucene 9.7 with `FloatVectorValues.MAX_DIMENSIONS` patched to 2048
- Panama Vector API **enabled**, preferredBitSize=`512`
- vector dims=1536
- Results: Indexed 2680961 documents in 2633s

<details>
<summary>Details</summary>

```
davekim$ time /home/chegar/binaries/jdk-20.0.1/bin/java \
  --add-modules=jdk.incubator.vector \
  -cp /home/chegar/git/lucene/lucene/core/build/libs/lucene-core-9.7.0-SNAPSHOT.jar:lucene-9.7.0/modules/*:/home/chegar/git/lucene/lucene/core/build/classes/java/test \
  -Xmx16g -Xms16g \
  org.apache.lucene.util.hnsw.KnnGraphTester \
  -dim 1536 \
  -ndoc 2680961 \
  -reindex \
  -docs vector_search-open_ai_vectors-vectors.bin \
  -maxConn 16 \
  -beamWidthIndex 100
WARNING: Using incubator modules: jdk.incubator.vector
creating index in vector_search-open_ai_vectors-vectors.bin-16-100.index
Jun 28, 2023 3:18:08 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 20; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
MS 0 [2023-06-28T14:18:08.783226914Z; main]: initDynamicDefaults maxThreadCount=4 maxMergeCount=9
IFD 0 [2023-06-28T14:18:08.798094830Z; main]: init: current segments file is "segments"; deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@1efee8e7
IFD 0 [2023-06-28T14:18:08.800639373Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T14:18:08.801349082Z; main]: now checkpoint "" [0 segments ; isCommit = false]
IFD 0 [2023-06-28T14:18:08.801461676Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T14:18:08.802987862Z; main]: 0 ms to checkpoint
IW 0 [2023-06-28T14:18:08.803265302Z; main]: init: create=true reader=null
IW 0 [2023-06-28T14:18:08.809406650Z; main]: dir=MMapDirectory@/home/chegar/git/lucene-vector-bench/vector_search-open_ai_vectors-vectors.bin-16-100.index lockFactory=org.apache.lucene.store.NativeFSLockFactory@1dd02175 index= version=9.7.0 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer ramBufferSizeMB=1994.0 maxBufferedDocs=-1 mergedSegmentWarmer=null delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy commit=null openMode=CREATE similarity=org.apache.lucene.search.similarities.BM25Similarity mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=4, maxMergeCount=9, ioThrottle=true codec=Lucene95 infoStream=org.apache.lucene.util.PrintStreamInfoStream mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022208E12, noCFSRatio=0.1, deletesPctAllowed=20.0 readerPooling=true perThreadHardLimitMB=1945 useCompoundFile=false commitOnClose=true indexSort=null checkPendingFlushOnUpdate=true softDeletesField=null maxFullFlushMergeWaitMillis=500 leafSorter=null eventListener=org.apache.lucene.index.IndexWriterEventListener$1@3d3fcdb0 writer=org.apache.lucene.index.IndexWriter@641147d0
IW 0 [2023-06-28T14:18:08.809591811Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
Jun 28, 2023 3:18:08 PM org.apache.lucene.util.VectorUtilPanamaProvider <init>
INFO: Java vector incubator API enabled; uses preferredBitSize=512
DWPT 0 [2023-06-28T14:23:17.927393364Z; main]: flush postings as segment _0 numDocs=314897
IW 0 [2023-06-28T14:23:17.928214793Z; main]: 0 ms to write norms
IW 0 [2023-06-28T14:23:17.928486805Z; main]: 0 ms to write docValues
IW 0 [2023-06-28T14:23:17.928593869Z; main]: 0 ms to write points
IW 0 [2023-06-28T14:23:19.282981254Z; main]: 1354 ms to write vectors
IW 0 [2023-06-28T14:23:19.290000600Z; main]: 6 ms to finish stored fields
IW 0 [2023-06-28T14:23:19.290178853Z; main]: 0 ms to write postings and finish vectors
IW 0 [2023-06-28T14:23:19.290669001Z; main]: 0 ms to write fieldInfos
DWPT 0 [2023-06-28T14:23:19.291053701Z; main]: new segment has 0 deleted docs
DWPT 0 [2023-06-28T14:23:19.291129515Z; main]: new segment has 0 soft-deleted docs
DWPT 0 [2023-06-28T14:23:19.292160606Z; main]: new segment has no vectors; no norms; no docValues; no prox; freqs
DWPT 0 [2023-06-28T14:23:19.292249403Z; main]: flushedFiles=[_0_Lucene95HnswVectorsFormat_0.vem, _0.fdm, _0_Lucene95HnswVectorsFormat_0.vec, _0.fdx, _0_Lucene95HnswVectorsFormat_0.vex, _0.fdt, _0.fnm]
DWPT 0 [2023-06-28T14:23:19.292320403Z; main]: flushed codec=Lucene95
DWPT 0 [2023-06-28T14:23:19.295665508Z; main]: flushed: segment=_0 ramUsed=1,945.012 MB newFlushedSize=1,863.46 MB docs/MB=168.985
DWPT 0 [2023-06-28T14:23:19.296825017Z; main]: flush time 1370.228388 ms
IW 0 [2023-06-28T14:23:19.297541689Z; main]: publishFlushedSegment seg-private updates=null
IW 0 [2023-06-28T14:23:19.298158353Z; main]: publishFlushedSegment _0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux, os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=9b08nbm1nw553b43pa9kzvach
BD 0 [2023-06-28T14:23:19.299549573Z; main]: finished packet delGen=1 now completedDelGen=1
IW 0 [2023-06-28T14:23:19.301085879Z; main]: publish sets newSegment delGen=1 seg=_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux, os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=9b08nbm1nw553b43pa9kzvach
IFD 0 [2023-06-28T14:23:19.301281180Z; main]: now checkpoint "_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux, os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=9b08nbm1nw553b43pa9kzvaci" [1 segments ; isCommit = false]
IFD 0 [2023-06-28T14:23:19.301666023Z; main]: now delete 0 files: []
IFD 0 [2023-06-28T14:23:19.301718781Z; main]: 0 ms to checkpoint
MP 0 [2023-06-28T14:23:19.303689024Z; main]: seg=_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295, java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux, os.arch=amd64, os.version=6.2.0-23-generic, lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=9b08nbm1nw553b43pa9kzvaci size=1863.460 MB
MP 0 [2023-06-28T14:23:19.303936133Z; main]: findMerges: 1 segments
MP ....
Indexed 2680961 documents in 2633s
```

</details>

Full output from the test runs can be seen here: https://gist.github.com/ChrisHegarty/ef008da196624c1a3fe46578ee3a0a6c
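
As a rough cross-check of the headline numbers in the Result table, the arithmetic can be sketched as below. The class and method names (`IndexingSpeedup`, `componentsPerSecond`, and so on) are hypothetical and not part of Lucene or `KnnGraphTester`. The "components per second" figure also assumes that indexing cost scales linearly with vector dimension, which is a simplification and not something these two runs establish.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: reproduces the arithmetic behind
// the Result table using only the timings reported in this comment.
class IndexingSpeedup {

    /** Wall-clock seconds saved by the candidate run relative to the baseline. */
    public static long savedSeconds(long baselineSecs, long candidateSecs) {
        return baselineSecs - candidateSecs;
    }

    /** Reduction in wall-clock time as a percentage of the baseline. */
    public static double percentFaster(long baselineSecs, long candidateSecs) {
        return 100.0 * (baselineSecs - candidateSecs) / baselineSecs;
    }

    /**
     * Float components indexed per second (docs * dims / secs).
     * ASSUMPTION: treats cost as linear in dims, which ignores per-document overhead.
     */
    public static double componentsPerSecond(long docs, int dims, long secs) {
        return (double) docs * dims / secs;
    }

    public static void main(String[] args) {
        long docs = 2_680_961L;          // -ndoc in both runs
        long noPanamaSecs = 3136;        // Test1: 1024 dims, Panama not enabled
        long panamaSecs = 2633;          // Test2: 1536 dims, Panama enabled (512 bits)

        double baseline = componentsPerSecond(docs, 1024, noPanamaSecs);
        double candidate = componentsPerSecond(docs, 1536, panamaSecs);

        System.out.println(String.format(Locale.ROOT,
            "saved=%ds, faster=%.1f%%, normalized throughput ratio=%.2fx",
            savedSeconds(noPanamaSecs, panamaSecs),
            percentFaster(noPanamaSecs, panamaSecs),
            candidate / baseline));
    }
}
```

The point of the normalized ratio is only that the Panama run finished sooner while handling 50% more dimensions per vector, so the wall-clock difference of ~16% likely understates the per-dimension gain on this machine.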