ChrisHegarty commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1611648786

   I ran @mayya-sharipova's exact same benchmark/test on my machine. Here are 
the results.
   
   
   
   ### Test environment
   - Dataset:
     - [nq](https://huggingface.co/datasets/BeIR/nq) dataset with `text` field 
embedded with OpenAI `text-embedding-ada-002` model, 1536 dims
   
   - 
[KnnGraphTester](https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java)
   - maxConn: 16, beamWidthIndex: 100
   - Linux, x86_64, 11th Gen Intel Core i5-11400 @ 2.60GHz, AVX-512
   - JDK 20.0.1
   
   ### Result
   
   | Panama(bits)|  dims   | time (secs) |
   | ----------- | --------|-------------|
   |  No         |  1024   | 3136        |
   |  Yes(512)   |  1536   | 2633        |
   
   
   So the run with 1536 dims and Panama enabled (512-bit vectors, AVX-512) was 
503 secs (or ~16%) faster than the run with 1024 dims and Panama not enabled.  
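
The arithmetic behind those two figures can be reproduced with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and the "components per second" figure is a rough normalization that assumes indexing cost scales about linearly with the vector dimension, which these two runs alone do not establish.

```java
// Hypothetical helper for illustration only; not part of Lucene or KnnGraphTester.
final class IndexingSpeedup {
    // Figures taken from the result table in this comment.
    static final long DOCS = 2_680_961L;
    static final int DIMS_TEST1 = 1024;
    static final int SECS_TEST1 = 3136; // Panama not enabled
    static final int DIMS_TEST2 = 1536;
    static final int SECS_TEST2 = 2633; // Panama enabled, 512 bits

    /** Wall-clock seconds saved relative to the baseline run. */
    static int secondsSaved(int baselineSecs, int otherSecs) {
        return baselineSecs - otherSecs;
    }

    /** Time saved, expressed as a percentage of the baseline run. */
    static double percentFaster(int baselineSecs, int otherSecs) {
        return 100.0 * (baselineSecs - otherSecs) / baselineSecs;
    }

    /** Float components indexed per second (assumed normalization, see lead-in). */
    static double componentsPerSecond(long docs, int dims, int secs) {
        return (double) docs * dims / secs;
    }

    public static void main(String[] args) {
        System.out.println("seconds saved: " + secondsSaved(SECS_TEST1, SECS_TEST2));
        System.out.println("percent faster: " + percentFaster(SECS_TEST1, SECS_TEST2));
        double c1 = componentsPerSecond(DOCS, DIMS_TEST1, SECS_TEST1);
        double c2 = componentsPerSecond(DOCS, DIMS_TEST2, SECS_TEST2);
        System.out.println("components/sec ratio (Test2 / Test1): " + (c2 / c1));
    }
}
```

The ratio is printed because the Panama run also handled 1.5x as many dimensions per document, so under the stated assumption the wall-clock saving understates the per-component difference.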
   
   ### Test1
   - Lucene 9.7.0
   - Panama Vector API **not** enabled
   - vector dims=1024 (OpenAI vectors truncated to the first 1024 dims)
   - Results: Indexed 2680961 documents in 3136s
   
   <details>
    <summary>Details</summary>
   
   ```
   davekim$ time /home/chegar/binaries/jdk-20.0.1/bin/java  -cp 
lucene-9.7.0/modules/*:/home/chegar/git/lucene/lucene/core/build/classes/java/test
  -Xmx16g -Xms16g  org.apache.lucene.util.hnsw.KnnGraphTester  -dim 1024  -ndoc 
2680961  -reindex  -docs 
vector_search-open_ai_vectors_1024-vectors_dims1024.bin  -maxConn 16  
-beamWidthIndex 100
   creating index in 
vector_search-open_ai_vectors_1024-vectors_dims1024.bin-16-100.index
   Jun 28, 2023 1:44:34 PM 
org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
   INFO: Using MemorySegmentIndexInput with Java 20; to disable start with 
-Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
   MS 0 [2023-06-28T12:44:34.340877459Z; main]: initDynamicDefaults 
maxThreadCount=4 maxMergeCount=9
   IFD 0 [2023-06-28T12:44:34.355786340Z; main]: init: current segments file is 
"segments"; 
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@7e9a5fbe
   IFD 0 [2023-06-28T12:44:34.358595927Z; main]: now delete 0 files: []
   IFD 0 [2023-06-28T12:44:34.359321686Z; main]: now checkpoint "" [0 segments 
; isCommit = false]
   IFD 0 [2023-06-28T12:44:34.359380405Z; main]: now delete 0 files: []
   IFD 0 [2023-06-28T12:44:34.360606701Z; main]: 0 ms to checkpoint
   IW 0 [2023-06-28T12:44:34.361060247Z; main]: init: create=true reader=null
   IW 0 [2023-06-28T12:44:34.367050357Z; main]:
   
dir=MMapDirectory@/home/chegar/git/lucene-vector-bench/vector_search-open_ai_vectors_1024-vectors_dims1024.bin-16-100.index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@46238e3f
   index=
   version=9.7.0
   analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
   ramBufferSizeMB=1994.0
   maxBufferedDocs=-1
   mergedSegmentWarmer=null
   delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
   commit=null
   openMode=CREATE
   similarity=org.apache.lucene.search.similarities.BM25Similarity
   mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=4, maxMergeCount=9, 
ioThrottle=true
   codec=Lucene95
   infoStream=org.apache.lucene.util.PrintStreamInfoStream
   mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, 
maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, 
forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, 
maxCFSSegmentSizeMB=8.796093022208E12, noCFSRatio=0.1, deletesPctAllowed=20.0
   readerPooling=true
   perThreadHardLimitMB=1945
   useCompoundFile=false
   commitOnClose=true
   indexSort=null
   checkPendingFlushOnUpdate=true
   softDeletesField=null
   maxFullFlushMergeWaitMillis=500
   leafSorter=null
   eventListener=org.apache.lucene.index.IndexWriterEventListener$1@6c9f5c0d
   writer=org.apache.lucene.index.IndexWriter@de3a06f
   
   IW 0 [2023-06-28T12:44:34.367221110Z; main]: 
MMapDirectory.UNMAP_SUPPORTED=true
   Jun 28, 2023 1:44:34 PM org.apache.lucene.util.VectorUtilProvider lookup
   WARNING: Java vector incubator module is not readable. For optimal vector 
performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
   DWPT 0 [2023-06-28T12:53:31.591056430Z; main]: flush postings as segment _0 
numDocs=460521
   IW 0 [2023-06-28T12:53:31.591842896Z; main]: 0 ms to write norms
   IW 0 [2023-06-28T12:53:31.592260907Z; main]: 0 ms to write docValues
   IW 0 [2023-06-28T12:53:31.592370750Z; main]: 0 ms to write points
   IW 0 [2023-06-28T12:53:32.987321518Z; main]: 1394 ms to write vectors
   IW 0 [2023-06-28T12:53:32.997512174Z; main]: 10 ms to finish stored fields
   IW 0 [2023-06-28T12:53:32.997693539Z; main]: 0 ms to write postings and 
finish vectors
   IW 0 [2023-06-28T12:53:32.998159715Z; main]: 0 ms to write fieldInfos
   DWPT 0 [2023-06-28T12:53:32.999257618Z; main]: new segment has 0 deleted docs
   DWPT 0 [2023-06-28T12:53:32.999365945Z; main]: new segment has 0 
soft-deleted docs
   DWPT 0 [2023-06-28T12:53:33.000456314Z; main]: new segment has no vectors; 
no norms; no docValues; no prox; freqs
   DWPT 0 [2023-06-28T12:53:33.000586334Z; main]: 
flushedFiles=[_0_Lucene95HnswVectorsFormat_0.vem, _0.fdm, 
_0_Lucene95HnswVectorsFormat_0.vec, _0.fdx, _0_Lucene95HnswVectorsFormat_0.vex, 
_0.fdt, _0.fnm]
   DWPT 0 [2023-06-28T12:53:33.000673681Z; main]: flushed codec=Lucene95
   DWPT 0 [2023-06-28T12:53:33.001725500Z; main]: flushed: segment=_0 
ramUsed=1,945.017 MB newFlushedSize=1,824.658 MB docs/MB=252.388
   DWPT 0 [2023-06-28T12:53:33.002919290Z; main]: flush time 1412.932331 ms
   IW 0 [2023-06-28T12:53:33.004048349Z; main]: publishFlushedSegment 
seg-private updates=null
   IW 0 [2023-06-28T12:53:33.004702334Z; main]: publishFlushedSegment 
_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic, 
lucene.version=9.7.0, source=flush, timestamp=1687956813001, 
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, 
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] 
:id=1qx5zulv7rcv8o0t4f62zfjjz
   BD 0 [2023-06-28T12:53:33.006074639Z; main]: finished packet delGen=1 now 
completedDelGen=1
   IW 0 [2023-06-28T12:53:33.007517182Z; main]: publish sets newSegment 
delGen=1 seg=_0(9.7.0):C460521:[diagnostics={os.arch=amd64, 
os.version=6.2.0-23-generic, lucene.version=9.7.0, source=flush, 
timestamp=1687956813001, java.runtime.version=20.0.1+9-29, java.vendor=Oracle 
Corporation, 
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] 
:id=1qx5zulv7rcv8o0t4f62zfjjz
   IFD 0 [2023-06-28T12:53:33.007718974Z; main]: now checkpoint 
"_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic, 
lucene.version=9.7.0, source=flush, timestamp=1687956813001, 
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, 
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] 
:id=1qx5zulv7rcv8o0t4f62zfjk0" [1 segments ; isCommit = false]
   IFD 0 [2023-06-28T12:53:33.008114732Z; main]: now delete 0 files: []
   IFD 0 [2023-06-28T12:53:33.008168685Z; main]: 0 ms to checkpoint
   MP 0 [2023-06-28T12:53:33.010309939Z; main]:   
seg=_0(9.7.0):C460521:[diagnostics={os.arch=amd64, os.version=6.2.0-23-generic, 
lucene.version=9.7.0, source=flush, timestamp=1687956813001, 
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, 
os=Linux}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] 
:id=1qx5zulv7rcv8o0t4f62zfjk0 size=1824.659 MB
   MP 0 [2023-06-28T12:53:33.010610953Z; main]: findMerges: 1 segments
   MP ...
   Indexed 2680961 documents in 3136s
   ```
   </details>
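
The `WARNING: Java vector incubator module is not readable` line in the Test1 log is what you get when the JVM is started without `--add-modules jdk.incubator.vector`. A minimal sketch for checking this from inside a running JVM is below. It uses only the standard `java.lang.ModuleLayer` API; the class name is hypothetical, and this is not Lucene's own lookup code, which may apply further checks.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not Lucene's provider lookup.
final class VectorModuleCheck {
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    /** True if the incubator module was resolved into the boot layer at startup. */
    static boolean vectorModuleResolved() {
        // Incubator modules are not resolved by default, so this is normally
        // empty unless the JVM was launched with --add-modules jdk.incubator.vector.
        Optional<Module> module = ModuleLayer.boot().findModule(VECTOR_MODULE);
        return module.isPresent();
    }

    public static void main(String[] args) {
        System.out.println("Java feature version: " + Runtime.version().feature());
        System.out.println(VECTOR_MODULE + " resolved: " + vectorModuleResolved());
    }
}
```

Running it once with and once without `--add-modules jdk.incubator.vector` should flip the second line of output.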
   
   
   ### Test2
   - Lucene 9.7 with `FloatVectorValues.MAX_DIMENSIONS` patched to 2048
   - Panama Vector API **enabled**, `preferredBitSize=512`
   - vector dims=1536
   - Results: Indexed 2680961 documents in 2633s
   
   <details>
    <summary>Details</summary>
   
   ```
   davekim$ time /home/chegar/binaries/jdk-20.0.1/bin/java \
     --add-modules=jdk.incubator.vector \
     -cp 
/home/chegar/git/lucene/lucene/core/build/libs/lucene-core-9.7.0-SNAPSHOT.jar:lucene-9.7.0/modules/*:/home/chegar/git/lucene/lucene/core/build/classes/java/test
 \
     -Xmx16g -Xms16g \
     org.apache.lucene.util.hnsw.KnnGraphTester \
     -dim 1536 \
     -ndoc 2680961 \
     -reindex \
     -docs vector_search-open_ai_vectors-vectors.bin \
     -maxConn 16 \
     -beamWidthIndex 100
   WARNING: Using incubator modules: jdk.incubator.vector
   creating index in vector_search-open_ai_vectors-vectors.bin-16-100.index
   Jun 28, 2023 3:18:08 PM 
org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
   INFO: Using MemorySegmentIndexInput with Java 20; to disable start with 
-Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
   MS 0 [2023-06-28T14:18:08.783226914Z; main]: initDynamicDefaults 
maxThreadCount=4 maxMergeCount=9
   IFD 0 [2023-06-28T14:18:08.798094830Z; main]: init: current segments file is 
"segments"; 
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@1efee8e7
   IFD 0 [2023-06-28T14:18:08.800639373Z; main]: now delete 0 files: []
   IFD 0 [2023-06-28T14:18:08.801349082Z; main]: now checkpoint "" [0 segments 
; isCommit = false]
   IFD 0 [2023-06-28T14:18:08.801461676Z; main]: now delete 0 files: []
   IFD 0 [2023-06-28T14:18:08.802987862Z; main]: 0 ms to checkpoint
   IW 0 [2023-06-28T14:18:08.803265302Z; main]: init: create=true reader=null
   IW 0 [2023-06-28T14:18:08.809406650Z; main]:
   
dir=MMapDirectory@/home/chegar/git/lucene-vector-bench/vector_search-open_ai_vectors-vectors.bin-16-100.index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@1dd02175
   index=
   version=9.7.0
   analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
   ramBufferSizeMB=1994.0
   maxBufferedDocs=-1
   mergedSegmentWarmer=null
   delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
   commit=null
   openMode=CREATE
   similarity=org.apache.lucene.search.similarities.BM25Similarity
   mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=4, maxMergeCount=9, 
ioThrottle=true
   codec=Lucene95
   infoStream=org.apache.lucene.util.PrintStreamInfoStream
   mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, 
maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, 
forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, 
maxCFSSegmentSizeMB=8.796093022208E12, noCFSRatio=0.1, deletesPctAllowed=20.0
   readerPooling=true
   perThreadHardLimitMB=1945
   useCompoundFile=false
   commitOnClose=true
   indexSort=null
   checkPendingFlushOnUpdate=true
   softDeletesField=null
   maxFullFlushMergeWaitMillis=500
   leafSorter=null
   eventListener=org.apache.lucene.index.IndexWriterEventListener$1@3d3fcdb0
   writer=org.apache.lucene.index.IndexWriter@641147d0
   
   IW 0 [2023-06-28T14:18:08.809591811Z; main]: 
MMapDirectory.UNMAP_SUPPORTED=true
   Jun 28, 2023 3:18:08 PM org.apache.lucene.util.VectorUtilPanamaProvider 
<init>
   INFO: Java vector incubator API enabled; uses preferredBitSize=512
   DWPT 0 [2023-06-28T14:23:17.927393364Z; main]: flush postings as segment _0 
numDocs=314897
   IW 0 [2023-06-28T14:23:17.928214793Z; main]: 0 ms to write norms
   IW 0 [2023-06-28T14:23:17.928486805Z; main]: 0 ms to write docValues
   IW 0 [2023-06-28T14:23:17.928593869Z; main]: 0 ms to write points
   IW 0 [2023-06-28T14:23:19.282981254Z; main]: 1354 ms to write vectors
   IW 0 [2023-06-28T14:23:19.290000600Z; main]: 6 ms to finish stored fields
   IW 0 [2023-06-28T14:23:19.290178853Z; main]: 0 ms to write postings and 
finish vectors
   IW 0 [2023-06-28T14:23:19.290669001Z; main]: 0 ms to write fieldInfos
   DWPT 0 [2023-06-28T14:23:19.291053701Z; main]: new segment has 0 deleted docs
   DWPT 0 [2023-06-28T14:23:19.291129515Z; main]: new segment has 0 
soft-deleted docs
   DWPT 0 [2023-06-28T14:23:19.292160606Z; main]: new segment has no vectors; 
no norms; no docValues; no prox; freqs
   DWPT 0 [2023-06-28T14:23:19.292249403Z; main]: 
flushedFiles=[_0_Lucene95HnswVectorsFormat_0.vem, _0.fdm, 
_0_Lucene95HnswVectorsFormat_0.vec, _0.fdx, _0_Lucene95HnswVectorsFormat_0.vex, 
_0.fdt, _0.fnm]
   DWPT 0 [2023-06-28T14:23:19.292320403Z; main]: flushed codec=Lucene95
   DWPT 0 [2023-06-28T14:23:19.295665508Z; main]: flushed: segment=_0 
ramUsed=1,945.012 MB newFlushedSize=1,863.46 MB docs/MB=168.985
   DWPT 0 [2023-06-28T14:23:19.296825017Z; main]: flush time 1370.228388 ms
   IW 0 [2023-06-28T14:23:19.297541689Z; main]: publishFlushedSegment 
seg-private updates=null
   IW 0 [2023-06-28T14:23:19.298158353Z; main]: publishFlushedSegment 
_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295, 
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux, 
os.arch=amd64, os.version=6.2.0-23-generic, 
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
 :id=9b08nbm1nw553b43pa9kzvach
   BD 0 [2023-06-28T14:23:19.299549573Z; main]: finished packet delGen=1 now 
completedDelGen=1
   IW 0 [2023-06-28T14:23:19.301085879Z; main]: publish sets newSegment 
delGen=1 seg=_0(9.7.0):C314897:[diagnostics={source=flush, 
timestamp=1687962199295, java.runtime.version=20.0.1+9-29, java.vendor=Oracle 
Corporation, os=Linux, os.arch=amd64, os.version=6.2.0-23-generic, 
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
 :id=9b08nbm1nw553b43pa9kzvach
   IFD 0 [2023-06-28T14:23:19.301281180Z; main]: now checkpoint 
"_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295, 
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux, 
os.arch=amd64, os.version=6.2.0-23-generic, 
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
 :id=9b08nbm1nw553b43pa9kzvaci" [1 segments ; isCommit = false]
   IFD 0 [2023-06-28T14:23:19.301666023Z; main]: now delete 0 files: []
   IFD 0 [2023-06-28T14:23:19.301718781Z; main]: 0 ms to checkpoint
   MP 0 [2023-06-28T14:23:19.303689024Z; main]:   
seg=_0(9.7.0):C314897:[diagnostics={source=flush, timestamp=1687962199295, 
java.runtime.version=20.0.1+9-29, java.vendor=Oracle Corporation, os=Linux, 
os.arch=amd64, os.version=6.2.0-23-generic, 
lucene.version=9.7.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]
 :id=9b08nbm1nw553b43pa9kzvaci size=1863.460 MB
   MP 0 [2023-06-28T14:23:19.303936133Z; main]: findMerges: 1 segments
   MP ....
   Indexed 2680961 documents in 2633s
   ```
   </details>
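
For a rough feel of what `preferredBitSize=512` in the Test2 log means per distance computation, here is a back-of-the-envelope sketch of the lane arithmetic. The class and method names are hypothetical, and it does not model how Lucene's Panama provider actually structures or unrolls its loops; it only counts how many full-width float vectors cover one document vector.

```java
// Hypothetical helper for illustration only; it does not use the Vector API.
final class LaneMath {
    /** Number of 32-bit float lanes that fit in a SIMD register of the given width. */
    static int floatLanes(int vectorBits) {
        return vectorBits / Float.SIZE;
    }

    /** Number of full-width vector steps needed to cover dims components. */
    static int fullSteps(int dims, int lanes) {
        return dims / lanes;
    }

    /** Components left over for a scalar tail loop. */
    static int tail(int dims, int lanes) {
        return dims % lanes;
    }

    public static void main(String[] args) {
        int lanes = floatLanes(512);
        for (int dims : new int[] {1024, 1536}) {
            System.out.println(dims + " dims: " + fullSteps(dims, lanes)
                + " full steps of " + lanes + " lanes, tail " + tail(dims, lanes));
        }
    }
}
```

Both 1024 and 1536 are multiples of the lane count at 512 bits, so neither run would need a scalar tail under this simplified model.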
   
   Full output from the test runs can be seen here: 
https://gist.github.com/ChrisHegarty/ef008da196624c1a3fe46578ee3a0a6c
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

