mayya-sharipova commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1629661053

   @rmuir  
   
   > Can we run this test with lucene's defaults (e.g. not a 2GB rambuffer)?
   
   
   I've run the test and, surprisingly, indexing time decreased substantially. Indexing with Lucene's defaults is about 1.7 times faster than with a 2 GB RAM buffer (1877 s vs. 3141 s), at the cost of ending up with more segments.
   
   - Lucene 9.7 branch with FloatVectorValues.MAX_DIMENSIONS set to 2048
   - preferredBitSize=128
   - Panama Vector API enabled
   - vector dims: 1536
   - num of docs: 2.68M
   
   | RAM buffer size | Indexing time | Num of segments |
   |----------------:|--------------:|----------------:|
   | 16 MB           | 1877 s        | 19              |
   | 1994 MB         | 3141 s        | 9               |
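
As a quick sanity check on the table, here is a small Python sketch that turns the two timings into a speedup and a throughput figure. It is not part of the benchmark itself. The document count is taken from the log line of the 16 MB run, and using the same count for the 1994 MB run is an assumption.

```python
# Figures copied from the table and the log in this comment.
NUM_DOCS = 2_680_961      # "Indexed 2680961 documents" (16 MB run; assumed equal for both runs)
SECONDS_16MB = 1877       # Lucene default RAM buffer (16 MB)
SECONDS_1994MB = 3141     # 1994 MB RAM buffer


def docs_per_second(num_docs: int, seconds: float) -> float:
    """Average indexing throughput over the whole run."""
    return num_docs / seconds


# How many times faster the default run is, and how much wall time it saves.
speedup = SECONDS_1994MB / SECONDS_16MB
time_saved_pct = 100 * (1 - SECONDS_16MB / SECONDS_1994MB)

print(f"16 MB buffer:   {docs_per_second(NUM_DOCS, SECONDS_16MB):.0f} docs/s")
print(f"1994 MB buffer: {docs_per_second(NUM_DOCS, SECONDS_1994MB):.0f} docs/s")
print(f"speedup: {speedup:.2f}x, time saved: {time_saved_pct:.1f}%")
```

The ratio comes out closer to 1.7x than 2x, which is still a substantial difference for a change that only touches the RAM buffer size.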
   
   
   <details>
    <summary>Details</summary>
   
   ```
   WARNING: Using incubator modules: jdk.incubator.vector
   Jul 10, 2023 3:35:25 P.M. org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
   INFO: Using MemorySegmentIndexInput with Java 20; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
   Jul 10, 2023 3:35:26 P.M. org.apache.lucene.util.VectorUtilPanamaProvider <init>
   INFO: Java vector incubator API enabled; uses preferredBitSize=128
   
   _fc.fdt                             _v6.fnm                             _vj.si                              _vr_Lucene95HnswVectorsFormat_0.vec
   _fc.fdx                             _v6.si                              _vj_Lucene95HnswVectorsFormat_0.vec _vr_Lucene95HnswVectorsFormat_0.vem
   _fc.fnm                             _v6_Lucene95HnswVectorsFormat_0.vec _vj_Lucene95HnswVectorsFormat_0.vem _vr_Lucene95HnswVectorsFormat_0.vex
   _fc.si                              _v6_Lucene95HnswVectorsFormat_0.vem _vj_Lucene95HnswVectorsFormat_0.vex _vs.fdm
   _fc_Lucene95HnswVectorsFormat_0.vec _v6_Lucene95HnswVectorsFormat_0.vex _vl.fdm                             _vs.fdt
   creating index in vectors.bin-16-100.index
   MS 0 [2023-07-10T14:47:25.668178Z; main]: initDynamicDefaults maxThreadCount=4 maxMergeCount=9
   IFD 0 [2023-07-10T14:47:25.725823Z; main]: init: current segments file is "segments"; deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@64f6106c
   IFD 0 [2023-07-10T14:47:25.735809Z; main]: now delete 0 files: []
   IFD 0 [2023-07-10T14:47:25.738456Z; main]: now checkpoint "" [0 segments ; isCommit = false]
   IFD 0 [2023-07-10T14:47:25.738587Z; main]: now delete 0 files: []
   IFD 0 [2023-07-10T14:47:25.743719Z; main]: 2 ms to checkpoint
   IW 0 [2023-07-10T14:47:25.744195Z; main]: init: create=true reader=null
   IW 0 [2023-07-10T14:47:25.779752Z; main]:
   
   dir=MMapDirectory@/Users/mayya/Elastic/knn/open_ai_vectors/vectors.bin-16-100.index lockFactory=org.apache.lucene.store.NativeFSLockFactory@319b92f3
   index=
   version=9.7.0
   analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
   ramBufferSizeMB=16.0
   maxBufferedDocs=-1
   mergedSegmentWarmer=null
   delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
   commit=null
   openMode=CREATE
   similarity=org.apache.lucene.search.similarities.BM25Similarity
   mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=4, maxMergeCount=9, ioThrottle=true
   codec=Lucene95
   infoStream=org.apache.lucene.util.PrintStreamInfoStream
   mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022208E12, noCFSRatio=0.1, deletesPctAllowed=20.0
   readerPooling=true
   perThreadHardLimitMB=1945
   useCompoundFile=false
   commitOnClose=true
   indexSort=null
   checkPendingFlushOnUpdate=true
   softDeletesField=null
   maxFullFlushMergeWaitMillis=500
   leafSorter=null
   eventListener=org.apache.lucene.index.IndexWriterEventListener$1@10a035a0
   writer=org.apache.lucene.index.IndexWriter@67b467e9
   
   IW 0 [2023-07-10T14:47:25.780320Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
   FP 0 [2023-07-10T14:47:27.042597Z; main]: trigger flush: activeBytes=16779458 deleteBytes=0 vs ramBufferMB=16.0
   FP 0 [2023-07-10T14:47:27.045564Z; main]: thread state has 16779458 bytes; docInRAM=2589
   FP 0 [2023-07-10T14:47:27.049109Z; main]: 1 in-use non-flushing threads states
   DWPT 0 [2023-07-10T14:47:27.050859Z; main]: flush postings as segment _0 numDocs=2589
   ....
   Indexed 2680961 documents in 1877s
   ```
   </details>
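
The first flush in the log also gives a rough idea of why the 16 MB buffer produces so many small segments. The Python sketch below is only back-of-the-envelope arithmetic on the logged values. It assumes the vectors are stored as 32-bit floats and that every later flush holds about as many documents as the first one, neither of which the log confirms for the whole run.

```python
# Values copied from the log of the 16 MB run and from the setup list.
ACTIVE_BYTES = 16_779_458   # activeBytes at the first "trigger flush"
DOCS_IN_RAM = 2_589         # docInRAM at that flush
VECTOR_DIMS = 1_536         # "vector dims: 1536"
TOTAL_DOCS = 2_680_961      # "Indexed 2680961 documents"

BYTES_PER_FLOAT = 4                     # assumption: one 32-bit float per dimension
RAM_BUFFER_BYTES = 16 * 1024 * 1024     # ramBufferSizeMB=16.0 expressed in bytes

# Raw size of one vector versus the buffered bytes actually used per document.
vector_bytes = VECTOR_DIMS * BYTES_PER_FLOAT
bytes_per_doc = ACTIVE_BYTES / DOCS_IN_RAM
other_bytes_per_doc = bytes_per_doc - vector_bytes

# Assumption: all flushes are about the same size as the first one.
estimated_flushes = TOTAL_DOCS / DOCS_IN_RAM

print(f"raw vector size:       {vector_bytes} bytes")
print(f"buffered bytes per doc: {bytes_per_doc:.0f}")
print(f"non-vector bytes/doc:   {other_bytes_per_doc:.0f}")
print(f"estimated flushes:      {estimated_flushes:.0f}")
```

Under those assumptions each document costs a little over 6 KB of buffer, almost all of it the raw vector, so the default buffer fills after only a few thousand documents and the run flushes on the order of a thousand small segments that are then merged down to the 19 reported above.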


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

