mikemccand commented on issue #15079:
URL: https://github.com/apache/lucene/issues/15079#issuecomment-3196351421

   The chart is indeed depressing at first glance!...
   
   9/25/2021 was due to increasing the default HNSW `beamWidth` from 16 to 100, 
7/17/2023 was from increasing the vectors from 100 dims to 768 dims (mpnet), and 
8/25/2024 was from switching from mpnet to Cohere Wikipedia vectors (still 768 
dims).
   
   But 4/30/2024 remains unexplained.  It might be [this luceneutil 
fix](https://github.com/mikemccand/luceneutil/commit/615ff6c75157f395dfbe2b133c67415c81fe5399): 
the bug it fixed may have been corrupting the Cohere 768-dim vectors we were 
testing on up until that point (casting each double into two adjacent, garbled 
`float32` values).  I'll add an annotation.
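
To illustrate the kind of corruption described above, here is a minimal Python sketch. It assumes the bug amounted to reinterpreting the raw bytes of each little-endian `float64` as two `float32` values; the actual luceneutil code is not shown in this comment, and the function names below are hypothetical.

```python
import struct


def misread_doubles_as_floats(values):
    """Reinterpret the bytes of float64 values as twice as many float32 values.

    Assumption for illustration only: this mimics a reader that treats
    8-byte doubles as pairs of 4-byte floats without converting them.
    """
    raw = struct.pack(f"<{len(values)}d", *values)
    return list(struct.unpack(f"<{2 * len(values)}f", raw))


def convert_doubles_to_floats(values):
    """Convert each float64 to the nearest float32 (the intended behavior)."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return list(struct.unpack(f"<{len(values)}f", raw))


if __name__ == "__main__":
    vec = [0.5, -1.25]
    print(misread_doubles_as_floats(vec))  # four unrelated floats
    print(convert_doubles_to_floats(vec))  # two floats, values preserved
```

Under this assumption, a vector read the wrong way ends up with twice as many components, none of which match the original values.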
   
   I found something really strange about the first drop (8/6/2025) in this 
issue.  Looking at the indexing output (printed every 10K docs), there is a 
massive discontinuity while building the vectors index:
   
   ```
   ...
   Indexer: 23900000 docs (2870.7 sec); 8325.4 docs/sec
   Indexer: 23910000 docs (2872.6 sec); 8323.5 docs/sec
   Indexer: 23920000 docs (2874.4 sec); 8321.6 docs/sec
   Indexer: 23930000 docs (2876.3 sec); 8319.7 docs/sec
   Indexer: 23940000 docs (2877.9 sec); 8318.5 docs/sec
   Indexer: 23950000 docs (8665.2 sec); 2763.9 docs/sec
   Indexer: 23960000 docs (8666.0 sec); 2764.8 docs/sec
   Indexer: 23970000 docs (8666.6 sec); 2765.8 docs/sec
   Indexer: 23980000 docs (8667.3 sec); 2766.7 docs/sec
   Indexer: 23990000 docs (8668.0 sec); 2767.7 docs/sec
   Indexer: 24000000 docs (8668.7 sec); 2768.6 docs/sec
   ...
   ```
   
   A hang of ~5787 seconds!  This might be CMS (`ConcurrentMergeScheduler`) 
back-pressure.  I peeked in the kernel logs and don't see any smoking gun, and 
the next couple of days' runs don't show this odd hang.  I'll turn on 
InfoStream for this indexing going forward...
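
A stall like this can be spotted mechanically rather than by eye. The following is a rough Python sketch that scans progress lines in the format shown above and reports any jump in elapsed time above a threshold. The function name `find_stalls` and the 60-second threshold are hypothetical choices for illustration, not part of luceneutil.

```python
import re

# Matches lines like: "Indexer: 23940000 docs (2877.9 sec); 8318.5 docs/sec"
PROGRESS_RE = re.compile(r"Indexer: (\d+) docs \(([\d.]+) sec\)")


def find_stalls(log_text, threshold_sec=60.0):
    """Return (prev_docs, docs, gap_sec) for each pair of consecutive
    progress lines whose elapsed time differs by more than threshold_sec."""
    stalls = []
    prev = None
    for match in PROGRESS_RE.finditer(log_text):
        docs, secs = int(match.group(1)), float(match.group(2))
        if prev is not None and secs - prev[1] > threshold_sec:
            stalls.append((prev[0], docs, round(secs - prev[1], 1)))
        prev = (docs, secs)
    return stalls


SAMPLE = """\
Indexer: 23930000 docs (2876.3 sec); 8319.7 docs/sec
Indexer: 23940000 docs (2877.9 sec); 8318.5 docs/sec
Indexer: 23950000 docs (8665.2 sec); 2763.9 docs/sec
Indexer: 23960000 docs (8666.0 sec); 2764.8 docs/sec
"""

if __name__ == "__main__":
    for prev_docs, docs, gap in find_stalls(SAMPLE):
        print(f"stall of {gap} sec between {prev_docs} and {docs} docs")
```

On the excerpt above this flags the single gap between 23940000 and 23950000 docs, matching the 8665.2 - 2877.9 ≈ 5787 second hang.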


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

