mikemccand commented on issue #15079: URL: https://github.com/apache/lucene/issues/15079#issuecomment-3196351421
The chart is indeed depressing at first glance!

9/25/2021 was due to increasing the default HNSW `beamWidth` from 16 to 100, 7/17/2023 was from increasing vectors from 100 dims to 768 dims (mpnet), then 8/25/2024 was from switching from mpnet to Cohere wikipedia vectors (still 768 dims). But 4/30/2024 remains unexplained. It might be [this luceneutil fix](https://github.com/mikemccand/luceneutil/commit/615ff6c75157f395dfbe2b133c67415c81fe5399): before it, we may have been testing on messed-up Cohere 768 vectors (each double cast into two adjacent messed-up `float32`s). I'll add annotations.

I found something really strange about the first drop (8/6/2025) in this issue. Looking at the indexing output (printed every 10K docs), there is a massive discontinuity while building the vectors index:

```
...
Indexer: 23900000 docs (2870.7 sec); 8325.4 docs/sec
Indexer: 23910000 docs (2872.6 sec); 8323.5 docs/sec
Indexer: 23920000 docs (2874.4 sec); 8321.6 docs/sec
Indexer: 23930000 docs (2876.3 sec); 8319.7 docs/sec
Indexer: 23940000 docs (2877.9 sec); 8318.5 docs/sec
Indexer: 23950000 docs (8665.2 sec); 2763.9 docs/sec
Indexer: 23960000 docs (8666.0 sec); 2764.8 docs/sec
Indexer: 23970000 docs (8666.6 sec); 2765.8 docs/sec
Indexer: 23980000 docs (8667.3 sec); 2766.7 docs/sec
Indexer: 23990000 docs (8668.0 sec); 2767.7 docs/sec
Indexer: 24000000 docs (8668.7 sec); 2768.6 docs/sec
...
```

That is a ~5787 second hang between the 23940000 and 23950000 doc marks! This might be CMS back-pressure. I peeked in the kernel logs and don't see any smoking gun, and the next couple of days' runs don't show this odd hang. I'll turn on InfoStream for this indexing going forward...

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org