Re: [I] [Discuss] Reducing allocations in HnswUtil::markRooted [lucene]
msokolov commented on issue #14002: URL: https://github.com/apache/lucene/issues/14002#issuecomment-2509991790 > Should we be concerned about the chance of a humongous allocation (if the stack is really deep) with primitives? If not, I can give it a shot with running the same benchmarks with primitives and starting at sqrt of the graph size at that level. I don't think switching to primitives would be worse than what we do today (which would have the same humongous allocation, only bigger). So, +1 to try it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Grammar and typo fixes [lucene]
msokolov commented on code in PR #14019: URL: https://github.com/apache/lucene/pull/14019#discussion_r1864957737 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -184,7 +184,7 @@ List filterAndLock(Predicate /** * Removes the given DWPT from the pool unless it's already been removed before. * - * @return true iff the given DWPT has been removed. Otherwise false + * @return true, iff the given DWPT has been removed. Otherwise false Review Comment: Since this is all about syntax, grammar, etc, we might as well italicize _iff_ since that is how it is generally done, as in the example you cited. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Use Arrays.mismatch in FSTCompiler#add. [lucene]
msokolov commented on PR #13924: URL: https://github.com/apache/lucene/pull/13924#issuecomment-2510044787 Thanks, LGTM, merging -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Use Arrays.mismatch in FSTCompiler#add. [lucene]
msokolov merged PR #13924: URL: https://github.com/apache/lucene/pull/13924 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Grammar and typo fixes [lucene]
msokolov commented on code in PR #14019: URL: https://github.com/apache/lucene/pull/14019#discussion_r1864959587 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -101,8 +101,8 @@ * whether a new index is created, or whether an existing index is opened. Note that you can open an * index with {@link OpenMode#CREATE} even while readers are using the index. The old readers will * continue to search the "point in time" snapshot they had opened, and won't see the newly created - * index until they re-open. If {@link OpenMode#CREATE_OR_APPEND} is used IndexWriter will create a - * new index if there is not already an index at the provided path and otherwise open the existing + * index until they re-open. If {@link OpenMode#CREATE_OR_APPEND} is used, IndexWriter will create a Review Comment: Again, not a great sentence, but fixing it requires a bigger rewrite; I don't think that this change is better (the "will" has no subject). ## lucene/core/src/java/org/apache/lucene/index/IndexOptions.java: ## @@ -33,7 +33,7 @@ public enum IndexOptions { DOCS, /** * Only documents and term frequencies are indexed: positions are omitted. This enables normal - * scoring, except Phrase and other positional queries will throw an exception. + * scoring, except phrase and other positional queries will throw an exception. Review Comment: I think I would prefer to avoid this ambiiguity by using `PhraseQuery` explicitly ## lucene/core/src/java/org/apache/lucene/index/IndexCommit.java: ## @@ -25,9 +25,9 @@ * Expert: represents a single commit into an index as seen by the {@link IndexDeletionPolicy} or * {@link IndexReader}. * - * Changes to the content of an index are made visible only after the writer who made that change - * commits by writing a new segments file (segments_N). This point in time, when the - * action of writing of a new segments file to the directory is completed, is an index commit. + * Changes to the content of an index are made visible only after the writer which made that Review Comment: sorry this is wrong. We could use `that`, but `who` is also correct - no need to "fix" it, ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -123,13 +123,13 @@ * IndexWriterConfig#setRAMBufferSizeMB}) or the number of added documents (see {@link * IndexWriterConfig#setMaxBufferedDocs(int)}). The default is to flush when RAM usage hits {@link * IndexWriterConfig#DEFAULT_RAM_BUFFER_SIZE_MB} MB. For best indexing speed you should flush by RAM - * usage with a large RAM buffer. In contrast to the other flush options {@link - * IndexWriterConfig#setRAMBufferSizeMB} and {@link IndexWriterConfig#setMaxBufferedDocs(int)}, + * usage with a large RAM buffer. In contrast to the other flush options, in case of {@link Review Comment: let's leave this as it was. It was OK. ## lucene/core/src/java/org/apache/lucene/index/IndexCommit.java: ## @@ -25,9 +25,9 @@ * Expert: represents a single commit into an index as seen by the {@link IndexDeletionPolicy} or * {@link IndexReader}. * - * Changes to the content of an index are made visible only after the writer who made that change - * commits by writing a new segments file (segments_N). This point in time, when the - * action of writing of a new segments file to the directory is completed, is an index commit. + * Changes to the content of an index are made visible only after the writer which made that + * change commits by writing a new segments file (segments_N). This point in time, when Review Comment: OK this sentence is awkward, but what you've done here doesn't really address the core problem. I think it really would be most correct as: "_The_ point in time (no comma) when ... is completed (no comma) is an index commit". ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -184,7 +184,7 @@ List filterAndLock(Predicate /** * Removes the given DWPT from the pool unless it's already been removed before. * - * @return true iff the given DWPT has been removed. Otherwise false + * @return true, iff the given DWPT has been removed. Otherwise false Review Comment: also, the comma you inserted is not correct (try replacing iff with "if and only if" - we wouldn't use a comma before "if"). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
msokolov closed pull request #13872: Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces URL: https://github.com/apache/lucene/pull/13872 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors [lucene]
msokolov commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2509938326 The impact on search times is surprising, as you said. Can you clarify one thing a about the benchmark setup: does it perform searches concurrently with indexing (and merging) on the same box, or are they completely separate? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix changelog for GITHUB#14011 [lucene]
msokolov commented on PR #14018: URL: https://github.com/apache/lucene/pull/14018#issuecomment-2509947234 Thanks for that, I will move to ``Optimizations` section and merge -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Better encapsulate locking logic in HnswGraphBuilder [lucene]
msokolov commented on PR #14016: URL: https://github.com/apache/lucene/pull/14016#issuecomment-2509959742 I'm not sure we need this because we guarantee that an index cannot be written while it is being read under normal circumstances, and I think it's actually good to expose less functionality if there is no purpose for it, to avoid traps. Did you have a use case in mind? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix changelog for GITHUB#14011 [lucene]
msokolov commented on PR #14018: URL: https://github.com/apache/lucene/pull/14018#issuecomment-2509978645 @viliam-durina would you also like to attempt the backport of this and the related commit to 10.1? Generally this is a matter of cherry-picking the change on to the branch, in this case it would be `branch_10_1` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix changelog for GITHUB#14011 [lucene]
msokolov merged PR #14018: URL: https://github.com/apache/lucene/pull/14018 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org