Re: [I] [Discuss] Reducing allocations in HnswUtil::markRooted [lucene]

2024-12-01 Thread via GitHub


msokolov commented on issue #14002:
URL: https://github.com/apache/lucene/issues/14002#issuecomment-2509991790

   > Should we be concerned about the chance of a humongous allocation (if the 
stack is really deep) with primitives? If not, I can give it a shot with 
running the same benchmarks with primitives and starting at sqrt of the graph 
size at that level.
   
   I don't think switching to primitives would be worse than what we do today 
(which would have the same humongous allocation, only bigger). So, +1 to try it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Grammar and typo fixes [lucene]

2024-12-01 Thread via GitHub


msokolov commented on code in PR #14019:
URL: https://github.com/apache/lucene/pull/14019#discussion_r1864957737


##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -184,7 +184,7 @@ List 
filterAndLock(Predicate
   /**
* Removes the given DWPT from the pool unless it's already been removed 
before.
*
-   * @return true iff the given DWPT has been removed. Otherwise 
false
+   * @return true, iff the given DWPT has been removed. Otherwise 
false

Review Comment:
   Since this is all about syntax, grammar, etc, we might as well italicize 
_iff_ since that is how it is generally done, as in the example you cited. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use Arrays.mismatch in FSTCompiler#add. [lucene]

2024-12-01 Thread via GitHub


msokolov commented on PR #13924:
URL: https://github.com/apache/lucene/pull/13924#issuecomment-2510044787

   Thanks, LGTM, merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use Arrays.mismatch in FSTCompiler#add. [lucene]

2024-12-01 Thread via GitHub


msokolov merged PR #13924:
URL: https://github.com/apache/lucene/pull/13924


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Grammar and typo fixes [lucene]

2024-12-01 Thread via GitHub


msokolov commented on code in PR #14019:
URL: https://github.com/apache/lucene/pull/14019#discussion_r1864959587


##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -101,8 +101,8 @@
  * whether a new index is created, or whether an existing index is opened. 
Note that you can open an
  * index with {@link OpenMode#CREATE} even while readers are using the index. 
The old readers will
  * continue to search the "point in time" snapshot they had opened, and won't 
see the newly created
- * index until they re-open. If {@link OpenMode#CREATE_OR_APPEND} is used 
IndexWriter will create a
- * new index if there is not already an index at the provided path and 
otherwise open the existing
+ * index until they re-open. If {@link OpenMode#CREATE_OR_APPEND} is used, 
IndexWriter will create a

Review Comment:
   Again, not a great sentence, but fixing it requires a bigger rewrite; I 
don't think that this change is better (the "will" has no subject).



##
lucene/core/src/java/org/apache/lucene/index/IndexOptions.java:
##
@@ -33,7 +33,7 @@ public enum IndexOptions {
   DOCS,
   /**
* Only documents and term frequencies are indexed: positions are omitted. 
This enables normal
-   * scoring, except Phrase and other positional queries will throw an 
exception.
+   * scoring, except phrase and other positional queries will throw an 
exception.

Review Comment:
   I think I would prefer to avoid this ambiiguity by using  `PhraseQuery` 
explicitly



##
lucene/core/src/java/org/apache/lucene/index/IndexCommit.java:
##
@@ -25,9 +25,9 @@
  * Expert: represents a single commit into an index as seen by the {@link 
IndexDeletionPolicy} or
  * {@link IndexReader}.
  *
- * Changes to the content of an index are made visible only after the 
writer who made that change
- * commits by writing a new segments file (segments_N). This 
point in time, when the
- * action of writing of a new segments file to the directory is completed, is 
an index commit.
+ * Changes to the content of an index are made visible only after the 
writer which made that

Review Comment:
   sorry this is wrong. We could use `that`, but `who` is also correct - no 
need to "fix" it,



##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -123,13 +123,13 @@
  * IndexWriterConfig#setRAMBufferSizeMB}) or the number of added documents 
(see {@link
  * IndexWriterConfig#setMaxBufferedDocs(int)}). The default is to flush when 
RAM usage hits {@link
  * IndexWriterConfig#DEFAULT_RAM_BUFFER_SIZE_MB} MB. For best indexing speed 
you should flush by RAM
- * usage with a large RAM buffer. In contrast to the other flush options {@link
- * IndexWriterConfig#setRAMBufferSizeMB} and {@link 
IndexWriterConfig#setMaxBufferedDocs(int)},
+ * usage with a large RAM buffer. In contrast to the other flush options, in 
case of {@link

Review Comment:
   let's leave this as it was. It was OK.



##
lucene/core/src/java/org/apache/lucene/index/IndexCommit.java:
##
@@ -25,9 +25,9 @@
  * Expert: represents a single commit into an index as seen by the {@link 
IndexDeletionPolicy} or
  * {@link IndexReader}.
  *
- * Changes to the content of an index are made visible only after the 
writer who made that change
- * commits by writing a new segments file (segments_N). This 
point in time, when the
- * action of writing of a new segments file to the directory is completed, is 
an index commit.
+ * Changes to the content of an index are made visible only after the 
writer which made that
+ * change commits by writing a new segments file (segments_N). 
This point in time, when

Review Comment:
   OK this sentence is awkward, but what you've done here doesn't really 
address the core problem. I think it really would be most correct as:  "_The_ 
point in time (no comma) when ... is completed (no comma) is an index commit". 



##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java:
##
@@ -184,7 +184,7 @@ List 
filterAndLock(Predicate
   /**
* Removes the given DWPT from the pool unless it's already been removed 
before.
*
-   * @return true iff the given DWPT has been removed. Otherwise 
false
+   * @return true, iff the given DWPT has been removed. Otherwise 
false

Review Comment:
   also, the comma you inserted is not correct (try replacing iff with "if and 
only if" - we wouldn't use a comma before "if").



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-12-01 Thread via GitHub


msokolov closed pull request #13872: Remove vector values copy() methods, 
moving IndexInput.clone() and temp storage into lower-level interfaces
URL: https://github.com/apache/lucene/pull/13872


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors [lucene]

2024-12-01 Thread via GitHub


msokolov commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2509938326

   The impact on search times is surprising, as you said. Can you clarify one 
thing a about the benchmark setup: does it perform searches concurrently with 
indexing (and merging) on the same box, or are they completely separate?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix changelog for GITHUB#14011 [lucene]

2024-12-01 Thread via GitHub


msokolov commented on PR #14018:
URL: https://github.com/apache/lucene/pull/14018#issuecomment-2509947234

   Thanks for that, I will move to ``Optimizations` section and merge


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Better encapsulate locking logic in HnswGraphBuilder [lucene]

2024-12-01 Thread via GitHub


msokolov commented on PR #14016:
URL: https://github.com/apache/lucene/pull/14016#issuecomment-2509959742

   I'm not sure we need this because we guarantee that an index cannot be 
written while it is being read under normal circumstances, and I think it's 
actually good to expose less functionality if there is no purpose for it, to 
avoid traps.  Did you have a use case in mind?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix changelog for GITHUB#14011 [lucene]

2024-12-01 Thread via GitHub


msokolov commented on PR #14018:
URL: https://github.com/apache/lucene/pull/14018#issuecomment-2509978645

   @viliam-durina would you also like to attempt the backport of this and the 
related commit to 10.1? Generally this is a matter of cherry-picking the change 
on to the branch, in this case it would be `branch_10_1`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix changelog for GITHUB#14011 [lucene]

2024-12-01 Thread via GitHub


msokolov merged PR #14018:
URL: https://github.com/apache/lucene/pull/14018


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org