[GitHub] [lucene] benwtrent commented on pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub
benwtrent commented on PR #12529: URL: https://github.com/apache/lucene/pull/12529#issuecomment-1711551199 @msokolov what say you? It seems like encapsulating random vector seeking & scoring into one thing makes the code simpler. -- This is an automated message from the Apache Git Service

[GitHub] [lucene] jpountz commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub
jpountz commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1319777844 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java: ## @@ -423,8 +422,12 @@ public RandomAccessVectorValues c

[GitHub] [lucene] mikemccand commented on issue #12542: Lucene's FST Builder should have a simpler "knob" to trade off memory/CPU required against minimality

2023-09-08 Thread via GitHub
mikemccand commented on issue #12542: URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711608900 Digging into this a bit, I think I found some silly performance bugs in our current FST impl: * We seem to create a `PagedGrowableWriter` with [page size 128 MB here](https:

[GitHub] [lucene] javanna commented on pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub
javanna commented on PR #12544: URL: https://github.com/apache/lucene/pull/12544#issuecomment-1711628290 thanks @jpountz !

[GitHub] [lucene] javanna merged pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub
javanna merged PR #12544: URL: https://github.com/apache/lucene/pull/12544

[GitHub] [lucene] dweiss commented on issue #12542: Lucene's FST Builder should have a simpler "knob" to trade off memory/CPU required against minimality

2023-09-08 Thread via GitHub
dweiss commented on issue #12542: URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711706053 With regard to automata/ FSTs - they're nearly the same thing, conceptually. Automata are logically transducers producing a constant epsilon value (no value). This knowledge can be u

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1319990727 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java: ## @@ -423,8 +422,12 @@ public RandomAccessVectorValues co

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r132915 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004511 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/OffHeapByteVectorValues.java: ## @@ -60,13 +61,17 @@ public int size() { @Override public byte[] vecto

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004724 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004018 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene91/Lucene91HnswVectorsReader.java: ## @@ -42,9 +41,7 @@ import org.apache.lucene.util.Bits;

[GitHub] [lucene] mikemccand commented on pull request #12489: Add support for recursive graph bisection.

2023-09-08 Thread via GitHub
mikemccand commented on PR #12489: URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712097368 @jpountz did you measure any change to index size with the reordered docids?

[GitHub] [lucene] jpountz commented on pull request #12489: Add support for recursive graph bisection.

2023-09-08 Thread via GitHub
jpountz commented on PR #12489: URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712166542 I did. My wikimedium file is sorted by title, which already gives some compression compared to random ordering. Disappointingly, recursive graph bisection only improved compression of pos

[GitHub] [lucene] javanna opened a new pull request, #12544: Close index readers in tests

2023-09-08 Thread via GitHub
javanna opened a new pull request, #12544: URL: https://github.com/apache/lucene/pull/12544 There are a few places where tests don't close index readers. This has not caused problems so far, but it becomes an issue when the reader gets an executor, because its shutdown happens as a closing

[GitHub] [lucene] jpountz commented on a diff in pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub
jpountz commented on code in PR #12544: URL: https://github.com/apache/lucene/pull/12544#discussion_r1319678371 ## lucene/core/src/test/org/apache/lucene/search/TestSort.java: ## @@ -849,11 +849,10 @@ public void testMultiSort() throws IOException { } public void testRew

[GitHub] [lucene] javanna commented on a diff in pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub
javanna commented on code in PR #12544: URL: https://github.com/apache/lucene/pull/12544#discussion_r1319723656 ## lucene/core/src/test/org/apache/lucene/search/TestSort.java: ## @@ -849,11 +849,10 @@ public void testMultiSort() throws IOException { } public void testRew