[GitHub] [lucene] jmazanec15 closed pull request #12002: Set algorithm params during force merge in KnnGraphTester

2023-09-19 Thread via GitHub
jmazanec15 closed pull request #12002: Set algorithm params during force merge in KnnGraphTester URL: https://github.com/apache/lucene/pull/12002 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [lucene] jmazanec15 commented on pull request #12002: Set algorithm params during force merge in KnnGraphTester

2023-09-19 Thread via GitHub
jmazanec15 commented on PR #12002: URL: https://github.com/apache/lucene/pull/12002#issuecomment-1726189550 Closing as this was moved to luceneutil. Will raise a PR over there.

[GitHub] [lucene] eraneverlaw commented on issue #12561: UAX29URLEmailTokenizerImpl.jflex matches emails with commas and invalid periods in the local part

2023-09-19 Thread via GitHub
eraneverlaw commented on issue #12561: URL: https://github.com/apache/lucene/issues/12561#issuecomment-1726278027 > awesome find, wow, embedded sneaky range in the grammar :) > > Locally I modified the grammar per your suggestion and ran `gradle regenerate`, tests seemed happy. i want

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330549252 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330561055 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,29 @@ import java.util.Collection; import java.util.List; import java.util

[GitHub] [lucene] msokolov commented on pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-19 Thread via GitHub
msokolov commented on PR #12552: URL: https://github.com/apache/lucene/pull/12552#issuecomment-1726315384 Thanks for the CHANGES entry - I'll push shortly

[GitHub] [lucene] benwtrent merged pull request #12571: Fix HNSW graph reading with excessive connections

2023-09-19 Thread via GitHub
benwtrent merged PR #12571: URL: https://github.com/apache/lucene/pull/12571

[GitHub] [lucene] msokolov merged pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-19 Thread via GitHub
msokolov merged PR #12552: URL: https://github.com/apache/lucene/pull/12552

[GitHub] [lucene] jmazanec15 commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-19 Thread via GitHub
jmazanec15 commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1726544154 For dynamically changing M, I would be a little hesitant IMO. It could lead to inconsistent behavior, where some segments are searched very fast and others slow. If we were to su

[GitHub] [lucene] Tony-X commented on pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-19 Thread via GitHub
Tony-X commented on PR #12552: URL: https://github.com/apache/lucene/pull/12552#issuecomment-1726671056 Thanks @msokolov !

[GitHub] [lucene] gf2121 commented on a diff in pull request #12574: Make TaskExecutor public

2023-09-20 Thread via GitHub
gf2121 commented on code in PR #12574: URL: https://github.com/apache/lucene/pull/12574#discussion_r1331468750 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -85,11 +85,24 @@ final List invokeAll(Collection> tasks) throws IOException { return re

[GitHub] [lucene] gf2121 opened a new pull request, #12573: Speed up sort on deleted terms

2023-09-20 Thread via GitHub
gf2121 opened a new pull request, #12573: URL: https://github.com/apache/lucene/pull/12573 ### Description Recently, we captured a flame graph in a scene with frequent updates, which showed that sorting deleted terms occupied a high CPU ratio. Currently, we use JDK sort to sort

[GitHub] [lucene] shubhamvishu commented on pull request #12183: Make TermStates#build concurrent

2023-09-20 Thread via GitHub
shubhamvishu commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1727638896 > Hey @shubhamvishu heads up: I merged #12569 to address the deadlock issue and opened #12574 to adjust TaskExecutor visibility outside of this PR. Hopefully you are next going to b

[GitHub] [lucene] javanna commented on pull request #12183: Make TermStates#build concurrent

2023-09-20 Thread via GitHub
javanna commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1727419929 Hey @shubhamvishu heads up: I merged #12569 to address the deadlock issue and opened #12574 to adjust TaskExecutor visibility outside of this PR. Hopefully you are next going to be able

[GitHub] [lucene] benwtrent closed issue #12570: Reading after Segment Merge fails for HNSW

2023-09-20 Thread via GitHub
benwtrent closed issue #12570: Reading after Segment Merge fails for HNSW URL: https://github.com/apache/lucene/issues/12570

[GitHub] [lucene] javanna merged pull request #12574: Make TaskExecutor public

2023-09-20 Thread via GitHub
javanna merged PR #12574: URL: https://github.com/apache/lucene/pull/12574

[GitHub] [lucene] jpountz opened a new issue, #12572: Make IndexWriter#flushNextBuffer flush deletes too?

2023-09-20 Thread via GitHub
jpountz opened a new issue, #12572: URL: https://github.com/apache/lucene/issues/12572 ### Description `IndexWriter#flushNextBuffer()` is a convenient way to control indexing buffer sizes across multiple index writers. Unfortunately, it seems that it only ever flushes DWPTs, and neve

[GitHub] [lucene] javanna opened a new pull request, #12574: Make TaskExecutor public

2023-09-20 Thread via GitHub
javanna opened a new pull request, #12574: URL: https://github.com/apache/lucene/pull/12574 TaskExecutor is currently package private. We have scenarios where we want to parallelize the execution and reuse it outside of its package, hence this commit makes it public. Note that its co

[GitHub] [lucene] zhaih closed issue #9660: ArrayIndexOutOfBoundsException in ByteBlockPool [LUCENE-8614]

2023-09-20 Thread via GitHub
zhaih closed issue #9660: ArrayIndexOutOfBoundsException in ByteBlockPool [LUCENE-8614] URL: https://github.com/apache/lucene/issues/9660

[GitHub] [lucene] stefanvodita commented on issue #9660: ArrayIndexOutOfBoundsException in ByteBlockPool [LUCENE-8614]

2023-09-20 Thread via GitHub
stefanvodita commented on issue #9660: URL: https://github.com/apache/lucene/issues/9660#issuecomment-1727196480 Yes, it's resolved. Thanks, Patrick!

[GitHub] [lucene] javanna commented on pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-20 Thread via GitHub
javanna commented on PR #12569: URL: https://github.com/apache/lucene/pull/12569#issuecomment-1727388144 > It might be worth using CallerRunsPolicy with a small queue in tests sometimes, as this is an interesting case that will make tasks run in the current thread. Given that TaskEx
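The `CallerRunsPolicy` case mentioned in this comment can be illustrated with plain JDK classes. The following is a minimal sketch, not code from the PR: the class name `CallerRunsDemo` and its `run` method are hypothetical, and it assumes a pool with one worker and a queue of capacity one. Once the worker is busy and the queue is full, the next task is executed by the submitting thread itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CallerRunsDemo {

  /**
   * Submits three tasks to a saturated pool. Returns two flags:
   * [0] whether the queued task ran on the caller thread,
   * [1] whether the rejected task ran on the caller thread.
   */
  static boolean[] run() {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1), // small queue, as suggested in the comment
            new ThreadPoolExecutor.CallerRunsPolicy());
    Thread caller = Thread.currentThread();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    AtomicBoolean queuedOnCaller = new AtomicBoolean();
    AtomicBoolean rejectedOnCaller = new AtomicBoolean();
    try {
      // Task 1 occupies the only worker until we release it.
      pool.execute(
          () -> {
            started.countDown();
            try {
              release.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
      started.await();
      // Task 2 fits into the queue and will run on the worker later.
      pool.execute(() -> queuedOnCaller.set(Thread.currentThread() == caller));
      // Task 3 finds the queue full, so CallerRunsPolicy runs it right here.
      pool.execute(() -> rejectedOnCaller.set(Thread.currentThread() == caller));
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new boolean[] {queuedOnCaller.get(), rejectedOnCaller.get()};
  }

  public static void main(String[] args) {
    boolean[] flags = run();
    System.out.println("queued task on caller thread:   " + flags[0]);
    System.out.println("rejected task on caller thread: " + flags[1]);
  }
}
```

A test that sometimes configures its executor this way would exercise the path where a task runs on the thread that submitted it, which is the situation the comment calls interesting.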

[GitHub] [lucene] kaivalnp opened a new issue, #12575: Allow implementers of AbstractKnnVectorQuery to access final topK results?

2023-09-20 Thread via GitHub
kaivalnp opened a new issue, #12575: URL: https://github.com/apache/lucene/issues/12575 ### Context Vector search is performed in [`AbstractKnnVectorQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java), where

[GitHub] [lucene] javanna commented on a diff in pull request #12574: Make TaskExecutor public

2023-09-20 Thread via GitHub
javanna commented on code in PR #12574: URL: https://github.com/apache/lucene/pull/12574#discussion_r1331497358 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -85,11 +85,24 @@ final List invokeAll(Collection> tasks) throws IOException { return r

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-20 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1331259326 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,29 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-20 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1331266192 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] zhaih commented on issue #9660: ArrayIndexOutOfBoundsException in ByteBlockPool [LUCENE-8614]

2023-09-20 Thread via GitHub
zhaih commented on issue #9660: URL: https://github.com/apache/lucene/issues/9660#issuecomment-1727052393 @stefanvodita Seems the issue is resolved? I closed the issue, feel free to reopen it

[GitHub] [lucene] javanna merged pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-20 Thread via GitHub
javanna merged PR #12569: URL: https://github.com/apache/lucene/pull/12569

[GitHub] [lucene] vsop-479 commented on pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-20 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1727215669 @jpountz Please take a look when you get a chance!

[GitHub] [lucene] shubhamvishu commented on pull request #12183: Make TermStates#build concurrent

2023-09-20 Thread via GitHub
shubhamvishu commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1727782959 I have rebased the PR based on the changes in #12574. Could someone please take a look? Thanks!

[GitHub] [lucene] javanna commented on pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-20 Thread via GitHub
javanna commented on PR #12569: URL: https://github.com/apache/lucene/pull/12569#issuecomment-172724 I pushed new commits to address the latest review comments, thanks for all the input. This should be ready now.

[GitHub] [lucene] tylerbertrand opened a new pull request, #12577: Resolve CompileJava task cache miss

2023-09-20 Thread via GitHub
tylerbertrand opened a new pull request, #12577: URL: https://github.com/apache/lucene/pull/12577 ### Description Resolves `CompileJava` cache miss caused by `options.compilerArgs` input difference. Moved the `apijar` input file to a `CommandLineArgumentProvider` to apply rela

[GitHub] [lucene] javanna opened a new pull request, #12578: Deprecate IndexSearcher#getExecutor

2023-09-20 Thread via GitHub
javanna opened a new pull request, #12578: URL: https://github.com/apache/lucene/pull/12578 We have recently introduced a TaskExecutor abstraction, which is meant to be used to execute concurrent tasks using the executor provided to the IndexSearcher constructor. All concurrent tasks shoul

[GitHub] [lucene] uschindler commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-20 Thread via GitHub
uschindler commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1331117098 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,29 @@ import java.util.Collection; import java.util.List; import java.util.O

[GitHub] [lucene] kaivalnp opened a new issue, #12579: [DISCUSS] Should there be a threshold-based vector search API?

2023-09-20 Thread via GitHub
kaivalnp opened a new issue, #12579: URL: https://github.com/apache/lucene/issues/12579 ### Context Almost all [vector search algorithms](https://ann-benchmarks.com/index.html#algorithms) focus on getting the `topK` results for a given query vector. This however, may not be the best
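The contrast this issue draws between `topK` and threshold-based retrieval can be sketched with a brute-force scan. This is an illustration only and does not use any Lucene API: the class `ThresholdSearchSketch`, its methods, and the choice of dot product as the similarity are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdSearchSketch {

  /** Dot-product similarity between two vectors of equal length. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Returns the ids of the k most similar vectors, best first, regardless of score. */
  static List<Integer> topK(float[][] vectors, float[] query, int k) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < vectors.length; i++) {
      ids.add(i);
    }
    // Sort ids by descending similarity to the query.
    ids.sort((x, y) -> Float.compare(dot(vectors[y], query), dot(vectors[x], query)));
    return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
  }

  /** Returns the ids of every vector whose similarity is at least the threshold. */
  static List<Integer> aboveThreshold(float[][] vectors, float[] query, float threshold) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < vectors.length; i++) {
      if (dot(vectors[i], query) >= threshold) {
        ids.add(i);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}};
    float[] query = {1f, 0f};
    System.out.println("topK(3):             " + topK(vectors, query, 3));
    System.out.println("aboveThreshold(0.5): " + aboveThreshold(vectors, query, 0.5f));
  }
}
```

With these three vectors, `topK` with `k = 3` returns all ids, including the vector that scores 0 against the query, while the threshold variant returns only the two close matches. The number of results in the threshold case depends on the data rather than on a fixed `k`.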

[GitHub] [lucene] dweiss commented on a diff in pull request #12577: Resolve CompileJava task cache miss

2023-09-20 Thread via GitHub
dweiss commented on code in PR #12577: URL: https://github.com/apache/lucene/pull/12577#discussion_r1331954883 ## gradle/java/core-mrjar.gradle: ## @@ -29,20 +29,19 @@ configure(project(":lucene:core")) { dependencies.add("main${jdkVersion}Implementation", sourceSets.mai

[GitHub] [lucene] javanna commented on pull request #12578: Deprecate IndexSearcher#getExecutor

2023-09-20 Thread via GitHub
javanna commented on PR #12578: URL: https://github.com/apache/lucene/pull/12578#issuecomment-1728197364 > I assume you will remove the deprecated method in main branch and add an entry to the MIGRATE.txt there? Yes, but I'll open a PR against main to have people double check the chan

[GitHub] [lucene] tylerbertrand commented on a diff in pull request #12577: Resolve CompileJava task cache miss

2023-09-20 Thread via GitHub
tylerbertrand commented on code in PR #12577: URL: https://github.com/apache/lucene/pull/12577#discussion_r1332210752 ## gradle/java/core-mrjar.gradle: ## @@ -29,20 +29,19 @@ configure(project(":lucene:core")) { dependencies.add("main${jdkVersion}Implementation", sourceS

[GitHub] [lucene] javanna opened a new pull request, #12580: Remove deprecated IndexSearcher#getExecutor method

2023-09-21 Thread via GitHub
javanna opened a new pull request, #12580: URL: https://github.com/apache/lucene/pull/12580 Use getTaskExecutor instead. This is important to enforce tracking of tasks that run in each thread.

[GitHub] [lucene] dweiss commented on a diff in pull request #12577: Resolve CompileJava task cache miss

2023-09-21 Thread via GitHub
dweiss commented on code in PR #12577: URL: https://github.com/apache/lucene/pull/12577#discussion_r1332621322 ## gradle/java/core-mrjar.gradle: ## @@ -29,20 +29,19 @@ configure(project(":lucene:core")) { dependencies.add("main${jdkVersion}Implementation", sourceSets.mai

[GitHub] [lucene] javanna merged pull request #12578: Deprecate IndexSearcher#getExecutor

2023-09-21 Thread via GitHub
javanna merged PR #12578: URL: https://github.com/apache/lucene/pull/12578

[GitHub] [lucene] dweiss merged pull request #12577: Resolve CompileJava task cache miss

2023-09-21 Thread via GitHub
dweiss merged PR #12577: URL: https://github.com/apache/lucene/pull/12577

[GitHub] [lucene] iverase opened a new pull request, #12581: Allow reading / writing binary stored values as DataInput

2023-09-21 Thread via GitHub
iverase opened a new pull request, #12581: URL: https://github.com/apache/lucene/pull/12581 Currently, the only way to handle binary data on stored fields is via byte arrays (wrapped as BytesRef). This means we are allocating a new byte array every time we read the value which is wasteful an

[GitHub] [lucene] javanna merged pull request #12580: Remove deprecated IndexSearcher#getExecutor method

2023-09-21 Thread via GitHub
javanna merged PR #12580: URL: https://github.com/apache/lucene/pull/12580

[GitHub] [lucene] jpountz commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-21 Thread via GitHub
jpountz commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1332900264 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -86,19 +90,48 @@ public TermStates( * @param needsStats if {@code true} then all leaf contex

[GitHub] [lucene] jpountz commented on a diff in pull request #12573: Use radix sort to speed up the sorting of deleted terms

2023-09-21 Thread via GitHub
jpountz commented on code in PR #12573: URL: https://github.com/apache/lucene/pull/12573#discussion_r1332932759 ## lucene/core/src/java/org/apache/lucene/index/BufferedUpdates.java: ## @@ -197,6 +183,160 @@ boolean any() { @Override public long ramBytesUsed() { -retu

[GitHub] [lucene] uschindler commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-21 Thread via GitHub
uschindler commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1332936743 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -211,4 +244,40 @@ public String toString() { return sb.toString(); } + + /** Wrapp

[GitHub] [lucene] jpountz commented on pull request #12564: Window-at-a-time scoring for conjunctions.

2023-09-21 Thread via GitHub
jpountz commented on PR #12564: URL: https://github.com/apache/lucene/pull/12564#issuecomment-1729584972 I'll look into merging #12382 first, which does something in-between what we have on `main` and this PR, this will help better understand the performance impact of each bit of the change

[GitHub] [lucene] jpountz commented on pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1729618165 Reopening as I'm now seeing speedups. It's possible it's related to other changes that happened since last time I looked, or to the specific tasks that get picked by luceneutil. Here's t

[GitHub] [lucene] gf2121 commented on a diff in pull request #12573: Use radix sort to speed up the sorting of deleted terms

2023-09-21 Thread via GitHub
gf2121 commented on code in PR #12573: URL: https://github.com/apache/lucene/pull/12573#discussion_r1333106578 ## lucene/core/src/java/org/apache/lucene/index/BufferedUpdates.java: ## @@ -197,6 +183,160 @@ boolean any() { @Override public long ramBytesUsed() { -retur

[GitHub] [lucene] gf2121 commented on a diff in pull request #12573: Use radix sort to speed up the sorting of deleted terms

2023-09-21 Thread via GitHub
gf2121 commented on code in PR #12573: URL: https://github.com/apache/lucene/pull/12573#discussion_r1333107412 ## lucene/core/src/java/org/apache/lucene/index/BufferedUpdates.java: ## @@ -139,15 +131,11 @@ public void addTerm(Term term, int docIDUpto) { return; } -

[GitHub] [lucene] benwtrent commented on a diff in pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
benwtrent commented on code in PR #12382: URL: https://github.com/apache/lucene/pull/12382#discussion_r1333097677 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java: ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [lucene] jpountz commented on a diff in pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
jpountz commented on code in PR #12382: URL: https://github.com/apache/lucene/pull/12382#discussion_r1333138769 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java: ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [lucene] jpountz commented on a diff in pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
jpountz commented on code in PR #12382: URL: https://github.com/apache/lucene/pull/12382#discussion_r1333140455 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java: ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-21 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1333143247 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -86,19 +90,48 @@ public TermStates( * @param needsStats if {@code true} then all leaf c

[GitHub] [lucene] jpountz commented on a diff in pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
jpountz commented on code in PR #12382: URL: https://github.com/apache/lucene/pull/12382#discussion_r1333154945 ## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ## @@ -250,31 +254,108 @@ BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOExcep

[GitHub] [lucene] jpountz commented on pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1729707958 > The numbers are impressive I agree... it makes me suspicious. I'll verify my `localrun.py` and run with higher `taskCountPerCat` and `taskRepeatCount` to see if I'm still getting

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-21 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1333163431 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -211,4 +244,40 @@ public String toString() { return sb.toString(); } + + /** Wra

[GitHub] [lucene] benwtrent commented on pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
benwtrent commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1729731542 > I agree... it makes me suspicious. I'll verify my localrun.py and run with higher taskCountPerCat and taskRepeatCount to see if I'm still getting such good results. I doubt thi

[GitHub] [lucene] uschindler commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-21 Thread via GitHub
uschindler commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1333200020 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -211,4 +244,40 @@ public String toString() { return sb.toString(); } + + /** Wrapp

[GitHub] [lucene] romseygeek commented on a diff in pull request #12581: Allow reading / writing binary stored fields as DataInput

2023-09-21 Thread via GitHub
romseygeek commented on code in PR #12581: URL: https://github.com/apache/lucene/pull/12581#discussion_r1333208421 ## lucene/core/src/java/org/apache/lucene/codecs/StoredFieldsWriter.java: ## @@ -182,6 +190,11 @@ public MergeVisitor(MergeState mergeState, int readerIndex) {

[GitHub] [lucene] jpountz commented on pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-21 Thread via GitHub
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1729775625 Running with taskCountPerCat=5 and taskRepeatCount=50 gave very similar results: ``` TaskQPS baseline StdDevQPS my_modified_version StdDev

[GitHub] [lucene] iverase commented on a diff in pull request #12581: Allow reading / writing binary stored fields as DataInput

2023-09-21 Thread via GitHub
iverase commented on code in PR #12581: URL: https://github.com/apache/lucene/pull/12581#discussion_r1333235616 ## lucene/core/src/java/org/apache/lucene/codecs/StoredFieldsWriter.java: ## @@ -182,6 +190,11 @@ public MergeVisitor(MergeState mergeState, int readerIndex) {

[GitHub] [lucene] jpountz merged pull request #12183: Make TermStates#build concurrent

2023-09-21 Thread via GitHub
jpountz merged PR #12183: URL: https://github.com/apache/lucene/pull/12183

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-21 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1333269792 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -211,4 +244,40 @@ public String toString() { return sb.toString(); } + + /** Wra

[GitHub] [lucene] s1monw commented on issue #12572: Make IndexWriter#flushNextBuffer flush deletes too?

2023-09-21 Thread via GitHub
s1monw commented on issue #12572: URL: https://github.com/apache/lucene/issues/12572#issuecomment-1729849466 I think this makes sense to me to also flush deletes if necessary. I think this is as simple as calling `applyAllDeletes()` in `flushOneDWPT()` but we need to process events in IW a

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-21 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r118922 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java: ## @@ -1305,6 +1305,7 @@ private void writeLeafBlockPackedValues( } if (lowCardinality

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-21 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r123249 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -317,6 +329,18 @@ default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOE

[GitHub] [lucene] iverase commented on a diff in pull request #12581: Allow reading / writing binary stored fields as DataInput

2023-09-21 Thread via GitHub
iverase commented on code in PR #12581: URL: https://github.com/apache/lucene/pull/12581#discussion_r1333415368 ## lucene/core/src/java/org/apache/lucene/codecs/StoredFieldsWriter.java: ## @@ -182,6 +190,11 @@ public MergeVisitor(MergeState mergeState, int readerIndex) {

[GitHub] [lucene] benwtrent opened a new pull request, #12582: Add new int8 scalar quantization to HNSW codec

2023-09-21 Thread via GitHub
benwtrent opened a new pull request, #12582: URL: https://github.com/apache/lucene/pull/12582 As with most codec changes, this is an eye popping number of LoC and the design isn't finished yet. I am opening this as draft to be open about the work and to discuss further direction.

[GitHub] [lucene] vsop-479 commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-21 Thread via GitHub
vsop-479 commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r1333808257 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java: ## @@ -1305,6 +1305,7 @@ private void writeLeafBlockPackedValues( } if (lowCardinalit

[GitHub] [lucene] rmuir opened a new pull request, #12583: Fix hidden range embedded in UAX29URLEmail grammar

2023-09-21 Thread via GitHub
rmuir opened a new pull request, #12583: URL: https://github.com/apache/lucene/pull/12583 TLDR: The dash needed escaping! See #12561 for an explanation of the issue. Actually there were TODOs already in existing tests. Closes #12561 -- This is an automated message from

[GitHub] [lucene] rmuir commented on issue #12561: UAX29URLEmailTokenizerImpl.jflex matches emails with commas and invalid periods in the local part

2023-09-21 Thread via GitHub
rmuir commented on issue #12561: URL: https://github.com/apache/lucene/issues/12561#issuecomment-1730755572 @eraneverlaw sorry for the delay again, see PR. I added a simple test but there was already funkiness shown in existing tests and a TODO to figure out the comma. There are more TODOs

[GitHub] [lucene] gf2121 commented on pull request #12573: Use radix sort to speed up the sorting of deleted terms

2023-09-21 Thread via GitHub
gf2121 commented on PR #12573: URL: https://github.com/apache/lucene/pull/12573#issuecomment-1730825155 Thanks @jpountz ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [lucene] gf2121 merged pull request #12573: Use radix sort to speed up the sorting of deleted terms

2023-09-21 Thread via GitHub
gf2121 merged PR #12573: URL: https://github.com/apache/lucene/pull/12573 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

[GitHub] [lucene] gf2121 opened a new pull request, #12584: Use radix sort to speed up the sorting of deleted terms (Backport 9x)

2023-09-21 Thread via GitHub
gf2121 opened a new pull request, #12584: URL: https://github.com/apache/lucene/pull/12584 Backport of https://github.com/apache/lucene/pull/12573 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [lucene] gf2121 merged pull request #12584: Use radix sort to speed up the sorting of deleted terms (Backport 9x)

2023-09-21 Thread via GitHub
gf2121 merged PR #12584: URL: https://github.com/apache/lucene/pull/12584 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

[GitHub] [lucene] eraneverlaw commented on a diff in pull request #12583: Fix hidden range embedded in UAX29URLEmail grammar

2023-09-21 Thread via GitHub
eraneverlaw commented on code in PR #12583: URL: https://github.com/apache/lucene/pull/12583#discussion_r1333932344 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/email/TestUAX29URLEmailAnalyzer.java: ## @@ -433,9 +433,9 @@ public void testMailtoSchemeEmails() thr

[GitHub] [lucene] javanna commented on pull request #12183: Make TermStates#build concurrent

2023-09-22 Thread via GitHub
javanna commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1730930146 Great to see this merged, thanks @shubhamvishu for all the work as well as patience as we were figuring out a way forward! -- This is an automated message from the Apache Git Service.

[GitHub] [lucene] vsop-479 commented on pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1730932421 @iverase I replaced the int values with static variables. Please take a look. Actually, I used an enum to define the match states in the previous version, but it degraded performance a little.

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r1334028608 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -228,6 +228,22 @@ public enum Relation { CELL_CROSSES_QUERY }; + /** Math states for

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r1334030455 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -281,6 +297,12 @@ public interface PointTree extends Cloneable { * @lucene.experimental

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r1334077045 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -317,6 +329,18 @@ default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOE

[GitHub] [lucene] jpountz commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-09-22 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1731162359 > Maybe we should add OrHighVeryLow to nightly benchy too? @mikemccand I started looking into this, but my enwiki (`enwiki-20120502-lines-with-random-label.txt`) seems to have sli

[GitHub] [lucene] rmuir commented on a diff in pull request #12583: Fix hidden range embedded in UAX29URLEmail grammar

2023-09-22 Thread via GitHub
rmuir commented on code in PR #12583: URL: https://github.com/apache/lucene/pull/12583#discussion_r1334328967 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/email/TestUAX29URLEmailAnalyzer.java: ## @@ -433,9 +433,9 @@ public void testMailtoSchemeEmails() throws Ex

[GitHub] [lucene] jpountz commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
jpountz commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334309792 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/QuantizedVectorsWriter.java: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [lucene] benwtrent commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334388920 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,851 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [lucene] benwtrent commented on pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1731463225 > Do we know why search is faster? Is it mostly because working on the quantized vectors requires a lower memory bandwi[d]th? Search is faster in two regards: - PanamaVec

[GitHub] [lucene] uschindler commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
uschindler commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334448931 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] uschindler commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
uschindler commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334450128 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] tveasey commented on pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
tveasey commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1731530040 > @tveasey helped me do some empirical analysis here and can provide some numbers. So the rationale is quite simple as Ben said. If you change the upper and lower quantiles very l

[GitHub] [lucene] rmuir commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
rmuir commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334477512 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [lucene] benwtrent commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334508429 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] easyice commented on pull request #12557: Improve refresh speed with softdelete enable

2023-09-22 Thread via GitHub
easyice commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1731767546 Update: when we call `softUpdateDocument` for a segment that already has some deleted docs, it will iterate over all the deleted docs using `ReadersAndUpdates#MergedDocValues#onDiskDocValu

[GitHub] [lucene] gsmiller commented on pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-22 Thread via GitHub
gsmiller commented on PR #12560: URL: https://github.com/apache/lucene/pull/12560#issuecomment-1731814078 Circling back on this: For Amazon's Product Search engine, we make fairly heavy use of these expression implementations. I pulled this change into our Lucene fork early (currently on 9.
