[GitHub] [lucene] jpountz merged pull request #12568: Fix issues with BP tests and the security manager.

2023-09-18 Thread via GitHub
jpountz merged PR #12568: URL: https://github.com/apache/lucene/pull/12568 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] jpountz commented on pull request #12568: Fix issues with BP tests and the security manager.

2023-09-18 Thread via GitHub
jpountz commented on PR #12568: URL: https://github.com/apache/lucene/pull/12568#issuecomment-1724930942 > Did you test with -Dtests.directory=MMapDirectory? I did.

[GitHub] [lucene] epotyom closed pull request #12445: Expression: add a set of duplicate variables

2023-09-18 Thread via GitHub
epotyom closed pull request #12445: Expression: add a set of duplicate variables URL: https://github.com/apache/lucene/pull/12445

[GitHub] [lucene] epotyom commented on pull request #12445: Expression: add a set of duplicate variables

2023-09-18 Thread via GitHub
epotyom commented on PR #12445: URL: https://github.com/apache/lucene/pull/12445#issuecomment-1724861051 @gsmiller , very nice improvement! Yes, this change is obsolete now as we only need to worry about variables reused across expressions, but not within one expression. I'll cancel this PR

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724675517 @jmazanec15 I just noticed something in my testing as I triple-checked everything. `'maxConn': (96,),` was left over from a previous test. So, a max of 192 is perfectly fi

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724671063 > What did you mean by merging and then force merging? Do you have the luceneutil params you ran with for replication? I indexed 50 cohere 768 vectors with a buffer of 1

[GitHub] [lucene] jmazanec15 commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
jmazanec15 commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724649688 I'll take a look too @benwtrent. 192 seems to be a lot. From a high level scan of the HNSWGraphBuilder, I do not see anything that obviously causes neighbors to go above M/2M. The

[GitHub] [lucene] Tony-X commented on issue #12513: Try out a tantivy's term dictionary format

2023-09-18 Thread via GitHub
Tony-X commented on issue #12513: URL: https://github.com/apache/lucene/issues/12513#issuecomment-1724547262 I've been designing how to possibly account for the optional states that each term may end up with. Namely how to deal with the following: * if a term has singleton docid * if

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329314180 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## Review Comment: Does it make sense to move `TaskExecutor` under `org.apache.lucene.uti

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329311171 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1329306960 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,4 +68,46 @@ final List invokeAll(Collection> tasks) throws IOExcept } re

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1329299924 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -26,17 +26,21 @@ import java.util.concurrent.Executor; import java.util.concurrent.Fut

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329299492 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -48,10 +58,17 @@ class TaskExecutor { * @return a list containing the results from t

[GitHub] [lucene] benwtrent commented on pull request #12571: Fix HNSW graph reading with excessive connections

2023-09-18 Thread via GitHub
benwtrent commented on PR #12571: URL: https://github.com/apache/lucene/pull/12571#issuecomment-1724430058 I think this is a two part fix. One, to make sure if users hit this that their graph is readable (this PR) and two, get the graph builder to output reasonable connection numbers again.

[GitHub] [lucene] benwtrent opened a new pull request, #12571: Fix HNSW graph reading with excessive connections

2023-09-18 Thread via GitHub
benwtrent opened a new pull request, #12571: URL: https://github.com/apache/lucene/pull/12571 When re-using the HNSW graph during segment merges, it is possible that more than the configured `M*2` connections could be made per vector. In those instances, we should allow the graph to s

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724413959 OK, debugging some more, looks like merging multiple segments and then force merging with Lucene Util 100% replicates this in 9.6. The connection count on force merge increa

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329276879 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724341985 OK, this bug has been around since 9.6. I tried to replicate in 9.5 and it didn't. So, I am guessing something around using an existing graph to merge ends up creating mor

[GitHub] [lucene] benwtrent opened a new issue, #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent opened a new issue, #12570: URL: https://github.com/apache/lucene/issues/12570 ### Description While testing with Lucene Util, I ran 50 vectors through current main. To test merging, I decreased the rambuffersize to 128MB, indexed, and then force-merged. When mer

[GitHub] [lucene] javanna commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
javanna commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1328936157 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,4 +68,46 @@ final List invokeAll(Collection> tasks) throws IOExcept } return

[GitHub] [lucene] javanna commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
javanna commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1328932010 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -86,19 +93,40 @@ public TermStates( * @param needsStats if {@code true} then all leaf contex

[GitHub] [lucene] javanna opened a new pull request, #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
javanna opened a new pull request, #12569: URL: https://github.com/apache/lucene/pull/12569 Concurrent search is currently applied once per search call, either when search is called, or when concurrent query rewrite happens. They generally don't happen within one another. There are situatio
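The snippet above is cut off, so the following is only a rough sketch of the general idea in the PR title (a task that is already running concurrently should not fan out again), not the code from the PR. The class `NonNestingExecutorSketch`, the `IN_TASK` thread-local flag, and the `nestedRunsInline` helper are all hypothetical names, and the real `TaskExecutor` may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: not Lucene's TaskExecutor.
final class NonNestingExecutorSketch {
  // Assumption: a per-thread flag is enough to tell "already inside a task".
  private static final ThreadLocal<Boolean> IN_TASK = ThreadLocal.withInitial(() -> false);
  private final ExecutorService executor;

  NonNestingExecutorSketch(ExecutorService executor) {
    this.executor = executor;
  }

  <T> List<T> invokeAll(List<Callable<T>> tasks) throws Exception {
    List<T> results = new ArrayList<>();
    if (IN_TASK.get()) {
      // Already running inside a task: run the nested tasks on the caller thread.
      for (Callable<T> task : tasks) {
        results.add(task.call());
      }
      return results;
    }
    List<Future<T>> futures = new ArrayList<>();
    for (Callable<T> task : tasks) {
      futures.add(executor.submit(() -> {
        IN_TASK.set(true); // mark this worker as busy with a top-level task
        try {
          return task.call();
        } finally {
          IN_TASK.set(false);
        }
      }));
    }
    for (Future<T> future : futures) {
      results.add(future.get());
    }
    return results;
  }

  // Demo: with a single-thread pool, nested tasks must run inline on the outer worker.
  static boolean nestedRunsInline() {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      NonNestingExecutorSketch exec = new NonNestingExecutorSketch(pool);
      Callable<Boolean> outer = () -> {
        String outerThread = Thread.currentThread().getName();
        Callable<String> inner = () -> Thread.currentThread().getName();
        List<String> innerThreads = exec.invokeAll(List.of(inner, inner));
        return innerThreads.stream().allMatch(outerThread::equals);
      };
      return exec.invokeAll(List.of(outer)).get(0);
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

In this sketch the nested `invokeAll` call never submits to the pool, so the inner tasks run on the same worker thread as the outer task. Without the flag, the single-thread demo would block, because the inner tasks would queue behind the outer task that is waiting for them.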

[GitHub] [lucene] gsmiller commented on pull request #12445: Expression: add a set of duplicate variables

2023-09-18 Thread via GitHub
gsmiller commented on PR #12445: URL: https://github.com/apache/lucene/pull/12445#issuecomment-1724108149 Thanks for the idea @epotyom! I wonder if this is relevant after #12560 was merged? With that change, we now only ever evaluate any given expression argument once (within an expression)

[GitHub] [lucene] jpountz opened a new pull request, #12568: Fix issues with BP tests and the security manager.

2023-09-18 Thread via GitHub
jpountz opened a new pull request, #12568: URL: https://github.com/apache/lucene/pull/12568 The default ForkJoinPool implementation uses a thread factory that removes all permissions on threads, so we need to create our own to avoid tests failing with FS-based directories.
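For readers unfamiliar with the mechanism, here is a minimal sketch of supplying a custom thread factory to a `ForkJoinPool` instead of relying on the default one. It uses only standard `java.util.concurrent` APIs. The class name `CustomForkJoinFactorySketch`, the `Worker` subclass, and the thread-name prefix are illustrative assumptions, and this is not the code from the PR.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: shows the factory hook only, not the actual test fix.
final class CustomForkJoinFactorySketch {
  static final String PREFIX = "custom-fjp-worker-"; // illustrative name
  private static final AtomicInteger COUNTER = new AtomicInteger();

  // ForkJoinWorkerThread has a protected constructor, so a subclass is required.
  private static final class Worker extends ForkJoinWorkerThread {
    Worker(ForkJoinPool pool) {
      super(pool);
    }
  }

  static ForkJoinPool newPool(int parallelism) {
    ForkJoinPool.ForkJoinWorkerThreadFactory factory = pool -> {
      ForkJoinWorkerThread thread = new Worker(pool);
      thread.setName(PREFIX + COUNTER.incrementAndGet());
      return thread;
    };
    // null handler: use default uncaught-exception handling; false: LIFO task mode.
    return new ForkJoinPool(parallelism, factory, null, false);
  }

  // Returns true if a submitted task runs on a thread created by our factory.
  static boolean runsOnCustomWorker(ForkJoinPool pool) {
    Callable<Boolean> check = () -> Thread.currentThread() instanceof Worker;
    return pool.submit(check).join();
  }
}
```

A test would create the pool with `newPool(...)`, run its work on it, and call `shutdown()` afterwards. Whether threads created this way keep the permissions the tests need depends on the security policy in place, which this sketch does not model.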

[GitHub] [lucene] uschindler commented on a diff in pull request #12489: Add support for recursive graph bisection.

2023-09-18 Thread via GitHub
uschindler commented on code in PR #12489: URL: https://github.com/apache/lucene/pull/12489#discussion_r1328974164 ## lucene/misc/src/test/org/apache/lucene/misc/index/TestBPIndexReorderer.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] dweiss closed issue #12565: Omit -Ptests.haltonfailure=false in failed test repro line

2023-09-18 Thread via GitHub
dweiss closed issue #12565: Omit -Ptests.haltonfailure=false in failed test repro line URL: https://github.com/apache/lucene/issues/12565

[GitHub] [lucene] dweiss merged pull request #12566: Omit -Ptests.haltonfailure=false in failed test repro line #12565

2023-09-18 Thread via GitHub
dweiss merged PR #12566: URL: https://github.com/apache/lucene/pull/12566

[GitHub] [lucene] dweiss closed issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-18 Thread via GitHub
dweiss closed issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error URL: https://github.com/apache/lucene/issues/12562

[GitHub] [lucene] dweiss merged pull request #12567: TestPassageSelector.randomizedSanityCheck can fail when the random input clashes with an assertion

2023-09-18 Thread via GitHub
dweiss merged PR #12567: URL: https://github.com/apache/lucene/pull/12567

[GitHub] [lucene] dweiss commented on issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-18 Thread via GitHub
dweiss commented on issue #12562: URL: https://github.com/apache/lucene/issues/12562#issuecomment-1723003992 The issue is caused by random input clashing with test assertion, not a problem. I've filed PRs for both problems.

[GitHub] [lucene] dweiss opened a new pull request, #12567: TestPassageSelector.randomizedSanityCheck can fail when the random input clashes with an assertion

2023-09-18 Thread via GitHub
dweiss opened a new pull request, #12567: URL: https://github.com/apache/lucene/pull/12567 This is a test-assumption error only. As per issue: https://github.com/apache/lucene/issues/12562

[GitHub] [lucene] dweiss opened a new pull request, #12566: Omit -Ptests.haltonfailure=false in failed test repro line #12565

2023-09-18 Thread via GitHub
dweiss opened a new pull request, #12566: URL: https://github.com/apache/lucene/pull/12566 (no comment)

[GitHub] [lucene] dweiss opened a new issue, #12565: Omit -Ptests.haltonfailure=false in failed test repro line

2023-09-18 Thread via GitHub
dweiss opened a new issue, #12565: URL: https://github.com/apache/lucene/issues/12565 ### Description When a test fails in jenkins, it reports the gradle's build step as "build success" in the log file, even though some tests have failed. It's done so that all of the build runs until

[GitHub] [lucene] dweiss commented on issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-18 Thread via GitHub
dweiss commented on issue #12562: URL: https://github.com/apache/lucene/issues/12562#issuecomment-1722957920 The "build success" here is caused by an explicit change made by Uwe, here: https://github.com/apache/lucene/issues/10513#issuecomment-1224072458 In his own words: The t

[GitHub] [lucene] jpountz commented on pull request #12564: Window-at-a-time scoring for conjunctions.

2023-09-18 Thread via GitHub
jpountz commented on PR #12564: URL: https://github.com/apache/lucene/pull/12564#issuecomment-1722938506 This gives a major speedup in wikibigall: ``` Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value

[GitHub] [lucene] jpountz opened a new pull request, #12564: Window-at-a-time scoring for conjunctions.

2023-09-18 Thread via GitHub
jpountz opened a new pull request, #12564: URL: https://github.com/apache/lucene/pull/12564 This adds a bulk scorer for conjunctions that follows a similar idea as BS1: it is probably possible to make evaluation faster by evaluating windows of documents at a time instead of a single documen
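To illustrate what evaluating a window of documents at a time can look like, here is a toy sketch that intersects two sorted doc ID lists one 64-document window at a time using bitmasks. It ignores scoring entirely. The class `WindowConjunctionSketch`, the window size, and the array-based inputs are assumptions made for illustration, and this is not the bulk scorer from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: matches doc IDs only, no scores, no Lucene APIs.
final class WindowConjunctionSketch {
  static final int WINDOW_SIZE = 64; // one long bitmask per clause per window

  // a and b are sorted, duplicate-free doc IDs in [0, maxDoc).
  static List<Integer> intersect(int[] a, int[] b, int maxDoc) {
    List<Integer> hits = new ArrayList<>();
    int i = 0;
    int j = 0;
    for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
      int end = Math.min(base + WINDOW_SIZE, maxDoc);
      long bitsA = 0L;
      long bitsB = 0L;
      // Load each clause's matches for this window into a bitmask.
      while (i < a.length && a[i] < end) {
        bitsA |= 1L << (a[i] - base);
        i++;
      }
      while (j < b.length && b[j] < end) {
        bitsB |= 1L << (b[j] - base);
        j++;
      }
      // The conjunction for the whole window is a single AND.
      long both = bitsA & bitsB;
      while (both != 0L) {
        hits.add(base + Long.numberOfTrailingZeros(both));
        both &= both - 1; // clear the lowest set bit
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    int[] a = {1, 5, 64, 70, 200};
    int[] b = {5, 64, 199, 200};
    System.out.println(intersect(a, b, 256)); // prints [5, 64, 200]
  }
}
```

The point of the sketch is that per-document branching is replaced by one bitwise operation per window. A real scorer would also need to carry scores and would not visit windows in which a clause has no matches.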