Re: [PR] Remove halt() call in TestSimpleServer (part of TestStressNRTReplication [lucene]

2024-03-13 Thread via GitHub
dweiss merged PR #13177: URL: https://github.com/apache/lucene/pull/13177 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [PR] Support disabling IndexSearcher.maxClauseCount with a value of -1 [lucene]

2024-03-13 Thread via GitHub
jpountz commented on PR #13178: URL: https://github.com/apache/lucene/pull/13178#issuecomment-1993797886 I'm curious what problem you are trying to address. It looks like you're trying to avoid the overhead of checking the number of clauses, but intuitively this wouldn't help much as we hav

Re: [PR] Make BP work on indexes that have blocks. [lucene]

2024-03-13 Thread via GitHub
jpountz merged PR #13125: URL: https://github.com/apache/lucene/pull/13125

[I] Improve Lucene's I/O concurrency [lucene]

2024-03-13 Thread via GitHub
jpountz opened a new issue, #13179: URL: https://github.com/apache/lucene/issues/13179 ### Description Currently, Lucene's I/O concurrency is bound by the search concurrency. If `IndexSearcher` runs on N threads, then Lucene will never perform more than N I/Os concurrently. Unless yo

Re: [PR] gh-13147: use dense bit-encoding for frequent terms [lucene]

2024-03-13 Thread via GitHub
jpountz commented on PR #13153: URL: https://github.com/apache/lucene/pull/13153#issuecomment-1994160216 Have you seen interesting performance numbers with this change?

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
benwtrent commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1523142360 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -152,7 +154,25 @@ public Lucene99HnswVectorsFormat() { * @param b

Re: [PR] Introduce IORunnable to fix failure in TestIndexWriterOnDiskFull.testAddIndexOnDiskFull [lucene]

2024-03-13 Thread via GitHub
easyice commented on PR #13172: URL: https://github.com/apache/lucene/pull/13172#issuecomment-1994345997 Thanks @dweiss, I found a seed `837FF885325AC743` that can reproduce this failure on the main branch, but it's not stable; it may fail once in about 10 runs. With this patch, I ran it 1

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1994372130 @zhaih @jpountz I am going to create a separate issue around making HNSW worker slicing automatic. It will require a bunch of its own benchmarking and work and honestly seems orthogona

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
jpountz commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1994373820 Thanks for sharing performance numbers @benwtrent, very interesting. Also double checking if you saw my above comment: https://github.com/apache/lucene/pull/13124#pullrequestreview-19304

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1994399018 @jpountz would you prefer something like the original patch from @dweiss ? I can submit the merging actions independently to the intra-merge executor. Anything more (like figurin

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
jpountz commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1994458100 Yes exactly, something very simple, mostly to exercise intra-merge concurrency with more than just vectors.

[PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-03-13 Thread via GitHub
benwtrent opened a new pull request, #13181: URL: https://github.com/apache/lucene/pull/13181 With quantized vectors, and with current vectors, we separate out the "scoring" vs. "iteration", requiring the user to always iterate the raw vectors and provide their own similarity function.

Re: [I] Regarding the frequency used for scoring sloppy phrase queries. [lucene]

2024-03-13 Thread via GitHub
jpountz commented on issue #13152: URL: https://github.com/apache/lucene/issues/13152#issuecomment-1994488165 I agree that the penalty feels too high. We have challenges with queries like this one because there is no good theoretical basis for the right way to score such queries (at least t

Re: [PR] Use group-varint encode the positions [lucene]

2024-03-13 Thread via GitHub
jpountz commented on PR #12842: URL: https://github.com/apache/lucene/pull/12842#issuecomment-1994545075 It looks like `writeGroupVInt` has room for improvement. Can we improve it by making it look a bit more like the read logic?

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-13 Thread via GitHub
jpountz commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-1994666062 Would we be subject to the same issue if/when 3+ different implementations of `DocIdSetIterator` get used in `IntersectVisitor#visit`?

[I] Making vector comparisons pluggable [lucene]

2024-03-13 Thread via GitHub
benwtrent opened a new issue, #13182: URL: https://github.com/apache/lucene/issues/13182 ### Description Opening an issue to continue discussion originating here: https://github.com/apache/lucene/pull/13076#issuecomment-1930363479 Making vector similarities pluggable via SP

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1994921965 > Yes exactly, something very simple, mostly to exercise intra-merge concurrency with more than just vectors. Latest commit adds `TaskExecutor` actions to merge to allow differen

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
zhaih commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1523620861 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -152,7 +154,25 @@ public Lucene99HnswVectorsFormat() { * @param beamW

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-13 Thread via GitHub
antonha commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-1995011184 > Would we be subject to the same issue if/when 3+ different implementations of `DocIdSetIterator` get used in `IntersectVisitor#visit`? Yes. Your question makes me think th

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-13 Thread via GitHub
jpountz commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-1995156682 Relatedly indeed, I was wondering if the API should expose an IntsRef or something like that, so that there is a single virtual call per block of doc IDs anyway (IntsRef cannot be extend

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-13 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1995183835 @jpountz ok, that naive attempt failed as norms & terms apparently need to be merged in order (one or the other would fail due to missing files...). I am not sure if this is tru

Re: [PR] Change BP reordering logic to help support document blocks later on. [lucene]

2024-03-13 Thread via GitHub
rishabhmaurya commented on code in PR #13123: URL: https://github.com/apache/lucene/pull/13123#discussion_r1523704731 ## lucene/misc/src/java/org/apache/lucene/misc/index/BPIndexReorderer.java: ## @@ -341,116 +344,94 @@ protected void compute() { */ private boolean sh

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-13 Thread via GitHub
antonha commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-1996041622 @jpountz I had a quick look at the code, and it seems to me like there are, with this PR, only two implementations used for the DISI used for the `IntersectVisitor#visit` method - which

Re: [PR] Remove halt() call in TestSimpleServer (part of TestStressNRTReplication [lucene]

2024-03-13 Thread via GitHub
rmuir commented on PR #13177: URL: https://github.com/apache/lucene/pull/13177#issuecomment-1996150774 Thanks for doing this. I was disappointed that java doesn't allow this: ``` jshell> ProcessHandle.current().destroyForcibly() | Exception java.lang.IllegalStateException: destroy o

Re: [PR] Replace Collections.synchronizedSet() with ConcurrentHashMap.newKeySet() [lucene]

2024-03-13 Thread via GitHub
github-actions[bot] commented on PR #13142: URL: https://github.com/apache/lucene/pull/13142#issuecomment-1996171442 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]

2024-03-13 Thread via GitHub
github-actions[bot] commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1996171619 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi