Vikasht34 commented on issue #14208:
URL: https://github.com/apache/lucene/issues/14208#issuecomment-2642077599
Hierarchical Merge Execution (Layer-by-Layer Merging): Instead of merging
all HNSW layers at once, which leads to high peak memory usage, merges can be
executed incrementally, lay
dsmiley commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2641899421
Addressing this need would be amazing! Many search architectures (including
where I work) always filter to a specific field (say a doc type or tenant/user;
it depends). That 50-60
rmuir commented on code in PR #14207:
URL: https://github.com/apache/lucene/pull/14207#discussion_r1945858708
##
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java:
##
@@ -1052,6 +1052,77 @@ public static Automaton removeDeadStates(Automaton a) {
return r
gsmiller commented on code in PR #14204:
URL: https://github.com/apache/lucene/pull/14204#discussion_r1945846299
##
lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java:
##
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
rmuir commented on code in PR #14192:
URL: https://github.com/apache/lucene/pull/14192#discussion_r1945855034
##
lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java:
##
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
msfroh commented on code in PR #14198:
URL: https://github.com/apache/lucene/pull/14198#discussion_r1945713513
##
lucene/analysis/opennlp/build.gradle:
##
@@ -26,3 +26,33 @@ dependencies {
moduleTestImplementation project(':lucene:test-framework')
}
+
+ext {
+ testModelDa
msfroh commented on code in PR #14198:
URL: https://github.com/apache/lucene/pull/14198#discussion_r1945585486
##
lucene/analysis/opennlp/build.gradle:
##
@@ -26,3 +26,33 @@ dependencies {
moduleTestImplementation project(':lucene:test-framework')
}
+
+ext {
+ testModelDa
john-wagster commented on PR #14192:
URL: https://github.com/apache/lucene/pull/14192#issuecomment-2641159156
Iterated here a bit after the changes in
https://github.com/apache/lucene/pull/14193 went in and also pivoted to using
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt. I
Tim-Brooks opened a new pull request, #14213:
URL: https://github.com/apache/lucene/pull/14213
Allows a StoredField to be created from a DataInput.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to
Tim-Brooks commented on PR #14213:
URL: https://github.com/apache/lucene/pull/14213#issuecomment-2640892230
I am opening this proposed change to support writing a stored field from a
byte source which does not require a contiguous array allocation. The reason I
am proposing this is because
mkhludnev commented on PR #13974:
URL: https://github.com/apache/lucene/pull/13974#issuecomment-2640847271
Thanks. I'm happy to hear. Here's what I have to work on:
- @gsmiller what's your feeling about the [proposed
API](https://github.com/apache/lucene/blob/c56caeb26a5af4b0afc5f2cb04a4f
rmuir commented on issue #14211:
URL: https://github.com/apache/lucene/issues/14211#issuecomment-2640743462
PR: https://github.com/apache/lucene/pull/14212
chatman commented on code in PR #14131:
URL: https://github.com/apache/lucene/pull/14131#discussion_r1945235820
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/vectorsearch/CuVSKnnFloatVectorQuery.java:
##
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation
iverase commented on PR #14204:
URL: https://github.com/apache/lucene/pull/14204#issuecomment-2640605039
>So I guess that the only option is to fail at runtime. I can do that. What
looks like a reasonable cap on the number of returned intervals? 1024?
1024 sounds like a good default and ma
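The idea of failing at runtime once a cap on the number of intervals is exceeded can be sketched as follows. This is a minimal illustration only, not the `HistogramCollector` code from the PR: the class `CappedHistogramSketch`, its fields, and its exception choice are hypothetical, and the cap of 1024 is simply the value floated in the comment above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts values per fixed-width interval and fails
// once more than maxBuckets distinct intervals would be needed.
class CappedHistogramSketch {
    private final long interval;
    private final int maxBuckets;
    private final Map<Long, Integer> counts = new HashMap<>();

    CappedHistogramSketch(long interval, int maxBuckets) {
        if (interval <= 0 || maxBuckets <= 0) {
            throw new IllegalArgumentException("interval and maxBuckets must be positive");
        }
        this.interval = interval;
        this.maxBuckets = maxBuckets;
    }

    void collect(long value) {
        // floorDiv keeps negative values in the correct interval.
        long bucket = Math.floorDiv(value, interval);
        if (!counts.containsKey(bucket) && counts.size() >= maxBuckets) {
            throw new IllegalStateException(
                "more than " + maxBuckets + " intervals; use a larger interval");
        }
        counts.merge(bucket, 1, Integer::sum);
    }

    Map<Long, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        CappedHistogramSketch h = new CappedHistogramSketch(10, 1024);
        for (long v : new long[] {3, 7, 15, -5}) {
            h.collect(v);
        }
        System.out.println(h.counts());
    }
}
```

The check happens per collected value because, as noted elsewhere in the thread, the range of matching values is not known up-front.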
rmuir commented on issue #14211:
URL: https://github.com/apache/lucene/issues/14211#issuecomment-2640601140
I understand it now: the problem is the Set of new initialStates being
populated as a side-effect, which is what Brzozowski is using. Some of those
are dead states, we remove them, bu
[
https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated LUCENE-10471:
Labels: pull-request-available (was: )
> Increase the number of dims for KNN vectors to
jzwolak commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-2640596317
@asfimport
I suggest making it easier to change the limit. I appreciate the need to
have a limit for optimization and performance. The models that have more than
1024 dimensi
rmuir commented on issue #14211:
URL: https://github.com/apache/lucene/issues/14211#issuecomment-2640589820
We have one use of reverse() outside of tests: but it is an important one
(getCommonSuffix) and it already cleans up after reverse by calling
removeDeadStates(). So to me, the obvious
benwtrent commented on PR #14160:
URL: https://github.com/apache/lucene/pull/14160#issuecomment-2640589065
OK, the current implementation is about as good as I can figure it.
- We explore greater than neighbor-neighbors if we gathered < maxConn/4
vectors to score
- We will explor
rmuir commented on PR #14209:
URL: https://github.com/apache/lucene/pull/14209#issuecomment-2640553292
To me the deprecation is easy enough; developers are usually responsive to
such things.
It comes across differently than just a hard break in a few ways: usually you
question why the p
jpountz commented on PR #14204:
URL: https://github.com/apache/lucene/pull/14204#issuecomment-2640553823
I agree that providing a small interval is a bad usage pattern. I don't know
how to validate this though, since we can't know the range of values of the
docs that match the query up-fron
rmuir opened a new issue, #14211:
URL: https://github.com/apache/lucene/issues/14211
### Description
Operations.reverse() doesn't just add dead-states, it adds dead-states with
nondeterminism, such that the returned automaton's `isDeterministic()` becomes `false`.
That's really a bit more th
rmuir commented on issue #14210:
URL: https://github.com/apache/lucene/issues/14210#issuecomment-2640522300
I pushed a fix to correct the build, but I will add more tracing to this seed
to see who is adding the "nondeterministic dead states".
If it is an automaton method, we need to fix
asfgit closed issue #14210: flaky test: concatenate turns NFA into a DFA and it
causes test fail
URL: https://github.com/apache/lucene/issues/14210
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to th
rmuir commented on issue #14210:
URL: https://github.com/apache/lucene/issues/14210#issuecomment-2640481736
If I comment out `removeDeadStates()` then the test passes.
If I add the `removeDeadStates()` to the test before doing any concatenate,
the test passes.
The problem happens b
rmuir commented on issue #14210:
URL: https://github.com/apache/lucene/issues/14210#issuecomment-2640461459
And of course the automaton is massive, a nightmare:

rmuir commented on PR #14209:
URL: https://github.com/apache/lucene/pull/14209#issuecomment-2640417498
https://github.com/apache/lucene/issues/14210
rmuir commented on PR #14209:
URL: https://github.com/apache/lucene/pull/14209#issuecomment-2640379859
The problem is not related to this PR from what I can tell; the issue is
that `concatenate()` turns an NFA into a DFA and this fails the test.
I can reproduce it in main: I will deal
rmuir commented on PR #14209:
URL: https://github.com/apache/lucene/pull/14209#issuecomment-2640351185
I will look into the test failure. I did run the tests many times, but I was also
pretty aggressive about trying to clean up tests, so that we can just remove the
deprecations in a followup commit,
dweiss commented on PR #14193:
URL: https://github.com/apache/lucene/pull/14193#issuecomment-2640330745
> Finally! The concatenate() issue was an easy fix, it neglected to clean up
its dead states. All of its partners in crime do this, but the fact we neglect
it for concatenate messes up to
benwtrent commented on issue #13966:
URL: https://github.com/apache/lucene/issues/13966#issuecomment-2640281978
Fixed by: https://github.com/apache/lucene/pull/14181
ChrisHegarty commented on PR #14181:
URL: https://github.com/apache/lucene/pull/14181#issuecomment-2640293165
Thank you @benwtrent - let's wait for the next lucene nightly. If no perf
improvement, that's ok. There should be a lot less garbage created, and CPU
devoted to cleaning young heap
benwtrent closed issue #13966: Evaluate adding a double addressing vector scorer
URL: https://github.com/apache/lucene/issues/13966
benwtrent merged PR #14181:
URL: https://github.com/apache/lucene/pull/14181
rmuir opened a new pull request, #14209:
URL: https://github.com/apache/lucene/pull/14209
These algorithms run in linear time: it is trappy to offer two-arg options:
they encourage users to use them in a loop and create quadratic time.
Send a strong signal to the user's editor/IDE to
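To illustrate the quadratic-time trap described above, here is a minimal sketch. It is not Lucene's `Operations` code: it assumes a hypothetical two-argument `concat` over plain arrays that copies both inputs, and counts how many elements get copied when that method is called in a loop versus when all parts are handled in one pass. `ConcatCostSketch` and its method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ConcatCostSketch {

    /** Hypothetical two-arg concat: allocates a new array and copies both inputs. */
    static int[] concat(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /** Number of elements copied when the two-arg form is applied in a loop. */
    static long copiesInLoop(List<int[]> parts) {
        long copied = 0;
        int[] acc = new int[0];
        for (int[] part : parts) {
            acc = concat(acc, part); // re-copies everything accumulated so far
            copied += acc.length;
        }
        return copied;
    }

    /** Number of elements copied when all parts are concatenated in one pass. */
    static long copiesInOnePass(List<int[]> parts) {
        int total = 0;
        for (int[] part : parts) {
            total += part.length;
        }
        int[] out = new int[total];
        int pos = 0;
        for (int[] part : parts) {
            System.arraycopy(part, 0, out, pos, part.length); // each element copied once
            pos += part.length;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<int[]> parts = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            parts.add(new int[] {i});
        }
        System.out.println("loop:     " + copiesInLoop(parts));
        System.out.println("one pass: " + copiesInOnePass(parts));
    }
}
```

With n single-element parts, the loop copies n(n+1)/2 elements in total while the one-pass variant copies n, which is the gap a list-taking API avoids.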
stefanvodita commented on code in PR #13914:
URL: https://github.com/apache/lucene/pull/13914#discussion_r1944983208
##
lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java:
##
@@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) {
* is used to
benwtrent opened a new issue, #14208:
URL: https://github.com/apache/lucene/issues/14208
### Description
I am not sure of other structures, but HNSW merges can allocate a pretty
large chunk of memory on heap.
For example:
Let's have the max_conn set to 16. Thus connectio
jpountz commented on code in PR #14193:
URL: https://github.com/apache/lucene/pull/14193#discussion_r1944819744
##
lucene/core/src/test/org/apache/lucene/util/automaton/TestAutomaton.java:
##
@@ -667,11 +667,14 @@ public void testConcatenatePreservesDet() throws
Exception {
jpountz commented on PR #14205:
URL: https://github.com/apache/lucene/pull/14205#issuecomment-2639972179
Superseded by #14193
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
jpountz closed pull request #14205: Automatically remove dead states from
concatenated automata.
URL: https://github.com/apache/lucene/pull/14205
rmuir merged PR #14193:
URL: https://github.com/apache/lucene/pull/14193
iverase commented on code in PR #14204:
URL: https://github.com/apache/lucene/pull/14204#discussion_r1944680152
##
lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java:
##
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
benwtrent commented on PR #14181:
URL: https://github.com/apache/lucene/pull/14181#issuecomment-2639727635
OK, benchmarks show pretty much no difference when I was testing on my
machine. But recall numbers all check out and the API is nicer and makes more
sense for vector merging.
So
rmuir commented on code in PR #14193:
URL: https://github.com/apache/lucene/pull/14193#discussion_r1944626865
##
lucene/core/src/test/org/apache/lucene/util/automaton/TestAutomaton.java:
##
@@ -667,11 +667,14 @@ public void testConcatenatePreservesDet() throws
Exception {
}
rmuir commented on PR #14205:
URL: https://github.com/apache/lucene/pull/14205#issuecomment-2639660109
It's in my PR over there too. I think we should avoid addEpsilon in the
test. We don't even need any transitions.
jpountz opened a new pull request, #14207:
URL: https://github.com/apache/lucene/pull/14207
This helps generate simpler automata, especially when these automata are
later combined through other operations such as `Operations#concat`.
iverase opened a new issue, #14206:
URL: https://github.com/apache/lucene/issues/14206
The following seed reproduces the issue:
```
./gradlew :lucene:core:test --tests
"org.apache.lucene.index.TestLogMergePolicy.testNoPathologicalMerges"
-Ptests.seed=5C1CAC337454D389
> Ta
jpountz opened a new pull request, #14205:
URL: https://github.com/apache/lucene/pull/14205
Concatenating automata frequently creates dead states. This PR suggests that
`Operations#concatenate` automatically removes these dead states. This is not
unprecedented: `Operations#repeat`, `Operations#uni
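For readers unfamiliar with the term, the notion of a dead state can be sketched as follows. This is a minimal illustration on a bare transition graph, not the implementation of `Operations#removeDeadStates`: it assumes the usual definition that a state is live when it is reachable from the initial state and can itself reach an accept state, and everything else is dead. The class `DeadStateSketch` and its methods are hypothetical, and transition labels are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

class DeadStateSketch {

    /** Returns every state reachable from any state in starts by following edges. */
    static BitSet reachable(int[][] edges, BitSet starts) {
        BitSet seen = (BitSet) starts.clone();
        Deque<Integer> work = new ArrayDeque<>();
        for (int s = starts.nextSetBit(0); s >= 0; s = starts.nextSetBit(s + 1)) {
            work.push(s);
        }
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int t : edges[s]) {
                if (!seen.get(t)) {
                    seen.set(t);
                    work.push(t);
                }
            }
        }
        return seen;
    }

    /** Flips every edge s -> t into t -> s. */
    static int[][] reverse(int[][] edges) {
        List<List<Integer>> rev = new ArrayList<>();
        for (int i = 0; i < edges.length; i++) {
            rev.add(new ArrayList<>());
        }
        for (int s = 0; s < edges.length; s++) {
            for (int t : edges[s]) {
                rev.get(t).add(s);
            }
        }
        int[][] out = new int[edges.length][];
        for (int i = 0; i < edges.length; i++) {
            out[i] = rev.get(i).stream().mapToInt(Integer::intValue).toArray();
        }
        return out;
    }

    /** Live = reachable from the initial state AND able to reach an accept state. */
    static BitSet liveStates(int[][] edges, int initial, BitSet accept) {
        BitSet start = new BitSet();
        start.set(initial);
        BitSet live = reachable(edges, start);
        live.and(reachable(reverse(edges), accept));
        return live;
    }

    public static void main(String[] args) {
        // 0 -> 1 -> 3 (accept); 0 -> 2 (dead end); 4 -> 3 (never reached from 0)
        int[][] edges = { {1, 2}, {3}, {}, {}, {3} };
        BitSet accept = new BitSet();
        accept.set(3);
        System.out.println("live states: " + liveStates(edges, 0, accept));
    }
}
```

In the example graph, state 2 cannot reach an accept state and state 4 is unreachable from the initial state, so neither is in the live set.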
dweiss commented on code in PR #14198:
URL: https://github.com/apache/lucene/pull/14198#discussion_r1944528365
##
lucene/analysis/opennlp/build.gradle:
##
@@ -26,3 +26,33 @@ dependencies {
moduleTestImplementation project(':lucene:test-framework')
}
+
+ext {
+ testModelDa
gf2121 commented on PR #14176:
URL: https://github.com/apache/lucene/pull/14176#issuecomment-2639462765
Thanks @jpountz !
Updated.
Could you please also help review
https://github.com/mikemccand/luceneutil/pull/335 ? I'd like to merge it first
so that nightly benchmark can ca
jpountz opened a new pull request, #14204:
URL: https://github.com/apache/lucene/pull/14204
This is inspired by a paper by Tencent where the authors describe how they
speed up so-called "histogram queries" by sorting the index by timestamp and
translating ranges of values corresponding to
gf2121 commented on PR #14176:
URL: https://github.com/apache/lucene/pull/14176#issuecomment-2639090793
Thanks @iverase !
For the vectorized decoding, I benchmarked the decoding method with JMH;
the result on my M2 Mac:
```
Benchmark Mode Cnt
52 matches