matthias-mueller commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196179445
##
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader,
CharArrayS
rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551026488
> Of course we can also include the implementation classes into the main JAR
file.
To be clear this is what I propose, not making things pluggable. That's not
related and not w
uschindler commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196362845
##
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader,
CharArraySet res
matthias-mueller commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196380870
##
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader,
CharArrayS
msokolov commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551254446
+1 to this - I would be grateful to you all if you are able to get this
working. Don't really want to have to die with a sword in my hand to get to the
promised land of vector API.
benwtrent commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551287769
I don't see how it is that commit as the change was only detected after this
PR was merged and has persisted since then. Though I may be reading the graph
incorrectly.
We should
msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551293959
Here is the timeline graph with one sample per day, roughly
https://home.apache.org/~mikemccand/lucenebench/VectorSearch.html
--
This is an automated message from the Apache Git Servi
uschindler commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1551296276
Javadocs look fine:
https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/javadoc/core/org/apache/lucene/index/IndexReader.html
 {
+ "' does not exist."
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551468645
I agree with the high-level idea - use the Incubating Vector API internally
in Lucene, in a way that is opt-in and also constrained by module/SPI/loading
techniques (so as to r
ChrisHegarty commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551473467
Correction.
The incubating Vector API must be loaded as part of the boot layer - it must
be added by the command line --add-modules flag **OR _required_ by the
implemen
benwtrent commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551495918
> Also, is it actually this commit, or is it the HashMap commit
https://github.com/apache/lucene/commit/3c163745bb07aed51b52878de0666f1405696147
?
@jbellis that change seems to
jbellis opened a new pull request, #12303:
URL: https://github.com/apache/lucene/pull/12303
Referencing the regression discussed in #12255
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the spec
jbellis commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551523734

^ This is almost the entire diff, there is also the same change for byte[]
and then `p
nknize commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551636108
> 2\. Java is dying and becoming the next COBOL
😆 This is timely and
> let's carve out a path where we use the vector api IFF the user opts in
via the command-line.
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551695383
> Correction.
>
> The incubating Vector API must be loaded as part of the boot layer - it
must be added by the command line --add-modules flag **OR explicitly _required_
b
uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1551698369
> > 2. Java is dying and becoming the next COBOL
>
> 😆
>
> > let's carve out a path where we use the vector api IFF the user opts in
via the command-line.
>
>
JarvisCraft commented on PR #12295:
URL: https://github.com/apache/lucene/pull/12295#issuecomment-1551711212
> So I tend to change the equals methods with instanceof to follow the more
modern approach consistently.
In this case, I may roll back changes to `equals` methods and create a
markrmiller commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1551779674
While this is not a critique on Lucene's attempt to utilize SIMD via
OpenJDK, or any proposed ideas here, it's challenging to envision Lucene
emerging as the leading solution fo
mikemccand commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551807578
Hmm, curiously, clicking through the datapoint with the ~20% QPS drop,
[these were the changes to
Lucene](https://github.com/apache/lucene/compare/963ed7ce888724c2dd55fff8c13a08b81b20
benwtrent commented on issue #12219:
URL: https://github.com/apache/lucene/issues/12219#issuecomment-1551846428
@rmuir I was not suggesting it as a way to only get performance for "some
big company". I just thought using an incubating API was out of the question in
Lucene (you have implied
reta opened a new issue, #12304:
URL: https://github.com/apache/lucene/issues/12304
### Description
In OpenSearch, we've updated to the recent Apache Lucene 9.7 snapshots and
got an number of tests failing with `java.security.AccessControlException`. The
culprit is `org.apache.lucene
benwtrent commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551891133
So, using lucene util, I have been comparing this PR's commit vs. the one
previous. I am consistently getting lower QPS with this PR's change but not
near the 20% slow down (only aroun
benwtrent commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551959695
Well, with just a single thread, Luca's commit changed nothing to my local
tests.
I do seem to have a single threaded slow down (though minor) due to this
change.
@mikem
uschindler commented on issue #12304:
URL: https://github.com/apache/lucene/issues/12304#issuecomment-1551968070
Hi, VirtualMethod is stone-aged class which was recently introduced again to
handle
Yes it may miss doPrivileged(). In fact the missing permission is granted by
Opensearc
reta commented on issue #12304:
URL: https://github.com/apache/lucene/issues/12304#issuecomment-1551969539
>This is new in 9.7, the recently released 9.6 does not use this. If I
provide a PR for 9.x can you test my "better fix"? I am not 100% sure it can be
fixed without permissions, but gi
jbellis commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1551970248
Can you check the performance with the changes at
https://github.com/apache/lucene/pull/12303 ?
--
This is an automated message from the Apache Git Service.
To respond to the message,
uschindler commented on issue #12304:
URL: https://github.com/apache/lucene/issues/12304#issuecomment-1552013584
Thanks for finding the issue so early (before a release). Backport #12197
uses VirtualMethod class for the first time since Very long time. It was
heavily used at Lucene 3 an
uschindler commented on PR #12299:
URL: https://github.com/apache/lucene/pull/12299#issuecomment-1552064934
Looks fine. I will merge this, but please add a CHANGES.txt entry in the 9.7
section.
Thanks for taking care of the issue! 👍
--
This is an automated message from the Apache G
msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552065169
I see that `constants.SEARCH_NUM_THREADS=2` and this is what is passed to
Competitor() by default as `numThreads`, and is then passed to
`perf.SearchPerfTest` as `-searchThreadCount`,
msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552073475
One thing that';s confusing here is that luceneutil reports "commits since
last successful run" relative to a 5/10 run, but there is no graph point
showing that run. The last successful
gsmiller opened a new pull request, #12305:
URL: https://github.com/apache/lucene/pull/12305
### Description
This proposes some minor cleanup and improvements to
`DaciukMihovAutomatonBuilder` I came across while starting to explore #12176
(exploring the idea of allowing this algorith
dsmiley opened a new pull request, #12306:
URL: https://github.com/apache/lucene/pull/12306
The principle objective of this PR is to make it easier for users to use
Lucene with higher dimensions. While it's [technically possible to circumvent
the existing
limit](https://lists.apache.org/t
msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552164269
I tried running luceneutil before/after this change using this command:
```
comp = competition.Competition()
index = comp.newIndex('baseline', sourceData,
msokolov commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1552173316
Re-running with
comp = competition.Competition(verifyScores=False)
lets the benchmark complete without errors and I get this result:
```
dsmiley commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552175880
I get your point on supportability, and I appreciate that you and others
take that so seriously! Then let's have documentation around this setting to
make it explicitly clear that it is
rmuir commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552179361
Sorry, I really don't believe we should do this in the default codec. Again
you can just make an alternative non default codec in lucene/codecs with a
different limit than the default code
dsmiley commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552179995
I could also imagine some users hosting search for others might want to
configure the limit lower!
--
This is an automated message from the Apache Git Service.
To respond to the messag
rmuir commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552183640
The default codec with the index back compact guarantees that we have is
just no place for leniency or "you are on your own". You should trust the index
is supported.
The lucene/co
dsmiley commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552202819
BTW thanks for your prompt and friendly responses.
I have two lines of inquiry here:
(A) I don't accept the rule/belief that you have in which our default codecs
shall not h
rmuir commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552206013
I gave you the technical reasons I am against the system property. I don't
want to argue on and on because you don't agree with it, we don't have to agree.
--
This is an automated messag
contrebande-labs commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1552208982
> [...] it's challenging to envision Lucene emerging as the leading solution
for large-scale vector similarity search [...]
For anyone out there who knows IR's SOTA
contrebande-labs opened a new issue, #12307:
URL: https://github.com/apache/lucene/issues/12307
### Description
We use IntelliJ to build a fat jar to run on ARM64 cloud VMs with Java 20 on
Oracle Linux 8.5 to build Lucene BM25+HNSW indices.
Our first run gave us this error mess
gsmiller commented on issue #4350:
URL: https://github.com/apache/lucene/issues/4350#issuecomment-1552229102
I came across a reference to this issue while working on #12305. It looks
like the `CharsRef` family is pretty consistent with `BytesRef` now (including
the builders). I don't see an
contrebande-labs commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1552245148
Is there room for a helping hand here? Tests, benchmarks, continuous
integration, documentation, anything ?
--
This is an automated message from the Apache Git Service.
T
dsmiley commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552322741
The only argument you've given against a system property specifically is:
> system property is not used appropriately (e.g. proper security checks).
If there is a security manager, and
JerryChin commented on PR #12299:
URL: https://github.com/apache/lucene/pull/12299#issuecomment-1552327703
Hi @uschindler, which category should I put it under? how about
`Improvements`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on
michaelwechner commented on PR #12306:
URL: https://github.com/apache/lucene/pull/12306#issuecomment-1552471997
Just to make sure I understand the -1 correctly: The concern is that if a
user can set the dimension by himself, because the Lucene version works well
with this dimension at that
50 matches
Mail list logo