jpountz commented on PR #11929:
URL: https://github.com/apache/lucene/pull/11929#issuecomment-1315035527
Yes it would be enough. :+1:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific
akhgeek30 commented on issue #11864:
URL: https://github.com/apache/lucene/issues/11864#issuecomment-1315045550
@mmatela Initially the chain WDGF > SGF > FGF was present. I thought it would
be feasible to flatten the result from WDGF and then push it to SGF, but the
result was the same.
jpountz merged PR #11929:
URL: https://github.com/apache/lucene/pull/11929
LuXugang merged PR #11895:
URL: https://github.com/apache/lucene/pull/11895
LuXugang commented on PR #11895:
URL: https://github.com/apache/lucene/pull/11895#issuecomment-1315074041
> This makes me think that we could also enhance this logic to count queries
that have a mix of SHOULD and MUST_NOT clauses, in case this is something you
are interested in looking into
uschindler commented on PR #11930:
URL: https://github.com/apache/lucene/pull/11930#issuecomment-1315094972
As we started over here: My suggestion would be to let this go in without
changing the default in MMapDirectory.
If we want to change the default we can make a followup PR on MM
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315146472
In case it helps this discussion, I ran the following code to get a sense of
the savings we could get assuming `2^24` vectors that are not clustered and 32
neighbors per vector.
`
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1315181358
@dweiss a difference may be the hyperthreading. on my tiny 2-core, using the
hyperthreads results in a real speed benefit over using the default `ncpu/2`
```
org.gradle.workers.max=4
```
rmuir closed issue #11910: improve error-prone configuration for int-overflow
bugs
URL: https://github.com/apache/lucene/issues/11910
rmuir merged PR #11923:
URL: https://github.com/apache/lucene/pull/11923
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315198898
The last time i indexed vectors the resulting .vex file was 98% waste. I
checked it with gzip.
I'm not gonna say "I don't trust you guys numbers" but... i dont trust your
numbers :)
rmuir opened a new issue, #11932:
URL: https://github.com/apache/lucene/issues/11932
### Description
A great overflow issue:
https://errorprone.info/bugpattern/NarrowingCompoundAssignment
javac is getting its own checker for this in version 20 so eventually we can
switch to th
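For readers unfamiliar with the bug pattern linked above, here is a minimal sketch of what a narrowing compound assignment looks like in Java. The class and method names are hypothetical and are not taken from the Lucene code base; the point is only that `+=` on a `short` inserts an implicit cast, so the overflow compiles without any warning.

```java
final class NarrowingDemo {
  // Hypothetical example: accumulates ints into a short.
  // `total += v` compiles because it means `total = (short) (total + v)`.
  static short sumNarrowing(int[] values) {
    short total = 0;
    for (int v : values) {
      total += v; // implicit narrowing cast, silently wraps on overflow
    }
    return total;
  }

  // One possible fix: keep the accumulator wide and fail loudly on overflow.
  static int sumWidened(int[] values) {
    int total = 0;
    for (int v : values) {
      total = Math.addExact(total, v); // throws ArithmeticException on int overflow
    }
    return total;
  }

  public static void main(String[] args) {
    int[] values = {30_000, 30_000};
    System.out.println(sumNarrowing(values)); // wrapped value, not 60000
    System.out.println(sumWidened(values));   // 60000
  }
}
```

Writing the same statement as `total = total + v` would be rejected by javac, which is why the compound form is the one that hides the problem.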
rmuir commented on issue #11932:
URL: https://github.com/apache/lucene/issues/11932#issuecomment-1315236305
There's a fair amount of noise to the check (a lot of code uses these
operators to do crazy bit twiddling stuff etc), but it finds some good stuff
that we should obviously fix. includ
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315253148
I don't know much about HNSW but I would guess that this could be due to the
fact that nodes that map to vectors that are similar to one another will have a
similar set of neighbors, and
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315256986
i guess i feel, the naive thing to do, is to start by treating them like
postings.
not to multiply a bunch of numbers and treat this as a grid, yeah i'm aware
of the hype name "dens
dweiss commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1315260335
Yup, fine with me.
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315260266
and yes, deduping neighbors is the same challenge as deduping terms
postings, agree it isn't worth it.
but we need to stop reinventing wheels and treating this shit like its
really
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315266717
So vint-delta like short postings?
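As a rough illustration of what "vint-delta" means for a sorted list of neighbor ids, here is a minimal sketch. It assumes the ids are sorted ascending and uses a generic 7-bits-per-byte variable-length encoding with a continuation bit; the class and method names are hypothetical, and this is not claimed to be the on-disk format that the PR ends up using.

```java
import java.io.ByteArrayOutputStream;

final class VIntDeltaSketch {
  // Encodes ascending ids as deltas, each delta as a variable-length int:
  // 7 payload bits per byte, high bit set when more bytes follow.
  static byte[] encode(int[] sortedIds) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int id : sortedIds) {
      int delta = id - prev; // non-negative because the input is sorted
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80);
        delta >>>= 7;
      }
      out.write(delta);
      prev = id;
    }
    return out.toByteArray();
  }

  // Decodes `count` ids by reversing the two steps above.
  static int[] decode(byte[] bytes, int count) {
    int[] ids = new int[count];
    int pos = 0;
    int prev = 0;
    for (int i = 0; i < count; i++) {
      int delta = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += delta;
      ids[i] = prev;
    }
    return ids;
  }

  public static void main(String[] args) {
    int[] neighbors = {3, 130, 131, 70_000};
    byte[] encoded = encode(neighbors);
    // 4 raw ints take 16 bytes; small deltas need fewer bytes each.
    System.out.println(encoded.length + " bytes");
    System.out.println(java.util.Arrays.toString(decode(encoded, neighbors.length)));
  }
}
```

The savings come entirely from the deltas being small, so how well this works depends on how close together the neighbor ids of a node are.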
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315267393
sure? this sounds like a great initial approach? that's what lucene did for
a very long time.
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315270376
No objections, I just wanted to make sure we agreed on what doing the same
thing as postings meant.
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1315274974
@dweiss i'll run simple bench on some mac laptops i have before pushing. i
know they are very commonly used.
jpountz commented on PR #11930:
URL: https://github.com/apache/lucene/pull/11930#issuecomment-1315275629
Agreed Uwe.
jpountz merged PR #11930:
URL: https://github.com/apache/lucene/pull/11930
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315280529
I think I mean at a higher level: think about what we (and the IR community)
have learned over the years with postings, and try to apply it to these vectors.
Means not just integer compre
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315282799
and the best is i keep hearing the excuse that we can't use randomized
testing for this stuff. absolutely nonsense!
jpountz opened a new issue, #11933:
URL: https://github.com/apache/lucene/issues/11933
### Description
`ChecksumIndexInput` only allows reading files sequentially, so the only
`IOContext` that makes sense is `IOContext.READONCE`?
### Version and environment details
_No r
mikemccand commented on issue #10878:
URL: https://github.com/apache/lucene/issues/10878#issuecomment-1315303628
This random failure is still happening in recent CI builds (e.g.
[here](https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.4/721/)), but
never seems to repro for me.
mikemccand commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1315303887
> this test failed for me yesterday in the same way. Seems to be a bug.
This test has been failing for some time -- we have #10878 open for it.
I'll try to make some time to fi
jpountz opened a new pull request, #11934:
URL: https://github.com/apache/lucene/pull/11934
These calls should use `IOContext.READONCE` rather than forward the default
`IOContext`.
uschindler commented on PR #11930:
URL: https://github.com/apache/lucene/pull/11930#issuecomment-1315314387
I think a CHANGES.txt entry would be good.
rmuir commented on issue #10878:
URL: https://github.com/apache/lucene/issues/10878#issuecomment-1315334043
I'm wondering if, in CI with multiplier, this test hits something like file
handle limit (too many open files) or similar, and masks the exc somehow?
benwtrent commented on code in PR #11923:
URL: https://github.com/apache/lucene/pull/11923#discussion_r1022817060
##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java:
##
@@ -85,7 +85,7 @@ final class DocumentsWriterFlushControl implements
Accountabl
rmuir commented on code in PR #11923:
URL: https://github.com/apache/lucene/pull/11923#discussion_r1022818839
##
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java:
##
@@ -85,7 +85,7 @@ final class DocumentsWriterFlushControl implements
Accountable, C
jpountz commented on PR #11928:
URL: https://github.com/apache/lucene/pull/11928#issuecomment-1315362000
Have you checked if this actually made things faster? We're saving calls to
`advance` in some cases but also adding conditions to some very tight loops.
E.g. we recently got speedups by
jpountz commented on code in PR #11928:
URL: https://github.com/apache/lucene/pull/11928#discussion_r1022874688
##
lucene/MIGRATE.md:
##
@@ -102,6 +102,12 @@ Lucene 9.2 or stay with 9.0.
See LUCENE-10558 for more details and workarounds.
+### DisjunctionDISIApproximation be
jpountz closed issue #10243: Make DocValuesIterator public [LUCENE-9203]
URL: https://github.com/apache/lucene/issues/10243
jpountz commented on issue #10243:
URL: https://github.com/apache/lucene/issues/10243#issuecomment-1315407533
Closing.
jpountz commented on issue #10581:
URL: https://github.com/apache/lucene/issues/10581#issuecomment-1315407884
This has been merged.
jpountz closed issue #10581: Ensure sub-iterators of ConjunctionDISI are on the
same document [LUCENE-9541]
URL: https://github.com/apache/lucene/issues/10581
jpountz commented on PR #256:
URL: https://github.com/apache/lucene/pull/256#issuecomment-1315411675
Even better, all our base merge policies have a default impl now.
jpountz closed pull request #256: LUCENE-10064: Implement
TieredMergePolicy#findFullFlushMerges.
URL: https://github.com/apache/lucene/pull/256
jpountz closed issue #11102: Give findFullFlushMerges a default implementation
[LUCENE-10064]
URL: https://github.com/apache/lucene/issues/11102
mikemccand opened a new pull request, #11935:
URL: https://github.com/apache/lucene/pull/11935
### Description
This change just records to an in-heap log (`PrintStream`) what the test
actually did, and then prints that full log on test failure (well, when the
test knowingly throws a
mikemccand commented on PR #11935:
URL: https://github.com/apache/lucene/pull/11935#issuecomment-1315429989
A little more context: this test seems not to reproduce on failure, so
hopefully with this verbosity, when it does fail, we can reconstruct what
happened. But, it is also possible th
mikemccand commented on PR #11935:
URL: https://github.com/apache/lucene/pull/11935#issuecomment-1315439558
Also, I think this test might be a bit borked: most of the time, after
randomly throwing exceptions in a terrifying place (inside
`IndexFileDeleter.decRef`), `IndexWriter` closes itse
rmuir commented on PR #11935:
URL: https://github.com/apache/lucene/pull/11935#issuecomment-1315459218
There was a somewhat similar-behaving hard-to-reproduce fail here: #11755
Not all places in IndexWriter throw `AlreadyClosedException` on tragedy.
sometimes other exceptions get thro
rmuir commented on PR #11935:
URL: https://github.com/apache/lucene/pull/11935#issuecomment-1315461200
and... just as a warning, for that issue, if you turned on verbose, it would
no longer fail anymore. So it had the heisenbug quality to it as well.
rmuir commented on issue #11932:
URL: https://github.com/apache/lucene/issues/11932#issuecomment-1315559541
Here's a log of the output from the new check:
[overflow.log](https://github.com/apache/lucene/files/10014404/overflow.log)
I think it is too noisy to worry about fixing all the
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1315632772
I tested on M1 mac that has 10 cores (8 performance + 2 efficiency). With
these higher concurrency levels, the benchmark is a little flawed since there
are a couple of single-task bottlene
jpountz commented on issue #11932:
URL: https://github.com/apache/lucene/issues/11932#issuecomment-1315648711
I'll look into this log.
dweiss commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1315760973
My dev machine is a rather beefy 32-core threadripper and your observations
align with mine: the execution times are pretty much the same for me, but
overall resource utilization seems mu
mikemccand merged PR #11935:
URL: https://github.com/apache/lucene/pull/11935
gsmiller commented on PR #11928:
URL: https://github.com/apache/lucene/pull/11928#issuecomment-1315785521
> Have you checked if this actually made things faster? We're saving calls
to advance in some cases but also adding conditions to some very tight loops.
I'd run some of our intern
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1315785690
will merge this one and open a followup issue for the 3GB heap. another idea
i had (still to be investigated), is trying to run `ecj` without forking.
there's no toolchain issues as it beh
rmuir closed issue #11924: gradle build uses excessive resources on multi-core
machine
URL: https://github.com/apache/lucene/issues/11924
rmuir closed issue #11925: error-prone's JVM arguments need help
URL: https://github.com/apache/lucene/issues/11925
rmuir merged PR #11927:
URL: https://github.com/apache/lucene/pull/11927
rmuir opened a new pull request, #11936:
URL: https://github.com/apache/lucene/pull/11936
I've been overriding this with 1GB for a long time personally: I think the
3GB is a relic from a previous time.
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1316004033
I had to revert this because it made jenkins angry on alternate toolchains.
I think i understand the issue, I just need to test it out locally.
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1316010019
The problem is this piece:
```
tasks.withType(JavaCompile) { JavaCompile task ->
  task.options.forkOptions.jvmArgs += vmOpts
}
```
There are 3 cases:
* Ja
rmuir opened a new pull request, #11937:
URL: https://github.com/apache/lucene/pull/11937
This is fixing 11927 for alternate toolchains. Git really showing its true
colors today.
The problem with alternate toolchains is they fork differently: invoke
`javac` executable rather than err
rmuir commented on PR #11937:
URL: https://github.com/apache/lucene/pull/11937#issuecomment-1316066302
you can't really reopen a PR and github revoked access to dweiss's branch as
soon as i pressed merge, so i couldn't even update it.
at least here all the changes are in one place and
rmuir merged PR #11937:
URL: https://github.com/apache/lucene/pull/11937
rmuir opened a new pull request, #11938:
URL: https://github.com/apache/lucene/pull/11938
Fixes found from #11932 review of narrowing-compound-assignments
The checker is noisy, so it isn't being enabled here. Just fixing the rare
gems that look reasonable.
Feel free to push com
dweiss commented on PR #11937:
URL: https://github.com/apache/lucene/pull/11937#issuecomment-1316436339
> you can't really reopen a PR and github revoked access to dweiss's branch
as soon as i pressed merge
well, it wasn't me. :) No idea what happened there...
dweiss commented on code in PR #11937:
URL: https://github.com/apache/lucene/pull/11937#discussion_r1023550549
##
gradle/hacks/turbocharge-jvm-opts.gradle:
##
@@ -38,4 +39,20 @@ allprojects {
jvmArgs += vmOpts
}
-}
\ No newline at end of file
+
+// Tweak java
maosuhan opened a new pull request, #11939:
URL: https://github.com/apache/lucene/pull/11939
### Description
When we execute a TermRangeQuery or TermInSet query, Lucene uses
DocIdSetBuilder to store the doc id list. When the doc id list becomes large, it
will convert from array to bitset in
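To make the array-to-bitset switch described above concrete, here is a minimal sketch of the general technique. It is not Lucene's `DocIdSetBuilder`; the class name, the `threshold` parameter, and the cut-over rule are all hypothetical and chosen only to show the shape of the idea: buffer doc ids in a growable array while there are few of them, then upgrade once to a bitset sized by `maxDoc`.

```java
import java.util.Arrays;
import java.util.BitSet;

final class AdaptiveDocIdCollector {
  private final int maxDoc;
  private final int threshold; // hypothetical cut-over point, not Lucene's rule
  private int[] buffer = new int[8];
  private int size = 0;
  private BitSet bits; // non-null once the collector has been upgraded

  AdaptiveDocIdCollector(int maxDoc, int threshold) {
    this.maxDoc = maxDoc;
    this.threshold = threshold;
  }

  void add(int docId) {
    if (bits == null && size == threshold) {
      upgrade(); // too many buffered ids: switch representation once
    }
    if (bits != null) {
      bits.set(docId);
      return;
    }
    if (size == buffer.length) {
      buffer = Arrays.copyOf(buffer, size * 2);
    }
    buffer[size++] = docId;
  }

  // Copies the buffered ids into a bitset and drops the array.
  private void upgrade() {
    bits = new BitSet(maxDoc);
    for (int i = 0; i < size; i++) {
      bits.set(buffer[i]);
    }
    buffer = null;
  }

  boolean isDense() {
    return bits != null;
  }

  // Returns the collected ids in ascending order without duplicates.
  int[] sortedDistinct() {
    if (bits != null) {
      return bits.stream().toArray();
    }
    return Arrays.stream(buffer, 0, size).sorted().distinct().toArray();
  }
}
```

The trade-off is memory: the array costs 4 bytes per buffered id, while the bitset costs roughly `maxDoc / 8` bytes regardless of how many ids are set, so the upgrade only pays off once the list is large relative to `maxDoc`.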