jpountz opened a new issue, #11915:
URL: https://github.com/apache/lucene/issues/11915
### Description
Lucene's abstractions are good at dealing with long runs of documents that
do not match a query, but much less at dealing with long runs of documents that
match a query. In such cas
rendel commented on issue #11702:
URL: https://github.com/apache/lucene/issues/11702#issuecomment-1309968917
> I don't think ESQL is going to be different from existing faceting
support: it will still want to use ordinals when it makes sense such as
grouping by term.
@jpountz This ma
jpountz commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1310043887
Thanks @vsop-479. Do you know if the test you added to terms can be improved
in such a way that it would have caught this bug?
--
This is an automated message from the Apache Git Service.
jfboeuf commented on code in PR #11900:
URL: https://github.com/apache/lucene/pull/11900#discussion_r1019029965
##
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java:
##
@@ -46,7 +46,9 @@ public class FuzzySet implements Accountable {
public static final in
jfboeuf commented on code in PR #11900:
URL: https://github.com/apache/lucene/pull/11900#discussion_r1019030241
##
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/MurmurHash64.java:
##
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
rmuir commented on issue #11911:
URL: https://github.com/apache/lucene/issues/11911#issuecomment-1310217848
"read every byte of the index" is the promise that checkindex makes. So this
bug is really important.
rmuir closed pull request #11906: Add monster test for many knn docs
URL: https://github.com/apache/lucene/pull/11906
rmuir commented on PR #11906:
URL: https://github.com/apache/lucene/pull/11906#issuecomment-1310221605
this test is folded into #11905
rmuir merged PR #11905:
URL: https://github.com/apache/lucene/pull/11905
jpountz commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1310330397
Apologies Luca, but after looking more at your changes, I'm getting worried
that this change is harder than I had anticipated. I was optimistically hoping
that never returning null PointValu
jpountz commented on issue #11393:
URL: https://github.com/apache/lucene/issues/11393#issuecomment-1310333691
I had hoped that getting rid of ghost fields would automatically help avoid
some bugs but after looking into it for both postings and points (thanks
@javanna and @shahrs87 !) it loo
jpountz closed issue #11393: Ghost fields and postings/points [LUCENE-10357]
URL: https://github.com/apache/lucene/issues/11393
jpountz commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1310334991
Apologies @javanna, but after looking more at your changes, I'm getting
worried that this change is harder than I had anticipated. I was optimistically
hoping that never returning null P
javanna closed pull request #11793: Prevent PointValues from returning null for
ghost fields
URL: https://github.com/apache/lucene/pull/11793
javanna commented on PR #11793:
URL: https://github.com/apache/lucene/pull/11793#issuecomment-1310370411
Agreed @jpountz, I think it was a good experiment to spend some time on, and
I have also been thinking along the same lines, that the changes I ended up
making were not solving the proble
benwtrent opened a new pull request, #11916:
URL: https://github.com/apache/lucene/pull/11916
Checkindex with vectors should exercise the graph and seek operations. These
are exposed via the search interface.
There is the option to search EVERY stored vector value as we iterate it,
b
benwtrent commented on PR #11916:
URL: https://github.com/apache/lucene/pull/11916#issuecomment-1310547475
@rmuir I took a stab at it. I am unfamiliar with checkindex, but this will
search 64 vectors, seeking the graph to catch if there is something obscenely
broken.
A more complicated
rmuir commented on PR #11916:
URL: https://github.com/apache/lucene/pull/11916#issuecomment-1310577261
Thank you, yeah this is fine as a start! I think it would be an improvement
in the future to not just search the first 64 vectors but maybe every n'th
(just a different form of sampling).
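To make the difference between the two sampling strategies concrete, here is a minimal sketch in Java. It does not use Lucene APIs and is not the code from #11916; the class `VectorSampling` and its methods `firstN` and `everyNth` are hypothetical names, and it assumes vectors are addressed by dense ordinals `0..size-1`. It only computes which ordinals would serve as query vectors: either the first `maxSamples`, or one every `stride` ordinals spread across the whole field.

```java
import java.util.Arrays;

// Hypothetical helper (not CheckIndex code): chooses which vector ordinals
// to use as query vectors. Assumes dense ordinals 0..size-1.
final class VectorSampling {

  /** The "first N" approach: only the earliest-written vectors are checked. */
  static int[] firstN(int size, int maxSamples) {
    int n = Math.max(0, Math.min(size, maxSamples));
    int[] ords = new int[n];
    for (int i = 0; i < n; i++) {
      ords[i] = i;
    }
    return ords;
  }

  /** The "every n'th" approach: at most maxSamples ordinals spread over the field. */
  static int[] everyNth(int size, int maxSamples) {
    if (size <= 0 || maxSamples <= 0) {
      return new int[0];
    }
    // Round the stride up (in long arithmetic to avoid int overflow)
    // so that no more than maxSamples ordinals are produced.
    int stride = (int) ((size + (long) maxSamples - 1) / maxSamples);
    int n = (int) ((size + (long) stride - 1) / stride);
    int[] ords = new int[n];
    for (int i = 0; i < n; i++) {
      ords[i] = i * stride; // always < size
    }
    return ords;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(firstN(10, 4)));   // [0, 1, 2, 3]
    System.out.println(Arrays.toString(everyNth(10, 4))); // [0, 3, 6, 9]
  }
}
```

Both strategies run the same number of searches; the strided one simply draws its query vectors from the whole graph instead of only from the vectors that were written first.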
jpountz opened a new pull request, #11917:
URL: https://github.com/apache/lucene/pull/11917
The default codec has a number of small and hot files that actually used to
be fully loaded in memory before we moved them off-heap. In the general case,
these files are expected to fully fit into t
uschindler opened a new pull request, #11918:
URL: https://github.com/apache/lucene/pull/11918
This also adds incorrect (e.g., negative) positions to exception messages.
This also fixes some wrong exception messages (seek vs. read) in
ByteBufferIndexInput. Sometimes it said "seek" alth
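A minimal sketch of the idea behind #11918, assuming a simple in-memory input: each bounds check names the operation that failed ("seek" vs "read") and includes the offending position. `CheckedInput` and its exact message formats are hypothetical, not Lucene's `ByteBufferIndexInput`; using `IllegalArgumentException` for negative positions and `EOFException` for positions past the end is an assumption made for illustration.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical input over a byte[] (not a Lucene class). Every failure message
// names the operation and includes the invalid position.
final class CheckedInput {
  private final byte[] data;
  private int pos;

  CheckedInput(byte[] data) {
    this.data = data;
  }

  void seek(long newPos) throws IOException {
    if (newPos < 0) {
      throw new IllegalArgumentException("Seeking to negative position: " + newPos);
    }
    // Seeking exactly to the end is allowed; reading from there is not.
    if (newPos > data.length) {
      throw new EOFException("seek past EOF: pos=" + newPos + ", length=" + data.length);
    }
    pos = (int) newPos;
  }

  byte readByte() throws IOException {
    if (pos >= data.length) {
      throw new EOFException("read past EOF: pos=" + pos + ", length=" + data.length);
    }
    return data[pos++];
  }

  public static void main(String[] args) throws IOException {
    CheckedInput in = new CheckedInput(new byte[] {1, 2, 3});
    try {
      in.seek(-1);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Seeking to negative position: -1
    }
  }
}
```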
uschindler commented on PR #11918:
URL: https://github.com/apache/lucene/pull/11918#issuecomment-1310726064
The new test is a bit bad, but unfortunately, MMapDirectory's multi-input
only has an assert in seek(). If that hits, the test also passes. In reality
negative offsets on slices should a
jtibshirani commented on issue #11863:
URL: https://github.com/apache/lucene/issues/11863#issuecomment-1310797975
In https://github.com/apache/lucene/pull/11905 we added a test for a large
number of documents (with a tiny dimension).
It'd also be good to clean up and merge something l
rmuir commented on PR #11916:
URL: https://github.com/apache/lucene/pull/11916#issuecomment-1310890561
thanks! I like it. Feel free to add a CHANGES entry if you want, it is a
good one for that, because checkindex is user-visible and important. I would
suggest in the 9.4.2 section as that's
benwtrent commented on PR #11916:
URL: https://github.com/apache/lucene/pull/11916#issuecomment-1310907074
pushed CHANGES under 9.4.2 as an `Improvement` @rmuir
rmuir merged PR #11916:
URL: https://github.com/apache/lucene/pull/11916
rmuir closed issue #11911: improve checkindex to be more thorough for vectors
(e.g. test seeking)
URL: https://github.com/apache/lucene/issues/11911
uschindler commented on PR #11918:
URL: https://github.com/apache/lucene/pull/11918#issuecomment-1310952543
Should I also backport this to 9.4.2 when it gets released next week? I am
afraid of more horrible bugs in vectors and I'd like to give people a chance to
report it.
Problem of
rmuir commented on PR #11916:
URL: https://github.com/apache/lucene/pull/11916#issuecomment-1310961095
@benwtrent I hit an issue upon backporting to branch_9x: it may be nothing
specific to 9.x but just a random seed that hasn't been encountered yet on
master?
The checkindex error messa
rmuir commented on PR #11916:
URL: https://github.com/apache/lucene/pull/11916#issuecomment-1310962304
Another idea, perhaps even simpler, is not to filter deleted docs at all here
in this logic, because checkindex doesn't normally exclude deleted docs and just
checks everything.
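A minimal sketch of why filtering deleted docs can break this check, under simplifying assumptions: every document has a vector, and `java.util.BitSet` stands in for Lucene's live-docs bits (with `null` meaning "no deletions", as in Lucene). `QueryDocSelection`, `liveOnly` and `allDocs` are hypothetical names. If every document in a segment is deleted, the filtered selection is empty and there is nothing to search with, whereas ignoring deletions, as checkindex generally does, still yields candidates.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not CheckIndex code. Assumes every doc in [0, maxDoc) has a vector.
final class QueryDocSelection {

  /** Picks up to maxSamples live docs; liveDocs == null means "no deletions". */
  static List<Integer> liveOnly(int maxDoc, BitSet liveDocs, int maxSamples) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = 0; doc < maxDoc && docs.size() < maxSamples; doc++) {
      if (liveDocs == null || liveDocs.get(doc)) {
        docs.add(doc);
      }
    }
    return docs; // empty when every doc is deleted
  }

  /** Picks up to maxSamples docs regardless of deletions. */
  static List<Integer> allDocs(int maxDoc, int maxSamples) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = 0; doc < maxDoc && docs.size() < maxSamples; doc++) {
      docs.add(doc);
    }
    return docs;
  }

  public static void main(String[] args) {
    BitSet liveDocs = new BitSet(10); // all bits clear: every doc is deleted
    System.out.println(liveOnly(10, liveDocs, 64)); // []
    System.out.println(allDocs(10, 64).size());     // 10
  }
}
```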
benwtrent commented on PR #11916:
URL: https://github.com/apache/lucene/pull/11916#issuecomment-1310966373
@rmuir 100%. I reproduced it with that seed, then removed the deleted docs
check and it cleared up. I bet it's because ALL the docs were deleted or
something.
benwtrent opened a new pull request, #11919:
URL: https://github.com/apache/lucene/pull/11919
There is a chance that all the docs are deleted. This is ok in a checkindex
scenario and other checks don't bother with verifying deleted docs like this.
Removing the check.
This repro
rmuir commented on PR #11919:
URL: https://github.com/apache/lucene/pull/11919#issuecomment-1310970223
looks good, thank you for making the PR so fast. The test failure reproduces
and with this change it passes again.
benwtrent commented on PR #11919:
URL: https://github.com/apache/lucene/pull/11919#issuecomment-1310971424
Ran
```
./gradlew test --tests TestLucene94HnswVectorsFormat -Dtests.iters=1000
```
Just to be sure we are good. All green locally.
@rmuir
benwtrent commented on PR #11919:
URL: https://github.com/apache/lucene/pull/11919#issuecomment-1310971694
Apologies for the noise! Still learning all of Lucene's edges :D
rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1310974545
I think preload is different from mlock, mlock needs way more discussion and
personally I'm against it. mlock would be an operational hassle because of
default resource limits on linux as
uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1019662247
##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -235,7 +235,7 @@ public IndexInput openInput(String name, IOContext context)
throws IOExc
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1310976902
> I think preload is different from mlock, mlock needs way more discussion
and personally I'm against it. mlock would be an operational hassle because of
default resource limits on li
uschindler commented on PR #11918:
URL: https://github.com/apache/lucene/pull/11918#issuecomment-1310991790
Ah you added milestone 9.4.2 already to issue. Will do same here.
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1311033936
After Mike switched to preview mode the results look good. The speed with
MemorySegmentIndexInput is similar to the old ByteBuffer code.
https://home.apache.org/~mikemccand/lucenebench
rmuir merged PR #11919:
URL: https://github.com/apache/lucene/pull/11919
rmuir commented on PR #11918:
URL: https://github.com/apache/lucene/pull/11918#issuecomment-1311107401
yes, +1 to backport. This way if there is another problem, it might be
easier to debug.