vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542387588
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef targ
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542363416
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef targ
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542357624
##
lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestLucene99PostingsFormat.java:
##
@@ -143,4 +141,13 @@ private void doTestImpactSerialization(List<Impact> impact
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542233210
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef targ
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542231368
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef targ
rmuir commented on PR #13231:
URL: https://github.com/apache/lucene/pull/13231#issuecomment-2024165548
Give me some time for a couple more PRs to get this shell script doing less,
and I think we'll be able to totally nuke it.
I kept the `iconv` check here for UTF-8 correctness becaus
rmuir opened a new pull request, #13231:
URL: https://github.com/apache/lucene/pull/13231
Remove this hack, to reduce more logic in this script.
It is no longer needed as of
https://github.com/snowballstem/snowball-website/commit/b934d6b565e268b3db080140cc145f532cd6e648
--
This is
uschindler commented on code in PR #13229:
URL: https://github.com/apache/lucene/pull/13229#discussion_r1542132603
##
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##
@@ -88,4 +84,18 @@ public IOContext(MergeInfo mergeInfo) {
// Merges read input segments seq
uschindler commented on code in PR #13229:
URL: https://github.com/apache/lucene/pull/13229#discussion_r1542131095
##
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##
@@ -88,4 +84,18 @@ public IOContext(MergeInfo mergeInfo) {
// Merges read input segments seq
uschindler commented on PR #13206:
URL: https://github.com/apache/lucene/pull/13206#issuecomment-2024093146
Could you add a changes.txt entry in the 9.11 bugfix section? Will merge
this PR tomorrow.
--
This is an automated message from the Apache Git Service.
To respond to the message, pl
rmuir merged PR #13227:
URL: https://github.com/apache/lucene/pull/13227
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1541971598
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef ta
benwtrent commented on PR #13200:
URL: https://github.com/apache/lucene/pull/13200#issuecomment-2023902003
Tests still all fail, but now I think it compiles. Many deprecation warnings
to go through and clean up still.
One concern I had was on `FieldInfo`. Do we want to ask for a fully
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1541892891
##
lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestLucene99PostingsFormat.java:
##
@@ -143,4 +141,13 @@ private void doTestImpactSerialization(List<Impact> impa
vigyasharma commented on PR #13220:
URL: https://github.com/apache/lucene/pull/13220#issuecomment-2023475693
@jpountz I was checking for consensus. I'm aligned with deprecating.
jpountz merged PR #13216:
URL: https://github.com/apache/lucene/pull/13216
jpountz commented on PR #13216:
URL: https://github.com/apache/lucene/pull/13216#issuecomment-2023348250
I'm merging only to `main` for now.
tteofili commented on code in PR #13225:
URL: https://github.com/apache/lucene/pull/13225#discussion_r1541406253
##
lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java:
##
@@ -949,4 +951,58 @@ public int hashCode() {
return 31 * classHash() + doc
rmuir commented on PR #13227:
URL: https://github.com/apache/lucene/pull/13227#issuecomment-2023119889
I know, that seems crazy, but I think it would be the ultimate goal. Will
have to find an "easy" / "dead-simple" way to achieve publishing snowball
properly that avoids craziness of maven
jpountz commented on PR #13222:
URL: https://github.com/apache/lucene/pull/13222#issuecomment-2023049088
I opened #13229, is it what you had in mind?
tteofili commented on code in PR #13225:
URL: https://github.com/apache/lucene/pull/13225#discussion_r1541314263
##
lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java:
##
@@ -949,4 +951,58 @@ public int hashCode() {
return 31 * classHash() + doc
mayya-sharipova commented on code in PR #13225:
URL: https://github.com/apache/lucene/pull/13225#discussion_r1541305072
##
lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java:
##
@@ -949,4 +951,58 @@ public int hashCode() {
return 31 * classHash(
rmuir commented on PR #13227:
URL: https://github.com/apache/lucene/pull/13227#issuecomment-2022952918
> The only downside is that we can't detect anymore if snowball code calls
unsafe shit like forgetting locales or charsets.
And we also won't if it starts being published as jar and
uschindler commented on PR #13227:
URL: https://github.com/apache/lucene/pull/13227#issuecomment-2022951336
> thanks, i will do this. It isn't an option to call lucene ArrayUtil
methods etc from snowball code.
The only downside is that we can't detect anymore if snowball code calls
u
rmuir commented on PR #13227:
URL: https://github.com/apache/lucene/pull/13227#issuecomment-2022934383
thanks, i will do this. It isn't an option to call lucene ArrayUtil methods
etc from snowball code.
benwtrent commented on PR #13224:
URL: https://github.com/apache/lucene/pull/13224#issuecomment-2022932116
I agree with @jpountz's concern here. But I think the `zero` checks can be
done as things are read in, and we can return the static object.
uschindler commented on PR #13227:
URL: https://github.com/apache/lucene/pull/13227#issuecomment-2022932644
> @uschindler How can i exclude the org.tartarus code from this check (or
really all checks) without touching it?
>
> ```
> Forbidden method invocation: java.util.Arrays#copy
gf2121 commented on PR #13221:
URL: https://github.com/apache/lucene/pull/13221#issuecomment-2022926905
I ran 10 rounds on the wikimediumall index (normal index, no index sorting /
force merge). The results look positive in general, but we do see a slight
regression for high-cardinality fields in sev
expani opened a new issue, #13228:
URL: https://github.com/apache/lucene/issues/13228
### Description
One of the optimisations introduced by
[LUCENE-10233](https://issues.apache.org/jira/browse/LUCENE-10233) was to
compress continuous doc Ids (strictly sorted) by only storing the sta
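The description is cut off above. Assuming it refers to storing only the starting doc ID together with a count for a run of strictly consecutive IDs, the idea can be sketched roughly as follows. The class and method names here are hypothetical and are not Lucene's actual implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of run encoding for consecutive doc IDs; not Lucene code. */
final class ContinuousDocIds {

  /** Returns {start, count} if the IDs are strictly consecutive, otherwise null. */
  static int[] encode(int[] docIds) {
    if (docIds.length == 0) {
      return null;
    }
    for (int i = 1; i < docIds.length; i++) {
      if (docIds[i] != docIds[i - 1] + 1) {
        return null; // a gap or a duplicate: fall back to another encoding
      }
    }
    return new int[] {docIds[0], docIds.length};
  }

  /** Rebuilds the full ID list from the start ID and the count. */
  static int[] decode(int start, int count) {
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      out[i] = start + i;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] ids = {7, 8, 9, 10};
    int[] enc = encode(ids);
    System.out.println(Arrays.toString(enc));
    System.out.println(Arrays.equals(ids, decode(enc[0], enc[1])));
  }
}
```

With this layout a consecutive run costs two integers regardless of its length; any other input has to use a different encoding.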
rmuir commented on PR #13227:
URL: https://github.com/apache/lucene/pull/13227#issuecomment-2022906420
@uschindler How can i exclude the org.tartarus code from this check (or
really all checks) without touching it?
```
Forbidden method invocation: java.util.Arrays#copyOf(**) [Prefe
jpountz commented on PR #13222:
URL: https://github.com/apache/lucene/pull/13222#issuecomment-2022902471
> What do you think?
I had similar thoughts in mind, so that sounds good to me.
I'm still curious about how to fix the bigger issue wrt reader pooling.
Should `getMergeInsta
jpountz commented on PR #13220:
URL: https://github.com/apache/lucene/pull/13220#issuecomment-2022882774
@vigyasharma I'd like to double-check that you're good with
deprecating before merging. I'm not sure if your previous comment was candidly
checking for consensus, or if you were i
rmuir merged PR #13209:
URL: https://github.com/apache/lucene/pull/13209
jpountz commented on PR #13224:
URL: https://github.com/apache/lucene/pull/13224#issuecomment-2022824497
The idea makes sense to me, but the fact that we're checking for a specific
`blockShift` of `16` looks fragile to me. If codecs change the value of
`blockShift` tomorrow, this will break
khushbr opened a new issue, #13226:
URL: https://github.com/apache/lucene/issues/13226
### Description
We have a cluster, running on Lucene v8.7.0 and configured with
`TieredMergePolicy`. We are seeing a peculiar behavior where segments with
heavy deletes are not
tteofili opened a new pull request, #13225:
URL: https://github.com/apache/lucene/pull/13225
### Description
This only introduces a test to check that a `KnnVectorQuery` behaves the same
when the same field is indexed with different `KnnVectorFormats` (e.g.
`Lucene99HnswVectorsFormat` and
original-brownbear commented on code in PR #13224:
URL: https://github.com/apache/lucene/pull/13224#discussion_r1540997664
##
lucene/core/src/java/org/apache/lucene/util/packed/DirectMonotonicReader.java:
##
@@ -39,6 +39,9 @@ public final class DirectMonotonicReader extends Long
benwtrent commented on code in PR #13197:
URL: https://github.com/apache/lucene/pull/13197#discussion_r1540992353
##
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java:
##
@@ -36,10 +36,12 @@ public class VectorUtilBenchmark {
private byt
benwtrent commented on code in PR #13224:
URL: https://github.com/apache/lucene/pull/13224#discussion_r1540991260
##
lucene/core/src/java/org/apache/lucene/util/packed/DirectMonotonicReader.java:
##
@@ -39,6 +39,9 @@ public final class DirectMonotonicReader extends LongValues
i
benwtrent commented on PR #13202:
URL: https://github.com/apache/lucene/pull/13202#issuecomment-2022583444
Looking at the benchmarking, we are adding a 5% overhead to all vector
operations when using float32. As vector operations get faster (consider
hamming distance with exploring more vec
original-brownbear opened a new pull request, #13224:
URL: https://github.com/apache/lucene/pull/13224
Having a single block of all zeros is a fairly common case that is using a
lot of heap for duplicate instances in some use-cases in ES. => read a
singleton for it to save the duplication
tteofili commented on code in PR #13197:
URL: https://github.com/apache/lucene/pull/13197#discussion_r1540836373
##
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java:
##
@@ -36,10 +36,12 @@ public class VectorUtilBenchmark {
private byte
tteofili commented on PR #13197:
URL: https://github.com/apache/lucene/pull/13197#issuecomment-2022409522
I tend to agree with being opinionated about the set of allowed configurations
for the number of bits (4 and 7).
Given the speed-space trade-off for packing, I think it's usefu
jpountz opened a new pull request, #13223:
URL: https://github.com/apache/lucene/pull/13223
This is a follow-up of a discussion on #13219. `mmap` has a higher readahead
than regular `read()` operations by default, e.g. 128kB instead of 16kB on my
Linux box. On indexes that exceed the size o
uschindler commented on PR #13222:
URL: https://github.com/apache/lucene/pull/13222#issuecomment-2022362746
About the comment I announced: when we merge, we want to use sequential access,
as the kernel may free the pages earlier. But actually I am not sure if we really
need this: after merging the f
uschindler commented on PR #13222:
URL: https://github.com/apache/lucene/pull/13222#issuecomment-2022357798
In addition, we should still look into getting the IOContexts correct when
we merge. The current solution is not ideal, but it is not really changeable either.
When you clone an indexinput
rquesada-tibco commented on code in PR #13201:
URL: https://github.com/apache/lucene/pull/13201#discussion_r1540777517
##
lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java:
##
@@ -292,7 +292,21 @@ public long cost() {
};
}
gf2121 commented on PR #13221:
URL: https://github.com/apache/lucene/pull/13221#issuecomment-2022295704
I separately built two `wikimedium10m` indices that were force-merged and
reverse-sorted by `dayOfYear`/`lastMod`, and here are the results:
**wikimedium10m.lucene_baseline.Lucene99.dvfield
jpountz commented on PR #13222:
URL: https://github.com/apache/lucene/pull/13222#issuecomment-2022301350
No worries, Uwe. Looking forward to your suggestions.
uschindler commented on PR #13222:
URL: https://github.com/apache/lucene/pull/13222#issuecomment-2022255027
Hi, I have a problem with merging this, and a suggestion. Please
hold off on merging.
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2022219035
@mikemccand Thanks for your review.
I measured performance on `wikimediumall`:
# iter1
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev
gf2121 commented on PR #13221:
URL: https://github.com/apache/lucene/pull/13221#issuecomment-2022184017
> A downside is that this approach may be less memory-efficient, since we
store competitive docs as integers, never as a bit set like today. But we may
be able to work around it by just s
jpountz closed issue #13211: Replace boolean flags on IOContext with an enum
URL: https://github.com/apache/lucene/issues/13211
jpountz merged PR #13219:
URL: https://github.com/apache/lucene/pull/13219
gf2121 commented on PR #13199:
URL: https://github.com/apache/lucene/pull/13199#issuecomment-2022144257
Nightly benchmark:
https://home.apache.org/~mikemccand/lucenebench/TermDTSort.html
https://home.apache.org/~mikemccand/lucenebench/TermDayOfYearSort.html
https://home.apache.org/~m