easyice commented on PR #13069:
URL: https://github.com/apache/lucene/pull/13069#issuecomment-1931077803
This will also fix the test failure in `TestReqOptSumScorer.testFilterRandomFrequentOpt`:
```
./gradlew test --tests TestReqOptSumScorer.testFilterRandomFrequentOpt -Dtests.seed=70A6
```
dweiss closed issue #13083: Modify getEnWikiRandomLines to fetch and decompress
the zstd resource
URL: https://github.com/apache/lucene/issues/13083
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to t
dweiss commented on issue #13083:
URL: https://github.com/apache/lucene/issues/13083#issuecomment-1930762464
I used zstd-jni for decompression within the buildscript as command-line
zstd may not be installed locally. zstd-jni is still way, way faster than
decompressing bz2...
uschindler commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1480537766
##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -214,4 +214,18 @@ public static float[] checkFinite(float[] v) {
}
return v;
}
+
dweiss commented on issue #13065:
URL: https://github.com/apache/lucene/issues/13065#issuecomment-1930667697
It will be a major headache to maintain native bindings for all major
platforms. I think such an analyzer should be a downstream project (then you
can restrict the platforms on which
jpountz opened a new issue, #13084:
URL: https://github.com/apache/lucene/issues/13084
### Description
@uschindler asked this question in
https://lists.apache.org/thread/6o3hn3x8syfm8lj93kk5rrxb0kx701gp.
In this discussion, we were looking into introducing the ability to iterate
dweiss opened a new issue, #13083:
URL: https://github.com/apache/lucene/issues/13083
### Description
The decompression speedup should be significant.
dweiss closed issue #13074: Expose the linedocsfile (enwiki) as a zstd
compressed archive
URL: https://github.com/apache/lucene/issues/13074
dweiss commented on issue #13074:
URL: https://github.com/apache/lucene/issues/13074#issuecomment-1930660076
Thank you, Mike! I'll create a follow-up issue to change the gradle task to
download and unpack the zstd-compressed file.
rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1930648432
Which of the current functions really need to be in core? I guess the
problem I see is that there are 6 functions today, 3 float, 3 byte.
The byte functions don't perform well and n
mikemccand commented on issue #13074:
URL: https://github.com/apache/lucene/issues/13074#issuecomment-1930545328
OK, done!
https://home.apache.org/~mikemccand/enwiki.random.lines.txt.zst
I downloaded it and confirmed that `wc -c` gives the same count as above.
Thanks @dweiss
mikemccand commented on issue #13074:
URL: https://github.com/apache/lucene/issues/13074#issuecomment-1930535125
Wow, that is an amazingly fast decompression! And also an awesome
improvement in compression ratio. Yup, I'll do this shortly.
stefanvodita commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1930494299
Thank you for reviving the PR, Mike; it had been sitting around for a good
while. I’ll leave it up for a few more days to see if there are other comments
and merge if there aren’t.
benwtrent merged PR #13058:
URL: https://github.com/apache/lucene/pull/13058
benwtrent commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1930440518
> IMHO, the VectorSimilarity class should NOT be an ENUM and instead be an
SPI with a symbolic name (using NamedSPILoader for the lookup) and the name
should be stored in FieldInfo.
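To illustrate the named-SPI idea quoted above: a minimal sketch in which each similarity exposes a stable symbolic name that could be written into field metadata and resolved again at read time. `NamedVectorSimilarity`, `SimilarityRegistry` and `DotProductSimilarity` are hypothetical names, and the plain map stands in for Lucene's `NamedSPILoader`; this is not the API proposed in the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: each similarity carries a stable symbolic name
// that could be persisted per field instead of an enum constant.
interface NamedVectorSimilarity {
  String getName();

  float compare(float[] a, float[] b);
}

// Hypothetical registry standing in for NamedSPILoader. A real SPI-based
// version would discover implementations via java.util.ServiceLoader.
final class SimilarityRegistry {
  private static final Map<String, NamedVectorSimilarity> BY_NAME = new ConcurrentHashMap<>();

  private SimilarityRegistry() {}

  static void register(NamedVectorSimilarity sim) {
    NamedVectorSimilarity previous = BY_NAME.putIfAbsent(sim.getName(), sim);
    if (previous != null && previous != sim) {
      throw new IllegalArgumentException("Duplicate similarity name: " + sim.getName());
    }
  }

  // Resolves a name read back from field metadata; fails loudly if the
  // implementation that wrote the field is not available at read time.
  static NamedVectorSimilarity forName(String name) {
    NamedVectorSimilarity sim = BY_NAME.get(name);
    if (sim == null) {
      throw new IllegalArgumentException(
          "Unknown similarity '" + name + "'; available: " + BY_NAME.keySet());
    }
    return sim;
  }
}

// Example implementation registered under the name "DOT_PRODUCT".
final class DotProductSimilarity implements NamedVectorSimilarity {
  @Override
  public String getName() {
    return "DOT_PRODUCT";
  }

  @Override
  public float compare(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("Dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

The design point is that the field stores a string, so adding a new function means registering a new implementation rather than extending an enum.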
ChrisHegarty commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1930403065
Thanks @uschindler.
rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1930397239
Thanks Uwe, that's exactly what is needed. The problem I see is a very
immature field (vector search) that has no way to add new features (distance
functions) without permanently impacting
uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1930363479
In general, I'd like to rethink the pluggable VectorSimilarities (per
field). IMHO, the VectorSimilarity class should NOT be an ENUM and instead be
an SPI with a symbolic name (using
benwtrent commented on code in PR #13058:
URL: https://github.com/apache/lucene/pull/13058#discussion_r1480229819
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -135,7 +135,7 @@ private TopDocs getLeafResults(
}
// Perform the app
benwtrent commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1930225256
My question about supporting Lucene 9 indices is out of legit ignorance. I
think we would still need to support reading and searching segments stored with
Cosine in Lucene 10. But we c
benwtrent commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1930202577
> Can we do that for Lucene 10.0 ?
Deprecate it with a warning of its imminent demise, or remove it?
Either should be possible. For users, they would have to add code to
norma
uschindler commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1930132446
I targeted it to milestone 9.10.0. I will add the CHANGES.txt entry shortly
before merging.
uschindler commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1930120998
> > Do you mean another 9.9.3 with bugfix, or do you mean next minor version
9.10?
>
> Apologies, I mean the next minor - 9.10 (not 9.9.3). Sorry for the
confusion.
Yes.
ChrisHegarty commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1930112461
> Do you mean another 9.9.3 with bugfix, or do you mean next minor version
9.10?
Apologies, I mean the next minor - 9.10 (not 9.9.3). Sorry for the confusion.
benwtrent commented on PR #13058:
URL: https://github.com/apache/lucene/pull/13058#issuecomment-1930102716
@jpountz there you go :). Only for `approximateSearch`
uschindler commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1930098353
Yes. To be sure, I wanted to wait until Friday this week. But yes, in general I
am happy to have this in.
Do you mean another 9.9.3 with bugfix, or do you mean next minor version
benwtrent opened a new pull request, #13082:
URL: https://github.com/apache/lucene/pull/13082
This particular test relies on doc-ids for potential tie breaks. For
consistency, removing the random flushing by reverting the change from commit:
f7cab164501
closes: https://github.com/apache/
ChrisHegarty commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1930033556
@uschindler As per our in-person conversation, are you OK to merge this PR
so that it can be incorporated into the next Lucene bugfix version?
lmessinger commented on issue #13065:
URL: https://github.com/apache/lucene/issues/13065#issuecomment-1929981311
hi,
in Hebrew and other Semitic languages, lemmas are context-dependent.
e.g. שמן could be interpreted as
fat, oil, their name, from
all dependent on the context
s
mark4z commented on issue #7886:
URL: https://github.com/apache/lucene/issues/7886#issuecomment-1929959691
Yeah, I think u are right.
benwtrent commented on issue #13065:
URL: https://github.com/apache/lucene/issues/13065#issuecomment-1929933564
@lmessinger I don't see why text tokenization would need any native code.
WordPiece is pretty simple and just a dictionary lookup.
Do y'all not have a Java one?
O
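To make the "just a dictionary lookup" point concrete: a minimal sketch of greedy longest-match-first WordPiece segmentation of a single word, assuming a BERT-style vocabulary in which continuation pieces carry a `##` prefix and unknown words map to a single unknown token. The class and method names are hypothetical. The sketch works on UTF-16 code units, which is fine for Hebrew letters but would need care for supplementary characters, and it does not address the context-dependent lemmatization raised in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class WordPiece {
  private WordPiece() {}

  // Splits one already whitespace-tokenized word into sub-word pieces.
  static List<String> tokenize(String word, Set<String> vocab, String unknownToken) {
    List<String> pieces = new ArrayList<>();
    int start = 0;
    while (start < word.length()) {
      int end = word.length();
      String match = null;
      // Try the longest remaining substring first and shrink it from the
      // right until it is found in the vocabulary.
      while (start < end) {
        String candidate = word.substring(start, end);
        if (start > 0) {
          candidate = "##" + candidate; // marks a word-internal continuation
        }
        if (vocab.contains(candidate)) {
          match = candidate;
          break;
        }
        end--;
      }
      if (match == null) {
        // No prefix of the remainder is known: treat the whole word as unknown.
        return List.of(unknownToken);
      }
      pieces.add(match);
      start = end;
    }
    return pieces;
  }
}
```

For example, with the vocabulary `{"un", "##aff", "##able"}`, the word `unaffable` segments into `un`, `##aff`, `##able`.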
benwtrent opened a new pull request, #13081:
URL: https://github.com/apache/lucene/pull/13081
The failure is due to the randomized flushing and the later assertion that there
are only two leaves. When switching from `w.addDocuments` to `w.addDocument`
the test infra now has an opportunity to ran
benwtrent commented on issue #13080:
URL: https://github.com/apache/lucene/issues/13080#issuecomment-1929888740
Ah, I see the issue, will fix momentarily.
benwtrent opened a new issue, #13080:
URL: https://github.com/apache/lucene/issues/13080
### Description
TestTopFieldCollector.testTotalHits fails on branch_9x, git-bisect indicates
https://github.com/apache/lucene/commits/0aa88910ca9a1032d288996d14203eac4953f2de
I tried reprod
pmpailis commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1479909430
##
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##
@@ -94,6 +95,29 @@ public float compare(float[] v1, float[] v2) {
public float
mayya-sharipova closed pull request #12794: Speedup concurrent multi-segment
HNSW graph search
URL: https://github.com/apache/lucene/pull/12794
mayya-sharipova commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1929779853
Closed in favour of https://github.com/apache/lucene/pull/12962
mayya-sharipova merged PR #12962:
URL: https://github.com/apache/lucene/pull/12962
risdenk closed issue #12145: port gradle improvements to Lucene
URL: https://github.com/apache/lucene/issues/12145
risdenk commented on issue #12145:
URL: https://github.com/apache/lucene/issues/12145#issuecomment-1929767397
Handled by https://github.com/apache/lucene/pull/12150
risdenk commented on PR #12150:
URL: https://github.com/apache/lucene/pull/12150#issuecomment-1929766994
Closes https://github.com/apache/lucene/issues/12145
benwtrent commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1929761190
So, I did some of my own experiments. I tested Vamana (vectors in-graph) &
HNSW, both with `int8` quantization (here is my Lucene branch:
https://github.com/apache/lucene/compare/
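As background for the `int8` experiments: a minimal sketch of linear scalar quantization that maps each float dimension from a fixed `[min, max]` range onto the integers 0..127 and back. The class name and the choice of fixed bounds are assumptions for illustration; this is not Lucene's quantizer or the code on the linked branch.

```java
final class Int8Quantizer {
  private final float min;
  private final float max;
  private final float scale; // quantization steps per unit of the original value

  Int8Quantizer(float min, float max) {
    if (!(max > min)) {
      throw new IllegalArgumentException("max must be greater than min");
    }
    this.min = min;
    this.max = max;
    this.scale = 127f / (max - min);
  }

  // Maps [min, max] linearly onto 0..127; values outside the range are clamped.
  byte[] quantize(float[] vector) {
    byte[] codes = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float clamped = Math.min(max, Math.max(min, vector[i]));
      codes[i] = (byte) Math.round((clamped - min) * scale);
    }
    return codes;
  }

  // Approximate inverse: for in-range inputs the per-dimension error is at
  // most half a step, i.e. (max - min) / 254, plus float rounding.
  float[] dequantize(byte[] codes) {
    float[] vector = new float[codes.length];
    for (int i = 0; i < codes.length; i++) {
      vector[i] = min + codes[i] / scale;
    }
    return vector;
  }
}
```

Each dimension then takes one byte instead of four, at the cost of the bounded rounding error noted in `dequantize`.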
pmpailis commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1479853108
##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -214,4 +214,19 @@ public static float[] checkFinite(float[] v) {
}
return v;
}
+
+
uschindler commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1479835140
##
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##
@@ -94,6 +95,29 @@ public float compare(float[] v1, float[] v2) {
public floa
uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929552474
> I do agree cosine should probably be removed (not because of hamming
distance), but because dot_product exists.
Can we do that for Lucene 10.0 ?
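Removing cosine in favor of `dot_product` works because cosine similarity equals the dot product of the L2-normalized vectors, cos(a, b) = (a/|a|) · (b/|b|), so users can normalize once at indexing time instead of paying for the norms on every comparison. A minimal sketch with hypothetical helper names; the two computations agree up to floating-point rounding.

```java
final class CosineViaDotProduct {
  private CosineViaDotProduct() {}

  // Scales v in place to unit L2 length and returns it.
  static float[] l2Normalize(float[] v) {
    double sumSquares = 0;
    for (float x : v) {
      sumSquares += (double) x * x;
    }
    if (sumSquares == 0) {
      throw new IllegalArgumentException("Cannot normalize a zero vector");
    }
    double inverseNorm = 1.0 / Math.sqrt(sumSquares);
    for (int i = 0; i < v.length; i++) {
      v[i] = (float) (v[i] * inverseNorm);
    }
    return v;
  }

  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Reference cosine: three accumulations plus a square root on every call.
  static float cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return (float) (dot / Math.sqrt(normA * normB));
  }
}
```

For `a = {3, 4}` and `b = {4, 3}`, both `cosine(a, b)` and the dot product of the normalized copies come out at 0.96 (24/25).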
benwtrent commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929537735
> My question is why add this function when it's not that much faster than
integer dot product?
Because it provides different scores. Integer dot-product doesn't provide
the sam
uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929503472
> My question is why add this function when it's not that much faster than
integer dot product? I see less than 20 percent improvement, which won't even
translate to 20 percent indexi
rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929405495
A good way to get in a new function would be to actually improve our support
o&m by removing a horribly performing one such as cosine first. That way we are
actually improving rather than
rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929397814
My question is why add this function when it's not that much faster than
integer dot product? I see less than 20 percent improvement, which won't even
translate to 20 percent indexing/sear
pmpailis commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929201238
Thanks for the suggestion @uschindler - will add the suggested variant to
benchmarks! To be honest, the reason I re-ran on x86 was mainly because of the vector
performance differences (hence w
uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929197473
P.S. the long support for bit count was added recently on x86. We may also
compare with the integer one using the integer var handle (that's easy to
check). Maybe that performs better
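A sketch of the two variants mentioned above: an XOR bit count over `byte[]` that reads 8 bytes at a time through a long `VarHandle` with `Long.bitCount`, an int-based variant using `Integer.bitCount`, and a byte-by-byte loop used both for the tail and as a reference. Class and method names are hypothetical and this is not the code in the PR; which variant is faster depends on the JVM and CPU, so it has to be measured rather than assumed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class XorBitCount {
  // Views that read 8 or 4 bytes of a byte[] at an arbitrary byte offset.
  // Byte order does not change a popcount, so little-endian is arbitrary here.
  private static final VarHandle AS_LONGS =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle AS_INTS =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  private XorBitCount() {}

  static int withLongs(byte[] a, byte[] b) {
    checkLengths(a, b);
    int count = 0;
    int i = 0;
    int upperBound = a.length & ~7; // largest multiple of 8 <= length
    for (; i < upperBound; i += 8) {
      count += Long.bitCount((long) AS_LONGS.get(a, i) ^ (long) AS_LONGS.get(b, i));
    }
    return count + tail(a, b, i);
  }

  static int withInts(byte[] a, byte[] b) {
    checkLengths(a, b);
    int count = 0;
    int i = 0;
    int upperBound = a.length & ~3; // largest multiple of 4 <= length
    for (; i < upperBound; i += 4) {
      count += Integer.bitCount((int) AS_INTS.get(a, i) ^ (int) AS_INTS.get(b, i));
    }
    return count + tail(a, b, i);
  }

  // Reference implementation: one byte at a time.
  static int scalar(byte[] a, byte[] b) {
    checkLengths(a, b);
    return tail(a, b, 0);
  }

  // Counts differing bits in bytes [from, length); masking with 0xFF drops
  // the sign-extension bits introduced by promoting bytes to int.
  private static int tail(byte[] a, byte[] b, int from) {
    int count = 0;
    for (int i = from; i < a.length; i++) {
      count += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return count;
  }

  private static void checkLengths(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch: " + a.length + " != " + b.length);
    }
  }
}
```

Comparing both variants against `scalar` on random inputs whose lengths are not multiples of 8 exercises the tail loops, in the spirit of the Panama-vs-scalar test suggested in the review.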
uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929179243
About NEON: Robert checked yesterday. There is a lot going on in Hotspot and
optimizations are added all the time.
If NEON is slower on your machine, it might be that there's st
uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929165660
Please also add a test like the Panama vs scalar one where you compare the
results of the varhandle variant with the simple byte-by-byte one from the tail
loop. Make sure to use inter
pmpailis commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929150939
Thank you so much @rmuir & @uschindler for taking such a close look and also
running benchmarks. 🙇 The reason I went with the look up table was because
there seemed to be some improvem
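For comparison with the lookup-table approach mentioned above: a sketch of an XOR bit count driven by a precomputed 256-entry table of per-byte popcounts. The names are hypothetical. On HotSpot, `Integer.bitCount` and `Long.bitCount` are typically intrinsified to a hardware popcount instruction where the CPU supports one, which is one reason to benchmark a table against them.

```java
final class LookupTableBitCount {
  // BIT_COUNT[x] holds the number of set bits in the unsigned byte value x.
  private static final byte[] BIT_COUNT = new byte[256];

  static {
    // popcount(x) = popcount(x >> 1) + (x & 1); entries are filled in
    // ascending order, so BIT_COUNT[x >> 1] is always already computed.
    for (int x = 1; x < 256; x++) {
      BIT_COUNT[x] = (byte) (BIT_COUNT[x >> 1] + (x & 1));
    }
  }

  private LookupTableBitCount() {}

  static int xorBitCount(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch: " + a.length + " != " + b.length);
    }
    int count = 0;
    for (int i = 0; i < a.length; i++) {
      // Mask to 0..255 so the XOR result is a valid table index.
      count += BIT_COUNT[(a[i] ^ b[i]) & 0xFF];
    }
    return count;
  }
}
```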
pmpailis commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1479470613
##
lucene/core/src/test/org/apache/lucene/index/KnnGraphTestCase.java:
##
@@ -54,35 +55,65 @@
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.Byte