mikemccand opened a new issue, #12476:
URL: https://github.com/apache/lucene/issues/12476
### Description
@fulmicoton (Tantivy creator) reached out to me after our [fun discussion
about how to tap into branchless CPU instructions (CMOVcc on
x86-64)](https://markmail.org/message/rqktb
mikemccand commented on issue #12476:
URL: https://github.com/apache/lucene/issues/12476#issuecomment-1658179453
Thank you for the pointer @fulmicoton!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to g
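The branchless idea from #12476 can be illustrated with a lower-bound binary search. The loop count depends only on the array length, and the data-dependent step is written as a conditional select rather than an if/else. This is a minimal sketch, assuming a sorted `int[]`. The class and method names are hypothetical. Whether HotSpot actually emits a CMOVcc for the ternary is up to the JIT and is not guaranteed.

```java
// Hypothetical helper (not Lucene or Tantivy code): lower-bound search over a sorted int[].
final class BranchlessSearch {

  /** Returns the index of the first element >= target, or a.length if there is none. */
  static int lowerBound(int[] a, int target) {
    if (a.length == 0) {
      return 0;
    }
    int base = 0;
    int len = a.length;
    // Invariant: the answer lies in [base, base + len]. The number of iterations
    // depends only on a.length, never on the data, so the loop branch is predictable.
    while (len > 1) {
      int half = len >>> 1;
      // Data-dependent step written as a select; the JIT *may* turn this into a
      // conditional move (CMOVcc on x86-64), but that is not guaranteed.
      base = (a[base + half] < target) ? base + half : base;
      len -= half;
    }
    return base + (a[base] < target ? 1 : 0);
  }

  public static void main(String[] args) {
    int[] docs = {1, 3, 5, 7};
    System.out.println(lowerBound(docs, 5)); // 2
    System.out.println(lowerBound(docs, 4)); // 2
    System.out.println(lowerBound(docs, 8)); // 4
  }
}
```

Each iteration at least halves the candidate range, so a block of 128 doc IDs takes a fixed 7 iterations regardless of the target.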
mikemccand opened a new issue, #12477:
URL: https://github.com/apache/lucene/issues/12477
### Description
Lucene has an efficient (storage and CPU) compressor for monotonic long
values that simply fits a "best fit" (ish?) linear model to the N monotonic
values, and then encodes the p
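The general idea behind such a monotonic compressor can be sketched in a few steps. Fit a line through the values, store each value's distance from that line after shifting so that every distance is non-negative, and size the bit width by the largest shifted distance. This sketch uses the simplest line, through the first and last values, as one reading of "best fit (ish?)". It is an illustration under stated assumptions, not Lucene's implementation. The class name is hypothetical, residuals sit in a plain `long[]` instead of being bit-packed, and the input is assumed non-empty and non-decreasing.

```java
// Hypothetical sketch of linear-model encoding for monotonic longs (not Lucene's code).
final class LinearMonotonicSketch {

  final long first;        // value at index 0
  final float slope;       // average increment between consecutive values
  final long minResidual;  // shift applied so every stored residual is >= 0
  final long[] residuals;  // a real codec would bit-pack these at bitsPerValue bits each
  final int bitsPerValue;  // bits needed for the largest shifted residual

  LinearMonotonicSketch(long[] values) { // assumes values.length >= 1
    int n = values.length;
    first = values[0];
    slope = n > 1 ? (float) (values[n - 1] - values[0]) / (n - 1) : 0f;
    long[] raw = new long[n];
    long min = Long.MAX_VALUE;
    for (int i = 0; i < n; i++) {
      raw[i] = values[i] - expected(i); // signed distance from the line
      min = Math.min(min, raw[i]);
    }
    minResidual = min;
    residuals = new long[n];
    long max = 0;
    for (int i = 0; i < n; i++) {
      residuals[i] = raw[i] - min; // now non-negative
      max = Math.max(max, residuals[i]);
    }
    bitsPerValue = 64 - Long.numberOfLeadingZeros(max);
  }

  /** Value predicted by the linear model at index i. */
  long expected(int i) {
    return first + (long) (slope * i);
  }

  /** Decodes the value at index i: prediction plus the stored correction. */
  long get(int i) {
    return expected(i) + minResidual + residuals[i];
  }

  public static void main(String[] args) {
    long[] values = {2, 5, 9, 14, 20};
    LinearMonotonicSketch enc = new LinearMonotonicSketch(values);
    System.out.println(enc.bitsPerValue); // 2
    System.out.println(enc.get(1));       // 5
  }
}
```

For `{2, 5, 9, 14, 20}` the values deviate from the fitted line by at most 2 after shifting, so each one fits in 2 bits instead of the 5 bits that the raw value 20 would need.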
mikemccand commented on issue #12477:
URL: https://github.com/apache/lucene/issues/12477#issuecomment-1658195108
Note that Tantivy uses binary search to locate the target docid in the block
of docs -- somehow Tantivy uses SIMD to decode (docid-delta encoded) postings
into absolute docids fi
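The comment describes two steps. First, decode a block of doc-ID deltas into absolute doc IDs, which is a prefix sum. Then, binary-search the decoded block for the target. Both steps look roughly like this in scalar Java. It is a sketch only. The names are hypothetical, the prefix sum is a plain loop rather than SIMD, and `java.util.Arrays.binarySearch` stands in for whatever search routine Tantivy actually uses.

```java
import java.util.Arrays;

// Hypothetical sketch: one postings block stored as doc-ID deltas.
final class DeltaBlockSearch {

  /** Inclusive prefix sum: turns deltas into absolute doc IDs, starting from base. */
  static int[] decode(int base, int[] deltas) {
    int[] docs = new int[deltas.length];
    int acc = base;
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      docs[i] = acc;
    }
    return docs;
  }

  /** Index of the first doc >= target in a decoded (strictly increasing) block, or -1. */
  static int advanceWithinBlock(int[] docs, int target) {
    int idx = Arrays.binarySearch(docs, target);
    // binarySearch returns (-(insertion point) - 1) when target is absent.
    int pos = idx >= 0 ? idx : -idx - 1;
    return pos < docs.length ? pos : -1;
  }

  public static void main(String[] args) {
    int[] docs = decode(100, new int[] {3, 2, 7, 1});
    System.out.println(Arrays.toString(docs));         // [103, 105, 112, 113]
    System.out.println(advanceWithinBlock(docs, 106)); // 2
  }
}
```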
tang-hi commented on issue #12477:
URL: https://github.com/apache/lucene/issues/12477#issuecomment-1658224212
I have attempted to encode/decode the post block using SIMD instructions.
However, I believe it may not be the opportune moment to vectorize it. This is
because we are currently una
easyice commented on PR #12435:
URL: https://github.com/apache/lucene/pull/12435#issuecomment-1658292502
I'm sorry for the late reply. I agree with you; it has no impact on
performance.
busykoala opened a new pull request, #12478:
URL: https://github.com/apache/lucene/pull/12478
### Description
This pull request adds a new feature to Lucene's DictionaryDecompounder.
Now, you can set the position increment of subtokens to one. This feature is
required when you're doi
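Position increments are easiest to understand when they are converted to absolute positions. This minimal sketch uses a toy token model. The `Token` record and `positions` helper are hypothetical and are not Lucene's `TokenStream` API. It also assumes the original compound token is kept and emitted first. With increment 0, the subtokens stack on the compound's position. With increment 1, which is what this PR makes configurable, each subtoken moves to a new position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a token stream (not Lucene's TokenStream API).
final class PositionIncrementSketch {

  record Token(String term, int posInc) {}

  /** Converts position increments to absolute positions; the first token lands at 0. */
  static List<String> positions(List<Token> tokens) {
    List<String> out = new ArrayList<>();
    int pos = -1;
    for (Token t : tokens) {
      pos += t.posInc();
      out.add(t.term() + "@" + pos);
    }
    return out;
  }

  public static void main(String[] args) {
    // "fussball" decompounded into "fuss" + "ball"; the original token is assumed kept.
    List<Token> stacked =
        List.of(new Token("fussball", 1), new Token("fuss", 0), new Token("ball", 0));
    List<Token> sequential =
        List.of(new Token("fussball", 1), new Token("fuss", 1), new Token("ball", 1));
    System.out.println(positions(stacked));    // [fussball@0, fuss@0, ball@0]
    System.out.println(positions(sequential)); // [fussball@0, fuss@1, ball@2]
  }
}
```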
benwtrent commented on PR #12434:
URL: https://github.com/apache/lucene/pull/12434#issuecomment-1658402079
> If it's a small number (say c children per parent), it may be better to
use KNN search with K' = c * K. It would be interesting to compare these two
approaches to see if we can prov
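The alternative quoted above works in three steps. Run a plain child-level KNN search with K' = c * K, keep each parent's best child score, and return the top K parents. This hypothetical sketch covers only the collapse step. It assumes the K' child hits, each tagged with its parent doc, have already been retrieved. `ChildHit` and `topKParents` are illustrative names. The sketch also exposes a caveat: if some parents contribute more than c of the K' hits, fewer than K distinct parents come back.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse over-fetched child hits to their best-scoring parents.
final class OverFetchThenCollapse {

  record ChildHit(int childDoc, int parentDoc, float score) {}

  /** Keeps each parent's best child score, then returns up to k parents by that score. */
  static List<Integer> topKParents(List<ChildHit> childHits, int k) {
    Map<Integer, Float> best = new LinkedHashMap<>();
    for (ChildHit h : childHits) {
      best.merge(h.parentDoc(), h.score(), Float::max);
    }
    List<Map.Entry<Integer, Float>> entries = new ArrayList<>(best.entrySet());
    entries.sort(Map.Entry.<Integer, Float>comparingByValue(Comparator.reverseOrder()));
    List<Integer> parents = new ArrayList<>();
    for (int i = 0; i < Math.min(k, entries.size()); i++) {
      parents.add(entries.get(i).getKey());
    }
    return parents;
  }

  public static void main(String[] args) {
    // K = 2 parents wanted, c = 3 children per parent, so K' = 6 child hits were fetched.
    List<ChildHit> hits = List.of(
        new ChildHit(10, 1, 0.9f), new ChildHit(11, 1, 0.8f), new ChildHit(20, 2, 0.85f),
        new ChildHit(30, 3, 0.7f), new ChildHit(21, 2, 0.6f), new ChildHit(12, 1, 0.5f));
    System.out.println(topKParents(hits, 2)); // [1, 2]
  }
}
```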
benwtrent commented on code in PR #12434:
URL: https://github.com/apache/lucene/pull/12434#discussion_r1279318027
##
lucene/core/src/test/org/apache/lucene/util/hnsw/TestNeighborQueue.java:
##
@@ -114,6 +114,38 @@ public void testUnboundedQueue() {
assertEquals(maxNode, nn.
benwtrent commented on code in PR #12434:
URL: https://github.com/apache/lucene/pull/12434#discussion_r1279321413
##
lucene/join/src/java/org/apache/lucene/search/join/ToParentJoinKnnCollector.java:
##
@@ -0,0 +1,294 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
benwtrent commented on code in PR #12434:
URL: https://github.com/apache/lucene/pull/12434#discussion_r1279430631
##
lucene/core/src/java/org/apache/lucene/search/KnnCollector.java:
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
benwtrent commented on code in PR #12434:
URL: https://github.com/apache/lucene/pull/12434#discussion_r1279441401
##
lucene/join/src/java/org/apache/lucene/search/join/ToParentJoinKnnCollector.java:
##
@@ -0,0 +1,294 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
msokolov commented on code in PR #12434:
URL: https://github.com/apache/lucene/pull/12434#discussion_r1279450672
##
lucene/join/src/java/org/apache/lucene/search/join/ToParentJoinKnnCollector.java:
##
@@ -0,0 +1,294 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
msokolov commented on code in PR #12434:
URL: https://github.com/apache/lucene/pull/12434#discussion_r1279451735
##
lucene/core/src/java/org/apache/lucene/search/KnnCollector.java:
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ *
nreimers commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1658640222
@msokolov In our BEIR paper we talked about this:
https://arxiv.org/abs/2104.08663
The issue with cosine similarity is that it just encodes the topic. For the
query `What
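The topic-versus-magnitude point can be made concrete with two documents that point in the same direction. They get identical cosine scores, while the dot product also reflects their length. The sketch below uses made-up 2-d vectors, and the class and method names are hypothetical.

```java
// Hypothetical sketch: cosine ignores vector length, dot product does not.
final class CosineVsDot {

  static float dot(float[] a, float[] b) {
    float s = 0f;
    for (int i = 0; i < a.length; i++) {
      s += a[i] * b[i];
    }
    return s;
  }

  /** Dot product divided by both vector lengths, so only the direction matters. */
  static float cosine(float[] a, float[] b) {
    return dot(a, b) / (float) Math.sqrt(dot(a, a) * dot(b, b));
  }

  public static void main(String[] args) {
    float[] query = {3f, 4f};
    float[] docA = {3f, 4f}; // same direction as docB
    float[] docB = {6f, 8f}; // twice as long
    System.out.println(cosine(query, docA)); // 1.0
    System.out.println(cosine(query, docB)); // 1.0
    System.out.println(dot(query, docA));    // 25.0
    System.out.println(dot(query, docB));    // 50.0
  }
}
```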
benwtrent opened a new pull request, #12479:
URL: https://github.com/apache/lucene/pull/12479
The current dot-product score scaling and similarity implementation assumes
normalized vectors. This disregards information that the model may store within
the magnitude.
See: https://githu
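The scaling problem can be seen in a small sketch, assuming the `(1 + dot) / 2` mapping for unit-length vectors. That mapping stays within [0, 1] only when the dot product is within [-1, 1]. Unnormalized vectors can therefore produce negative scores, which Lucene does not accept, or scores above 1. The second method is one possible monotonic mapping to positive scores. It is an assumption for illustration, not necessarily what this PR implements. Both method names are hypothetical.

```java
// Hypothetical sketch of dot-product score scaling (not Lucene's API).
final class DotProductScoreScaling {

  /** (1 + dot) / 2: lands in [0, 1] only when dot is in [-1, 1], i.e. for unit vectors. */
  static float unitVectorScore(float dot) {
    return (1 + dot) / 2;
  }

  /**
   * Assumed alternative: strictly increasing in dot and always > 0.
   * Negative dots map into (0, 1); non-negative dots map to [1, +inf).
   */
  static float unboundedScore(float dot) {
    return dot < 0 ? 1 / (1 - dot) : dot + 1;
  }

  public static void main(String[] args) {
    System.out.println(unitVectorScore(-3f)); // -1.0 (negative score)
    System.out.println(unitVectorScore(10f)); // 5.5 (above 1)
    System.out.println(unboundedScore(-3f));  // 0.25
    System.out.println(unboundedScore(10f));  // 11.0
  }
}
```

The piecewise form keeps the ordering of raw dot products intact, so the ranking is the same as ranking by the dot product itself. The two halves meet at a score of 1 when the dot product is 0.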
benwtrent commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1659123368
I found another dataset, Yandex Text-to-image:
https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search
I tested against the first 500_000 values in t