jpountz merged PR #12384:
URL: https://github.com/apache/lucene/pull/12384
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apa
jpountz merged PR #12392:
URL: https://github.com/apache/lucene/pull/12392
jpountz merged PR #12400:
URL: https://github.com/apache/lucene/pull/12400
LuXugang opened a new issue, #12401:
URL: https://github.com/apache/lucene/issues/12401
### Description
In `TermOrdValLeafComparator#CompetitiveIterator#advance(int target)`, when
postings cannot be used to filter competitive documents, switch to using
`SortedDocValues` to skip
mayya-sharipova commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-165777
@mikemccand Indeed, exactly as said, sorry for being unclear. We have not
checked search, will work on that.
@uschindler Thanks, indeed, we need tests on other machine
Perdjesk opened a new pull request, #12402:
URL: https://github.com/apache/lucene/pull/12402
### Description
Correct Javadocs still referring to removed API:
SimpleBindings#add(SortField).
https://github.com/apache/lucene/commit/5eb117f561ab691f34409943ae1f85781735f8e0
javanna commented on PR #12398:
URL: https://github.com/apache/lucene/pull/12398#issuecomment-1611263143
thanks @jpountz !
javanna merged PR #12398:
URL: https://github.com/apache/lucene/pull/12398
msokolov commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1611270462
I see a lot of good work on the implementation in the attached PR, great!
What I'm lacking though is any understanding of what the use cases for this
might be. Do we have some? I t
msokolov commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1245115736
##
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java:
##
@@ -100,12 +100,19 @@ public int getCountToCollect() {
@Overr
Perdjesk commented on code in PR #12402:
URL: https://github.com/apache/lucene/pull/12402#discussion_r1245128216
##
lucene/core/src/java/org/apache/lucene/search/package-info.java:
##
@@ -303,8 +303,8 @@
*
* // SimpleBindings just maps variables to SortField instances
Rev
Perdjesk commented on code in PR #12402:
URL: https://github.com/apache/lucene/pull/12402#discussion_r1245131114
##
lucene/core/src/java/org/apache/lucene/search/package-info.java:
##
@@ -303,8 +303,8 @@
*
* // SimpleBindings just maps variables to SortField instances
Rev
jpountz commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1245196000
##
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java:
##
@@ -136,15 +143,7 @@ public TopSuggestDocs get() throws IOExcepti
jpountz commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1245197338
##
lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingLeafCollector.java:
##
@@ -57,4 +58,11 @@ public void collect(int doc) throws IOException {
jpountz commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1245210140
##
lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingCollector.java:
##
@@ -49,7 +50,9 @@ public LeafCollector getLeafCollector(LeafReaderContext
jpountz commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611459853
Thanks for looking into this! For reference, I've been separately looking
into whether we could vectorize prefix sums, which is one bottleneck of
postings decoding today as we manag
HoustonPutman commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1611546574
@alessandrobenedetti's [Berlin Buzzwords
talk](https://www.youtube.com/watch?v=KhL0NrGj0uE) gave a pretty good example.
If you want to have individual vectors for each paragra
rmuir commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611571271
crazy question: do we really need a vectorized prefix sum for the postings
list? could we just decode the deltas, and lazily defer computation of the
accumulated docid sum until it's needed,
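The comment above is cut off, so the following is only a minimal sketch of the general idea it raises: keep a decoded block of doc-ID deltas as-is and compute the running sum the first time a doc ID is actually requested. The class `LazyDeltaBlock` and its methods are hypothetical names for illustration and are not part of Lucene's postings code.

```java
/** Hypothetical illustration only; this is not Lucene's postings decoder. */
final class LazyDeltaBlock {
  private final int base;     // doc ID that precedes the first entry of the block
  private final int[] deltas; // gaps between consecutive doc IDs, already decoded
  private int[] docIds;       // prefix sums, computed on first access

  LazyDeltaBlock(int base, int[] deltas) {
    this.base = base;
    this.deltas = deltas;
  }

  /** Returns the i-th doc ID, running the prefix sum over the block only once. */
  int docId(int i) {
    if (docIds == null) {
      docIds = new int[deltas.length];
      int sum = base;
      for (int j = 0; j < deltas.length; j++) {
        sum += deltas[j];
        docIds[j] = sum;
      }
    }
    return docIds[i];
  }

  /** True once the prefix sum has been computed. */
  boolean materialized() {
    return docIds != null;
  }
}
```

A block that is skipped without ever calling `docId` never pays for the prefix sum, which is the trade-off the question is asking about.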
benwtrent commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1611575334
There are also late-interaction-models that do embeddings per token. While
the current HNSW codec wouldn't be best for that, it is another use case for
multiple embeddings per doc
jpountz commented on code in PR #12381:
URL: https://github.com/apache/lucene/pull/12381#discussion_r1245348154
##
lucene/core/src/java/org/apache/lucene/index/DocsWithFieldSet.java:
##
@@ -75,4 +75,9 @@ public DocIdSetIterator iterator() {
public int cardinality() {
ret
jpountz commented on code in PR #12374:
URL: https://github.com/apache/lucene/pull/12374#discussion_r1245354048
##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -1014,4 +1021,48 @@ private static SliceExecutor
getSliceExecutionControlPlane(Executor exe
ChrisHegarty commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1611648786
I ran @mayya-sharipova's exact same benchmark/test on my machine. Here are
the results.
### Test environment
- Dataset:
- [nq](https://huggingface.co/data
jpountz commented on PR #12194:
URL: https://github.com/apache/lucene/pull/12194#issuecomment-1611650783
> if we were to split the window based on certain size and only call
peekNextNonMatchingDocID when advancing to a new window, I felt it might not be
as effective, since for unsorted inde
uschindler commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1611685208
I have a customer using Solr to do kNN for trademark images. Each trademark
has several images and they want to find the trademark with the closest image
match (cosine distance). They
uschindler commented on issue #12399:
URL: https://github.com/apache/lucene/issues/12399#issuecomment-1611699406
Instead of Valhalla we could also create MemorySegments on heap and create
structs on them and then use VarHandles to access the components.
tang-hi commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611775687
I have successfully implemented all encode methods in `ForUtil` while keeping
the compression format unchanged. Here are the results.
| Benchmark | Mode | Cnt |
zhaih commented on issue #12358:
URL: https://github.com/apache/lucene/issues/12358#issuecomment-1611789072
Maybe we need a `BulkScorable` or something which holds multiple `Scorable`
(or just holds an array of scores) and set the contract that `collect(DocIdSet`
should use `BulkScorable` b
uschindler commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611811043
Hi,
if you look at the first line of `ForUtil.java`, you will see the following
comment:
```java
// This file has been automatically generated, DO NOT EDIT
```
uschindler commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611841983
You can take the current python script as "basis" and work from there.
tang-hi commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611846513
> You can take the current python script as "basis" and work from there.
Great, I will give it a try! 😄
uschindler commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611856694
The above would be a separate PR to clean up the ad hoc internal
implementation of the Panama integration a bit. The implementation developed
here could then be added to the new Luc
rmuir commented on issue #11507:
URL: https://github.com/apache/lucene/issues/11507#issuecomment-1611884547
Can we run this test with lucene's defaults (e.g. not a 2GB rambuffer)?
We are still talking about an hour to index < 3M docs, so I think the
performance is not good.
As i've sa
sohami commented on code in PR #12374:
URL: https://github.com/apache/lucene/pull/12374#discussion_r1245689211
##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -1014,4 +1021,48 @@ private static SliceExecutor
getSliceExecutionControlPlane(Executor exec
benwtrent commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1612016406
So, I have been thinking of the current implementation and was wondering if
we could instead move towards using the `join` functionality?
Just to make sure I am not absolute
benwtrent opened a new issue, #12403:
URL: https://github.com/apache/lucene/issues/12403
### Description
One of the biggest pain points of HNSW is that the graph and vectors must be
in memory.
Since the vectors are stored off heap and read in via byte streams, it seems
like we
rmuir commented on issue #12403:
URL: https://github.com/apache/lucene/issues/12403#issuecomment-1612026748
afaik 16-bit fp support is in newer versions of java (21?) and being worked
on for vector api there too. not sure of its current state.
rmuir commented on issue #12403:
URL: https://github.com/apache/lucene/issues/12403#issuecomment-1612065273
in java 20+ there are at least functions for simple scalar conversions:
https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/Float.html#float16ToFloat(short)
h
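For readers on a JDK older than 20, where `Float.float16ToFloat(short)` is not available, the scalar conversion mentioned above can be approximated by hand. The sketch below is an assumption for illustration, not code from the JDK or from Lucene; `HalfFloatSketch` and `halfToFloat` are hypothetical names, and it decodes an IEEE 754 binary16 bit pattern (1 sign bit, 5 exponent bits, 10 mantissa bits).

```java
/** Hypothetical illustration of a scalar half-float decoder; not a JDK or Lucene API. */
final class HalfFloatSketch {
  static float halfToFloat(short h) {
    int bits = h & 0xFFFF;
    int sign = (bits & 0x8000) << 16; // move the sign to bit 31
    int exp = (bits >>> 10) & 0x1F;   // 5-bit exponent, bias 15
    int mant = bits & 0x3FF;          // 10-bit mantissa

    if (exp == 0x1F) {
      // Infinity or NaN: all-ones exponent in float32, mantissa shifted into place.
      return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
    }
    if (exp == 0) {
      // Zero or subnormal: the value is mant * 2^-24, exactly representable as a float.
      float v = mant * 0x1p-24f;
      return sign != 0 ? -v : v;
    }
    // Normal number: rebias the exponent from 15 to 127 (difference of 112).
    return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
  }
}
```

On Java 20 or later the built-in method named in the comment is the better choice; this version only shows what the conversion involves.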
rmuir commented on issue #12403:
URL: https://github.com/apache/lucene/issues/12403#issuecomment-1612091190
looking at that branch too, the hardware support currently only exists for
x86:
add:
https://github.com/openjdk/panama-vector/blob/vectorIntrinsics%2Bfp16/src/hotspot/cpu/x86/x
sgup432 commented on code in PR #12383:
URL: https://github.com/apache/lucene/pull/12383#discussion_r1245969984
##
lucene/core/src/java/org/apache/lucene/search/TermQuery.java:
##
@@ -72,7 +72,16 @@ public TermWeight(
if (termStats == null) {
this.simScorer = nul
sgup432 commented on code in PR #12383:
URL: https://github.com/apache/lucene/pull/12383#discussion_r1245970089
##
lucene/queries/src/test/org/apache/lucene/queries/function/TestFunctionScoreQuery.java:
##
@@ -322,6 +329,19 @@ private void assertInnerScoreMode(
ScoreMode
sgup432 commented on PR #12383:
URL: https://github.com/apache/lucene/pull/12383#issuecomment-1612290029
@jpountz @msfroh I have addressed comments.
rmuir commented on issue #12399:
URL: https://github.com/apache/lucene/issues/12399#issuecomment-1612499383
> Yeah, some of our custom sorts are because we want to sort one array, but
use the sort key from another parallel array. Unfortunately I don't think (?)
the JDK has existing APIs for
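To make the quoted problem concrete, here is a minimal sketch of sorting one array by keys held in a parallel array using only standard JDK calls. It sorts a boxed index permutation and then applies it, which allocates and boxes; that overhead is presumably part of why custom sorts are used instead. `ParallelSortSketch` and `sortByKeys` are hypothetical names, not Lucene APIs.

```java
import java.util.Arrays;

/** Hypothetical illustration; Lucene's own sorters work differently. */
final class ParallelSortSketch {
  /** Returns a copy of values ordered by the corresponding entries of keys. */
  static int[] sortByKeys(int[] values, long[] keys) {
    // Build the identity permutation 0..n-1 as boxed integers so that
    // Arrays.sort can take a comparator.
    Integer[] order = new Integer[values.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort the indices by the key stored at each index.
    Arrays.sort(order, (a, b) -> Long.compare(keys[a], keys[b]));

    // Apply the permutation to the values array.
    int[] sorted = new int[values.length];
    for (int i = 0; i < sorted.length; i++) {
      sorted[i] = values[order[i]];
    }
    return sorted;
  }
}
```

For example, `sortByKeys(new int[] {10, 20, 30}, new long[] {3, 1, 2})` orders the values by their keys and yields `{20, 30, 10}`.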
uschindler commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1612506692
I would still prefer to have multiple values per document. From the point of
view of implementation this does not look crazy to me, but using blockjoins
adds too many limitations
uschindler commented on code in PR #12314:
URL: https://github.com/apache/lucene/pull/12314#discussion_r1246177620
##
lucene/core/src/java/org/apache/lucene/index/DocsWithVectorsSet.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or