date:20210722

[GitHub] [lucene] jpountz opened a new pull request #223: LUCENE-10015: Make VectorSimilarityFunction a codec parameter.

2021-07-22 Thread GitBox



jpountz opened a new pull request #223:
URL: https://github.com/apache/lucene/pull/223


   VectorSimilarityFunction moves to the `org.apache.lucene.util.hnsw` package.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

2021-07-22 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385377#comment-17385377
 ] 

Adrien Grand commented on LUCENE-10015:
---

Thanks Tomoko. I opened a PR to see how it played out and it looks like making 
the similarity function a codec parameter works rather nicely. We're losing the 
ability to validate that all segments have the same similarity function, but 
it's probably ok as changing the similarity function is a super expert option?

> Remove VectorValues.SimilarityFunction, remove NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

2021-07-22 Thread Julie Tibshirani (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385408#comment-17385408
 ] 

Julie Tibshirani commented on LUCENE-10015:
---

I think it makes sense to keep the ability to configure the similarity function 
at the field level. I don't see it as a very 'expert' option -- based on what 
the vectors represent and how they've been processed, it's necessary to use the 
right similarity function to obtain good results. Also unlike 'maxConn' and 
'beamWidth' (which were specific to our HNSW implementation), it's a concept 
that makes sense across NN algorithms generally. To the best of my knowledge, 
many NN algorithms can handle the full set of common similarity functions 
(Euclidean, dot product, cosine).

In case it's helpful context: currently we only support Euclidean and cosine 
distance, which is technically redundant. For cosine similarity, users could 
normalize the vectors to unit length and use Euclidean. But I'm assuming we'll 
add support for inner product too, which seems very popular and cannot be 
expressed in terms of Euclidean distance. The FAISS library currently supports 
only Euclidean distance and inner product.

> Remove VectorValues.SimilarityFunction, remove NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #222: LUCENE-10032: Remove leafDocMaps from MergeState

2021-07-22 Thread GitBox



jpountz commented on a change in pull request #222:
URL: https://github.com/apache/lucene/pull/222#discussion_r674713006



##
File path: lucene/core/src/java/org/apache/lucene/index/MergeState.java
##
@@ -227,19 +222,7 @@ public int get(int docID) {
   }
 
   private List maybeSortReaders(

Review comment:
   maybe should rename and make it void now that it only seems to perform 
validation?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

2021-07-22 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385463#comment-17385463
 ] 

Adrien Grand commented on LUCENE-10015:
---

If there's agreement we should keep the similarity function, then let's resolve 
this issue as the other thing we discussed, the removal of NONE, has already 
been merged?

> Remove VectorValues.SimilarityFunction, remove NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dnhatn commented on a change in pull request #222: LUCENE-10032: Remove leafDocMaps from MergeState

2021-07-22 Thread GitBox



dnhatn commented on a change in pull request #222:
URL: https://github.com/apache/lucene/pull/222#discussion_r674773365



##
File path: lucene/core/src/java/org/apache/lucene/index/MergeState.java
##
@@ -227,19 +222,7 @@ public int get(int docID) {
   }
 
   private List maybeSortReaders(

Review comment:
   ++, I pushed 
https://github.com/apache/lucene/pull/222/commits/707f4f982fc831f56d51db1288e1ff468025eee0




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

2021-07-22 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385488#comment-17385488
 ] 

Tomoko Uchida commented on LUCENE-10015:


I am fine with the current field API, sorry for the disturbance.
Besides whether the similarity function should be configurable by users, I 
agree with it is surely search-algorithm independent feature - the most ann/knn 
algorithms are distance-measure agnostic in my understanding (maybe Humming 
distance would be also worth supported).

> Remove VectorValues.SimilarityFunction, remove NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

2021-07-22 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385488#comment-17385488
 ] 

Tomoko Uchida edited comment on LUCENE-10015 at 7/22/21, 1:15 PM:
--

I am fine with the current field API, sorry for the disturbance.
Besides whether the similarity function should be configurable by users, I 
agree with it is surely search-algorithm independent feature - most ann/knn 
algorithms are distance-measure agnostic in my understanding (maybe Humming 
distance would be also worth supported).


was (Author: tomoko uchida):
I am fine with the current field API, sorry for the disturbance.
Besides whether the similarity function should be configurable by users, I 
agree with it is surely search-algorithm independent feature - the most ann/knn 
algorithms are distance-measure agnostic in my understanding (maybe Humming 
distance would be also worth supported).

> Remove VectorValues.SimilarityFunction, remove NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

2021-07-22 Thread Michael Sokolov (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385546#comment-17385546
 ] 

Michael Sokolov commented on LUCENE-10015:
--

Yes, let's keep the similarity function! It's an essential component of a 
metric space (or non-metric space), and is implicit in the vectors: you can't 
just use any arbitrary function, as Julie says.

 

It makes my life a little more complicated that we removed NONE since it 
simplifies schema management at the next level. You can define a schema with 
fields of type vector, and then separately say whether you want to support KNN 
search for such a field. We used to support such vectors with BinaryDocValues, 
and then switched to this impl. Now we will have to switch back. But I guess we 
are now saying these fields are *only* to be be used for NN search, so it makes 
sense, and I won't block it.

> Remove VectorValues.SimilarityFunction, remove NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] msokolov commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

2021-07-22 Thread GitBox



msokolov commented on a change in pull request #218:
URL: https://github.com/apache/lucene/pull/218#discussion_r674861123



##
File path: lucene/CHANGES.txt
##
@@ -7,7 +7,7 @@ http://s.apache.org/luceneversions
 
 New Features
 
-* LUCENE-9322: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie 
Tibshirani, Tomoko Uchida)
+* LUCENE-9322 LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, 
Julie Tibshirani, Tomoko Uchida)
 
 * LUCENE-9004: Approximate nearest vector search via NSW graphs

Review comment:
   I think you and I were mostly working on it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

2021-07-22 Thread Adrien Grand (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10015.
---
Resolution: Fixed

> Remove VectorValues.SimilarityFunction, remove NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Updated] (LUCENE-10015) Remove VectorValues.SimilarityFunction.NONE

2021-07-22 Thread Adrien Grand (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-10015:
--
Summary: Remove VectorValues.SimilarityFunction.NONE  (was: Remove 
VectorValues.SimilarityFunction, remove NONE)

> Remove VectorValues.SimilarityFunction.NONE
> ---
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec 
> parameter.
> The NONE option should be removed: it just makes the codec more complex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz closed pull request #223: LUCENE-10015: Make VectorSimilarityFunction a codec parameter.

2021-07-22 Thread GitBox



jpountz closed pull request #223:
URL: https://github.com/apache/lucene/pull/223


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mocobeta commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

2021-07-22 Thread GitBox



mocobeta commented on a change in pull request #218:
URL: https://github.com/apache/lucene/pull/218#discussion_r674908384



##
File path: lucene/CHANGES.txt
##
@@ -7,7 +7,7 @@ http://s.apache.org/luceneversions
 
 New Features
 
-* LUCENE-9322: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie 
Tibshirani, Tomoko Uchida)
+* LUCENE-9322 LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, 
Julie Tibshirani, Tomoko Uchida)
 
 * LUCENE-9004: Approximate nearest vector search via NSW graphs

Review comment:
   Thanks, I'll add the credit on this pr.
   Actually, I cannot identify who is supposed to be listed here since so many 
people were involved (though I know your name should be firstly credited)... 
please let me know if there are other outstanding contributors.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mocobeta commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

2021-07-22 Thread GitBox



mocobeta commented on a change in pull request #218:
URL: https://github.com/apache/lucene/pull/218#discussion_r674908384



##
File path: lucene/CHANGES.txt
##
@@ -7,7 +7,7 @@ http://s.apache.org/luceneversions
 
 New Features
 
-* LUCENE-9322: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie 
Tibshirani, Tomoko Uchida)
+* LUCENE-9322 LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, 
Julie Tibshirani, Tomoko Uchida)
 
 * LUCENE-9004: Approximate nearest vector search via NSW graphs

Review comment:
   Thanks, I'll add the credit on this pr.
   Actually, I cannot identify who are supposed to be listed here since so many 
people were involved (though I know your name should be firstly credited)... 
please let me know if there are other outstanding contributors.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dnhatn commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

2021-07-22 Thread GitBox



dnhatn commented on a change in pull request #221:
URL: https://github.com/apache/lucene/pull/221#discussion_r674914345



##
File path: lucene/core/src/java/org/apache/lucene/index/DocIDMerger.java
##
@@ -144,59 +159,45 @@ protected boolean lessThan(Sub a, Sub b) {
 public void reset() throws IOException {
   // caller may not have fully consumed the queue:
   queue.clear();
+  current = null;
   boolean first = true;
   for (T sub : subs) {
 if (first) {
   // by setting mappedDocID = -1, this entry is guaranteed to be the 
top of the queue
   // so the first call to next() will advance it
   sub.mappedDocID = -1;
+  current = sub;
   first = false;
-} else {
-  int mappedDocID;
-  while (true) {
-int docID = sub.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  mappedDocID = NO_MORE_DOCS;
-  break;
-}
-mappedDocID = sub.docMap.get(docID);
-if (mappedDocID != -1) {
-  break;
-}
-  }
-  if (mappedDocID == NO_MORE_DOCS) {
-// all docs in this sub were deleted; do not add it to the queue!
-continue;
-  }
-  sub.mappedDocID = mappedDocID;
-}
-queue.add(sub);
+} else if (sub.nextMappedDoc() != NO_MORE_DOCS) {
+  queue.add(sub);
+} // else all docs in this sub were deleted; do not add it to the 
queue!
   }
+  queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;
 }
 
 @Override
 public T next() throws IOException {
-  T top = queue.top();
-
-  while (true) {
-int docID = top.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  queue.pop();
-  top = queue.top();
-  break;
-}
-int mappedDocID = top.docMap.get(docID);
-if (mappedDocID == -1) {
-  // doc was deleted
-  continue;
+  int nextDoc = current.nextMappedDoc();
+  // the below condition is unlikely when the index sort is on a 
low-cardinality field
+  if (nextDoc >= queueNextMappedDoc) {
+if (nextDoc == NO_MORE_DOCS) {
+  if (queue.size() == 0) {
+current = null;
+return null;
+  } else {
+current = queue.pop();
+queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;

Review comment:
   I think we can remove `queueNextMappedDoc` and replace it with 
`queue.top().mappedDocID`? Below is my version without `queueNextMappedDoc`. 
WDYT?
   
   ```java
   int nextDoc = current.nextMappedDoc();
   // the below condition is unlikely when the index sort is on a 
low-cardinality field
   if (nextDoc == NO_MORE_DOCS) {
 if (queue.size() == 0) {
   return null;
 }
 current = queue.pop();
   } else if (queue.size() > 0 && nextDoc > queue.top().mappedDocID) {
 T newCurrent = queue.top();
 queue.updateTop(current);
 current = newCurrent;
   }
   return current;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dnhatn commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

2021-07-22 Thread GitBox



dnhatn commented on a change in pull request #221:
URL: https://github.com/apache/lucene/pull/221#discussion_r674914345



##
File path: lucene/core/src/java/org/apache/lucene/index/DocIDMerger.java
##
@@ -144,59 +159,45 @@ protected boolean lessThan(Sub a, Sub b) {
 public void reset() throws IOException {
   // caller may not have fully consumed the queue:
   queue.clear();
+  current = null;
   boolean first = true;
   for (T sub : subs) {
 if (first) {
   // by setting mappedDocID = -1, this entry is guaranteed to be the 
top of the queue
   // so the first call to next() will advance it
   sub.mappedDocID = -1;
+  current = sub;
   first = false;
-} else {
-  int mappedDocID;
-  while (true) {
-int docID = sub.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  mappedDocID = NO_MORE_DOCS;
-  break;
-}
-mappedDocID = sub.docMap.get(docID);
-if (mappedDocID != -1) {
-  break;
-}
-  }
-  if (mappedDocID == NO_MORE_DOCS) {
-// all docs in this sub were deleted; do not add it to the queue!
-continue;
-  }
-  sub.mappedDocID = mappedDocID;
-}
-queue.add(sub);
+} else if (sub.nextMappedDoc() != NO_MORE_DOCS) {
+  queue.add(sub);
+} // else all docs in this sub were deleted; do not add it to the 
queue!
   }
+  queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;
 }
 
 @Override
 public T next() throws IOException {
-  T top = queue.top();
-
-  while (true) {
-int docID = top.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  queue.pop();
-  top = queue.top();
-  break;
-}
-int mappedDocID = top.docMap.get(docID);
-if (mappedDocID == -1) {
-  // doc was deleted
-  continue;
+  int nextDoc = current.nextMappedDoc();
+  // the below condition is unlikely when the index sort is on a 
low-cardinality field
+  if (nextDoc >= queueNextMappedDoc) {
+if (nextDoc == NO_MORE_DOCS) {
+  if (queue.size() == 0) {
+current = null;
+return null;
+  } else {
+current = queue.pop();
+queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;

Review comment:
   I think we can remove `queueNextMappedDoc` and replace it with 
`queue.top().mappedDocID`? Below is my version without `queueNextMappedDoc`. 
WDYT?
   
   ```java
   int nextDoc = current.nextMappedDoc();
   // the below condition is unlikely when the index sort is on a 
low-cardinality field
   if (nextDoc == NO_MORE_DOCS) {
 current = queue.pop();
   } else if (queue.size() > 0 && nextDoc > queue.top().mappedDocID) {
 T newCurrent = queue.top();
 queue.updateTop(current);
 current = newCurrent;
   }
   return current;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields

2021-07-22 Thread Michael McCandless (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385612#comment-17385612
 ] 

Michael McCandless commented on LUCENE-9450:


OK, +1 for Scenario 2 (use index created version)!

> Taxonomy index should use DocValues not StoredFields
> 
>
> Key: LUCENE-9450
> URL: https://issues.apache.org/jira/browse/LUCENE-9450
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.5.2
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
> Fix For: main (9.0)
>
> Attachments: LUCENE-9450-localrun.py-v1, wip_taxonomy_patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The taxonomy index that maps binning labels to ordinals was created before 
> Lucene added BinaryDocValues.
> I've attached a WIP patch (does not pass tests currently)
> Issue suggested by [~mikemccand]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] cpoerschke merged pull request #2514: Fix some psuedo->pseudo typos in Solr Ref Guide.

2021-07-22 Thread GitBox



cpoerschke merged pull request #2514:
URL: https://github.com/apache/lucene-solr/pull/2514


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

2021-07-22 Thread GitBox



jpountz commented on a change in pull request #221:
URL: https://github.com/apache/lucene/pull/221#discussion_r674980048



##
File path: lucene/core/src/java/org/apache/lucene/index/DocIDMerger.java
##
@@ -144,59 +159,45 @@ protected boolean lessThan(Sub a, Sub b) {
 public void reset() throws IOException {
   // caller may not have fully consumed the queue:
   queue.clear();
+  current = null;
   boolean first = true;
   for (T sub : subs) {
 if (first) {
   // by setting mappedDocID = -1, this entry is guaranteed to be the 
top of the queue
   // so the first call to next() will advance it
   sub.mappedDocID = -1;
+  current = sub;
   first = false;
-} else {
-  int mappedDocID;
-  while (true) {
-int docID = sub.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  mappedDocID = NO_MORE_DOCS;
-  break;
-}
-mappedDocID = sub.docMap.get(docID);
-if (mappedDocID != -1) {
-  break;
-}
-  }
-  if (mappedDocID == NO_MORE_DOCS) {
-// all docs in this sub were deleted; do not add it to the queue!
-continue;
-  }
-  sub.mappedDocID = mappedDocID;
-}
-queue.add(sub);
+} else if (sub.nextMappedDoc() != NO_MORE_DOCS) {
+  queue.add(sub);
+} // else all docs in this sub were deleted; do not add it to the 
queue!
   }
+  queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;
 }
 
 @Override
 public T next() throws IOException {
-  T top = queue.top();
-
-  while (true) {
-int docID = top.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  queue.pop();
-  top = queue.top();
-  break;
-}
-int mappedDocID = top.docMap.get(docID);
-if (mappedDocID == -1) {
-  // doc was deleted
-  continue;
+  int nextDoc = current.nextMappedDoc();
+  // the below condition is unlikely when the index sort is on a 
low-cardinality field
+  if (nextDoc >= queueNextMappedDoc) {
+if (nextDoc == NO_MORE_DOCS) {
+  if (queue.size() == 0) {
+current = null;
+return null;
+  } else {
+current = queue.pop();
+queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;

Review comment:
   Thanks. I had to add a check whether the queue is empty under `if 
(nextDoc == NO_MORE_DOCS) {` for it to work.
   
   I ran a quick benchmark with a merge that merges 10M docs that have 10 
single-valued SORTED_SET doc-value fields each. Without index sorting, the 
merge runs in ~740ms. With index sorting on a NumericDocValues field of 
cardinality 8, I get the following merging times:
- master: ~1300ms
- the first version of this PR: ~860ms,
- your suggestion: ~910ms
   
   I think the extra simplicity is worth the minor slowdown, it's still 
significantly faster than master.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dnhatn commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

2021-07-22 Thread GitBox



dnhatn commented on a change in pull request #221:
URL: https://github.com/apache/lucene/pull/221#discussion_r674984362



##
File path: lucene/core/src/java/org/apache/lucene/index/DocIDMerger.java
##
@@ -144,59 +159,45 @@ protected boolean lessThan(Sub a, Sub b) {
 public void reset() throws IOException {
   // caller may not have fully consumed the queue:
   queue.clear();
+  current = null;
   boolean first = true;
   for (T sub : subs) {
 if (first) {
   // by setting mappedDocID = -1, this entry is guaranteed to be the 
top of the queue
   // so the first call to next() will advance it
   sub.mappedDocID = -1;
+  current = sub;
   first = false;
-} else {
-  int mappedDocID;
-  while (true) {
-int docID = sub.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  mappedDocID = NO_MORE_DOCS;
-  break;
-}
-mappedDocID = sub.docMap.get(docID);
-if (mappedDocID != -1) {
-  break;
-}
-  }
-  if (mappedDocID == NO_MORE_DOCS) {
-// all docs in this sub were deleted; do not add it to the queue!
-continue;
-  }
-  sub.mappedDocID = mappedDocID;
-}
-queue.add(sub);
+} else if (sub.nextMappedDoc() != NO_MORE_DOCS) {
+  queue.add(sub);
+} // else all docs in this sub were deleted; do not add it to the 
queue!
   }
+  queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;
 }
 
 @Override
 public T next() throws IOException {
-  T top = queue.top();
-
-  while (true) {
-int docID = top.nextDoc();
-if (docID == NO_MORE_DOCS) {
-  queue.pop();
-  top = queue.top();
-  break;
-}
-int mappedDocID = top.docMap.get(docID);
-if (mappedDocID == -1) {
-  // doc was deleted
-  continue;
+  int nextDoc = current.nextMappedDoc();
+  // the below condition is unlikely when the index sort is on a 
low-cardinality field
+  if (nextDoc >= queueNextMappedDoc) {
+if (nextDoc == NO_MORE_DOCS) {
+  if (queue.size() == 0) {
+current = null;
+return null;
+  } else {
+current = queue.pop();
+queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : 
queue.top().mappedDocID;

Review comment:
   @jpountz Thanks for benchmarking this. I am fine with either versions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Created] (LUCENE-10033) Encode doc values in smaller blocks of values, like postings

2021-07-22 Thread Adrien Grand (Jira)

Adrien Grand created LUCENE-10033:
-

 Summary: Encode doc values in smaller blocks of values, like 
postings
 Key: LUCENE-10033
 URL: https://issues.apache.org/jira/browse/LUCENE-10033
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


This is a follow-up to the discussion on this thread: 
https://lists.apache.org/thread.html/r7b757074d5f02874ce3a295b0007dff486bc10d08fb0b5e5a4ba72c5%40%3Cdev.lucene.apache.org%3E.

Our current approach for doc values uses large blocks of 16k values where 
values can be decompressed independently, using DirectWriter/DirectReader. This 
is a bit inefficient in some cases, e.g. a single outlier can grow the number 
of bits per value for the entire block, we can't easily use run-length 
compression, etc. Plus, it encourages using a different sub-class for every 
compression technique, which puts pressure on the JVM.

We'd like to move to an approach that would be more similar to postings with 
smaller blocks (e.g. 128 values) whose values get all decompressed at once 
(using SIMD instructions), with skip data within blocks in order to efficiently 
skip to arbitrary doc IDs (or maybe still use jump tables as today's doc 
values, and as discussed here for postings: 
https://lists.apache.org/thread.html/r7c3cb7ab143fd4ecbc05c04064d10ef9fb50c5b4d6479b0f35732677%40%3Cdev.lucene.apache.org%3E).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

2021-07-22 Thread GitBox



gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r675059243



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, 
LeafCollector collector, DocsA
 
   collectDocID = docID;

Review comment:
   I was thinking about this a little more and wanted to see what you 
thought about a slightly different approach. Full disclosure, I’ve been away 
from my computer for a day and am on my phone right now, so I haven’t looked 
into this as deeply as I’d like. 
   
   It looks to me, when doing “query first scoring”, that the ScoreAndDoc 
wrapper is essentially useless. The baseScorer will be positioned on the 
“collectDoc” whenever it collects a full hit or near miss. Can’t we just pass 
baseScorer into the collectors (via setScorer) when doing queryFirstScoring and 
avoid using a ScoreAndDoc instance completely? That would also allow the score 
to be lazily evaluated since the collectors would just be calling into 
baseScorer directly if they need the score. If I remember correctly, there’s a 
caching scorer wrapper we could use here as well to ensure the score is only 
computed once in the case that multiple drill sideways collectors call back for 
the score. 
   
   Like I said, I’m on my phone and haven’t been able to try this out at all, 
so maybe it’s completely flawed for some reason. But maybe you could have a 
look and see if something along these lines would work?
   
   Thanks again for taking this on!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] msokolov commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

2021-07-22 Thread GitBox



msokolov commented on a change in pull request #218:
URL: https://github.com/apache/lucene/pull/218#discussion_r675162465



##
File path: lucene/CHANGES.txt
##
@@ -7,7 +7,7 @@ http://s.apache.org/luceneversions
 
 New Features
 
-* LUCENE-9322: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie 
Tibshirani, Tomoko Uchida)
+* LUCENE-9322 LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, 
Julie Tibshirani, Tomoko Uchida)
 
 * LUCENE-9004: Approximate nearest vector search via NSW graphs

Review comment:
   Yeah I looked at the discussion on the issue, and there were lots of 
folks weighing in there with helpful contributions, but given there were so 
many, and as far as I can remember only the two of us actually contributed any 
code, I'd just list us.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mocobeta commented on pull request #218: LUCENE-9855: Rename knn search vector format

2021-07-22 Thread GitBox



mocobeta commented on pull request #218:
URL: https://github.com/apache/lucene/pull/218#issuecomment-885343046


   Thanks, @msokolov for your comment.
   I'll wait for feedback from others (if there are further suggestions) for a 
while before merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller merged pull request #196: LUCENE-10000: Make MultiCollectorManager consistent with MultiCollector

2021-07-22 Thread GitBox



gsmiller merged pull request #196:
URL: https://github.com/apache/lucene/pull/196


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10000) MultiCollectorManager should have parity with MultiCollector behavior

2021-07-22 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385845#comment-17385845
 ] 

ASF subversion and git services commented on LUCENE-1:
--

Commit ad7746d6e31b698dd0f7d124288de576b4f95686 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ad7746d ]

LUCENE-1: Make MultiCollectorManager consistent with MultiCollector (#196)

MultiCollectorManager is now consistent with MultiCollector with respect to
early termination, min score setting and score caching.

> MultiCollectorManager should have parity with MultiCollector behavior
> -
>
> Key: LUCENE-1
> URL: https://issues.apache.org/jira/browse/LUCENE-1
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The behavior of {{MultiCollectorManager}} and {{MultiCollector}} differ with 
> respect to how they handle {{CollectionTerminationException}} and 
> minCompetitiveScore calls.
> In {{MultiCollector}}, remaining collectors will continue collecting when one 
> or more have thrown a {{CollectionTerminationException}}, but 
> {{MultiCollectorManager}} will allow the exception to propagate and stop 
> collecting against all collectors as soon as one throws (see: LUCENE-6772).
> Also, {{MultiCollector}} properly handles setting of min competitive scores 
> by ensuring the lowest set across all wrapped collectors is used, or by 
> making it a no-op in cases where at least one collector is using 
> {{ScoreMode.COMPLETE}} (see: LUCENE-9402). {{MultiCollectorManager}} will 
> share the same {{Scorable}} across all collectors, which seems like the wrong 
> thing to do.
> Finally, {{MultiCollector}} will cache scores when a {{Scorable}} is shared 
> by more than one collector (see: LUCENE-6263).
> We should update {{MultiCollectorManager}} to use common behavior with 
> {{MultiCollector}} in all these areas. It should be easy to do by delegating 
> to {{MultiCollector}} within {{MultiCollectorManager}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller commented on pull request #196: LUCENE-10000: Make MultiCollectorManager consistent with MultiCollector

2021-07-22 Thread GitBox



gsmiller commented on pull request #196:
URL: https://github.com/apache/lucene/pull/196#issuecomment-885353122


   > Thanks for fixing this!
   
   Sure. Thanks for the feedback!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Updated] (LUCENE-10000) MultiCollectorManager should have parity with MultiCollector behavior

2021-07-22 Thread Greg Miller (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller updated LUCENE-1:
-
Fix Version/s: main (9.0)

> MultiCollectorManager should have parity with MultiCollector behavior
> -
>
> Key: LUCENE-1
> URL: https://issues.apache.org/jira/browse/LUCENE-1
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The behavior of {{MultiCollectorManager}} and {{MultiCollector}} differ with 
> respect to how they handle {{CollectionTerminationException}} and 
> minCompetitiveScore calls.
> In {{MultiCollector}}, remaining collectors will continue collecting when one 
> or more have thrown a {{CollectionTerminationException}}, but 
> {{MultiCollectorManager}} will allow the exception to propagate and stop 
> collecting against all collectors as soon as one throws (see: LUCENE-6772).
> Also, {{MultiCollector}} properly handles setting of min competitive scores 
> by ensuring the lowest set across all wrapped collectors is used, or by 
> making it a no-op in cases where at least one collector is using 
> {{ScoreMode.COMPLETE}} (see: LUCENE-9402). {{MultiCollectorManager}} will 
> share the same {{Scorable}} across all collectors, which seems like the wrong 
> thing to do.
> Finally, {{MultiCollector}} will cache scores when a {{Scorable}} is shared 
> by more than one collector (see: LUCENE-6263).
> We should update {{MultiCollectorManager}} to use common behavior with 
> {{MultiCollector}} in all these areas. It should be easy to do by delegating 
> to {{MultiCollector}} within {{MultiCollectorManager}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-10000) MultiCollectorManager should have parity with MultiCollector behavior

2021-07-22 Thread Greg Miller (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller resolved LUCENE-1.
--
Resolution: Fixed

> MultiCollectorManager should have parity with MultiCollector behavior
> -
>
> Key: LUCENE-1
> URL: https://issues.apache.org/jira/browse/LUCENE-1
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The behavior of {{MultiCollectorManager}} and {{MultiCollector}} differ with 
> respect to how they handle {{CollectionTerminationException}} and 
> minCompetitiveScore calls.
> In {{MultiCollector}}, remaining collectors will continue collecting when one 
> or more have thrown a {{CollectionTerminationException}}, but 
> {{MultiCollectorManager}} will allow the exception to propagate and stop 
> collecting against all collectors as soon as one throws (see: LUCENE-6772).
> Also, {{MultiCollector}} properly handles setting of min competitive scores 
> by ensuring the lowest set across all wrapped collectors is used, or by 
> making it a no-op in cases where at least one collector is using 
> {{ScoreMode.COMPLETE}} (see: LUCENE-9402). {{MultiCollectorManager}} will 
> share the same {{Scorable}} across all collectors, which seems like the wrong 
> thing to do.
> Finally, {{MultiCollector}} will cache scores when a {{Scorable}} is shared 
> by more than one collector (see: LUCENE-6263).
> We should update {{MultiCollectorManager}} to use common behavior with 
> {{MultiCollector}} in all these areas. It should be easy to do by delegating 
> to {{MultiCollector}} within {{MultiCollectorManager}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz opened a new pull request #223: LUCENE-10015: Make VectorSimilarityFunction a codec parameter.

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

[GitHub] [lucene] jpountz commented on a change in pull request #222: LUCENE-10032: Remove leafDocMaps from MergeState

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

[GitHub] [lucene] dnhatn commented on a change in pull request #222: LUCENE-10032: Remove leafDocMaps from MergeState

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

[jira] [Comment Edited] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

[jira] [Commented] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

[GitHub] [lucene] msokolov commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

[jira] [Resolved] (LUCENE-10015) Remove VectorValues.SimilarityFunction, remove NONE

[jira] [Updated] (LUCENE-10015) Remove VectorValues.SimilarityFunction.NONE

[GitHub] [lucene] jpountz closed pull request #223: LUCENE-10015: Make VectorSimilarityFunction a codec parameter.

[GitHub] [lucene] mocobeta commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

[GitHub] [lucene] mocobeta commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

[GitHub] [lucene] dnhatn commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

[GitHub] [lucene] dnhatn commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

[jira] [Commented] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields

[GitHub] [lucene-solr] cpoerschke merged pull request #2514: Fix some psuedo->pseudo typos in Solr Ref Guide.

[GitHub] [lucene] jpountz commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

[GitHub] [lucene] dnhatn commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

[jira] [Created] (LUCENE-10033) Encode doc values in smaller blocks of values, like postings

[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

[GitHub] [lucene] msokolov commented on a change in pull request #218: LUCENE-9855: Rename knn search vector format

[GitHub] [lucene] mocobeta commented on pull request #218: LUCENE-9855: Rename knn search vector format

[GitHub] [lucene] gsmiller merged pull request #196: LUCENE-10000: Make MultiCollectorManager consistent with MultiCollector

[jira] [Commented] (LUCENE-10000) MultiCollectorManager should have parity with MultiCollector behavior

[GitHub] [lucene] gsmiller commented on pull request #196: LUCENE-10000: Make MultiCollectorManager consistent with MultiCollector

[jira] [Updated] (LUCENE-10000) MultiCollectorManager should have parity with MultiCollector behavior

[jira] [Resolved] (LUCENE-10000) MultiCollectorManager should have parity with MultiCollector behavior

30 matches

Site Navigation

Mail list logo

Footer information