[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub


jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590631585

   We just upgraded Elasticsearch to a Lucene snapshot that has this change, 
and this triggered major speedups on some queries. In my opinion, the PR title 
and description don't do justice to this change, since it helps not only 
when `after` is out of range but also when `after` is within the range but 
filtering on the `after` value alone significantly reduces the number of 
hits to evaluate.
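
   A minimal sketch of the idea, in plain Java rather than Lucene's actual 
collector code. The class and method names are hypothetical, and each inner 
array stands in for one segment. The point it illustrates: a hit is competitive 
only if it sorts strictly after `after`, and that check applies even while the 
top-N queue is not yet full.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration; not Lucene's PagingFieldCollector. */
final class SearchAfterSketch {

  /** Number of values that reached the queue logic in the last call (for illustration). */
  static int evaluated;

  /** Returns the n smallest values strictly greater than {@code after}, ascending. */
  static List<Long> nextPage(long[][] segments, long after, int n) {
    evaluated = 0;
    // Max-heap of the current best n hits; the head is the worst of them.
    PriorityQueue<Long> queue = new PriorityQueue<>((a, b) -> Long.compare(b, a));
    for (long[] segment : segments) {
      // Stand-in for per-segment max metadata; a real index would not scan for it.
      long max = Long.MIN_VALUE;
      for (long v : segment) {
        max = Math.max(max, v);
      }
      if (max <= after) {
        continue; // whole segment is at or before `after`: skip it, full queue or not
      }
      for (long v : segment) {
        if (v <= after) {
          continue; // filtered by `after` alone, regardless of queue size
        }
        evaluated++;
        if (queue.size() < n) {
          queue.add(v);
        } else if (v < queue.peek()) {
          queue.poll();
          queue.add(v);
        }
      }
    }
    List<Long> page = new ArrayList<>(queue);
    Collections.sort(page);
    return page;
  }
}
```

   In this model, a call such as `nextPage(segments, 5L, 2)` never touches the 
queue for values at or below 5, which is where the savings come from when 
`after` excludes most hits.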


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gashutos commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub


gashutos commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590638213

   > We just upgraded Elasticsearch to a Lucene snapshot that has this change, 
and this triggered major speedups on some queries. In my opinion, the PR title 
and description don't do justice to this change, since it helps not only 
when `after` is out of range but also when `after` is within the range but 
filtering on the `after` value alone significantly reduces the number of 
hits to evaluate.
   
   @jpountz Agreed!
   
   





[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub


jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590663201

   @gashutos I think we should make users aware of this optimization. Would you 
be up for opening another PR that adds a CHANGES entry?





[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub


jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590666069

   Let's also update the title/description of this PR?





[GitHub] [lucene] gashutos opened a new pull request, #12367: Add CHANGES.txt for #12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub


gashutos opened a new pull request, #12367:
URL: https://github.com/apache/lucene/pull/12367

   ### Description
   
   Adding a CHANGES.txt entry to the Improvements section.
   





[GitHub] [lucene] gashutos closed pull request #12367: Add CHANGES.txt for #12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub


gashutos closed pull request #12367: Add CHANGES.txt for #12334 Honor after 
value for skipping documents even if queue is not full for PagingFieldCollector
URL: https://github.com/apache/lucene/pull/12367





[GitHub] [lucene] gashutos opened a new pull request, #12368: Add CHANGES.txt for #12334 Honor after value for skipping documents e…

2023-06-14 Thread via GitHub


gashutos opened a new pull request, #12368:
URL: https://github.com/apache/lucene/pull/12368

   
   Adding a CHANGES.txt entry for #12334.





[GitHub] [lucene] gashutos commented on pull request #12334: Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub


gashutos commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590701060

   Sure, changed the title/description. LMK if it looks good.
   CHANGES.txt PR https://github.com/apache/lucene/pull/12368





[GitHub] [lucene] javanna commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub


javanna commented on issue #12347:
URL: https://github.com/apache/lucene/issues/12347#issuecomment-1590702834

   Heya @sohami, thanks a lot for sharing more context. 
   
   > With custom slice computation to control the max slices per request/index 
the limiting factor in SliceExecutor will not be needed.
   
   Good point, agreed. Also, QueueSizeBasedExecutor is quite opinionated and 
non-configurable, and it gets applied based on an instanceof check on the 
provided executor, which is not fantastic. 
   
   Another thought on my end: executing sometimes on the caller thread, and 
sometimes on the executor makes things hard to reason about: how do you size 
the two thread pools if you can't easily tell what load they are subjected to? 
   
   Instead of making the slice executor configurable then, I would consider 
removing it entirely and forcing the collection to always happen on the 
separate thread pool. I think we'll need to figure out how to handle rejections 
from the executor thread pool, as today the collection happens on the caller 
thread whenever there's a rejection, which I don't think is a behaviour we want 
to keep. We could also leave this to the executor implementation that is 
provided.
   
   I believe that the QueueSizeBasedExecutor was contributed by OpenSearch: 
would the approach suggested above be feasible for you folks? I am thinking it 
would simplify things and provide a better user experience for Lucene users.
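
   A rough sketch of what "always collect on the separate thread pool" could 
look like, using only `java.util.concurrent`. The class and method names are 
hypothetical and this is not Lucene's SliceExecutor or IndexSearcher code. It 
assumes one possible way to handle rejections: cancel what was already 
submitted and let the rejection propagate, rather than falling back to the 
caller thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical illustration; not Lucene's SliceExecutor or IndexSearcher. */
final class AlwaysOffloadSketch {

  /**
   * Submits every slice task to the executor and never runs one on the caller thread.
   * A rejection cancels the tasks already submitted and propagates to the caller.
   */
  static <T> List<T> runSlices(ExecutorService executor, List<Callable<T>> sliceTasks) {
    List<Future<T>> futures = new ArrayList<>();
    try {
      for (Callable<T> task : sliceTasks) {
        futures.add(executor.submit(task));
      }
    } catch (RejectedExecutionException e) {
      for (Future<T> f : futures) {
        f.cancel(true);
      }
      throw e; // the provided executor's policy decides; no caller-thread fallback
    }
    List<T> results = new ArrayList<>();
    for (Future<T> f : futures) {
      try {
        results.add(f.get()); // the caller thread only waits, it does no collection work
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    return results;
  }
}
```

   With this shape the caller thread never collects, so the load on the executor 
pool depends only on the slices submitted to it, which makes the pool easier to 
size.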





[GitHub] [lucene] jpountz commented on pull request #12334: Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub


jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590703139

   Looks great, thanks!





[GitHub] [lucene] jpountz merged pull request #12368: Add CHANGES.txt for #12334 Honor after value for skipping documents e…

2023-06-14 Thread via GitHub


jpountz merged PR #12368:
URL: https://github.com/apache/lucene/pull/12368





[GitHub] [lucene] LuXugang commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub


LuXugang commented on PR #12349:
URL: https://github.com/apache/lucene/pull/12349#issuecomment-1590716969

   ```java
   public void test111() throws IOException {
     Directory dir = newDirectory();
     IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
     RandomIndexWriter indexWriter = new RandomIndexWriter(random(), dir, iwc);
     Document doc;
     Random random = new Random();
     int count = 0;
     while (count++ < 10) {
       doc = new Document();
       doc.add(new SortedSetDocValuesField("sortedSet", new BytesRef("a")));
       doc.add(new StringField("name", String.valueOf(random.nextInt(100)), StringField.Store.YES));
       indexWriter.addDocument(doc);
     }
     indexWriter.commit();

     IndexReader reader = indexWriter.getReader();
     IndexSearcher searcher = newSearcher(reader);

     assert reader.maxDoc() == 10;
     Query query = new MatchAllDocsQuery();
     // Sort on a field that does not exist in the index.
     Sort sort = new Sort(new SortedSetSortField("field no exist ", false));

     TopDocs noSearchField = searcher.search(query, 2000);
     assert noSearchField.totalHits.value == 2001;

     TopDocs hasSearchField = searcher.search(query, 2000, sort);
     // If the sort field does not exist, should collection terminate early
     // once the top 2000 hits are collected?
     assert hasSearchField.totalHits.value == 10;

     indexWriter.close();
     reader.close();
     dir.close();
   }
   ```
   
   If the sort field does not exist, should we terminate collection early 
once the top N hits are collected? 
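
   The question can be modeled in a few lines of plain Java. This is only a 
sketch under stated assumptions, not Lucene's TermOrdValComparator or collector 
code, and the names are hypothetical. It assumes that when no document has a 
value for the sort field, all hits tie on the sort key and ties go to the lower 
doc ID, so nothing after the first N hits can be competitive.

```java
/** Hypothetical illustration; not Lucene's TermOrdValComparator or collector code. */
final class MissingSortFieldSketch {

  /**
   * Simulates collecting the top n of maxDoc matching docs when every doc ties on the
   * sort key. Returns how many docs were visited before collection stopped.
   */
  static int visitedDocs(int maxDoc, int n, boolean terminateEarly) {
    int collected = 0;
    int visited = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      visited++;
      if (collected < n) {
        collected++; // queue not full yet: every hit is competitive
      } else if (terminateEarly) {
        break; // first non-competitive hit: all later docs are non-competitive too
      }
    }
    return visited;
  }
}
```

   In this model, early termination stops at the first hit seen after the queue 
is full, while without it every matching doc is visited.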
   





[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub


alessandrobenedetti commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1229310619


##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ConstKnnFloatValueSource.java:
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.util.VectorUtil;
+
+/** Function that returns a constant float vector value for every document. */
+public class ConstKnnFloatValueSource extends ValueSource {
+  private final float[] vector;
+
+  public ConstKnnFloatValueSource(float[] constVector) {
+    this.vector = VectorUtil.checkFinite(Objects.requireNonNull(constVector, "constVector"));

Review Comment:
   "constVector" -> maybe a better message?
   "the input constant vector is null" for example?
   I struggled to read this code, thinking it was a reference to some 
variable/constant, but it was just a message (this also applies to the other 
non-null check).
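
   For reference, a small sketch of the two message styles, using only 
`java.util.Objects`. The class and helper names are hypothetical and exist only 
to make the comparison runnable.

```java
import java.util.Objects;

/** Hypothetical illustration of the two message styles under discussion. */
final class NullCheckMessageSketch {

  /** Style 1: the message is just the parameter name, as in the PR. */
  static float[] checkWithName(float[] constVector) {
    return Objects.requireNonNull(constVector, "constVector");
  }

  /** Style 2: the message is a descriptive phrase, as suggested in this review. */
  static float[] checkWithSentence(float[] constVector) {
    return Objects.requireNonNull(constVector, "the input constant vector is null");
  }

  /** Returns the NullPointerException message produced for a null argument. */
  static String messageFor(boolean sentenceStyle) {
    try {
      if (sentenceStyle) {
        checkWithSentence(null);
      } else {
        checkWithName(null);
      }
      return null; // not reached: both checks throw for null input
    } catch (NullPointerException e) {
      return e.getMessage();
    }
  }
}
```

   Either way the second argument only becomes the exception message; it is 
never evaluated as a reference to a variable.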






[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub


uschindler commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1229323827


##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ConstKnnFloatValueSource.java:
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.util.VectorUtil;
+
+/** Function that returns a constant float vector value for every document. */
+public class ConstKnnFloatValueSource extends ValueSource {
+  private final float[] vector;
+
+  public ConstKnnFloatValueSource(float[] constVector) {
+    this.vector = VectorUtil.checkFinite(Objects.requireNonNull(constVector, "constVector"));

Review Comment:
   Actually, there are inconsistencies in how it is used. For parameter checks, 
I tend to use just the variable name of the parameter.
   I have no strong preference. It is also not harmonized in Lucene. The JDK 
uses the "variable name" approach.






[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub


uschindler commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1229323827


##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ConstKnnFloatValueSource.java:
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.util.VectorUtil;
+
+/** Function that returns a constant float vector value for every document. */
+public class ConstKnnFloatValueSource extends ValueSource {
+  private final float[] vector;
+
+  public ConstKnnFloatValueSource(float[] constVector) {
+    this.vector = VectorUtil.checkFinite(Objects.requireNonNull(constVector, "constVector"));

Review Comment:
   Actually, there are inconsistencies in how it is used. For parameter checks, 
I tend to use just the variable name of the parameter.
   I have no strong preference. It is also not harmonized in Lucene. The JDK 
also uses both approaches...






[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub


uschindler commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1229329251


##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ConstKnnFloatValueSource.java:
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.util.VectorUtil;
+
+/** Function that returns a constant float vector value for every document. */
+public class ConstKnnFloatValueSource extends ValueSource {
+  private final float[] vector;
+
+  public ConstKnnFloatValueSource(float[] constVector) {
+    this.vector = VectorUtil.checkFinite(Objects.requireNonNull(constVector, "constVector"));

Review Comment:
   - https://github.com/openjdk/jdk/blob/bd79db3930f192f6742e29a63a6d1c3bc3dd3385/src/java.base/share/classes/java/nio/channels/Channels.java#L87
   - https://github.com/openjdk/jdk/blob/bd79db3930f192f6742e29a63a6d1c3bc3dd3385/src/java.base/share/classes/java/util/StringJoiner.java#L126






[GitHub] [lucene] jpountz commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub


jpountz commented on PR #12349:
URL: https://github.com/apache/lucene/pull/12349#issuecomment-1590865303

   I agree that we should fix this comparator so that the last call to 
`IndexSearcher.search` in your test only collects 2000 hits. This doesn't seem 
to be what your PR does though?





[GitHub] [lucene] jpountz merged pull request #12366: Move TermAndBoost back to its original location.

2023-06-14 Thread via GitHub


jpountz merged PR #12366:
URL: https://github.com/apache/lucene/pull/12366





[GitHub] [lucene] javanna opened a new pull request, #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub


javanna opened a new pull request, #12369:
URL: https://github.com/apache/lucene/pull/12369

   We have recently increased the likelihood of leveraging inter-segment search 
concurrency in tests when newSearcher is used to create the index searcher (see 
#959). When parallel execution is enabled though, an executor is only set 50% 
of the time, and parallel execution is dependent on the number of documents 
and segments indexed. That means that out of 1000 test runs that use 
RandomIndexWriter to index a random number of docs up to 1000, we will 
effectively parallelize only a couple of times.
   
   This commit increases the likelihood of running concurrent searches by 
further lowering the slice thresholds and setting the executor frequently 
instead of 50% of the time.





[GitHub] [lucene] javanna commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub


javanna commented on code in PR #12369:
URL: https://github.com/apache/lucene/pull/12369#discussion_r1229356704


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -1965,9 +1966,9 @@ public static IndexSearcher newSearcher(
 .addClosedListener(cacheKey -> TestUtil.shutdownExecutorService(ex));
   }
   IndexSearcher ret;
+  int maxDocPerSlice = random.nextBoolean() ? 1 : 1 + random.nextInt(1000);
+  int maxSegmentsPerSlice = random.nextBoolean() ? 1 : 1 + random.nextInt(10);

Review Comment:
   This may be too aggressive, as we may end up with way too many slices 
depending on how many docs and segments tests have. An alternative would be to 
have a different value distribution that is closer to the lower bound of the 
range. Another option could be to make this configurable so that tests that 
want a behaviour that is closer to production can override it?






[GitHub] [lucene] javanna commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub


javanna commented on code in PR #12369:
URL: https://github.com/apache/lucene/pull/12369#discussion_r1229359014


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -1965,9 +1966,9 @@ public static IndexSearcher newSearcher(
 .addClosedListener(cacheKey -> TestUtil.shutdownExecutorService(ex));
   }
   IndexSearcher ret;
+  int maxDocPerSlice = random.nextBoolean() ? 1 : 1 + random.nextInt(1000);
+  int maxSegmentsPerSlice = random.nextBoolean() ? 1 : 1 + random.nextInt(10);

Review Comment:
   I do think that when `useThreads` is true, we should do our best to leverage 
concurrency in at least half of the runs, rather than 0.2% of the runs. Being this 
dependent on the number of docs and segments makes it particularly challenging 
to come up with a good default value. Possibly the proposed behaviour is good 
for tests that index a small number of docs, which is the majority of the Lucene 
tests?






[GitHub] [lucene] jpountz commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub


jpountz commented on code in PR #12369:
URL: https://github.com/apache/lucene/pull/12369#discussion_r1229481989


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -1941,7 +1940,7 @@ public static IndexSearcher newSearcher(
 } else {
   int threads = 0;
   final ThreadPoolExecutor ex;
-  if (r.getReaderCacheHelper() == null || random.nextBoolean()) {
+  if (r.getReaderCacheHelper() == null || rarely()) {

Review Comment:
   I'd prefer to keep this one a `random.nextBoolean()` as the semantics of 
`useThreads` to me are about whether the test _may_ use threads. The point is 
to allow some tests to disable threading by passing `useThreads = false`.



##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -1965,9 +1966,9 @@ public static IndexSearcher newSearcher(
 .addClosedListener(cacheKey -> TestUtil.shutdownExecutorService(ex));
   }
   IndexSearcher ret;
+  int maxDocPerSlice = random.nextBoolean() ? 1 : 1 + random.nextInt(1000);
+  int maxSegmentsPerSlice = random.nextBoolean() ? 1 : 1 + random.nextInt(10);

Review Comment:
   It looks OK to me. In the worst case it will create one slice per segment, 
which shouldn't be an adversarial case.
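
   To make the worst case concrete, here is a simplified, hypothetical model of 
grouping segments into slices under the two thresholds. It is plain Java, not 
Lucene's slicing code, and the exact grouping rules are an assumption made for 
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of grouping segments into slices; not Lucene's code. */
final class SliceSketch {

  /** Groups segments (given by doc count) into slices bounded by both thresholds. */
  static List<List<Integer>> slices(
      int[] segmentDocCounts, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    int[] sorted = segmentDocCounts.clone();
    Arrays.sort(sorted); // ascending; iterate backwards to visit the largest segments first
    List<List<Integer>> result = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    int docsInCurrent = 0;
    for (int i = sorted.length - 1; i >= 0; i--) {
      int docs = sorted[i];
      boolean full =
          current.size() >= maxSegmentsPerSlice || docsInCurrent + docs > maxDocsPerSlice;
      if (!current.isEmpty() && full) {
        result.add(current); // close the current slice before adding this segment
        current = new ArrayList<>();
        docsInCurrent = 0;
      }
      current.add(docs);
      docsInCurrent += docs;
    }
    if (!current.isEmpty()) {
      result.add(current);
    }
    return result;
  }
}
```

   With both thresholds at 1, every segment ends up in its own slice, so the 
number of slices is bounded by the number of segments. Larger thresholds group 
small segments together.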






[GitHub] [lucene] javanna commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub


javanna commented on code in PR #12369:
URL: https://github.com/apache/lucene/pull/12369#discussion_r1229508836


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -1941,7 +1940,7 @@ public static IndexSearcher newSearcher(
 } else {
   int threads = 0;
   final ThreadPoolExecutor ex;
-  if (r.getReaderCacheHelper() == null || random.nextBoolean()) {
+  if (r.getReaderCacheHelper() == null || rarely()) {

Review Comment:
   I see, maybe it's too many changes at the same time. Not setting the 
executor half of the time, though, lowers the likelihood quite a bit, and the 
slice thresholds lower it further. I agree that we should not guarantee 
that we always parallelize when we may use threads, yet I am trying to have 
that happen at least 50% of the time, instead of a couple of times every 1000 
runs.
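
   A rough way to see how much the choice of condition matters is to simulate it. The sketch below is only an illustration: `ExecutorLikelihoodSketch` and `fractionWithExecutor` are hypothetical names, the `r.getReaderCacheHelper() == null` branch is ignored, and the 0.01 probability used as a stand-in for `rarely()` is an assumption about its default behaviour, not something stated in this thread.

```java
import java.util.Random;

public class ExecutorLikelihoodSketch {
  /**
   * Estimates the fraction of searchers that would get an executor when the
   * "no executor" branch is taken with probability pNoExecutor.
   */
  static double fractionWithExecutor(double pNoExecutor, int trials, long seed) {
    Random random = new Random(seed);
    int withExecutor = 0;
    for (int i = 0; i < trials; i++) {
      boolean noExecutor = random.nextDouble() < pNoExecutor;
      if (!noExecutor) {
        withExecutor++;
      }
    }
    return (double) withExecutor / trials;
  }

  public static void main(String[] args) {
    // random.nextBoolean(): the executor is skipped about half of the time.
    System.out.println(fractionWithExecutor(0.5, 100_000, 42L));
    // Assumed stand-in for rarely(): the executor is skipped about 1% of the time.
    System.out.println(fractionWithExecutor(0.01, 100_000, 42L));
  }
}
```

   Having an executor is only a precondition: whether a search actually runs in parallel still depends on the slice thresholds producing more than one slice.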






[GitHub] [lucene] LuXugang commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub


LuXugang commented on PR #12349:
URL: https://github.com/apache/lucene/pull/12349#issuecomment-1591110183

   > This doesn't seem to be what your PR does though?
   
   It indeed has no relation to this PR.
   
   > If search sort field does not exist, should we early terminate collection 
after TopN collected?
   
   I would like to open an issue for this.





[GitHub] [lucene] jpountz commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub


jpountz commented on PR #12349:
URL: https://github.com/apache/lucene/pull/12349#issuecomment-159992

   +1





[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors

2023-06-14 Thread via GitHub


uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1591112679

   I did not see any slowdowns in last night's @mikemccand benchmark caused by 
the checks during indexing and when building the query.





[GitHub] [lucene] uschindler commented on issue #12358: Optimize `count()` for BooleanQuery disjunction

2023-06-14 Thread via GitHub


uschindler commented on issue #12358:
URL: https://github.com/apache/lucene/issues/12358#issuecomment-1591117808

   Hi, thanks for cross-checking. So a 1-hour warmup does not change 
anything.
   
   Anyways, I'd use a newer JDK like 20.





[GitHub] [lucene] nreimers commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub


nreimers commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591318871

   @msokolov The index / vector DB should return the dot product score as is. 
No scaling, no truncation.
   
   Using dot product is tremendously useful for embedding models: in 
asymmetric settings, where you want to map a short search query to a longer 
relevant document (which is the most common case in search), they perform much 
better than with cosine similarity or euclidean distance.
   
   But here the index should return the values as is and it should then be up 
to the user to truncate negative scores or to normalize these scores to 
pre-defined ranges.





[GitHub] [lucene] uschindler commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub


uschindler commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591331493

   > @msokolov The index / vector DB should return the dot product score as is. 
No scaling, no truncation.
   > 
   > Using dot product is tremendously useful for embedding models, they 
perform in asymmetric settings where you want to map a short search query to a 
longer relevant document (which is the most common case in search) much better 
than cosine similarity or euclidean distance.
   > 
   > But here the index should return the values as is and it should then be up 
to the user to truncate negative scores or to normalize these scores to 
pre-defined ranges.
   
   The problem is that this is not compatible with Lucene.





[GitHub] [lucene] benwtrent commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub


benwtrent commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591347442

   I would think as long as more negative values are scored lower, we will 
retrieve documents in a sane manner. 
   
   Scaling negatives to restrict them and then not scaling positive values at 
all could work. The `_score` wouldn't always be the dot-product exactly, but it 
allows KNN search to find the most relevant information, even if all of the 
dot-products are negative when comparing with the query vector.
   
   This brings us back to @jmazanec15's suggestion on scaling scores.





[GitHub] [lucene] msokolov commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub


msokolov commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591355715

   Yeah, after consideration, I think we could maybe argue for changing the 
scaling of negative values given that they were documented as unsupported, even 
though it would be breaking back-compat in the sense that scores would be 
changed. But I think we ought to preserve the scaling of non-negative values in 
case people have scaling factors they use for combining scores with other 
queries' scores. So we could go with @jmazanec15's suggestion except leaving in 
place the scale by 1/2?
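
   As a concrete illustration of the idea above, here is a minimal sketch of a score function that leaves non-negative dot products on a "scale by 1/2" mapping and squeezes negative ones into the range below it. Everything here is an assumption for illustration: the class and method names are hypothetical, `(1 + dot) / 2` is my reading of the existing non-negative scaling, and the negative branch is just one monotonic mapping, not necessarily the one proposed in the issue.

```java
public class DotProductScoreSketch {
  /**
   * Maps a raw dot product to a non-negative score.
   * Hypothetical sketch; not the VectorSimilarityFunction implementation.
   */
  static float score(float dot) {
    if (dot >= 0) {
      // Assumed existing scaling, left untouched so combined scores stay comparable.
      return (1 + dot) / 2;
    }
    // Negative values land in (0, 0.5): more negative dot products score lower,
    // and the two branches meet at 0.5 when dot == 0.
    return 0.5f / (1 - dot);
  }

  public static void main(String[] args) {
    for (float dot : new float[] {-3f, -1f, 0f, 1f}) {
      System.out.println(dot + " -> " + score(dot));
    }
  }
}
```

   The property that matters for retrieval is that the mapping is strictly increasing, so ranking by score matches ranking by raw dot product even when every dot product is negative.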





[GitHub] [lucene] msokolov commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub


msokolov commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591379022

   Yeah. Another thing we could consider is doing this scaling in 
KnnVectorQuery and/or its Scorer. These have the ultimate responsibility of 
complying with the Scorer contract. If we did it there we wouldn't have to 
change the output of VectorSimilarity. However it's messy to do it there since 
this is specific to a particular similarity implementation, so on balance doing 
it in the similarity makes more sense to me. 





[GitHub] [lucene] alessandrobenedetti merged pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub


alessandrobenedetti merged PR #12253:
URL: https://github.com/apache/lucene/pull/12253





[GitHub] [lucene] alessandrobenedetti closed issue #12252: Add function queries for computing vector similarity between knn vectors

2023-06-14 Thread via GitHub


alessandrobenedetti closed issue #12252: Add function queries for computing 
vector similarity between knn vectors
URL: https://github.com/apache/lucene/issues/12252





[GitHub] [lucene] sohami commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub


sohami commented on issue #12347:
URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591452446

   @javanna Thanks for your input.
   
   > Another thought on my end: executing sometimes on the caller thread, and 
sometimes on the executor makes things hard to reason about: how do you size 
the two thread pools if you can't easily tell what load they are subjected to?
   
   > Instead of making the slice executor configurable then, I would 
consider removing it entirely, and forcing the collection to always 
happen on the separate thread pool. I think we'll need to figure out how to 
handle rejections from the executor thread pool, as today the collection 
happens on the caller thread whenever there's a rejection which I don't think 
is a behaviour we want to keep. We could also leave this to the executor 
implementation that is provided.
   
   As you mentioned earlier (and I agree), it is hard to know which default 
works best for every use case. Providing a way to customize it gives users the 
flexibility to fit their own use cases. That way we can see which custom 
mechanisms work well across users and then change the default later as needed. 
I would also like to try just removing the limiting factor while keeping the 
mechanism that executes the last slice on the caller thread, so a 
`SliceExecutor`-type interface will still be useful.
   
   For now, can we split the issue into two? We could make the change for the 
first one now and follow up with the second one. Thoughts?
   1. Take the `LeafSlice[]` in the constructor to allow for custom slice 
computation.
   2. Discuss different options to customize `SliceExecutor`, or whether we 
want to replace it with some other interface.
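
   The "execute the last slice on the caller thread" mechanism discussed above can be sketched roughly as follows. This is a self-contained illustration using only `java.util.concurrent`; `LastSliceOnCallerSketch` and its `invokeAll` method are hypothetical names and do not reproduce the actual `SliceExecutor` code, and no limiting factor or rejection handling is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class LastSliceOnCallerSketch {
  /** Runs all tasks but the last on the executor; the last one runs on the caller thread. */
  static <T> List<T> invokeAll(List<Callable<T>> tasks, ExecutorService executor) {
    List<FutureTask<T>> futures = new ArrayList<>();
    for (int i = 0; i < tasks.size(); i++) {
      FutureTask<T> task = new FutureTask<>(tasks.get(i));
      if (i == tasks.size() - 1) {
        task.run(); // last slice: executed synchronously by the caller
      } else {
        executor.execute(task); // other slices: handed to the thread pool
      }
      futures.add(task);
    }
    List<T> results = new ArrayList<>();
    for (FutureTask<T> future : futures) {
      try {
        results.add(future.get());
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
    return results;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    Callable<String> whoRanMe = () -> Thread.currentThread().getName();
    List<Callable<String>> tasks = List.of(whoRanMe, whoRanMe, whoRanMe);
    // The last entry is the caller's thread name, the others come from the pool.
    System.out.println(invokeAll(tasks, executor));
    executor.shutdown();
  }
}
```

   This also shows why sizing is hard to reason about: the caller thread pool and the executor pool both end up doing collection work.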
   





[GitHub] [lucene] Jackyrie2 opened a new pull request, #12371: [Draft] #12236 Lazily compute similarity score

2023-06-14 Thread via GitHub


Jackyrie2 opened a new pull request, #12371:
URL: https://github.com/apache/lucene/pull/12371

   ### Description
   Per @zhaih's suggestion in #12236, this PR moves the computation of the 
similarity score from `initalizedFromGraph` to a later time, when the 
`NeighborArray` needs to be sorted in order to pop out the worst non-diverse 
node. A new abstract class `ScoringFunction` is created to hold the necessary 
context to compute the similarity score, and is passed into the `addOutofOrder` 
function. Let me know if this solution works, as it puts extra strain on memory 
usage. 
   
   I will work on writing unit tests, but the changes in this PR pass the 
current unit tests in the hnsw test directory. 
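
   The general shape of the lazy approach described above can be sketched as follows. This is only an illustration of deferring a score until it is first needed: `LazyScoreSketch` and `LazyNeighbor` are hypothetical names, and a plain `DoubleSupplier` stands in for the `ScoringFunction` context, whose real shape is not shown in this description.

```java
import java.util.function.DoubleSupplier;

public class LazyScoreSketch {
  /** Hypothetical neighbor entry whose score is computed on first use and then cached. */
  static final class LazyNeighbor {
    final int node;
    // Holding the scoring context per entry is the extra memory cost of being lazy.
    private final DoubleSupplier scorer;
    private boolean computed;
    private float score;

    LazyNeighbor(int node, DoubleSupplier scorer) {
      this.node = node;
      this.scorer = scorer;
    }

    float score() {
      if (!computed) {
        score = (float) scorer.getAsDouble();
        computed = true;
      }
      return score;
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    LazyNeighbor neighbor =
        new LazyNeighbor(7, () -> {
          calls[0]++;
          return 0.75;
        });
    System.out.println("computations before first use: " + calls[0]);
    neighbor.score();
    neighbor.score();
    System.out.println("computations after two reads: " + calls[0]);
  }
}
```

   The trade-off is the one noted above: similarity computations are skipped for entries that are never sorted, at the cost of keeping a reference to the scoring context alive for each entry.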
   





[GitHub] [lucene] benwtrent commented on pull request #12371: [Draft] #12236 Lazily compute similarity score

2023-06-14 Thread via GitHub


benwtrent commented on PR #12371:
URL: https://github.com/apache/lucene/pull/12371#issuecomment-1591770109

   Hey @Jackyrie2 this does add some extra memory overhead, 4 new object 
references. It would be good if it was justified with a benchmark. 
   
   Could you share some benchmarking on indexing throughput and segment 
merging? I expect those two places to be where we see improvement if any.





[GitHub] [lucene] javanna commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub


javanna commented on issue #12347:
URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591822866

   > Take the LeafSlice[] in constructor to allow for custom slice computation.
   
   Sounds good, I'll happily review that change.
   
   > Discuss different options to customize SliceExecutor or we will want to 
replace it with some other interface
   
   I'm OK with discussing this. I do think that making things pluggable is a 
change that's difficult to revert in terms of backwards compatibility, and I 
think we should put some effort into changing the current behaviour before we 
add new public abstractions.





[GitHub] [lucene] atris commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub


atris commented on issue #12347:
URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591837122

   > > Take the LeafSlice[] in constructor to allow for custom slice 
computation.
   > 
   > Sounds good, I'll happily review that change.
   > 
   > > Discuss different options to customize SliceExecutor or we will want to 
replace it with some other interface
   > 
   > Ok to discussing, I do think that making things pluggable is a change 
that's difficult to revert in terms of backwards compatibility, and I think we 
should put some effort into changing the current behaviour before we add new 
public abstractions.
   
   Strong -1 to replacing the interface. I think it has worked well for many 
users for a while and it would be breaking back compatibility to serve a 
specific use case.
   
   I am just catching up on this thread -- why does the current SliceExecutor 
not work for extension in this case?





[GitHub] [lucene] jbellis opened a new pull request, #12372: Reuse neighborqueue during hnsw index build (attempt 2)

2023-06-14 Thread via GitHub


jbellis opened a new pull request, #12372:
URL: https://github.com/apache/lucene/pull/12372

   This changes HnswGraphBuilder to re-use the same candidates queues for 
adding nodes by allocating them in the Builder instance.
   
   This saves about 2.5% of build time and takes memory allocations of NQ 
long[] from 25% of total to 0%.  JFR runs are attached.
   
   The difference from the first attempt (which actually made things slower) is 
that it preserves the original code's behavior of using a 1-sized queue for the 
search in the levels above where the node actually gets added.
   
   [main.jfr.gz](https://github.com/apache/lucene/files/11749837/main.jfr.gz)
   [nq2.jfr.gz](https://github.com/apache/lucene/files/11749838/nq2.jfr.gz)
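
   For readers unfamiliar with the pattern, the kind of reuse described above looks roughly like the sketch below: the scratch structure is allocated once in the builder and cleared before each use, instead of being allocated per call. The classes `QueueReuseSketch`, `Candidates`, and `Builder` are hypothetical stand-ins and do not mirror the real `HnswGraphBuilder` or `NeighborQueue` code.

```java
public class QueueReuseSketch {
  /** Hypothetical fixed-capacity scratch storage backed by a long[]. */
  static final class Candidates {
    static int allocations = 0; // counts how many backing arrays were created

    private final long[] heap;
    private int size;

    Candidates(int capacity) {
      heap = new long[capacity];
      allocations++;
    }

    void add(long encoded) {
      if (size < heap.length) {
        heap[size++] = encoded;
      }
    }

    int size() {
      return size;
    }

    void clear() {
      size = 0; // keeps the backing array, so no new allocation is needed
    }
  }

  /** Hypothetical builder that owns one scratch instance for all addNode calls. */
  static final class Builder {
    private final Candidates scratch;

    Builder(int beamWidth) {
      scratch = new Candidates(beamWidth);
    }

    int addNode(long[] neighbors) {
      scratch.clear();
      for (long n : neighbors) {
        scratch.add(n);
      }
      return scratch.size();
    }
  }

  public static void main(String[] args) {
    Builder builder = new Builder(16);
    builder.addNode(new long[] {1L, 2L, 3L});
    builder.addNode(new long[] {4L, 5L});
    System.out.println("backing arrays allocated: " + Candidates.allocations);
  }
}
```

   The sketch leaves out the detail called out above, namely that searches on the upper levels use a queue of size 1 rather than the full-sized one.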
   





[GitHub] [lucene] jbellis commented on pull request #12372: Reuse neighborqueue during hnsw index build (attempt 2)

2023-06-14 Thread via GitHub


jbellis commented on PR #12372:
URL: https://github.com/apache/lucene/pull/12372#issuecomment-1591859337

   Additionally, the original change only re-used the candidates queues within 
a single addNode call, so this is improved in that respect as well.





[GitHub] [lucene] sohami commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub


sohami commented on issue #12347:
URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591876384

   @atris To summarize, there are two separate pieces of functionality I am 
looking to add:
   
   1) Custom slice computation which the extension can provide. For this we can 
provide a constructor in `IndexSearcher` which takes in `LeafSlice` array from 
extension. I think probably there is no concern with this.
   
   2) Mechanism to provide custom `SliceExecutor` implementation or deprecate 
this with some other mechanism. I would ideally like to provide a mechanism for 
extensions to be able to give a custom implementation of it. The default 
implementation takes certain limiting factors into consideration to apply back 
pressure, which will not be needed in all cases (as shared 
[above](https://github.com/apache/lucene/issues/12347#issuecomment-1589876811)). 
A custom implementation would also simplify reasoning about which `slices` got 
executed on which thread pool. So keeping the existing default and giving the 
flexibility to customize it is what I guess will be helpful here. This is still 
being discussed, and it would be great to hear your feedback as well.





[GitHub] [lucene] jbellis opened a new pull request, #12373: require that float vector components are smaller than 1E17 to prevent overflowing to Infinity

2023-06-14 Thread via GitHub


jbellis opened a new pull request, #12373:
URL: https://github.com/apache/lucene/pull/12373

   Following up to PR #12281
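
   To show the kind of overflow the title refers to, here is a small sketch of a float dot product whose result leaves the finite range even though every input component is finite. The class and method names are hypothetical, and the reading that the bound exists to keep accumulated products within `Float.MAX_VALUE` (about 3.4E38) is an assumption, since this description does not spell out the rationale.

```java
public class FloatOverflowSketch {
  /** Plain float dot product; products and sums are rounded to float at each step. */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] huge = {1e20f, 1e20f};
    float[] bounded = {1e17f, 1e17f};
    // 1e20 * 1e20 = 1e40 exceeds the float range, so the result is Infinity.
    System.out.println(Float.isInfinite(dotProduct(huge, huge)));
    // 1e17 * 1e17 = 1e34 leaves headroom for summing over many dimensions.
    System.out.println(Float.isFinite(dotProduct(bounded, bounded)));
  }
}
```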





[GitHub] [lucene] jbellis commented on pull request #12373: require that float vector components are smaller than 1E17 to prevent overflowing to Infinity

2023-06-14 Thread via GitHub


jbellis commented on PR #12373:
URL: https://github.com/apache/lucene/pull/12373#issuecomment-1591954514

   cc @uschindler 





[GitHub] [lucene] sohami opened a new pull request, #12374: Provide constructor to accept the LeafSlice computed by extensions

2023-06-14 Thread via GitHub


sohami opened a new pull request, #12374:
URL: https://github.com/apache/lucene/pull/12374

   ### Description
   Add a constructor which takes in the computed slices from extensions and 
uses them for running the search concurrently on the provided executor. This is 
based on the discussion on the issue 
https://github.com/apache/lucene/issues/12347
   





[GitHub] [lucene] sohami commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub


sohami commented on issue #12347:
URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591995488

   @javanna @atris I have created a PR (#12374) for item 1 above for now.

