[GitHub] [lucene] original-brownbear commented on a diff in pull request #12453: Faster bulk numeric reads from BufferedIndexInput

2023-07-24 Thread via GitHub
original-brownbear commented on code in PR #12453: URL: https://github.com/apache/lucene/pull/12453#discussion_r1271914546 ## lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java: ## @@ -159,6 +159,63 @@ public final long readLong() throws IOException { }

[GitHub] [lucene] jpountz opened a new issue, #12456: Investigate slow fuzzy queries

2023-07-24 Thread via GitHub
jpountz opened a new issue, #12456: URL: https://github.com/apache/lucene/issues/12456 While disjunctive queries got a performance boost with https://github.com/apache/lucene/pull/12444 ([OrHighHigh](http://people.apache.org/~mikemccand/lucenebench/OrHighHigh.html), [OrHighMed](http://peop

[GitHub] [lucene] jpountz commented on a diff in pull request #12446: Enable rank-unsafe optimizations for MAXSCORE/WAND.

2023-07-24 Thread via GitHub
jpountz commented on code in PR #12446: URL: https://github.com/apache/lucene/pull/12446#discussion_r1271961619 ## lucene/core/src/java/org/apache/lucene/search/MaxScoreBulkScorer.java: ## @@ -168,7 +171,17 @@ private boolean partitionScorers() { if (maxScoreSumFloat >= m

[GitHub] [lucene] jpountz commented on a diff in pull request #12446: Enable rank-unsafe optimizations for MAXSCORE/WAND.

2023-07-24 Thread via GitHub
jpountz commented on code in PR #12446: URL: https://github.com/apache/lucene/pull/12446#discussion_r1271963742 ## lucene/core/src/java/org/apache/lucene/search/BulkScorer.java: ## @@ -90,4 +90,13 @@ public abstract int score(LeafCollector collector, Bits acceptDocs, int min, i

[GitHub] [lucene] jpountz commented on a diff in pull request #12453: Faster bulk numeric reads from BufferedIndexInput

2023-07-24 Thread via GitHub
jpountz commented on code in PR #12453: URL: https://github.com/apache/lucene/pull/12453#discussion_r1271956372 ## lucene/core/src/test/org/apache/lucene/store/TestBufferedIndexInput.java: ## @@ -209,6 +209,103 @@ public void testBackwardsLongReads() throws IOException {

[GitHub] [lucene] original-brownbear commented on a diff in pull request #12453: Faster bulk numeric reads from BufferedIndexInput

2023-07-24 Thread via GitHub
original-brownbear commented on code in PR #12453: URL: https://github.com/apache/lucene/pull/12453#discussion_r1272006226 ## lucene/core/src/test/org/apache/lucene/store/TestBufferedIndexInput.java: ## @@ -209,6 +209,103 @@ public void testBackwardsLongReads() throws IOExceptio

[GitHub] [lucene] jpountz opened a new pull request, #12457: Improve MaxScoreBulkScorer partitioning logic.

2023-07-24 Thread via GitHub
jpountz opened a new pull request, #12457: URL: https://github.com/apache/lucene/pull/12457 Partitioning scorers is an optimization problem: the optimal set of non-essential scorers is the subset of scorers whose sum of max window scores is less than the minimum competitive score that maxim

[GitHub] [lucene] jpountz commented on pull request #12457: Improve MaxScoreBulkScorer partitioning logic.

2023-07-24 Thread via GitHub
jpountz commented on PR #12457: URL: https://github.com/apache/lucene/pull/12457#issuecomment-1647837090 luceneutil doesn't show an improvement on wikimedium because all fuzzy queries only have low-frequency terms; only nightlies have the `titel~2` query in their tasks file. ```

[GitHub] [lucene] jpountz commented on a diff in pull request #12453: Faster bulk numeric reads from BufferedIndexInput

2023-07-24 Thread via GitHub
jpountz commented on code in PR #12453: URL: https://github.com/apache/lucene/pull/12453#discussion_r1272210731 ## lucene/CHANGES.txt: ## @@ -84,6 +84,8 @@ Optimizations * GITHUB#12372: Reduce allocation during HNSW construction (Jonathan Ellis) +* GITHUB#12453: Faster bulk

[GitHub] [lucene] jpountz commented on a diff in pull request #12453: Faster bulk numeric reads from BufferedIndexInput

2023-07-24 Thread via GitHub
jpountz commented on code in PR #12453: URL: https://github.com/apache/lucene/pull/12453#discussion_r1272352389 ## lucene/CHANGES.txt: ## @@ -84,6 +84,8 @@ Optimizations * GITHUB#12372: Reduce allocation during HNSW construction (Jonathan Ellis) +* GITHUB#12453: Faster bulk

[GitHub] [lucene] jpountz merged pull request #12442: Assert IdxOrDvQuery subqueries and document useful fields

2023-07-24 Thread via GitHub
jpountz merged PR #12442: URL: https://github.com/apache/lucene/pull/12442 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] HoustonPutman commented on pull request #12430: Enable search for site javadocs

2023-07-24 Thread via GitHub
HoustonPutman commented on PR #12430: URL: https://github.com/apache/lucene/pull/12430#issuecomment-1648042240 This same fix worked for Solr: https://solr.apache.org/docs/9_3_0/core/index.html Will go ahead and merge!

[GitHub] [lucene] HoustonPutman merged pull request #12430: Enable search for site javadocs

2023-07-24 Thread via GitHub
HoustonPutman merged PR #12430: URL: https://github.com/apache/lucene/pull/12430

[GitHub] [lucene] original-brownbear commented on a diff in pull request #12453: Faster bulk numeric reads from BufferedIndexInput

2023-07-24 Thread via GitHub
original-brownbear commented on code in PR #12453: URL: https://github.com/apache/lucene/pull/12453#discussion_r1272393881 ## lucene/CHANGES.txt: ## @@ -84,6 +84,8 @@ Optimizations * GITHUB#12372: Reduce allocation during HNSW construction (Jonathan Ellis) +* GITHUB#12453:

[GitHub] [lucene] jpountz merged pull request #12453: Faster bulk numeric reads from BufferedIndexInput

2023-07-24 Thread via GitHub
jpountz merged PR #12453: URL: https://github.com/apache/lucene/pull/12453

[GitHub] [lucene] gsmiller commented on issue #12451: Interesting TestStringsToAutomaton failure

2023-07-24 Thread via GitHub
gsmiller commented on issue #12451: URL: https://github.com/apache/lucene/issues/12451#issuecomment-1648287605 Also, here's the compiled automaton resulting from the code-point automaton above: ![out](https://github.com/apache/lucene/assets/16479560/24b31791-22df-4e78--f0be01cce99c)

[GitHub] [lucene] jbellis commented on a diff in pull request #12421: Concurrent hnsw graph and builder, take two

2023-07-24 Thread via GitHub
jbellis commented on code in PR #12421: URL: https://github.com/apache/lucene/pull/12421#discussion_r1272560387 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -311,7 +369,6 @@ void searchLevel( graphSeek(graph, level, topCandidateNode);

[GitHub] [lucene] jbellis commented on a diff in pull request #12421: Concurrent hnsw graph and builder, take two

2023-07-24 Thread via GitHub
jbellis commented on code in PR #12421: URL: https://github.com/apache/lucene/pull/12421#discussion_r1272561099 ## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswGraphBuilder.java: ## @@ -0,0 +1,465 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [lucene] benwtrent commented on a diff in pull request #12421: Concurrent hnsw graph and builder, take two

2023-07-24 Thread via GitHub
benwtrent commented on code in PR #12421: URL: https://github.com/apache/lucene/pull/12421#discussion_r1272563613 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -311,7 +369,6 @@ void searchLevel( graphSeek(graph, level, topCandidateNode)

[GitHub] [lucene] gsmiller opened a new issue, #12458: UTF32toUTF8 can produce automata that produce invalid unicode

2023-07-24 Thread via GitHub
gsmiller opened a new issue, #12458: URL: https://github.com/apache/lucene/issues/12458 ### Description When converting a unicode (UTF32) automaton down to a UTF8 representation, UTF32toUTF8 can create an automaton that produces/accepts invalid UTF8. This happens when a transition in

[GitHub] [lucene] gsmiller commented on issue #12451: Interesting TestStringsToAutomaton failure

2023-07-24 Thread via GitHub
gsmiller commented on issue #12451: URL: https://github.com/apache/lucene/issues/12451#issuecomment-1648504863 This looks like a bug in the `UTF32toUTF8` conversion logic to me. It seems the resulting UTF8 automaton in this case is producing a large range of invalid UTF8. This appears to al

[GitHub] [lucene] benwtrent commented on a diff in pull request #12434: Add ParentJoin KNN support

2023-07-24 Thread via GitHub
benwtrent commented on code in PR #12434: URL: https://github.com/apache/lucene/pull/12434#discussion_r1272680226 ## lucene/core/src/java/org/apache/lucene/util/hnsw/KnnResults.java: ## @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] benwtrent commented on a diff in pull request #12434: Add ParentJoin KNN support

2023-07-24 Thread via GitHub
benwtrent commented on code in PR #12434: URL: https://github.com/apache/lucene/pull/12434#discussion_r1272681200 ## lucene/core/src/java/org/apache/lucene/util/hnsw/KnnResultsProvider.java: ## @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[GitHub] [lucene] benwtrent commented on a diff in pull request #12434: Add ParentJoin KNN support

2023-07-24 Thread via GitHub
benwtrent commented on code in PR #12434: URL: https://github.com/apache/lucene/pull/12434#discussion_r1272735873 ## lucene/core/src/java/org/apache/lucene/util/hnsw/KnnResultsProvider.java: ## @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[GitHub] [lucene] iverase opened a new issue, #12459: Allow reading binary doc values as a DataInput

2023-07-24 Thread via GitHub
iverase opened a new issue, #12459: URL: https://github.com/apache/lucene/issues/12459 ### Description Binary doc values allow storing a variable number of bytes in a doc value. In order to read those bytes, we currently get a BytesRef from the API which contains the bytes on heap.

[GitHub] [lucene] iverase opened a new pull request, #12460: Allow reading binary doc values as a DataInput

2023-07-24 Thread via GitHub
iverase opened a new pull request, #12460: URL: https://github.com/apache/lucene/pull/12460 see https://github.com/apache/lucene/issues/12459

[GitHub] [lucene] iverase commented on issue #12459: Allow reading binary doc values as a DataInput

2023-07-24 Thread via GitHub
iverase commented on issue #12459: URL: https://github.com/apache/lucene/issues/12459#issuecomment-1648979938 I wrote a prototype for this change here: https://github.com/apache/lucene/pull/12460

[GitHub] [lucene] jmazanec15 commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-24 Thread via GitHub
jmazanec15 commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1649102549 > 🤦 yep! > Here is with the higher max conn. Sort of better. Right, I was thinking this might explain the recall discrepancy for the dotproduct score change (0.989 vs 0

[GitHub] [lucene] searchivarius commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-24 Thread via GitHub
searchivarius commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1649112930 Hi @jmazanec15 and @benwtrent: thanks a lot for testing. For higher recalls (somewhat higher or lower than 0.8) the transformation seems to lead to a substantial increase in latency