Re: [I] Move group-varint encoding/decoding logic to DataOutput/DataInput? [lucene]

2023-11-23 Thread via GitHub
easyice commented on issue #12826: URL: https://github.com/apache/lucene/issues/12826#issuecomment-1823961753 The wikimediumall benchmark result looks fine on Java 21: java 21 ``` TaskQPS baseline StdDevQPS my_modified_version St

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-23 Thread via GitHub
jpountz commented on code in PR #12810: URL: https://github.com/apache/lucene/pull/12810#discussion_r1403031948 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListReader.java: ## @@ -63,7 +63,7 @@ public abstract class MultiLevelSkipListReader implements Closeab

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-23 Thread via GitHub
jpountz commented on code in PR #12810: URL: https://github.com/apache/lucene/pull/12810#discussion_r1403032682 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListReader.java: ## @@ -63,7 +63,7 @@ public abstract class MultiLevelSkipListReader implements Closeab

Re: [I] Move group-varint encoding/decoding logic to DataOutput/DataInput? [lucene]

2023-11-23 Thread via GitHub
jpountz commented on issue #12826: URL: https://github.com/apache/lucene/issues/12826#issuecomment-1823968702 Thanks for testing, I was wondering about the virtual call too. In theory, group-varint works well because you can start decoding the next group before being fully done with the cur
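
For readers unfamiliar with the encoding under discussion, here is a minimal sketch of a generic group-varint scheme. It is not Lucene's implementation: the class name `GroupVarintSketch`, the method names, and the exact bit layout of the tag byte are assumptions made for illustration. Each group of four ints is prefixed by one tag byte holding four 2-bit lengths, so the decoder knows the size of the whole group before reading any value byte, which is presumably the property the comment above alludes to.

```java
import java.util.Arrays;

// Illustrative only: a generic group-varint codec, not Lucene's code.
final class GroupVarintSketch {

  // Number of bytes (1..4) needed to store v as an unsigned 32-bit value.
  static int numBytes(int v) {
    if ((v & 0xFFFFFF00) == 0) return 1;
    if ((v & 0xFFFF0000) == 0) return 2;
    if ((v & 0xFF000000) == 0) return 3;
    return 4;
  }

  // Encodes values[off..off+4) into out starting at pos; returns the new position.
  static int encodeGroup(int[] values, int off, byte[] out, int pos) {
    int tagPos = pos++; // reserve one byte for the tag
    int tag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[off + i];
      int n = numBytes(v);
      tag |= (n - 1) << (6 - 2 * i); // two bits per value, first value in the high bits
      for (int b = 0; b < n; b++) {
        out[pos++] = (byte) (v >>> (8 * b)); // little-endian value bytes
      }
    }
    out[tagPos] = (byte) tag;
    return pos;
  }

  // Decodes one group of four ints from in at pos into dst[off..off+4); returns the new position.
  static int decodeGroup(byte[] in, int pos, int[] dst, int off) {
    int tag = in[pos++] & 0xFF;
    for (int i = 0; i < 4; i++) {
      int n = ((tag >>> (6 - 2 * i)) & 0x3) + 1; // length of the i-th value in bytes
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      dst[off + i] = v;
    }
    return pos;
  }

  public static void main(String[] args) {
    int[] values = {5, 300, 70000, 1 << 30};
    byte[] buf = new byte[17]; // worst case: 1 tag byte + 4 * 4 value bytes
    int end = encodeGroup(values, 0, buf, 0);
    int[] decoded = new int[4];
    decodeGroup(buf, 0, decoded, 0);
    System.out.println(end + " bytes used, round trip ok: " + Arrays.equals(values, decoded));
  }
}
```

In contrast to a plain varint, where each byte carries a continuation bit that must be tested before the next byte can be interpreted, this layout needs only one branch-free table of lengths per group.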

Re: [PR] Improve DirectReader java doc [lucene]

2023-11-23 Thread via GitHub
gf2121 merged PR #12835: URL: https://github.com/apache/lucene/pull/12835 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [I] Move group-varint encoding/decoding logic to DataOutput/DataInput? [lucene]

2023-11-23 Thread via GitHub
easyice commented on issue #12826: URL: https://github.com/apache/lucene/issues/12826#issuecomment-1824101886 Thanks for explaining! You are right: if we call `DataInput.readVIntGroup`, the function `DataInput.readVIntGroupLong` (the same as `GroupVIntReader#readLong`) is not inlined. But

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-23 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1403174609 ## lucene/core/src/test/org/apache/lucene/index/TestIndexSorting.java: ## @@ -3173,4 +3173,184 @@ public void testSortDocsAndFreqsAndPositionsAndOffsets() throws IOExc

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-23 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1403178782 ## lucene/core/src/test/org/apache/lucene/index/TestIndexSorting.java: ## @@ -3173,4 +3173,184 @@ public void testSortDocsAndFreqsAndPositionsAndOffsets() throws IOExc

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-11-23 Thread via GitHub
s1monw commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1824135498 > I'm a little worried about giving up functionality, but I think if we had a list of parent-fields rather than a single parent-field that would cover what we can do today? Maybe one is e

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-11-23 Thread via GitHub
jpountz merged PR #12622: URL: https://github.com/apache/lucene/pull/12622

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-23 Thread via GitHub
jpountz commented on PR #12810: URL: https://github.com/apache/lucene/pull/12810#issuecomment-1824353933 I'll merge to make sure it gets some time in CI before we cut a release. Feel free to raise concerns after merging if you have any, I'll happily address them and revert if necessary.

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-23 Thread via GitHub
jpountz merged PR #12810: URL: https://github.com/apache/lucene/pull/12810

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-23 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1402058230 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -435,6 +433,13 @@ public FST(FSTMetadata metadata, DataInput in, Outputs outputs, FSTStore f

Re: [I] Virtual threads and Lucene (support async tasks) [lucene]

2023-11-23 Thread via GitHub
Jeevananthan-23 commented on issue #12531: URL: https://github.com/apache/lucene/issues/12531#issuecomment-1824474577 @uschindler, I understand the complexity of completely rewriting all Lucene internals. However, IMO it is necessary to do so in parallel. Relying entirely on MMAP is a bad i

Re: [I] Explore a single scoring implementation in DrillSidewaysScorer [LUCENE-10037] [lucene]

2023-11-23 Thread via GitHub
slow-J commented on issue #11076: URL: https://github.com/apache/lucene/issues/11076#issuecomment-1824516300 Hi @gsmiller, I found a drill sideways tasks file in luceneutil ([ds.tasks](https://github.com/mikemccand/luceneutil/blob/5fae60800edd84c70492ac8765824ca2e4a6a991/tasks/ds.tasks)) I

[PR] Move MergeState.DocMap to a FunctionalInterface [lucene]

2023-11-23 Thread via GitHub
s1monw opened a new pull request, #12836: URL: https://github.com/apache/lucene/pull/12836 This change converts MergeState to an interface to make use of lambda expressions.

Re: [PR] Move MergeState.DocMap to a FunctionalInterface [lucene]

2023-11-23 Thread via GitHub
jpountz commented on code in PR #12836: URL: https://github.com/apache/lucene/pull/12836#discussion_r1403533598 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -5237,13 +5234,10 @@ public int get(int docID) { for (int i = 0; i < docMaps.length; +

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403551939 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403560016 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -79,24 +81,30 @@ public Query rewrite(IndexSearcher indexSearcher) throws

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403563168 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector

Re: [PR] Move MergeState.DocMap to a FunctionalInterface [lucene]

2023-11-23 Thread via GitHub
s1monw commented on code in PR #12836: URL: https://github.com/apache/lucene/pull/12836#discussion_r1403570681 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -5237,13 +5234,10 @@ public int get(int docID) { for (int i = 0; i < docMaps.length; ++

[PR] Removing TermInSetQuery array ctor [lucene]

2023-11-23 Thread via GitHub
slow-J opened a new pull request, #12837: URL: https://github.com/apache/lucene/pull/12837 Creating a PR for the items discussed in https://github.com/apache/lucene/issues/12243. Currently, calling the [`KeywordField#newSetQuery`](https://github.com/apache/lucene/blob/main/lu

Re: [I] Add a new static method for KeywordField#newSetQuery to support collections parameter [lucene]

2023-11-23 Thread via GitHub
slow-J commented on issue #12243: URL: https://github.com/apache/lucene/issues/12243#issuecomment-1824681333 > > I don't think the fact that TermInSetQuery has a Collection ctor should impact this at all. Maybe that should be removed? > > +1 > > It's weird to take both array an

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403584908 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector

Re: [I] Virtual threads and Lucene (support async tasks) [lucene]

2023-11-23 Thread via GitHub
uschindler commented on issue #12531: URL: https://github.com/apache/lucene/issues/12531#issuecomment-1824732748 > @uschindler, I understand the complexity of completely rewriting all Lucene internals. However, IMO it is necessary to do so in parallel. Relying entirely on MMAP is a bad idea

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1824736252 @vigyasharma Answering other questions: > We seem to consistently see an improvement in recall between single segment, and multi-segment runs (both seq and conc.) on baseli

[PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-23 Thread via GitHub
jpountz opened a new pull request, #12838: URL: https://github.com/apache/lucene/pull/12838 This is a new iteration on #12810, which I had to revert because of test failures when docFreq is a multiple of 128. We currently have a hack when the doc freq is not a multiple of 128 in order

Re: [PR] Move MergeState.DocMap to a FunctionalInterface [lucene]

2023-11-23 Thread via GitHub
s1monw merged PR #12836: URL: https://github.com/apache/lucene/pull/12836

[I] Grow arrays up to a given limit to avoid overallocation where possible [lucene]

2023-11-23 Thread via GitHub
stefanvodita opened a new issue, #12839: URL: https://github.com/apache/lucene/issues/12839 ### Description `ArrayUtil` provides methods to grow arrays, overallocating exponentially, with the possibility of requesting a minimum size. Sometimes we have an upper limit to the number
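
The idea in the issue title, growing with overallocation but never past a known upper bound, can be sketched as follows. This is an assumption-laden illustration rather than the change proposed in the issue: the class `BoundedGrowSketch`, the method `growUpTo`, and the 1.5x growth factor are all hypothetical and do not correspond to an existing Lucene API.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical bounded-growth helper, not part of Lucene.
final class BoundedGrowSketch {

  // Returns an array with length >= minSize. When a copy is needed, the new
  // length overallocates to about 1.5x minSize but is capped at limit.
  static int[] growUpTo(int[] array, int minSize, int limit) {
    if (minSize > limit) {
      throw new IllegalArgumentException("minSize " + minSize + " exceeds limit " + limit);
    }
    if (array.length >= minSize) {
      return array; // already large enough, no copy
    }
    // Compute in long so the overallocation cannot overflow int.
    long oversized = (long) minSize + (minSize >> 1);
    int newSize = (int) Math.min(oversized, (long) limit);
    return Arrays.copyOf(array, newSize);
  }

  public static void main(String[] args) {
    int[] docs = new int[4];
    docs = growUpTo(docs, 5, 6); // the cap wins over the 1.5x overallocation
    System.out.println("new length: " + docs.length);
  }
}
```

When the caller knows the maximum number of elements it will ever store, capping at that number avoids the final oversized allocation that unbounded exponential growth would make.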

Re: [PR] Random access term dictionary [lucene]

2023-11-23 Thread via GitHub
Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1825228925 After some tweaking and tinkering I was finally able to get the bench running the way I wanted in luceneutil! @mikemccand Unfortunately luceneutil out of the box doesn't work for m

Re: [PR] hunspell: allow in-memory entry sorting for faster dictionary loading [lucene]

2023-11-23 Thread via GitHub
donnerpeter merged PR #12834: URL: https://github.com/apache/lucene/pull/12834

Re: [PR] Hide the internal data structure of HeapPointWriter [lucene]

2023-11-23 Thread via GitHub
iverase commented on PR #12762: URL: https://github.com/apache/lucene/pull/12762#issuecomment-1825276830 I didn't notice any performance change. Added javadocs.