Re: [I] Reproducible failure in TestIndexWriter.testHasUncommittedChanges [lucene]

2023-11-13 Thread via GitHub
jpountz closed issue #12763: Reproducible failure in TestIndexWriter.testHasUncommittedChanges URL: https://github.com/apache/lucene/issues/12763 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-13 Thread via GitHub
jpountz commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1807797987 Thanks both, I pushed an annotation, it should show up tomorrow. I had high expectations based on preliminary results from https://github.com/apache/lucene/issues/12696#issue-1950239343

Re: [PR] Refactor the use of runFinalization in tests and benchmarks [lucene]

2023-11-13 Thread via GitHub
ChrisHegarty merged PR #12768: URL: https://github.com/apache/lucene/pull/12768

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-13 Thread via GitHub
slow-J commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1807962044 I ran a new luceneutil benchmark on Saturday with my commit https://github.com/apache/lucene/commit/8ae598bae593e1faa4ff82a87f4cd45f120f1059 (using Lucene99PostingsFormat) as candidate an

Re: [PR] Fix CheckIndex to detect major corruption with old (not the latest) commit point [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12530: URL: https://github.com/apache/lucene/pull/12530#discussion_r1390999456 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -610,6 +610,39 @@ public Status checkIndex(List onlySegments, ExecutorService executorServ

Re: [PR] Fix CheckIndex to detect major corruption with old (not the latest) commit point [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12530: URL: https://github.com/apache/lucene/pull/12530#issuecomment-1808000711 I think this PR is ready. Note that its scope is "only" to catch cases where the last commit point succeeds, but older commit points have problems. This case was previously passing

[PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-13 Thread via GitHub
gf2121 opened a new pull request, #12800: URL: https://github.com/apache/lucene/pull/12800 **Description** In https://github.com/apache/lucene/pull/12114, we had great numbers for LSB radix sorter when sorting random docs in `SortingDocsEnum`. But we cannot take advantage of the LS
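To make the LSB radix sort idea concrete, here is a minimal Java sketch that sorts non-negative doc IDs one byte at a time, least significant byte first. It is not the PR's `LSBRadixSorter`; the class name `LsbRadixSortSketch` and the 8-bit digit width are assumptions for illustration.

```java
/** Hypothetical sketch: LSB (least-significant-digit-first) radix sort for non-negative int doc IDs. */
final class LsbRadixSortSketch {
  private LsbRadixSortSketch() {}

  /** Sorts the first {@code len} entries of {@code docs} in place; assumes every value is >= 0. */
  static void sort(int[] docs, int len) {
    int[] src = docs;
    int[] dst = new int[len];
    int max = 0;
    for (int i = 0; i < len; i++) {
      max |= docs[i]; // OR of all values: its highest set bit bounds how many byte passes are needed
    }
    // 8-bit digits: at most 4 passes for 32-bit ints, fewer when all doc IDs are small.
    for (int shift = 0; shift < 32 && (max >>> shift) != 0; shift += 8) {
      int[] counts = new int[257];
      for (int i = 0; i < len; i++) {
        counts[((src[i] >>> shift) & 0xFF) + 1]++; // histogram of the current byte
      }
      for (int b = 0; b < 256; b++) {
        counts[b + 1] += counts[b]; // prefix sums: counts[d] = start offset of bucket d
      }
      for (int i = 0; i < len; i++) {
        dst[counts[(src[i] >>> shift) & 0xFF]++] = src[i]; // stable scatter into buckets
      }
      int[] tmp = src; // swap buffers: this pass's output is the next pass's input
      src = dst;
      dst = tmp;
    }
    if (src != docs) {
      System.arraycopy(src, 0, docs, 0, len);
    }
  }
}
```

Each pass is a stable counting sort, so later (more significant) passes preserve the order established by earlier ones; the early exit on `max` skips high-byte passes when every doc ID fits in fewer bytes.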

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1808013188 @shubhamvishu can we close this one? Any other things to try? For some reason, FST building of enwiki terms seems not to like this magical hashing...
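For readers unfamiliar with hash mixing: the goal is to scramble a hash's bits so that masking off the low bits for a power-of-two table still depends on all input bits. The sketch below uses the well-known MurmurHash3 `fmix64` finalizer as one example of such mixing; it is not necessarily the function tried in #12716, and `HashMixSketch`/`slot` are hypothetical names.

```java
/** Hypothetical sketch: MurmurHash3's 64-bit finalizer used to spread bits before masking. */
final class HashMixSketch {
  private HashMixSketch() {}

  /** MurmurHash3 fmix64: alternating xor-shifts and multiplications avalanche the input bits. */
  static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** Maps a hash to a slot in a table whose size is a power of two by masking the mixed value. */
  static int slot(long hash, int tableSize) {
    return (int) (fmix64(hash) & (tableSize - 1));
  }
}
```

Without mixing, keys that differ only in their high bits (for example `i << 32`) would all mask to slot 0; after `fmix64` they spread across the table. Whether that helps a particular workload, such as the FST node hash here, is an empirical question.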

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12699: URL: https://github.com/apache/lucene/pull/12699#discussion_r1391017311 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -484,15 +487,15 @@ public boolean seekExact(BytesRef target) throws

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391027915 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermsIndexBuilder.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1808041346 > Thanks for the tips! Yes, almost there. I'm working on the real compact bitpacker and unpacker. I still need to implement the PostingFormat afterwards. Do you think I need to implem

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-13 Thread via GitHub
jpountz commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1808043672 Could you check in your benchmark under `lucene/benchmark-jmh` so that we could play with it?

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391032959 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene99/randomaccess/TermStateCodec.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-13 Thread via GitHub
gf2121 commented on code in PR #12699: URL: https://github.com/apache/lucene/pull/12699#discussion_r1391055842 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java: ## @@ -118,13 +118,11 @@ long readVLongOutput(DataInput in) throws IOException {

Re: [I] Can FST read bytes forward? [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on issue #12355: URL: https://github.com/apache/lucene/issues/12355#issuecomment-1808084756 +1 to find a way to reverse the bytes at compilation time. The reversal of bytes during FST compilation is so hard to think about! It happens because the FST is logically
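A toy model may help with the byte-order bookkeeping discussed here. If each node's bytes are reversed in place right after they are written, a reader that starts at the node's last byte and walks toward lower addresses sees the bytes in their original order. The sketch below assumes that simplified layout; `ReverseBytesSketch` and its methods are hypothetical and are not Lucene's `BytesStore` or `FSTCompiler` code.

```java
/** Hypothetical toy model: reversing a node's bytes at write time lets a reader walk them backward. */
final class ReverseBytesSketch {
  private ReverseBytesSketch() {}

  /** Reverses buf[start, end) in place, as a writer might do right after emitting one node. */
  static void reverse(byte[] buf, int start, int end) {
    for (int i = start, j = end - 1; i < j; i++, j--) {
      byte tmp = buf[i];
      buf[i] = buf[j];
      buf[j] = tmp;
    }
  }

  /** Reads len bytes starting at pos (inclusive) and moving toward lower addresses. */
  static byte[] readBackward(byte[] buf, int pos, int len) {
    byte[] out = new byte[len];
    for (int i = 0; i < len; i++) {
      out[i] = buf[pos - i];
    }
    return out;
  }
}
```

Reading forward would remove the need for the reversal step, which is the trade-off this issue explores.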

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-13 Thread via GitHub
gf2121 commented on code in PR #12699: URL: https://github.com/apache/lucene/pull/12699#discussion_r1391060906 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java: ## @@ -142,12 +138,20 @@ public void setState(int state) { } v

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-13 Thread via GitHub
gf2121 commented on code in PR #12699: URL: https://github.com/apache/lucene/pull/12699#discussion_r1391079102 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -1190,4 +1176,65 @@ public void seekExact(long ord) { public long or

Re: [I] surpriseMePolygon and createRegularPolygon in test util class returns invalid polygon [lucene]

2023-11-13 Thread via GitHub
stefanvodita commented on issue #12596: URL: https://github.com/apache/lucene/issues/12596#issuecomment-1808119452 Another edge case that can cause problems is that of multiple points on the same line (e.g. 3 points in a row with the same x coordinate). This happens for regular and for "sur
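The edge case described here, three consecutive vertices lying on one line, can be detected with a cross-product test. A minimal Java sketch, assuming the ring is passed without its closing vertex repeated and using exact comparison (real geometry code may need a tolerance); `CollinearCheckSketch` is a hypothetical helper, not part of Lucene's test utilities.

```java
/** Hypothetical sketch: detect three consecutive polygon vertices that lie on one line. */
final class CollinearCheckSketch {
  private CollinearCheckSketch() {}

  /** Twice the signed area of triangle (a, b, c); zero means the three points are collinear. */
  static double cross(double ax, double ay, double bx, double by, double cx, double cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
  }

  /**
   * Returns true if any three consecutive vertices of the ring are collinear, wrapping around.
   * Assumes xs/ys do NOT repeat the first vertex at the end; uses exact (== 0.0) comparison.
   */
  static boolean hasCollinearRun(double[] xs, double[] ys) {
    int n = xs.length;
    for (int i = 0; i < n; i++) {
      int j = (i + 1) % n;
      int k = (i + 2) % n;
      if (cross(xs[i], ys[i], xs[j], ys[j], xs[k], ys[k]) == 0.0) {
        return true;
      }
    }
    return false;
  }
}
```

For example, the ring (0,0), (0,1), (0,2), (1,1) has three points in a row with the same x coordinate and is flagged, while a unit square is not.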

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391064549 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -359,7 +383,9 @@ public void truncate(long newLen) { assert newLen == getPosition();

Re: [I] Can FST read bytes forward? [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on issue #12355: URL: https://github.com/apache/lucene/issues/12355#issuecomment-1808180025 Looking at https://github.com/BurntSushi/fst/blob/master/src/raw/node.rs, it seems Tantivy also reads bytes backward. However, this Node class only works with byte arrays. I think

Re: [PR] Fix CheckIndex to detect major corruption with old (not the latest) commit point [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12530: URL: https://github.com/apache/lucene/pull/12530#issuecomment-1808188829 Hmm the Gradle Precommit Checks failed with: ``` * What went wrong: Execution failed for task ':lucene:documentation:createDocumentationIndex'. > Out of space in CodeCac

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12800: URL: https://github.com/apache/lucene/pull/12800#discussion_r1391137486 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocSorterBenchmark.java: ## @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12800: URL: https://github.com/apache/lucene/pull/12800#discussion_r1391138011 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocSorterBenchmark.java: ## @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12800: URL: https://github.com/apache/lucene/pull/12800#issuecomment-1808206830 I wonder whether `Arrays.sort` might be a good choice instead of making our own powerful sorting classes? [OpenJDK is (gradually?) taking advantage of fast SIMD sorting](https://gith

Re: [I] Would SIMD powered sort (on top of Panama) be worth it? [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on issue #12399: URL: https://github.com/apache/lucene/issues/12399#issuecomment-1808206348 At least OpenJDK 22 on modern-ish Intel x86-64 CPUs will [sometimes use SIMD for fast `Arrays.sort`](https://bugs.openjdk.org/browse/JDK-8309130)!

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1808208970 > I ran a new luceneutil benchmark on Saturday with my commit [8ae598b](https://github.com/apache/lucene/commit/8ae598bae593e1faa4ff82a87f4cd45f120f1059) (using Lucene99PostingsFormat

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391146752 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-13 Thread via GitHub
gf2121 commented on code in PR #12800: URL: https://github.com/apache/lucene/pull/12800#discussion_r1391150384 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocSorterBenchmark.java: ## @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] [DISCUSS] Should we change TieredMergePolicy's segment deletion accounting to use numDocs in the denominator rather than MaxDoc? [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on issue #12792: URL: https://github.com/apache/lucene/issues/12792#issuecomment-1808225864 +1, I like that interpretation @jpountz. @yugushihuang maybe we could improve the javadocs to express the formula and @jpountz's sentiment about it?

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12715: URL: https://github.com/apache/lucene/pull/12715#discussion_r1391162919 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -125,8 +125,11 @@ public class FSTCompiler { /** * Instantiates an FST/FSA builder

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391165022 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391172089 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizingF

Re: [I] Port PR management bot from Apache Beam [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on issue #12796: URL: https://github.com/apache/lucene/issues/12796#issuecomment-1808250719 Maybe we could start super simple here, e.g. adding labels (hmm, how?), and commenting on PRs that are becoming stale? I don't really like the auto-closing of very stale PRs ...

Re: [I] Port PR management bot from Apache Beam [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on issue #12796: URL: https://github.com/apache/lucene/issues/12796#issuecomment-1808252953 Where does this Apache Beam bot actually live? Is it a GitHub action? Do you have a link to its sources @stefanvodita?

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-13 Thread via GitHub
slow-J commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1808258638 > > I ran a new luceneutil benchmark on Saturday with my commit [8ae598b](https://github.com/apache/lucene/commit/8ae598bae593e1faa4ff82a87f4cd45f120f1059) (using Lucene99PostingsFormat)

Re: [I] Port PR management bot from Apache Beam [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on issue #12796: URL: https://github.com/apache/lucene/issues/12796#issuecomment-1808259142 Oooh I see some interesting GH action sources in Beam e.g. https://github.com/apache/beam/blob/master/.github/workflows/pr-bot-new-prs.yml

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391183424 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizingF

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-13 Thread via GitHub
benwtrent commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1808282034 @mayya-sharipova two important measurements we need to check here: - When comparing baseline & candidate, can the `candidate` get to higher recall than baseline with lower laten

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808286805 > I've got my framework set up for testing larger than memory indexes and have some somewhat interesting first results. Thank you for setting this up @kevindrosendahl -- th

Re: [PR] [Minor] Improvements to slice pools [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12795: URL: https://github.com/apache/lucene/pull/12795

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-13 Thread via GitHub
jpountz commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1808314262 At least in theory, group varint could be made faster than vints even with single-byte integers, because a single check on `flag == 0` would tell us that all 4 integers have a single byt
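The `flag == 0` fast path is easiest to see in code. Below is a minimal Java sketch of one common group-varint layout, assuming a flag byte holding four 2-bit (length minus 1) codes followed by the four values in little-endian order. It is not Lucene's implementation, and `GroupVIntSketch` is a hypothetical name.

```java
/** Hypothetical group-varint sketch: one flag byte with four 2-bit (length - 1) codes, then values. */
final class GroupVIntSketch {
  private GroupVIntSketch() {}

  /** Number of bytes (1..4) needed to store v as an unsigned little-endian int. */
  static int numBytes(int v) {
    int bits = 32 - Integer.numberOfLeadingZeros(v | 1); // v | 1 so that 0 still takes 1 byte
    return (bits + 7) >>> 3;
  }

  /** Encodes exactly 4 values into out starting at pos; returns the position after the group. */
  static int encode(int[] values, byte[] out, int pos) {
    int flagPos = pos++;
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[i];
      int n = numBytes(v);
      flag |= (n - 1) << (i * 2); // 2-bit length code for value i
      for (int b = 0; b < n; b++) {
        out[pos++] = (byte) (v >>> (8 * b)); // little-endian bytes
      }
    }
    out[flagPos] = (byte) flag;
    return pos;
  }

  /** Decodes 4 values from in starting at pos into dst; returns the position after the group. */
  static int decode(byte[] in, int pos, int[] dst) {
    int flag = in[pos++] & 0xFF;
    if (flag == 0) {
      // Fast path: all four length codes are 0, so each value is exactly one byte.
      dst[0] = in[pos] & 0xFF;
      dst[1] = in[pos + 1] & 0xFF;
      dst[2] = in[pos + 2] & 0xFF;
      dst[3] = in[pos + 3] & 0xFF;
      return pos + 4;
    }
    for (int i = 0; i < 4; i++) {
      int n = ((flag >>> (i * 2)) & 0x3) + 1;
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      dst[i] = v;
    }
    return pos;
  }
}
```

With classic vints, each of the four single-byte values needs its own continuation-bit check; here one comparison on the flag byte covers the whole group.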

Re: [PR] Fix NFAQuery in TestRegexpRandom2 [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12793: URL: https://github.com/apache/lucene/pull/12793#issuecomment-1808325688 > I didn't realize our random searcher will use threadpool randomly, fixed it to use a rewrite method that will not do concurrent rewrite Ahh, sneaky. Does this mean users must

Re: [I] [DISCUSS] Should we change TieredMergePolicy's segment deletion accounting to use numDocs in the denominator rather than MaxDoc? [lucene]

2023-11-13 Thread via GitHub
vigyasharma commented on issue #12792: URL: https://github.com/apache/lucene/issues/12792#issuecomment-1808328118 > There will be scenarios where developers expect a segment deletion pct to be `delCount / (maxDoc-delCount)` and this accounting seems more realistic than current accounting.
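The two denominators under discussion can be compared directly. A minimal Java sketch; the class and method names are hypothetical, `numDocs` is taken to mean `maxDoc - delCount` as in the issue title, and this is not `TieredMergePolicy` code.

```java
/** Hypothetical sketch: two ways to express a segment's deletion percentage. */
final class DeletesPctSketch {
  private DeletesPctSketch() {}

  /** Deleted docs as a share of all docs written to the segment (maxDoc in the denominator). */
  static double pctOfMaxDoc(int delCount, int maxDoc) {
    return 100.0 * delCount / maxDoc;
  }

  /** Deleted docs relative to live docs (numDocs = maxDoc - delCount in the denominator). */
  static double pctOfNumDocs(int delCount, int maxDoc) {
    int numDocs = maxDoc - delCount;
    return 100.0 * delCount / numDocs; // double division: Infinity when numDocs == 0
  }
}
```

With maxDoc = 100 and delCount = 20, the maxDoc-based figure is 20% and the numDocs-based figure is 25%. At delCount = 50 the latter already reads 100%, and it has no upper bound as live docs approach zero, which is worth spelling out in any javadoc that adopts it.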

Re: [PR] Minor change to IndexOrDocValuesQuery#toString [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12791: URL: https://github.com/apache/lucene/pull/12791

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-13 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808340050 Thank you @kevindrosendahl this does seem to confirm my suspicion that the improvement isn't necessarily due to the data structure, but due to quantization. But, this does confuse

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391247632 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to

Re: [PR] Cache buckets to speed up BytesRefHash#sort [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12784: URL: https://github.com/apache/lucene/pull/12784#issuecomment-1808361557 Did we see any bump in nightly benchmarks? This should make initial segment flush when there are many terms in an inverted field faster?

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #240: URL: https://github.com/apache/lucene/pull/240#discussion_r1391263968 ## lucene/core/src/java/org/apache/lucene/search/TopFieldCollectorManager.java: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #240: URL: https://github.com/apache/lucene/pull/240#discussion_r1391264578 ## lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/SearchWithCollectorTask.java: ## @@ -45,20 +43,6 @@ public boolean withCollector() { return t
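PR #240 moves callers from `IndexSearcher#search(Query, Collector)` to `IndexSearcher#search(Query, CollectorManager)`. Lucene's `CollectorManager` creates one collector per index slice via `newCollector()` and merges them with `reduce(Collection)`. The self-contained sketch below mimics that contract with hypothetical types (`SliceCollector`, `SliceCollectorManager`, `CountingManager`) and a plain thread pool; it is not Lucene's `IndexSearcher` code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

/** Hypothetical per-slice collector (not Lucene's Collector interface). */
interface SliceCollector {
  void collect(int doc);
}

/** Hypothetical manager mirroring the newCollector()/reduce() shape of Lucene's CollectorManager. */
interface SliceCollectorManager<C extends SliceCollector, T> {
  C newCollector();

  T reduce(Collection<C> collectors);
}

/** Hypothetical manager: each collector counts hits, reduce() sums the per-slice counts. */
final class CountingManager implements SliceCollectorManager<CountingManager.Counter, Integer> {
  static final class Counter implements SliceCollector {
    int count;

    @Override
    public void collect(int doc) {
      count++;
    }
  }

  @Override
  public Counter newCollector() {
    return new Counter();
  }

  @Override
  public Integer reduce(Collection<Counter> collectors) {
    int total = 0;
    for (Counter c : collectors) {
      total += c.count;
    }
    return total;
  }
}

final class CollectorManagerSketch {
  private CollectorManagerSketch() {}

  /** Runs each slice on its own thread with a fresh collector, then reduces all collectors. */
  static <C extends SliceCollector, T> T search(
      List<int[]> slices, IntPredicate matches, SliceCollectorManager<C, T> manager) {
    ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, slices.size()));
    try {
      List<C> collectors = new ArrayList<>();
      List<Future<?>> futures = new ArrayList<>();
      for (int[] slice : slices) {
        C collector = manager.newCollector(); // one collector per slice, so no shared mutable state
        collectors.add(collector);
        futures.add(
            executor.submit(
                () -> {
                  for (int doc : slice) {
                    if (matches.test(doc)) {
                      collector.collect(doc);
                    }
                  }
                }));
      }
      for (Future<?> future : futures) {
        future.get(); // waits for the slice; also makes the worker's writes visible here
      }
      return manager.reduce(collectors);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      executor.shutdown();
    }
  }
}
```

Because every slice gets its own collector, nothing is shared between threads until `reduce`, which is the property that makes the manager-based API safe for concurrent search.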

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391272833 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -359,7 +383,9 @@ public void truncate(long newLen) { assert newLen == getPosition();

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391285510 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizingF

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12786: URL: https://github.com/apache/lucene/pull/12786#discussion_r1391299752 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -289,21 +273,38 @@ public long getNodeAddress(long hashSlot) { } /** - * Set

Re: [PR] Cache buckets to speed up BytesRefHash#sort [lucene]

2023-11-13 Thread via GitHub
gf2121 commented on PR #12784: URL: https://github.com/apache/lucene/pull/12784#issuecomment-1808443275 Thanks for tracking in ! @mikemccand > Did we see any bump in nightly benchmarks? I would expect this change more likely bring some improvements for flushing high cardinalit

Re: [PR] javadocs cleanup in Lucene99PostingsFormat [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12776: URL: https://github.com/apache/lucene/pull/12776

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12295: URL: https://github.com/apache/lucene/pull/12295#discussion_r1391341171 ## lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/morfologik/MorphosyntacticTagsAttributeImpl.java: ## @@ -52,8 +52,8 @@ public void clear() {
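For context on this PR: Java 16 finalized pattern matching for `instanceof` (JEP 394), which folds the type test and the cast into one expression. A minimal before/after sketch on a hypothetical class; it is not code from the PR.

```java
/** Hypothetical class showing the instanceof rewrite (requires Java 16+). */
final class PatternMatchPoint {
  final int x;
  final int y;

  PatternMatchPoint(int x, int y) {
    this.x = x;
    this.y = y;
  }

  /** Before: an instanceof test followed by an explicit cast. */
  boolean equalsOld(Object other) {
    if (other instanceof PatternMatchPoint) {
      PatternMatchPoint p = (PatternMatchPoint) other;
      return x == p.x && y == p.y;
    }
    return false;
  }

  /** After: the pattern variable p is in scope only where the test has succeeded. */
  @Override
  public boolean equals(Object other) {
    return other instanceof PatternMatchPoint p && x == p.x && y == p.y;
  }

  @Override
  public int hashCode() {
    return 31 * x + y;
  }
}
```

Both methods behave identically; the pattern form simply removes the redundant cast and the chance of casting to the wrong type.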

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-13 Thread via GitHub
jbellis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808489834 > recall actually improves when introducing pq, and only starts to decrease at a factor of 16 I would guess that either there is a bug or you happen to be testing with a real

Re: [PR] Fix NFAQuery in TestRegexpRandom2 [lucene]

2023-11-13 Thread via GitHub
zhaih commented on PR #12793: URL: https://github.com/apache/lucene/pull/12793#issuecomment-1808581207 @mikemccand It's just concurrent rewrite, I already have some javadoc warning about this: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/RegexpQuery.

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12786: URL: https://github.com/apache/lucene/pull/12786#issuecomment-1808651797 `Test2BFST` is happy: ``` The slowest tests (exceeding 500 ms) during this run:

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12786: URL: https://github.com/apache/lucene/pull/12786

[PR] Fix large 99 percentile match latency in Monitor during concurrent commits and purges [lucene]

2023-11-13 Thread via GitHub
daviscook477 opened a new pull request, #12801: URL: https://github.com/apache/lucene/pull/12801 ### Description Within Lucene Monitor, there is a thread contention issue that manifests as multi-second latencies in the `Monitor.match` function when it is called concurrently with `Monitor

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-13 Thread via GitHub
kevindrosendahl commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808717985 @benwtrent > Thank you @kevindrosendahl this does seem to confirm my suspicion that the improvement isn't necessarily due to the data structure, but due to quantization.

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-13 Thread via GitHub
slow-J commented on code in PR #12797: URL: https://github.com/apache/lucene/pull/12797#discussion_r1391501828 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3974,7 +3974,7 @@ public static class Options { boolean doExorcise = false; boolean do

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391564228 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermsIndexBuilder.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391566961 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene99/randomaccess/TermStateCodec.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391575474 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene99/randomaccess/TermStateCodecImpl.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-13 Thread via GitHub
uschindler commented on code in PR #12295: URL: https://github.com/apache/lucene/pull/12295#discussion_r1391562831 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PointsWriter.java: ## @@ -182,7 +181,7 @@ public void merge(MergeState mergeState) throws IOExcept

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-13 Thread via GitHub
uschindler commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-1808887454 P.S.: When pushing more changes, please **do not squash and force-push**! This makes reviewing harder. We squash later when merging, it is not needed to do this while develo

Re: [PR] Fix large 99 percentile match latency in Monitor during concurrent commits and purges [lucene]

2023-11-13 Thread via GitHub
daviscook477 commented on PR #12801: URL: https://github.com/apache/lucene/pull/12801#issuecomment-1809121843 Thank you for taking a look!

Re: [I] Always collect sparsely in TaxonomyFacets & switch to dense if there are enough unique labels [lucene]

2023-11-13 Thread via GitHub
gautamworah96 commented on issue #12576: URL: https://github.com/apache/lucene/issues/12576#issuecomment-1809159446 I am not actively working on this problem as of now. I am still in the process of figuring out what would be the correct thing to test/do here as a first step. Jotting down

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1384518777 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTWriter.java: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[PR] Remove FST constructors with DataInput for metadata [lucene]

2023-11-13 Thread via GitHub
dungba88 opened a new pull request, #12803: URL: https://github.com/apache/lucene/pull/12803 ### Description Remove FST constructors with DataInput for metadata, in favor of the new constructor with FSTMetadata

Re: [PR] Ensure DrillSidewaysScorer calls LeafCollector#finish on all sideways-dim FacetsCollectors [lucene]

2023-11-13 Thread via GitHub
gsmiller merged PR #12640: URL: https://github.com/apache/lucene/pull/12640

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391973476 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1373993909 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -287,9 +315,9 @@ public long getMappedStateCount() { return dedupHash == null ? 0 : no

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
zacharymorn commented on code in PR #240: URL: https://github.com/apache/lucene/pull/240#discussion_r1392083799 ## lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/SearchWithCollectorTask.java: ## @@ -45,20 +43,6 @@ public boolean withCollector() { return

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1809654589 > Looks great, thanks @zacharymorn! I just think we should revert the `lucene/benchmark` change that breaks testing of a custom collector for this first step ... Thanks @mikemccan