Re: [PR] [Minor] Improvements to slice pools [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12795: URL: https://github.com/apache/lucene/pull/12795 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-13 Thread via GitHub
jpountz commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1808314262 At least in theory, group varint could be made faster than vints even with single-byte integers, because a single check on `flag == 0` would tell us that all 4 integers have a single byt
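The `flag == 0` shortcut mentioned above can be illustrated with a minimal sketch. It assumes a common group-varint layout (one flag byte holding four 2-bit "length minus one" fields, followed by the four values in little-endian order); the class and method names are hypothetical and this is not the code from the PR.

```java
import java.util.Arrays;

/** Illustrative group-varint codec; a sketch under stated assumptions, not Lucene's implementation. */
final class GroupVarintSketch {

  /** Number of bytes (1..4) needed to store v when treated as unsigned. */
  static int numBytes(int v) {
    if ((v >>> 8) == 0) return 1;
    if ((v >>> 16) == 0) return 2;
    if ((v >>> 24) == 0) return 3;
    return 4;
  }

  /** Encodes 4 ints from values[off..off+3] into out at pos; returns the new position. */
  static int encodeGroup(int[] values, int off, byte[] out, int pos) {
    int flagPos = pos++; // reserve one byte for the flag
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[off + i];
      int n = numBytes(v);
      flag |= (n - 1) << (6 - 2 * i); // 2 bits per value, first value in the high bits
      for (int b = 0; b < n; b++) {
        out[pos++] = (byte) (v >>> (8 * b)); // little-endian bytes
      }
    }
    out[flagPos] = (byte) flag;
    return pos;
  }

  /** Decodes 4 ints from in at pos into dst[off..off+3]; returns the new position. */
  static int decodeGroup(byte[] in, int pos, int[] dst, int off) {
    int flag = in[pos++] & 0xFF;
    if (flag == 0) {
      // Fast path: one check tells us all four values occupy exactly one byte each.
      dst[off] = in[pos] & 0xFF;
      dst[off + 1] = in[pos + 1] & 0xFF;
      dst[off + 2] = in[pos + 2] & 0xFF;
      dst[off + 3] = in[pos + 3] & 0xFF;
      return pos + 4;
    }
    for (int i = 0; i < 4; i++) {
      int n = ((flag >>> (6 - 2 * i)) & 0x3) + 1;
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      dst[off + i] = v;
    }
    return pos;
  }

  public static void main(String[] args) {
    byte[] buf = new byte[17]; // worst case: 1 flag byte + 4 * 4 value bytes
    int[] decoded = new int[4];
    encodeGroup(new int[] {1, 300, 70000, -1}, 0, buf, 0);
    decodeGroup(buf, 0, decoded, 0);
    System.out.println(Arrays.toString(decoded));
  }
}
```

With plain vints each of the four values needs its own continuation-bit check, whereas here a single comparison covers the whole group; whether that is faster in practice is the open question in the comment.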

Re: [PR] Fix NFAQuery in TestRegexpRandom2 [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12793: URL: https://github.com/apache/lucene/pull/12793#issuecomment-1808325688 > I didn't realize our random searcher will use threadpool randomly, fixed it to use a rewrite method that will not do concurrent rewrite Ahh, sneaky. Does this mean users must

Re: [I] [DISCUSS] Should we change TieredMergePolicy's segment deletion accounting to use numDocs in the denominator rather than MaxDoc? [lucene]

2023-11-13 Thread via GitHub
vigyasharma commented on issue #12792: URL: https://github.com/apache/lucene/issues/12792#issuecomment-1808328118 > There will be scenario that developers expect a segment deletion pct to be `delCount / (maxDoc-delCount)` and this accounting seems more realistic than current accounting.
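To make the two accounting options concrete, here is a small arithmetic sketch. The class and method names are hypothetical; only the two formulas come from the discussion (`delCount / maxDoc` versus `delCount / (maxDoc-delCount)`, where `maxDoc - delCount` is the number of live documents).

```java
/** Compares the two deletion-percentage formulas from the discussion; names are illustrative only. */
final class DeletePctSketch {

  /** Deletes as a percentage of all documents in the segment, live or deleted. */
  static double pctOfMaxDoc(int delCount, int maxDoc) {
    return 100.0 * delCount / maxDoc;
  }

  /** Deletes as a percentage of live documents (maxDoc - delCount). */
  static double pctOfLiveDocs(int delCount, int maxDoc) {
    return 100.0 * delCount / (maxDoc - delCount);
  }

  public static void main(String[] args) {
    int maxDoc = 100;
    int delCount = 20;
    System.out.println(pctOfMaxDoc(delCount, maxDoc));   // 20.0
    System.out.println(pctOfLiveDocs(delCount, maxDoc)); // 25.0
  }
}
```

The second formula always reports a value at least as large as the first, and it grows without bound as a segment approaches fully deleted, so any threshold tuned for one accounting would likely need revisiting under the other.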

Re: [PR] Minor change to IndexOrDocValuesQuery#toString [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12791: URL: https://github.com/apache/lucene/pull/12791

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-13 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808340050 Thank you @kevindrosendahl this does seem to confirm my suspicion that the improvement isn't necessarily due to the data structure, but due to quantization. But, this does confuse

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391247632 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to

Re: [PR] Cache buckets to speed up BytesRefHash#sort [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12784: URL: https://github.com/apache/lucene/pull/12784#issuecomment-1808361557 Did we see any bump in nightly benchmarks? This should make initial segment flush when there are many terms in an inverted field faster?

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391172089 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizingF

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #240: URL: https://github.com/apache/lucene/pull/240#discussion_r1391263968 ## lucene/core/src/java/org/apache/lucene/search/TopFieldCollectorManager.java: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #240: URL: https://github.com/apache/lucene/pull/240#discussion_r1391264578 ## lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/SearchWithCollectorTask.java: ## @@ -45,20 +43,6 @@ public boolean withCollector() { return t

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391272833 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -359,7 +383,9 @@ public void truncate(long newLen) { assert newLen == getPosition();

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391247632 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391285510 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizingF

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12786: URL: https://github.com/apache/lucene/pull/12786#discussion_r1391299752 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -289,21 +273,38 @@ public long getNodeAddress(long hashSlot) { } /** - * Set

Re: [PR] Cache buckets to speed up BytesRefHash#sort [lucene]

2023-11-13 Thread via GitHub
gf2121 commented on PR #12784: URL: https://github.com/apache/lucene/pull/12784#issuecomment-1808443275 Thanks for tracking in ! @mikemccand > Did we see any bump in nightly benchmarks? I would expect this change more likely bring some improvements for flushing high cardinalit

Re: [PR] javadocs cleanup in Lucene99PostingsFormat [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12776: URL: https://github.com/apache/lucene/pull/12776

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on code in PR #12295: URL: https://github.com/apache/lucene/pull/12295#discussion_r1391341171 ## lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/morfologik/MorphosyntacticTagsAttributeImpl.java: ## @@ -52,8 +52,8 @@ public void clear() {
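For readers unfamiliar with the language feature this PR applies, the following is a generic before/after sketch of `instanceof` pattern matching (Java 16+). The class and method names are made up for illustration and are not taken from the PR's diff.

```java
/** Before/after illustration of instanceof pattern matching; not code from the PR. */
final class InstanceofSketch {

  /** Classic form: test the type, then cast in a separate statement. */
  static int lengthOld(Object o) {
    if (o instanceof String) {
      String s = (String) o;
      return s.length();
    }
    return -1;
  }

  /** Pattern-matching form: the test binds the variable s, so no explicit cast is needed. */
  static int lengthNew(Object o) {
    if (o instanceof String s) {
      return s.length();
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(lengthOld("lucene") == lengthNew("lucene"));
  }
}
```

Both methods behave identically; the second simply removes the redundant cast and the risk of the cast and the test drifting apart.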

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-13 Thread via GitHub
jbellis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808489834 > recall actually improves when introducing pq, and only starts to decrease at a factor of 16 I would guess that either there is a bug or you happen to be testing with a real

Re: [PR] Fix NFAQuery in TestRegexpRandom2 [lucene]

2023-11-13 Thread via GitHub
zhaih commented on PR #12793: URL: https://github.com/apache/lucene/pull/12793#issuecomment-1808581207 @mikemccand It's just concurrent rewrite, I already have some javadoc warn about this: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/RegexpQuery.

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-13 Thread via GitHub
mikemccand commented on PR #12786: URL: https://github.com/apache/lucene/pull/12786#issuecomment-1808651797 `Test2BFST` is happy: ``` The slowest tests (exceeding 500 ms) during this run:

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-13 Thread via GitHub
mikemccand merged PR #12786: URL: https://github.com/apache/lucene/pull/12786

[PR] Fix large 99 percentile match latency in Monitor during concurrent commits and purges [lucene]

2023-11-13 Thread via GitHub
daviscook477 opened a new pull request, #12801: URL: https://github.com/apache/lucene/pull/12801 ### Description Within Lucene Monitor, there is a thread contention issue that manifests as multi-second latencies in the `Monitor.match` function when it is called concurrently with `Monitor

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-13 Thread via GitHub
kevindrosendahl commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1808717985 @benwtrent > Thank you @kevindrosendahl this does seem to confirm my suspicion that the improvement isn't necessarily due to the data structure, but due to quantization.

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-13 Thread via GitHub
slow-J commented on code in PR #12797: URL: https://github.com/apache/lucene/pull/12797#discussion_r1391501828 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3974,7 +3974,7 @@ public static class Options { boolean doExorcise = false; boolean do

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391564228 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermsIndexBuilder.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391566961 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene99/randomaccess/TermStateCodec.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Random access term dictionary [lucene]

2023-11-13 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1391575474 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene99/randomaccess/TermStateCodecImpl.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-13 Thread via GitHub
uschindler commented on code in PR #12295: URL: https://github.com/apache/lucene/pull/12295#discussion_r1391562831 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PointsWriter.java: ## @@ -182,7 +181,7 @@ public void merge(MergeState mergeState) throws IOExcept

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2023-11-13 Thread via GitHub
uschindler commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-1808887454 P.S.: When pushing more changes, please **do not squash and force-push**! This makes reviewing harder. We squash later when merging, so there is no need to do this while develo

Re: [PR] Fix large 99 percentile match latency in Monitor during concurrent commits and purges [lucene]

2023-11-13 Thread via GitHub
daviscook477 commented on PR #12801: URL: https://github.com/apache/lucene/pull/12801#issuecomment-1809121843 Thank you for taking a look!

Re: [I] Always collect sparsely in TaxonomyFacets & switch to dense if there are enough unique labels [lucene]

2023-11-13 Thread via GitHub
gautamworah96 commented on issue #12576: URL: https://github.com/apache/lucene/issues/12576#issuecomment-1809159446 I am not actively working on this problem as of now. I am still in the process of figuring out what would be the correct thing to test/do here as a first step. Jotting down

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1384518777 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTWriter.java: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391272833 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -359,7 +383,9 @@ public void truncate(long newLen) { assert newLen == getPosition();

[PR] Remove FST constructors with DataInput for metadata [lucene]

2023-11-13 Thread via GitHub
dungba88 opened a new pull request, #12803: URL: https://github.com/apache/lucene/pull/12803 ### Description Remove FST constructors with DataInput for metadata, in favor of the new constructor with FSTMetadata

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391183424 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizingF

Re: [PR] Ensure DrillSidewaysScorer calls LeafCollector#finish on all sideways-dim FacetsCollectors [lucene]

2023-11-13 Thread via GitHub
gsmiller merged PR #12640: URL: https://github.com/apache/lucene/pull/12640

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391146752 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391973476 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1373993909 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -287,9 +315,9 @@ public long getMappedStateCount() { return dedupHash == null ? 0 : no

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-13 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391973476 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
zacharymorn commented on code in PR #240: URL: https://github.com/apache/lucene/pull/240#discussion_r1392083799 ## lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/SearchWithCollectorTask.java: ## @@ -45,20 +43,6 @@ public boolean withCollector() { return

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-13 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1809654589 > Looks great, thanks @zacharymorn! I just think we should revert the `lucene/benchmark` change that breaks testing of a custom collector for this first step ... Thanks @mikemccan

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1809743395 ### Benchmark Setup Sharing my benchmark setup for reproducibility in [this branch](https://github.com/kaivalnp/lucene/tree/similarity-benchmark) (see [this commit](https://gith

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392197630 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392201110 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392201110 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391973476 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [I] Unroll or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-14 Thread via GitHub
vsop-479 commented on issue #12788: URL: https://github.com/apache/lucene/issues/12788#issuecomment-1809865124 > To benchmark then use the benchmark-jmh Gradle module I measured max with scalar, unroll, vector implementation by benchmark-jmh: Benchmark
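As a rough illustration of two of the variants being compared, here is a sketch of an element-wise max over two `int[]` arrays in scalar and manually 4-way unrolled form. It assumes the hot loop has this element-wise shape; the class and method names are hypothetical, the vector variant is omitted, and this is not the benchmark code from the issue.

```java
/** Scalar vs. manually unrolled element-wise max; an illustrative sketch only. */
final class ElementwiseMaxSketch {

  /** Straightforward loop: dst[i] = max(dst[i], src[i]). Requires src.length >= dst.length. */
  static void maxScalar(int[] dst, int[] src) {
    for (int i = 0; i < dst.length; i++) {
      dst[i] = Math.max(dst[i], src[i]);
    }
  }

  /** Same result, processing four elements per iteration plus a scalar tail. */
  static void maxUnrolled(int[] dst, int[] src) {
    int i = 0;
    int bound = dst.length & ~3; // largest multiple of 4 not exceeding the length
    for (; i < bound; i += 4) {
      dst[i] = Math.max(dst[i], src[i]);
      dst[i + 1] = Math.max(dst[i + 1], src[i + 1]);
      dst[i + 2] = Math.max(dst[i + 2], src[i + 2]);
      dst[i + 3] = Math.max(dst[i + 3], src[i + 3]);
    }
    for (; i < dst.length; i++) { // leftover elements
      dst[i] = Math.max(dst[i], src[i]);
    }
  }
}
```

Both methods produce identical arrays; any speed difference depends on the JIT and hardware, which is why measuring with a harness such as JMH matters more than reasoning about the source shape.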

Re: [PR] Add downloading binutils instructions for the macos [lucene]

2023-11-14 Thread via GitHub
rmuir merged PR #12804: URL: https://github.com/apache/lucene/pull/12804

Re: [PR] Add downloading binutils instructions for the macos [lucene]

2023-11-14 Thread via GitHub
rmuir commented on PR #12804: URL: https://github.com/apache/lucene/pull/12804#issuecomment-1809976433 thank you @vsop-479 !

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-14 Thread via GitHub
easyice commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1810036730 Thank you @jpountz , I pushed the benchmark code, and added a new comparison between `ByteArrayDataInput` vs `ByteBufferIndexInput` . For `readVInt`, the `ByteBufferIndexInput` is a bit

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-14 Thread via GitHub
gf2121 commented on code in PR #12800: URL: https://github.com/apache/lucene/pull/12800#discussion_r1392571640 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocSorterBenchmark.java: ## @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
mikemccand merged PR #12715: URL: https://github.com/apache/lucene/pull/12715

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12715: URL: https://github.com/apache/lucene/pull/12715#discussion_r1392585037 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -125,8 +125,11 @@ public class FSTCompiler { /** * Instantiates an FST/FSA builder

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392585398 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizin

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on PR #12715: URL: https://github.com/apache/lucene/pull/12715#issuecomment-1810209410 Thank you @cavorite! Much cleaner to use a consistent API for building FSTs...

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392595332 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar t

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392598634 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392603399 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [I] Make FST BytesStore grow smoothly [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on issue #12619: URL: https://github.com/apache/lucene/issues/12619#issuecomment-1810237337 Note that `oal.store.ByteBuffersDataOutput` takes a different and neat approach to gracefully growing: it picks an initial block size, and appends new blocks as you write bytes,
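The part of the approach that is visible in the snippet above (pick a block size, append a new block whenever the current one fills up) can be sketched as follows. This is a simplified stand-in with hypothetical names, not the actual `ByteBuffersDataOutput` code, and it does not attempt to cover whatever the truncated remainder of the comment describes.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal block-appending byte buffer; a sketch, not Lucene's ByteBuffersDataOutput. */
final class BlockAppendingBuffer {
  private final int blockSize;
  private final List<byte[]> blocks = new ArrayList<>();
  private int upto; // write position inside the last block

  BlockAppendingBuffer(int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    this.blockSize = blockSize;
    this.upto = blockSize; // forces allocation of the first block on the first write
  }

  /** Appends one byte, allocating a fresh block when the current one is full. */
  void writeByte(byte b) {
    if (upto == blockSize) {
      blocks.add(new byte[blockSize]); // existing blocks are never copied or resized
      upto = 0;
    }
    blocks.get(blocks.size() - 1)[upto++] = b;
  }

  /** Total number of bytes written so far. */
  long size() {
    return blocks.isEmpty() ? 0 : (long) (blocks.size() - 1) * blockSize + upto;
  }

  int blockCount() {
    return blocks.size();
  }

  /** Random-access read of a previously written byte. */
  byte readByte(long pos) {
    return blocks.get((int) (pos / blockSize))[(int) (pos % blockSize)];
  }
}
```

The design point is that growth costs one fixed-size allocation at a time, in contrast to a single array that must be reallocated and copied each time it grows.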

Re: [PR] Fix large 99 percentile match latency in Monitor during concurrent commits and purges [lucene]

2023-11-14 Thread via GitHub
romseygeek merged PR #12801: URL: https://github.com/apache/lucene/pull/12801

[PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
mikemccand opened a new pull request, #12805: URL: https://github.com/apache/lucene/pull/12805 I noticed yet another errant `;` and then grep'd and found tons of them and removed them. Note that it was a bit tricky because some lines that have only whitespace and a semicolon are actu

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392645763 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar t

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392647780 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
uschindler commented on code in PR #12805: URL: https://github.com/apache/lucene/pull/12805#discussion_r1392688486 ## lucene/queryparser/src/java/org/apache/lucene/queryparser/surround/parser/QueryParser.java: ## @@ -174,7 +174,6 @@ protected SrndQuery getTruncQuery(String trunc

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
uschindler commented on code in PR #12805: URL: https://github.com/apache/lucene/pull/12805#discussion_r1392697633 ## lucene/queryparser/src/java/org/apache/lucene/queryparser/surround/parser/QueryParser.java: ## @@ -174,7 +174,6 @@ protected SrndQuery getTruncQuery(String trunc

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392710843 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392710843 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12805: URL: https://github.com/apache/lucene/pull/12805#discussion_r1392935143 ## lucene/queryparser/src/java/org/apache/lucene/queryparser/surround/parser/QueryParser.java: ## @@ -174,7 +174,6 @@ protected SrndQuery getTruncQuery(String trunc

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
mikemccand merged PR #12805: URL: https://github.com/apache/lucene/pull/12805

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-14 Thread via GitHub
slow-J commented on PR #12797: URL: https://github.com/apache/lucene/pull/12797#issuecomment-1810872291 > ```java > msg(infoStream, "Skipping logical integrity checks: pass -ea -slow to check logical integrity") > ``` I tried and this is not possible to do, because while runni

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-14 Thread via GitHub
slow-J commented on code in PR #12797: URL: https://github.com/apache/lucene/pull/12797#discussion_r1393046003 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3974,7 +3974,7 @@ public static class Options { boolean doExorcise = false; boolean do

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-14 Thread via GitHub
slow-J commented on code in PR #12797: URL: https://github.com/apache/lucene/pull/12797#discussion_r1393046003 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3974,7 +3974,7 @@ public static class Options { boolean doExorcise = false; boolean do

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-14 Thread via GitHub
slow-J commented on code in PR #12797: URL: https://github.com/apache/lucene/pull/12797#discussion_r1393115663 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -4051,15 +4057,30 @@ public static Options parseOptions(String[] args) { int i = 0; whi

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1811057058 Keeping the `visitLimit` = 0 (immediately fallback to lazy iterator) we expect an exact search to be performed (and `recall` = 1) as soon as the first node is visited (`numVisited` = 1)

Re: [PR] Utilize exact kNN search when gathering k > numVectors in a segment [lucene]

2023-11-14 Thread via GitHub
benwtrent commented on code in PR #12806: URL: https://github.com/apache/lucene/pull/12806#discussion_r1393164542 ## lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java: ## @@ -779,6 +781,16 @@ Directory getIndexStore( doc.add(getKnnVectorField(f

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1393228359 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Close all files when hitting an I/O exception with vectors. [lucene]

2023-11-14 Thread via GitHub
benwtrent commented on code in PR #12807: URL: https://github.com/apache/lucene/pull/12807#discussion_r1393328067 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -92,18 +92,8 @@ public final class Lucene99HnswVectorsReader extends

Re: [PR] Close all files when hitting an I/O exception with vectors. [lucene]

2023-11-14 Thread via GitHub
jpountz commented on code in PR #12807: URL: https://github.com/apache/lucene/pull/12807#discussion_r1393332881 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -92,18 +92,8 @@ public final class Lucene99HnswVectorsReader extends K

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler merged PR #12808: URL: https://github.com/apache/lucene/pull/12808

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12808: URL: https://github.com/apache/lucene/pull/12808#issuecomment-1811439713 Merged to lucene/branch_9x, solr/main + solr/branch_9x

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-14 Thread via GitHub
Shibi-bala commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1811494543 Kind of confused why this check is failing. This was never changed and I've tried merging. 1``` . ERROR in /home/runner/work/lucene/lucene/lucene/test-framework/src/java/org

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1811511529 Have you merged in the latest main branch, so this PR is up to date? This could be an issue which already existed when the PR was created.

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12808: URL: https://github.com/apache/lucene/pull/12808#issuecomment-1811585147 Hi @dweiss, The issue with errorprone is exactly the same as the one we have seen for turbocharger of Java options: https://github.com/gradle/gradle/issues/22746 The new version of

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393461923 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -247,16 +306,14 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12808: URL: https://github.com/apache/lucene/pull/12808#issuecomment-1811586309 See also https://github.com/apache/beam/pull/24930

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393462969 ## lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java: ## @@ -64,22 +66,13 @@ public FSTStore init(DataInput in, long numBytes) throws IOException {

Re: [I] Can FST read bytes forward? [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on issue #12355: URL: https://github.com/apache/lucene/issues/12355#issuecomment-1811598793 > reverse byte[] after writing them all Interestingly, we specifically reverse the byte[] after the write to make it backward. To make it forward we can simply *not* do th

Re: [I] Make FST BytesStore grow smoothly [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on issue #12619: URL: https://github.com/apache/lucene/issues/12619#issuecomment-1811613353 In https://github.com/apache/lucene/pull/12624, I moved the main FST body out of `BytesStore` into `ByteBuffersDataOutput`, and BytesStore becomes only a single `byte[]` for the cu

Re: [I] Unroll or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-14 Thread via GitHub
vsop-479 closed issue #12788: Unroll or vectorize Math.max in CompetitiveImpactAccumulator.addAll? URL: https://github.com/apache/lucene/issues/12788

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393547261 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393461923 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -247,16 +306,14 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
cavorite commented on code in PR #12715: URL: https://github.com/apache/lucene/pull/12715#discussion_r1393698430 ## lucene/CHANGES.txt: ## @@ -7,6 +7,8 @@ http://s.apache.org/luceneversions API Changes - +* GITHUB-12695: Deprecated public constructor of F

Re: [PR] Close all files when hitting an I/O exception with vectors. [lucene]

2023-11-14 Thread via GitHub
jpountz merged PR #12807: URL: https://github.com/apache/lucene/pull/12807

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-15 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1811972923 > Hi @mikemccand @jpountz @javanna @gsmiller , I have updated this PR to pick up the latest from `main`, as well as revert some changes to save them for follow-up PRs that address other

[I] Simplifying TextAreaPrintStream in Luke [lucene]

2023-11-15 Thread via GitHub
picimako opened a new issue, #12809: URL: https://github.com/apache/lucene/issues/12809 ### Description Hi, I've been looking into how [`org.apache.lucene.luke.app.desktop.util.TextAreaPrintStream`](https://github.com/apache/lucene/blob/main/lucene/luke/src/java/org/apache/luce

[PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-15 Thread via GitHub
jpountz opened a new pull request, #12810: URL: https://github.com/apache/lucene/pull/12810 Currently `advance(int target)` needs to perform two checks: - is there a need to use skip lists? - is there a need for decoding a new block? Ideally we would track the last doc ID in a
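
The two checks mentioned in the description above can be sketched with a toy, block-based postings list. This is only an illustration under stated assumptions, not Lucene's actual postings format or the code in PR #12810: the class `ToyPostings`, its fields, the block size of 4, and the per-block "last doc ID" array standing in for skip lists are all hypothetical.

```java
import java.util.Arrays;

/** Toy block-based postings list (hypothetical; not Lucene's implementation). */
final class ToyPostings {
  static final int BLOCK_SIZE = 4; // assumed block size, chosen for readability
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;          // sorted doc IDs, conceptually stored in blocks
  private final int[] blockLastDoc;  // stand-in for skip data: last doc ID of each block
  private int block = -1;            // index of the currently decoded block, -1 if none
  private int[] buffer = new int[0]; // decoded doc IDs of the current block
  private int posInBlock = 0;        // cursor inside the decoded block
  int blocksDecoded = 0;             // counter, only here to make the behavior observable

  ToyPostings(int[] sortedDocs) {
    docs = sortedDocs.clone();
    int numBlocks = (docs.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
    blockLastDoc = new int[numBlocks];
    for (int b = 0; b < numBlocks; b++) {
      blockLastDoc[b] = docs[Math.min(docs.length, (b + 1) * BLOCK_SIZE) - 1];
    }
  }

  private void decodeBlock(int b) {
    int from = b * BLOCK_SIZE;
    buffer = Arrays.copyOfRange(docs, from, Math.min(docs.length, from + BLOCK_SIZE));
    block = b;
    posInBlock = 0;
    blocksDecoded++;
  }

  /** Returns the first doc ID >= target, or NO_MORE_DOCS if there is none. */
  int advance(int target) {
    int b = block;
    // Check 1: is the target past the current block, so that skip data is needed?
    if (b < 0 || target > blockLastDoc[b]) {
      b = Math.max(b, 0);
      while (b < blockLastDoc.length && blockLastDoc[b] < target) {
        b++; // skip whole blocks without decoding them
      }
      if (b == blockLastDoc.length) {
        return NO_MORE_DOCS;
      }
    }
    // Check 2: does a new block need to be decoded?
    if (b != block) {
      decodeBlock(b);
    }
    // The last doc of this block is >= target, so this scan stays in bounds.
    while (buffer[posInBlock] < target) {
      posInBlock++;
    }
    return buffer[posInBlock];
  }
}
```

In this sketch, advancing within the current block triggers neither the skip step nor a decode, while a far-away target skips intermediate blocks and decodes only the block that can contain it.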

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393547261 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-15 Thread via GitHub
jpountz commented on PR #12810: URL: https://github.com/apache/lucene/pull/12810#issuecomment-1812097551 This change seems to be neutral on wikibigall. No speedup, but no slowdown either. ``` TaskQPS baseline StdDevQPS my_modified_version Std
