Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-07 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1384518777 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTWriter.java: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-07 Thread via GitHub
gf2121 commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1798007152 > do we have any numbers if it actually helps applying deletes? I made a naive benchmark comparing `seekCeil` and `seekExact` when deleting terms on a field with bloom filter,

Re: [PR] Early terminate visit BKD leaf when current value greater than upper point in sorted dim. [lucene]

2023-11-07 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1798153626 @iverase I implemented visitWithSortedDim in LatLonPointDistanceQuery. Please take a look when you get a chance! BTW, the contents of getIntersectVisitor and getInverseIntersectVisito

Re: [PR] Remove BytesReader#reversed() [lucene]

2023-11-07 Thread via GitHub
mikemccand merged PR #12777: URL: https://github.com/apache/lucene/pull/12777 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [I] Remove `FST.BytesReader#reversed` method? [lucene]

2023-11-07 Thread via GitHub
mikemccand closed issue #12759: Remove `FST.BytesReader#reversed` method? URL: https://github.com/apache/lucene/issues/12759

Re: [I] Should we ban Random#nextInt(int, int)? [lucene]

2023-11-07 Thread via GitHub
mikemccand closed issue #12771: Should we ban Random#nextInt(int, int)? URL: https://github.com/apache/lucene/issues/12771

Re: [I] Should we ban Random#nextInt(int, int)? [lucene]

2023-11-07 Thread via GitHub
mikemccand commented on issue #12771: URL: https://github.com/apache/lucene/issues/12771#issuecomment-1798212632 Indeed, the javadocs have the same terrifying explanation -- thanks @dungba88!: `If bound is a power of two then limiting is a simple masking operation. Otherwise, the result is

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-07 Thread via GitHub
mikemccand commented on code in PR #12778: URL: https://github.com/apache/lucene/pull/12778#discussion_r1384692182 ## lucene/core/src/test/org/apache/lucene/util/TestByteBlockPool.java: ## @@ -25,7 +24,34 @@ public class TestByteBlockPool extends LuceneTestCase { - public

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-07 Thread via GitHub
mikemccand commented on code in PR #12778: URL: https://github.com/apache/lucene/pull/12778#discussion_r1384697358 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -234,6 +234,44 @@ public void append(final BytesRef bytes) { append(bytes.bytes, bytes

Re: [PR] Copy directly between 2 ByteBlockPool to avoid double-copy [lucene]

2023-11-07 Thread via GitHub
dungba88 commented on code in PR #12778: URL: https://github.com/apache/lucene/pull/12778#discussion_r1384749437 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -234,6 +234,44 @@ public void append(final BytesRef bytes) { append(bytes.bytes, bytes.o

Re: [PR] Enable executing using NFA in RegexpQuery [lucene]

2023-11-07 Thread via GitHub
rmuir commented on code in PR #12767: URL: https://github.com/apache/lucene/pull/12767#discussion_r1384795478 ## lucene/core/src/test/org/apache/lucene/search/TestRegexpQuery.java: ## @@ -80,7 +80,10 @@ private long caseInsensitiveRegexQueryNrHits(String regex) throws IOExcepti

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-07 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1798357836 @kevindrosendahl this looks really interesting! Thank you for digging in and starting the experimentation! I haven't had a chance to read your branch yet, but hope to soon.

Re: [PR] Stop exploring HNSW graph if scores are not getting better. [lucene]

2023-11-07 Thread via GitHub
benwtrent merged PR #12770: URL: https://github.com/apache/lucene/pull/12770

Re: [PR] Speed up BytesRefHash#sort [lucene]

2023-11-07 Thread via GitHub
gf2121 commented on PR #12775: URL: https://github.com/apache/lucene/pull/12775#issuecomment-1798365624 For realistic data, I indexed `wikimedium10m` with ramBuffer = 1024m and accumulated the time taken by all invocations of `BytesRefHash#sort`. The results show that the total time decreased from 27161

Re: [PR] Speed up BytesRefHash#sort [lucene]

2023-11-07 Thread via GitHub
dweiss commented on PR #12775: URL: https://github.com/apache/lucene/pull/12775#issuecomment-1798550955 Wow. Nice improvement!

Re: [PR] Speed up BytesRefHash#sort [lucene]

2023-11-07 Thread via GitHub
mikemccand commented on PR #12775: URL: https://github.com/apache/lucene/pull/12775#issuecomment-1798603970 This is incredible speedup :)

[PR] Fix nested link warning from renderSiteJavadoc [lucene]

2023-11-07 Thread via GitHub
slow-J opened a new pull request, #12779: URL: https://github.com/apache/lucene/pull/12779 Very minor change. ### Description Without this change, we are getting this warning in `> Task :lucene:core:renderSiteJavadoc` ``` /local/home/jslowins/upstream_bench/lucene_bench_

Re: [PR] Fix nested link warning from renderSiteJavadoc [lucene]

2023-11-07 Thread via GitHub
mikemccand commented on code in PR #12779: URL: https://github.com/apache/lucene/pull/12779#discussion_r1385198200 ## lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java: ## @@ -29,7 +29,7 @@ /** * Loader for text files that represent a list of stopwords. *

Re: [PR] Fix nested link warning from renderSiteJavadoc [lucene]

2023-11-07 Thread via GitHub
slow-J commented on code in PR #12779: URL: https://github.com/apache/lucene/pull/12779#discussion_r1385202002 ## lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java: ## @@ -29,7 +29,7 @@ /** * Loader for text files that represent a list of stopwords. * - *

Re: [PR] Fix nested link warning from renderSiteJavadoc [lucene]

2023-11-07 Thread via GitHub
mikemccand merged PR #12779: URL: https://github.com/apache/lucene/pull/12779

Re: [I] Explore partially decoding blocks (within-block skipping) [lucene]

2023-11-07 Thread via GitHub
Tony-X commented on issue #12749: URL: https://github.com/apache/lucene/issues/12749#issuecomment-1799460960 > How would it work? Since blocks are delta-coded, you can't know the value at a given index without decoding all previous values and computing their sum? Or you need to store some c
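
The quoted concern about delta-coded blocks can be illustrated with a minimal sketch. It is not Lucene's block format or API; the class and method names (`DeltaBlockSketch`, `encode`, `valueAt`) are hypothetical and only show why reading the value at one index requires summing every earlier delta.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's postings format.
final class DeltaBlockSketch {

  /** Encodes ascending values as the first value followed by the gaps between neighbours. */
  static int[] encode(int[] sorted) {
    int[] deltas = new int[sorted.length];
    int previous = 0;
    for (int i = 0; i < sorted.length; i++) {
      deltas[i] = sorted[i] - previous;
      previous = sorted[i];
    }
    return deltas;
  }

  /** Recovers the value at {@code index}: every delta up to and including it must be summed. */
  static int valueAt(int[] deltas, int index) {
    int sum = 0;
    for (int i = 0; i <= index; i++) {
      sum += deltas[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    int[] docIds = {3, 5, 10, 11};
    int[] deltas = encode(docIds);
    System.out.println(Arrays.toString(deltas)); // [3, 2, 5, 1]
    System.out.println(valueAt(deltas, 2));      // 10
  }
}
```

In this sketch `valueAt` is linear in `index`, which is the cost the question points at; skipping within a block would need some extra stored information, as the quoted text goes on to suggest.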

[PR] Normalize written scalar quantized vectors when using cosine similarity [lucene]

2023-11-07 Thread via GitHub
kevindrosendahl opened a new pull request, #12780: URL: https://github.com/apache/lucene/pull/12780 ### Description When using cosine similarity, the `ScalarQuantizer` normalizes vectors when calculating quantiles and `ScalarQuantizedRandomVectorScorer` normalizes query vectors befor
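
As background for the description above, a small sketch of why normalization matters for cosine similarity: the cosine of two vectors equals the dot product of their unit-length versions, so stored and query vectors need to be treated consistently. This does not reproduce `ScalarQuantizer` or `ScalarQuantizedRandomVectorScorer`; the class and helper names below are hypothetical, and non-zero vectors are assumed.

```java
// Hypothetical illustration only; not the Lucene implementation.
final class CosineNormalizationSketch {

  /** Returns a unit-length copy of {@code v}; assumes v is not the zero vector. */
  static float[] normalize(float[] v) {
    double sumOfSquares = 0;
    for (float x : v) {
      sumOfSquares += (double) x * x;
    }
    double norm = Math.sqrt(sumOfSquares);
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = (float) (v[i] / norm);
    }
    return out;
  }

  /** Plain dot product of two vectors of equal length. */
  static double dot(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) a[i] * b[i];
    }
    return sum;
  }

  /** Cosine similarity computed directly from the raw vectors. */
  static double cosine(float[] a, float[] b) {
    return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
  }

  public static void main(String[] args) {
    float[] a = {3f, 4f};
    float[] b = {4f, 3f};
    // Both expressions give approximately 0.96 for these inputs.
    System.out.println(cosine(a, b));
    System.out.println(dot(normalize(a), normalize(b)));
  }
}
```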

Re: [PR] Normalize written scalar quantized vectors when using cosine similarity [lucene]

2023-11-07 Thread via GitHub
kevindrosendahl commented on code in PR #12780: URL: https://github.com/apache/lucene/pull/12780#discussion_r1385606896 ## lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestLucene99HnswQuantizedVectorsFormat.java: ## @@ -38,8 +38,10 @@ import org.apache.lucene.store.Di

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-07 Thread via GitHub
kevindrosendahl commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1800243525 > I haven't had a chance to read your branch yet, but hope to soon. Great, thanks! To save you a bit of time, the tl;dr of going from HNSW to vamana is that it's actua

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-07 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1800972146 Hi @msokolov ! Thanks for clarifying. But I think it can help to remove the 'less important' edge from both sides, since it frees up a degree of "other" node to accept a new

Re: [PR] Speed up BytesRefHash#sort [lucene]

2023-11-07 Thread via GitHub
gf2121 commented on PR #12775: URL: https://github.com/apache/lucene/pull/12775#issuecomment-1801128536 Something odd: I accidentally ran the benchmark on another Mac with an `intel chip` and the result is disappointing (no improvements) ``` use stable sort: false, sort 5169965 terms, too

Re: [PR] Speed up BytesRefHash#sort [lucene]

2023-11-07 Thread via GitHub
gf2121 commented on PR #12775: URL: https://github.com/apache/lucene/pull/12775#issuecomment-1801136851 Something odd: I accidentally ran the benchmark on another Mac with an `intel chip` and the result is disappointing (no obvious improvements) ``` use stable sort: false, sort 5169965 te

Re: [PR] Speed up BytesRefHash#sort [lucene]

2023-11-07 Thread via GitHub
gf2121 commented on PR #12775: URL: https://github.com/apache/lucene/pull/12775#issuecomment-1801199102 I checked Linux x86; no obvious speedup there either. Maybe the stable style somewhat helped the ARM CPU cache ``` use stable sort: false, sort 5169965 terms, took: 4900ms use stable sort:

Re: [PR] Speed up BytesRefHash#sort [lucene]

2023-11-07 Thread via GitHub
dweiss commented on PR #12775: URL: https://github.com/apache/lucene/pull/12775#issuecomment-1801236300 Maybe Apple chips are tuned for sorting (they need to sort out what to do with all these income bills, after all)? :) And seriously - thank you for checking on different hardware. T

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-07 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1801269361 Hi @mikemccand @jpountz @javanna @gsmiller , I have updated this PR to pick up the latest from `main`, as well as revert some changes to save them for follow-up PRs that address other co