Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]

2024-10-23 Thread via GitHub
github-actions[bot] commented on PR #13864: URL: https://github.com/apache/lucene/pull/13864#issuecomment-2433864889 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

[PR] Multi-tenant index writer initial commit [lucene]

2024-10-23 Thread via GitHub
mdmarshmallow opened a new pull request, #13951: URL: https://github.com/apache/lucene/pull/13951 ### Description Draft PR to outline my initial approach. I introduced `IndexWriterRamManager` to control writer flushes. I also have a function `IndexWriterRamManager#chooseWrite

Re: [I] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-23 Thread via GitHub
mdmarshmallow commented on issue #13913: URL: https://github.com/apache/lucene/issues/13913#issuecomment-2433693397 Here is a draft PR if anyone is interested in sanity checking my approach: https://github.com/apache/lucene/pull/13951 -- This is an automated message from the Apache Git Se

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-23 Thread via GitHub
msokolov commented on code in PR #13950: URL: https://github.com/apache/lucene/pull/13950#discussion_r1813555056 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -87,6 +87,28 @@ public Builder add(BooleanClause clause) { return this; } +

Re: [PR] Include java21 source folders to gradle source sets [lucene]

2024-10-23 Thread via GitHub
javanna commented on PR #13926: URL: https://github.com/apache/lucene/pull/13926#issuecomment-2433270359 Thanks @dweiss ! Those manual steps worked for me as well. Good to know that there is at least a way to get the IDE to do something useful with those files, especially needing to make ch

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-23 Thread via GitHub
ljak commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2433229507 I needed to sort the `Query`s in some way, so I compare them by their toString representation: `orderedQueries.sort(Comparator.comparing(Query::toString));` Not sure

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-23 Thread via GitHub
msokolov commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1813272520 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -84,6 +91,76 @@ public void init() { floatsA[i] = random.nextF

[PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-23 Thread via GitHub
shubhamvishu opened a new pull request, #13950: URL: https://github.com/apache/lucene/pull/13950 ### Description Changes in this PR: 1. Makes public some `BooleanQuery` methods that seem useful to users, such as `getClauses(Occur occur)` to get the collection of one type of cla

[PR] Add new Directory implementation for AWS S3 [lucene]

2024-10-23 Thread via GitHub
albogdano opened a new pull request, #13949: URL: https://github.com/apache/lucene/pull/13949 ### Description This PR adds a new module `s3directory` to Lucene, containing a new `Directory` implementation for AWS S3. The code was adapted from the [lucene-s3-directory](https://gith

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-23 Thread via GitHub
benwtrent commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2432746347 @mikemccand I have a PR open for this bug fix for 9.12. Will merge soon. Could you add a CHANGES entry in 9.12 for your bug fix for 9.12.1? -- This is an automated message fro

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-23 Thread via GitHub
jpountz commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2432515126 See `BooleanQuery#clauseSets`, which is used for equals()/hashcode() and `BooleanQuery#clauses`, which is used for toString(). -- This is an automated message from the Apache Git Servi

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-23 Thread via GitHub
ljak commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2432492876 Thanks for the feedback. Looking at `BooleanQuery`, it "only" has one list `List<BooleanClause> clauses`. So, is the idea to have 2 structures for the `DisjunctionMaxQuery`, the unordered multiset of quer

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-10-23 Thread via GitHub
jpountz commented on code in PR #13948: URL: https://github.com/apache/lucene/pull/13948#discussion_r1812935475 ## lucene/core/src/java/org/apache/lucene/index/BinaryDocValues.java: ## @@ -33,4 +34,15 @@ protected BinaryDocValues() {} * @return binary value */ public

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-23 Thread via GitHub
mikemccand commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2432474472 > This seems like something we maybe ought to make the user community aware of. +1 thanks @msokolov. > @msokolov could we do a simpler patch for 9.12.1? +1. 9.12.

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-23 Thread via GitHub
mikemccand commented on code in PR #13910: URL: https://github.com/apache/lucene/pull/13910#discussion_r1812937509 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java: ## @@ -260,7 +260,7 @@ public void search(String field

Re: [PR] Allow reading binary doc values as a DataInput [lucene]

2024-10-23 Thread via GitHub
iverase closed pull request #12460: Allow reading binary doc values as a DataInput URL: https://github.com/apache/lucene/pull/12460 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Allow reading binary doc values as a DataInput [lucene]

2024-10-23 Thread via GitHub
iverase commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-2432383872 I opened https://github.com/apache/lucene/pull/13948, which is clearly less invasive. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-10-23 Thread via GitHub
iverase opened a new pull request, #13948: URL: https://github.com/apache/lucene/pull/13948 Following up this [suggestion](https://github.com/apache/lucene/pull/12460#issuecomment-1715126194) from @jpountz, here I propose to add a new method to the BinaryDocValues API that returns the cont

Re: [PR] Remove TopScoreDocCollector's dependency on HitsThresholdChecker. [lucene]

2024-10-23 Thread via GitHub
jpountz commented on PR #13943: URL: https://github.com/apache/lucene/pull/13943#issuecomment-2432246400 Actually, while I was at it, I also removed `TopFieldCollector`'s dependency on HitsThresholdChecker, and then removed `HitsThresholdChecker`. ``` Ta

Re: [PR] DocValuesSkipper implementation in IndexSortSorted [lucene]

2024-10-23 Thread via GitHub
gsmiller commented on code in PR #13886: URL: https://github.com/apache/lucene/pull/13886#discussion_r1812687585 ## lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -397,106 +413,80 @@ private boolean matchAll(PointValues points

[PR] Fix ord-to-doc mapping when searching Lucene 9.0.0 hnsw indices [lucene]

2024-10-23 Thread via GitHub
benwtrent opened a new pull request, #13947: URL: https://github.com/apache/lucene/pull/13947 Bug discovered in https://github.com/apache/lucene/pull/13910 This corrects the logic so that users in lucene 9.12.1 will be able to correctly read from Lucene 9.0.0 HNSW indices. Previously,

[I] TestCommonTermsQuery.testMinShouldMatch test failure [lucene]

2024-10-23 Thread via GitHub
benwtrent opened a new issue, #13946: URL: https://github.com/apache/lucene/issues/13946 ### Description git bisect indicated: b940511b07b ``` TestCommonTermsQuery > testMinShouldMatch FAILED org.junit.ComparisonFailure: expected:<[2]> but was:<[3]> at __

Re: [PR] Introduce a heuristic to amortize the per-window overhead in MaxScoreBulkScorer. [lucene]

2024-10-23 Thread via GitHub
jpountz commented on PR #13941: URL: https://github.com/apache/lucene/pull/13941#issuecomment-2431805003 There is a good speedup on `OrMany` as expected: https://benchmarks.mikemccandless.com/OrMany.html. I'll push an annotation. -- This is an automated message from the Apache Git Service

Re: [PR] Allow reading binary doc values as a DataInput [lucene]

2024-10-23 Thread via GitHub
iverase commented on code in PR #12460: URL: https://github.com/apache/lucene/pull/12460#discussion_r1812371829 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -820,6 +822,201 @@ public BytesRef binaryValue() throws IOException {

Re: [I] Allow reading binary doc values as a DataInput [lucene]

2024-10-23 Thread via GitHub
iverase commented on issue #12459: URL: https://github.com/apache/lucene/issues/12459#issuecomment-2431552556 Now that Lucene 10 has been released and our minimum Java version is 21, and the RandomAccessInput API has gained an efficient way to read byte[] via #readBytes, I think this API

[I] Take advantage of DocValuesSkipper for SortedNumericDocValuesRangeQuery's count [lucene]

2024-10-23 Thread via GitHub
LuXugang opened a new issue, #13945: URL: https://github.com/apache/lucene/issues/13945 ### Description It seems like we could use SortedNumericDocValuesRangeQuery#getDocIdSetIteratorOrNullForPrimarySort to implement Weight#count of SortedNumericDocValuesRangeQuery? -- This is an