Re: [PR] Add new Directory implementation for AWS S3 [lucene]

2024-10-24 Thread via GitHub
jpountz commented on code in PR #13949: URL: https://github.com/apache/lucene/pull/13949#discussion_r1815193079 ## lucene/s3directory/build.gradle: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements.

[PR] Ensure doc order for TestCommonTermsQuery#testMinShouldMatch [lucene]

2024-10-24 Thread via GitHub
benwtrent opened a new pull request, #13953: URL: https://github.com/apache/lucene/pull/13953 Need to ensure doc order as some docs have scores that are exactly the same. closes: https://github.com/apache/lucene/issues/13946 -- This is an automated message from the Apache Git Servic

Re: [I] TestCommonTermsQuery.testMinShouldMatch test failure [lucene]

2024-10-24 Thread via GitHub
benwtrent commented on issue #13946: URL: https://github.com/apache/lucene/issues/13946#issuecomment-2435510156

```
1> 0 doc: 3 id: 0 score: 2.8378134
1> 1 doc: 0 id: 3 score: 0.16505925
1> 2 doc: 1 id: 2 score: 0.16505925
```

I think git bisect might be a red herring?
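
The output in this comment shows two hits with exactly the same score (0.16505925), which is why their relative order is unstable unless something breaks the tie. The following is a minimal sketch of one way to make such an ordering deterministic: sort by score descending, then by doc ID ascending. The `Hit` class and the `rank` method are hypothetical illustrations and are not taken from the Lucene test or from PR #13953.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {
    /** Hypothetical stand-in for a search hit; not a Lucene class. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Returns a copy sorted by score descending, with ties broken by doc ID ascending. */
    static List<Hit> rank(List<Hit> hits) {
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        Comparator<Hit> order = byScoreDesc.thenComparingInt(h -> h.doc);
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        // Scores copied from the comment above; the input order is deliberately shuffled.
        List<Hit> hits = List.of(
            new Hit(1, 0.16505925f),
            new Hit(3, 2.8378134f),
            new Hit(0, 0.16505925f));
        for (Hit h : rank(hits)) {
            System.out.println("doc: " + h.doc + " score: " + h.score);
        }
    }
}
```

With an explicit secondary key, the two tied hits always come back in the same order regardless of how they were collected.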

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-24 Thread via GitHub
mikemccand commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1814796588 ## lucene/native/src/c/dotProduct.c: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreeme

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-24 Thread via GitHub
jpountz commented on code in PR #13951: URL: https://github.com/apache/lucene/pull/13951#discussion_r1815149823 ## lucene/core/src/java/org/apache/lucene/index/IndexWriterRAMManager.java: ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-10-24 Thread via GitHub
github-actions[bot] commented on PR #13782: URL: https://github.com/apache/lucene/pull/13782#issuecomment-2436563274 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Check ahead if we can get the count [lucene]

2024-10-24 Thread via GitHub
LuXugang commented on code in PR #13899: URL: https://github.com/apache/lucene/pull/13899#discussion_r1816108095 ## lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -186,10 +186,44 @@ public boolean isCacheable(LeafReaderContext

[PR] Disable exchanging minimum scores across slices for exhaustive evaluation. [lucene]

2024-10-24 Thread via GitHub
jpountz opened a new pull request, #13954: URL: https://github.com/apache/lucene/pull/13954 When `totalHitsThreshold` is `Integer.MAX_VALUE`, dynamic pruning is never used and all hits get evaluated. Thus, the minimum competitive score always stays at zero, and there is nothing to exchange
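
To illustrate the reasoning in this description, here is a toy model of a top-k collector that only raises its minimum competitive score once the number of collected hits exceeds `totalHitsThreshold`. It is a simplified sketch written for this illustration, not Lucene's collector code; the class name, method name, and the exact condition for raising the score are assumptions.

```java
import java.util.PriorityQueue;

final class MinScoreSketch {
    /**
     * Collects all scores and returns the minimum competitive score reached at the end.
     * Simplified model (assumption): the score may only be raised after more than
     * totalHitsThreshold hits have been counted and the top-k queue is full.
     */
    static float minCompetitiveScore(float[] scores, int topK, int totalHitsThreshold) {
        PriorityQueue<Float> queue = new PriorityQueue<>(); // min-heap holding the best topK scores
        float minCompetitive = 0f;
        int totalHits = 0;
        for (float score : scores) {
            totalHits++;
            if (queue.size() < topK) {
                queue.add(score);
            } else if (score > queue.peek()) {
                queue.poll();
                queue.add(score);
            }
            if (totalHits > totalHitsThreshold && queue.size() == topK) {
                minCompetitive = queue.peek();
            }
        }
        return minCompetitive;
    }

    public static void main(String[] args) {
        float[] scores = {1f, 5f, 3f, 4f, 2f};
        System.out.println(minCompetitiveScore(scores, 2, 2));
        System.out.println(minCompetitiveScore(scores, 2, Integer.MAX_VALUE));
    }
}
```

In this model, a threshold of `Integer.MAX_VALUE` can never be exceeded, so the returned score stays at zero and there would be no useful value to share between slices.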

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-24 Thread via GitHub
mikemccand commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1814797680 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentScalarQuantizedVectorScorer.java: ## @@ -0,0 +1,407 @@ +/* + * Licensed to t

Re: [PR] Removing the deprecated parameters, -fast, -slow, -crossCheckTermVectors from CheckIndex. [lucene]

2024-10-24 Thread via GitHub
stefanvodita merged PR #13942: URL: https://github.com/apache/lucene/pull/13942 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucen

[PR] Remove some useless code in TopScoreDocCollector. [lucene]

2024-10-24 Thread via GitHub
jpountz opened a new pull request, #13955: URL: https://github.com/apache/lucene/pull/13955 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-m

Re: [PR] Made the UnifiedHighlighter's hasUnrecognizedQuery function processes FunctionQuery the same way as MatchAllDocsQuery and MatchNoDocsQuery queries for performance reasons. [lucene]

2024-10-24 Thread via GitHub
ljak closed pull request #12938: Made the UnifiedHighlighter's hasUnrecognizedQuery function processes FunctionQuery the same way as MatchAllDocsQuery and MatchNoDocsQuery queries for performance reasons. URL: https://github.com/apache/lucene/pull/12938 -- This is an automated message from t

Re: [PR] Made the UnifiedHighlighter's hasUnrecognizedQuery function processes FunctionQuery the same way as MatchAllDocsQuery and MatchNoDocsQuery queries for performance reasons. [lucene]

2024-10-24 Thread via GitHub
ljak commented on PR #12938: URL: https://github.com/apache/lucene/pull/12938#issuecomment-2435843480 Closing as @vletard continued the conversation over https://github.com/apache/lucene/pull/13165 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-24 Thread via GitHub
jpountz commented on code in PR #13950: URL: https://github.com/apache/lucene/pull/13950#discussion_r1815175858 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -136,20 +158,20 @@ public List clauses() { } /** Return the collection of queries for

Re: [I] Could Lucene's default Directory (`FSDirectory.open`) somehow preload `.vec` files? [lucene]

2024-10-24 Thread via GitHub
gautamworah96 commented on issue #13551: URL: https://github.com/apache/lucene/issues/13551#issuecomment-2436108800 @uschindler At Amazon, we implemented the `mmapDir.setPreload((name,ctx) -> name.endsWith(".vec"));` style fix you had suggested but later realized that this would not work fo
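
The preload predicate quoted in this comment is a two-argument test over the file name and the I/O context. The sketch below shows only the shape of that predicate in plain Java so that it runs without Lucene on the classpath: the second type parameter is `Object` as a stand-in for Lucene's `IOContext`, and the file names are made-up examples. It does not address the limitation the comment goes on to describe.

```java
import java.util.function.BiPredicate;

final class PreloadPredicateSketch {
    // Same lambda as in the comment; Object stands in for Lucene's IOContext (assumption).
    static final BiPredicate<String, Object> PRELOAD_VEC =
        (name, ctx) -> name.endsWith(".vec");

    public static void main(String[] args) {
        // Hypothetical file names, used only to exercise the predicate.
        System.out.println(PRELOAD_VEC.test("_0.vec", null));
        System.out.println(PRELOAD_VEC.test("_0.fdt", null));
    }
}
```

Because the predicate sees only the file name and context, it selects files purely by extension.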

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-24 Thread via GitHub
gsmiller commented on PR #13950: URL: https://github.com/apache/lucene/pull/13950#issuecomment-2435456274 > This LGTM but it has been this way for so long I wonder if someone with more historical knowledge will have a rationale for hiding these things? -- This is an automated message from

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-24 Thread via GitHub
gsmiller commented on PR #13950: URL: https://github.com/apache/lucene/pull/13950#issuecomment-2435485794 > This LGTM but it has been this way for so long I wonder if someone with more historical knowledge will have a rationale for hiding these things? I took a quick peek at the git h

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-24 Thread via GitHub
shubhamvishu commented on PR #13950: URL: https://github.com/apache/lucene/pull/13950#issuecomment-2436351224 Thanks everyone for taking a look! As per comments the `#isPureDisjunction` and `#isTwoClausePureDisjunctionWithTerms` seems to be the debatable ones : - About `#isTwoClausePu

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-24 Thread via GitHub
mikemccand commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2435099386 @goankur -- thank you for pulling out the actual native code into a new `native` Lucene module. I'm not sure we need a new module -- could we use `misc` or `sandbox` maybe? I

Re: [PR] Fix ord-to-doc mapping when searching Lucene 9.0.0 hnsw indices [lucene]

2024-10-24 Thread via GitHub
benwtrent merged PR #13947: URL: https://github.com/apache/lucene/pull/13947 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-24 Thread via GitHub
jpountz commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2435542959 I wouldn't sort them, and just rely on the order that the caller supplied? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] Add tooling back on 9.10.x branch to generate int7_hnsw.9.10.zip bwc index [lucene]

2024-10-24 Thread via GitHub
github-actions[bot] commented on PR #13879: URL: https://github.com/apache/lucene/pull/13879#issuecomment-2436563159 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-24 Thread via GitHub
mdmarshmallow commented on code in PR #13951: URL: https://github.com/apache/lucene/pull/13951#discussion_r1815876685 ## lucene/core/src/java/org/apache/lucene/index/IndexWriterRAMManager.java: ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-24 Thread via GitHub
mdmarshmallow commented on PR #13951: URL: https://github.com/apache/lucene/pull/13951#issuecomment-2436487725 > Thanks for looking into it. It feels like there should be closer integration between this and the existing `FlushPolicy`/`FlushByRamOrCountsPolicy`? So I actually had the

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-24 Thread via GitHub
shubhamvishu commented on code in PR #13950: URL: https://github.com/apache/lucene/pull/13950#discussion_r1815551922 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -87,6 +87,28 @@ public Builder add(BooleanClause clause) { return this; }

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-24 Thread via GitHub
shubhamvishu commented on code in PR #13950: URL: https://github.com/apache/lucene/pull/13950#discussion_r1815552155 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -87,6 +87,28 @@ public Builder add(BooleanClause clause) { return this; }

Re: [I] Range Query Type With Logically Connected Ranges [LUCENE-8769] [lucene]

2024-10-24 Thread via GitHub
mkhludnev commented on issue #9814: URL: https://github.com/apache/lucene/issues/9814#issuecomment-2435334638 Hi, Is there a similar query for docValues only fields? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[I] Should knn query rewrite parallelize on slices like ordinary search? [lucene]

2024-10-24 Thread via GitHub
javanna opened a new issue, #13952: URL: https://github.com/apache/lucene/issues/13952 Search concurrency creates one task per slice. A slice is a collection of one or more segment partitions (although segments are not split into partitions by default just yet). The slicing mechanism allows
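
As a rough illustration of the "one task per slice" idea described in this issue, the sketch below groups segments into slices with a simple greedy rule and treats each resulting slice as one unit of work. The grouping rule, the `maxDocsPerSlice` parameter as used here, and the class name are assumptions made for the example; this is not Lucene's slicing algorithm, and it ignores segment partitions.

```java
import java.util.ArrayList;
import java.util.List;

final class SliceSketch {
    /**
     * Greedily packs segments (given by their doc counts) into slices so that a slice
     * holds at most maxDocsPerSlice docs, except when a single segment is larger.
     */
    static List<List<Integer>> slices(int[] segmentDocCounts, int maxDocsPerSlice) {
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docsInCurrent = 0;
        for (int docs : segmentDocCounts) {
            if (!current.isEmpty() && docsInCurrent + docs > maxDocsPerSlice) {
                result.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
            current.add(docs);
            docsInCurrent += docs;
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> slices = slices(new int[] {100, 50, 30, 10}, 80);
        // One task would be submitted per slice.
        System.out.println("tasks: " + slices.size() + " slices: " + slices);
    }
}
```

Under this rule, small segments share a task while a large segment gets a task of its own, which bounds the number of tasks without serializing all the work.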

Re: [I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]

2024-10-24 Thread via GitHub
navneet1v commented on issue #13920: URL: https://github.com/apache/lucene/issues/13920#issuecomment-2434404176 @uschindler and @jpountz thanks for your inputs and detailed explanation. @shatejas I think all the required details are present, so are you going to raise a PR for this? @

Re: [I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]

2024-10-24 Thread via GitHub
navneet1v commented on issue #13920: URL: https://github.com/apache/lucene/issues/13920#issuecomment-2434499294 > Opening new readers is too expensive and mostly not useful. @uschindler one question on this, the reason why you say opening new readers is expensive because readers mostl