[PR] Improve user-facing docs. [lucene]

2025-02-17 Thread via GitHub
jpountz opened a new pull request, #14251: URL: https://github.com/apache/lucene/pull/14251 This improves user-facing docs in Lucene's package javadocs: - Make some docs up-to-date, e.g. some of them were still referring to oal.index.Fields. - More emphasis of structured search and k

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-02-17 Thread via GitHub
gaoj0017 commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2664639299 As we have consistently emphasized in both public and private communications, we are concerned that the **OSQ method employs an idea highly similar to the one presented in our [extended

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
gsmiller merged PR #14238: URL: https://github.com/apache/lucene/pull/14238 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
gsmiller commented on PR #14238: URL: https://github.com/apache/lucene/pull/14238#issuecomment-2664293726 Just merged to main. Thanks again @houserjohn 🎉 (I marked this under the 10.2 milestone based on the CHANGES entry; I'll keep an eye out for a backport PR and/or can help with this as

Re: [PR] Add new Directory implementation for AWS S3 [lucene]

2025-02-17 Thread via GitHub
github-actions[bot] commented on PR #13949: URL: https://github.com/apache/lucene/pull/13949#issuecomment-2664272130 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Bump floor segment size to 16MB. [lucene]

2025-02-17 Thread via GitHub
github-actions[bot] commented on PR #14189: URL: https://github.com/apache/lucene/pull/14189#issuecomment-2664271800 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Disable the query cache by default. [lucene]

2025-02-17 Thread via GitHub
github-actions[bot] commented on PR #14187: URL: https://github.com/apache/lucene/pull/14187#issuecomment-2664271825 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
gsmiller commented on PR #14238: URL: https://github.com/apache/lucene/pull/14238#issuecomment-2664264734 Looks good. Thanks @houserjohn ! I see @stefanvodita also added his approval so I will go ahead and get this merged. Thanks again!

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1958855333 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +95,241 @@ public void testComputeDynamicNumericRangesWithOneLargeWe

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
gsmiller commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1958849147 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +95,241 @@ public void testComputeDynamicNumericRangesWithOneLargeWeig

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1958838623 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +95,243 @@ public void testComputeDynamicNumericRangesWithOneLargeWe

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1958839222 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +95,243 @@ public void testComputeDynamicNumericRangesWithOneLargeWe

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-17 Thread via GitHub
benwtrent commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2664103686 @msokolov I can provide a quick patch tomorrow against main. As I said, it would work for stuff as it is now.

Re: [I] [Feature] Add support for passing extra information with KNNVectorField [lucene]

2025-02-17 Thread via GitHub
navneet1v commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2664091298 > If the use case is multitenancy, it seems you would never want to search across tenants, so this would apply not only to KNN search but to all kinds of search? I agree the impac

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-17 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2664096507 @benwtrent I'm curious how you managed to re-use the scores. I poked around a little and I guess it requires some new plumbing / class casts since the API isn't really designed for it?

Re: [I] [Feature] Add support for passing extra information with KNNVectorField [lucene]

2025-02-17 Thread via GitHub
msokolov commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2664078595 If the use case is multitenancy, it seems you would never want to search across tenants, so this would apply not only to KNN search but to all kinds of search? I agree the impact o

Re: [I] [Feature] Add support for passing extra information with KNNVectorField [lucene]

2025-02-17 Thread via GitHub
navneet1v commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2663976487 > I wonder if there might be a better way to accomplish your actual goal. Adding "extra data" doesn't seem like a good idea to me since it inherently blurs the function of the dat

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
gsmiller commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1958713253 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +95,243 @@ public void testComputeDynamicNumericRangesWithOneLargeWeig

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-17 Thread via GitHub
epotyom commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1958677872 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-17 Thread via GitHub
HoustonPutman commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2663827155 > I know you mentioned there is a change in behavior in the caveat, but I do believe that this example should probably return ranges with equal counts. This one was a `<=` th

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-17 Thread via GitHub
HoustonPutman commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1958610123 ## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used t

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-17 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2663774846 Thanks for running benchmarks. I'm confused as to why running inner loops of size 512 would be so much better than inner loops of size 32. This doesn't feel right? Does luceneutil also r

Re: [PR] Introduce multiSelect for ScalarQuantizer [lucene]

2025-02-17 Thread via GitHub
HoustonPutman closed pull request #13919: Introduce multiSelect for ScalarQuantizer URL: https://github.com/apache/lucene/pull/13919

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-17 Thread via GitHub
jpountz commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1958608604 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Relation check within 1D BKD Leaves [lucene]

2025-02-17 Thread via GitHub
jpountz commented on PR #14244: URL: https://github.com/apache/lucene/pull/14244#issuecomment-2663735068 I understand the idea, but this looks a bit like a hack to me. I'm also not too fond of optimizing point-in-set queries, as terms would be a better way to index the data than points if t

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1958559535 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -285,6 +285,23 @@ private static void assertDynamicNumericRangeValidPropert

Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-02-17 Thread via GitHub
benwtrent commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2663547774 > But the only question is whether we could somehow make it available on 10.x since it would be a behavior change? I'm not sure if that question is still relevant though. If the res

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-17 Thread via GitHub
epotyom commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1958063799 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/utils/BaseFacetBuilder.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-02-17 Thread via GitHub
msokolov commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2663491200 > Do you think the behavior change/result change is worth waiting for a major? I do think folks should be able to use this now, but be able to opt out.

Re: [PR] Remove duplicates from the hnsw recall testing [lucene]

2025-02-17 Thread via GitHub
msokolov commented on PR #14234: URL: https://github.com/apache/lucene/pull/14234#issuecomment-2663482993 thanks, @benwtrent ... this test was a bit hacky. Makes sense to remove noisy data from a test like this

Re: [I] [Feature] Add support for passing extra information with KNNVectorField [lucene]

2025-02-17 Thread via GitHub
msokolov commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2663422716 I wonder if there might be a better way to accomplish your actual goal. Adding "extra data" doesn't seem like a good idea to me since it inherently blurs the function of the data

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-17 Thread via GitHub
epotyom commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1958141105 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/utils/CommonFacetBuilder.java: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-17 Thread via GitHub
stefanvodita commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1958258387 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/utils/BaseFacetBuilder.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (A

[PR] Accept nodes where score == Math.nextUp(results.minCompetitiveSimilarity()) after #14215 [lucene]

2025-02-17 Thread via GitHub
iverase opened a new pull request, #14249: URL: https://github.com/apache/lucene/pull/14249 In https://github.com/apache/lucene/pull/12770 we refined how we search the graph by only considering nodes that improve the score, basically the condition in line 290

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-17 Thread via GitHub
epotyom commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1958022460 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-17 Thread via GitHub
epotyom commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1958053677 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -130,6 +135,88 @@ void index() throws IOException { IOUtils.close(indexWrite

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-17 Thread via GitHub
epotyom commented on PR #14237: URL: https://github.com/apache/lucene/pull/14237#issuecomment-2662789904 Thanks for reviewing @stefanvodita ! > I have some non-blocking concerns that this is a lot of new code, with more abstractions, and it doesn't have a user waiting to use it. There

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
stefanvodita commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1957977134 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -285,6 +285,23 @@ private static void assertDynamicNumericRangeValidPrope

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-17 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1957934311 ## lucene/CHANGES.txt: ## @@ -20,6 +20,7 @@ New Features Improvements - +* GITHUB#14238: Improve test coverage of Dynamic Range Faceting. (J

Re: [I] Figure out why hunspell tests occasionally fail and make them more consistent [lucene]

2025-02-17 Thread via GitHub
dweiss commented on issue #14235: URL: https://github.com/apache/lucene/issues/14235#issuecomment-2662323672 Ok, the scheduled action works as expected: https://github.com/apache/lucene/actions/runs/13362768562