Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-18 Thread via GitHub
houserjohn commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2667735504 Here are the promised modified randomized unit tests. These should work with your API change, but you might need to modify them to suit the caveat you mentioned. Of course, add the co

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-18 Thread via GitHub
iverase commented on PR #14213: URL: https://github.com/apache/lucene/pull/14213#issuecomment-2667648238 Thank you @Tim-Brooks ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-18 Thread via GitHub
iverase merged PR #14213: URL: https://github.com/apache/lucene/pull/14213

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-18 Thread via GitHub
dungba88 commented on code in PR #14226: URL: https://github.com/apache/lucene/pull/14226#discussion_r1960939250 ## lucene/core/src/java/org/apache/lucene/search/OptimisticKnnVectorQuery.java: ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Strengthen calls to isDeterministic() in TestRegExpParsing [lucene]

2025-02-18 Thread via GitHub
rmuir merged PR #14248: URL: https://github.com/apache/lucene/pull/14248

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
rmuir commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2667264982 ```java var re = new RegExp("παραστάσεις", RegExp.NONE, RegExp.CASE_INSENSITIVE); System.out.println(re.toAutomaton().toDot()); ``` ![Screen_Shot_2025-02-18_at_20 03 30](https:/

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-18 Thread via GitHub
houserjohn commented on PR #14238: URL: https://github.com/apache/lucene/pull/14238#issuecomment-2667246438 I believe I have addressed the bug here in [GH#14258](https://github.com/apache/lucene/pull/14258). Moving all further updates to there.

[PR] [Unit] Increase Dynamic Range Faceting coverage and address edge cases [lucene]

2025-02-18 Thread via GitHub
houserjohn opened a new pull request, #14258: URL: https://github.com/apache/lucene/pull/14258 ### Summary This is a continuation of [GH#14238](https://github.com/apache/lucene/pull/14238) which was reverted due to a bug discovered during a random test with seed 50D292E371306B1. The

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
john-wagster commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2667222184 awesome! thank you sir.

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
rmuir merged PR #14192: URL: https://github.com/apache/lucene/pull/14192

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-02-18 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2667192244 The markdown would have a huge advantage for the many analyzers with xml-style prettyprint docs today. I don't think `@snippet` works with languages other than java. For me, ed

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
rmuir commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2667183580 @john-wagster this looks great! Thank you for simplifying this down as first step. I will merge it in after CI checks.

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-02-18 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2667156816 Related: if we bump main to java 23+ (https://github.com/apache/lucene/issues/14229) then we could do snippets in javadoc with the typical markdown mechanisms.

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-02-18 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2667110544 https://github.com/google/google-java-format/issues/789 https://github.com/google/google-java-format/issues/886 Unfortunately, I think these make this issue a no-go to unles

[I] Use @snippet javadoc tag for snippets [lucene]

2025-02-18 Thread via GitHub
jpountz opened a new issue, #14257: URL: https://github.com/apache/lucene/issues/14257 ### Description Now that main requires Java 21, we can start using the [`@snippet` javadoc tag](https://openjdk.org/jeps/413), which is quite more convenient to use than the `` HTML tags we are cur
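For readers unfamiliar with the tag discussed in this issue, here is a minimal sketch of an inline `{@snippet}` as described in JEP 413. The class `SnippetDemo` and its `sum` method are hypothetical and exist only for illustration; they are not part of Lucene.

```java
/**
 * Hypothetical utility, used only to illustrate the {@code @snippet} tag.
 *
 * <p>Example usage:
 * {@snippet lang="java" :
 * int total = SnippetDemo.sum(2, 3);
 * }
 */
final class SnippetDemo {
  private SnippetDemo() {}

  /** Returns the sum of the two arguments. */
  static int sum(int a, int b) {
    return a + b;
  }
}
```

The snippet body is plain text inside the tag, so characters such as `<` and `&` do not need HTML escaping there.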

Re: [I] TestTieredMergePolicy.testPartialMerge fails [lucene]

2025-02-18 Thread via GitHub
jpountz commented on issue #14255: URL: https://github.com/apache/lucene/issues/14255#issuecomment-2666999215 I looked into the failure: the test randomly configures `targetSearchConcurrency=27`. Since the index allows `ceil(numDocs/targetSearchConcurrency)` docs per segment at most, at lon
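The per-segment cap quoted in the comment above, `ceil(numDocs/targetSearchConcurrency)`, can be worked through with integer arithmetic. This is only a sketch of that formula: the class and method names are hypothetical, not Lucene's API, and the document count of 100 is an assumed value chosen for illustration (only `targetSearchConcurrency=27` comes from the comment).

```java
/** Hypothetical helper illustrating the ceil(numDocs / targetSearchConcurrency) cap. */
final class SegmentCapSketch {
  private SegmentCapSketch() {}

  /** Ceiling division for a non-negative numDocs and a positive targetSearchConcurrency. */
  static int maxDocsPerSegment(int numDocs, int targetSearchConcurrency) {
    if (numDocs < 0 || targetSearchConcurrency <= 0) {
      throw new IllegalArgumentException("numDocs must be >= 0 and concurrency must be > 0");
    }
    // (a + b - 1) / b rounds up without going through floating point.
    return (numDocs + targetSearchConcurrency - 1) / targetSearchConcurrency;
  }

  public static void main(String[] args) {
    // Assumed example: 100 docs with targetSearchConcurrency=27 caps segments at 4 docs.
    System.out.println(maxDocsPerSegment(100, 27));
  }
}
```

With a cap this small relative to the index, many small segments are needed to hold all documents, which is the kind of interaction the comment is describing.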

Re: [PR] Reuse entry point scores and provide mechanisms to provide scores for directly entry points [lucene]

2025-02-18 Thread via GitHub
benwtrent commented on PR #14256: URL: https://github.com/apache/lucene/pull/14256#issuecomment-2666982905 > I also tried something based on the simpler approach I mentioned and also saw very minor gains in the seeded search with reentry when reusing scores. Yeah, I don't expect this

Re: [I] [Feature] Add support for passing extra information with KNNVectorField [lucene]

2025-02-18 Thread via GitHub
navneet1v commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2666981898 > I remember (but I don't remember where) seeing someone doing multi-tenant vector search by using a flat vector index and enabling index sorting on the tenant ID. Then vector sea

Re: [PR] Reuse entry point scores and provide mechanisms to provide scores for directly entry points [lucene]

2025-02-18 Thread via GitHub
benwtrent commented on code in PR #14256: URL: https://github.com/apache/lucene/pull/14256#discussion_r1960626073 ## lucene/core/src/java/org/apache/lucene/search/knn/MappedDISI.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Reuse entry point scores and provide mechanisms to provide scores for directly entry points [lucene]

2025-02-18 Thread via GitHub
msokolov commented on PR #14256: URL: https://github.com/apache/lucene/pull/14256#issuecomment-2666962731 I also tried something based on the simpler approach I mentioned and also saw very minor gains in the seeded search with reentry when reusing scores.

Re: [I] [Feature] Add support for passing extra information with KNNVectorField [lucene]

2025-02-18 Thread via GitHub
jpountz commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2666915751 I remember (but I don't remember where) seeing someone doing multi-tenant vector search by using a flat vector index and enabling index sorting on the tenant ID. Then vector search

Re: [PR] Reuse entry point scores and provide mechanisms to provide scores for directly entry points [lucene]

2025-02-18 Thread via GitHub
msokolov commented on code in PR #14256: URL: https://github.com/apache/lucene/pull/14256#discussion_r1960570796 ## lucene/core/src/java/org/apache/lucene/search/knn/MappedDISI.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Reuse entry point scores and provide mechanisms to provide scores for directly entry points [lucene]

2025-02-18 Thread via GitHub
msokolov commented on PR #14256: URL: https://github.com/apache/lucene/pull/14256#issuecomment-2666896574 Hi, @benwtrent , this bakes support for this supplying scores feature pretty deeply. I was thinking if we were to use this only for the SeededKnnVectorQuery, it might suffice to create

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-18 Thread via GitHub
houserjohn commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2666762847 @HoustonPutman I can confirm that the latest commits fixed the exception in `testComputeDynamicNumericRangesWithLargeTopN` and the issue in `testComputeDynamicNumericRangesWithSameWei

[PR] Reuse entry point scores and provide mechanisms to provide scores for directly entry points [lucene]

2025-02-18 Thread via GitHub
benwtrent opened a new pull request, #14256: URL: https://github.com/apache/lucene/pull/14256 Spinning out of: https://github.com/apache/lucene/pull/14226 That particular evolution of kNN querying is attempting to re-entry individual segment graphs with new exit and search criteria. T

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-18 Thread via GitHub
epotyom commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1960102851 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/utils/LongRangeFacetBuilder.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-18 Thread via GitHub
epotyom commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1960081192 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/utils/CommonFacetBuilder.java: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-18 Thread via GitHub
stefanvodita commented on PR #14238: URL: https://github.com/apache/lucene/pull/14238#issuecomment-2666165615 Thanks @gsmiller! I've merged them just now. Haven't looked into the failure itself yet.

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-18 Thread via GitHub
epotyom commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1960058010 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -130,6 +135,88 @@ void index() throws IOException { IOUtils.close(indexWrite

Re: [PR] Revert "Increase Dynamic Range Faceting test coverage (#14238)" [lucene]

2025-02-18 Thread via GitHub
stefanvodita merged PR #14252: URL: https://github.com/apache/lucene/pull/14252

Re: [PR] Revert "[Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests" [lucene]

2025-02-18 Thread via GitHub
stefanvodita merged PR #14253: URL: https://github.com/apache/lucene/pull/14253

Re: [I] TestTieredMergePolicy.testPartialMerge fails [lucene]

2025-02-18 Thread via GitHub
benwtrent commented on issue #14255: URL: https://github.com/apache/lucene/issues/14255#issuecomment-2666153793 Indeed, it is also due to d2c69c1472c the removal of the tier max merge at once setting.

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-18 Thread via GitHub
gsmiller commented on PR #14238: URL: https://github.com/apache/lucene/pull/14238#issuecomment-2666152482 @houserjohn it looks like the randomized testing failed in some recent runs (see: [here](https://github.com/apache/lucene/actions/runs/13388526608/job/37390674327)): ``` TestDynam

[I] TestTieredMergePolicy.testPartialMerge fails [lucene]

2025-02-18 Thread via GitHub
benwtrent opened a new issue, #14255: URL: https://github.com/apache/lucene/issues/14255 ### Description ``` TestTieredMergePolicy > testPartialMerge FAILED java.lang.AssertionError: count=33 maxCount=40 at __randomizedtesting.SeedInfo.seed([3E91E10831C2BDA4:158AE

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-18 Thread via GitHub
epotyom commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1960045582 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -130,6 +135,88 @@ void index() throws IOException { IOUtils.close(indexWrite

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-18 Thread via GitHub
Shradha26 commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1959695463 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -130,6 +135,88 @@ void index() throws IOException { IOUtils.close(indexWri

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
john-wagster commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2665979362 Apologies @rmuir I forgot to request review after my last set of changes; just did so now.

Re: [I] Why CachingMergeContext works correctly [lucene]

2025-02-18 Thread via GitHub
cgejian closed issue #14254: Why CachingMergeContext works correctly URL: https://github.com/apache/lucene/issues/14254

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-18 Thread via GitHub
Shradha26 commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1959688900 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -130,6 +135,88 @@ void index() throws IOException { IOUtils.close(indexWri

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-18 Thread via GitHub
iverase commented on PR #14213: URL: https://github.com/apache/lucene/pull/14213#issuecomment-2665599329 Thanks @Tim-Brooks, tests look good. I will be merging this soon.

Re: [I] [Feature] Add support for passing extra information with KNNVectorField [lucene]

2025-02-18 Thread via GitHub
msokolov commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2665564697 What if we created a vector distance function like `dot-product(v1, v2) * sgnum(d1, d2)` where `(v1, d1)` and `(v2, d2)` are the `(vector, cluster)` pairs indexed together in the

[I] Why CachingMergeContext works correctly [lucene]

2025-02-18 Thread via GitHub
cgejian opened a new issue, #14254: URL: https://github.com/apache/lucene/issues/14254 **Background**: In version 7.6.0 of ES, an external client is continuously executing update_by_query on an index. **Phenomenon**: At this time, I found through /_cat/segments that the docs.count an

[PR] Revert "[Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests" [lucene]

2025-02-18 Thread via GitHub
stefanvodita opened a new pull request, #14253: URL: https://github.com/apache/lucene/pull/14253 Reverts apache/lucene#14238 Seeing errors in the automated check ([link](https://github.com/apache/lucene/actions/runs/13388526608/job/37390674327)). ``` TestDynamicRangeUtil > t

[PR] Revert "Increase Dynamic Range Faceting test coverage (#14238)" [lucene]

2025-02-18 Thread via GitHub
stefanvodita opened a new pull request, #14252: URL: https://github.com/apache/lucene/pull/14252 Reverts apache/lucene#14250 Seeing errors in the automated check ([link](https://github.com/apache/lucene/actions/runs/13388526608/job/37390674327)). ``` TestDynamicRangeUtil > t

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-18 Thread via GitHub
epotyom commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1959487538 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java: ## @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Increase Dynamic Range Faceting test coverage (#14238) [lucene]

2025-02-18 Thread via GitHub
stefanvodita merged PR #14250: URL: https://github.com/apache/lucene/pull/14250

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-18 Thread via GitHub
epotyom commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1959444929 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java: ## @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Relation check within 1D BKD Leaves [lucene]

2025-02-18 Thread via GitHub
gf2121 commented on PR #14244: URL: https://github.com/apache/lucene/pull/14244#issuecomment-2665008171 > this looks a bit like a hack Agreed!

Re: [PR] Relation check within 1D BKD Leaves [lucene]

2025-02-18 Thread via GitHub
gf2121 closed pull request #14244: Relation check within 1D BKD Leaves URL: https://github.com/apache/lucene/pull/14244

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-18 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2664968205 Confused +1 ... but the comparison of step512(baseline) and step32(candidate): ``` TaskQPS baseline StdDevQPS my_modified_version StdDev