[PR] Determinize automata used by IntervalsSource.regexp [lucene]

2024-09-05 Thread via GitHub
ChrisHegarty opened a new pull request, #13718: URL: https://github.com/apache/lucene/pull/13718 This commit determinizes internal automata used in the construction of the IntervalsSource created by the `regexp` factory. relates #13715 -- This is an automated message from the Apach

Re: [PR] move Operations.sameLanguage/subsetOf to AutomatonTestUtil in test-framework [lucene]

2024-09-05 Thread via GitHub
rmuir merged PR #13708: URL: https://github.com/apache/lucene/pull/13708

Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]

2024-09-05 Thread via GitHub
rmuir commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2331298246 I do this on build side, rather than locally. so you might want to tweak it if you want to ignore "explicitly git-added files" which I think is our use-case. `git status` has a lot of

Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]

2024-09-05 Thread via GitHub
rmuir merged PR #13707: URL: https://github.com/apache/lucene/pull/13707

Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]

2024-09-05 Thread via GitHub
mikemccand commented on code in PR #13707: URL: https://github.com/apache/lucene/pull/13707#discussion_r1745429485 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -857,22 +857,38 @@ public static boolean isEmpty(Automaton a) { return true;

Re: [I] Reproducible test failure in TestTaxonomyFacetAssociations.testFloatSumAssociation -- ULP float issue? [lucene]

2024-09-05 Thread via GitHub
stefanvodita commented on issue #13720: URL: https://github.com/apache/lucene/issues/13720#issuecomment-2331754886 It's one of those float-summation-is-not-commutative errors. First ordering: ``` 1> 0.0 + 575310.1 = 575310.1 1> 575310.1 + 701147.2 = 1276457.2 1> 1276457.2

[PR] Follow-up to GH#13702 [lucene]

2024-09-05 Thread via GitHub
gsmiller opened a new pull request, #13722: URL: https://github.com/apache/lucene/pull/13722 Ensures we retain pre-existing (but strange) inconsistency in DrillSideways#search(DrillDownQuery, Collector). This is a deprecated method so I propose we retain this inconsistency since the method

Re: [PR] Add dynamic range facets [lucene]

2024-09-05 Thread via GitHub
mikemccand commented on code in PR #13689: URL: https://github.com/apache/lucene/pull/13689#discussion_r1745595485 ## lucene/demo/src/java/org/apache/lucene/demo/facet/package-info.java: ## @@ -385,6 +385,12 @@ * Sampling support is implemented in {@link * org.apache.lucene.

Re: [I] Reproducible test failure in TestTaxonomyFacetAssociations.testFloatSumAssociation -- ULP float issue? [lucene]

2024-09-05 Thread via GitHub
mikemccand commented on issue #13720: URL: https://github.com/apache/lucene/issues/13720#issuecomment-2331878119 I wish the test assert APIs allowed us to express the allowed epsilon in ULPs (1 or 2 or so), not a fixed float. The expected/allowed absolute error varies with how large t
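To illustrate the point about a fixed epsilon, here is a minimal Java sketch showing how the gap between adjacent `float` values grows with magnitude. The class and method names (`UlpScale`, `spacingAt`, `fixedEpsilonTooTight`) are hypothetical and are not part of Lucene or of its test framework; the value 1276457.2 is borrowed from the sums quoted earlier in the thread, assuming they were computed as `float`.

```java
public final class UlpScale {

    /** Returns the distance between {@code value} and the next larger float in magnitude. */
    public static float spacingAt(float value) {
        return Math.ulp(value);
    }

    /**
     * Returns true when a fixed absolute epsilon is smaller than one ULP at this
     * magnitude, i.e. when even two adjacent floats would fail an epsilon check.
     */
    public static boolean fixedEpsilonTooTight(float value, float epsilon) {
        return epsilon < Math.ulp(value);
    }

    public static void main(String[] args) {
        float epsilon = 1e-5f;        // illustrative fixed tolerance (assumption)
        float small = 1.0f;
        float large = 1276457.2f;     // magnitude taken from the thread's example sums

        System.out.println("ulp at 1.0f:       " + spacingAt(small));
        System.out.println("ulp at 1276457.2f: " + spacingAt(large));
        System.out.println("epsilon too tight at 1.0f?       " + fixedEpsilonTooTight(small, epsilon));
        System.out.println("epsilon too tight at 1276457.2f? " + fixedEpsilonTooTight(large, epsilon));
    }
}
```

Because `Math.ulp` scales with the exponent of its argument, a tolerance expressed as "N ULPs" adapts to the size of the expected value, whereas a fixed absolute tolerance is either too loose for small values or too tight for large ones.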

Re: [PR] Early exit from Operations#removeDeadStates when an automaton doesn't have dead states. [lucene]

2024-09-05 Thread via GitHub
jpountz merged PR #13721: URL: https://github.com/apache/lucene/pull/13721

Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-05 Thread via GitHub
jpountz commented on code in PR #13697: URL: https://github.com/apache/lucene/pull/13697#discussion_r1745619666 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -440,6 +478,101 @@ private String formatScoreExplanation(int matches, int sta

Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-05 Thread via GitHub
jpountz commented on code in PR #13697: URL: https://github.com/apache/lucene/pull/13697#discussion_r1745752639 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -440,6 +478,114 @@ private String formatScoreExplanation(int matches, int sta

Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-05 Thread via GitHub
Mikep86 commented on code in PR #13697: URL: https://github.com/apache/lucene/pull/13697#discussion_r1745751201 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -440,6 +478,114 @@ private String formatScoreExplanation(int matches, int sta

Re: [I] Reproducible test failure in TestTaxonomyFacetAssociations.testFloatSumAssociation -- ULP float issue? [lucene]

2024-09-05 Thread via GitHub
stefanvodita commented on issue #13720: URL: https://github.com/apache/lucene/issues/13720#issuecomment-2332088174 I like the idea of comparing based on ULP! I'll poach some code for [float comparison](https://github.com/apache/commons-numbers/blob/master/commons-numbers-core/src/main/java/o

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-09-05 Thread via GitHub
mikemccand commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2332099352 I'm trying to understand the status of this PR... so far it's a standalone JMH benchy that shows that using [FFM](https://openjdk.org/jeps/454) to invoke our own native C implementati

Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-05 Thread via GitHub
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2332114836 Hey all, I have done some benchmarking with two main goals: 1) ensure there are no regressions introduced by the proposed change 2) ensure there is some performance gain when i

[PR] Add unit-of-least-precision float comparison [lucene]

2024-09-05 Thread via GitHub
stefanvodita opened a new pull request, #13723: URL: https://github.com/apache/lucene/pull/13723 Comparing floats with a fixed epsilon doesn't really work. We add comparison based on unit-of-least-precision (ULP) and use it to fix a failing test. Closes #13720
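For readers unfamiliar with the technique, a ULP-based comparison can be sketched as follows. This is not the code from the pull request; it is one common approach, assuming IEEE 754 `float` values, and the names `UlpCompare` and `equalsWithinUlps` are hypothetical. The idea is to map each float's bit pattern to an integer that increases monotonically with the float's value, then compare the integer distance against the allowed number of ULPs.

```java
public final class UlpCompare {

    /**
     * Returns true if {@code a} and {@code b} are at most {@code maxUlps}
     * representable floats apart. NaN never compares equal to anything.
     */
    public static boolean equalsWithinUlps(float a, float b, int maxUlps) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return false;
        }
        long distance = Math.abs((long) orderedBits(a) - (long) orderedBits(b));
        return distance <= maxUlps;
    }

    /**
     * Maps the sign-magnitude bit pattern of a float to an int whose ordering
     * matches the float ordering; +0.0f and -0.0f both map to 0.
     */
    private static int orderedBits(float value) {
        int bits = Float.floatToIntBits(value);
        return bits < 0 ? Integer.MIN_VALUE - bits : bits;
    }

    public static void main(String[] args) {
        float expected = 1276457.2f;            // magnitude from the thread's example
        float neighbour = Math.nextUp(expected); // exactly one ULP above
        System.out.println(equalsWithinUlps(expected, neighbour, 1));
        System.out.println(equalsWithinUlps(expected, expected + 1.0f, 2));
    }
}
```

One design consequence worth noting: in this sketch `Float.MAX_VALUE` and positive infinity are one ULP apart, so a caller that wants infinities treated specially would need an extra check.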

Re: [PR] Dry up TestScorerPerf [lucene]

2024-09-05 Thread via GitHub
javanna merged PR #13712: URL: https://github.com/apache/lucene/pull/13712

[I] Dimensionality reduction in Lucene [lucene]

2024-09-05 Thread via GitHub
tanyaroosta opened a new issue, #13727: URL: https://github.com/apache/lucene/issues/13727 ### Description Hi. I am opening a new issue to follow up on a [discussion in this issue](https://github.com/apache/lucene/issues/13403#issuecomment-2132043000) regarding using the segment size

Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-05 Thread via GitHub
msokolov commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2332479198 Thanks for the testing, @javanna! Indeed it is clear that this change does *something* and could be useful as-is for some query loads. I'm also encouraged by Adrien's comments. Although

Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]

2024-09-05 Thread via GitHub
rmuir commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2332541693 @dweiss yes that's the case i hit, it is just the "switching branches from main" use-case and it seems to trip on things such as buildSrc and benchmark-jmh, i end up manually rm -rf'

Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]

2024-09-05 Thread via GitHub
dweiss commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2332544468 I think this logic is flawed: ``` // git ignores any folders which are empty (this includes folders with recursively empty sub-folders). def untrackedNonEmpty

Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]

2024-09-05 Thread via GitHub
dweiss commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2332550998 https://github.com/apache/lucene/pull/13728 ?

Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-05 Thread via GitHub
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2332571178 > I think we could keep it simple and provide a separate collector manager perhaps that supports intra-segment concurrency for now Having thought a little more, I am not sure this

Re: [I] Should the static search methods in FacetsCollector take a FacetsCollector as last argument? [lucene]

2024-09-05 Thread via GitHub
gsmiller commented on issue #13725: URL: https://github.com/apache/lucene/issues/13725#issuecomment-2332818574 My vote would be to be more restrictive with these signatures and specify `FacetsCollectorManager` given that these are sugar methods meant to make it easier to do faceting while a

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2024-09-05 Thread via GitHub
github-actions[bot] commented on PR #13398: URL: https://github.com/apache/lucene/pull/13398#issuecomment-2332950383 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

[I] Gradle builds slow to start [lucene]

2024-09-05 Thread via GitHub
dweiss opened a new issue, #13730: URL: https://github.com/apache/lucene/issues/13730 ### Description This has been mentioned by Mike Sokolov, I think. Gradle builds have become slow to start as we upgraded from version to version. Interestingly, I've come across this hint: