Re: [PR] Remove CollectorOwner class (#13671) [lucene]
epotyom commented on code in PR #13702: URL: https://github.com/apache/lucene/pull/13702#discussion_r1743335144 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java: ## @@ -414,62 +399,59 @@ public ConcurrentDrillSidewaysResult search( /** * Search using DrillDownQuery with custom collectors. This method can be used with any {@link - * CollectorOwner}s. It doesn't return anything because it is expected that you read results from - * provided {@link CollectorOwner}s. - * - * To read the results, run {@link CollectorOwner#getResult()} for drill down and all drill - * sideways dimensions. - * - * Note: use {@link Collections#unmodifiableList(List)} to wrap {@code - * drillSidewaysCollectorOwners} to convince compiler that it is safe to use List here. - * - * Use {@link MultiCollectorManager} wrapped by {@link CollectorOwner} to collect both hits and - * facets for the entire query and/or for drill-sideways dimensions. + * CollectorManager}s. * - * TODO: Class CollectorOwner was created so that we can ignore CollectorManager type C, - * because we want each dimensions to be able to use their own types. Alternatively, we can use - * typesafe heterogeneous container and provide CollectorManager type for each dimension to this - * method? I do like CollectorOwner approach as it seems more intuitive? + * Note: Use {@link MultiCollectorManager} to collect both hits and facets for the entire query + * and/or for drill-sideways dimensions. You can also use it to wrap different types of {@link + * CollectorManager} for drill-sideways dimensions. */ - public void search( + public Result search( final DrillDownQuery query, - CollectorOwner drillDownCollectorOwner, - List> drillSidewaysCollectorOwners) + CollectorManager drillDownCollectorManager, + List> drillSidewaysCollectorManagers) throws IOException { -if (drillDownCollectorOwner == null) { +if (drillDownCollectorManager == null) { throw new IllegalArgumentException( "This search method requires client to provide drill down collector manager"); } -if (drillSidewaysCollectorOwners == null) { +if (drillSidewaysCollectorManagers == null) { Review Comment: I agree with all points - thanks for reverting the commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestBoolean2.testRandomQueries fails in CI due to eating up heap space [lucene]
javanna commented on issue #11754: URL: https://github.com/apache/lucene/issues/11754#issuecomment-2328396050 You are correct @rmuir ! In my case, `mulFactor` is `1024`. My branch makes it even more likely to go out of heap space, because it may create more slices to be searched concurrently, and each slice gets a separate collector with a separate queue of that size. We could disable query concurrency for this test, or work on limiting the number of slices. On the other hand, I don't understand the purpose of `1000 * mulFactor`. If I use `mulFactor` as the size of the queue, the test still succeeds and it no longer goes out of heap for my failing seed. Maybe that's a reasonable fix? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1743463027 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -890,11 +945,70 @@ public static class LeafSlice { * * @lucene.experimental */ -public final LeafReaderContext[] leaves; +public final LeafReaderContextPartition[] leaves; -public LeafSlice(List leavesList) { - Collections.sort(leavesList, Comparator.comparingInt(l -> l.docBase)); - this.leaves = leavesList.toArray(new LeafReaderContext[0]); +public LeafSlice(List leafReaderContextPartitions) { + leafReaderContextPartitions.sort(Comparator.comparingInt(l -> l.ctx.docBase)); + // TODO should we sort by minDocId too? + this.leaves = leafReaderContextPartitions.toArray(new LeafReaderContextPartition[0]); +} + +/** + * Returns the total number of docs that a slice targets, by summing the number of docs that + * each of its leaf context partitions targets. + */ +public int getNumDocs() { + return Arrays.stream(leaves) + .map(LeafReaderContextPartition::getNumDocs) + .reduce(Integer::sum) + .get(); +} + } + + /** + * Holds information about a specific leaf context and the corresponding range of doc ids to + * search within. + * + * @lucene.experimental + */ + public static final class LeafReaderContextPartition { +private final int minDocId; +private final int maxDocId; +private final int numDocs; +public final LeafReaderContext ctx; + +private LeafReaderContextPartition( +LeafReaderContext leafReaderContext, int minDocId, int maxDocId, int numDocs) { + this.ctx = leafReaderContext; + this.minDocId = minDocId; + this.maxDocId = maxDocId; + this.numDocs = numDocs; Review Comment: I spent quite a bit of time on this, and concluded that we could use `maxDoc` in all cases, also when the partition targets the entire segment, but it is not practical. There are places where we special case when min==0 and max==NO_MORE_DOCS , hence it is simpler to keep on using NO_MORE_DOCS as an upper bound. That does mean that for this case we need to take maxDoc as a separate argument because we cannot compute the number of docs for a partition via maxDocId - minDocId. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2328473802 This is ready for a final review. I have no further outstanding items to look into. I added some entries to the migrate file, changes entry and updated description above to be aligned with the proposed changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestBoolean2.testRandomQueries fails in CI due to eating up heap space [lucene]
rmuir commented on issue #11754: URL: https://github.com/apache/lucene/issues/11754#issuecomment-2328792680 @javanna I don't understand the test, but we need to tone down its craziness. Removing 1000 sounds great to me, if the single line change makes the test a thousand times less annoying then we succeeded. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestBoolean2.testRandomQueries fails in CI due to eating up heap space [lucene]
javanna commented on issue #11754: URL: https://github.com/apache/lucene/issues/11754#issuecomment-2328812477 Agreed, I opened #13713 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] RegExp::toAutomaton no longer minimizes [lucene]
mikemccand commented on issue #13706: URL: https://github.com/apache/lucene/issues/13706#issuecomment-2328841283 > I think the relaxation patch is fine as a short first step - it doesn't claim to be optimal (PnP, as Mike loves to say). +1, heh. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] RegExp::toAutomaton no longer minimizes [lucene]
rmuir commented on issue #13706: URL: https://github.com/apache/lucene/issues/13706#issuecomment-2329167160 > Is `subsetOf` really quadratic? I know its javadoc says so, but I'm not convinced. I think this is only one of the scarier parts about it. The other scary part is that it may throw `IllegalArgumentException`, or in some cases `AssertionError` (if asserts are enabled, if not... UB?). For these reasons too, I wanted to avoid its usage in something that gets called e.g. by CompiledAutomaton and proposed moving it to AutomatonTestUtil for test purposes only: https://github.com/apache/lucene/pull/13708 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce size of queue in TestBoolean2#testRandomQueries [lucene]
javanna merged PR #13713: URL: https://github.com/apache/lucene/pull/13713 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestBoolean2.testRandomQueries fails in CI due to eating up heap space [lucene]
javanna closed issue #11754: TestBoolean2.testRandomQueries fails in CI due to eating up heap space URL: https://github.com/apache/lucene/issues/11754 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]
mikemccand commented on code in PR #13707: URL: https://github.com/apache/lucene/pull/13707#discussion_r1743876694 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -857,22 +857,38 @@ public static boolean isEmpty(Automaton a) { return true; } - /** Returns true if the given automaton accepts all strings. The automaton must be minimized. */ + /** Returns true if the given automaton accepts all strings. */ Review Comment: OK I see from the title of the PR :) that indeed this algo expects/requires a determinized automaton. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]
rmuir commented on code in PR #13707: URL: https://github.com/apache/lucene/pull/13707#discussion_r1743877792 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -857,22 +857,38 @@ public static boolean isEmpty(Automaton a) { return true; } - /** Returns true if the given automaton accepts all strings. The automaton must be minimized. */ + /** Returns true if the given automaton accepts all strings. */ Review Comment: for some NFA, it might return `false` when in fact the NFA is "total", I think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]
mikemccand commented on code in PR #13707: URL: https://github.com/apache/lucene/pull/13707#discussion_r1743879304 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -857,22 +857,38 @@ public static boolean isEmpty(Automaton a) { return true; } - /** Returns true if the given automaton accepts all strings. The automaton must be minimized. */ + /** Returns true if the given automaton accepts all strings. */ public static boolean isTotal(Automaton a) { return isTotal(a, Character.MIN_CODE_POINT, Character.MAX_CODE_POINT); } /** * Returns true if the given automaton accepts all strings for the specified min/max range of the - * alphabet. The automaton must be minimized. + * alphabet. Review Comment: Add that the automaton must be determinized? And maybe explain the CPU cost is `O(T)` (`T` = total number of transitions) -- linear in total transition count? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
stefanvodita commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2329194920 Do we have any benchmark results? I'm curious to see what's gotten faster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2329216197 > Do we have any benchmark results? I'm curious to see what's gotten faster. No. I focused on functionality. I wonder if it makes sense to run benchmarks before we address the duplicated work across partitions of the same segment. It would give an initial idea of the gain, but I don't know how reliable those numbers would be. That's also why segments don't get partitioned by default. I am looking forward to working on the next step though :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Make Operations#repeat create simpler automata. [lucene]
jpountz opened a new pull request, #13714: URL: https://github.com/apache/lucene/pull/13714 `Operations#repeat` currently creates an automaton that has one more final state, and 2x more transitions for each of the final states. With this change, the returned automaton has a single final state and only 2x more transitions for state 0. Currently a draft as I'm still looking into whether this new logic is correct. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make Operations#repeat create simpler automata. [lucene]
jpountz commented on PR #13714: URL: https://github.com/apache/lucene/pull/13714#issuecomment-2329331978 I just pushed a commit that avoids creating dead states if the input automaton doesn't have dead states itself. So the automaton created by `.*` would now have a single state. For context, I started looking into optimizing the single-transition case only (hence the branch name) but as I started looking deeper into it, it looked like improving the general case wouldn't be much harder, as the algorithm boils down to renumbering final states to 0 and other states to `state + 1`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2329360505 > I think it's important at least to ensure that we "do no harm" to the existing use cases. Intra-segment is not enabled by default and requires additional work. The scope of this initial support is to set the foundations and make the necessary breaking changes (now listed in the migrate guide). Also the added abstractions are experimental. > do we want to have docid intervals that span segments? There was some discussion above about how the intervals are defined and it wasn't clear whether or not we may have painted ourselves into a corner. Do we have enough flexibility? too much? The technical approach has been the same since this PR was opened: use the existing `BulkScorer#score` method that takes min and max. That is simple and effective. The new experimental abstraction (`LeafReaderContextPartition`) allows to partition a segment. If we do so, each partition must get a different `LeafSlice` (existing concept). A new `slicesWithPartitions` static method is added that is used in tests, and can be entirely customized in how segments are partitioned. You can have multiple partitions in the same slice, but they ought to be from different segments. Does this clarify your concerns? I don't entirely follow how we may have painted ourselves in a corner and how running benchmarks relates to this. I will run benchmarks at some point, but is that necessary to verify that searching force-merged segments can use more of the available resources, same as we do when searching multiple segments concurrently? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make Operations#repeat create simpler automata. [lucene]
mikemccand commented on code in PR #13714: URL: https://github.com/apache/lucene/pull/13714#discussion_r1743958059 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -182,26 +182,44 @@ public static Automaton repeat(Automaton a) { // Repeating the empty automata will still only accept the empty automata. return a; } + +if (a.isAccept(0) && a.getAcceptStates().nextSetBit(1) == -1) { Review Comment: I guess it is allowed to pass `1` to `nextSetBit` even if the `BitSet` is length 1? Javadoc seems to say it only throws `IndexOutOfBoundsException` if you pass a negative index, and the example `for` loop would do what you are doing here on a length 1 `BitSet` so I think it's OK. ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -182,26 +182,44 @@ public static Automaton repeat(Automaton a) { // Repeating the empty automata will still only accept the empty automata. return a; } + +if (a.isAccept(0) && a.getAcceptStates().nextSetBit(1) == -1) { + // If the only accept state is 0, then this automaton already repeats itself. Automata + // returned by this function only accept state 0, so this makes this function idempotent. + return a; Review Comment: This is quite a cool observation (both sentences, separately)! Maybe swap the words `only accept` to make it more accurate (`only` is referring to `state 0`, not really `accept`). Proper `only` placement is tricky! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2329387884 Perhaps one thing to add is that the main focus/direction of this change is to set the foundations to be able to finally decouple the index geometry from how the search is run and can be parallelized. We have support for intra-query concurrency, but it can only be leveraged if you have multiple segments. We want to be able to search independently from the number of segments, and apply the already existing concurrency support to big/force-merged segments that can only be searched sequentially otherwise. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make Operations#repeat create simpler automata. [lucene]
jpountz commented on PR #13714: URL: https://github.com/apache/lucene/pull/13714#issuecomment-2329450444 Good ideas, I'll look into it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]
rmuir commented on PR #13707: URL: https://github.com/apache/lucene/pull/13707#issuecomment-2329472492 I updated the javadocs: "The automaton must be deterministic, or this method may return false." Previous implementation was: "The automaton must be minimal, or this method may return false." -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
msokolov commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2329642279 > I don't entirely follow how we may have painted ourselves in a corner and how running benchmarks relates to this. They're not related; perhaps I should have broken my comment into multiple posts. My main concern is that we should be able, on a single thread, to evaluate matches from portions of multiple segments, and it sounds as if this API will allow for that. Regarding the benchmarks, I understand these changes are not intended to have any impact on existing search code paths and need to be invoked in a special way. I still think testing should be done to verify that. And this relates to a different concern - that we are proposing to launch a large amount of dead code that will have no real purpose until further work is done. Again, we might learn something from implementing changes that actually *do* move benchmarks. I guess an underlying concern is that we are trying to get stuff in before an arbitrary (Lucene 10) release date, and assuming that no breaking changes will later be required. Isn't it conceivable that we will need to make a change to the Scorer API to support cloning? Maybe "painted into a corner" was too strong, but there could certainly be future changes we might need. To me it would make more sense to hold on off on merging (even if it means missing the impending Lucene 10 release) until we have some positive results. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Backport of #13702 [lucene]
gsmiller opened a new pull request, #13716: URL: https://github.com/apache/lucene/pull/13716 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] `count()` in LRUQueryCache [lucene]
LuXugang opened a new issue, #13717: URL: https://github.com/apache/lucene/issues/13717 ### Description Currently, the implementation of `Weight.count()` in `LRUQueryCache` first considers whether the wrapped `Weight` can obtain the `count`, and then considers the cache. However, for certain queries, such as `IndexSortSortedNumericDocValuesRangeQuery` and `PointRangeQuery`, obtaining the count from `CacheAndCount` should be faster. Should we adjust the logical order? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Remove CollectorOwner class (#13671) [lucene]
gsmiller commented on PR #13702: URL: https://github.com/apache/lucene/pull/13702#issuecomment-2329679349 No problem @javanna ! @epotyom - I got this back-ported onto 9.x but would appreciate a retro-active review if you have time (#13716). As before, I had to do some work to `DrillSideways` since it still supports a search method that directly accepts a `Collector` (instead of `CollectorManager`) so I had to modify that directly (since there's nothing to cherry-pick from main since the method has been removed). I'm pretty sure I got it right (and all tests pass, etc.), but please let me know if you spot something and we can fix it. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add dynamic range facets [lucene]
stefanvodita commented on PR #13689: URL: https://github.com/apache/lucene/pull/13689#issuecomment-2329719474 It would be great to merge this in time for 9.12. Since it's completely new code and marked experimental, I don't expect it to be controversial, but I'll wait a bit longer in case anyone was planning to review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] CompiledAutomaton not equal when NFARunAutomaton is non-null [lucene]
ChrisHegarty commented on issue #13715: URL: https://github.com/apache/lucene/issues/13715#issuecomment-2329846860 Before considering possible fixes, do we agree that there is a problem worth fixing? I’m particular thinking of the equality of IntervalQuery. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] CompiledAutomaton not equal when NFARunAutomaton is non-null [lucene]
zhaih commented on issue #13715: URL: https://github.com/apache/lucene/issues/13715#issuecomment-2329914096 I think it's a nice to have one. Altho for the specific IntervalQuery I feel like maybe we can directly claim they're equal if the pattern is the same and not checking the automaton at all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] CompiledAutomaton not equal when NFARunAutomaton is non-null [lucene]
rmuir commented on issue #13715: URL: https://github.com/apache/lucene/issues/13715#issuecomment-2329919715 @ChrisHegarty I think it might be an oversight IntervalQuery is getting an NFA, was that really intended? The following other methods on intervals are powered by automatons and get a DFA: * Intervals.prefix() * Intervals.wildcard() * Intervals.range() * Intervals.fuzzyTerm() This happens because they reuse the parsing from the associated Query classes, but this Intervals.regexp() neglects to do that and just creates its own RegExp and converts it to an NFA. Given that the default of RegexpQuery is to determinize, I think we should determinize here to avoid surprises such as this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] CompiledAutomaton not equal when NFARunAutomaton is non-null [lucene]
rmuir commented on issue #13715: URL: https://github.com/apache/lucene/issues/13715#issuecomment-2329956146 There are other related problems to fix here separately, just start with CompiledAutomaton: * `hashCode()` is inconsistent with `equals()`: either they both consider `nfaRunAutomaton` or they do not * `ramBytesUsed` doesn't consider `nfaRunAutomaton` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] CompiledAutomaton not equal when NFARunAutomaton is non-null [lucene]
rmuir commented on issue #13715: URL: https://github.com/apache/lucene/issues/13715#issuecomment-2330179575 That's usually how it works, it is good stuff. To be practical, for now, I'd recommend just fixing Intervals.regex to `determinize()` so that it matches all other queries which are using DFAs. It should be a one-line change to fix your problem because then there won't be an NFA :) I think it is enough for lucene 10 that the queries no longer `minimize()` up-front, and I'm hoping we can explore taking that next step soon so that we no longer `determinize()` up-front either (just as needed via NFA query), but we shouldn't be doing that now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org