[jira] [Commented] (LUCENE-10061) CombinedFieldsQuery needs dynamic pruning support
[ https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435811#comment-17435811 ] Adrien Grand commented on LUCENE-10061: --- For luceneutil integration, you will need to enhance [TaskParser|https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/TaskParser.java] to create CombinedFieldQuery instances. You can look at how it handles minimumShouldMatch for inspiration. > CombinedFieldsQuery needs dynamic pruning support > - > > Key: LUCENE-10061 > URL: https://issues.apache.org/jira/browse/LUCENE-10061 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore, > forcing Lucene to collect all matches in order to figure the top-k hits. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
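To illustrate why a Scorer that exposes score upper bounds (the role of `advanceShallow`/`getMaxScore` named in the issue) helps top-k collection, here is a toy sketch. It is not Lucene's Scorer API: the `Block` class, the `topK` method and the sample scores are all hypothetical, and real dynamic pruning in Lucene is considerably more involved. The sketch only shows the core idea, that a block of documents whose maximum possible score cannot beat the current k-th best hit can be skipped without scoring any of its documents.

```java
import java.util.PriorityQueue;

public class BlockMaxPruningSketch {

  /** Hypothetical block of per-document scores with a precomputed upper bound. */
  static final class Block {
    final float[] scores;
    final float maxScore;

    Block(float... scores) {
      this.scores = scores;
      float max = Float.NEGATIVE_INFINITY;
      for (float s : scores) {
        max = Math.max(max, s);
      }
      this.maxScore = max;
    }
  }

  /** Number of documents actually scored by the last call to topK (for illustration). */
  static int scoredDocs;

  /** Returns the k best scores in ascending order, skipping non-competitive blocks. */
  static float[] topK(Block[] blocks, int k) {
    scoredDocs = 0;
    if (k <= 0) {
      return new float[0];
    }
    // Min-heap: once it holds k entries, its head is the k-th best score so far.
    PriorityQueue<Float> heap = new PriorityQueue<>();
    for (Block block : blocks) {
      // A block whose upper bound does not exceed the current k-th best score
      // cannot change the result, so none of its documents need to be scored.
      if (heap.size() == k && block.maxScore <= heap.peek()) {
        continue;
      }
      for (float score : block.scores) {
        scoredDocs++;
        if (heap.size() < k) {
          heap.add(score);
        } else if (score > heap.peek()) {
          heap.poll();
          heap.add(score);
        }
      }
    }
    float[] result = new float[heap.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = heap.poll();
    }
    return result;
  }

  public static void main(String[] args) {
    Block[] blocks = {
      new Block(3.0f, 9.0f, 1.0f),
      new Block(0.5f, 2.0f, 1.5f), // upper bound 2.0: skipped once the heap holds {3.0, 9.0}
      new Block(8.0f, 0.1f, 4.0f)
    };
    float[] top = topK(blocks, 2);
    System.out.println(top[0] + " " + top[1] + " scored=" + scoredDocs);
  }
}
```

Without the per-block upper bound, every document has to be scored, which is the situation the issue describes for CombinedFieldQuery.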
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435972#comment-17435972 ] Greg Miller commented on LUCENE-10207: -- Sure, I'll see if I can move this forward a little bit. Thanks! > Make TermInSetQuery usable with IndexOrDocValuesQuery > - > > Key: LUCENE-10207 > URL: https://issues.apache.org/jira/browse/LUCENE-10207 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-10207_multitermquery.patch > > > IndexOrDocValuesQuery is very useful to pick the right execution mode for a > query depending on other bits of the query tree. > We would like to be able to use it to optimize execution of TermInSetQuery. > However IndexOrDocValuesQuery only works well if the "index" query can give > an estimation of the cost of the query without doing anything expensive (like > looking up all terms of the TermInSetQuery in the terms dict). Maybe we could > implement it for primary keys (terms.size() == sumDocFreq) by returning the > number of terms of the query? Another idea is to multiply the number of terms > by the average postings length, though this could be dangerous if the field > has a zipfian distribution and some terms have a much higher doc frequency > than the average. > [~romseygeek] and I were discussing this a few weeks ago, and more recently > [~mikemccand] and [~gsmiller] again independently. So it looks like there is > interest in this. Here is an email thread where this was recently discussed: > https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
[GitHub] [lucene] bruno-roustant commented on pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
bruno-roustant commented on pull request #404: URL: https://github.com/apache/lucene/pull/404#issuecomment-954746622 @jpountz maybe you could be interested in the review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
jpountz commented on pull request #404: URL: https://github.com/apache/lucene/pull/404#issuecomment-954799005 Yes! Sorry I saw it and wanted to have a look and then got distracted. I haven't taken the time to take a look yet but I ran indexing with `IndexAndSearchOpenStreetMaps` and saw a speedup with this change on big merges, so I'm already excited about it. :)
[GitHub] [lucene] madrob merged pull request #417: Replace deprecated Gradle 7.2 properties
madrob merged pull request #417: URL: https://github.com/apache/lucene/pull/417
[GitHub] [lucene] jpountz commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
jpountz commented on a change in pull request #404: URL: https://github.com/apache/lucene/pull/404#discussion_r739310053

## File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java ##
@@ -20,66 +20,120 @@
  * {@link Sorter} implementation based on a variant of the quicksort algorithm called <a href="http://en.wikipedia.org/wiki/Introsort">introsort</a>: when the recursion level exceeds the
  * log of the length of the array to sort, it falls back to heapsort. This prevents quicksort from
- * running into its worst-case quadratic runtime. Small arrays are sorted with insertion sort.
+ * running into its worst-case quadratic runtime. Small ranges are sorted with insertion sort.
  *
  * @lucene.internal
  */
 public abstract class IntroSorter extends Sorter {
+  /** Below this size threshold, the partition selection is simplified to a single median. */
+  private static final int SINGLE_MEDIAN_THRESHOLD = 40;
+
   /** Create a new {@link IntroSorter}. */
   public IntroSorter() {}
   @Override
   public final void sort(int from, int to) {
     checkRange(from, to);
-    quicksort(from, to, 2 * MathUtil.log(to - from, 2));
+    sort(from, to, 2 * MathUtil.log(to - from, 2));
   }
-  void quicksort(int from, int to, int maxDepth) {
-    if (to - from < BINARY_SORT_THRESHOLD) {
-      binarySort(from, to);
-      return;
-    } else if (--maxDepth < 0) {
-      heapSort(from, to);
-      return;
-    }
-
-    final int mid = (from + to) >>> 1;
-
-    if (compare(from, mid) > 0) {
-      swap(from, mid);
-    }
-
-    if (compare(mid, to - 1) > 0) {
-      swap(mid, to - 1);
-      if (compare(from, mid) > 0) {
-        swap(from, mid);
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.
+   *
+   * Sorts small ranges with insertion sort. Fallbacks to heap sort to avoid quadratic worst
+   * case. Selects the pivot with medians and partitions with the Bentley-McIlroy fast 3-ways
+   * algorithm (Engineering a Sort Function, Bentley-McIlroy).
+   */
+  void sort(int from, int to, int maxDepth) {
+    int size;
+
+    // Sort small ranges with insertion sort.
+    while ((size = to - from) > INSERTION_SORT_THRESHOLD) {
+
+      if (--maxDepth < 0) {
+        // Max recursion depth reached: fallback to heap sort.
+        heapSort(from, to);
+        return;
       }
-    }
-
-    int left = from + 1;
-    int right = to - 2;
-    setPivot(mid);
-    for (; ; ) {
-      while (comparePivot(right) < 0) {
-        --right;
+      // Pivot selection based on medians.
+      int last = to - 1;
+      int mid = (from + last) >>> 1;
+      int range = size >> 3;
+      int pivot;
+      if (size <= SINGLE_MEDIAN_THRESHOLD) {
+        // Select the pivot with a single median around the middle element.
+        // Do not take the median between [from, mid, last] because it hurts performance
+        // if the order is descending.
+        pivot = median(mid - range, mid, mid + range);

Review comment: In a 32-element array, we'd be looking at indices 12, 16 and 20, so all rather close to the mid element. Should `range` be `size/4` instead of `size/8` in that case to be less subject to special distributions (i.e. we'd be looking at indices 8, 16 and 24 in a 32-element array)?

## File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java ##
@@ -20,66 +20,120 @@
  * {@link Sorter} implementation based on a variant of the quicksort algorithm called <a href="http://en.wikipedia.org/wiki/Introsort">introsort</a>: when the recursion level exceeds the
  * log of the length of the array to sort, it falls back to heapsort. This prevents quicksort from
- * running into its worst-case quadratic runtime. Small arrays are sorted with insertion sort.
+ * running into its worst-case quadratic runtime. Small ranges are sorted with insertion sort.
  *
  * @lucene.internal
  */
 public abstract class IntroSorter extends Sorter {
+  /** Below this size threshold, the partition selection is simplified to a single median. */
+  private static final int SINGLE_MEDIAN_THRESHOLD = 40;
+
   /** Create a new {@link IntroSorter}. */
   public IntroSorter() {}
   @Override
   public final void sort(int from, int to) {
     checkRange(from, to);
-    quicksort(from, to, 2 * MathUtil.log(to - from, 2));
+    sort(from, to, 2 * MathUtil.log(to - from, 2));
   }
-  void quicksort(int from, int to, int maxDepth) {
-    if (to - from < BINARY_SORT_THRESHOLD) {
-      binarySort(from, to);
-      return;
-    } else if (--maxDepth < 0) {
-      heapSort(from, to);
-      return;
-    }
-
-    final int mid = (from + to) >>> 1;
-
-    if (compare(from, mid) > 0) {
-      swap(from, mid);
-    }
-
-    if (compare(mid, to - 1) > 0) {
-      swap(mid, to - 1);
-      if (compare(from, mid) > 0) {
-        swap(from, mid);
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.
+   *
+   * Sorts small ranges with insertion sort. Fa
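The spread of the pivot samples discussed in the review comment can be checked with a small sketch. It reuses only the index arithmetic visible in the diff (`last`, `mid`, `range`); the class name, the `sampleIndices` helper and its `shift` parameter are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

public class PivotSampleSketch {

  /**
   * Returns the three indices whose median would be taken as the pivot for the
   * range [from, to). A shift of 3 corresponds to size/8 (as in the patch) and a
   * shift of 2 to size/4 (as suggested in the review).
   */
  static int[] sampleIndices(int from, int to, int shift) {
    int size = to - from;
    int last = to - 1;
    int mid = (from + last) >>> 1; // same midpoint formula as in the diff
    int range = size >> shift;
    return new int[] {mid - range, mid, mid + range};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(sampleIndices(0, 32, 3))); // [11, 15, 19]
    System.out.println(Arrays.toString(sampleIndices(0, 32, 2))); // [7, 15, 23]
  }
}
```

Because the midpoint is computed from `last`, the exact indices come out one lower than the round figures quoted in the comment, but the point stands: with `size/8` the samples span a quarter of the range, with `size/4` they span half of it.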
[jira] [Updated] (LUCENE-9280) Add ability to skip non-competitive documents on field sort
[ https://issues.apache.org/jira/browse/LUCENE-9280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova updated LUCENE-9280: Fix Version/s: 8.6 > Add ability to skip non-competitive documents on field sort > > > Key: LUCENE-9280 > URL: https://issues.apache.org/jira/browse/LUCENE-9280 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Priority: Minor > Fix For: main (9.0), 8.6 > > Time Spent: 18h 20m > Remaining Estimate: 0h > > Today collectors, once they collect enough docs, can instruct scorers to > update their iterators to skip non-competitive documents. This is applicable > only for a case when we need top docs by _score. > It would be nice to also have an ability to skip non-competitive docs when we > need top docs sorted by other fields different from _score.
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436183#comment-17436183 ] Greg Miller commented on LUCENE-10207: --

I spent a little more time on this today and would be interested in any feedback on getting the "term count" for these MultiTermQueries for the purpose of estimating cost. For TermInSetQuery, this is known up-front since the user provides the set of terms. But MultiTermQueries in general often don't "know" their term count up-front (until producing their TermsEnum, which itself seems costly). I'm considering adding a new method to MultiTermQueries that allows implementations to provide their term count if known, or -1 if not known. Then estimating cost like this:

{code:java}
Terms indexTerms = context.reader().terms(query.getField());
int queryTermsCount = query.getTermsCount();
if (indexTerms == null) {
  cost = 0; // field doesn't exist
} else if (queryTermsCount == -1) {
  cost = indexTerms.getDocCount();
} else {
  cost = Math.min(indexTerms.getDocCount(), queryTermsCount + (indexTerms.getSumDocFreq() - indexTerms.size()));
}
{code}

Does this seem like a reasonable approach? Any other ideas?

The other issue here is that we don't actually know at this point how many of the query terms are actually in the index. So this could potentially over-estimate cost if there is a huge set of terms that aren't in the index. But solving for that requires intersecting the indexed terms with the query terms, which adds up-front cost.

> Make TermInSetQuery usable with IndexOrDocValuesQuery > - > > Key: LUCENE-10207 > URL: https://issues.apache.org/jira/browse/LUCENE-10207 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-10207_multitermquery.patch > > > IndexOrDocValuesQuery is very useful to pick the right execution mode for a > query depending on other bits of the query tree. > We would like to be able to use it to optimize execution of TermInSetQuery. > However IndexOrDocValuesQuery only works well if the "index" query can give > an estimation of the cost of the query without doing anything expensive (like > looking up all terms of the TermInSetQuery in the terms dict). Maybe we could > implement it for primary keys (terms.size() == sumDocFreq) by returning the > number of terms of the query? Another idea is to multiply the number of terms > by the average postings length, though this could be dangerous if the field > has a zipfian distribution and some terms have a much higher doc frequency > than the average. > [~romseygeek] and I were discussing this a few weeks ago, and more recently > [~mikemccand] and [~gsmiller] again independently. So it looks like there is > interest in this. Here is an email thread where this was recently discussed: > https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
[jira] [Comment Edited] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436183#comment-17436183 ] Greg Miller edited comment on LUCENE-10207 at 10/29/21, 10:35 PM: --

I spent a little more time on this today and would be interested in any feedback on getting the "term count" for these MultiTermQueries for the purpose of estimating cost. For TermInSetQuery, this is known up-front since the user provides the set of terms. But MultiTermQueries in general often don't "know" their term count up-front (until producing their TermsEnum, which itself seems costly). I'm considering adding a new method to MultiTermQueries that allows implementations to provide their term count if known, or -1 if not known. Then estimating cost like this:

{code:java}
Terms indexTerms = context.reader().terms(query.getField());
int queryTermsCount = query.getTermsCount();
if (indexTerms == null) {
  cost = 0; // field doesn't exist
} else if (queryTermsCount == -1) {
  cost = indexTerms.getDocCount();
} else {
  cost = Math.min(indexTerms.getDocCount(), queryTermsCount + (indexTerms.getSumDocFreq() - indexTerms.size()));
}
{code}

Does this seem like a reasonable approach? Any other ideas? (EDIT: Oh, I should note that this code would run when constructing a {{ScorerSupplier}}, eagerly evaluating the cost so the supplier can return it as requested before creating the {{Scorer}})

The other issue here is that we don't actually know at this point how many of the query terms are actually in the index. So this could potentially over-estimate cost if there is a huge set of terms that aren't in the index. But solving for that requires intersecting the indexed terms with the query terms, which adds up-front cost.

was (Author: gsmiller):

I spent a little more time on this today and would be interested in any feedback on getting the "term count" for these MultiTermQueries for the purpose of estimating cost. For TermInSetQuery, this is known up-front since the user provides the set of terms. But MultiTermQueries in general often don't "know" their term count up-front (until producing their TermsEnum, which itself seems costly). I'm considering adding a new method to MultiTermQueries that allows implementations to provide their term count if known, or -1 if not known. Then estimating cost like this:

{code:java}
Terms indexTerms = context.reader().terms(query.getField());
int queryTermsCount = query.getTermsCount();
if (indexTerms == null) {
  cost = 0; // field doesn't exist
} else if (queryTermsCount == -1) {
  cost = indexTerms.getDocCount();
} else {
  cost = Math.min(indexTerms.getDocCount(), queryTermsCount + (indexTerms.getSumDocFreq() - indexTerms.size()));
}
{code}

Does this seem like a reasonable approach? Any other ideas?

The other issue here is that we don't actually know at this point how many of the query terms are actually in the index. So this could potentially over-estimate cost if there is a huge set of terms that aren't in the index. But solving for that requires intersecting the indexed terms with the query terms, which adds up-front cost.

> Make TermInSetQuery usable with IndexOrDocValuesQuery > - > > Key: LUCENE-10207 > URL: https://issues.apache.org/jira/browse/LUCENE-10207 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-10207_multitermquery.patch > > > IndexOrDocValuesQuery is very useful to pick the right execution mode for a > query depending on other bits of the query tree. > We would like to be able to use it to optimize execution of TermInSetQuery. > However IndexOrDocValuesQuery only works well if the "index" query can give > an estimation of the cost of the query without doing anything expensive (like > looking up all terms of the TermInSetQuery in the terms dict). Maybe we could > implement it for primary keys (terms.size() == sumDocFreq) by returning the > number of terms of the query? Another idea is to multiply the number of terms > by the average postings length, though this could be dangerous if the field > has a zipfian distribution and some terms have a much higher doc frequency > than the average. > [~romseygeek] and I were discussing this a few weeks ago, and more recently > [~mikemccand] and [~gsmiller] again independently. So it looks like there is > interest in this. Here is an email thread where this was recently discussed: > https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
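The cost estimate proposed in the comment can be tried out in isolation. The sketch below restates it as a plain function over the field statistics, with no Lucene dependency; the class name, method name and parameters are hypothetical, and the `-1` convention for an unknown term count is the one suggested in the comment, not an existing API. The reasoning in the code comment is one reading of why the formula is an upper bound.

```java
public class TermCostEstimateSketch {

  /**
   * Estimates an upper bound on the number of documents a multi-term query can match.
   *
   * @param fieldExists false when the field has no indexed terms (the null Terms case)
   * @param docCount documents with at least one term in the field
   * @param sumDocFreq sum of docFreq over all indexed terms of the field
   * @param indexTermCount number of unique indexed terms of the field
   * @param queryTermCount number of query terms, or -1 when unknown
   */
  static long estimateCost(
      boolean fieldExists, long docCount, long sumDocFreq, long indexTermCount, long queryTermCount) {
    if (!fieldExists) {
      return 0; // field doesn't exist
    }
    if (queryTermCount == -1) {
      return docCount; // nothing known about the query: any doc with the field may match
    }
    // Every indexed term has docFreq >= 1, so (sumDocFreq - indexTermCount) counts the
    // postings beyond the first one of each term. In the worst case all of that excess
    // belongs to the query's terms, which gives the bound below.
    long postingsBound = queryTermCount + (sumDocFreq - indexTermCount);
    return Math.min(docCount, postingsBound);
  }

  public static void main(String[] args) {
    // Primary-key-like field (one posting per term): cost equals the query term count.
    System.out.println(estimateCost(true, 1000, 1000, 1000, 5)); // 5
    // Skewed field with few terms and many postings: the bound degrades to docCount.
    System.out.println(estimateCost(true, 1000, 5000, 100, 5)); // 1000
    // Unknown term count falls back to docCount.
    System.out.println(estimateCost(true, 1000, 5000, 100, -1)); // 1000
    // Missing field.
    System.out.println(estimateCost(false, 0, 0, 0, 5)); // 0
  }
}
```

For a primary-key field the excess is zero, so the estimate reduces to the number of query terms, which matches the idea in the issue description. As the comment notes, the estimate can still be too high when many query terms are absent from the index.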
[jira] [Commented] (LUCENE-10201) Upgrade Spatial4j to 0.8
[ https://issues.apache.org/jira/browse/LUCENE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436224#comment-17436224 ] ASF subversion and git services commented on LUCENE-10201: -- Commit c2c215d3a83bda97f298f1d14c0bfd523122ca98 in lucene's branch refs/heads/main from David Smiley [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c2c215d ] LUCENE-10201: Upgrade Spatial4j to 0.8 (#409) Upgrading Spatial4j to 0.8, improving a variety of minor things. See release notes: https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8 Test-only dependency on JTS is upgraded to 1.17 as well. > Upgrade Spatial4j to 0.8 > > > Key: LUCENE-10201 > URL: https://issues.apache.org/jira/browse/LUCENE-10201 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spatial-extras >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Spatial4j has been at 0.8 for some time. We should upgrade. > [https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8]
[GitHub] [lucene] dsmiley merged pull request #409: LUCENE-10201: Upgrade Spatial4j to 0.8
dsmiley merged pull request #409: URL: https://github.com/apache/lucene/pull/409
[jira] [Resolved] (LUCENE-10201) Upgrade Spatial4j to 0.8
[ https://issues.apache.org/jira/browse/LUCENE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved LUCENE-10201. --- Fix Version/s: main (9.0) Resolution: Fixed > Upgrade Spatial4j to 0.8 > > > Key: LUCENE-10201 > URL: https://issues.apache.org/jira/browse/LUCENE-10201 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spatial-extras >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Fix For: main (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > Spatial4j has been at 0.8 for some time. We should upgrade. > [https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8]
[jira] [Created] (LUCENE-10212) Add luceneutil benchmark task for CombinedFieldsQuery
Zach Chen created LUCENE-10212: --
Summary: Add luceneutil benchmark task for CombinedFieldsQuery
Key: LUCENE-10212
URL: https://issues.apache.org/jira/browse/LUCENE-10212
Project: Lucene - Core
Issue Type: Task
Reporter: Zach Chen
Assignee: Zach Chen

This is a spin-off task from https://issues.apache.org/jira/browse/LUCENE-10061 . In order to objectively evaluate performance changes for CombinedFieldsQuery, we would like to add a benchmark task and parsing for CombinedFieldsQuery.

One proposal for the query syntax to enable CombinedFieldsQuery benchmarking would be the following:

{code:java}
taskName: term1 term2 term3 term4 +combinedFields=field1^boost1,field2^boost2,field3^boost3
{code}
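A minimal sketch of how the proposed `+combinedFields=...` suffix could be split into fields and boosts follows. It is not luceneutil's TaskParser: the class, both helper methods, the sample task name and the default boost of 1 for a field without `^boost` are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CombinedFieldsTaskSketch {

  private static final String MARKER = "+combinedFields=";

  /** Extracts the text following "+combinedFields=" from a task line, or null if absent. */
  static String extractFieldSpec(String taskLine) {
    int start = taskLine.indexOf(MARKER);
    if (start < 0) {
      return null;
    }
    int end = taskLine.indexOf(' ', start);
    return taskLine.substring(start + MARKER.length(), end < 0 ? taskLine.length() : end);
  }

  /** Parses "field1^2.0,field2^1.5" into an insertion-ordered field -> boost map. */
  static Map<String, Float> parseFieldBoosts(String spec) {
    Map<String, Float> boosts = new LinkedHashMap<>();
    for (String part : spec.split(",")) {
      int caret = part.indexOf('^');
      if (caret < 0) {
        boosts.put(part.trim(), 1.0f); // assumption: a missing boost defaults to 1
      } else {
        String field = part.substring(0, caret).trim();
        float boost = Float.parseFloat(part.substring(caret + 1).trim());
        boosts.put(field, boost);
      }
    }
    return boosts;
  }

  public static void main(String[] args) {
    // Hypothetical task line following the proposed syntax.
    String line = "CombinedHighHigh: apache lucene +combinedFields=title^3.0,body^1.0";
    System.out.println(parseFieldBoosts(extractFieldSpec(line)));
  }
}
```

The resulting map could then be used to configure the fields and weights of the query under benchmark; how the terms and the task name are parsed is left out here.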
[jira] [Updated] (LUCENE-10212) Add luceneutil benchmark task for CombinedFieldsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach Chen updated LUCENE-10212: --- Description: This is a spin-off task from https://issues.apache.org/jira/browse/LUCENE-10061 . In order to objectively evaluate performance changes for CombinedFieldsQuery, we would like to add a benchmark task and parsing for CombinedFieldsQuery. One proposal for the query syntax to enable CombinedFieldsQuery benchmarking would be the following: {code:java} taskName: term1 term2 term3 term4 +combinedFields=field1^boost1,field2^boost2,field3^boost3 {code} was: This is a spin-off task from https://issues.apache.org/jira/browse/LUCENE-10061 . In order to objectively evaluate performance changes for CombinedFieldsQuery, we would like to add a benchmark task and parsing for CombinedFieldsQuery. One proposal for the query syntax to enable CombinedFieldsQuery benchmarking would be the following: {code:java} taskName: term1 term2 term3 term4 +combinedFields=field1^boost1,field2^boost2,field3^boost3 {code} > Add luceneutil benchmark task for CombinedFieldsQuery > - > > Key: LUCENE-10212 > URL: https://issues.apache.org/jira/browse/LUCENE-10212 > Project: Lucene - Core > Issue Type: Task >Reporter: Zach Chen >Assignee: Zach Chen >Priority: Minor > > This is a spin-off task from > https://issues.apache.org/jira/browse/LUCENE-10061 . In order to objectively > evaluate performance changes for CombinedFieldsQuery, we would like to add a > benchmark task and parsing for CombinedFieldsQuery. > One proposal for the query syntax to enable CombinedFieldsQuery benchmarking > would be the following: > {code:java} > taskName: term1 term2 term3 term4 > +combinedFields=field1^boost1,field2^boost2,field3^boost3 > {code}
[jira] [Commented] (LUCENE-10061) CombinedFieldsQuery needs dynamic pruning support
[ https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436233#comment-17436233 ] Zach Chen commented on LUCENE-10061: Thanks [~jpountz] for the pointer! I have created a spin-off task for luceneutil integration https://issues.apache.org/jira/browse/LUCENE-10212, and will actually work on it first and circle back to this task afterward. > CombinedFieldsQuery needs dynamic pruning support > - > > Key: LUCENE-10061 > URL: https://issues.apache.org/jira/browse/LUCENE-10061 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore, > forcing Lucene to collect all matches in order to figure the top-k hits.
[GitHub] [lucene] apanimesh061 commented on pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on pull request #412: URL: https://github.com/apache/lucene/pull/412#issuecomment-955147804

The new changes include the following:
1. If `flags` is initialized, the boolean flags are ignored.
2. Unit test for the new `getFlags()` implementation using builders.
3. Setter for `flags`.
4. The builder can be used to set the booleans as well as `flags`.

I have not implemented the randomized testing as suggested. I am still researching how to implement that.
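The precedence rule in item 1 can be pictured with a small, self-contained sketch. None of the names below come from the pull request: the `Flag` enum, the `Builder` class and its `with...` methods are hypothetical stand-ins, and the real UnifiedHighlighter builder may look quite different.

```java
import java.util.EnumSet;
import java.util.Set;

public class HighlighterBuilderSketch {

  /** Hypothetical flags, for illustration only. */
  enum Flag {
    PHRASES,
    MULTI_TERM_QUERY
  }

  /** Hypothetical builder: explicitly set flags take precedence over the booleans. */
  static final class Builder {
    private boolean highlightPhrases = true;
    private boolean handleMultiTermQuery = true;
    private EnumSet<Flag> flags; // stays null until withFlags is called

    Builder withHighlightPhrases(boolean value) {
      this.highlightPhrases = value;
      return this;
    }

    Builder withHandleMultiTermQuery(boolean value) {
      this.handleMultiTermQuery = value;
      return this;
    }

    Builder withFlags(Set<Flag> explicitFlags) {
      EnumSet<Flag> copy = EnumSet.noneOf(Flag.class);
      copy.addAll(explicitFlags);
      this.flags = copy;
      return this;
    }

    Set<Flag> getFlags() {
      if (flags != null) {
        return EnumSet.copyOf(flags); // explicit flags win, booleans are ignored
      }
      // Otherwise derive the flags from the individual booleans.
      EnumSet<Flag> derived = EnumSet.noneOf(Flag.class);
      if (highlightPhrases) {
        derived.add(Flag.PHRASES);
      }
      if (handleMultiTermQuery) {
        derived.add(Flag.MULTI_TERM_QUERY);
      }
      return derived;
    }
  }

  public static void main(String[] args) {
    // Booleans only: the flags are derived from them.
    System.out.println(new Builder().withHighlightPhrases(false).getFlags());
    // Explicit flags: the boolean setting above is ignored.
    System.out.println(
        new Builder().withHighlightPhrases(false).withFlags(EnumSet.of(Flag.PHRASES)).getFlags());
  }
}
```

Keeping `flags` as a nullable field is one simple way to distinguish "never set" from "set to an empty set", which matters if an explicitly empty flag set should also override the booleans.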