date:20211029

[jira] [Commented] (LUCENE-10061) CombinedFieldsQuery needs dynamic pruning support

2021-10-29 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435811#comment-17435811
 ] 

Adrien Grand commented on LUCENE-10061:
---

For luceneutil integration, you will need to enhance 
[TaskParser|https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/TaskParser.java]
 to create CombinedFieldQuery instances. You can look at how it handles 
minimumShouldMatch for inspiration.

> CombinedFieldsQuery needs dynamic pruning support
> -
>
> Key: LUCENE-10061
> URL: https://issues.apache.org/jira/browse/LUCENE-10061
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore, 
> forcing Lucene to collect all matches in order to figure the top-k hits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery

2021-10-29 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435972#comment-17435972
 ] 

Greg Miller commented on LUCENE-10207:
--

Sure, I'll see if I can move this forward a little bit. Thanks!

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -
>
> Key: LUCENE-10207
> URL: https://issues.apache.org/jira/browse/LUCENE-10207
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-10207_multitermquery.patch
>
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a 
> query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. 
> However IndexOrDocValuesQuery only works well if the "index" query can give 
> an estimation of the cost of the query without doing anything expensive (like 
> looking up all terms of the TermInSetQuery in the terms dict). Maybe we could 
> implement it for primary keys (terms.size() == sumDocFreq) by returning the 
> number of terms of the query? Another idea is to multiply the number of terms 
> by the average postings length, though this could be dangerous if the field 
> has a zipfian distribution and some terms have a much higher doc frequency 
> than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently 
> [~mikemccand] and [~gsmiller] again independently. So it looks like there is 
> interest in this. Here is an email thread where this was recently discussed: 
> https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.




[GitHub] [lucene] bruno-roustant commented on pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-29 Thread GitBox



bruno-roustant commented on pull request #404:
URL: https://github.com/apache/lucene/pull/404#issuecomment-954746622


   @jpountz maybe you could be interested in the review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [lucene] jpountz commented on pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-29 Thread GitBox



jpountz commented on pull request #404:
URL: https://github.com/apache/lucene/pull/404#issuecomment-954799005


   Yes! Sorry I saw it and wanted to have a look and then got distracted. I 
haven't taken the time to take a look yet but I ran indexing with 
`IndexAndSearchOpenStreetMaps` and saw a speedup with this change on big 
merges, so I'm already excited about it. :)



[GitHub] [lucene] madrob merged pull request #417: Replace deprecated Gradle 7.2 properties

2021-10-29 Thread GitBox



madrob merged pull request #417:
URL: https://github.com/apache/lucene/pull/417


   



[GitHub] [lucene] jpountz commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-29 Thread GitBox



jpountz commented on a change in pull request #404:
URL: https://github.com/apache/lucene/pull/404#discussion_r739310053



##
File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -20,66 +20,120 @@
  * {@link Sorter} implementation based on a variant of the quicksort algorithm called <a href="http://en.wikipedia.org/wiki/Introsort">introsort</a>: when the recursion level exceeds the
  * log of the length of the array to sort, it falls back to heapsort. This prevents quicksort from
- * running into its worst-case quadratic runtime. Small arrays are sorted with insertion sort.
+ * running into its worst-case quadratic runtime. Small ranges are sorted with insertion sort.
  *
  * @lucene.internal
  */
 public abstract class IntroSorter extends Sorter {
 
+  /** Below this size threshold, the partition selection is simplified to a single median. */
+  private static final int SINGLE_MEDIAN_THRESHOLD = 40;
+
   /** Create a new {@link IntroSorter}. */
   public IntroSorter() {}
 
   @Override
   public final void sort(int from, int to) {
     checkRange(from, to);
-    quicksort(from, to, 2 * MathUtil.log(to - from, 2));
+    sort(from, to, 2 * MathUtil.log(to - from, 2));
   }
 
-  void quicksort(int from, int to, int maxDepth) {
-    if (to - from < BINARY_SORT_THRESHOLD) {
-      binarySort(from, to);
-      return;
-    } else if (--maxDepth < 0) {
-      heapSort(from, to);
-      return;
-    }
-
-    final int mid = (from + to) >>> 1;
-
-    if (compare(from, mid) > 0) {
-      swap(from, mid);
-    }
-
-    if (compare(mid, to - 1) > 0) {
-      swap(mid, to - 1);
-      if (compare(from, mid) > 0) {
-        swap(from, mid);
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.
+   *
+   * Sorts small ranges with insertion sort. Fallbacks to heap sort to avoid quadratic worst
+   * case. Selects the pivot with medians and partitions with the Bentley-McIlroy fast 3-ways
+   * algorithm (Engineering a Sort Function, Bentley-McIlroy).
+   */
+  void sort(int from, int to, int maxDepth) {
+    int size;
+
+    // Sort small ranges with insertion sort.
+    while ((size = to - from) > INSERTION_SORT_THRESHOLD) {
+
+      if (--maxDepth < 0) {
+        // Max recursion depth reached: fallback to heap sort.
+        heapSort(from, to);
+        return;
       }
-    }
-
-    int left = from + 1;
-    int right = to - 2;
 
-    setPivot(mid);
-    for (; ; ) {
-      while (comparePivot(right) < 0) {
-        --right;
+      // Pivot selection based on medians.
+      int last = to - 1;
+      int mid = (from + last) >>> 1;
+      int range = size >> 3;
+      int pivot;
+      if (size <= SINGLE_MEDIAN_THRESHOLD) {
+        // Select the pivot with a single median around the middle element.
+        // Do not take the median between [from, mid, last] because it hurts performance
+        // if the order is descending.
+        pivot = median(mid - range, mid, mid + range);

Review comment:
   In a 32-element array, we'd be looking at indices 12, 16 and 20, so all 
rather close to the mid element. Should `range` be `size/4` instead of `size/8` 
in that case, to be less sensitive to special distributions (i.e. we'd be looking 
at indices 8, 16 and 24 in a 32-element array)?
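
To make the spacing concrete, here is a small sketch that recomputes the three probe indices with the same arithmetic as the quoted hunk (`last = to - 1`, `mid = (from + last) >>> 1`, `range = size >> shift`). The class name `PivotProbeIndices` and the `shift` parameter are made up for illustration and are not part of the PR.

```java
import java.util.Arrays;

/** Illustration only: mirrors the index arithmetic from the quoted hunk. */
final class PivotProbeIndices {

  /**
   * Returns {mid - range, mid, mid + range} for the range [from, to),
   * where range = size >> shift (shift = 3 is size/8, shift = 2 is size/4).
   */
  static int[] probes(int from, int to, int shift) {
    int size = to - from;
    int last = to - 1;
    int mid = (from + last) >>> 1;
    int range = size >> shift;
    return new int[] {mid - range, mid, mid + range};
  }

  public static void main(String[] args) {
    System.out.println("size/8: " + Arrays.toString(probes(0, 32, 3)));
    System.out.println("size/4: " + Arrays.toString(probes(0, 32, 2)));
  }
}
```

For `from = 0, to = 32` this gives 11, 15, 19 with `size/8` and 7, 15, 23 with `size/4`. Each index is one lower than the rounded figures in the comment because `mid` is derived from `last = to - 1`, but the point about spacing is the same: the probes are 4 apart in the first case and 8 apart in the second.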


[jira] [Updated] (LUCENE-9280) Add ability to skip non-competitive documents on field sort

2021-10-29 Thread Mayya Sharipova (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova updated LUCENE-9280:

Fix Version/s: 8.6

> Add ability to skip non-competitive documents on field sort 
> 
>
> Key: LUCENE-9280
> URL: https://issues.apache.org/jira/browse/LUCENE-9280
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: main (9.0), 8.6
>
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> Today collectors, once they collect enough docs, can instruct scorers to 
> update their iterators to skip non-competitive documents. This is applicable 
> only for a case when we need top docs by _score.
> It would be nice to also have an ability to skip non-competitive docs when we 
> need top docs sorted by other fields different from _score. 




[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery

2021-10-29 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436183#comment-17436183
 ] 

Greg Miller commented on LUCENE-10207:
--

I spent a little more time on this today and would be interested in any 
feedback on getting the "term count" for these MultiTermQueries for the purpose 
of estimating cost. For TermInSetQuery, this is known up-front since the user 
provides the set of terms. But MultiTermQueries in general often don't "know" 
their term count up-front (until producing their TermsEnum, which itself seems 
costly).

I'm considering adding a new method to MultiTermQueries that allows 
implementations to provide their term count if known, or -1 if not known. 
The cost would then be estimated like this:

{code:java}
  Terms indexTerms = context.reader().terms(query.getField());

  int queryTermsCount = query.getTermsCount();
  if (indexTerms == null) {
    cost = 0;  // field doesn't exist
  } else if (queryTermsCount == -1) {
    cost = indexTerms.getDocCount();
  } else {
    cost = Math.min(indexTerms.getDocCount(), queryTermsCount + (indexTerms.getSumDocFreq() - indexTerms.size()));
  }
{code}

Does this seem like a reasonable approach? Any other ideas?

The other issue here is that we don't actually know at this point how many of 
the query terms are actually in the index. So this could potentially 
over-estimate cost if there is a huge set of terms that aren't in the index. But 
solving for that requires intersecting the indexed terms with the query terms, 
which adds up-front cost.
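
To spell out the reasoning behind the estimate: every indexed term occurs in at least one document, so `sumDocFreq - size()` is the number of postings beyond the first one per term. If all of that surplus belonged to the query's terms, they would match at most `queryTermsCount + surplus` postings, and the result can never exceed the field's doc count. A minimal sketch of that arithmetic follows. It takes the statistics as plain numbers instead of reading them from a Lucene `Terms` instance, and the class, method and parameter names are hypothetical.

```java
/** Illustration only: the cost arithmetic from the snippet above, on plain numbers. */
final class MultiTermCostEstimate {

  /** Sentinel for "term count not known up-front", as proposed in the comment. */
  static final int UNKNOWN_TERM_COUNT = -1;

  static long estimateCost(
      boolean fieldExists,   // false when the field has no terms in this segment
      int docCount,          // number of docs that have at least one term for the field
      long sumDocFreq,       // total number of postings for the field
      long indexTermCount,   // number of unique terms in the field
      int queryTermCount) {  // number of terms in the query, or UNKNOWN_TERM_COUNT
    if (!fieldExists) {
      return 0; // field doesn't exist
    }
    if (queryTermCount == UNKNOWN_TERM_COUNT) {
      return docCount; // no better bound than "every doc with this field"
    }
    long surplusPostings = sumDocFreq - indexTermCount;
    return Math.min(docCount, queryTermCount + surplusPostings);
  }

  public static void main(String[] args) {
    // Primary-key field: one posting per term, so the estimate is the query term count.
    System.out.println(estimateCost(true, 1000, 1000, 1000, 5));
    // Skewed field: the surplus dominates and the estimate is capped by docCount.
    System.out.println(estimateCost(true, 1000, 5000, 100, 5));
  }
}
```

For a primary-key field the surplus is zero, so the estimate collapses to the number of query terms, which matches the idea in the issue description.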

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -
>
> Key: LUCENE-10207
> URL: https://issues.apache.org/jira/browse/LUCENE-10207
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-10207_multitermquery.patch
>
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a 
> query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. 
> However IndexOrDocValuesQuery only works well if the "index" query can give 
> an estimation of the cost of the query without doing anything expensive (like 
> looking up all terms of the TermInSetQuery in the terms dict). Maybe we could 
> implement it for primary keys (terms.size() == sumDocFreq) by returning the 
> number of terms of the query? Another idea is to multiply the number of terms 
> by the average postings length, though this could be dangerous if the field 
> has a zipfian distribution and some terms have a much higher doc frequency 
> than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently 
> [~mikemccand] and [~gsmiller] again independently. So it looks like there is 
> interest in this. Here is an email thread where this was recently discussed: 
> https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.




[jira] [Comment Edited] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery

2021-10-29 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436183#comment-17436183
 ] 

Greg Miller edited comment on LUCENE-10207 at 10/29/21, 10:35 PM:
--

I spent a little more time on this today and would be interested in any 
feedback on getting the "term count" for these MultiTermQueries for the purpose 
of estimating cost. For TermInSetQuery, this is known up-front since the user 
provides the set of terms. But MultiTermQueries in general often don't "know" 
their term count up-front (until producing their TermsEnum, which itself seems 
costly).

I'm considering adding a new method to MultiTermQueries that allows 
implementations to provide their term count if known, or -1 if not known. 
The cost would then be estimated like this:

{code:java}
  Terms indexTerms = context.reader().terms(query.getField());

  int queryTermsCount = query.getTermsCount();
  if (indexTerms == null) {
    cost = 0;  // field doesn't exist
  } else if (queryTermsCount == -1) {
    cost = indexTerms.getDocCount();
  } else {
    cost = Math.min(indexTerms.getDocCount(), queryTermsCount + (indexTerms.getSumDocFreq() - indexTerms.size()));
  }
{code}

Does this seem like a reasonable approach? Any other ideas? (EDIT: Oh, I should 
note that this code would run when constructing a {{ScorerSupplier}}, eagerly 
evaluating the cost so the supplier can return it as requested before creating 
the {{Scorer}})

The other issue here is that we don't actually know at this point how many of 
the query terms are actually in the index. So this could potentially 
over-estimate cost if there is a huge set of terms that aren't in the index. But 
solving for that requires intersecting the indexed terms with the query terms, 
which adds up-front cost.



> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -
>
> Key: LUCENE-10207
> URL: https://issues.apache.org/jira/browse/LUCENE-10207
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-10207_multitermquery.patch
>
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a 
> query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. 
> However IndexOrDocValuesQuery only works well if the "index" query can give 
> an estimation of the cost of the query without doing anything expensive (like 
> looking up all terms of the TermInSetQuery in the terms dict). Maybe we could 
> implement it for primary keys (terms.size() == sumDocFreq) by returning the 
> number of terms of the query? Another idea is to multiply the number of terms 
> by the average postings length, though this could be dangerous if the field 
> has a zipfian distribution and some terms have a much higher doc frequency 
> than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently 
> [~mikemccand] and [~gsmiller] again independently. So it looks like there is 
> interest in this. Here is an email thread where this was recently discussed: 
> https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.




[jira] [Commented] (LUCENE-10201) Upgrade Spatial4j to 0.8

2021-10-29 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436224#comment-17436224
 ] 

ASF subversion and git services commented on LUCENE-10201:
--

Commit c2c215d3a83bda97f298f1d14c0bfd523122ca98 in lucene's branch 
refs/heads/main from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c2c215d ]

LUCENE-10201: Upgrade Spatial4j to 0.8 (#409)

Upgrading Spatial4j to 0.8 improves a variety of minor things.
See release notes:
https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8

Test-only dependency on JTS is upgraded to 1.17 as well

> Upgrade Spatial4j to 0.8
> 
>
> Key: LUCENE-10201
> URL: https://issues.apache.org/jira/browse/LUCENE-10201
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Spatial4j has been at 0.8 for some time.  We should upgrade.
> [https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8]
>  




[GitHub] [lucene] dsmiley merged pull request #409: LUCENE-10201: Upgrade Spatial4j to 0.8

2021-10-29 Thread GitBox



dsmiley merged pull request #409:
URL: https://github.com/apache/lucene/pull/409


   



[jira] [Resolved] (LUCENE-10201) Upgrade Spatial4j to 0.8

2021-10-29 Thread David Smiley (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-10201.
---
Fix Version/s: main (9.0)
   Resolution: Fixed

> Upgrade Spatial4j to 0.8
> 
>
> Key: LUCENE-10201
> URL: https://issues.apache.org/jira/browse/LUCENE-10201
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Spatial4j has been at 0.8 for some time.  We should upgrade.
> [https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8]
>  




[jira] [Created] (LUCENE-10212) Add luceneutil benchmark task for CombinedFieldsQuery

2021-10-29 Thread Zach Chen (Jira)

Zach Chen created LUCENE-10212:
--

 Summary: Add luceneutil benchmark task for CombinedFieldsQuery
 Key: LUCENE-10212
 URL: https://issues.apache.org/jira/browse/LUCENE-10212
 Project: Lucene - Core
  Issue Type: Task
Reporter: Zach Chen
Assignee: Zach Chen


This is a spin-off task from https://issues.apache.org/jira/browse/LUCENE-10061. 
In order to objectively evaluate performance changes for CombinedFieldsQuery, 
we would like to add a benchmark task and query parsing for CombinedFieldsQuery.

One proposal for a query syntax that enables CombinedFieldsQuery benchmarking 
would be the following:

 
{code:java}
taskName: term1 term2 term3 term4 
+combinedFields=field1^boost1,field2^boost2,field3^boost3
{code}
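
As a rough illustration of how such a line could be split into its parts, here is a standalone sketch. It is not luceneutil's actual TaskParser code: the class `CombinedFieldsTaskLine` is hypothetical, it assumes the whole task sits on a single line, and it assumes that a field without an explicit `^boost` gets a boost of 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: parses "taskName: t1 t2 +combinedFields=f1^b1,f2^b2". */
final class CombinedFieldsTaskLine {
  private static final String MARKER = "+combinedFields=";

  final String taskName;
  final List<String> terms;
  final Map<String, Float> fieldBoosts; // keeps the order the fields were listed in

  private CombinedFieldsTaskLine(String taskName, List<String> terms, Map<String, Float> fieldBoosts) {
    this.taskName = taskName;
    this.terms = terms;
    this.fieldBoosts = fieldBoosts;
  }

  static CombinedFieldsTaskLine parse(String line) {
    int colon = line.indexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("missing task name: " + line);
    }
    String taskName = line.substring(0, colon).trim();
    List<String> terms = new ArrayList<>();
    Map<String, Float> boosts = new LinkedHashMap<>();
    for (String token : line.substring(colon + 1).trim().split("\\s+")) {
      if (token.startsWith(MARKER)) {
        for (String spec : token.substring(MARKER.length()).split(",")) {
          int caret = spec.indexOf('^');
          String field = caret < 0 ? spec : spec.substring(0, caret);
          // Assumption: a missing boost means 1.0.
          float boost = caret < 0 ? 1f : Float.parseFloat(spec.substring(caret + 1));
          boosts.put(field, boost);
        }
      } else if (!token.isEmpty()) {
        terms.add(token);
      }
    }
    if (boosts.isEmpty()) {
      throw new IllegalArgumentException("missing " + MARKER + " clause: " + line);
    }
    return new CombinedFieldsTaskLine(taskName, terms, boosts);
  }

  public static void main(String[] args) {
    CombinedFieldsTaskLine parsed =
        parse("CombinedHighMed: apache lucene +combinedFields=title^3.0,body^1.0");
    System.out.println(parsed.taskName + " " + parsed.terms + " " + parsed.fieldBoosts);
  }
}
```

The parsed terms and field boosts would then be handed to whatever builds the actual query. That step is left out here because it depends on the Lucene and luceneutil versions in use.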
 

 

 




[jira] [Updated] (LUCENE-10212) Add luceneutil benchmark task for CombinedFieldsQuery

2021-10-29 Thread Zach Chen (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach Chen updated LUCENE-10212:
---
Description: 
This is a spin-off task from https://issues.apache.org/jira/browse/LUCENE-10061. 
In order to objectively evaluate performance changes for CombinedFieldsQuery, 
we would like to add a benchmark task and query parsing for CombinedFieldsQuery.

One proposal for a query syntax that enables CombinedFieldsQuery benchmarking 
would be the following:
{code:java}
taskName: term1 term2 term3 term4 
+combinedFields=field1^boost1,field2^boost2,field3^boost3
{code}
 

 

 


> Add luceneutil benchmark task for CombinedFieldsQuery
> -
>
> Key: LUCENE-10212
> URL: https://issues.apache.org/jira/browse/LUCENE-10212
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Zach Chen
>Assignee: Zach Chen
>Priority: Minor
>
> This is a spin-off task from 
> https://issues.apache.org/jira/browse/LUCENE-10061. In order to objectively 
> evaluate performance changes for CombinedFieldsQuery, we would like to add a 
> benchmark task and query parsing for CombinedFieldsQuery.
> One proposal for a query syntax that enables CombinedFieldsQuery benchmarking 
> would be the following:
> {code:java}
> taskName: term1 term2 term3 term4 
> +combinedFields=field1^boost1,field2^boost2,field3^boost3
> {code}
>  
>  
>  




[jira] [Commented] (LUCENE-10061) CombinedFieldsQuery needs dynamic pruning support

2021-10-29 Thread Zach Chen (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436233#comment-17436233
 ] 

Zach Chen commented on LUCENE-10061:


Thanks [~jpountz] for the pointer! I have created a spin-off task for 
luceneutil integration https://issues.apache.org/jira/browse/LUCENE-10212, and 
will actually work on it first and circle back to this task afterward. 

> CombinedFieldsQuery needs dynamic pruning support
> -
>
> Key: LUCENE-10061
> URL: https://issues.apache.org/jira/browse/LUCENE-10061
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore, 
> forcing Lucene to collect all matches in order to figure the top-k hits.




[GitHub] [lucene] apanimesh061 commented on pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety

2021-10-29 Thread GitBox



apanimesh061 commented on pull request #412:
URL: https://github.com/apache/lucene/pull/412#issuecomment-955147804


   The new changes include the following:
   
   1. If `flags` is initialized, the boolean flags are ignored
   2. Unit test for the new `getFlags()` implementation using builders
   3. Setter for `flags`
   4. The builder can be used to set the booleans as well as `flags`
   
   I have not implemented the randomized testing as suggested. I am still 
researching how to implement that.
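
For readers skimming the thread, the precedence rule in item 1 can be sketched roughly as follows. This is not the PR's code: `HighlightSettings`, its `Flag` enum and the builder method names are hypothetical stand-ins, and the only behaviour taken from the comment is that an explicitly set `flags` value wins over the individual booleans.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustration only: explicit flags take precedence over boolean settings. */
final class HighlightSettings {
  enum Flag { PHRASES, MULTI_TERM_QUERY } // hypothetical flag names

  private final boolean highlightPhrases;
  private final boolean handleMultiTermQuery;
  private final EnumSet<Flag> explicitFlags; // null when flags were never set

  private HighlightSettings(Builder b) {
    this.highlightPhrases = b.highlightPhrases;
    this.handleMultiTermQuery = b.handleMultiTermQuery;
    this.explicitFlags = b.flags;
  }

  Set<Flag> getFlags() {
    if (explicitFlags != null) {
      return EnumSet.copyOf(explicitFlags); // booleans are ignored
    }
    EnumSet<Flag> derived = EnumSet.noneOf(Flag.class);
    if (highlightPhrases) derived.add(Flag.PHRASES);
    if (handleMultiTermQuery) derived.add(Flag.MULTI_TERM_QUERY);
    return derived;
  }

  static final class Builder {
    private boolean highlightPhrases;
    private boolean handleMultiTermQuery;
    private EnumSet<Flag> flags;

    Builder withHighlightPhrases(boolean value) { highlightPhrases = value; return this; }
    Builder withHandleMultiTermQuery(boolean value) { handleMultiTermQuery = value; return this; }

    Builder withFlags(Set<Flag> value) {
      // Copy defensively; EnumSet.copyOf rejects an empty non-EnumSet collection.
      flags = value.isEmpty() ? EnumSet.noneOf(Flag.class) : EnumSet.copyOf(value);
      return this;
    }

    HighlightSettings build() { return new HighlightSettings(this); }
  }

  public static void main(String[] args) {
    HighlightSettings s = new Builder()
        .withHighlightPhrases(true)
        .withFlags(EnumSet.of(Flag.MULTI_TERM_QUERY))
        .build();
    System.out.println(s.getFlags()); // explicit flags win over the boolean
  }
}
```

Because the settings object is immutable once built, it can be shared across threads without further synchronization, which is the motivation given in the PR title.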


