[jira] [Commented] (LUCENE-10120) Lazy initialize FixedBitSet in LRUQueryCache

2021-11-03 Thread Lu Xugang (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437808#comment-17437808
 ] 

Lu Xugang commented on LUCENE-10120:


Hi, [~gsmiller]
{quote}I'm not sure we necessarily need to handle the complexity of docs being 
added out-of-order though{quote}
The reason for considering unordered (or even duplicated) doc ids is that I 
hope *DocIdSetProducer* will not be used only in LRUQueryCache. For example, 
users could use *DocIdSetProducer* to collect docIds in their custom 
*Collector*, or use it in place of *FixedBitSet* wherever that is used to 
collect unordered or even duplicated docIds.

{quote}Feel free to incorporate the ideas from my patch file into your PR if 
you think they make sense{quote}
Following the patch you posted, I rewrote *DocIdSetProducer* so that it 
only handles ordered docs, and posted a new PR.

I still keep *RangeDocIdSet* so that it can provide random access for 
conjunctions, like *FixedBitSet* in *BitDocIdSet* (see *RangeDocIdSet#bits()*).



> Lazy initialize FixedBitSet in LRUQueryCache
> 
>
> Key: LUCENE-10120
> URL: https://issues.apache.org/jira/browse/LUCENE-10120
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: main (10.0)
>Reporter: Lu Xugang
>Priority: Major
> Attachments: 1.png, LUCENE-10120.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Based on how DocsWithFieldSet collects docIds, maybe we could cache the 
> docIdSet in a similar way in 
> *LRUQueryCache#cacheIntoBitSet(BulkScorer scorer, int maxDoc)* when the 
> docIdSet is dense.
> In this way, we do not always initialize a huge FixedBitSet, which is 
> sometimes unnecessary when maxDoc is large.
>  
>  
>  
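The lazy-initialization idea in the description can be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the code 
from the patch or the PR: LazyDocIdCollector and its methods are hypothetical 
names, java.util.BitSet stands in for Lucene's FixedBitSet, and doc ids are 
assumed to arrive in strictly increasing order.

```java
import java.util.BitSet;

/** Hypothetical sketch; java.util.BitSet stands in for Lucene's FixedBitSet. */
final class LazyDocIdCollector {
  private final int maxDoc;
  private int firstDoc = -1; // start of the contiguous run, -1 while empty
  private int lastDoc = -1;  // last doc id added so far
  private BitSet bits;       // allocated only once a gap is seen

  LazyDocIdCollector(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  /** Adds a doc id; ids must be strictly increasing and below maxDoc. */
  void add(int doc) {
    if (doc <= lastDoc || doc >= maxDoc) {
      throw new IllegalArgumentException("doc ids must be increasing and < maxDoc: " + doc);
    }
    if (bits != null) {
      bits.set(doc);
    } else if (firstDoc == -1) {
      firstDoc = doc;
    } else if (doc != lastDoc + 1) {
      // First gap: materialize the run collected so far into a bit set.
      bits = new BitSet(maxDoc);
      bits.set(firstDoc, lastDoc + 1);
      bits.set(doc);
    }
    lastDoc = doc;
  }

  /** True while the collected docs still form one contiguous range. */
  boolean isRange() {
    return bits == null;
  }

  boolean contains(int doc) {
    if (bits != null) {
      return bits.get(doc);
    }
    return firstDoc != -1 && doc >= firstDoc && doc <= lastDoc;
  }

  int cardinality() {
    if (bits != null) {
      return bits.cardinality();
    }
    return firstDoc == -1 ? 0 : lastDoc - firstDoc + 1;
  }
}
```

As long as the matching docs form one contiguous run, only two ints are kept; 
the maxDoc-sized bit set is allocated on the first gap.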



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang opened a new pull request #423: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache

2021-11-03 Thread GitBox


LuXugang opened a new pull request #423:
URL: https://github.com/apache/lucene/pull/423


   For detailed discussion, see 
https://issues.apache.org/jira/browse/LUCENE-10120 and 
https://github.com/apache/lucene/pull/422


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [lucene] rmuir commented on pull request #422: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache

2021-11-03 Thread GitBox


rmuir commented on pull request #422:
URL: https://github.com/apache/lucene/pull/422#issuecomment-958756345


   Why do we need a range docidset? RoaringDocIdSet will compress dense 
situations too. It should be used.





[jira] [Commented] (LUCENE-10120) Lazy initialize FixedBitSet in LRUQueryCache

2021-11-03 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437841#comment-17437841
 ] 

Robert Muir commented on LUCENE-10120:
--

I'm concerned about having 3 implementations of bitsets here (roaring, range, 
fixed). Can we please keep it to 2?

> Lazy initialize FixedBitSet in LRUQueryCache
> 
>
> Key: LUCENE-10120
> URL: https://issues.apache.org/jira/browse/LUCENE-10120
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: main (10.0)
>Reporter: Lu Xugang
>Priority: Major
> Attachments: 1.png, LUCENE-10120.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Based on how DocsWithFieldSet collects docIds, maybe we could cache the 
> docIdSet in a similar way in 
> *LRUQueryCache#cacheIntoBitSet(BulkScorer scorer, int maxDoc)* when the 
> docIdSet is dense.
> In this way, we do not always initialize a huge FixedBitSet, which is 
> sometimes unnecessary when maxDoc is large.
>  
>  
>  






[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-11-03 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437850#comment-17437850
 ] 

Adrien Grand commented on LUCENE-10196:
---

There's a noticeable 3-4% indexing speedup on 
[http://people.apache.org/~mikemccand/geobench.html#index-times] that is very 
likely due to this change.

> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: 8.11
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
>  - Sort small ranges with insertion sort (instead of binary sort).
>  - Select the quick sort pivot with medians.
>  - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
>  - Replace the tail recursion by a loop.
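The proposed changes listed above can be sketched on a plain int[] as shown 
below. This is a hedged illustration, not Lucene's IntroSorter: the class name 
and the cutoff of 16 are assumptions, the pivot is a simple median of three, 
and the partitioning uses Dijkstra's simpler 3-way scheme rather than the 
Bentley-McIlroy variant named in the issue. The introsort fallback for 
degenerate inputs is omitted.

```java
/** Illustrative sketch only; not Lucene's IntroSorter. */
final class ThreeWayQuickSortSketch {
  // Assumed cutoff below which insertion sort takes over.
  private static final int INSERTION_SORT_THRESHOLD = 16;

  static void sort(int[] a) {
    sort(a, 0, a.length);
  }

  /** Sorts a[from, to). */
  static void sort(int[] a, int from, int to) {
    // Loop instead of a tail call: recurse into the smaller side only.
    while (to - from > INSERTION_SORT_THRESHOLD) {
      int pivot = medianOfThree(a[from], a[from + ((to - from) >>> 1)], a[to - 1]);
      // Invariant: [from, lt) < pivot, [lt, i) == pivot, [gt, to) > pivot.
      int lt = from;
      int i = from;
      int gt = to;
      while (i < gt) {
        if (a[i] < pivot) {
          swap(a, lt++, i++);
        } else if (a[i] > pivot) {
          swap(a, i, --gt);
        } else {
          i++;
        }
      }
      // Elements equal to the pivot are already in place and are skipped.
      if (lt - from < to - gt) {
        sort(a, from, lt);
        from = gt;
      } else {
        sort(a, gt, to);
        to = lt;
      }
    }
    insertionSort(a, from, to);
  }

  private static int medianOfThree(int x, int y, int z) {
    return Math.max(Math.min(x, y), Math.min(Math.max(x, y), z));
  }

  private static void insertionSort(int[] a, int from, int to) {
    for (int i = from + 1; i < to; i++) {
      int v = a[i];
      int j = i - 1;
      while (j >= from && a[j] > v) {
        a[j + 1] = a[j];
        j--;
      }
      a[j + 1] = v;
    }
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

With few distinct values, each pass moves every copy of the pivot into its 
final position, which is why low-cardinality inputs benefit most.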






[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-11-03 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437856#comment-17437856
 ] 

Dawid Weiss commented on LUCENE-10196:
--

Go Bruno!

> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: 8.11
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
>  - Sort small ranges with insertion sort (instead of binary sort).
>  - Select the quick sort pivot with medians.
>  - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
>  - Replace the tail recursion by a loop.






[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-11-03 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437861#comment-17437861
 ] 

Adrien Grand commented on LUCENE-10196:
---

[~broustant] Do you know if a similar improvement can be made to IntroSelector? 
IntroSelector is one of the bottlenecks of this benchmark: 
[http://people.apache.org/~mikemccand/geobench.html#search-polyRussia], 
which spends significant time converting the Russia polygon into a 
ComponentTree (see ComponentTree#createTree).

> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: 8.11
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
>  - Sort small ranges with insertion sort (instead of binary sort).
>  - Select the quick sort pivot with medians.
>  - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
>  - Replace the tail recursion by a loop.






[jira] [Commented] (LUCENE-10088) Too many open files in TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader

2021-11-03 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437862#comment-17437862
 ] 

Dawid Weiss commented on LUCENE-10088:
--

I bumped the limit on just this test class to be twice the default. I know it 
masks the underlying issue but we know what it is, can verify any potential fix 
by removing the annotation, and the repeated failures in jenkins don't bring in 
anything new.

 

This test passes for me with an increased handle count, although it takes a 
long time for this seed:
{code:java}
-ea -Dtests.seed=21D53F262220F3E9 -Dtests.multiplier=2 -Dtests.nightly=true  
{code}
 

> Too many open files in 
> TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader
> --
>
> Key: LUCENE-10088
> URL: https://issues.apache.org/jira/browse/LUCENE-10088
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>
> [This build 
> failure|https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/386/]
>  reproduces for me.  I'll try to dig.






[jira] [Created] (LUCENE-10217) BufferedUpdates is memory inefficient

2021-11-03 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-10217:
-

 Summary: BufferedUpdates is memory inefficient
 Key: LUCENE-10217
 URL: https://issues.apache.org/jira/browse/LUCENE-10217
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


I recently got a question from [~David Turner] about why {{IndexWriter}} was 
flushing data so frequently despite very small documents. After investigating, 
we noticed that most of the RAM buffer was actually spent on BufferedUpdates 
since his test was using {{IndexWriter#updateDocument}}. This is not surprising 
given that BufferedUpdates accounts for BYTES_PER_DEL_TERM=160 bytes per update, 
plus the length of the field and the length of the term, so often around 200 
bytes just to record the updated term.

As a comparison, Lucene's nightly NYC taxis benchmark only needs 286 bytes per 
document in the RAM buffer for about 20 fields, or ~15 bytes per field (see 
http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#index_docs_per_mb_ram).

Updates are expected to be slower than appending given that they need to look 
up terms in the dictionary, but I suspect that this memory inefficiency is 
making updates even slower by forcing Lucene to flush its RAM buffer much more 
frequently than it has to when purely appending documents.
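The arithmetic behind the "around 200 bytes" figure can be written out as a 
small sketch. Only BYTES_PER_DEL_TERM=160 comes from the message above; the 
class and method names are hypothetical, and the formula simply adds the field 
and term lengths as the message describes them. It is not meant to reproduce 
the exact accounting inside BufferedUpdates.

```java
/** Back-of-the-envelope sketch; names are hypothetical, not Lucene API. */
final class BufferedUpdateCostSketch {
  // Per-update overhead quoted in the issue description.
  static final int BYTES_PER_DEL_TERM = 160;

  /** Estimated RAM accounted for one buffered update term. */
  static long bytesPerUpdate(int fieldLength, int termLength) {
    return BYTES_PER_DEL_TERM + fieldLength + termLength;
  }

  /** How many such update records would fill a buffer of the given size on their own. */
  static long updatesPerBuffer(long bufferBytes, int fieldLength, int termLength) {
    return bufferBytes / bytesPerUpdate(fieldLength, termLength);
  }
}
```

With a 2-character field name and a 36-character term (both made-up values), 
bytesPerUpdate(2, 36) gives 198, in line with the "around 200 bytes" figure 
and roughly 70% of the 286 bytes that the taxis benchmark needs for a whole 
document.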






[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-11-03 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437876#comment-17437876
 ] 

Bruno Roustant commented on LUCENE-10196:
-

Thanks for sharing the benchmark Adrien.

I'm not sure about IntroSelector, but I suppose yes. This is an exciting 
challenge :). I'll find some time to investigate.

> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: 8.11
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
>  - Sort small ranges with insertion sort (instead of binary sort).
>  - Select the quick sort pivot with medians.
>  - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
>  - Replace the tail recursion by a loop.






[GitHub] [lucene] uschindler opened a new pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …

2021-11-03 Thread GitBox


uschindler opened a new pull request #425:
URL: https://github.com/apache/lucene/pull/425


   See https://issues.apache.org/jira/browse/LUCENE-10218





[GitHub] [lucene] jpountz commented on a change in pull request #420: [DRAFT] LUCENE-10122 Explore using NumericDocValue to store taxonomy parent array

2021-11-03 Thread GitBox


jpountz commented on a change in pull request #420:
URL: https://github.com/apache/lucene/pull/420#discussion_r741939064



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##
@@ -129,39 +124,19 @@ private void initParents(IndexReader reader, int first) 
throws IOException {
 if (reader.maxDoc() == first) {
   return;
 }
-
-// it's ok to use MultiTerms because we only iterate on one posting list.
-// breaking it to loop over the leaves() only complicates code for no
-// apparent gain.
-PostingsEnum positions =
-MultiTerms.getTermPostingsEnum(
-reader, Consts.FIELD_PAYLOADS, Consts.PAYLOAD_PARENT_BYTES_REF, 
PostingsEnum.PAYLOADS);
-
-// shouldn't really happen, if it does, something's wrong
-if (positions == null || positions.advance(first) == 
DocIdSetIterator.NO_MORE_DOCS) {
-  throw new CorruptIndexException(
-  "Missing parent data for category " + first, reader.toString());
-}
-
-int num = reader.maxDoc();
-for (int i = first; i < num; i++) {
-  if (positions.docID() == i) {
-if (positions.freq() == 0) { // shouldn't happen
-  throw new CorruptIndexException(
-  "Missing parent data for category " + i, reader.toString());
-}
-
-parents[i] = positions.nextPosition();
-
-if (positions.nextDoc() == DocIdSetIterator.NO_MORE_DOCS) {
-  if (i + 1 < num) {
-throw new CorruptIndexException(
-"Missing parent data for category " + (i + 1), 
reader.toString());
-  }
-  break;
+for (LeafReaderContext leafContext: reader.leaves()) {

Review comment:
   Maybe there are benefits of MultiDocValues I'm missing for this specific 
use-case, but in general we prefer consuming data-structures segment-by-segment 
whenever possible and only rely on the MultiXXX classes for merging.
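
   The segment-by-segment pattern mentioned in this comment can be sketched 
without Lucene on the classpath. Segment below is a hypothetical stand-in for 
one leaf of reader.leaves(): it carries a docBase (the offset of the segment's 
first doc in the top-level doc id space) and that segment's own values. All 
names here are assumptions for illustration, not the code under review.

```java
import java.util.List;

/** Hypothetical stand-ins; in Lucene the loop would run over reader.leaves(). */
final class PerSegmentSketch {
  /** One segment: its docBase plus per-document parent values. */
  static final class Segment {
    final int docBase;
    final int[] parentValues;

    Segment(int docBase, int[] parentValues) {
      this.docBase = docBase;
      this.parentValues = parentValues;
    }
  }

  /** Fills a top-level parents array by visiting each segment in turn. */
  static int[] loadParents(List<Segment> segments, int maxDoc) {
    int[] parents = new int[maxDoc];
    for (Segment segment : segments) {
      for (int doc = 0; doc < segment.parentValues.length; doc++) {
        // Segment-local doc id + docBase = top-level doc id.
        parents[segment.docBase + doc] = segment.parentValues[doc];
      }
    }
    return parents;
  }
}
```

   Each segment is read through its own structure and only the docBase offset 
is needed to place values in the global array, so no merged multi-segment view 
has to be built.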







[GitHub] [lucene] dweiss commented on pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …

2021-11-03 Thread GitBox


dweiss commented on pull request #425:
URL: https://github.com/apache/lucene/pull/425#issuecomment-959477250


   I added task graph ordering that enforces that validation runs prior to 
compilation (if they're both scheduled to run). This is safer than just 
scheduling a module's tests after validation because you make sure classes from 
other modules have been validated too.





[GitHub] [lucene] uschindler merged pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …

2021-11-03 Thread GitBox


uschindler merged pull request #425:
URL: https://github.com/apache/lucene/pull/425


   





[GitHub] [lucene-solr] thelabdude opened a new pull request #2599: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin

2021-11-03 Thread GitBox


thelabdude opened a new pull request #2599:
URL: https://github.com/apache/lucene-solr/pull/2599


   backport of https://github.com/apache/solr/pull/393





[GitHub] [lucene-solr] thelabdude merged pull request #2599: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin

2021-11-03 Thread GitBox


thelabdude merged pull request #2599:
URL: https://github.com/apache/lucene-solr/pull/2599


   





[GitHub] [lucene-solr] thelabdude opened a new pull request #2600: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin

2021-11-03 Thread GitBox


thelabdude opened a new pull request #2600:
URL: https://github.com/apache/lucene-solr/pull/2600


   backport of #2599 





[GitHub] [lucene-solr] thelabdude merged pull request #2600: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin

2021-11-03 Thread GitBox


thelabdude merged pull request #2600:
URL: https://github.com/apache/lucene-solr/pull/2600


   





[jira] [Commented] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber

2021-11-03 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438215#comment-17438215
 ] 

Adrien Grand commented on LUCENE-10190:
---

For the record, we just hit this failure again on the 8.11 branch:
{noformat}
[junit4] Suite: org.apache.lucene.index.TestIndexWriter
   [junit4]   2> nov 03, 2021 5:23:40 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
   [junit4]   2> AVVERTENZA: Uncaught exception in thread: Thread[Thread-1406,5,TGRP-TestIndexWriter]
   [junit4]   2> java.lang.AssertionError: expected:<1> but was:<0>
   [junit4]   2>    at __randomizedtesting.SeedInfo.seed([AE7BCAA635E46482]:0)
   [junit4]   2>    at org.junit.Assert.fail(Assert.java:89)
   [junit4]   2>    at org.junit.Assert.failNotEquals(Assert.java:835)
   [junit4]   2>    at org.junit.Assert.assertEquals(Assert.java:647)
   [junit4]   2>    at org.junit.Assert.assertEquals(Assert.java:633)
   [junit4]   2>    at org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4050)
   [junit4]   2>    at java.lang.Thread.run(Thread.java:748)
   [junit4]   2> 
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriter -Dtests.method=testMaxCompletedSequenceNumber -Dtests.seed=AE7BCAA635E46482 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=it-CH -Dtests.timezone=America/North_Dakota/Beulah -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   3.86s J3 | TestIndexWriter.testMaxCompletedSequenceNumber <<<
   [junit4]> Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1580, name=Thread-1406, state=RUNNABLE, group=TGRP-TestIndexWriter]
   [junit4]>    at __randomizedtesting.SeedInfo.seed([AE7BCAA635E46482:92EF3564AF25E240]:0)
   [junit4]> Caused by: java.lang.AssertionError: expected:<1> but was:<0>
   [junit4]>    at __randomizedtesting.SeedInfo.seed([AE7BCAA635E46482]:0)
   [junit4]>    at org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4050)
   [junit4]>    at java.lang.Thread.run(Thread.java:748){noformat}

> Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
> -
>
> Key: LUCENE-10190
> URL: https://issues.apache.org/jira/browse/LUCENE-10190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Nhat Nguyen
>Priority: Minor
>
> CI failure in PR at:
> https://github.com/apache/lucene/pull/396/checks?check_run_id=3936559246
> Does not reproduce. Stack below.
> {code}
> org.apache.lucene.index.TestIndexWriter > testMaxCompletedSequenceNumber 
> FAILED
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=1840, name=Thread-1481, 
> state=RUNNABLE, group=TGRP-TestIndexWriter]
> at 
> __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C:671A014F243A0EDE]:0)
> Caused by:
> java.lang.AssertionError: expected:<1> but was:<0>
> at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0)
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at 
> org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305)
> at java.base/java.lang.Thread.run(Thread.java:829)
> org.apache.lucene.index.TestIndexWriter > test suite's output saved to 
> /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriter.txt,
>  copied below:
>   2> أكتوبر ١٩, ٢٠٢١ ١٠:٢٧:٢٧ ص 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-1481,5,TGRP-TestIndexWriter]
>   2> java.lang.AssertionError: expected:<1> but was:<0>
>   2>  at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0)
>   2>  at org.junit.Assert.fail(Assert.java:89)
>   2>  at org.junit.Assert.failNotEquals(Assert.java:835)
>   2>  at org.junit.Assert.assertEquals(Assert.java:647)
>   2>  at org.junit.Assert.assertEquals(Assert.java:633)
>   2>  at 
> org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305)
>   2>  at java.base/java.lang.Thread.run(Thread.java:829)
>   2> 
>> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured 
> an uncaught exception i

[GitHub] [lucene-solr] thelabdude opened a new pull request #2601: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests

2021-11-03 Thread GitBox


thelabdude opened a new pull request #2601:
URL: https://github.com/apache/lucene-solr/pull/2601


   backport of https://github.com/apache/solr/pull/394





[GitHub] [lucene-solr] thelabdude merged pull request #2601: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests

2021-11-03 Thread GitBox


thelabdude merged pull request #2601:
URL: https://github.com/apache/lucene-solr/pull/2601


   





[GitHub] [lucene-solr] thelabdude opened a new pull request #2602: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests

2021-11-03 Thread GitBox


thelabdude opened a new pull request #2602:
URL: https://github.com/apache/lucene-solr/pull/2602


   backport of #2601 





[GitHub] [lucene-solr] thelabdude merged pull request #2602: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests

2021-11-03 Thread GitBox


thelabdude merged pull request #2602:
URL: https://github.com/apache/lucene-solr/pull/2602


   





[GitHub] [lucene] dsmiley opened a new pull request #426: Javadocs, Sorter impls:

2021-11-03 Thread GitBox


dsmiley opened a new pull request #426:
URL: https://github.com/apache/lucene/pull/426


   * clarify which sorts are stable/not
   * link from utility methods to the primary Sorter implementations for 
further information
   * describe when InPlaceMergeSorter is useful.  Fix incorrect statement that 
it uses insertion sort.
   
   As an aside, I'm dubious about the value of InPlaceMergeSorter.  If my 
statement in the docs I added is correct, that it's for small arrays to avoid 
allocating memory, then such use-cases could call TimSorter and we could 
enhance TimSorter to recognize up-front that it's a "small" array and go 
directly into binarySort without allocating anything.  WDYT?  We could just do 
that anyway.
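   
   To make the small-array idea concrete, here is a rough sketch of a binary 
insertion sort with an up-front size check. It is only an illustration under 
assumptions: it works on a plain int[] rather than through Lucene's Sorter 
abstraction, the class name and the threshold of 32 are made up, and 
Arrays.sort merely stands in for the general path. It is not TimSorter's 
actual code.

```java
import java.util.Arrays;

/** Hypothetical sketch of a small-array fast path; not TimSorter's code. */
final class BinarySortSketch {
  // Assumed cutoff for what counts as a "small" array.
  static final int SMALL_ARRAY_THRESHOLD = 32;

  /** Sorts a[from, to) in place with binary insertion sort, allocating nothing. */
  static void binarySort(int[] a, int from, int to) {
    for (int i = from + 1; i < to; i++) {
      int v = a[i];
      // Binary search for the insertion point in the sorted prefix [from, i).
      int lo = from;
      int hi = i;
      while (lo < hi) {
        int mid = (lo + hi) >>> 1;
        if (v < a[mid]) {
          hi = mid;
        } else {
          lo = mid + 1; // go right on ties so equal elements keep their order
        }
      }
      // Shift [lo, i) one slot to the right and drop v into place.
      for (int j = i; j > lo; j--) {
        a[j] = a[j - 1];
      }
      a[lo] = v;
    }
  }

  /** Small inputs take the allocation-free path; larger ones use the general sort. */
  static void sort(int[] a) {
    if (a.length <= SMALL_ARRAY_THRESHOLD) {
      binarySort(a, 0, a.length);
    } else {
      Arrays.sort(a); // stand-in for the general-purpose path
    }
  }
}
```

   The size check happens before anything is allocated, which is the property 
the aside above is after.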
   





[jira] [Commented] (LUCENE-9229) Lucene web site broken links

2021-11-03 Thread Kristin Heibel (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438295#comment-17438295
 ] 

Kristin Heibel commented on LUCENE-9229:


Hello,

I am a newdev on this project, and wish to contribute to this ticket.  I have a 
few questions:

1) Where do I find the sources (for lucene, solr seems better)?

2) Where are the images?  Some of the issues are images, and I'm having issues 
finding the images that I am looking for.

3) Where are the docs compiled on my local machine?  I didn't find a task for 
that in build.gradle.  Is there a separate process?

> Lucene web site broken links
> 
>
> Key: LUCENE-9229
> URL: https://issues.apache.org/jira/browse/LUCENE-9229
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/javadocs, general/website
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: newdev
>
> The new website is live, so I ran a dead-link checker on it to see if 
> anything is broken. Here is the list of dead links. A bunch of them are from 
> JavaDoc or from RefGuide, but some are also missing graphics on the site 
> itself.
> Feel free to grab broken links and commit fixes either to lucene-site repo or 
> lucene-solr repo as you see fit. When a link is fixed, please change (x) into 
> (/) here so we see progress. Some of the broken links are in legacy changes 
> html, guess we don't need to fix those.
> h2. Website
> h3. [http://lucene.apache.org/pylucene/jcc/install.html]
> (/) [404] [https://bugs.python.org/setuptools/issue43]
> h3. [http://lucene.apache.org/pylucene/features.html]
> (/) [404] [https://svn.apache.org/viewcvs.cgi/lucene/pylucene/trunk/test]
> h3. [http://lucene.apache.org/solr/resources.html]
> (/) [0] [http://hrishikesh.karambelkar.co.in/]
>  (/) [404] [http://lucenerevolution.org/past-events/]
>  (/) [404] [http://www.packtpub.com/apache-solr-for-indexing-data/book]
>  (/) [0] [http://www.inkstall.com/]
> h3. [http://lucene.apache.org/pylucene/jcc/features.html]
> (/) [0] [https://peak.telecommunity.com/DevCenter/EasyInstall]
> h2. Javadoc
> h3. [http://lucene.apache.org/solr/api/changes/Changes.html]
> (/) [404] [http://wiki.apache.org/solr/FunctionQuery].
> h3. [http://lucene.apache.org/solr/api/solr-test-framework/overview-tree.html]
> (x) [404] 
> [https://junit.org/junit4/javadoc/4.12/org/junit/rules.TestRule.html?is-external=true]
> h3. 
> [http://lucene.apache.org/solr/api/solr-test-framework/org/apache/solr/util/RevertDefaultThreadHandlerRule.html]
> (x) [404] 
> [https://junit.org/junit4/javadoc/4.12/org/junit/runners/model.Statement.html?is-external=true]
> h3. 
> [http://lucene.apache.org/solr/api/solr-test-framework/org/apache/solr/util/SSLTestConfig.html]
> (x) [404] 
> [https://junit.org/junit4/javadoc/4.12/org/junit/internal.AssumptionViolatedException.html?is-external=true]
> h3. 
> [http://lucene.apache.org/solr/api/solr-solrj/org/apache/solr/common/util/ByteArrayUtf8CharSequence.html]
> (x) [404] [http://download.java.net/java/jdk9/docs/api/java/util/Arrays.html]
> h3. 
> [http://lucene.apache.org/solr/api/solr-solrj/org/apache/solr/client/solrj/SolrQuery.html]
> (x) [404] [https://wiki.apache.org/solr/SimpleFacetParameters]
> h3. 
> [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/legacy/BBoxStrategy.html]
> (x) [404] 
> [http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java]
> h3. 
> [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/search/FastLRUCache.html]
> (x) [404] [http://wiki.apache.org/solr/SolrCaching]
> h3. 
> [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/util/hll/HLL.html]
> (x) [404] 
> [http://guava-libraries.googlecode.com/git/guava/src/com/google/common/hash/Murmur3_128HashFunction.java]
>  (x) [404] 
> [https://github.com/aggregateknowledge/postgresql-hll/blob/master/README.markdown]
>  (x) [404] 
> [https://github.com/aggregateknowledge/postgresql-hll/blob/master/STORAGE.markdown]
> h3. [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/legacy/]
> (x) [404] [http://lucene.apache.org/icons/blank.gif]
>  (x) [404] [http://lucene.apache.org/icons/text.gif]
>  (x) [404] [http://lucene.apache.org/icons/folder.gif]
>  (x) [404] [http://lucene.apache.org/icons/back.gif]
> h3. 
> [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/legacy/doc-files/]
> (x) [404] [http://lucene.apache.org/icons/image2.gif]
> (For RefGuide, please see https://issues.apache.org/jira/browse/SOLR-15497 
> instead)






[jira] [Commented] (LUCENE-10122) Explore using NumericDocValue to store taxonomy parent array

2021-11-03 Thread Haoyu Zhai (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438317#comment-17438317
 ] 

Haoyu Zhai commented on LUCENE-10122:
-

The luceneutil benchmark shows a mostly neutral result
{code:java}
                     Task QPS base  StdDev QPS cand  StdDev               Pct diff p-value
                   Fuzzy2    58.39  (5.6%)    57.70  (6.1%)  -1.2% ( -12% -   11%)   0.518
     BrowseDateTaxoFacets     2.40  (6.6%)     2.38  (5.8%)  -0.7% ( -12% -   12%)   0.709
BrowseDayOfYearTaxoFacets     2.40  (6.5%)     2.38  (5.8%)  -0.7% ( -12% -   12%)   0.721
    BrowseMonthTaxoFacets     2.49  (6.8%)     2.47  (6.1%)  -0.7% ( -12% -   13%)   0.738
    BrowseMonthSSDVFacets    16.44 (36.1%)    16.38 (35.1%)  -0.4% ( -52% -  110%)   0.974
      LowIntervalsOrdered    30.70  (2.8%)    30.61  (3.0%)  -0.3% (  -5% -    5%)   0.763
                LowPhrase   516.96  (1.7%)   515.67  (1.6%)  -0.3% (  -3% -    3%)   0.626
            OrNotHighHigh   580.07  (2.1%)   578.61  (2.8%)  -0.3% (  -5% -    4%)   0.747
BrowseDayOfYearSSDVFacets    15.22 (24.2%)    15.19 (24.2%)  -0.2% ( -39% -   63%)   0.976
    HighTermDayOfYearSort   766.98  (1.7%)   765.20  (1.7%)  -0.2% (  -3% -    3%)   0.665
     HighIntervalsOrdered     2.46  (2.0%)     2.45  (2.3%)  -0.2% (  -4% -    4%)   0.795
      MedIntervalsOrdered    27.55  (2.8%)    27.51  (2.8%)  -0.1% (  -5% -    5%)   0.878
                   IntNRQ    28.96  (0.3%)    28.92  (0.6%)  -0.1% (   0% -    0%)   0.358
               OrHighHigh    36.05  (2.2%)    36.02  (1.7%)  -0.1% (  -3% -    3%)   0.870
                MedPhrase   119.18  (1.7%)   119.08  (2.0%)  -0.1% (  -3% -    3%)   0.884
              MedSpanNear    99.96  (1.1%)    99.88  (1.2%)  -0.1% (  -2% -    2%)   0.818
                  MedTerm  1211.34  (2.4%)  1210.46  (2.2%)  -0.1% (  -4% -    4%)   0.919
                  Respell    42.08  (1.9%)    42.06  (2.3%)  -0.1% (  -4% -    4%)   0.931
             OrNotHighLow   608.56  (2.1%)   608.41  (2.4%)  -0.0% (  -4% -    4%)   0.971
             HighSpanNear    38.01  (2.2%)    38.01  (2.9%)  -0.0% (  -5% -    5%)   0.994
              LowSpanNear    94.41  (1.5%)    94.42  (2.1%)   0.0% (  -3% -    3%)   0.975
                OrHighLow   228.92  (2.4%)   228.98  (1.6%)   0.0% (  -3% -    4%)   0.971
                OrHighMed    76.23  (2.3%)    76.26  (2.2%)   0.0% (  -4% -    4%)   0.951
     HighTermTitleBDVSort    19.07  (2.6%)    19.08  (2.5%)   0.0% (  -4% -    5%)   0.952
               TermDTSort   312.90  (2.0%)   313.18  (2.5%)   0.1% (  -4% -    4%)   0.901
                 PKLookup   153.21  (2.6%)   153.35  (2.5%)   0.1% (  -4% -    5%)   0.910
             OrHighNotMed   798.03  (2.0%)   798.83  (2.3%)   0.1% (  -4% -    4%)   0.883
        HighTermMonthSort   103.99  (9.9%)   104.10  (9.7%)   0.1% ( -17% -   21%)   0.971
                 Wildcard   107.61  (2.1%)   107.74  (2.4%)   0.1% (  -4% -    4%)   0.859
                  Prefix3    82.74 (12.0%)    82.84 (12.1%)   0.1% ( -21% -   27%)   0.973
               HighPhrase    67.96  (2.0%)    68.07  (2.0%)   0.2% (  -3% -    4%)   0.792
                 HighTerm  1058.76  (1.8%)  1060.59  (2.7%)   0.2% (  -4% -    4%)   0.812
            OrHighNotHigh   528.01  (1.8%)   529.17  (2.5%)   0.2% (  -4% -    4%)   0.751
                   Fuzzy1    42.70  (3.0%)    42.80  (3.3%)   0.2% (  -5% -    6%)   0.814
             OrNotHighMed   613.17  (2.6%)   614.97  (2.6%)   0.3% (  -4% -    5%)   0.722
          MedSloppyPhrase    15.29  (1.8%)    15.34  (2.2%)   0.3% (  -3% -    4%)   0.601
             OrHighNotLow   590.46  (2.5%)   592.57  (2.9%)   0.4% (  -4% -    5%)   0.677
               AndHighLow   518.23  (2.5%)   520.65  (2.9%)   0.5% (  -4% -    6%)   0.585
                  LowTerm  1137.40  (2.9%)  1143.47  (2.8%)   0.5% (  -5% -    6%)   0.556
         HighSloppyPhrase    10.76  (3.2%)    10.82  (3.6%)   0.6% (  -6% -    7%)   0.602
          LowSloppyPhrase   152.21  (2.1%)   153.24  (2.4%)   0.7% (  -3% -    5%)   0.350
               AndHighMed   170.44  (2.5%)   171.76  (3.6%)   0.8% (  -5% -    7%)   0.426
              AndHighHigh    64.45  (3.2%)    65.07  (4.4%)   1.0% (  -6% -    8%)   0.424
{code}
The size of the taxonomy index does not change either.

I've also run the internal benchmark we use

[jira] [Commented] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API

2021-11-03 Thread Vigya Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438318#comment-17438318
 ] 

Vigya Sharma commented on LUCENE-10216:
---

Thanks for going through the proposal, [~jpountz].
{quote}One bit from this proposal I'm not fully comfortable with is the fact 
that Lucene would merge each reader independently when the flag is set. 
{quote}
I understand your concern; I am also a bit iffy about losing the opportunity to 
merge segments.

Ideally, I would like to land on a sweet middle ground: not run single-segment 
merges, but also not add all provided segments into a single merge. We could 
leverage some concurrency within addIndexes and let background merges bring the 
segment count down further.

How do you feel about making this configurable in the API with a param that 
defines the number of segments merged together ({{MergeFactor}}?). We could 
make it a flag to keep things simple for consumers, with the values {{ONE}}, 
{{THREAD_WIDE}}, and {{ALL}}, where {{ONE}} = single-segment merges, {{ALL}} = 
all segments in one merge, and {{THREAD_WIDE}} = each merge gets 
({{readers.length/numThreads}}) segments. The default here could be 
{{ALL}} to retain current behavior.
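
As a rough illustration of how the three values could translate into merge 
groups, here is a self-contained sketch. Nothing in it is an existing Lucene 
API: the {{MergeFactor}} enum, the {{group}} helper, and the generic reader 
type are hypothetical, and the sketch rounds {{readers.length/numThreads}} up 
(an assumption) so that {{THREAD_WIDE}} never produces more groups than 
threads.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeGroupingSketch {
  /** Hypothetical policy values from the proposal; not part of Lucene. */
  enum MergeFactor { ONE, THREAD_WIDE, ALL }

  /** Splits readers into groups; each group would become one merge. */
  static <R> List<List<R>> group(List<R> readers, MergeFactor factor, int numThreads) {
    if (numThreads <= 0) {
      throw new IllegalArgumentException("numThreads must be positive");
    }
    int groupSize;
    switch (factor) {
      case ONE:
        groupSize = 1; // one reader per merge
        break;
      case THREAD_WIDE:
        // ceil(readers / numThreads), never smaller than 1
        groupSize = Math.max(1, (readers.size() + numThreads - 1) / numThreads);
        break;
      default:
        groupSize = Math.max(1, readers.size()); // ALL: a single merge
    }
    List<List<R>> groups = new ArrayList<>();
    for (int i = 0; i < readers.size(); i += groupSize) {
      int end = Math.min(i + groupSize, readers.size());
      groups.add(new ArrayList<>(readers.subList(i, end)));
    }
    return groups;
  }
}
```

With five readers and two threads, {{THREAD_WIDE}} yields one group of three 
and one of two, {{ONE}} yields five single-reader groups, and {{ALL}} yields 
one group of five.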

 
{quote}Then you could still do what you want by using 
{{ConcurrentMergeScheduler}}, passing codec readers one by one to 
{{CodecReader#addIndexes}} with {{doWait=false}} and get resulting remapped 
segments as quickly as possible.
{quote}
The doWait flag would make the API non-blocking, but it would add extra 
steps for users to track when the merges triggered by addIndexes have 
completed. For segment replication, the output of addIndexes is not usable 
until its merges complete. The merge within addIndexes() is what creates 
segments with recomputed ordinal values. Until that is done, there are no 
segments available to copy to searcher (and replica) hosts.

We also want to bring the segment count down as low as possible at Amazon 
product search. The tradeoff we are making here is to first go with a higher 
segment count to make documents quickly available for search, then run 
background merges to bring down the segment count. This is similar to the 
other variant of add indexes, the {{addIndexes(Directory[])}} API, which 
copies all segments from the provided directories into the current index dir 
and then triggers a background merge.

I feel making segments available quickly would be useful for anyone who has 
multiple replica search hosts and uses segment replication.

 

> Add concurrency to addIndexes(CodecReader…) API
> ---
>
> Key: LUCENE-10216
> URL: https://issues.apache.org/jira/browse/LUCENE-10216
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Vigya Sharma
>Priority: Major
>
> I work at Amazon Product Search, and we use Lucene to power search for the 
> e-commerce platform. I’m working on a project that involves applying 
> metadata+ETL transforms and indexing documents on n different _indexing_ 
> boxes, combining them into a single index on a separate _reducer_ box, and 
> making it available for queries on m different _search_ boxes (replicas). 
> Segments are asynchronously copied from indexers to reducers to searchers as 
> they become available for the next layer to consume.
> I am using the addIndexes API to combine multiple indexes into one on the 
> reducer boxes. Since we also have taxonomy data, we need to remap facet field 
> ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version 
> of this API. The API leverages {{SegmentMerger.merge()}} to create segments 
> with new ordinal values while also merging all provided segments in the 
> process.
> _This is however a blocking call that runs in a single thread._ Until we have 
> written segments with new ordinal values, we cannot copy them to searcher 
> boxes, which increases the time to make documents available for search.
> I was playing around with the API by creating multiple concurrent merges, 
> each with only a single reader, creating a concurrently running 1:1 
> conversion from old segments to new ones (with new ordinal values). We follow 
> this up with non-blocking background merges. This lets us copy the segments 
> to searchers and replicas as soon as they are available, and later replace 
> them with merged segments as background jobs complete. On the Amazon dataset 
> I profiled, this gave us around 2.5 to 3x improvement in addIndexes() time. 
> Each call was given about 5 readers to add on average.
> This might be a useful addition to Lucene. We could create another {{addIndexes()}} 
> API with a {{boolean}} flag for concurrency, that internally submits multiple 
> merge jobs (each with a single reader) to the {{ConcurrentMergeScheduler}}, 
> and waits for 

[GitHub] [lucene] rmuir commented on pull request #422: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache

2021-11-03 Thread GitBox


rmuir commented on pull request #422:
URL: https://github.com/apache/lucene/pull/422#issuecomment-958756345


   Why do we need a range docidset? RoaringDocIdSet will compress dense 
situations too. It should be used.
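
   For context on the "dense situations" point: as documented in its Javadoc, 
RoaringDocIdSet divides the doc ID space into blocks of 2^16 and picks an 
encoding per block, storing the inverse of the set when a block is almost 
full. The sketch below is a simplified stand-in for that decision, not the 
Lucene source. The class, enum, and method names are made up, the exact 
boundary comparisons are an assumption, and the byte counts ignore object 
overhead.

```java
final class RoaringBlockSketch {
  static final int BLOCK_SIZE = 1 << 16;        // doc IDs covered by one block
  static final int MAX_ARRAY_LENGTH = 1 << 12;  // cutoff for the short[] encodings

  enum Encoding { SHORT_ARRAY, BITSET, INVERTED_SHORT_ARRAY }

  /** Picks an encoding from the number of doc IDs set in the block. */
  static Encoding choose(int cardinality) {
    if (cardinality < 0 || cardinality > BLOCK_SIZE) {
      throw new IllegalArgumentException("cardinality out of range: " + cardinality);
    }
    if (cardinality <= MAX_ARRAY_LENGTH) {
      return Encoding.SHORT_ARRAY;            // sparse: list the set docs
    }
    if (cardinality >= BLOCK_SIZE - MAX_ARRAY_LENGTH) {
      return Encoding.INVERTED_SHORT_ARRAY;   // very dense: list the missing docs
    }
    return Encoding.BITSET;                   // in between: one bit per doc
  }

  /** Rough payload size in bytes for a block with the given cardinality. */
  static int approxBytes(int cardinality) {
    switch (choose(cardinality)) {
      case SHORT_ARRAY:
        return 2 * cardinality;
      case INVERTED_SHORT_ARRAY:
        return 2 * (BLOCK_SIZE - cardinality);
      default:
        return BLOCK_SIZE / 8;
    }
  }
}
```

   Under these assumptions a completely full block costs essentially nothing 
beyond bookkeeping, which is the property the comment relies on.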





[GitHub] [lucene-solr] thelabdude merged pull request #2599: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin

2021-11-03 Thread GitBox


thelabdude merged pull request #2599:
URL: https://github.com/apache/lucene-solr/pull/2599


   





[GitHub] [lucene-solr] thelabdude merged pull request #2601: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests

2021-11-03 Thread GitBox


thelabdude merged pull request #2601:
URL: https://github.com/apache/lucene-solr/pull/2601






[GitHub] [lucene-solr] thelabdude merged pull request #2600: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin

2021-11-03 Thread GitBox


thelabdude merged pull request #2600:
URL: https://github.com/apache/lucene-solr/pull/2600






[GitHub] [lucene] jpountz commented on a change in pull request #420: [DRAFT] LUCENE-10122 Explore using NumericDocValue to store taxonomy parent array

2021-11-03 Thread GitBox


jpountz commented on a change in pull request #420:
URL: https://github.com/apache/lucene/pull/420#discussion_r741939064



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##
@@ -129,39 +124,19 @@ private void initParents(IndexReader reader, int first) 
throws IOException {
 if (reader.maxDoc() == first) {
   return;
 }
-
-// it's ok to use MultiTerms because we only iterate on one posting list.
-// breaking it to loop over the leaves() only complicates code for no
-// apparent gain.
-PostingsEnum positions =
-MultiTerms.getTermPostingsEnum(
-reader, Consts.FIELD_PAYLOADS, Consts.PAYLOAD_PARENT_BYTES_REF, 
PostingsEnum.PAYLOADS);
-
-// shouldn't really happen, if it does, something's wrong
-if (positions == null || positions.advance(first) == 
DocIdSetIterator.NO_MORE_DOCS) {
-  throw new CorruptIndexException(
-  "Missing parent data for category " + first, reader.toString());
-}
-
-int num = reader.maxDoc();
-for (int i = first; i < num; i++) {
-  if (positions.docID() == i) {
-if (positions.freq() == 0) { // shouldn't happen
-  throw new CorruptIndexException(
-  "Missing parent data for category " + i, reader.toString());
-}
-
-parents[i] = positions.nextPosition();
-
-if (positions.nextDoc() == DocIdSetIterator.NO_MORE_DOCS) {
-  if (i + 1 < num) {
-throw new CorruptIndexException(
-"Missing parent data for category " + (i + 1), 
reader.toString());
-  }
-  break;
+for (LeafReaderContext leafContext: reader.leaves()) {

Review comment:
   Maybe there are benefits of MultiDocValues I'm missing for this specific 
use-case, but in general we prefer consuming data-structures segment-by-segment 
whenever possible and only rely on the MultiXXX classes for merging.
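
   A minimal sketch of the segment-by-segment pattern, written against a 
stand-in `Segment` type rather than the real Lucene classes so that it runs 
on its own. In Lucene the corresponding pieces would be `reader.leaves()` and 
each `LeafReaderContext.docBase`; the `Segment` class, its fields, and 
`loadParents` are hypothetical names used only for this illustration.

```java
import java.util.List;

final class PerSegmentParentsSketch {
  /** Stand-in for one segment: its doc base plus a parent ordinal per local doc. */
  static final class Segment {
    final int docBase;
    final int[] parentByLocalDoc;

    Segment(int docBase, int[] parentByLocalDoc) {
      this.docBase = docBase;
      this.parentByLocalDoc = parentByLocalDoc;
    }
  }

  /** Fills a global parents array by visiting each segment in turn. */
  static int[] loadParents(List<Segment> leaves, int maxDoc) {
    int[] parents = new int[maxDoc];
    for (Segment leaf : leaves) {
      for (int localDoc = 0; localDoc < leaf.parentByLocalDoc.length; localDoc++) {
        // docBase converts a segment-local doc ID into an index-wide one.
        parents[leaf.docBase + localDoc] = leaf.parentByLocalDoc[localDoc];
      }
    }
    return parents;
  }
}
```

   Each segment is read through its own per-segment view, and the only 
cross-segment state is the doc base offset, so no merged MultiXXX view is 
needed.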




[GitHub] [lucene] uschindler merged pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …

2021-11-03 Thread GitBox


uschindler merged pull request #425:
URL: https://github.com/apache/lucene/pull/425






[GitHub] [lucene] apanimesh061 commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety

2021-11-03 Thread GitBox


apanimesh061 commented on a change in pull request #412:
URL: https://github.com/apache/lucene/pull/412#discussion_r742449547



##
File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -143,88 +135,168 @@
 
   private int cacheFieldValCharsThreshold = DEFAULT_CACHE_CHARS_THRESHOLD;
 
-  /** Extracts matching terms after rewriting against an empty index */
-  protected static Set extractTerms(Query query) throws IOException {
-Set queryTerms = new HashSet<>();
-
EMPTY_INDEXSEARCHER.rewrite(query).visit(QueryVisitor.termCollector(queryTerms));
-return queryTerms;
-  }
+  private Set flags;
+
+  /** Builder for UnifiedHighlighter. */
+  public abstract static class Builder> {

Review comment:
   Yes thanks for the feedback here. I'll  work on this builder now.
   
   Sorry this past week I got super busy, so could not get to this.




[GitHub] [lucene] dweiss commented on pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …

2021-11-03 Thread GitBox


dweiss commented on pull request #425:
URL: https://github.com/apache/lucene/pull/425#issuecomment-959477250


   I added task graph ordering that ensures the validation runs prior to 
compilation (if they're both scheduled to run). This is safer than just 
scheduling a module's tests after validation because it makes sure classes from 
other modules have been validated too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude merged pull request #2600: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin

2021-11-03 Thread GitBox


thelabdude merged pull request #2600:
URL: https://github.com/apache/lucene-solr/pull/2600


   





[GitHub] [lucene-solr] thelabdude merged pull request #2602: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests

2021-11-03 Thread GitBox


thelabdude merged pull request #2602:
URL: https://github.com/apache/lucene-solr/pull/2602


   





[GitHub] [lucene] uschindler merged pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …

2021-11-03 Thread GitBox


uschindler merged pull request #425:
URL: https://github.com/apache/lucene/pull/425


   





[GitHub] [lucene] LuXugang commented on pull request #422: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache

2021-11-03 Thread GitBox


LuXugang commented on pull request #422:
URL: https://github.com/apache/lucene/pull/422#issuecomment-960382982


   > RoaringDocIdSet will compress dense situations too. It should be used.
   
   As @jpountz said in 
[LUCENE-10120](https://issues.apache.org/jira/browse/LUCENE-10120), 
LRUQueryCache originally used RoaringDocIdSet for caching in all cases, dense 
or sparse. FixedBitSet was introduced later as an optimization for 
conjunctions: it is used for caching when scorer.cost() * 100 >= maxDoc, 
which means the set is very dense. It also means that a very large 
FixedBitSet gets cached. For more details, see 
[LUCENE-7339](https://issues.apache.org/jira/browse/LUCENE-7339) and 
[LUCENE-7330](https://issues.apache.org/jira/browse/LUCENE-7330).
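   
   To make the condition above concrete, here is a minimal stand-alone sketch 
of the threshold check and of what a one-bit-per-document set costs in memory. 
The class and method names are hypothetical and are not taken from 
LRUQueryCache; only the condition cost * 100 >= maxDoc comes from the 
discussion, and the size estimate assumes 64-bit words with no object 
overhead.

```java
// Hypothetical illustration; not Lucene code.
class DenseCacheHeuristic {

    // The condition quoted in the comment: the set is treated as dense
    // when the scorer's cost is at least 1% of maxDoc.
    static boolean cachesIntoBitSet(long cost, int maxDoc) {
        return cost * 100 >= maxDoc;
    }

    // Assumption: one bit per document, rounded up to whole 64-bit words.
    static long bitSetBytes(int maxDoc) {
        long words = ((long) maxDoc + 63) / 64;
        return words * Long.BYTES;
    }

    public static void main(String[] args) {
        int maxDoc = 100_000_000;
        long cost = 1_000_000; // exactly 1% of maxDoc
        System.out.println("dense: " + cachesIntoBitSet(cost, maxDoc));
        System.out.println("bit set bytes: " + bitSetBytes(maxDoc));
    }
}
```

   Under these assumptions the bit set size depends only on maxDoc, not on 
how many documents actually match, which is why a large index pays the full 
cost even for a set that barely passes the threshold.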
   
   > Why do we need a range docidset?
   
   The purpose of the range DocIdSet in this PR is to cache only minDoc and 
maxDoc, instead of a FixedBitSet, when the doc IDs have no gaps, while still 
providing random access; see the implementation of RangeDocIdSet#bits() in 
the PR.
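   
   The idea can be sketched without Lucene, using java.util.BitSet as a 
stand-in for FixedBitSet. The class RangeDocIds and its methods below are 
hypothetical and are not the RangeDocIdSet from the PR; they only illustrate 
how a gap-free set can be reduced to two integers and still answer 
random-access membership checks.

```java
import java.util.BitSet;

// Hypothetical illustration; not the PR's RangeDocIdSet.
final class RangeDocIds {
    final int minDoc; // inclusive
    final int maxDoc; // inclusive

    RangeDocIds(int minDoc, int maxDoc) {
        if (minDoc < 0 || maxDoc < minDoc) {
            throw new IllegalArgumentException("invalid range");
        }
        this.minDoc = minDoc;
        this.maxDoc = maxDoc;
    }

    // Random access: membership is a pair of comparisons, no bit set needed.
    boolean get(int doc) {
        return doc >= minDoc && doc <= maxDoc;
    }

    int cardinality() {
        return maxDoc - minDoc + 1;
    }

    // Returns a range when the set bits form one contiguous run, else null.
    static RangeDocIds fromBitSet(BitSet bits) {
        int first = bits.nextSetBit(0);
        if (first < 0) {
            return null; // empty set
        }
        int last = bits.length() - 1; // index of the highest set bit
        if (bits.cardinality() != last - first + 1) {
            return null; // at least one gap between first and last
        }
        return new RangeDocIds(first, last);
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(10, 20); // sets docs 10..19
        RangeDocIds range = fromBitSet(bits);
        System.out.println(range.minDoc + ".." + range.maxDoc);
        bits.clear(12); // introduce a gap
        System.out.println(fromBitSet(bits) == null);
    }
}
```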
   
   





[GitHub] [lucene] LuXugang edited a comment on pull request #422: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache

2021-11-03 Thread GitBox


LuXugang edited a comment on pull request #422:
URL: https://github.com/apache/lucene/pull/422#issuecomment-960382982


   > RoaringDocIdSet will compress dense situations too. It should be used.
   
   As @jpountz said in 
[LUCENE-10120](https://issues.apache.org/jira/browse/LUCENE-10120), 
LRUQueryCache originally used RoaringDocIdSet for caching in all cases, dense 
or sparse. FixedBitSet was introduced later as an optimization for 
conjunctions: it is used for caching when scorer.cost() * 100 >= maxDoc, 
which means the set is very dense. It also means that a very large 
FixedBitSet gets cached. For more details, see 
[LUCENE-7339](https://issues.apache.org/jira/browse/LUCENE-7339) and 
[LUCENE-7330](https://issues.apache.org/jira/browse/LUCENE-7330).
   
   > Why do we need a range docidset?
   
   The purpose of the range DocIdSet in this PR is to cache only minDoc and 
maxDoc, instead of a FixedBitSet, when the doc IDs have no gaps, while still 
providing random access; see the implementation of RangeDocIdSet#bits() in 
the PR.
   
   





[GitHub] [lucene] zacharymorn commented on pull request #418: LUCENE-10061: [WIP] Implements basic dynamic pruning support for CombinedFieldsQuery

2021-11-03 Thread GitBox


zacharymorn commented on pull request #418:
URL: https://github.com/apache/lucene/pull/418#issuecomment-960493545


   Perf test results for commit 2ba435e5c83f870be9566
   Run 1:
   Task         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff              p-value
   CFQHighHigh          3.53   (3.3%)                     2.92   (5.3%)  -17.2% (-25% -  -8%)    0.000
   PKLookup           108.13   (7.7%)                   119.85   (8.2%)   10.8% ( -4% -  28%)    0.000
   CFQHighLow          14.88   (3.9%)                    16.69  (12.5%)   12.2% ( -3% -  29%)    0.000
   CFQHighMed          21.11   (4.1%)                    25.87  (11.8%)   22.6% (  6% -  40%)    0.000
   
   Run 2:
   Task         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff              p-value
   CFQHighHigh          6.64   (3.1%)                     5.63  (10.2%)  -15.2% (-27% -  -1%)    0.000
   CFQHighLow           8.35   (2.8%)                     8.05  (15.0%)   -3.6% (-20% -  14%)    0.297
   CFQHighMed          24.51   (5.3%)                    24.90  (19.9%)    1.6% (-22% -  28%)    0.733
   PKLookup           110.06  (10.0%)                   128.54   (7.9%)   16.8% ( -1% -  38%)    0.000
   
   Run 3:
   Task         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff              p-value
   CFQHighMed          13.01   (2.9%)                     9.82   (7.8%)  -24.5% (-34% - -14%)    0.000
   PKLookup           107.85   (8.1%)                   111.79  (11.2%)    3.7% (-14% -  24%)    0.236
   CFQHighHigh          4.83   (2.6%)                     5.06   (8.6%)    4.7% ( -6% -  16%)    0.018
   CFQHighLow          14.95   (3.0%)                    18.31  (19.0%)   22.5% (  0% -  45%)    0.000
   
   Run 4:
   Task         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff              p-value
   CFQHighMed          11.11   (2.9%)                     6.69   (4.1%)  -39.7% (-45% - -33%)    0.000
   CFQHighLow          27.55   (3.8%)                    25.46  (11.0%)   -7.6% (-21% -   7%)    0.003
   CFQHighHigh          5.25   (3.2%)                     4.96   (6.1%)   -5.7% (-14% -   3%)    0.000
   PKLookup           107.61   (6.7%)                   121.19   (4.6%)   12.6% (  1% -  25%)    0.000
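   
   For readers checking the numbers, the central "Pct diff" value appears to 
be the relative change of the modified QPS against the baseline QPS. The 
sketch below assumes exactly that; the class and method names are 
hypothetical, and the benchmark tool may compute the value from unrounded QPS 
figures, so the last digit can differ from the table.

```java
import java.util.Locale;

// Hypothetical illustration; assumes Pct diff = relative change vs. baseline.
class PctDiff {

    static double pctDiff(double baselineQps, double modifiedQps) {
        return 100.0 * (modifiedQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // Run 1 rows, using the rounded QPS values from the table.
        System.out.println(String.format(Locale.ROOT,
            "PKLookup    %.1f%%", pctDiff(108.13, 119.85)));
        System.out.println(String.format(Locale.ROOT,
            "CFQHighHigh %.1f%%", pctDiff(3.53, 2.92)));
    }
}
```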
   

