[jira] [Commented] (LUCENE-10120) Lazy initialize FixedBitSet in LRUQueryCache
[ https://issues.apache.org/jira/browse/LUCENE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437808#comment-17437808 ] Lu Xugang commented on LUCENE-10120: Hi, [~gsmiller] {quote}I'm not sure we necessarily need to handle the complexity of docs being added out-of-order though{quote} The reason I considered unordered (and even duplicated) doc IDs is that I hope *DocIdSetProducer* will not be used only in LRUQueryCache. For example, users could use *DocIdSetProducer* to collect docIds in their custom *Collector*, or replace a *FixedBitSet* with a *DocIdSetProducer* wherever a *FixedBitSet* is currently used just to collect unordered or duplicated docIds. {quote}Feel free to incorporate the ideas from my patch file into your PR if you think they make sense{quote} Based on the patch you posted, I rewrote *DocIdSetProducer* so that it only handles ordered docs, and posted a new PR. I still keep *RangeDocIdSet* so that it can provide random access for conjunctions, like *FixedBitSet* does in *BitDocIdSet*; see *RangeDocIdSet#bits()*. > Lazy initialize FixedBitSet in LRUQueryCache > > > Key: LUCENE-10120 > URL: https://issues.apache.org/jira/browse/LUCENE-10120 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: main (10.0) >Reporter: Lu Xugang >Priority: Major > Attachments: 1.png, LUCENE-10120.patch > > Time Spent: 1h > Remaining Estimate: 0h > > Based on the way docIds are collected in DocsWithFieldSet, maybe we > could cache the docIdSet in a similar way in > *LRUQueryCache#cacheIntoBitSet(BulkScorer scorer, int maxDoc)* when the docIdSet > is dense. > In this way, we do not always initialize a huge FixedBitSet, which is sometimes not > necessary when maxDoc is large > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
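For illustration, here is a minimal Java sketch of the lazy-initialization idea under discussion: collect ordered doc IDs into a small growable buffer and only allocate the FixedBitSet once the set turns out to be dense. The class name and density threshold are hypothetical; this is not the DocIdSetProducer from the PR.

{code:java}
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.FixedBitSet;

/** Illustrative sketch only; not the implementation proposed in LUCENE-10120. */
final class LazyDocIdCollector {
  private static final double DENSITY_THRESHOLD = 1d / 16d; // arbitrary switch-over point

  private final int maxDoc;
  private int[] buffer = new int[16];
  private int count = 0;
  private FixedBitSet bitSet; // only allocated once the set is dense

  LazyDocIdCollector(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  /** Doc IDs are expected in non-decreasing order, as LRUQueryCache provides them. */
  void add(int doc) {
    if (bitSet != null) {
      bitSet.set(doc);
      return;
    }
    if (count == buffer.length) {
      buffer = ArrayUtil.grow(buffer, count + 1);
    }
    buffer[count++] = doc;
    if (count >= maxDoc * DENSITY_THRESHOLD) {
      // Dense enough: pay for the big FixedBitSet now instead of up front.
      bitSet = new FixedBitSet(maxDoc);
      for (int i = 0; i < count; i++) {
        bitSet.set(buffer[i]);
      }
      buffer = null;
    }
  }
}
{code}

Sparse cached sets would stay in the small int buffer and never touch a maxDoc-sized bit set, which is the saving the issue is after.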
[GitHub] [lucene] LuXugang opened a new pull request #423: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache
LuXugang opened a new pull request #423: URL: https://github.com/apache/lucene/pull/423 Detailed discussion see: https://issues.apache.org/jira/browse/LUCENE-10120 and https://github.com/apache/lucene/pull/422 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #422: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache
rmuir commented on pull request #422: URL: https://github.com/apache/lucene/pull/422#issuecomment-958756345 Why do we need a range docidset? RoaringDocIdSet will compress dense situations too. It should be used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
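As a point of reference, here is a small usage sketch of the existing RoaringDocIdSet that this comment refers to. Doc IDs must be added to the builder in increasing order, which matches how LRUQueryCache collects hits; the maxDoc value and density below are made up.

{code:java}
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.RoaringDocIdSet;

public class RoaringDocIdSetExample {
  public static void main(String[] args) throws IOException {
    int maxDoc = 1_000_000;
    RoaringDocIdSet.Builder builder = new RoaringDocIdSet.Builder(maxDoc);
    for (int doc = 0; doc < maxDoc; doc += 2) { // a dense set: every second doc matches
      builder.add(doc); // must be called with increasing doc IDs
    }
    RoaringDocIdSet set = builder.build();

    // Iterate the cached set back, as a cached query would at search time.
    DocIdSetIterator it = set.iterator();
    long hits = 0;
    for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      hits++;
    }
    System.out.println(hits + " docs cached in ~" + set.ramBytesUsed() + " bytes");
  }
}
{code}

The class adapts its per-block encoding to both sparse and dense blocks, which is why it can cover the dense case without a dedicated range implementation.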
[jira] [Commented] (LUCENE-10120) Lazy initialize FixedBitSet in LRUQueryCache
[ https://issues.apache.org/jira/browse/LUCENE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437841#comment-17437841 ] Robert Muir commented on LUCENE-10120: -- I'm concerned about having 3 implementations of bitsets here (roaring, range, fixed). Can we please keep it to 2? > Lazy initialize FixedBitSet in LRUQueryCache > > > Key: LUCENE-10120 > URL: https://issues.apache.org/jira/browse/LUCENE-10120 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: main (10.0) >Reporter: Lu Xugang >Priority: Major > Attachments: 1.png, LUCENE-10120.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Based on the way docIds are collected in DocsWithFieldSet, maybe we > could cache the docIdSet in a similar way in > *LRUQueryCache#cacheIntoBitSet(BulkScorer scorer, int maxDoc)* when the docIdSet > is dense. > In this way, we do not always initialize a huge FixedBitSet, which is sometimes not > necessary when maxDoc is large > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437850#comment-17437850 ] Adrien Grand commented on LUCENE-10196: --- There's a noticeable 3-4% indexing speedup on [http://people.apache.org/~mikemccand/geobench.html#index-times] that is very likely due to this change. > Improve IntroSorter with 3-ways partitioning > > > Key: LUCENE-10196 > URL: https://issues.apache.org/jira/browse/LUCENE-10196 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Major > Fix For: 8.11 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > I added a SorterBenchmark to evaluate the performance of the various Sorter > implementations depending on the strategies defined in BaseSortTestCase > (random, random-low-cardinality, ascending, descending, etc). > By changing the implementation of the IntroSorter to use a 3-ways > partitioning, we can gain a significant performance improvement when sorting > low-cardinality lists, and with additional changes we can also improve the > performance for all the strategies. > Proposed changes: > - Sort small ranges with insertion sort (instead of binary sort). > - Select the quick sort pivot with medians. > - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm. > - Replace the tail recursion by a loop. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
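For readers who want a concrete picture of the core idea, below is a generic Dutch-national-flag style 3-way partition over an int array. It only illustrates why equal elements help (they all land in the middle block and are never recursed into); the actual change uses the Bentley-McIlroy variant inside IntroSorter, not this code.

{code:java}
/** Illustrative 3-way partition; not the Bentley-McIlroy implementation from the patch. */
final class ThreeWayPartition {

  /** Partitions a[from, to) around pivot and returns {firstEqual, firstGreater}. */
  static int[] partition(int[] a, int from, int to, int pivot) {
    int lt = from; // invariant: a[from, lt)  < pivot
    int i = from;  // invariant: a[lt, i)    == pivot
    int gt = to;   // invariant: a[gt, to)    > pivot
    while (i < gt) {
      if (a[i] < pivot) {
        swap(a, lt++, i++);
      } else if (a[i] > pivot) {
        swap(a, i, --gt);
      } else {
        i++;
      }
    }
    return new int[] {lt, gt};
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
{code}

On a low-cardinality input the middle block quickly covers most of the range, so the sort only recurses into the two small outer blocks, which is where the speedup reported above comes from.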
[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437856#comment-17437856 ] Dawid Weiss commented on LUCENE-10196: -- Go Bruno! > Improve IntroSorter with 3-ways partitioning > > > Key: LUCENE-10196 > URL: https://issues.apache.org/jira/browse/LUCENE-10196 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Major > Fix For: 8.11 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > I added a SorterBenchmark to evaluate the performance of the various Sorter > implementations depending on the strategies defined in BaseSortTestCase > (random, random-low-cardinality, ascending, descending, etc). > By changing the implementation of the IntroSorter to use a 3-ways > partitioning, we can gain a significant performance improvement when sorting > low-cardinality lists, and with additional changes we can also improve the > performance for all the strategies. > Proposed changes: > - Sort small ranges with insertion sort (instead of binary sort). > - Select the quick sort pivot with medians. > - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm. > - Replace the tail recursion by a loop. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437861#comment-17437861 ] Adrien Grand commented on LUCENE-10196: --- [~broustant] Do you know if a similar improvement can be made to IntroSelector? IntroSelector is one of the bottlenecks of this benchmark: [http://people.apache.org/~mikemccand/geobench.html#search-polyRussia], which spends significant time converting the Russia polygon into a ComponentTree (see ComponentTree#createTree). > Improve IntroSorter with 3-ways partitioning > > > Key: LUCENE-10196 > URL: https://issues.apache.org/jira/browse/LUCENE-10196 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Major > Fix For: 8.11 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > I added a SorterBenchmark to evaluate the performance of the various Sorter > implementations depending on the strategies defined in BaseSortTestCase > (random, random-low-cardinality, ascending, descending, etc). > By changing the implementation of the IntroSorter to use a 3-ways > partitioning, we can gain a significant performance improvement when sorting > low-cardinality lists, and with additional changes we can also improve the > performance for all the strategies. > Proposed changes: > - Sort small ranges with insertion sort (instead of binary sort). > - Select the quick sort pivot with medians. > - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm. > - Replace the tail recursion by a loop. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10088) Too many open files in TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader
[ https://issues.apache.org/jira/browse/LUCENE-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437862#comment-17437862 ] Dawid Weiss commented on LUCENE-10088: -- I bumped the limit on just this test class to be twice the default. I know it masks the underlying issue, but we know what it is, we can verify any potential fix by removing the annotation, and the repeated failures in jenkins don't bring in anything new. This test passes for me with an increased handle count, although it takes a long time for this seed: {code:java} -ea -Dtests.seed=21D53F262220F3E9 -Dtests.multiplier=2 -Dtests.nightly=true {code} > Too many open files in > TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader > -- > > Key: LUCENE-10088 > URL: https://issues.apache.org/jira/browse/LUCENE-10088 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > > [This build > failure|https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/386/] > reproduces for me. I'll try to dig. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10217) BufferedUpdates is memory inefficient
Adrien Grand created LUCENE-10217: - Summary: BufferedUpdates is memory inefficient Key: LUCENE-10217 URL: https://issues.apache.org/jira/browse/LUCENE-10217 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand I recently got a question from [~David Turner] about why {{IndexWriter}} was flushing data so frequently despite very small documents. After investigating, we noticed that most of the RAM buffer was actually spent on BufferedUpdates since his test was using {{IndexWriter#updateDocument}}. This is not surprising given that BufferedUpdates accounts for BYTES_PER_DEL_TERM=160 bytes per update, plus the length of the field and the length of the term, so often around 200 bytes just to record the updated term. As a comparison, Lucene's nightly NYC taxis benchmark only needs 286 bytes per document in the RAM buffer for about 20 fields (http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#index_docs_per_mb_ram), or ~15 bytes per field. Updates are expected to be slower than appending given that they need to look up terms in the dictionary, but I suspect that this memory inefficiency is making updates even slower by forcing Lucene to flush its RAM buffer much more frequently than it has to when purely appending documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
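A rough back-of-the-envelope calculation, using the numbers above and an assumed 64 MB RAM buffer (the buffer size is only an example), shows how much of the budget the buffered delete terms can eat:

{code:java}
public class BufferedUpdatesMath {
  public static void main(String[] args) {
    long ramBufferBytes = 64L * 1024 * 1024; // assumed IndexWriter RAM buffer, for illustration
    long bytesPerDeleteTerm = 200;           // ~BYTES_PER_DEL_TERM (160) + field name + term bytes
    long bytesPerDocument = 286;             // nightly NYC taxis figure, ~20 fields

    long appendOnlyDocsPerFlush = ramBufferBytes / bytesPerDocument;                    // ~235k
    long updateDocsPerFlush = ramBufferBytes / (bytesPerDocument + bytesPerDeleteTerm); // ~138k

    System.out.println("append-only docs per flush: " + appendOnlyDocsPerFlush);
    System.out.println("update docs per flush:      " + updateDocsPerFlush);
    // With these figures, buffered delete terms alone take ~40% of the RAM buffer
    // when every document is an update, so flushes happen roughly 1.7x more often.
  }
}
{code}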
[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437876#comment-17437876 ] Bruno Roustant commented on LUCENE-10196: - Thanks for sharing the benchmark Adrien. I'm not sure about IntroSelector, but I suppose yes. This is an exciting challenge :). I'll find some time to investigate. > Improve IntroSorter with 3-ways partitioning > > > Key: LUCENE-10196 > URL: https://issues.apache.org/jira/browse/LUCENE-10196 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Priority: Major > Fix For: 8.11 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > I added a SorterBenchmark to evaluate the performance of the various Sorter > implementations depending on the strategies defined in BaseSortTestCase > (random, random-low-cardinality, ascending, descending, etc). > By changing the implementation of the IntroSorter to use a 3-ways > partitioning, we can gain a significant performance improvement when sorting > low-cardinality lists, and with additional changes we can also improve the > performance for all the strategies. > Proposed changes: > - Sort small ranges with insertion sort (instead of binary sort). > - Select the quick sort pivot with medians. > - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm. > - Replace the tail recursion by a loop. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler opened a new pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …
uschindler opened a new pull request #425: URL: https://github.com/apache/lucene/pull/425 See https://issues.apache.org/jira/browse/LUCENE-10218 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #420: [DRAFT] LUCENE-10122 Explore using NumericDocValue to store taxonomy parent array
jpountz commented on a change in pull request #420: URL: https://github.com/apache/lucene/pull/420#discussion_r741939064 ## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java ## @@ -129,39 +124,19 @@ private void initParents(IndexReader reader, int first) throws IOException { if (reader.maxDoc() == first) { return; } - -// it's ok to use MultiTerms because we only iterate on one posting list. -// breaking it to loop over the leaves() only complicates code for no -// apparent gain. -PostingsEnum positions = -MultiTerms.getTermPostingsEnum( -reader, Consts.FIELD_PAYLOADS, Consts.PAYLOAD_PARENT_BYTES_REF, PostingsEnum.PAYLOADS); - -// shouldn't really happen, if it does, something's wrong -if (positions == null || positions.advance(first) == DocIdSetIterator.NO_MORE_DOCS) { - throw new CorruptIndexException( - "Missing parent data for category " + first, reader.toString()); -} - -int num = reader.maxDoc(); -for (int i = first; i < num; i++) { - if (positions.docID() == i) { -if (positions.freq() == 0) { // shouldn't happen - throw new CorruptIndexException( - "Missing parent data for category " + i, reader.toString()); -} - -parents[i] = positions.nextPosition(); - -if (positions.nextDoc() == DocIdSetIterator.NO_MORE_DOCS) { - if (i + 1 < num) { -throw new CorruptIndexException( -"Missing parent data for category " + (i + 1), reader.toString()); - } - break; +for (LeafReaderContext leafContext: reader.leaves()) { Review comment: Maybe there are benefits of MultiDocValues I'm missing for this specific use-case, but in general we prefer consuming data-structures segment-by-segment whenever possible and only rely on the MultiXXX classes for merging. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
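The segment-by-segment pattern being suggested looks roughly like the sketch below: iterate reader.leaves(), pull the per-segment NumericDocValues, and remap segment-local doc IDs to global ordinals with LeafReaderContext#docBase. The field name and method are placeholders, not the code from the PR.

{code:java}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;

final class ParentArrayLoader {

  /** Fills parents[globalOrd] from a per-segment numeric doc-values field. */
  static void load(IndexReader reader, int[] parents, String parentField) throws IOException {
    for (LeafReaderContext ctx : reader.leaves()) {
      NumericDocValues values = ctx.reader().getNumericDocValues(parentField);
      if (values == null) {
        continue; // this segment has no values for the field
      }
      for (int doc = values.nextDoc();
          doc != DocIdSetIterator.NO_MORE_DOCS;
          doc = values.nextDoc()) {
        parents[ctx.docBase + doc] = (int) values.longValue();
      }
    }
  }
}
{code}

Consuming each leaf directly avoids the extra indirection that MultiDocValues-style wrappers add on top of the per-segment structures.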
[GitHub] [lucene] dweiss commented on pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …
dweiss commented on pull request #425: URL: https://github.com/apache/lucene/pull/425#issuecomment-959477250 I added task graph ordering that enforces the validation runs prior to compilation (if they're both scheduled to run). This is safer than just scheduling module's tests after validation because you make sure classes from other modules have been validated too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …
uschindler merged pull request #425: URL: https://github.com/apache/lucene/pull/425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude opened a new pull request #2599: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin
thelabdude opened a new pull request #2599: URL: https://github.com/apache/lucene-solr/pull/2599 backport of https://github.com/apache/solr/pull/393 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2599: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin
thelabdude merged pull request #2599: URL: https://github.com/apache/lucene-solr/pull/2599 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude opened a new pull request #2600: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin
thelabdude opened a new pull request #2600: URL: https://github.com/apache/lucene-solr/pull/2600 backport of #2599 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2600: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin
thelabdude merged pull request #2600: URL: https://github.com/apache/lucene-solr/pull/2600 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[ https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438215#comment-17438215 ] Adrien Grand commented on LUCENE-10190: --- For the record, we just hit this failure again on the 8.11 branch: {noformat} [junit4] Suite: org.apache.lucene.index.TestIndexWriter [junit4] 2> nov 03, 2021 5:23:40 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> AVVERTENZA: Uncaught exception in thread: Thread[Thread-1406,5,TGRP-TestIndexWriter] [junit4] 2> java.lang.AssertionError: expected:<1> but was:<0> [junit4] 2>at __randomizedtesting.SeedInfo.seed([AE7BCAA635E46482]:0) [junit4] 2>at org.junit.Assert.fail(Assert.java:89) [junit4] 2>at org.junit.Assert.failNotEquals(Assert.java:835) [junit4] 2>at org.junit.Assert.assertEquals(Assert.java:647) [junit4] 2>at org.junit.Assert.assertEquals(Assert.java:633) [junit4] 2>at org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4050) [junit4] 2>at java.lang.Thread.run(Thread.java:748) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtests.method=testMaxCompletedSequenceNumber -Dtests.seed=AE7BCAA635E46482 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=it-CH -Dtests.timezone=America/North_Dakota/Beulah -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 3.86s J3 | TestIndexWriter.testMaxCompletedSequenceNumber <<< [junit4]> Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1580, name=Thread-1406, state=RUNNABLE, group=TGRP-TestIndexWriter] [junit4]>at __randomizedtesting.SeedInfo.seed([AE7BCAA635E46482:92EF3564AF25E240]:0) [junit4]> Caused by: java.lang.AssertionError: expected:<1> but was:<0> [junit4]>at __randomizedtesting.SeedInfo.seed([AE7BCAA635E46482]:0) [junit4]>at org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4050) [junit4]>at java.lang.Thread.run(Thread.java:748){noformat} > Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber > - > > Key: LUCENE-10190 > URL: https://issues.apache.org/jira/browse/LUCENE-10190 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Nhat Nguyen >Priority: Minor > > CI failure in PR at: > https://github.com/apache/lucene/pull/396/checks?check_run_id=3936559246 > Does not reproduce. Stack below. 
> {code} > org.apache.lucene.index.TestIndexWriter > testMaxCompletedSequenceNumber > FAILED > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1840, name=Thread-1481, > state=RUNNABLE, group=TGRP-TestIndexWriter] > at > __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C:671A014F243A0EDE]:0) > Caused by: > java.lang.AssertionError: expected:<1> but was:<0> > at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:633) > at > org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) > at java.base/java.lang.Thread.run(Thread.java:829) > org.apache.lucene.index.TestIndexWriter > test suite's output saved to > /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriter.txt, > copied below: > 2> أكتوبر ١٩, ٢٠٢١ ١٠:٢٧:٢٧ ص > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 2> WARNING: Uncaught exception in thread: > Thread[Thread-1481,5,TGRP-TestIndexWriter] > 2> java.lang.AssertionError: expected:<1> but was:<0> > 2> at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) > 2> at org.junit.Assert.fail(Assert.java:89) > 2> at org.junit.Assert.failNotEquals(Assert.java:835) > 2> at org.junit.Assert.assertEquals(Assert.java:647) > 2> at org.junit.Assert.assertEquals(Assert.java:633) > 2> at > org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) > 2> at java.base/java.lang.Thread.run(Thread.java:829) > 2> >> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured > an uncaught exception i
[GitHub] [lucene-solr] thelabdude opened a new pull request #2601: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests
thelabdude opened a new pull request #2601: URL: https://github.com/apache/lucene-solr/pull/2601 backport of https://github.com/apache/solr/pull/394 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2601: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests
thelabdude merged pull request #2601: URL: https://github.com/apache/lucene-solr/pull/2601 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude opened a new pull request #2602: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests
thelabdude opened a new pull request #2602: URL: https://github.com/apache/lucene-solr/pull/2602 backport of #2601 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2602: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests
thelabdude merged pull request #2602: URL: https://github.com/apache/lucene-solr/pull/2602 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dsmiley opened a new pull request #426: Javadocs, Sorter impls:
dsmiley opened a new pull request #426: URL: https://github.com/apache/lucene/pull/426 * clarify which sorts are stable/not * link from utility methods to the primary Sorter implementations for further information * describe when InPlaceMergeSorter is useful. Fix incorrect statement that it uses insertion sort. As an aside, I'm dubious on the value of InPlaceMergeSorter. If my statement in the docs I added is correct, that it's for small arrays to avoid allocating memory, then such use-cases could call TimSorter and we could enhance TimSorter to up-front recognize it's a "small" array and go directly into binarySort without allocating anything. WDYT? We could just do that anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9229) Lucene web site broken links
[ https://issues.apache.org/jira/browse/LUCENE-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438295#comment-17438295 ] Kristin Heibel commented on LUCENE-9229: Hello, I am a newdev on this project, and wish to contribute to this ticket. I have a few questions: 1) Where do I find the sources (for lucene, solr seems better)? 2) Where are the images? Some of the issues are images, and I'm having issues finding the images that I am looking for. 3) Where are the docs compiled on my local machine? I didn't find a task for that in build.gradle. Is there a separate process? > Lucene web site broken links > > > Key: LUCENE-9229 > URL: https://issues.apache.org/jira/browse/LUCENE-9229 > Project: Lucene - Core > Issue Type: Bug > Components: general/javadocs, general/website >Reporter: Jan Høydahl >Priority: Major > Labels: newdev > > The new website is live, so I ran a dead-link checker on it to see if > anything is broken. Here is the list of dead links. A bunch of them are from > JavaDoc or from RefGuide, but some are also missing graphics on the site > itself. > Feel free to grab broken links and commit fixes either to lucene-site repo or > lucene-solr repo as you see fit. When a link is fixed, please change (x) into > (/) here so we see progress. Some of the broken links are in legacy changes > html, guess we don't need to fix those. > h2. Website > h3. [http://lucene.apache.org/pylucene/jcc/install.html] > (/) [404] [https://bugs.python.org/setuptools/issue43] > h3. [http://lucene.apache.org/pylucene/features.html] > (/) [404] [https://svn.apache.org/viewcvs.cgi/lucene/pylucene/trunk/test] > h3. [http://lucene.apache.org/solr/resources.html] > (/) [0] [http://hrishikesh.karambelkar.co.in/] > (/) [404] [http://lucenerevolution.org/past-events/] > (/) [404] [http://www.packtpub.com/apache-solr-for-indexing-data/book] > (/) [0] [http://www.inkstall.com/] > h3. [http://lucene.apache.org/pylucene/jcc/features.html] > (/) [0] [https://peak.telecommunity.com/DevCenter/EasyInstall] > h2. Javadoc > h3. [http://lucene.apache.org/solr/api/changes/Changes.html] > (/) [404] [http://wiki.apache.org/solr/FunctionQuery]. > h3. [http://lucene.apache.org/solr/api/solr-test-framework/overview-tree.html] > (x) [404] > [https://junit.org/junit4/javadoc/4.12/org/junit/rules.TestRule.html?is-external=true] > h3. > [http://lucene.apache.org/solr/api/solr-test-framework/org/apache/solr/util/RevertDefaultThreadHandlerRule.html] > (x) [404] > [https://junit.org/junit4/javadoc/4.12/org/junit/runners/model.Statement.html?is-external=true] > h3. > [http://lucene.apache.org/solr/api/solr-test-framework/org/apache/solr/util/SSLTestConfig.html] > (x) [404] > [https://junit.org/junit4/javadoc/4.12/org/junit/internal.AssumptionViolatedException.html?is-external=true] > h3. > [http://lucene.apache.org/solr/api/solr-solrj/org/apache/solr/common/util/ByteArrayUtf8CharSequence.html] > (x) [404] [http://download.java.net/java/jdk9/docs/api/java/util/Arrays.html] > h3. > [http://lucene.apache.org/solr/api/solr-solrj/org/apache/solr/client/solrj/SolrQuery.html] > (x) [404] [https://wiki.apache.org/solr/SimpleFacetParameters] > h3. > [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/legacy/BBoxStrategy.html] > (x) [404] > [http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java] > h3. 
> [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/search/FastLRUCache.html] > (x) [404] [http://wiki.apache.org/solr/SolrCaching] > h3. > [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/util/hll/HLL.html] > (x) [404] > [http://guava-libraries.googlecode.com/git/guava/src/com/google/common/hash/Murmur3_128HashFunction.java] > (x) [404] > [https://github.com/aggregateknowledge/postgresql-hll/blob/master/README.markdown] > (x) [404] > [https://github.com/aggregateknowledge/postgresql-hll/blob/master/STORAGE.markdown] > h3. [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/legacy/] > (x) [404] [http://lucene.apache.org/icons/blank.gif] > (x) [404] [http://lucene.apache.org/icons/text.gif] > (x) [404] [http://lucene.apache.org/icons/folder.gif] > (x) [404] [http://lucene.apache.org/icons/back.gif] > h3. > [http://lucene.apache.org/solr/api/solr-core/org/apache/solr/legacy/doc-files/] > (x) [404] [http://lucene.apache.org/icons/image2.gif] > (For RefGuide, please see https://issues.apache.org/jira/browse/SOLR-15497 > instead) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10122) Explore using NumericDocValue to store taxonomy parent array
[ https://issues.apache.org/jira/browse/LUCENE-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438317#comment-17438317 ] Haoyu Zhai commented on LUCENE-10122: - The luceneutil benchmark shows a mostly neutral result {code:java} TaskQPS base StdDevQPS cand StdDev Pct diff p-value Fuzzy2 58.39 (5.6%) 57.70 (6.1%) -1.2% ( -12% - 11%) 0.518 BrowseDateTaxoFacets2.40 (6.6%)2.38 (5.8%) -0.7% ( -12% - 12%) 0.709 BrowseDayOfYearTaxoFacets2.40 (6.5%)2.38 (5.8%) -0.7% ( -12% - 12%) 0.721 BrowseMonthTaxoFacets2.49 (6.8%)2.47 (6.1%) -0.7% ( -12% - 13%) 0.738 BrowseMonthSSDVFacets 16.44 (36.1%) 16.38 (35.1%) -0.4% ( -52% - 110%) 0.974 LowIntervalsOrdered 30.70 (2.8%) 30.61 (3.0%) -0.3% ( -5% -5%) 0.763 LowPhrase 516.96 (1.7%) 515.67 (1.6%) -0.3% ( -3% -3%) 0.626 OrNotHighHigh 580.07 (2.1%) 578.61 (2.8%) -0.3% ( -5% -4%) 0.747 BrowseDayOfYearSSDVFacets 15.22 (24.2%) 15.19 (24.2%) -0.2% ( -39% - 63%) 0.976 HighTermDayOfYearSort 766.98 (1.7%) 765.20 (1.7%) -0.2% ( -3% -3%) 0.665 HighIntervalsOrdered2.46 (2.0%)2.45 (2.3%) -0.2% ( -4% -4%) 0.795 MedIntervalsOrdered 27.55 (2.8%) 27.51 (2.8%) -0.1% ( -5% -5%) 0.878 IntNRQ 28.96 (0.3%) 28.92 (0.6%) -0.1% ( 0% -0%) 0.358 OrHighHigh 36.05 (2.2%) 36.02 (1.7%) -0.1% ( -3% -3%) 0.870 MedPhrase 119.18 (1.7%) 119.08 (2.0%) -0.1% ( -3% -3%) 0.884 MedSpanNear 99.96 (1.1%) 99.88 (1.2%) -0.1% ( -2% -2%) 0.818 MedTerm 1211.34 (2.4%) 1210.46 (2.2%) -0.1% ( -4% -4%) 0.919 Respell 42.08 (1.9%) 42.06 (2.3%) -0.1% ( -4% -4%) 0.931 OrNotHighLow 608.56 (2.1%) 608.41 (2.4%) -0.0% ( -4% -4%) 0.971 HighSpanNear 38.01 (2.2%) 38.01 (2.9%) -0.0% ( -5% -5%) 0.994 LowSpanNear 94.41 (1.5%) 94.42 (2.1%) 0.0% ( -3% -3%) 0.975 OrHighLow 228.92 (2.4%) 228.98 (1.6%) 0.0% ( -3% -4%) 0.971 OrHighMed 76.23 (2.3%) 76.26 (2.2%) 0.0% ( -4% -4%) 0.951 HighTermTitleBDVSort 19.07 (2.6%) 19.08 (2.5%) 0.0% ( -4% -5%) 0.952 TermDTSort 312.90 (2.0%) 313.18 (2.5%) 0.1% ( -4% -4%) 0.901 PKLookup 153.21 (2.6%) 153.35 (2.5%) 0.1% ( -4% -5%) 0.910 OrHighNotMed 798.03 (2.0%) 798.83 (2.3%) 0.1% ( -4% -4%) 0.883 HighTermMonthSort 103.99 (9.9%) 104.10 (9.7%) 0.1% ( -17% - 21%) 0.971 Wildcard 107.61 (2.1%) 107.74 (2.4%) 0.1% ( -4% -4%) 0.859 Prefix3 82.74 (12.0%) 82.84 (12.1%) 0.1% ( -21% - 27%) 0.973 HighPhrase 67.96 (2.0%) 68.07 (2.0%) 0.2% ( -3% -4%) 0.792 HighTerm 1058.76 (1.8%) 1060.59 (2.7%) 0.2% ( -4% -4%) 0.812 OrHighNotHigh 528.01 (1.8%) 529.17 (2.5%) 0.2% ( -4% -4%) 0.751 Fuzzy1 42.70 (3.0%) 42.80 (3.3%) 0.2% ( -5% -6%) 0.814 OrNotHighMed 613.17 (2.6%) 614.97 (2.6%) 0.3% ( -4% -5%) 0.722 MedSloppyPhrase 15.29 (1.8%) 15.34 (2.2%) 0.3% ( -3% -4%) 0.601 OrHighNotLow 590.46 (2.5%) 592.57 (2.9%) 0.4% ( -4% -5%) 0.677 AndHighLow 518.23 (2.5%) 520.65 (2.9%) 0.5% ( -4% -6%) 0.585 LowTerm 1137.40 (2.9%) 1143.47 (2.8%) 0.5% ( -5% -6%) 0.556 HighSloppyPhrase 10.76 (3.2%) 10.82 (3.6%) 0.6% ( -6% -7%) 0.602 LowSloppyPhrase 152.21 (2.1%) 153.24 (2.4%) 0.7% ( -3% -5%) 0.350 AndHighMed 170.44 (2.5%) 171.76 (3.6%) 0.8% ( -5% -7%) 0.426 AndHighHigh 64.45 (3.2%) 65.07 (4.4%) 1.0% ( -6% -8%) 0.424 {code} And size of taxonomy index does not change. I've also ran the internal benchmark we use
[jira] [Commented] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API
[ https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438318#comment-17438318 ] Vigya Sharma commented on LUCENE-10216: --- Thanks for going through the proposal, [~jpountz]. {quote}One bit from this proposal I'm not fully comfortable with is the fact that Lucene would merge each reader independently when the flag is set. {quote} I understand your concern; I am also a bit iffy about losing the opportunity to merge segments. Ideally, I would like to land on a sweet middle ground - not run single segment merges, but also not add all provided segments into a single merge. Leverage some concurrency within addIndexes, and let background merges bring the segment count down further. How do you feel about making this configurable in the API with a param that defines the number of segments merged together ({{MergeFactor}})? We could make it a flag to keep things simple for consumers, with values like {{ONE}}, {{THREAD_WIDE}}, and {{ALL}}, where {{ONE}} = single segment merges, {{ALL}} = all segments in one merge, and {{THREAD_WIDE}} = each merge gets ({{readers.length/numThreads}}) segments. The default here could be {{ALL}} to retain current behavior. {quote}Then you could still do what you want by using {{ConcurrentMergeScheduler}}, passing codec readers one by one to {{CodecReader#addIndexes}} with {{doWait=false}} and get resulting remapped segments as quickly as possible. {quote} The doWait flag would make the API non-blocking. But it would add additional steps for users to track when the merges triggered by addIndexes have completed. For segment replication, addIndexes is not usable until its merges complete. The merge within addIndexes() is what creates segments with recomputed ordinal values. Until that is done, there are no segments available to copy to searcher (and replica) hosts. We also want to bring the segment count down as low as possible at Amazon Product Search. The tradeoff we are making here is to first go with a higher segment count to make documents quickly available for search, then run background merges and bring down the segment count. This is similar to the other variant of addIndexes - the {{addIndexes(Directory[])}} API, which copies all segments from the provided directories into the current index dir and then triggers a background merge. I feel making segments available quickly would be useful for anyone who has multiple replica search hosts and uses segment replication. > Add concurrency to addIndexes(CodecReader…) API > --- > > Key: LUCENE-10216 > URL: https://issues.apache.org/jira/browse/LUCENE-10216 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Vigya Sharma >Priority: Major > > I work at Amazon Product Search, and we use Lucene to power search for the > e-commerce platform. I'm working on a project that involves applying > metadata+ETL transforms and indexing documents on n different _indexing_ > boxes, combining them into a single index on a separate _reducer_ box, and > making it available for queries on m different _search_ boxes (replicas). > Segments are asynchronously copied from indexers to reducers to searchers as > they become available for the next layer to consume. > I am using the addIndexes API to combine multiple indexes into one on the > reducer boxes. Since we also have taxonomy data, we need to remap facet field > ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version > of this API.
The API leverages {{SegmentMerger.merge()}} to create segments > with new ordinal values while also merging all provided segments in the > process. > _This is however a blocking call that runs in a single thread._ Until we have > written segments with new ordinal values, we cannot copy them to searcher > boxes, which increases the time to make documents available for search. > I was playing around with the API by creating multiple concurrent merges, > each with only a single reader, creating a concurrently running 1:1 > conversion from old segments to new ones (with new ordinal values). We follow > this up with non-blocking background merges. This lets us copy the segments > to searchers and replicas as soon as they are available, and later replace > them with merged segments as background jobs complete. On the Amazon dataset > I profiled, this gave us around 2.5 to 3x improvement in addIndexes() time. > Each call was given about 5 readers to add on average. > This might be a useful addition to Lucene. We could create another {{addIndexes()}} > API with a {{boolean}} flag for concurrency, that internally submits multiple > merge jobs (each with a single reader) to the {{ConcurrentMergeScheduler}}, > and waits for
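The application-side version of that experiment (one addIndexes call per incoming reader, fanned out over a thread pool, with background merges cleaning up afterwards) can be sketched as below. Pool size and error handling are illustrative, and this relies on IndexWriter's documented thread safety rather than any new Lucene API.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.IndexWriter;

final class ConcurrentAddIndexes {

  /** Submits one single-reader addIndexes call per reader so the 1:1 rewrites run concurrently. */
  static void addAll(IndexWriter writer, List<CodecReader> readers, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (CodecReader reader : readers) {
        futures.add(pool.submit((Callable<Void>) () -> {
          writer.addIndexes(reader); // one reader per call -> one remapped segment each
          return null;
        }));
      }
      for (Future<Void> f : futures) {
        f.get(); // propagate any failure
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}

This trades a temporarily higher segment count for earlier availability, which is exactly the trade-off described above.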
[GitHub] [lucene] apanimesh061 commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on a change in pull request #412: URL: https://github.com/apache/lucene/pull/412#discussion_r742449547 ## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java ## @@ -143,88 +135,168 @@ private int cacheFieldValCharsThreshold = DEFAULT_CACHE_CHARS_THRESHOLD; - /** Extracts matching terms after rewriting against an empty index */ - protected static Set extractTerms(Query query) throws IOException { -Set queryTerms = new HashSet<>(); - EMPTY_INDEXSEARCHER.rewrite(query).visit(QueryVisitor.termCollector(queryTerms)); -return queryTerms; - } + private Set flags; + + /** Builder for UnifiedHighlighter. */ + public abstract static class Builder> { Review comment: Yes thanks for the feedback here. I'll work on this builder now. Sorry this past week I got super busy, so could not get to this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #422: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache
rmuir commented on pull request #422: URL: https://github.com/apache/lucene/pull/422#issuecomment-958756345 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2599: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin
thelabdude merged pull request #2599: URL: https://github.com/apache/lucene-solr/pull/2599 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2601: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests
thelabdude merged pull request #2601: URL: https://github.com/apache/lucene-solr/pull/2601 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] apanimesh061 commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on a change in pull request #412: URL: https://github.com/apache/lucene/pull/412#discussion_r742449547 ## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java ## @@ -143,88 +135,168 @@ private int cacheFieldValCharsThreshold = DEFAULT_CACHE_CHARS_THRESHOLD; - /** Extracts matching terms after rewriting against an empty index */ - protected static Set extractTerms(Query query) throws IOException { -Set queryTerms = new HashSet<>(); - EMPTY_INDEXSEARCHER.rewrite(query).visit(QueryVisitor.termCollector(queryTerms)); -return queryTerms; - } + private Set flags; + + /** Builder for UnifiedHighlighter. */ + public abstract static class Builder> { Review comment: Yes thanks for the feedback here. I'll work on this builder now. Sorry this past week I got super busy, so could not get to this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #420: [DRAFT] LUCENE-10122 Explore using NumericDocValue to store taxonomy parent array
jpountz commented on a change in pull request #420: URL: https://github.com/apache/lucene/pull/420#discussion_r741939064 ## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java ## @@ -129,39 +124,19 @@ private void initParents(IndexReader reader, int first) throws IOException { if (reader.maxDoc() == first) { return; } - -// it's ok to use MultiTerms because we only iterate on one posting list. -// breaking it to loop over the leaves() only complicates code for no -// apparent gain. -PostingsEnum positions = -MultiTerms.getTermPostingsEnum( -reader, Consts.FIELD_PAYLOADS, Consts.PAYLOAD_PARENT_BYTES_REF, PostingsEnum.PAYLOADS); - -// shouldn't really happen, if it does, something's wrong -if (positions == null || positions.advance(first) == DocIdSetIterator.NO_MORE_DOCS) { - throw new CorruptIndexException( - "Missing parent data for category " + first, reader.toString()); -} - -int num = reader.maxDoc(); -for (int i = first; i < num; i++) { - if (positions.docID() == i) { -if (positions.freq() == 0) { // shouldn't happen - throw new CorruptIndexException( - "Missing parent data for category " + i, reader.toString()); -} - -parents[i] = positions.nextPosition(); - -if (positions.nextDoc() == DocIdSetIterator.NO_MORE_DOCS) { - if (i + 1 < num) { -throw new CorruptIndexException( -"Missing parent data for category " + (i + 1), reader.toString()); - } - break; +for (LeafReaderContext leafContext: reader.leaves()) { Review comment: Maybe there are benefits of MultiDocValues I'm missing for this specific use-case, but in general we prefer consuming data-structures segment-by-segment whenever possible and only rely on the MultiXXX classes for merging. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
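To illustrate the segment-by-segment pattern jpountz describes, here is a minimal sketch that reads a per-leaf NumericDocValues field directly rather than going through the MultiXXX wrappers; the field name and the parents[] output array are assumptions for illustration, not the actual patch.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;

// Sketch only: consume doc values one segment at a time.
final class ParentArraySketch {
  static void loadParents(IndexReader reader, String parentField, int[] parents)
      throws IOException {
    for (LeafReaderContext leaf : reader.leaves()) {
      NumericDocValues values = leaf.reader().getNumericDocValues(parentField);
      if (values == null) {
        continue; // this segment has no values for the field
      }
      for (int doc = values.nextDoc();
          doc != DocIdSetIterator.NO_MORE_DOCS;
          doc = values.nextDoc()) {
        // docBase converts the segment-local doc id to a top-level ordinal.
        parents[leaf.docBase + doc] = (int) values.longValue();
      }
    }
  }
}
```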
[GitHub] [lucene] dweiss commented on pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …
dweiss commented on pull request #425: URL: https://github.com/apache/lucene/pull/425#issuecomment-959477250 I added task graph ordering that enforces that validation runs prior to compilation (if they're both scheduled to run). This is safer than just scheduling a module's tests after validation, because it makes sure classes from other modules have been validated too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
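One way such ordering can be expressed is an ordering-only constraint (mustRunAfter), which applies exactly when both tasks are in the task graph. The sketch below is a hypothetical Gradle plugin written in Java, not the actual build change in the PR; the "compile" name prefix used for matching is an assumption, while the validateSourcePatterns task name comes from the PR title.

```java
import org.gradle.api.Plugin;
import org.gradle.api.Project;

// Hypothetical sketch: order compilation after source-pattern validation
// whenever both tasks are scheduled, without adding a hard dependency.
public class ValidateBeforeCompilePlugin implements Plugin<Project> {
  @Override
  public void apply(Project project) {
    project.getTasks()
        .matching(task -> task.getName().startsWith("compile"))
        .configureEach(task -> task.mustRunAfter("validateSourcePatterns"));
  }
}
```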
[GitHub] [lucene-solr] thelabdude merged pull request #2600: SOLR-15721: Support editing Basic auth config when using the MultiAuthPlugin
thelabdude merged pull request #2600: URL: https://github.com/apache/lucene-solr/pull/2600 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2602: SOLR-15766: MultiAuthPlugin should send non-AJAX anonymous requests to the plugin that allows anonymous requests
thelabdude merged pull request #2602: URL: https://github.com/apache/lucene-solr/pull/2602 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #425: LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL …
uschindler merged pull request #425: URL: https://github.com/apache/lucene/pull/425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang edited a comment on pull request #422: LUCENE-10120: Lazy initialize FixedBitSet in LRUQueryCache
LuXugang edited a comment on pull request #422: URL: https://github.com/apache/lucene/pull/422#issuecomment-960382982 > RoaringDocIdSet will compress dense situations too. It should be used. As @jpountz said in [LUCENE-10120](https://issues.apache.org/jira/browse/LUCENE-10120), LRUQueryCache used to always cache with RoaringDocIdSet, no matter whether the doc id set was dense or sparse, until, for the conjunction optimization, FixedBitSet was used for caching when scorer.cost() * 100 >= maxDoc, i.e. when the set is very dense. That also means a huge FixedBitSet will be cached. For more details see [LUCENE-7339](https://issues.apache.org/jira/browse/LUCENE-7339) and [LUCENE-7330](https://issues.apache.org/jira/browse/LUCENE-7330). > Why do we need a range docidset? The purpose of the range doc id set in this PR is to cache only minDoc and maxDoc when there is no doc id gap in the FixedBitSet, while still providing random access; see the implementation of RangeDocIdSet#bits() in the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
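A minimal sketch of what such a range-backed DocIdSet could look like, assuming the range is stored as an inclusive minDoc and an exclusive maxDoc (the actual RangeDocIdSet in the PR may differ): the iterator walks the range without allocating any bit set, and bits() provides the random access needed for conjunctions.

```java
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.RamUsageEstimator;

// Sketch only: a DocIdSet backed by a contiguous doc id range.
final class RangeDocIdSetSketch extends DocIdSet {
  private final int minDoc; // first matching doc id, inclusive
  private final int maxDoc; // end of the matching range, exclusive

  RangeDocIdSetSketch(int minDoc, int maxDoc) {
    this.minDoc = minDoc;
    this.maxDoc = maxDoc;
  }

  @Override
  public DocIdSetIterator iterator() {
    // Iterates [minDoc, maxDoc) without allocating a FixedBitSet.
    return DocIdSetIterator.range(minDoc, maxDoc);
  }

  @Override
  public Bits bits() {
    // Random access for conjunctions: a doc matches iff it falls in the range.
    return new Bits() {
      @Override
      public boolean get(int index) {
        return index >= minDoc && index < maxDoc;
      }

      @Override
      public int length() {
        return maxDoc;
      }
    };
  }

  @Override
  public long ramBytesUsed() {
    // Only two ints are stored, regardless of how many docs match.
    return RamUsageEstimator.shallowSizeOfInstance(RangeDocIdSetSketch.class);
  }
}
```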
[GitHub] [lucene] zacharymorn commented on pull request #418: LUCENE-10061: [WIP] Implements basic dynamic pruning support for CombinedFieldsQuery
zacharymorn commented on pull request #418: URL: https://github.com/apache/lucene/pull/418#issuecomment-960493545

Perf tests result for commit 2ba435e5c83f870be9566

Run 1:
    Task             QPS baseline (StdDev)    QPS my_modified_version (StdDev)    Pct diff                  p-value
    CFQHighHigh              3.53 (3.3%)                       2.92 (5.3%)        -17.2% ( -25% -  -8%)       0.000
    PKLookup               108.13 (7.7%)                     119.85 (8.2%)         10.8% (  -4% -  28%)       0.000
    CFQHighLow              14.88 (3.9%)                      16.69 (12.5%)        12.2% (  -3% -  29%)       0.000
    CFQHighMed              21.11 (4.1%)                      25.87 (11.8%)        22.6% (   6% -  40%)       0.000

Run 2:
    Task             QPS baseline (StdDev)    QPS my_modified_version (StdDev)    Pct diff                  p-value
    CFQHighHigh              6.64 (3.1%)                       5.63 (10.2%)       -15.2% ( -27% -  -1%)       0.000
    CFQHighLow               8.35 (2.8%)                       8.05 (15.0%)        -3.6% ( -20% -  14%)       0.297
    CFQHighMed              24.51 (5.3%)                      24.90 (19.9%)         1.6% ( -22% -  28%)       0.733
    PKLookup               110.06 (10.0%)                    128.54 (7.9%)         16.8% (  -1% -  38%)       0.000

Run 3:
    Task             QPS baseline (StdDev)    QPS my_modified_version (StdDev)    Pct diff                  p-value
    CFQHighMed              13.01 (2.9%)                       9.82 (7.8%)        -24.5% ( -34% - -14%)       0.000
    PKLookup               107.85 (8.1%)                     111.79 (11.2%)         3.7% ( -14% -  24%)       0.236
    CFQHighHigh              4.83 (2.6%)                       5.06 (8.6%)          4.7% (  -6% -  16%)       0.018
    CFQHighLow              14.95 (3.0%)                      18.31 (19.0%)        22.5% (   0% -  45%)       0.000

Run 4:
    Task             QPS baseline (StdDev)    QPS my_modified_version (StdDev)    Pct diff                  p-value
    CFQHighMed              11.11 (2.9%)                       6.69 (4.1%)        -39.7% ( -45% - -33%)       0.000
    CFQHighLow              27.55 (3.8%)                      25.46 (11.0%)        -7.6% ( -21% -   7%)       0.003
    CFQHighHigh              5.25 (3.2%)                       4.96 (6.1%)         -5.7% ( -14% -   3%)       0.000
    PKLookup               107.61 (6.7%)                     121.19 (4.6%)         12.6% (   1% -  25%)       0.000

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org