[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits
gtroitskiy commented on a change in pull request #217: URL: https://github.com/apache/lucene/pull/217#discussion_r676822527 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java ## @@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA collectDocID = docID; Review comment: Sorry, I've been away for a while. Thank you for the sketch, your approach is definitely more elegant :slightly_smiling_face: Except I'm not sure we need caching, since by design only one drill-down collector is called for a specific docId. PS: force-pushed changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits
gsmiller commented on a change in pull request #217: URL: https://github.com/apache/lucene/pull/217#discussion_r676879074 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java ## @@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA collectDocID = docID; Review comment: Thanks, this looks great! I think the caching could be useful down in `collectHit()` in the case that a `sidewaysLeafCollector` decides to call back into `score()` (e.g., if `FacetsCollector` has `keepScores` set to `true` and calls back to get the score [here](https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java#L125)). Without using something like `ScoreCachingWrappingScorer`, the underlying score would need to be recomputed for the same docid if I'm not mistaken. Does that sound right or am I overlooking something? Thanks again for taking this up! Excited to get this change merged once we figure out whether-or-not we need the caching layer in place.
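The caching concern raised above can be illustrated with a minimal, self-contained sketch. It does not use Lucene's API: `SimpleScorable`, `CountingScorable` and `CachingScorable` are hypothetical names invented here, and the wrapper only approximates the idea behind `ScoreCachingWrappingScorer` (remember the score for the current doc so a second `score()` call from a collector does not recompute it).

```java
class ScoreCachingDemo {
  public static void main(String[] args) {
    CountingScorable base = new CountingScorable();
    SimpleScorable cached = new CachingScorable(base);
    base.doc = 7;
    cached.score(); // computed by the underlying scorer
    cached.score(); // served from the cache, e.g. a collector calling back
    System.out.println("computations for doc 7: " + base.computations);
  }
}

/** Hypothetical stand-in for a scorer positioned on a current document. */
interface SimpleScorable {
  int docID();

  float score();
}

/** Hypothetical "expensive" scorer that counts how often it really computes. */
final class CountingScorable implements SimpleScorable {
  int doc;
  int computations;

  @Override
  public int docID() {
    return doc;
  }

  @Override
  public float score() {
    computations++;
    return doc * 1.5f;
  }
}

/** Caches the score per doc; assumption: the doc id identifies the cached value. */
final class CachingScorable implements SimpleScorable {
  private final SimpleScorable in;
  private int cachedDoc = -1;
  private float cachedScore;

  CachingScorable(SimpleScorable in) {
    this.in = in;
  }

  @Override
  public int docID() {
    return in.docID();
  }

  @Override
  public float score() {
    int doc = in.docID();
    if (doc != cachedDoc) {
      // First request for this doc: delegate once and remember the result.
      cachedScore = in.score();
      cachedDoc = doc;
    }
    return cachedScore;
  }
}
```

With a wrapper of this shape, deferring the score until a true hit and letting a sideways collector ask for it again are compatible: the underlying computation still happens at most once per doc.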
[GitHub] [lucene] gautamworah96 commented on pull request #220: LUCENE-9450: Use BinaryDocValue fields with a different name in the taxonomy index
gautamworah96 commented on pull request #220: URL: https://github.com/apache/lucene/pull/220#issuecomment-886968851

Changes in the new b9cbc4c commit:
1. The `SegmentInfos.readLatestCommit(dir).getMinSegmentLuceneVersion()` call was returning 9 as the version because the older zip file in the mainline was using the Lucene 8.6 Codec while the major version variable was still assigned as 9. This was because the `main` branch in the repo (during the 8.6 release) had already set the major version to 9. I reconstructed the 8.10 taxonomy index from the `branch_8x` branch, and that correctly set the major version to 8 for those older segments.
2. Use a version-based check for storing BDV fields or StringFields.

I think the new commit might be slower than the previous `$full_path_binary$` option during indexing because it checks the Lucene version of the last commit every time we add a new category. Finally, I think there should be a cleaner way of knowing whether the index has at least one commit or not. I use the `indexWriter.getLiveCommitData().iterator().hasNext()` call, but maybe there is a better way.

Side questions that need more thought:
1. What is the use of the `LiveIndexWriterConfig.createdVersionMajor` param? I think instead of initializing it to the latest version, maybe we can assign the value of the min back compat version of the index to it (when the `LiveIndexWriterConfig` class is initialized).
2. Can we fix the `DirectoryTaxonomyWriter.indexEpoch` variable to hold the accurate index epoch of the taxonomy index? The current logic for `indexEpoch` assigns 1 even if the index is completely fresh. It also saves 1 as the value when the index has just 1 commit.
[jira] [Commented] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields
[ https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387588#comment-17387588 ]

Gautam Worah commented on LUCENE-9450:

Posted a new PR revision that implements the {{use index created version}} approach

> Taxonomy index should use DocValues not StoredFields
>
> Key: LUCENE-9450
> URL: https://issues.apache.org/jira/browse/LUCENE-9450
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 8.5.2
> Reporter: Gautam Worah
> Priority: Minor
> Labels: performance
> Fix For: main (9.0)
> Attachments: LUCENE-9450-localrun.py-v1, wip_taxonomy_patch
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> The taxonomy index that maps binning labels to ordinals was created before Lucene added BinaryDocValues.
> I've attached a WIP patch (does not pass tests currently)
> Issue suggested by [~mikemccand]

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits
gtroitskiy commented on a change in pull request #217: URL: https://github.com/apache/lucene/pull/217#discussion_r676888240 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java ## @@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA collectDocID = docID; Review comment: oh, my bad, missed the general case :slightly_frowning_face: Thanks for your patience and elegant solution!
[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits
gsmiller commented on a change in pull request #217: URL: https://github.com/apache/lucene/pull/217#discussion_r676894541 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java ## @@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA collectDocID = docID; Review comment: No worries at all. Thanks for sticking with the PR!
[GitHub] [lucene] mikemccand commented on a change in pull request #220: LUCENE-9450: Use BinaryDocValue fields with a different name in the taxonomy index
mikemccand commented on a change in pull request #220: URL: https://github.com/apache/lucene/pull/220#discussion_r676902852

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java ##

@@ -475,8 +476,20 @@ private int addCategoryDocument(FacetLabel categoryPath, int parent) throws IOEx
     String fieldPath = FacetsConfig.pathToString(categoryPath.components, categoryPath.length);
     fullPathField.setStringValue(fieldPath);
+
+    boolean commitExists = indexWriter.getLiveCommitData().iterator().hasNext();
+    /* no commits so this is a fresh index, or the old index was built using a Lucene 9 or greater version */
+    if ((commitExists == false)
+        || (SegmentInfos.readLatestCommit(dir)

Review comment: This is a horrifyingly costly check to do for every added `FacetLabel`! Couldn't we do this check once in ctor when this `TaxonomyWriter` is created, and store the result in a `final boolean`?

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java ##

@@ -475,8 +476,20 @@ private int addCategoryDocument(FacetLabel categoryPath, int parent) throws IOEx
     String fieldPath = FacetsConfig.pathToString(categoryPath.components, categoryPath.length);
     fullPathField.setStringValue(fieldPath);
+
+    boolean commitExists = indexWriter.getLiveCommitData().iterator().hasNext();
+    /* no commits so this is a fresh index, or the old index was built using a Lucene 9 or greater version */
+    if ((commitExists == false)
+        || (SegmentInfos.readLatestCommit(dir)
+            .getMinSegmentLuceneVersion()
+            .onOrAfter(Version.LUCENE_9_0_0))) {
+      /* Lucene 9 introduces BinaryDocValuesField for storing taxonomy categories */

Review comment: Maybe `switches to` instead of `introduces`?
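The suggestion in the review above (run the costly check once in the constructor and keep the result in a `final boolean`) could look roughly like the following sketch. Everything here is hypothetical: `HypotheticalTaxonomyWriter` is not `DirectoryTaxonomyWriter`, and the `BooleanSupplier` merely stands in for whatever index-version check the real code performs.

```java
import java.util.function.BooleanSupplier;

class CachedVersionCheckDemo {
  public static void main(String[] args) {
    int[] checks = {0};
    HypotheticalTaxonomyWriter writer =
        new HypotheticalTaxonomyWriter(
            () -> {
              checks[0]++; // count how often the "expensive" check runs
              return true;
            });
    for (int i = 0; i < 1000; i++) {
      writer.addCategory("dim/" + i);
    }
    System.out.println("checks=" + checks[0] + ", binary=" + writer.binaryCount());
  }
}

/** Hypothetical writer; it only mimics the shape of the pattern under discussion. */
final class HypotheticalTaxonomyWriter {
  // Decided once at construction time instead of once per added category.
  private final boolean useBinaryDocValues;
  private int binaryCount;
  private int storedCount;

  HypotheticalTaxonomyWriter(BooleanSupplier expensiveIndexVersionCheck) {
    this.useBinaryDocValues = expensiveIndexVersionCheck.getAsBoolean();
  }

  void addCategory(String path) {
    // The hot path only reads the cached flag.
    if (useBinaryDocValues) {
      binaryCount++;
    } else {
      storedCount++;
    }
  }

  int binaryCount() {
    return binaryCount;
  }

  int storedCount() {
    return storedCount;
  }
}
```

One design consequence of this pattern is that the decision is fixed for the lifetime of the writer instance, which is only appropriate if the answer cannot change while the writer is open.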
[GitHub] [lucene] gsmiller merged pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits
gsmiller merged pull request #217: URL: https://github.com/apache/lucene/pull/217
[GitHub] [lucene] gsmiller commented on pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits
gsmiller commented on pull request #217: URL: https://github.com/apache/lucene/pull/217#issuecomment-886989427 Thanks again @gtroitskiy!
[jira] [Commented] (LUCENE-10030) [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring
[ https://issues.apache.org/jira/browse/LUCENE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387605#comment-17387605 ]

ASF subversion and git services commented on LUCENE-10030:

Commit 61f8517000b3af74c0b079e4a5fa81eb870b1c35 in lucene's branch refs/heads/main from Grigoriy Troitskiy
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=61f8517 ]

LUCENE-10030: Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring (#217)

> [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring
>
> Key: LUCENE-10030
> URL: https://issues.apache.org/jira/browse/LUCENE-10030
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Grigoriy Troitskiy
> Priority: Major
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> *Diff*
> {code:java}
> @@ -195,11 +195,8 @@ class DrillSidewaysScorer extends BulkScorer {
>
>        collectDocID = docID;
>
> -      // TODO: we could score on demand instead since we are
> -      // daat here:
> -      collectScore = baseScorer.score();
> -
>        if (failedCollector == null) {
> +        collectScore = baseScorer.score();
>          // Hit passed all filters, so it's "real":
>          collectHit(collector, dims);
>        } else {
> {code}
>
> *Motivation*
> 1. Performance degradation: we have a quite heavy custom implementation of score(). So when we started using DrillSideways, this call became top-1 in a profiler snapshot (top-3 with default scoring). We tried doUnionScoring and doDrillDownAdvanceScoring, but no luck:
> doUnionScoring scores all baseQuery docIds
> doDrillDownAdvanceScoring avoids some redundant docId scorings, considering the symmetric difference of the top two iterators' docIds, but still scores some docIds that will be filtered out by the 3rd, 4th, ... dimension iterators
> doQueryFirstScoring scores near-miss docIds
> The best way is to score only true hits (where baseQuery and all N drill-down iterators match). So we suggest a small modification of doQueryFirstScoring.
>
> 2. Speaking of doQueryFirstScoring, it doesn't look like we need to calculate a score for a near-miss hit, because it won't be used anywhere.
> FacetsCollectorManager creates FacetsCollector with default constructor
> [https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java#L35]
> so FacetsCollector has false for keepScores
> [https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java#L119]
> and collectScore is not being used
> [https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java#L200]
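The motivation in the issue (score only true hits, never near-misses) can be sketched outside Lucene as follows. This is an illustration of the idea in the issue's diff, not the merged implementation: `QueryFirstSketch`, its fields, and the use of `IntPredicate` for drill-down dimensions are all assumptions made for this example.

```java
import java.util.function.IntPredicate;

class LazyScoringDemo {
  public static void main(String[] args) {
    QueryFirstSketch sketch = new QueryFirstSketch();
    int[] baseDocs = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    IntPredicate[] dims = {d -> d % 2 == 0, d -> d < 6};
    sketch.collect(baseDocs, dims);
    System.out.println(
        "hits=" + sketch.trueHits
            + " nearMisses=" + sketch.nearMisses
            + " scoreCalls=" + sketch.scoreCalls);
  }
}

/** Hypothetical query-first loop: base query docs are checked against every dimension. */
final class QueryFirstSketch {
  int scoreCalls;
  int trueHits;
  int nearMisses;
  float scoreSum;

  /** Stand-in for an expensive custom score() implementation. */
  private float expensiveScore(int docID) {
    scoreCalls++;
    return 1.0f / (1 + docID);
  }

  void collect(int[] baseDocs, IntPredicate[] dims) {
    for (int docID : baseDocs) {
      int failed = 0;
      for (IntPredicate dim : dims) {
        // Stop early once two dimensions fail: the doc is neither a hit nor a near-miss.
        if (!dim.test(docID) && ++failed > 1) {
          break;
        }
      }
      if (failed == 0) {
        // True hit: this is the only place the score is computed.
        scoreSum += expensiveScore(docID);
        trueHits++;
      } else if (failed == 1) {
        // Near-miss: counted for the sideways dimension, but never scored here.
        nearMisses++;
      }
    }
  }
}
```

In this toy setup the number of score computations equals the number of true hits, which is the property the issue is after when `score()` is expensive.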
[jira] [Commented] (LUCENE-10030) [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring
[ https://issues.apache.org/jira/browse/LUCENE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387607#comment-17387607 ] Greg Miller commented on LUCENE-10030: -- Thanks again for taking this up [~gtroitskiy]! I just merged the change onto {{main}} and it will go with the 9.0 release whenever that's ready. In the meantime, do you want to backport the change into 8.x? There's no reason not to since it's fully backwards compatible. Let me know if you want to give that a shot, and/or if you have any questions on how to go about doing that. Thanks!
[jira] [Commented] (LUCENE-10030) [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring
[ https://issues.apache.org/jira/browse/LUCENE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387611#comment-17387611 ] ASF subversion and git services commented on LUCENE-10030: -- Commit 736d114901e009fa09a6cc8bccbe301a2db03058 in lucene's branch refs/heads/main from Greg Miller [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=736d114 ] Add CHANGES entry for LUCENE-10030
[jira] [Created] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer
Greg Miller created LUCENE-10036:

Summary: Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer
Key: LUCENE-10036
URL: https://issues.apache.org/jira/browse/LUCENE-10036
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: main (9.0)
Reporter: Greg Miller

This is a trivial issue, but it's easy to mistakenly "double wrap" an instance of {{ScoreCachingWrappingScorer}}, which is a bit wasteful. The calling code currently needs to check the instance type of the {{Scorable}} they intend to wrap to avoid this. {{FieldComparator}} is actually the only calling code that does this check.

It would be nice to add a factory method that encapsulates this check in {{ScoreCachingWrappingScorer}} so that calling code doesn't need to worry about it.
[GitHub] [lucene] zhaih opened a new pull request #225: LUCENE-10010 Introduce NFARunAutomaton to run NFA directly
zhaih opened a new pull request #225: URL: https://github.com/apache/lucene/pull/225

# Description

https://issues.apache.org/jira/browse/LUCENE-10010

Introduces `NFARunAutomaton` to run NFA directly

Work to do:
1. Integrate with current `RunAutomaton` class hierarchy
2. Further optimize the `NFARunAutomaton` implementation

# Tests

A unit test that asserts the `NFARunAutomaton` behaves the same as the DFA one, using randomly generated regex strings

# Checklist

Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
[jira] [Commented] (LUCENE-10010) Should we have a NFA Query?
[ https://issues.apache.org/jira/browse/LUCENE-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387626#comment-17387626 ]

Haoyu Zhai commented on LUCENE-10010:

Here's a WIP PR: https://github.com/apache/lucene/pull/225

> Should we have a NFA Query?
>
> Key: LUCENE-10010
> URL: https://issues.apache.org/jira/browse/LUCENE-10010
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: main (9.0)
> Reporter: Haoyu Zhai
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Today when a {{RegexpQuery}} is created, it is translated to an NFA, determinized to a DFA, and eventually becomes an {{AutomatonQuery}}, which is very fast. However, not every NFA can be determinized to a DFA easily; the example given in LUCENE-9981 showed how easily a short regexp can break the determinize process.
> Maybe, instead of marking those kinds of queries as adversarial cases, we could make a new kind of NFA query, which executes directly on the NFA and thus has no need to worry about the determinize process or the determinized DFA size. It should be slower, but it also makes those adversarial cases doable.
> [This article|https://swtch.com/~rsc/regexp/regexp1.html] provides a simple but efficient way of searching over an NFA; essentially it is a partial determinize process that only determinizes the necessary part of the DFA. Maybe we could give it a try?
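The "partial determinize" idea mentioned in the issue can be sketched as running the NFA on sets of states and caching each (state set, character) step the first time it is needed. The sketch below is only an illustration under simplifying assumptions: `SimpleNfa` is a hypothetical class with no epsilon transitions and state 0 as the start state, and it is unrelated to the `NFARunAutomaton` in the PR.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class NfaDemo {
  public static void main(String[] args) {
    // NFA for (a|b)*ab: state 0 loops on a and b, 0 -a-> 1, 1 -b-> 2 (accepting).
    SimpleNfa nfa = new SimpleNfa();
    nfa.addTransition(0, 'a', 0);
    nfa.addTransition(0, 'b', 0);
    nfa.addTransition(0, 'a', 1);
    nfa.addTransition(1, 'b', 2);
    nfa.addAccept(2);
    System.out.println("abab matches: " + nfa.run("abab"));
    System.out.println("aba matches: " + nfa.run("aba"));
  }
}

/** Hypothetical NFA without epsilon transitions; state 0 is the start state. */
final class SimpleNfa {
  private final Map<Integer, Map<Character, Set<Integer>>> transitions = new HashMap<>();
  private final Set<Integer> accept = new HashSet<>();
  // Lazily built DFA fragment: (set of NFA states, char) -> next set of NFA states.
  private final Map<Set<Integer>, Map<Character, Set<Integer>>> cache = new HashMap<>();

  void addTransition(int from, char c, int to) {
    transitions
        .computeIfAbsent(from, k -> new HashMap<>())
        .computeIfAbsent(c, k -> new HashSet<>())
        .add(to);
  }

  void addAccept(int state) {
    accept.add(state);
  }

  /** Computes one step on a state set, determinizing only what this input needs. */
  private Set<Integer> step(Set<Integer> current, char c) {
    return cache
        .computeIfAbsent(current, k -> new HashMap<>())
        .computeIfAbsent(
            c,
            k -> {
              Set<Integer> next = new TreeSet<>();
              for (int s : current) {
                next.addAll(transitions.getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
              }
              return next;
            });
  }

  boolean run(String input) {
    Set<Integer> current = Set.of(0);
    for (int i = 0; i < input.length() && !current.isEmpty(); i++) {
      current = step(current, input.charAt(i));
    }
    for (int s : current) {
      if (accept.contains(s)) {
        return true;
      }
    }
    return false;
  }

  /** Number of distinct state sets that have been expanded so far. */
  int cachedStateSets() {
    return cache.size();
  }
}
```

Only the state sets actually reached by some input are ever materialized, so a regexp whose full DFA would be huge costs nothing beyond the part that real inputs exercise; the trade-off is the per-step set bookkeeping on cache misses.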
[GitHub] [lucene-solr] gsmiller opened a new pull request #2534: LUCENE-10036: Add factory method to ScoreCachingWrappingScorer that ensures unnecessary wrapping doesn't occur
gsmiller opened a new pull request #2534: URL: https://github.com/apache/lucene-solr/pull/2534 Backport from `lucene/main`.
[GitHub] [lucene] gsmiller opened a new pull request #226: LUCENE-10036: Add factory method to ScoreCachingWrappingScorer that ensures unnecessary wrapping doesn't occur
gsmiller opened a new pull request #226: URL: https://github.com/apache/lucene/pull/226

# Description

Current users of `ScoreCachingWrappingScorer` must do their own instance type checks if they want to avoid unnecessarily wrapping an existing `ScoreCachingWrappingScorer` instance. This change encapsulates the check in a factory method.

NOTE: There's [another PR](https://github.com/apache/lucene-solr/pull/2534) for backporting this change into 8.x, but marking the ctor as `@deprecated` to provide API back-compat.

# Solution

A factory method was added to `ScoreCachingWrappingScorer` and the ctor was made `private`.

# Tests

Added a new test that ensures a new instance isn't created when attempting to wrap an existing `ScoreCachingWrappingScorer` instance.

# Checklist

Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
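The pattern described in this PR (private constructor plus a factory method that refuses to double wrap) can be sketched generically as below. The names `Scored`, `CachingWrapper` and `wrap` are hypothetical and chosen for this illustration; the actual method name and signature in `ScoreCachingWrappingScorer` are not shown in the message above, and the caching here is simplified to a single value rather than per-document.

```java
class FactoryWrapDemo {
  public static void main(String[] args) {
    Scored base = () -> 2.0f;
    Scored once = CachingWrapper.wrap(base);
    Scored twice = CachingWrapper.wrap(once);
    System.out.println("same instance after second wrap: " + (once == twice));
  }
}

/** Hypothetical minimal scoring interface for this sketch. */
interface Scored {
  float score();
}

/** Hypothetical caching wrapper that can only be created through wrap(). */
final class CachingWrapper implements Scored {
  private final Scored in;
  private boolean computed;
  private float value;

  // Private so callers cannot bypass the instance check in wrap().
  private CachingWrapper(Scored in) {
    this.in = in;
  }

  /** Returns the argument itself if it is already a CachingWrapper. */
  static Scored wrap(Scored in) {
    return (in instanceof CachingWrapper) ? in : new CachingWrapper(in);
  }

  @Override
  public float score() {
    if (!computed) {
      value = in.score();
      computed = true;
    }
    return value;
  }
}
```

Keeping the constructor private moves the instance check into one place, so call sites no longer need their own `instanceof` test before wrapping.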
[jira] [Updated] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer
[ https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller updated LUCENE-10036: - Affects Version/s: 8.10 > Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another > ScoreCachingWrappingScorer > --- > > Key: LUCENE-10036 > URL: https://issues.apache.org/jira/browse/LUCENE-10036 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: main (9.0), 8.10 >Reporter: Greg Miller >Priority: Trivial > Time Spent: 20m > Remaining Estimate: 0h > > This is a trivial issue, but it's easy to mistakenly "double wrap" an > instance of {{ScoreCachingWrappingScorer}}, which is a bit wasteful. The > calling code currently needs to check the instance type of the {{Scorable}} > they intend to wrap to avoid this. {{FieldComparator}} is actually the only > calling code that does this check. > It would be nice to add a factory method that encapsulates this check in > {{ScoreCachingWrappingScorer}} so that calling code doesn't need to worry > about it. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer
[ https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387629#comment-17387629 ] Greg Miller commented on LUCENE-10036: -- PRs open to make this change main/9.0 as well as backport into 8.10.
[jira] [Updated] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer
[ https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller updated LUCENE-10036: - Lucene Fields: New,Patch Available (was: New)
[jira] [Assigned] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer
[ https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller reassigned LUCENE-10036: Assignee: Greg Miller