[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

2021-07-26 Thread GitBox


gtroitskiy commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676822527



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, 
LeafCollector collector, DocsA
 
   collectDocID = docID;

Review comment:
   sorry, I've been away for a while
   thank you for the sketch, your approach is definitely more elegant 
:slightly_smiling_face: 
   except I'm not sure we need caching, since by design only one drill-down 
collector is being called for a specific docId
   
   PS force-pushed changes




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

2021-07-26 Thread GitBox


gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676879074



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, 
LeafCollector collector, DocsA
 
   collectDocID = docID;

Review comment:
   Thanks, this looks great!
   
   I think the caching could be useful down in `collectHit()` in the case that 
a `sidewaysLeafCollector` decides to call back into `score()` (e.g., if 
`FacetsCollector` has `keepScores` set to `true` and calls back to get the 
score 
[here](https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java#L125)).
 Without using something like `ScoreCachingWrappingScorer`, the underlying 
score would need to be recomputed for the same docid if I'm not mistaken. Does 
that sound right or am I overlooking something?
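To make the concern concrete, here is a minimal, self-contained sketch of per-document score caching. It deliberately does not use Lucene's classes: `SketchScorable`, `CountingScorable`, and `CachingScorable` are hypothetical stand-ins, and the real `ScoreCachingWrappingScorer` may differ in detail. The point is only that a second `score()` call for the same doc ID should not trigger a second computation.

```java
/** Hypothetical stand-in for a scorer: reports the current doc and computes its score. */
interface SketchScorable {
  int docID();

  float score();
}

/** Counts how often the (potentially expensive) score computation actually runs. */
final class CountingScorable implements SketchScorable {
  int doc = -1;
  int scoreCalls = 0;

  @Override
  public int docID() {
    return doc;
  }

  @Override
  public float score() {
    scoreCalls++;
    return doc * 0.5f; // placeholder for a costly scoring function
  }
}

/** Remembers the score of the current doc so repeated score() calls compute it once. */
final class CachingScorable implements SketchScorable {
  private final SketchScorable in;
  private int cachedDoc = -1;
  private float cachedScore;

  CachingScorable(SketchScorable in) {
    this.in = in;
  }

  @Override
  public int docID() {
    return in.docID();
  }

  @Override
  public float score() {
    int doc = in.docID();
    if (doc != cachedDoc) {
      // First request for this doc: compute and remember.
      cachedScore = in.score();
      cachedDoc = doc;
    }
    return cachedScore;
  }
}

final class ScoreCachingSketch {
  public static void main(String[] args) {
    CountingScorable base = new CountingScorable();
    SketchScorable cached = new CachingScorable(base);
    base.doc = 7;
    cached.score(); // the main collector asks for the score
    cached.score(); // a sideways collector calls back for the same doc
    System.out.println("underlying score() calls: " + base.scoreCalls);
  }
}
```

Without the wrapper, the two calls in `main` would each hit the underlying scorer.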
   
   Thanks again for taking this up! Excited to get this change merged once we 
figure out whether or not we need the caching layer in place.







[GitHub] [lucene] gautamworah96 commented on pull request #220: LUCENE-9450: Use BinaryDocValue fields with a different name in the taxonomy index

2021-07-26 Thread GitBox


gautamworah96 commented on pull request #220:
URL: https://github.com/apache/lucene/pull/220#issuecomment-886968851


   Changes in the new b9cbc4c commit:
   
   1. The 
`SegmentInfos.readLatestCommit(dir).getMinSegmentLuceneVersion()` call was 
returning 9 as the version because the older zip file in mainline was using 
the Lucene 8.6 codec while the major version variable was still set to 9. This 
happened because the `main` branch in the repo (during the 8.6 release) had 
already set the major version to 9. I reconstructed the 8.10 taxonomy index 
from the `branch_8x` branch, and that correctly sets the major version to 8 for 
those older segments.
   2. Use a version-based check for storing BDV fields or StringFields.
   
   I think the new commit might be slower than the previous 
`$full_path_binary$` option during indexing because it checks the Lucene 
version of the last commit every time we add a new category.

   Finally, I think there should be a cleaner way of knowing whether the index 
has at least one commit. I use the 
`indexWriter.getLiveCommitData().iterator().hasNext()` call, but maybe there is 
a better way.
   
   Side questions that need more thought:
   1. What is the use of the `LiveIndexWriterConfig.createdVersionMajor` param? 
Instead of initializing it to the latest version, maybe we can assign it the 
min back-compat version of the index (when the 
`LiveIndexWriterConfig` class is initialized).
   2. Can we fix the `DirectoryTaxonomyWriter.indexEpoch` variable to hold the 
accurate index epoch of the taxonomy index? 
   The current logic for `indexEpoch` assigns 1 even if the index is completely 
fresh. It also saves 1 as the value when the index has just 1 commit.





[jira] [Commented] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields

2021-07-26 Thread Gautam Worah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387588#comment-17387588
 ] 

Gautam Worah commented on LUCENE-9450:
--

Posted a new PR revision that implements the {{use index created version}} 
approach

> Taxonomy index should use DocValues not StoredFields
> 
>
> Key: LUCENE-9450
> URL: https://issues.apache.org/jira/browse/LUCENE-9450
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.5.2
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
> Fix For: main (9.0)
>
> Attachments: LUCENE-9450-localrun.py-v1, wip_taxonomy_patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The taxonomy index that maps binning labels to ordinals was created before 
> Lucene added BinaryDocValues.
> I've attached a WIP patch (does not pass tests currently)
> Issue suggested by [~mikemccand]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

2021-07-26 Thread GitBox


gtroitskiy commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676888240



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, 
LeafCollector collector, DocsA
 
   collectDocID = docID;

Review comment:
   oh, my bad, missed the general case :slightly_frowning_face: 
   Thanks for your patience and elegant solution!







[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

2021-07-26 Thread GitBox


gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676894541



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, 
LeafCollector collector, DocsA
 
   collectDocID = docID;

Review comment:
   No worries at all. Thanks for sticking with the PR!







[GitHub] [lucene] mikemccand commented on a change in pull request #220: LUCENE-9450: Use BinaryDocValue fields with a different name in the taxonomy index

2021-07-26 Thread GitBox


mikemccand commented on a change in pull request #220:
URL: https://github.com/apache/lucene/pull/220#discussion_r676902852



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java
##
@@ -475,8 +476,20 @@ private int addCategoryDocument(FacetLabel categoryPath, 
int parent) throws IOEx
 
 String fieldPath = FacetsConfig.pathToString(categoryPath.components, 
categoryPath.length);
 fullPathField.setStringValue(fieldPath);
+
+boolean commitExists = 
indexWriter.getLiveCommitData().iterator().hasNext();
+/* no commits so this is a fresh index, or the old index was built using a 
Lucene 9 or greater version */
+if ((commitExists == false)
+|| (SegmentInfos.readLatestCommit(dir)

Review comment:
   This is a horrifyingly costly check to do for every added `FacetLabel`!  
Couldn't we do this check once in ctor when this `TaxonomyWriter` is created, 
and store the result in a `final boolean`?
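A minimal sketch of that suggestion, with hypothetical names throughout (`FormatDecidingWriter` is not the real `DirectoryTaxonomyWriter`, and the `BooleanSupplier` merely stands in for reading the latest commit and comparing versions): the costly check runs once in the constructor, and each added category only reads a `final boolean`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical writer that decides its storage format once, at construction time. */
final class FormatDecidingWriter {
  // Result of the costly check, computed once in the constructor.
  private final boolean useBinaryDocValues;
  int binaryDocs = 0;
  int legacyDocs = 0;

  FormatDecidingWriter(BooleanSupplier expensiveVersionCheck) {
    this.useBinaryDocValues = expensiveVersionCheck.getAsBoolean();
  }

  /** Per-category work only consults the cached flag. */
  void addCategory(String path) {
    if (useBinaryDocValues) {
      binaryDocs++;
    } else {
      legacyDocs++;
    }
  }
}

final class CtorCheckSketch {
  public static void main(String[] args) {
    AtomicInteger checks = new AtomicInteger();
    FormatDecidingWriter writer =
        new FormatDecidingWriter(
            () -> {
              checks.incrementAndGet(); // count how often the costly check runs
              return true;
            });
    for (int i = 0; i < 1000; i++) {
      writer.addCategory("a/b/" + i);
    }
    System.out.println("checks=" + checks.get() + " binaryDocs=" + writer.binaryDocs);
  }
}
```

This assumes the answer cannot change during the writer's lifetime, which is what makes a `final` field appropriate.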

##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java
##
@@ -475,8 +476,20 @@ private int addCategoryDocument(FacetLabel categoryPath, 
int parent) throws IOEx
 
 String fieldPath = FacetsConfig.pathToString(categoryPath.components, 
categoryPath.length);
 fullPathField.setStringValue(fieldPath);
+
+boolean commitExists = 
indexWriter.getLiveCommitData().iterator().hasNext();
+/* no commits so this is a fresh index, or the old index was built using a 
Lucene 9 or greater version */
+if ((commitExists == false)
+|| (SegmentInfos.readLatestCommit(dir)
+.getMinSegmentLuceneVersion()
+.onOrAfter(Version.LUCENE_9_0_0))) {
+  /* Lucene 9 introduces BinaryDocValuesField for storing taxonomy 
categories */

Review comment:
   Maybe `switches to` instead of `introduces`?







[GitHub] [lucene] gsmiller merged pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

2021-07-26 Thread GitBox


gsmiller merged pull request #217:
URL: https://github.com/apache/lucene/pull/217


   





[GitHub] [lucene] gsmiller commented on pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

2021-07-26 Thread GitBox


gsmiller commented on pull request #217:
URL: https://github.com/apache/lucene/pull/217#issuecomment-886989427


   Thanks again @gtroitskiy !





[jira] [Commented] (LUCENE-10030) [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring

2021-07-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387605#comment-17387605
 ] 

ASF subversion and git services commented on LUCENE-10030:
--

Commit 61f8517000b3af74c0b079e4a5fa81eb870b1c35 in lucene's branch 
refs/heads/main from Grigoriy Troitskiy
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=61f8517 ]

LUCENE-10030: Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring 
(#217)



> [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring
> ---
>
> Key: LUCENE-10030
> URL: https://issues.apache.org/jira/browse/LUCENE-10030
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Grigoriy Troitskiy
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> *Diff*
> {code:java}
> @@ -195,11 +195,8 @@ class DrillSidewaysScorer extends BulkScorer {
>  
>        collectDocID = docID;
>  
> -      // TODO: we could score on demand instead since we are
> -      // daat here:
> -      collectScore = baseScorer.score();
> -
>        if (failedCollector == null) {
> +        collectScore = baseScorer.score();
>          // Hit passed all filters, so it's "real":
>          collectHit(collector, dims);
>        } else {
> {code}
>  
>  *Motivation*
>  1. Performance degradation: we have a quite heavy custom implementation of 
> score(). So when we started using DrillSideways, this call became top-1 in a 
> profiler snapshot (top-3 with default scoring). We tried doUnionScoring and 
> doDrillDownAdvanceScoring, but no luck:
>  doUnionScoring scores all baseQuery docIds
>  doDrillDownAdvanceScoring avoids scoring some redundant docIds by considering 
> the symmetric difference of the top two iterators' docIds, but still scores some 
> docIds that will be filtered out by the 3rd, 4th, ... dimension iterators
>  doQueryFirstScoring scores near-miss docIds
>  The best way is to score only true hits (where baseQuery and all N drill-down 
> iterators match). So we suggest a small modification of doQueryFirstScoring.
>   
>  2. Speaking of doQueryFirstScoring, it doesn't look like we need to 
> calculate a score for a near-miss hit, because it won't be used anywhere.
>  FacetsCollectorManager creates FacetsCollector with the default constructor
>  
> [https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java#L35]
>  so FacetsCollector has false for keepScores 
>  
> [https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java#L119]
>  and collectScore is not being used
>  
> [https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java#L200]
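The effect of moving the `score()` call inside the `failedCollector == null` branch can be sketched outside Lucene. Everything below is a hypothetical simplification: `QueryFirstScoringSketch`, its counters, and the predicate-based "dimensions" are illustrative only and do not mirror the actual `DrillSidewaysScorer` internals. A doc that fails no dimension is a true hit and is scored; a doc that fails exactly one is a near miss and is collected without scoring; anything else is dropped.

```java
import java.util.function.IntPredicate;
import java.util.function.IntToDoubleFunction;

final class QueryFirstScoringSketch {
  int scoreCalls = 0;
  int hits = 0;
  int nearMisses = 0;

  /** Walks the base query's docs and scores only those that pass every dimension. */
  void run(int[] baseDocs, IntPredicate[] dims, IntToDoubleFunction scorer) {
    for (int doc : baseDocs) {
      int failed = 0;
      for (IntPredicate dim : dims) {
        if (!dim.test(doc)) {
          failed++;
          if (failed > 1) {
            break; // two failures: neither a hit nor a near miss
          }
        }
      }
      if (failed == 0) {
        scoreCalls++;
        scorer.applyAsDouble(doc); // score is needed only for true hits
        hits++;
      } else if (failed == 1) {
        nearMisses++; // counted for sideways facets, no score computed
      }
    }
  }

  public static void main(String[] args) {
    QueryFirstScoringSketch sketch = new QueryFirstScoringSketch();
    IntPredicate[] dims = {d -> d % 2 == 0, d -> d % 3 == 0};
    int[] baseDocs = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    sketch.run(baseDocs, dims, d -> 1.0);
    System.out.println(
        "hits=" + sketch.hits
            + " nearMisses=" + sketch.nearMisses
            + " scoreCalls=" + sketch.scoreCalls);
  }
}
```

With the two toy dimensions in `main`, the number of score computations equals the number of true hits rather than hits plus near misses.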






[jira] [Commented] (LUCENE-10030) [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring

2021-07-26 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387607#comment-17387607
 ] 

Greg Miller commented on LUCENE-10030:
--

Thanks again for taking this up [~gtroitskiy]! I just merged the change onto 
{{main}} and it will go with the 9.0 release whenever that's ready. In the 
meantime, do you want to backport the change into 8.x? There's no reason not to 
since it's fully backwards compatible. Let me know if you want to give that a 
shot, and/or if you have any questions on how to go about doing that. Thanks!




[jira] [Commented] (LUCENE-10030) [DrillSidewaysScorer] redundant score() calculations in doQueryFirstScoring

2021-07-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387611#comment-17387611
 ] 

ASF subversion and git services commented on LUCENE-10030:
--

Commit 736d114901e009fa09a6cc8bccbe301a2db03058 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=736d114 ]

Add CHANGES entry for LUCENE-10030





[jira] [Created] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer

2021-07-26 Thread Greg Miller (Jira)
Greg Miller created LUCENE-10036:


 Summary: Ensure ScoreCachingWrappingScorer doesn't unnecessarily 
wrap another ScoreCachingWrappingScorer
 Key: LUCENE-10036
 URL: https://issues.apache.org/jira/browse/LUCENE-10036
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: main (9.0)
Reporter: Greg Miller


This is a trivial issue, but it's easy to mistakenly "double wrap" an instance 
of {{ScoreCachingWrappingScorer}}, which is a bit wasteful. The calling code 
currently needs to check the instance type of the {{Scorable}} they intend to 
wrap to avoid this. {{FieldComparator}} is actually the only calling code that 
does this check.

It would be nice to add a factory method that encapsulates this check in 
{{ScoreCachingWrappingScorer}} so that calling code doesn't need to worry about 
it.

 






[GitHub] [lucene] zhaih opened a new pull request #225: LUCENE-10010 Introduce NFARunAutomaton to run NFA directly

2021-07-26 Thread GitBox


zhaih opened a new pull request #225:
URL: https://github.com/apache/lucene/pull/225


   
   
   
   # Description
   
   https://issues.apache.org/jira/browse/LUCENE-10010
   
   Introduces `NFARunAutomaton` to run an NFA directly
   
   Work still to do:
   1. Integrate with the current `RunAutomaton` class hierarchy
   2. Further optimize the `NFARunAutomaton` implementation
   
   # Tests
   
   A unit test that asserts that `NFARunAutomaton` behaves the same as the DFA 
one, using randomly generated regex strings
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code 
conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-10010) Should we have a NFA Query?

2021-07-26 Thread Haoyu Zhai (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387626#comment-17387626
 ] 

Haoyu Zhai commented on LUCENE-10010:
-

Here's a WIP PR: https://github.com/apache/lucene/pull/225

> Should we have a NFA Query?
> ---
>
> Key: LUCENE-10010
> URL: https://issues.apache.org/jira/browse/LUCENE-10010
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: main (9.0)
>Reporter: Haoyu Zhai
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Today when a {{RegexpQuery}} is created, it is translated to an NFA, 
> determinized to a DFA, and eventually becomes an {{AutomatonQuery}}, which is 
> very fast. However, not every NFA can be determinized to a DFA easily; the 
> example given in LUCENE-9981 showed how easily a short regexp can break the 
> determinization process.
> Maybe, instead of marking those kinds of queries as adversarial cases, we 
> could make a new kind of NFA query, which executes directly on the NFA and thus 
> does not need to worry about the determinization process or the determinized 
> DFA size. It should be slower, but it also makes those adversarial cases doable.
> [This article|https://swtch.com/~rsc/regexp/regexp1.html] provides a 
> simple but efficient way of searching over an NFA; essentially it is a partial 
> determinization process that only determinizes the necessary part of the DFA. 
> Maybe we could give it a try?
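As a rough illustration of running an NFA directly, the sketch below tracks the set of live NFA states while consuming the input, so no DFA is ever built. It is a toy under stated assumptions: `TinyNfa` and `NfaRunSketch` are hypothetical and unrelated to Lucene's `Automaton` or the proposed `NFARunAutomaton`, state 0 is taken as the initial state by convention, and the state sets are not cached (the linked article additionally caches them as lazily built DFA states).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Tiny NFA with labeled and epsilon transitions (hypothetical, for illustration). */
final class TinyNfa {
  private static final int EPSILON = -1;
  private final List<int[]> transitions = new ArrayList<>(); // each entry: {from, label, to}
  private final BitSet accept = new BitSet();
  private int numStates = 0;

  int addState(boolean accepting) {
    if (accepting) {
      accept.set(numStates);
    }
    return numStates++;
  }

  void addTransition(int from, char label, int to) {
    transitions.add(new int[] {from, label, to});
  }

  void addEpsilon(int from, int to) {
    transitions.add(new int[] {from, EPSILON, to});
  }

  /** Expands a state set with everything reachable through epsilon transitions. */
  private BitSet closure(BitSet states) {
    BitSet result = (BitSet) states.clone();
    Deque<Integer> work = new ArrayDeque<>();
    for (int s = states.nextSetBit(0); s >= 0; s = states.nextSetBit(s + 1)) {
      work.push(s);
    }
    while (!work.isEmpty()) {
      int s = work.pop();
      for (int[] t : transitions) {
        if (t[0] == s && t[1] == EPSILON && !result.get(t[2])) {
          result.set(t[2]);
          work.push(t[2]);
        }
      }
    }
    return result;
  }

  /** Runs the NFA directly: the current "state" is the set of live NFA states. */
  boolean run(String input) {
    BitSet current = new BitSet();
    current.set(0); // state 0 is the initial state by convention
    current = closure(current);
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      BitSet next = new BitSet();
      for (int[] t : transitions) {
        if (t[1] == c && current.get(t[0])) {
          next.set(t[2]);
        }
      }
      current = closure(next);
      if (current.isEmpty()) {
        return false; // no live states left, the input cannot match
      }
    }
    return current.intersects(accept);
  }
}

final class NfaRunSketch {
  /** Accepts strings over {a, b} whose second-to-last character is 'a'. */
  static TinyNfa secondToLastIsA() {
    TinyNfa nfa = new TinyNfa();
    int start = nfa.addState(false);
    int loop = nfa.addState(false);
    int sawA = nfa.addState(false);
    int done = nfa.addState(true);
    nfa.addEpsilon(start, loop);
    nfa.addTransition(loop, 'a', loop);
    nfa.addTransition(loop, 'b', loop);
    nfa.addTransition(loop, 'a', sawA);
    nfa.addTransition(sawA, 'a', done);
    nfa.addTransition(sawA, 'b', done);
    return nfa;
  }

  public static void main(String[] args) {
    TinyNfa nfa = secondToLastIsA();
    System.out.println("bab -> " + nfa.run("bab"));
    System.out.println("ba  -> " + nfa.run("ba"));
  }
}
```

The "k-th character from the end is `a`" family is a standard example where the equivalent DFA grows exponentially in k, while the state-set simulation only ever holds at most as many states as the NFA has.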






[GitHub] [lucene-solr] gsmiller opened a new pull request #2534: LUCENE-10036: Add factory method to ScoreCachingWrappingScorer that ensures unnecessary wrapping doesn't occur

2021-07-26 Thread GitBox


gsmiller opened a new pull request #2534:
URL: https://github.com/apache/lucene-solr/pull/2534


   Backport from `lucene/main`.





[GitHub] [lucene] gsmiller opened a new pull request #226: LUCENE-10036: Add factory method to ScoreCachingWrappingScorer that ensures unnecessary wrapping doesn't occur

2021-07-26 Thread GitBox


gsmiller opened a new pull request #226:
URL: https://github.com/apache/lucene/pull/226


   # Description
   
   Current users of `ScoreCachingWrappingScorer` must do their own instance-type 
checks if they want to avoid unnecessarily wrapping an existing 
`ScoreCachingWrappingScorer` instance. This change encapsulates the check in a 
factory method.
   
   NOTE: There's [another PR](https://github.com/apache/lucene-solr/pull/2534) 
for backporting this change into 8.x, but marking the ctor as `@deprecated` to 
provide API back-compat.
   
   # Solution
   
   A factory method was added to `ScoreCachingWrappingScorer` and the ctor was 
made `private`.
   
   # Tests
   
   Added a new test that ensures a new instance isn't created when attempting 
to wrap an existing `ScoreCachingWrappingScorer` instance.
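The shape of such a factory method can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `Scored` and `CachingScored` are hypothetical stand-ins for `Scorable` and `ScoreCachingWrappingScorer`, and the caching here ignores doc IDs for brevity.

```java
/** Hypothetical minimal scorer interface used only for this sketch. */
interface Scored {
  float score();
}

/** Caching wrapper whose constructor is private; instances come from wrap(). */
final class CachingScored implements Scored {
  private final Scored in;
  private boolean computed = false;
  private float cached;

  private CachingScored(Scored in) {
    this.in = in;
  }

  /** Returns the argument itself if it already caches, otherwise a new wrapper. */
  static Scored wrap(Scored scorer) {
    if (scorer instanceof CachingScored) {
      return scorer; // already caching, avoid a redundant second layer
    }
    return new CachingScored(scorer);
  }

  @Override
  public float score() {
    if (!computed) {
      cached = in.score();
      computed = true;
    }
    return cached;
  }
}

final class FactoryWrapSketch {
  public static void main(String[] args) {
    Scored raw = () -> 1.5f;
    Scored once = CachingScored.wrap(raw);
    Scored twice = CachingScored.wrap(once);
    System.out.println("same instance: " + (once == twice));
  }
}
```

Because the constructor is private, callers cannot bypass the check, which is the property the new test described above verifies.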
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code 
conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   





[jira] [Updated] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer

2021-07-26 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller updated LUCENE-10036:
-
Affects Version/s: 8.10

> Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another 
> ScoreCachingWrappingScorer
> ---
>
> Key: LUCENE-10036
> URL: https://issues.apache.org/jira/browse/LUCENE-10036
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: main (9.0), 8.10
>Reporter: Greg Miller
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a trivial issue, but it's easy to mistakenly "double wrap" an 
> instance of {{ScoreCachingWrappingScorer}}, which is a bit wasteful. The 
> calling code currently needs to check the instance type of the {{Scorable}} 
> they intend to wrap to avoid this. {{FieldComparator}} is actually the only 
> calling code that does this check.
> It would be nice to add a factory method that encapsulates this check in 
> {{ScoreCachingWrappingScorer}} so that calling code doesn't need to worry 
> about it.
>  






[jira] [Commented] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer

2021-07-26 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387629#comment-17387629
 ] 

Greg Miller commented on LUCENE-10036:
--

PRs are open to make this change on main/9.0 and to backport it into 8.10.




[jira] [Updated] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer

2021-07-26 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller updated LUCENE-10036:
-
Lucene Fields: New,Patch Available  (was: New)




[jira] [Assigned] (LUCENE-10036) Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another ScoreCachingWrappingScorer

2021-07-26 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller reassigned LUCENE-10036:


Assignee: Greg Miller

> Ensure ScoreCachingWrappingScorer doesn't unnecessarily wrap another 
> ScoreCachingWrappingScorer
> ---
>
> Key: LUCENE-10036
> URL: https://issues.apache.org/jira/browse/LUCENE-10036
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: main (9.0), 8.10
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a trivial issue, but it's easy to mistakenly "double wrap" an 
> instance of {{ScoreCachingWrappingScorer}}, which is a bit wasteful. The 
> calling code currently needs to check the instance type of the {{Scorable}} 
> they intend to wrap to avoid this. {{FieldComparator}} is actually the only 
> calling code that does this check.
> It would be nice to add a factory method that encapsulates this check in 
> {{ScoreCachingWrappingScorer}} so that calling code doesn't need to worry 
> about it.
>  


