[GitHub] [lucene] mocobeta commented on a change in pull request #757: LUCENE-10478: mark Test4GBStoredFields as @Monster

2022-03-22 Thread GitBox


mocobeta commented on a change in pull request #757:
URL: https://github.com/apache/lucene/pull/757#discussion_r831910208



##
File path: lucene/core/src/test/org/apache/lucene/index/Test4GBStoredFields.java
##
@@ -26,13 +26,15 @@
 import org.apache.lucene.tests.codecs.compressing.CompressingCodec;
 import org.apache.lucene.tests.store.MockDirectoryWrapper;
 import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.LuceneTestCase.Monster;
 import org.apache.lucene.tests.util.LuceneTestCase.SuppressCodecs;
 import org.apache.lucene.tests.util.TimeUnits;
 import org.apache.lucene.util.BytesRef;
 
 /** This test creates an index with one segment that is a little larger than 4GB. */
 @SuppressCodecs({"SimpleText", "Compressing"})
 @TimeoutSuite(millis = 4 * TimeUnits.HOUR)
+@Monster("consumes a lot of disk space")
 public class Test4GBStoredFields extends LuceneTestCase {
 
   @Nightly

Review comment:
   Sure, removed at 
https://github.com/apache/lucene/pull/757/commits/8fc5baa95cda96d4d44cbfdc74db54bedb8cb790
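For readers unfamiliar with opt-in test groups, here is a minimal, self-contained sketch of the gating idea behind an annotation like `@Monster`: annotated classes are skipped unless a switch is turned on. The annotation, the test classes, and `shouldRun` are hypothetical stand-ins, not Lucene's or randomizedtesting's implementation. The `tests.monster` system property name is an assumption about how Lucene's test framework enables this group.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Hypothetical stand-in for an opt-in test group such as Lucene's @Monster.
 * NOT the randomizedtesting implementation; it only illustrates the gating idea.
 */
public class MonsterGateSketch {

  /** Stand-in annotation: marks a test class as too expensive to run by default. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Monster {
    String value(); // the reason, e.g. "consumes a lot of disk space"
  }

  @Monster("consumes a lot of disk space")
  static class BigStoredFieldsTest {}

  static class OrdinaryTest {}

  /** A class runs if it is not annotated, or if the monster group is enabled. */
  static boolean shouldRun(Class<?> testClass, boolean monsterEnabled) {
    Monster m = testClass.getAnnotation(Monster.class);
    return m == null || monsterEnabled;
  }

  public static void main(String[] args) {
    // Assumption: the opt-in switch is a boolean system property named "tests.monster".
    boolean enabled = Boolean.getBoolean("tests.monster");
    for (Class<?> c : new Class<?>[] {BigStoredFieldsTest.class, OrdinaryTest.class}) {
      System.out.println(c.getSimpleName() + " runs: " + shouldRun(c, enabled));
    }
  }
}
```

Run without flags, only `OrdinaryTest` is reported as running; with `-Dtests.monster=true` both are.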




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mocobeta merged pull request #757: LUCENE-10478: mark Test4GBStoredFields as @Monster

2022-03-22 Thread GitBox


mocobeta merged pull request #757:
URL: https://github.com/apache/lucene/pull/757


   





[jira] [Commented] (LUCENE-10478) Mark Test4GBStoredFields as @Monster (it consumes a lot of disk)

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510340#comment-17510340
 ] 

ASF subversion and git services commented on LUCENE-10478:
--

Commit fa61953afdd5b988adc25e11c559b3cb23820203 in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fa61953 ]

LUCENE-10478: mark Test4GBStoredFields as @Monster (#757)



> Mark Test4GBStoredFields as @Monster (it consumes a lot of disk)
> 
>
> Key: LUCENE-10478
> URL: https://issues.apache.org/jira/browse/LUCENE-10478
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Tomoko Uchida
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> `Test4GBStoredFields` creates very large index files (7GiB+) and can cause a 
> "disk full" error when running the smoke tester if sufficient free space is 
> not available in tmpfs.
> See [https://github.com/apache/lucene/pull/755] for details.
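As a rough illustration of the failure mode described in the issue, a harness could check the usable space of its temp directory before starting a disk-heavy test. This is a hypothetical sketch, not part of Lucene or the smoke tester. It uses only `java.nio.file.Files.getFileStore` and `FileStore.getUsableSpace`; the 7 GiB threshold is taken loosely from the issue text, and whether `java.io.tmpdir` is backed by tmpfs depends on the system.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check: does the temp directory have room for a disk-heavy test? */
public class TmpSpaceCheck {

  /** Returns true if the file store backing {@code dir} has at least {@code requiredBytes} usable. */
  static boolean hasUsableSpace(Path dir, long requiredBytes) throws IOException {
    FileStore store = Files.getFileStore(dir);
    return store.getUsableSpace() >= requiredBytes;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
    long sevenGiB = 7L * 1024 * 1024 * 1024; // rough figure from the issue ("7GiB+")
    System.out.println(
        "tmp=" + tmp
            + ", fs type=" + Files.getFileStore(tmp).type()
            + ", enough space: " + hasUsableSpace(tmp, sevenGiB));
  }
}
```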



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




[jira] [Commented] (LUCENE-10478) Mark Test4GBStoredFields as @Monster (it consumes a lot of disk)

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510345#comment-17510345
 ] 

ASF subversion and git services commented on LUCENE-10478:
--

Commit c608d9660a7f7153bb0eccbb5d6cd8139969efb3 in lucene's branch 
refs/heads/branch_9x from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c608d96 ]

LUCENE-10478: mark Test4GBStoredFields as @Monster (#757)



> Mark Test4GBStoredFields as @Monster (it consumes a lot of disk)
> 
>
> Key: LUCENE-10478
> URL: https://issues.apache.org/jira/browse/LUCENE-10478
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Tomoko Uchida
>Priority: Trivial
>  Time Spent: 50m
>  Remaining Estimate: 0h
>



[jira] [Resolved] (LUCENE-10478) Mark Test4GBStoredFields as @Monster (it consumes a lot of disk)

2022-03-22 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida resolved LUCENE-10478.

Fix Version/s: 10.0 (main)
   9.2
   Resolution: Fixed

> Mark Test4GBStoredFields as @Monster (it consumes a lot of disk)
> 
>
> Key: LUCENE-10478
> URL: https://issues.apache.org/jira/browse/LUCENE-10478
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Tomoko Uchida
>Priority: Trivial
> Fix For: 10.0 (main), 9.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>



[jira] [Commented] (LUCENE-10422) Monitor instantiation configurability improvements

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510359#comment-17510359
 ] 

ASF subversion and git services commented on LUCENE-10422:
--

Commit 42bf77229ec2882ac9a8a004b98a103417d4ce2f in lucene's branch 
refs/heads/main from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=42bf772 ]

LUCENE-10422: Make errorprone happy


> Monitor instantiation configurability improvements
> -
>
> Key: LUCENE-10422
> URL: https://issues.apache.org/jira/browse/LUCENE-10422
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Niko Usai
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> I'm working on a project that uses the Lucene Monitor package very heavily, 
> but I'm missing some simple controls over how {{Monitor}} manages its 
> Directory, IndexWriter and IndexReader. What I want to do is extend 
> {{MonitorConfiguration}} to make mainly these two things possible:
>  * use a custom {{Directory}} implementation;
>  * use a read-only {{QueryIndex}}, so that multiple Monitor instances on 
> different servers can read from the same index (currently the index reader is 
> created from the index writer, so it is impossible to make a read-only {{Monitor}}).






[jira] [Commented] (LUCENE-10422) Monitor instantiation configurability improvements

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510362#comment-17510362
 ] 

ASF subversion and git services commented on LUCENE-10422:
--

Commit 28afaadfb81e7bc494f1a570cd87a178953382e0 in lucene's branch 
refs/heads/branch_9x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=28afaad ]

LUCENE-10422: Make errorprone happy





[GitHub] [lucene] mocobeta commented on pull request #755: Add note for smoke tester --tmp-dir option in rc announcing

2022-03-22 Thread GitBox


mocobeta commented on pull request #755:
URL: https://github.com/apache/lucene/pull/755#issuecomment-1074927120


   The test in question was excluded from nightly tests (LUCENE-10478). We may 
not need to change the smoke tester or release wizard for disk usage - if a 
test uses lots of disk space, it shouldn't be run in the smoke tests.





[GitHub] [lucene] mocobeta closed pull request #755: Add note for smoke tester --tmp-dir option in rc announcing

2022-03-22 Thread GitBox


mocobeta closed pull request #755:
URL: https://github.com/apache/lucene/pull/755


   





[GitHub] [lucene] jpountz commented on a change in pull request #737: LUCENE-10464, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms to rewrite sufficiently

2022-03-22 Thread GitBox


jpountz commented on a change in pull request #737:
URL: https://github.com/apache/lucene/pull/737#discussion_r831971359



##
File path: 
lucene/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
##
@@ -309,11 +309,12 @@ protected void extractWeightedSpanTerms(
 final IndexSearcher searcher = new IndexSearcher(getLeafContext());
 searcher.setQueryCache(null);
 if (mustRewriteQuery) {
+  final SpanQuery rewrittenQuery =
+  (SpanQuery) IndexSearcher.rewrite(spanQuery, getLeafContext().reader());

Review comment:
   +1







[GitHub] [lucene] mocobeta commented on pull request #740: LUCENE-10393: Unify binary dictionary and dictionary writer in kuromoji and nori

2022-03-22 Thread GitBox


mocobeta commented on pull request #740:
URL: https://github.com/apache/lucene/pull/740#issuecomment-1075037005


   I added test modules `analysis/kuromoji.tests` and `analysis/nori.tests` to 
make sure that both tokenizers correctly load the dictionary resources and work 
in module mode. They are tiny tests, but it's good to have them as sanity 
checks.





[jira] [Comment Edited] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.

2022-03-22 Thread kkewwei (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510477#comment-17510477
 ] 

kkewwei edited comment on LUCENE-10448 at 3/22/22, 1:00 PM:


The optimization seems to have nothing to do with memory pressure; in theory, it is 
guaranteed that no big chunks will be written to disk.


was (Author: kkewwei):
The optimization seems to have nothing to do with memory pressure; in theory, it is 
guaranteed that there will be no big chunks.

> MergeRateLimiter doesn't always limit instant rate.
> ---
>
> Key: LUCENE-10448
> URL: https://issues.apache.org/jira/browse/LUCENE-10448
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: 8.11.1
>Reporter: kkewwei
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Consider this code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedException {
>   double rate = mbPerSec;
>   double secondsToPause = (bytes / 1024. / 1024.) / rate;
>   long targetNS = lastNS + (long) (1000000000 * secondsToPause);
>   long curPauseNS = targetNS - curNS;
>   // We don't bother with thread pausing if the pause is smaller than 2 msec.
>   if (curPauseNS <= MIN_PAUSE_NS) {
>     // Set to curNS, not targetNS, to enforce the instant rate, not
>     // the "averaged over all history" rate:
>     lastNS = curNS;
>     return -1;
>   }
>   ...
> }
> {code}
> While a segment is being merged, suppose *maybePause* is called at 7:00 (so 
> lastNS=7:00) and then called again at 7:05. The value of 
> *targetNS=lastNS + (long) (1000000000 * secondsToPause)* will then be smaller 
> than *curNS* unless bytes is enormous, so we return -1 and skip the pause. 
> I counted the total number of *maybePause* calls (callTimes), the number of 
> skipped pauses (ignorePauseTimes), and the size of each skipped write 
> (detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] 
> [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 
> docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec 
> throttle], [callTimes=857], [ignorePauseTimes=25],  [detailBytes(mb) = 
> [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, 
> 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, 
> 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, 
> 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> *maybePause* was called 857 times, and 25 of those calls skipped the pause; 
> the skipped writes (e.g. 0.28125 MB) are not small.
> Whenever the interval between two *maybePause* calls is relatively long, a 
> pause that should happen is skipped.
>  
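To make the reported behavior concrete, the sketch below re-creates only the pause arithmetic quoted in the issue. It assumes `lastNS`/`curNS` are nanosecond timestamps (hence the 1,000,000,000 factor) and that the caller sleeps and calls again whenever a non-negative pause is returned. `PauseSketch` is a hypothetical class: unlike the real *MergeRateLimiter*, it never sleeps and never checks for aborts.

```java
/**
 * Hypothetical re-creation of the pause arithmetic quoted in the issue.
 * Not Lucene's MergeRateLimiter: it never sleeps and never checks for aborts.
 */
public class PauseSketch {
  /** Pauses at or below this are skipped (2 ms, per the comment in the quoted code). */
  static final long MIN_PAUSE_NS = 2_000_000L;

  private final double mbPerSec;
  private long lastNS;

  PauseSketch(double mbPerSec, long startNS) {
    this.mbPerSec = mbPerSec;
    this.lastNS = startNS;
  }

  /**
   * Returns how long (ns) the caller should pause before writing {@code bytes}, or -1 if no
   * pause is needed. Assumes the caller sleeps and calls again until -1 is returned.
   */
  long maybePause(long bytes, long curNS) {
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    // lastNS and curNS are nanosecond timestamps, so seconds are scaled by 1e9.
    long targetNS = lastNS + (long) (1_000_000_000 * secondsToPause);
    long curPauseNS = targetNS - curNS;
    if (curPauseNS <= MIN_PAUSE_NS) {
      // Reset to "now": any idle time since lastNS is credited against this write.
      lastNS = curNS;
      return -1;
    }
    return curPauseNS;
  }

  public static void main(String[] args) {
    long bytes = 294_912L; // 0.28125 MB, one of the skipped write sizes in the log above
    PauseSketch busy = new PauseSketch(1.0, 0L); // 1 MB/s, previous call at t=0
    System.out.println("1 ms after previous call: " + busy.maybePause(bytes, 1_000_000L));
    PauseSketch idle = new PauseSketch(1.0, 0L);
    System.out.println("5 s after previous call:  " + idle.maybePause(bytes, 5_000_000_000L));
  }
}
```

With the rate set to 1 MB/s, the same 0.28125 MB write yields a pause of roughly 280 ms when it follows the previous call by 1 ms, but returns -1 after a 5 s idle gap, because the idle time is credited against the write.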






[jira] [Commented] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.

2022-03-22 Thread kkewwei (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510477#comment-17510477
 ] 

kkewwei commented on LUCENE-10448:
--

The optimization seems to have nothing to do with memory pressure; in theory, it is 
guaranteed that there will be no big chunks.




[jira] [Comment Edited] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.

2022-03-22 Thread kkewwei (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510477#comment-17510477
 ] 

kkewwei edited comment on LUCENE-10448 at 3/22/22, 1:08 PM:


The optimization seems to have nothing to do with memory pressure; in theory, it is 
guaranteed that no big chunks will be written to disk. What am I missing?


was (Author: kkewwei):
The optimization seems to have nothing to do with memory pressure; in theory, it is 
guaranteed that no big chunks will be written to disk.




[jira] [Comment Edited] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.

2022-03-22 Thread kkewwei (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510477#comment-17510477
 ] 

kkewwei edited comment on LUCENE-10448 at 3/22/22, 1:09 PM:


The optimization seems to have nothing to do with memory pressure; in theory, it is 
guaranteed that no big chunks will be written to disk. What am I missing?


was (Author: kkewwei):
The optimization seems to have nothing to do with memory pressure; in theory, it is 
guaranteed that no big chunks will be written to disk. What am I missing?




[jira] [Commented] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.

2022-03-22 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510491#comment-17510491
 ] 

Adrien Grand commented on LUCENE-10448:
---

bq.  it is guaranteed that there will be no big chunks written to disk in 
theory, what am I missing?

This is where I'm confused too. Based on the discussion and [~vigyas]'s PR, it 
feels to me like the problem stems from writing big chunks after a period of 
inactivity, which is why Vigya's PR that splits big chunks into smaller ones 
helps honor the instant rate.

So either Lucene writes big chunks at times, and then I'd rather look into 
these large writes to understand if we should be doing things differently 
instead of just fixing the problem at the rate limiter level which could delay 
the garbage collection of these large byte[] arrays.

Or Lucene does not write big chunks of data, but then I don't understand what 
bug we are discussing in this issue. Sure, some writes might not honor the 
instant rate limit at times, but since these are small chunks, we don't care?
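The chunk-splitting approach mentioned above can be sketched as follows, assuming the limiter is consulted once per chunk before the bytes are written. `Limiter` and `ChunkedRateLimitedOutput` are hypothetical names for illustration; this is neither the code from Vigya's PR nor a Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of splitting large writes into bounded chunks so a rate limiter
 * is consulted for each chunk. Not a Lucene API and not the code from the PR.
 */
public class ChunkedWriteSketch {

  /** Stand-in for a rate limiter: told the size of each chunk before it is written. */
  interface Limiter {
    void pause(long bytes);
  }

  static final class ChunkedRateLimitedOutput {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final Limiter limiter;
    private final int maxChunk;

    ChunkedRateLimitedOutput(Limiter limiter, int maxChunk) {
      if (maxChunk <= 0) {
        throw new IllegalArgumentException("maxChunk must be positive");
      }
      this.limiter = limiter;
      this.maxChunk = maxChunk;
    }

    /** Writes b[off, off + len) in pieces of at most maxChunk bytes. */
    void writeBytes(byte[] b, int off, int len) {
      while (len > 0) {
        int n = Math.min(len, maxChunk);
        limiter.pause(n); // the limiter sees every chunk, never one huge write
        out.write(b, off, n);
        off += n;
        len -= n;
      }
    }

    int size() {
      return out.size();
    }
  }

  public static void main(String[] args) {
    List<Long> charged = new ArrayList<>();
    ChunkedRateLimitedOutput o = new ChunkedRateLimitedOutput(charged::add, 64 * 1024);
    o.writeBytes(new byte[300 * 1024], 0, 300 * 1024);
    System.out.println("chunks charged: " + charged + ", bytes written: " + o.size());
  }
}
```

Each chunk is charged separately, so a single long idle gap can absorb at most one chunk rather than an entire large write. As Adrien points out, though, the caller's large `byte[]` stays reachable until its last chunk has been written, which is the trade-off against fixing large writes at their source.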




[jira] [Created] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-03-22 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-10480:
-

 Summary: Specialize 2-clauses disjunctions
 Key: LUCENE-10480
 URL: https://issues.apache.org/jira/browse/LUCENE-10480
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand


WANDScorer is nice, but it also has lots of overhead to maintain its 
invariants: one linked list for the current candidates, one priority queue of 
scorers that are behind, and another one for scorers that are ahead. All this 
could be simplified in the 2-clause case, which feels worth specializing for, 
as end users very commonly enter queries that have only two terms?
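As a hedged illustration of why the 2-clause case is simpler, here is a sketch that iterates the union of two sorted postings lists with two cursors instead of a candidate list plus two priority queues. Postings are plain `int[]`/`float[]` arrays and `TwoClauseUnionSketch` is hypothetical. It does no max-score based skipping, which is the main job of WANDScorer, so it shows only the iteration structure, not a replacement.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a 2-clause disjunction: two cursors over sorted postings, no heaps.
 * Not Lucene's WANDScorer; scores are simply summed and nothing is skipped.
 */
public class TwoClauseUnionSketch {

  /** One matching document and its summed score. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }

    @Override
    public String toString() {
      return doc + ":" + score;
    }
  }

  /** Sentinel for an exhausted cursor, mirroring the idea of DocIdSetIterator.NO_MORE_DOCS. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Merges two sorted, duplicate-free postings lists; docs in both lists get the sum of scores. */
  static List<Hit> union(int[] docsA, float[] scoresA, int[] docsB, float[] scoresB) {
    List<Hit> hits = new ArrayList<>();
    int i = 0, j = 0;
    while (i < docsA.length || j < docsB.length) {
      int a = i < docsA.length ? docsA[i] : NO_MORE_DOCS;
      int b = j < docsB.length ? docsB[j] : NO_MORE_DOCS;
      if (a < b) {
        hits.add(new Hit(a, scoresA[i++])); // only clause A matches
      } else if (b < a) {
        hits.add(new Hit(b, scoresB[j++])); // only clause B matches
      } else {
        hits.add(new Hit(a, scoresA[i++] + scoresB[j++])); // both clauses match
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<Hit> hits =
        union(new int[] {1, 3, 5}, new float[] {1f, 1f, 1f}, new int[] {3, 4}, new float[] {2f, 2f});
    System.out.println(hits);
  }
}
```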






[jira] [Resolved] (LUCENE-10479) Benchmark documentation refers to non-existent tasks

2022-03-22 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved LUCENE-10479.

Resolution: Duplicate

> Benchmark documentation refers to non-existent tasks
> -
>
> Key: LUCENE-10479
> URL: https://issues.apache.org/jira/browse/LUCENE-10479
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mike Drob
>Priority: Minor
>
> The Lucene benchmark package-info file has these instructions on how to run:
> {noformat}
>  * To run the short version of the StandardBenchmarker, call "ant run-micro-standard". This
>  * should take a minute or so to complete and give you a preliminary idea of how your change affects
>  * the code.
> {noformat}
> Ant doesn't exist for us anymore, so we should replace these instructions 
> with the Gradle equivalents. The intuitive replacements 
> {{./gradlew run-micro-standard}} or {{runMicroStandard}} didn't work for me, 
> so I'm not sure what the new way to run these benchmarks is. Maybe it still 
> needs to be implemented, or maybe this comment should be deleted.






[GitHub] [lucene] cpoerschke merged pull request #737: LUCENE-10464, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms to rewrite sufficiently

2022-03-22 Thread GitBox


cpoerschke merged pull request #737:
URL: https://github.com/apache/lucene/pull/737


   





[jira] [Commented] (LUCENE-10464) unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510540#comment-17510540
 ] 

ASF subversion and git services commented on LUCENE-10464:
--

Commit ca252d6621277cad8ca34361f8920c07482b0a16 in lucene's branch 
refs/heads/main from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ca252d6 ]

LUCENE-10464, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms 
to rewrite sufficiently (#737)



> unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms 
> ---
>
> Key: LUCENE-10464
> URL: https://issues.apache.org/jira/browse/LUCENE-10464
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The 
> https://github.com/apache/lucene/commit/81c7ba4601a9aaf16e2255fe493ee582abe72a90
>  change in LUCENE-4728 included
> {code}
> - final SpanQuery rewrittenQuery = (SpanQuery) 
> spanQuery.rewrite(getLeafContextForField(field).reader());
> + final SpanQuery rewrittenQuery = (SpanQuery) 
> spanQuery.rewrite(getLeafContext().reader());
> {code}
> i.e. previously more needed to happen inside the loop, but now the query rewrite 
> and term collection no longer need to happen in the loop.






[jira] [Commented] (LUCENE-10477) SpanBoostQuery.rewrite was incomplete for boost==1 factor

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510541#comment-17510541
 ] 

ASF subversion and git services commented on LUCENE-10477:
--

Commit ca252d6621277cad8ca34361f8920c07482b0a16 in lucene's branch 
refs/heads/main from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ca252d6 ]

LUCENE-10464, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms 
to rewrite sufficiently (#737)



> SpanBoostQuery.rewrite was incomplete for boost==1 factor
> -
>
> Key: LUCENE-10477
> URL: https://issues.apache.org/jira/browse/LUCENE-10477
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.11.1
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> _(This bug report concerns pre-9.0 code only, but it's subtle enough that I think 
> it warrants sharing, and maybe fixing if there is ever an 8.11.2 release.)_
> Some existing code, e.g. 
> [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/queryparser/src/java/org/apache/lucene/queryparser/xml/builders/SpanNearBuilder.java#L54],
>  adds a {{SpanBoostQuery}} even if there is no boost or the boost factor is 
> {{1.0}}, i.e. the wrapping is technically unnecessary.
> Query rewriting should counteract this somewhat, except that it might not: e.g. note 
> at 
> [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/search/spans/SpanBoostQuery.java#L81-L83]
>  how the rewrite is a no-op, i.e. {{this.query.rewrite}} is not called!
> This can then manifest in strange ways e.g. during highlighting:
> {code:java}
> ...
> java.lang.IllegalArgumentException: Rewrite first!
>   at 
> org.apache.lucene.search.spans.SpanMultiTermQueryWrapper.createWeight(SpanMultiTermQueryWrapper.java:99)
>   at 
> org.apache.lucene.search.spans.SpanNearQuery.createWeight(SpanNearQuery.java:183)
>   at 
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:295)
>   ...
> {code}
> This stacktrace is not from 8.11.1 code, but the general logic is that at line 
> 293 rewrite was called (except it didn't do a full rewrite because of the 
> {{SpanBoostQuery}} wrapping around the {{SpanNearQuery}}), and so at 
> line 295 the {{IllegalArgumentException("Rewrite first!")}} arises: 
> [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/search/spans/SpanMultiTermQueryWrapper.java#L101]
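The failure mode described above can be reproduced with a toy query tree. In this hypothetical sketch (not Lucene's Query API; all class names are invented), a boost wrapper with a factor of 1 rewrites to its inner query without rewriting that inner query, mirroring the 8.11.1 behavior the report links to. A single rewrite() call therefore still leaves a query that needs rewriting, while calling rewrite() until the result stops changing, in the spirit of how IndexSearcher rewrites queries, reaches the fully rewritten form.

```java
/** Hypothetical toy query tree; a stand-in for Lucene's Query, not its API. */
abstract class ToyQuery {
  /** One rewrite step; returns this when already fully rewritten. */
  abstract ToyQuery rewrite();
}

/** Stand-in for a multi-term query that must be rewritten before use. */
final class ToyPrefixQuery extends ToyQuery {
  final String prefix;
  ToyPrefixQuery(String prefix) { this.prefix = prefix; }
  @Override ToyQuery rewrite() { return new ToyTermQuery(prefix + "-expanded"); }
}

/** Stand-in for a primitive query that is already in rewritten form. */
final class ToyTermQuery extends ToyQuery {
  final String term;
  ToyTermQuery(String term) { this.term = term; }
  @Override ToyQuery rewrite() { return this; }
}

/** Boost wrapper mirroring the reported shape: a boost of 1 just unwraps. */
final class ToyBoostQuery extends ToyQuery {
  final ToyQuery inner;
  final float boost;
  ToyBoostQuery(ToyQuery inner, float boost) { this.inner = inner; this.boost = boost; }
  @Override ToyQuery rewrite() {
    if (boost == 1f) {
      return inner; // unwrapped, but inner.rewrite() is NOT called in this step
    }
    ToyQuery rewrittenInner = inner.rewrite();
    return rewrittenInner == inner ? this : new ToyBoostQuery(rewrittenInner, boost);
  }
}

final class Rewriter {
  /** Calls rewrite() repeatedly until the query no longer changes. */
  static ToyQuery rewriteFully(ToyQuery query) {
    for (ToyQuery r = query.rewrite(); r != query; r = query.rewrite()) {
      query = r;
    }
    return query;
  }

  public static void main(String[] args) {
    ToyQuery q = new ToyBoostQuery(new ToyPrefixQuery("luc"), 1f);
    System.out.println(q.rewrite().getClass().getSimpleName()); // prints ToyPrefixQuery
    System.out.println(rewriteFully(q).getClass().getSimpleName()); // prints ToyTermQuery
  }
}
```

Calling rewrite() once is exactly the situation in the stacktrace: the result still contains a query that refuses to create a weight until it has been rewritten.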






[jira] [Commented] (LUCENE-10477) SpanBoostQuery.rewrite was incomplete for boost==1 factor

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510547#comment-17510547
 ] 

ASF subversion and git services commented on LUCENE-10477:
--

Commit e7367f3047b7db2d6d54293b07ab121868a8de71 in lucene's branch 
refs/heads/branch_9x from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e7367f3 ]

LUCENE-10464, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms 
to rewrite sufficiently (#737)

(cherry picked from commit ca252d6621277cad8ca34361f8920c07482b0a16)


> SpanBoostQuery.rewrite was incomplete for boost==1 factor
> -
>
> Key: LUCENE-10477
> URL: https://issues.apache.org/jira/browse/LUCENE-10477
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.11.1
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h






[jira] [Updated] (LUCENE-10458) BoundedDocSetIdIterator may supply an incorrect count in Weight#count(LeafReaderContext) when missingValue is enabled

2022-03-22 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-10458:
--
Fix Version/s: (was: 9.1)

> BoundedDocSetIdIterator may supply an incorrect count in 
> Weight#count(LeafReaderContext) when missingValue is enabled
> ---
>
> Key: LUCENE-10458
> URL: https://issues.apache.org/jira/browse/LUCENE-10458
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When IndexSortSortedNumericDocValuesRangeQuery can take advantage of the index 
> sort, Weight#count uses BoundedDocSetIdIterator's firstDoc and lastDoc to 
> calculate the count. However, if missingValue is enabled, documents that do not 
> contain DocValues may be included in that count.
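A toy model of the overcount: if the count is taken purely as lastDoc - firstDoc, documents without a value that the index sort placed inside the range (because missingValue falls inside it) are counted even though they do not match. The class and method names below are hypothetical, linear scans stand in for the binary searches a real implementation would use, and a null entry marks a document without a value.

```java
/**
 * Hypothetical model, not Lucene's code: one segment's values for a numeric
 * field, in doc ID order, where the segment is sorted ascending on that field.
 * A null entry is a document without a value that the index sort placed as if
 * its value were missingValue.
 */
final class SortedRangeCount {

  /** The value the index sort used for a doc: its own value, or missingValue. */
  static long sortValue(Long value, long missingValue) {
    return value == null ? missingValue : value;
  }

  /** Count derived only from the doc ID bounds, as [firstDoc, lastDoc). */
  static int countFromBounds(Long[] values, long missingValue, long lower, long upper) {
    // Linear scans keep the sketch short; a real implementation would binary-search.
    int firstDoc = 0;
    while (firstDoc < values.length && sortValue(values[firstDoc], missingValue) < lower) {
      firstDoc++;
    }
    int lastDoc = firstDoc; // exclusive
    while (lastDoc < values.length && sortValue(values[lastDoc], missingValue) <= upper) {
      lastDoc++;
    }
    return lastDoc - firstDoc;
  }

  /** Ground truth: only documents that actually have a value in [lower, upper]. */
  static int countMatching(Long[] values, long lower, long upper) {
    int count = 0;
    for (Long value : values) {
      if (value != null && value >= lower && value <= upper) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Two docs lack a value and were sorted as if their value were 5.
    Long[] values = {1L, 3L, null, null, 7L, 9L};
    System.out.println(countFromBounds(values, 5L, 2L, 8L)); // prints 4
    System.out.println(countMatching(values, 2L, 8L)); // prints 2
  }
}
```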






[jira] [Resolved] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-03-22 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10382.
---
Resolution: Fixed

> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.
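To illustrate what selecting the top K from a subset of the live docs means, here is a brute-force sketch: it scores only the documents whose bit is set in acceptDocs (a hypothetical name for the allowed subset, e.g. live docs intersected with a filter) and keeps the K best in a min-heap. Lucene's KnnVectorQuery searches an HNSW graph rather than scanning every vector, so this shows the intended semantics only, not the implementation.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch, not Lucene's API: exact (brute-force) top-K by dot
 * product, restricted to the documents whose bit is set in acceptDocs.
 */
final class FilteredTopK {

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Returns up to k accepted doc IDs, best score first. */
  static int[] topK(float[][] vectors, float[] query, int k, BitSet acceptDocs) {
    float[] scores = new float[vectors.length];
    // Min-heap on score: the weakest of the current top k sits at the head.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>((x, y) -> Float.compare(scores[x], scores[y]));
    for (int doc = acceptDocs.nextSetBit(0);
        doc >= 0 && doc < vectors.length;
        doc = acceptDocs.nextSetBit(doc + 1)) {
      scores[doc] = dot(vectors[doc], query);
      heap.add(doc);
      if (heap.size() > k) {
        heap.poll(); // evict the weakest candidate
      }
    }
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll(); // heap yields weakest first, so fill from the back
    }
    return result;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.9f, 0.1f}, {0.5f, 0.5f}};
    float[] query = {1f, 0f};
    BitSet all = new BitSet();
    all.set(0, vectors.length);
    BitSet subset = new BitSet();
    subset.set(1, vectors.length); // e.g. doc 0 excluded by a filter
    System.out.println(Arrays.toString(topK(vectors, query, 2, all))); // prints [0, 2]
    System.out.println(Arrays.toString(topK(vectors, query, 2, subset))); // prints [2, 3]
  }
}
```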






[jira] [Closed] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-03-22 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand closed LUCENE-10382.
-

Closing after the 9.1.0 release.

> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h






[GitHub] [lucene] cpoerschke opened a new pull request #758: LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc

2022-03-22 Thread GitBox


cpoerschke opened a new pull request #758:
URL: https://github.com/apache/lucene/pull/758


   https://issues.apache.org/jira/browse/LUCENE-10477





[jira] [Resolved] (LUCENE-10464) unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms

2022-03-22 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke resolved LUCENE-10464.
--
Fix Version/s: 10.0 (main)
   9.2
   Resolution: Fixed

> unnecessary for-loop in WeightedSpanTermExtractor.extractWeightedSpanTerms 
> ---
>
> Key: LUCENE-10464
> URL: https://issues.apache.org/jira/browse/LUCENE-10464
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: 10.0 (main), 9.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h






[jira] [Commented] (LUCENE-10477) SpanBoostQuery.rewrite was incomplete for boost==1 factor

2022-03-22 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510680#comment-17510680
 ] 

Christine Poerschke commented on LUCENE-10477:
--

bq. ... Would it be worth documenting e.g. in the javadocs? ...

https://github.com/apache/lucene/pull/758 opened for that.

> SpanBoostQuery.rewrite was incomplete for boost==1 factor
> -
>
> Key: LUCENE-10477
> URL: https://issues.apache.org/jira/browse/LUCENE-10477
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.11.1
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h






[GitHub] [lucene] cpoerschke merged pull request #758: LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc

2022-03-22 Thread GitBox


cpoerschke merged pull request #758:
URL: https://github.com/apache/lucene/pull/758


   





[jira] [Commented] (LUCENE-10477) SpanBoostQuery.rewrite was incomplete for boost==1 factor

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510702#comment-17510702
 ] 

ASF subversion and git services commented on LUCENE-10477:
--

Commit 779c332a8c76f5de171b5d0239e5123ff8b5a10d in lucene's branch 
refs/heads/main from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=779c332 ]

LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc (#758)



> SpanBoostQuery.rewrite was incomplete for boost==1 factor
> -
>
> Key: LUCENE-10477
> URL: https://issues.apache.org/jira/browse/LUCENE-10477
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.11.1
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h






[GitHub] [lucene] madrob opened a new pull request #759: LUCENE-9651 Update benchmark module docs

2022-03-22 Thread GitBox


madrob opened a new pull request #759:
URL: https://github.com/apache/lucene/pull/759


   LUCENE-9651: Update javadoc and download tasks for benchmarks module
   
   # Description
   
   Update the Reuters download task to extract data where most of the 
benchmarks already expect it.
   Update docs to have gradle commands instead of ant commands.
   Switch Reuters download from personal site to archived institutional site, 
per maintainer's request.
   
   # Tests
   
   Manually ran several of the benchmark algorithms to verify.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code 
conforms to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-10477) SpanBoostQuery.rewrite was incomplete for boost==1 factor

2022-03-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510705#comment-17510705
 ] 

ASF subversion and git services commented on LUCENE-10477:
--

Commit ffb3168d6bd1ca70b2c32b0d78d5169000f34523 in lucene's branch 
refs/heads/branch_9x from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ffb3168 ]

LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc (#758)

(cherry picked from commit 779c332a8c76f5de171b5d0239e5123ff8b5a10d)


> SpanBoostQuery.rewrite was incomplete for boost==1 factor
> -
>
> Key: LUCENE-10477
> URL: https://issues.apache.org/jira/browse/LUCENE-10477
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.11.1
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h






[jira] [Resolved] (LUCENE-10477) SpanBoostQuery.rewrite was incomplete for boost==1 factor

2022-03-22 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke resolved LUCENE-10477.
--
Fix Version/s: 10.0 (main)
   9.2
   Resolution: Fixed

Thanks [~jpountz] for the collaboration here!

> SpanBoostQuery.rewrite was incomplete for boost==1 factor
> -
>
> Key: LUCENE-10477
> URL: https://issues.apache.org/jira/browse/LUCENE-10477
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.11.1
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: 10.0 (main), 9.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h






[jira] [Created] (LUCENE-10481) FacetsCollector does not need scores when not keeping them

2022-03-22 Thread Mike Drob (Jira)
Mike Drob created LUCENE-10481:
--

 Summary: FacetsCollector does not need scores when not keeping them
 Key: LUCENE-10481
 URL: https://issues.apache.org/jira/browse/LUCENE-10481
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Mike Drob


FacetsCollector currently always specifies ScoreMode.COMPLETE; we could get 
better performance by not requesting scores when we don't need them.
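A minimal sketch of the proposed change, assuming the collector knows up front whether it will keep scores (modelled here as a keepScores constructor flag). ToyScoreMode is a stand-in that mirrors two constants of Lucene's ScoreMode enum so the example compiles without Lucene on the classpath, and ToyFacetsCollector is a hypothetical name; in real code the collector's scoreMode() would return the real enum.

```java
/**
 * Stand-in mirroring two constants of Lucene's ScoreMode enum, so the sketch
 * compiles on its own; it is not the real class.
 */
enum ToyScoreMode {
  COMPLETE, // visit all matches and compute scores
  COMPLETE_NO_SCORES; // visit all matches, never read scores

  boolean needsScores() {
    return this == COMPLETE;
  }
}

/** Hypothetical collector that only asks for scores when it will keep them. */
final class ToyFacetsCollector {
  private final boolean keepScores;

  ToyFacetsCollector(boolean keepScores) {
    this.keepScores = keepScores;
  }

  ToyScoreMode scoreMode() {
    return keepScores ? ToyScoreMode.COMPLETE : ToyScoreMode.COMPLETE_NO_SCORES;
  }

  public static void main(String[] args) {
    System.out.println(new ToyFacetsCollector(true).scoreMode()); // prints COMPLETE
    System.out.println(new ToyFacetsCollector(false).scoreMode()); // prints COMPLETE_NO_SCORES
  }
}
```

Telling the searcher that scores are not needed lets it skip score computation for clauses such as filters, which is the kind of saving the benchmark comment in this thread refers to.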






[jira] [Commented] (LUCENE-10481) FacetsCollector does not need scores when not keeping them

2022-03-22 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510793#comment-17510793
 ] 

Adrien Grand commented on LUCENE-10481:
---

Your change looks good, but this makes me wonder why the facets collector would 
ever need to buffer scores?

> FacetsCollector does not need scores when not keeping them
> --
>
> Key: LUCENE-10481
> URL: https://issues.apache.org/jira/browse/LUCENE-10481
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Mike Drob
>Priority: Major






[GitHub] [lucene] madrob commented on pull request #760: LUCENE-10481: FacetsCollector will not request scores if it does not use them

2022-03-22 Thread GitBox


madrob commented on pull request #760:
URL: https://github.com/apache/lucene/pull/760#issuecomment-1075437489


   Local benchmarks showed a 2x improvement for facet counting when doing a 
search with filter queries. I'm attempting to replicate this using our existing 
Lucene benchmarks, but they don't seem to have any non-scoring filters on 
facets.





[GitHub] [lucene] madrob opened a new pull request #760: LUCENE-10481: FacetsCollector will not request scores if it does not use them

2022-03-22 Thread GitBox


madrob opened a new pull request #760:
URL: https://github.com/apache/lucene/pull/760


   LUCENE-10481: FacetsCollector will not request scores if it does not use them
   
   # Description
   
   When not collecting any documents, we don't need FacetsCollector to request 
scores.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code 
conforms to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-10481) FacetsCollector does not need scores when not keeping them

2022-03-22 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510809#comment-17510809
 ] 

Mike Drob commented on LUCENE-10481:


I _think_ the use case would be to collect facets over only the top N documents 
that match a query, maybe for when that search is done without a 
TopDocsCollector? I'm not entirely certain what the motivation was.

Our use case, which this change improves significantly, is a boolean query with 
several filter clauses where we only want the facet counts and don't care about 
the documents themselves.
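A rough, Lucene-free illustration of that use case: a conjunction of filters selects documents, and only per-label counts come out, so no score is ever computed or consulted. `Doc`, its fields, and `countFacets` are hypothetical names invented for this sketch; it models the shape of the workload, not Lucene's facet implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Illustrative model, not Lucene code: facet counts over docs matching only non-scoring filters. */
public class FilterOnlyFacetCounts {

  /** Hypothetical document with one facet label and two filterable attributes. */
  static final class Doc {
    final String category;
    final int year;
    final boolean inStock;

    Doc(String category, int year, boolean inStock) {
      this.category = category;
      this.year = year;
      this.inStock = inStock;
    }
  }

  /** Counts each category among the docs that pass every filter; no scores are computed. */
  static Map<String, Integer> countFacets(List<Doc> docs, List<Predicate<Doc>> filters) {
    Map<String, Integer> counts = new TreeMap<>(); // sorted by label for stable output
    for (Doc doc : docs) {
      boolean matchesAll = true;
      for (Predicate<Doc> filter : filters) {
        if (!filter.test(doc)) {
          matchesAll = false;
          break;
        }
      }
      if (matchesAll) {
        counts.merge(doc.category, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Doc> docs =
        List.of(
            new Doc("books", 2021, true),
            new Doc("books", 2022, true),
            new Doc("books", 2019, true),
            new Doc("music", 2022, false),
            new Doc("video", 2020, true));
    // Two filter clauses: neither contributes a score, they only restrict the match set.
    List<Predicate<Doc>> filters = List.of(d -> d.year >= 2020, d -> d.inStock);
    System.out.println(countFacets(docs, filters)); // {books=2, video=1}
  }
}
```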

> FacetsCollector does not need scores when not keeping them
> --
>
> Key: LUCENE-10481
> URL: https://issues.apache.org/jira/browse/LUCENE-10481
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Mike Drob
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> FacetsCollector currently always specifies ScoreMode.COMPLETE; we could get 
> better performance by not requesting scores when we don't need them.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




[GitHub] [lucene] gsmiller commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-22 Thread GitBox


gsmiller commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r832488053



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##
@@ -178,10 +229,23 @@ private FacetResult getPathResult(
   }
 }
 
-if (q == null) {
-  return null;
+if (dimConfig.hierarchical == false) {

Review comment:
   I'm not sure all the cases are covered. What about `getTopChildren`? It 
looks like it could call `getPathResult` for a hierarchical + multivalued case. 
Maybe I'm just missing something?
   
   But either way, I'd prefer that `getChildOrdsResult` not rely on the specific 
ways it gets called today. If it's providing correct counts, it should do so in 
all cases. Alternatively, you could state that it's invalid to use that 
method for the hierarchical + multivalued case and put an assert in to ensure 
that's true (but it seems like it will need to be called for that case). Does 
this make sense?
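One way to read the second suggestion, sketched outside Lucene: document the unsupported combination and guard it with a Java `assert`. `DimConfig` here is a local stand-in carrying the three flags discussed in this thread, `dimCountFromChildren` is a hypothetical helper rather than the PR's `getChildOrdsResult`, and returning `-1` for "count not available" is an assumption of the sketch. Java assertions only fire when the JVM runs with `-ea`.

```java
/**
 * Sketch of stating a precondition and asserting it. Not Lucene code: DimConfig is a
 * local stand-in and dimCountFromChildren is a hypothetical helper.
 */
public class PreconditionSketch {

  /** Stand-in carrying the three flags discussed in the review. */
  static final class DimConfig {
    final boolean hierarchical;
    final boolean multiValued;
    final boolean requireDimCount;

    DimConfig(boolean hierarchical, boolean multiValued, boolean requireDimCount) {
      this.hierarchical = hierarchical;
      this.multiValued = multiValued;
      this.requireDimCount = requireDimCount;
    }
  }

  /**
   * Returns the dim count for a dimension. Must NOT be called for hierarchical +
   * multi-valued dimensions; callers are expected to handle that case elsewhere.
   * Returns -1 when the count is not available (an assumption of this sketch).
   */
  static int dimCountFromChildren(DimConfig config, int dimOrdCount, int[] childCounts) {
    assert !(config.hierarchical && config.multiValued)
        : "hierarchical + multiValued dims are not supported by this method";
    if (config.multiValued) {
      // Summing children could double-count docs that carry several values of this dim.
      return config.requireDimCount ? dimOrdCount : -1;
    }
    int sum = 0;
    for (int count : childCounts) {
      sum += count;
    }
    return sum;
  }

  public static void main(String[] args) {
    int[] childCounts = {2, 3};
    System.out.println(dimCountFromChildren(new DimConfig(false, false, false), 4, childCounts)); // 5
    System.out.println(dimCountFromChildren(new DimConfig(false, true, true), 4, childCounts)); // 4
    System.out.println(dimCountFromChildren(new DimConfig(false, true, false), 4, childCounts)); // -1
  }
}
```

The assert documents the contract at the point of use; the alternative the comment prefers is to make the method correct for every combination so no caller-side contract is needed.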







[jira] [Commented] (LUCENE-10481) FacetsCollector does not need scores when not keeping them

2022-03-22 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510880#comment-17510880
 ] 

Mike Drob commented on LUCENE-10481:


Hmm... somewhat disappointing results: although we saw a great improvement 
with this change, it doesn't seem to persist in the Lucene 9.1 benchmarking 
I'm doing right now. It's possible that something else has taken care of 
this optimization in a different way.







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-22 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r832745109



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##
@@ -178,10 +229,23 @@ private FacetResult getPathResult(
   }
 }
 
-if (q == null) {
-  return null;
+if (dimConfig.hierarchical == false) {

Review comment:
   Thanks @gsmiller! Sorry that I didn't explain my code clearly.
   
   > What about getTopChildren? It looks like it could call getPathResult for a 
hierarchical + multivalued case. Maybe I'm just missing something?
   
   For the hierarchical + multivalued case, the return statement in 
`getPathResult` will set `dimCount = counts[pathOrd]`. The logic here is that 
when a dim is hierarchical, `dimCount` will always equal `counts[pathOrd]` 
regardless of the value of `multivalued`, and therefore we only need to be 
concerned about the case where `hierarchical == false`, which is checked in 
`getChildOrdsResult`.
   
   I added tests for `getTopChildren`, `getAllDims`, and `getTopDims` covering 
different boolean combinations of `hierarchical`, `multivalued`, and 
`requireDimCount`, to ensure that the current design does not change the 
existing logic for other functionality but only adds support for `getTopDims`, 
and that `getChildOrdsResult` does not rely on `getDimValue` to provide/reset 
`dimCount`. Please let me know if I am missing anything here. Thanks again!
   
   







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-22 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r832745109



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##
@@ -178,10 +229,23 @@ private FacetResult getPathResult(
   }
 }
 
-if (q == null) {
-  return null;
+if (dimConfig.hierarchical == false) {

Review comment:
   Thanks @gsmiller! Sorry that I didn't explain my code clearly.
   
   > What about getTopChildren? It looks like it could call getPathResult for a 
hierarchical + multivalued case. Maybe I'm just missing something?
   
   For the hierarchical + multivalued case, the return statement in 
`getPathResult` will set `dimCount = counts[pathOrd]`. The logic here is that 
when a dim is hierarchical, `dimCount` will always equal `counts[pathOrd]` 
regardless of the values of `multivalued` and `requireDimCount`, and therefore 
we only need to be concerned about the case where `hierarchical == false`, 
which is checked in `getChildOrdsResult`.
   
   I added tests for `getTopChildren`, `getAllDims`, and `getTopDims` covering 
different boolean combinations of `hierarchical`, `multivalued`, and 
`requireDimCount`, to ensure that the current design does not change the 
existing logic for other functionality but only adds support for `getTopDims`, 
and that `getChildOrdsResult` does not rely on `getDimValue` to provide/reset 
`dimCount`. Please let me know if I am missing anything here. Thanks again!
   
   







[GitHub] [lucene] Yuti-G commented on a change in pull request #747: LUCENE-10325: Add getTopDims functionality to Facets

2022-03-22 Thread GitBox


Yuti-G commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r832745109



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##
@@ -178,10 +229,23 @@ private FacetResult getPathResult(
   }
 }
 
-if (q == null) {
-  return null;
+if (dimConfig.hierarchical == false) {

Review comment:
   Thanks @gsmiller! Sorry that I didn't explain my code clearly.
   
   > What about getTopChildren? It looks like it could call getPathResult for a 
hierarchical + multivalued case. Maybe I'm just missing something?
   
   For the hierarchical + multivalued case, the return statement in 
`getPathResult` will set `dimCount = counts[pathOrd]`. The logic here is that 
when a dim is hierarchical, `dimCount` will always equal `counts[pathOrd]` 
regardless of the values of `multivalued` and `requireDimCount`, and therefore 
we only need to be concerned about the case where `hierarchical == false`, 
which is checked in `getChildOrdsResult`.
   
   I added tests for `getTopChildren`, `getAllDims`, and `getTopDims` covering 
different boolean combinations of `hierarchical`, `multivalued`, and 
`requireDimCount`, to ensure that the current design does not change the 
existing logic for other functionality but only adds support for `getTopDims`, 
and that `getChildOrdsResult` does not rely on `getDimValue` to provide/reset 
`dimCount`. Please let me know if I am missing anything here. Thanks again!
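The rule described above, together with the idea of exercising every boolean combination, can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the code in the PR: ordinals are plain array indexes, `dimCount` is a hypothetical helper, and the non-hierarchical branch (child-count sum for single-valued dims, the dim's own count when `requireDimCount` is set, `-1` otherwise) is an assumption about the surrounding logic.

```java
/**
 * Simplified model of the dim-count rule discussed in the review; not the PR's code.
 * Ordinals are plain indexes into a counts array.
 */
public class DimCountRuleSketch {

  static int dimCount(
      boolean hierarchical,
      boolean multiValued,
      boolean requireDimCount,
      int[] counts,
      int pathOrd,
      int[] childOrds) {
    if (hierarchical) {
      // Hierarchical: the path's own ordinal is counted, so use it directly,
      // whatever multiValued and requireDimCount are.
      return counts[pathOrd];
    }
    if (multiValued) {
      // Assumed: a dedicated dim count exists only when requireDimCount is set.
      return requireDimCount ? counts[pathOrd] : -1;
    }
    // Single-valued and flat: each doc has at most one child, so children sum exactly.
    int sum = 0;
    for (int ord : childOrds) {
      sum += counts[ord];
    }
    return sum;
  }

  public static void main(String[] args) {
    // Hypothetical ordinals: 0 = the dim itself, 1 and 2 = its children.
    // The same array is reused for every combination only to show which branch runs.
    int[] counts = {4, 3, 2};
    int[] childOrds = {1, 2};
    boolean[] flags = {false, true};
    for (boolean h : flags) {
      for (boolean m : flags) {
        for (boolean r : flags) {
          System.out.printf(
              "hierarchical=%b multiValued=%b requireDimCount=%b -> dimCount=%d%n",
              h, m, r, dimCount(h, m, r, counts, 0, childOrds));
        }
      }
    }
  }
}
```

Looping over all eight flag combinations mirrors the testing approach described in the comment: every hierarchical row takes the same branch, so only the four non-hierarchical rows depend on the other two flags.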
   
   



