[GitHub] [lucene] gf2121 commented on pull request #605: LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll

2022-01-14 Thread GitBox


gf2121 commented on pull request #605:
URL: https://github.com/apache/lucene/pull/605#issuecomment-1012923242


   I'm seeing BrowseMonthTaxoFacets improve by 30% without any regression in [last night's benchmark](https://home.apache.org/~mikemccand/lucenebench/2022.01.13.18.03.24.html). Thanks @gsmiller for such a great idea!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gf2121 edited a comment on pull request #602: LUCENE-10376: Roll up the loop in vint/vlong in DataInput

2022-01-14 Thread GitBox


gf2121 edited a comment on pull request #602:
URL: https://github.com/apache/lucene/pull/602#issuecomment-1012836713


   FYI, here are some of my thoughts based on all these benchmark reports:
   
   * Rolling up the loops for vint/vlong does not seem to bring a significant speedup (nor a significant regression). IMO we should move on anyway, since the bug will no longer occur in Java 17?
   
   * The [mmap approach](https://github.com/apache/lucene/pull/592) is showing a stable speedup, and "too many codes" does not seem to be the main reason inlining is prevented, so maybe (as @uschindler said) it is prevented by the try/catch block? I'd suggest continuing with the mmap approach too for the speedup, but maybe we can also roll up the vint/vlong loops in `ByteBufferIndexInput`? I'll benchmark a rolled-up `ByteBufferIndexInput#readVInt`.
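The rolled-up decoder discussed in the first bullet can be sketched as follows. This is a simplified, byte[]-backed illustration under assumed names (`VIntSketch`, `readVInt`), not Lucene's actual `DataInput` code; in particular the last-byte overflow check here is slightly more lenient than the strict unrolled original.

```java
// Minimal sketch of a rolled-up vint decoder. Each encoded byte carries
// 7 payload bits, with the high bit set on all bytes except the last.
public class VIntSketch {

  static int readVInt(byte[] bytes, int pos) {
    byte b = bytes[pos++];
    if (b >= 0) return b; // fast path: single-byte values 0..127
    int i = b & 0x7F;
    for (int shift = 7; shift <= 28; shift += 7) {
      b = bytes[pos++];
      i |= (b & 0x7F) << shift;
      if (b >= 0) return i;
    }
    throw new RuntimeException("Invalid vInt detected (too many bits)");
  }

  public static void main(String[] args) {
    // 150 = 0b1001_0110 encodes as [0x96, 0x01]
    System.out.println(readVInt(new byte[] {(byte) 0x96, 0x01}, 0)); // prints 150
  }
}
```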





[GitHub] [lucene] gf2121 edited a comment on pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil

2022-01-14 Thread GitBox


gf2121 edited a comment on pull request #541:
URL: https://github.com/apache/lucene/pull/541#issuecomment-1012163438


   Thanks @iverase !
   
   > I remember in the approach I tried I was batching the docIds by a number (128 or 256) but in general I didn't see much better performance compared to the added complexity.
   
   I saw the benchmark result in https://github.com/apache/lucene-solr/pull/1538. It is similar to the benchmark result mentioned above: around a 20% QPS increase for 1D high-cardinality cases. But it seems the benchmark did not cover all use cases. There can be cases more tempting for this approach, e.g. low-cardinality fields (it would be great if you could take a look [here](https://github.com/apache/lucene/pull/541#issue-1080062490)). I proposed a BitSet approach several days ago to solve this, but that's not enough, since that optimization can hardly be triggered once cardinality reaches 32+.
   
   > I don't like that the optimisation only works for a specific number of points, it feels trappy.
   
   +1. I made a [change](https://github.com/apache/lucene/pull/541/commits/ccdae2f6920abff52abf6aae42d65b33861d4cf6) to make the BKDForUtil flexible, and the benchmark doesn't show a significant regression.
   
   
   
   
   
   





[GitHub] [lucene] romseygeek commented on pull request #603: LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator()

2022-01-14 Thread GitBox


romseygeek commented on pull request #603:
URL: https://github.com/apache/lucene/pull/603#issuecomment-1012964049


   > I think you have already done this removal in this PR, haven't you?
   
   Yes, sorry, I meant to comment to call it out. It turned out to be a really easy change, so I folded it in here.





[jira] [Commented] (LUCENE-10377) Replace 'sortPos' parameter in SortField.getComparator()

2022-01-14 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476049#comment-17476049
 ] 

Alan Woodward commented on LUCENE-10377:


Given that this change is on an internal API marked as `experimental`, and that it fixes a bug in CheckIndex, I think it's OK to commit this for 9.1?

> Replace 'sortPos' parameter in SortField.getComparator()
> 
>
> Key: LUCENE-10377
> URL: https://issues.apache.org/jira/browse/LUCENE-10377
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SortField.getComparator() takes two parameters: the number of hits we are 
> collecting, so that the comparator can size its internal priority queue; and 
> the position of this sortfield in the overall sort.  This second parameter is 
> only actually used to determine whether or not the SortField should enable 
> various skipping operations, building comparative iterators etc.  However, 
> it's not clear from the API that this is why it is passed, and as a 
> consequence all sorts of consumers of this API faithfully pass through their 
> sort positions even when they aren't actually using skipping at all.  In 
> particular, CheckIndex does this when checking index sorts, and this in fact 
> leads to a bug, where it will attempt to build skipping structures over 
> fields from 8x-created indexes that did not have the same type constraints 
> that 9x indexes do and incorrectly throw an error.
> I think we should replace this second parameter with a simple boolean, 
> `enableSkipping`, that the TopHits collectors can pass as `true` for the 
> first sortfield.  Other APIs that use sorts but not skipping, like CheckIndex 
> or the grouping collectors, can always pass `false` so we don't waste time 
> building skipping structures that never get used.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gf2121 edited a comment on pull request #602: LUCENE-10376: Roll up the loop in vint/vlong in DataInput

2022-01-14 Thread GitBox


gf2121 edited a comment on pull request #602:
URL: https://github.com/apache/lucene/pull/602#issuecomment-1012836713


   FYI, here are some of my thoughts based on all these benchmark reports:
   
   * Rolling up the loops for vint/vlong does not seem to bring a significant speedup (nor a significant regression). IMO we should move on anyway, since the bug will no longer occur in Java 17?
   
   * The [mmap approach](https://github.com/apache/lucene/pull/592) is showing a stable speedup, and "too many codes" does not seem to be the main reason inlining is prevented, so maybe (as @uschindler said) it is prevented by the try/catch block? I'd suggest continuing with the mmap approach too for the speedup.
   
   





[jira] [Commented] (LUCENE-10168) Only test N-2 codecs nightly

2022-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476121#comment-17476121
 ] 

ASF subversion and git services commented on LUCENE-10168:
--

Commit 81171690b83b567eaad3514f55abce7ee6d971b7 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8117169 ]

LUCENE-10168: Fix typo that would _not_ run nightly tests.


> Only test N-2 codecs nightly
> 
>
> Key: LUCENE-10168
> URL: https://issues.apache.org/jira/browse/LUCENE-10168
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Recently it was decided to extend back compat *two* major releases, as a best 
> effort.
> The effort has not been the best, though.
> Test times have doubled (10 minutes vs 5 minutes on my computer), all the 
> slowness is in backwards codecs tests. And 9.0 isn't even released! Imagine 
> how slow the tests would be if we somehow made it to e.g. 9.10 release or 
> something.
> It is unsustainable. 
> This issue is to figure out a way to cut test times back to a number that is 
> reasonable.






[jira] [Commented] (LUCENE-10168) Only test N-2 codecs nightly

2022-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476122#comment-17476122
 ] 

ASF subversion and git services commented on LUCENE-10168:
--

Commit 457367e9b78fc180cfec2c1041a5ac2f1c0caef3 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=457367e ]

LUCENE-10168: Fix typo that would _not_ run nightly tests.


> Only test N-2 codecs nightly
> 
>
> Key: LUCENE-10168
> URL: https://issues.apache.org/jira/browse/LUCENE-10168
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Recently it was decided to extend back compat *two* major releases, as a best 
> effort.
> The effort has not been the best, though.
> Test times have doubled (10 minutes vs 5 minutes on my computer), all the 
> slowness is in backwards codecs tests. And 9.0 isn't even released! Imagine 
> how slow the tests would be if we somehow made it to e.g. 9.10 release or 
> something.
> It is unsustainable. 
> This issue is to figure out a way to cut test times back to a number that is 
> reasonable.






[GitHub] [lucene] uschindler commented on pull request #602: LUCENE-10376: Roll up the loop in vint/vlong in DataInput

2022-01-14 Thread GitBox


uschindler commented on pull request #602:
URL: https://github.com/apache/lucene/pull/602#issuecomment-1013117133


   Thanks. So unrolling brings no difference, and as the bug is gone, we can replace all of those implementations.
   
   About the ByteBufferIndexInput for MMapDirectory: this is really bad; we should maybe check the assembly code generated by HotSpot to figure out why it is not inlined.
   
   So let's keep the other issue open to figure out more. +1 to commit this. We should also backport to 9.x, but then we should benchmark on Java 11, too.





[GitHub] [lucene] uschindler commented on a change in pull request #602: LUCENE-10376: Roll up the loop in vint/vlong in DataInput

2022-01-14 Thread GitBox


uschindler commented on a change in pull request #602:
URL: https://github.com/apache/lucene/pull/602#discussion_r784844870



##
File path: lucene/core/src/java/org/apache/lucene/store/ByteArrayDataInput.java
##
@@ -109,55 +110,20 @@ public long readLong() {
 
   @Override
   public int readVInt() {
-byte b = bytes[pos++];
-if (b >= 0) return b;
-int i = b & 0x7F;
-b = bytes[pos++];
-i |= (b & 0x7F) << 7;
-if (b >= 0) return i;
-b = bytes[pos++];
-i |= (b & 0x7F) << 14;
-if (b >= 0) return i;
-b = bytes[pos++];
-i |= (b & 0x7F) << 21;
-if (b >= 0) return i;
-b = bytes[pos++];
-// Warning: the next ands use 0x0F / 0xF0 - beware copy/paste errors:
-i |= (b & 0x0F) << 28;
-if ((b & 0xF0) == 0) return i;
-throw new RuntimeException("Invalid vInt detected (too many bits)");
+try {
+  return super.readVInt();
+} catch (IOException e) {
+  throw new RuntimeException(e);

Review comment:
   This should be an AssertionError. This can never happen, so if it does, the assertion that the super's readVInt method does not throw IOException has been violated.







[jira] [Created] (LUCENE-10380) Move liveDocs null check outside the loops in FastTaxonomyFacetCounts#countAll

2022-01-14 Thread Greg Miller (Jira)
Greg Miller created LUCENE-10380:


 Summary: Move liveDocs null check outside the loops in 
FastTaxonomyFacetCounts#countAll
 Key: LUCENE-10380
 URL: https://issues.apache.org/jira/browse/LUCENE-10380
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Greg Miller


As another follow-up to LUCENE-10374, let's try breaking out the null check on 
liveDocs without changing the loop construct. This is the other piece of 
LUCENE-10350 that we previously reverted due to a regression in the nightly 
bench. I suspect we can move the liveDocs check out as long as we don't change 
the loop construct.
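The hoisting described above can be sketched like this. This is a deliberately simplified illustration with made-up names (`CountAllSketch`, plain arrays, `java.util.BitSet` standing in for Lucene's `Bits` liveDocs), not the real `FastTaxonomyFacetCounts#countAll` code:

```java
import java.util.BitSet;

// Sketch: test liveDocs for null once, outside the loop, and run a
// branch-free counting loop when there are no deletions.
public class CountAllSketch {

  // ordinals[i] is the facet ordinal of document docs[i]; values is the
  // dense count array being filled. liveDocs == null means "no deletions".
  static void countAll(int[] docs, int[] ordinals, BitSet liveDocs, int[] values) {
    if (liveDocs != null) {
      for (int i = 0; i < docs.length; i++) {
        if (liveDocs.get(docs[i])) {
          values[ordinals[i]]++;
        }
      }
    } else {
      // specialized loop: identical body, minus the per-doc liveness check
      for (int i = 0; i < docs.length; i++) {
        values[ordinals[i]]++;
      }
    }
  }

  public static void main(String[] args) {
    int[] values = new int[10];
    countAll(new int[] {0, 1, 2}, new int[] {5, 5, 7}, null, values);
    System.out.println(values[5] + " " + values[7]); // prints "2 1"
  }
}
```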






[jira] [Commented] (LUCENE-10379) Count directly into the values array in FastTaxonomyFacetCounts#countAl

2022-01-14 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476216#comment-17476216
 ] 

Greg Miller commented on LUCENE-10379:
--

Looks like this change worked! The performance improvement carried over into 
the nightly benchmarks this time, with no regressions (e.g., 
[BrowseMonthTaxoFacets](https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html)). 
So I strongly suspect the issue before (in LUCENE-10350) was changing the 
for-loop to a while-loop. I suspect the compiler is able to optimize the 
for-loop construct on the nightly benchmark hardware in a way that it can't 
with the while-loop (SIMD maybe?). I'm going to give this a try (created 
LUCENE-10380 to track).

> Count directly into the values array in FastTaxonomyFacetCounts#countAl
> ---
>
> Key: LUCENE-10379
> URL: https://issues.apache.org/jira/browse/LUCENE-10379
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This breaks out one of the two optimizations proposed in LUCENE-10350. As 
> part of trying to track down the regressions caused by LUCENE-10350, I 
> propose we push just this one optimization independently (see LUCENE-10374).






[jira] [Comment Edited] (LUCENE-10379) Count directly into the values array in FastTaxonomyFacetCounts#countAl

2022-01-14 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476216#comment-17476216
 ] 

Greg Miller edited comment on LUCENE-10379 at 1/14/22, 3:51 PM:


Looks like this change worked! The performance improvement carried over into 
the nightly benchmarks this time, with no regressions (e.g., 
[BrowseMonthTaxoFacets](https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html)). 
So I strongly suspect the issue before (in LUCENE-10350) was changing the 
for-loop to a while-loop. I suspect the compiler is able to optimize the 
for-loop construct on the nightly benchmark hardware in a way that it can't 
with the while-loop (???). I'm going to give this a try (created LUCENE-10380 
to track).


was (Author: gsmiller):
Looks like this change worked! The performance improvement carried over into 
the nightly benchmarks this time, with no regressions (e.g., 
[BrowseMonthTaxoFacets](https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html)). 
So I strongly suspect the issue before (in LUCENE-10350) was changing the 
for-loop to a while-loop. I suspect the compiler is able to optimize the 
for-loop construct on the nightly benchmark hardware in a way that it can't 
with the while-loop (SIMD maybe?). I'm going to give this a try (created 
LUCENE-10380 to track).

> Count directly into the values array in FastTaxonomyFacetCounts#countAl
> ---
>
> Key: LUCENE-10379
> URL: https://issues.apache.org/jira/browse/LUCENE-10379
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This breaks out one of the two optimizations proposed in LUCENE-10350. As 
> part of trying to track down the regressions caused by LUCENE-10350, I 
> propose we push just this one optimization independently (see LUCENE-10374).






[GitHub] [lucene] gsmiller opened a new pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


gsmiller opened a new pull request #606:
URL: https://github.com/apache/lucene/pull/606


   This change attempts to bring in the other piece of the LUCENE-10350 change 
without the regressions. See LUCENE-10374 for more details.





[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


rmuir commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r784962069



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws 
IOException {
 
   NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
   if (singleValued != null) {
-for (int doc = singleValued.nextDoc();
-doc != DocIdSetIterator.NO_MORE_DOCS;
-doc = singleValued.nextDoc()) {
-  if (liveDocs != null && liveDocs.get(doc) == false) {
-continue;
+if (liveDocs != null) {
+  for (int doc = singleValued.nextDoc();
+  doc != DocIdSetIterator.NO_MORE_DOCS;
+  doc = singleValued.nextDoc()) {
+if (liveDocs.get(doc)) {
+  values[(int) singleValued.longValue()]++;
+}
+  }
+} else {

Review comment:
   i'm also suspicious of making `count()` and `countAll()` bigger and 
bigger with all these specializations.
   
   I would recommend trying to factor out these little "accumulator" loops into 
separate methods. They could then be shared across `count()` and `countAll()`. 
At least when I looked at this stuff for solr DocValuesFacets, it was needed to 
get performance across the various specializations there (admittedly this was a 
while ago, maybe compiler is smarter now):
   
   You can see what I mean if you start here in this file and scroll down:
   
   
https://github.com/apache/solr/blob/0f3893b8e08c7aaa81addda926303f7a0c6ee18c/solr/core/src/java/org/apache/solr/request/DocValuesFacets.java#L274
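The refactor suggested here can be sketched as follows. This is a hedged illustration with hypothetical names (`AccumulatorSketch`, `OrdinalIterator`), not the actual Lucene or Solr code: the idea is that one small static "accumulator" method holds the hot loop, and every specialization of `count()`/`countAll()` calls it instead of inlining its own copy.

```java
// Sketch: factor the per-document counting loop into a shared accumulator
// method so the JIT compiles one hot loop body rather than many copies.
public class AccumulatorSketch {

  interface OrdinalIterator {
    int next(); // returns -1 when exhausted
  }

  // The shared accumulator: the only hot loop, reused by every caller.
  static void accumulate(OrdinalIterator ords, int[] values) {
    for (int ord = ords.next(); ord != -1; ord = ords.next()) {
      values[ord]++;
    }
  }

  static OrdinalIterator fromArray(int[] ords) {
    return new OrdinalIterator() {
      int i = 0;
      public int next() { return i < ords.length ? ords[i++] : -1; }
    };
  }

  public static void main(String[] args) {
    int[] values = new int[4];
    accumulate(fromArray(new int[] {1, 1, 3}), values); // e.g. a countAll() path
    accumulate(fromArray(new int[] {3}), values);       // e.g. a count() path
    System.out.println(values[1] + " " + values[3]); // prints "2 2"
  }
}
```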







[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


rmuir commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r784962069



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws 
IOException {
 
   NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
   if (singleValued != null) {
-for (int doc = singleValued.nextDoc();
-doc != DocIdSetIterator.NO_MORE_DOCS;
-doc = singleValued.nextDoc()) {
-  if (liveDocs != null && liveDocs.get(doc) == false) {
-continue;
+if (liveDocs != null) {
+  for (int doc = singleValued.nextDoc();
+  doc != DocIdSetIterator.NO_MORE_DOCS;
+  doc = singleValued.nextDoc()) {
+if (liveDocs.get(doc)) {
+  values[(int) singleValued.longValue()]++;
+}
+  }
+} else {

Review comment:
   i'm also suspicious of making `count()` and `countAll()` bigger and 
bigger with all these specializations.
   
   I would recommend trying to factor out these little "accumulator" loops into 
separate methods. They could then be shared across `count()` and `countAll()`. 
At least when I looked at this stuff for solr DocValuesFacets, it was needed to 
get performance across the various specializations there (admittedly this was a 
while ago, maybe compiler is smarter now):
   
   You can see what I mean if you start here in this file and scroll down:
   
   
https://github.com/apache/solr/blob/0f3893b8e08c7aaa81addda926303f7a0c6ee18c/solr/core/src/java/org/apache/solr/request/DocValuesFacets.java#L262







[GitHub] [lucene] gsmiller commented on pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


gsmiller commented on pull request #606:
URL: https://github.com/apache/lucene/pull/606#issuecomment-1013386501


   Benchmarking this change locally shows no impact at all, so I don't think 
it's actually worth pushing this change unless we just want to isolate where 
the nightly benchmark runs are different (i.e., see if this change regresses in 
the nightly run). If I were to merge this, it would just be to see the nightly 
benchmark results and then likely revert it, since it just adds complexity 
with no apparent value. So I won't merge it right now.
   
   ```
                          Task   QPS baseline   StdDev   QPS candidate   StdDev   Pct diff              p-value
         BrowseMonthSSDVFacets     15.90 (23.7%)      15.46 (24.2%)   -2.7% ( -40% -   59%)   0.717
         BrowseMonthTaxoFacets     27.47 (26.8%)      27.05 (26.5%)   -1.5% ( -43% -   70%)   0.858
   BrowseRandomLabelSSDVFacets      9.58  (7.0%)       9.44  (4.8%)   -1.4% ( -12% -   11%)   0.455
                  OrHighNotLow   1265.12  (3.1%)    1251.37  (2.6%)   -1.1% (  -6% -    4%)   0.234
                 OrNotHighHigh    765.18  (3.2%)     757.45  (3.1%)   -1.0% (  -7% -    5%)   0.311
                  OrNotHighMed   1020.03  (3.0%)    1010.06  (3.3%)   -1.0% (  -7% -    5%)   0.327
                        IntNRQ    221.28  (1.3%)     219.28  (1.2%)   -0.9% (  -3% -    1%)   0.019
                 OrHighNotHigh    888.59  (3.8%)     880.64  (3.7%)   -0.9% (  -8% -    6%)   0.452
                  OrNotHighLow    828.68  (2.5%)     821.68  (1.7%)   -0.8% (  -4% -    3%)   0.216
          MedTermDayTaxoFacets     31.65  (4.2%)      31.41  (4.2%)   -0.7% (  -8% -    7%)   0.574
              HighSloppyPhrase      3.04  (4.7%)       3.02  (4.6%)   -0.7% (  -9% -    9%)   0.656
                  OrHighNotMed   1008.99  (3.8%)    1002.52  (3.8%)   -0.6% (  -8% -    7%)   0.597
   BrowseRandomLabelTaxoFacets     17.52 (19.6%)      17.41 (18.9%)   -0.6% ( -32% -   47%)   0.916
                    OrHighHigh     17.58  (3.7%)      17.47  (3.8%)   -0.6% (  -7% -    7%)   0.600
                     OrHighMed    137.10  (3.8%)     136.28  (4.6%)   -0.6% (  -8% -    8%)   0.655
               LowSloppyPhrase     11.93  (3.8%)      11.88  (3.8%)   -0.4% (  -7% -    7%)   0.750
                       LowTerm   1631.30  (2.6%)    1625.23  (2.3%)   -0.4% (  -5% -    4%)   0.630
               MedSloppyPhrase     67.24  (2.5%)      66.99  (2.6%)   -0.4% (  -5% -    4%)   0.649
                        Fuzzy1     80.85  (1.6%)      80.68  (1.8%)   -0.2% (  -3% -    3%)   0.699
                       Respell     51.74  (1.5%)      51.65  (1.7%)   -0.2% (  -3% -    3%)   0.729
                     OrHighLow    861.47  (2.9%)     860.91  (3.0%)   -0.1% (  -5% -    6%)   0.944
       AndHighMedDayTaxoFacets    110.66  (1.4%)     110.63  (1.7%)   -0.0% (  -3% -    3%)   0.958
                   MedSpanNear     50.07  (3.7%)      50.08  (3.0%)    0.0% (  -6% -    6%)   0.996
          HighTermTitleBDVSort     67.00 (23.3%)      67.01 (17.4%)    0.0% ( -32% -   53%)   0.998
                  HighSpanNear     10.62  (3.7%)      10.62  (2.9%)    0.0% (  -6% -    6%)   0.985
                        Fuzzy2     71.03  (1.5%)      71.06  (1.7%)    0.0% (  -3% -    3%)   0.953
          BrowseDateTaxoFacets     20.83 (22.2%)      20.84 (22.8%)    0.1% ( -36% -   57%)   0.992
                     LowPhrase    604.18  (2.9%)     604.76  (2.7%)    0.1% (  -5% -    5%)   0.914
     BrowseDayOfYearTaxoFacets     20.82 (22.4%)      20.85 (23.0%)    0.1% ( -36% -   58%)   0.984
           MedIntervalsOrdered     79.55  (5.1%)      79.68  (4.5%)    0.2% (  -8% -   10%)   0.915
                    HighPhrase    163.13  (2.8%)     163.42  (2.6%)    0.2% (  -5% -    5%)   0.837
        OrHighMedDayTaxoFacets      7.74  (5.3%)       7.76  (4.4%)    0.2% (  -8% -   10%)   0.888
     BrowseDayOfYearSSDVFacets     12.12 (14.1%)      12.17 (13.8%)    0.4% ( -24% -   32%)   0.930
           LowIntervalsOrdered    187.35  (8.7%)     188.22  (7.6%)    0.5% ( -14% -   18%)   0.857
                       MedTerm   2446.71  (4.1%)    2458.80  (4.8%)    0.5% (  -8% -    9%)   0.728
                    AndHighLow   1427.63  (2.7%)    1435.06  (2.5%)    0.5% (  -4% -    5%)   0.527
      AndHighHighDayTaxoFacets      9.02  (1.7%)       9.06  (2.4%)    0.5% (  -3% -    4%)   0.415
                     MedPhrase
   ```

[GitHub] [lucene] gsmiller commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


gsmiller commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r785091711



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws 
IOException {
 
   NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
   if (singleValued != null) {
-for (int doc = singleValued.nextDoc();
-doc != DocIdSetIterator.NO_MORE_DOCS;
-doc = singleValued.nextDoc()) {
-  if (liveDocs != null && liveDocs.get(doc) == false) {
-continue;
+if (liveDocs != null) {
+  for (int doc = singleValued.nextDoc();
+  doc != DocIdSetIterator.NO_MORE_DOCS;
+  doc = singleValued.nextDoc()) {
+if (liveDocs.get(doc)) {
+  values[(int) singleValued.longValue()]++;
+}
+  }
+} else {

Review comment:
   That's interesting. It's tricky without being able to reproduce that 
nightly benchmark regression locally, but I'll give it a shot. This change as I 
have it appears to have no performance impact at all locally, and since it just 
adds code complexity, it would be silly to move forward with it except as an 
academic exercise to try to figure out why the nightly benchmarks are 
regressing. That's interesting and may be worthwhile, but I'll experiment with 
your idea more before moving forward. Thanks!







[GitHub] [lucene] gsmiller commented on pull request #605: LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll

2022-01-14 Thread GitBox


gsmiller commented on pull request #605:
URL: https://github.com/apache/lucene/pull/605#issuecomment-1013393369


   @gf2121 glad it worked out. Now I just wish we could understand what exactly 
was causing the regression with the other part of the change.





[GitHub] [lucene] gsmiller commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


gsmiller commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r785150523



##
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws IOException {
 
       NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
       if (singleValued != null) {
-        for (int doc = singleValued.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = singleValued.nextDoc()) {
-          if (liveDocs != null && liveDocs.get(doc) == false) {
-            continue;
+        if (liveDocs != null) {
+          for (int doc = singleValued.nextDoc();
+              doc != DocIdSetIterator.NO_MORE_DOCS;
+              doc = singleValued.nextDoc()) {
+            if (liveDocs.get(doc)) {
+              values[(int) singleValued.longValue()]++;
+            }
+          }
+        } else {
Review comment:
   Hmm, breaking out separate methods sent QPS tanking in my local benchmarks. Any thoughts @rmuir? Maybe I missed the mark on what you were suggesting (entirely possible)? Here's the change: https://github.com/apache/lucene/pull/606/commits/d084f857f568d650e64d5c40dd9b411d3ce11e85
   
   ```
                              Task    QPS baseline    StdDev    QPS candidate    StdDev                  Pct diff    p-value
             BrowseMonthTaxoFacets           27.87   (23.7%)            11.73    (1.3%)    -57.9% ( -67% -  -43%)      0.000
              BrowseDateTaxoFacets           21.90   (20.9%)            11.77    (7.9%)    -46.2% ( -62% -  -21%)      0.000
         BrowseDayOfYearTaxoFacets           21.88   (21.1%)            11.83    (8.1%)    -45.9% ( -62% -  -21%)      0.000
       BrowseRandomLabelTaxoFacets           18.22   (17.8%)             9.96    (6.8%)    -45.3% ( -59% -  -25%)      0.000
   ```







[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


rmuir commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r785161786



##
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws IOException {
 
       NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
       if (singleValued != null) {
-        for (int doc = singleValued.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = singleValued.nextDoc()) {
-          if (liveDocs != null && liveDocs.get(doc) == false) {
-            continue;
+        if (liveDocs != null) {
+          for (int doc = singleValued.nextDoc();
+              doc != DocIdSetIterator.NO_MORE_DOCS;
+              doc = singleValued.nextDoc()) {
+            if (liveDocs.get(doc)) {
+              values[(int) singleValued.longValue()]++;
+            }
+          }
+        } else {
Review comment:
   I would make these simple static methods.
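
   Editor's note: as a minimal, self-contained sketch (not the actual patch), the "simple static methods" suggestion could look roughly like the following. The types here (`SingleValued`, `IntPredicate`) are hypothetical stand-ins for Lucene's `NumericDocValues` and `Bits`, just to make the shape concrete:

   ```java
   import java.util.function.IntPredicate;

   public class CountAllSketch {
     static final int NO_MORE_DOCS = Integer.MAX_VALUE;

     // Minimal single-valued doc-values iterator over a fixed array
     // (hypothetical stand-in for NumericDocValues).
     static class SingleValued {
       private final long[] vals;
       private int doc = -1;
       SingleValued(long[] vals) { this.vals = vals; }
       int nextDoc() { return ++doc < vals.length ? doc : NO_MORE_DOCS; }
       long longValue() { return vals[doc]; }
     }

     // One small static method per case: no liveDocs check at all...
     static void countAll(SingleValued it, int[] values) {
       for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
         values[(int) it.longValue()]++;
       }
     }

     // ...and a separate one that filters by liveDocs.
     static void countLive(SingleValued it, IntPredicate liveDocs, int[] values) {
       for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
         if (liveDocs.test(doc)) {
           values[(int) it.longValue()]++;
         }
       }
     }

     public static void main(String[] args) {
       int[] all = new int[4];
       countAll(new SingleValued(new long[] {1, 2, 2, 3}), all);
       System.out.println(all[2]); // value 2 occurs twice -> 2

       int[] live = new int[4];
       // liveDocs predicate excludes doc 1 (whose value is 2)
       countLive(new SingleValued(new long[] {1, 2, 2, 3}), doc -> doc != 1, live);
       System.out.println(live[2]); // only doc 2 remains -> 1
     }
   }
   ```

   The idea behind the suggestion is that tiny static bodies with no field loads or virtual dispatch are easier for the JIT to inline and keep monomorphic than one large method with branches.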







[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


rmuir commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r785162808



##
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws IOException {
 
       NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
       if (singleValued != null) {
-        for (int doc = singleValued.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = singleValued.nextDoc()) {
-          if (liveDocs != null && liveDocs.get(doc) == false) {
-            continue;
+        if (liveDocs != null) {
+          for (int doc = singleValued.nextDoc();
+              doc != DocIdSetIterator.NO_MORE_DOCS;
+              doc = singleValued.nextDoc()) {
+            if (liveDocs.get(doc)) {
+              values[(int) singleValued.longValue()]++;
+            }
+          }
+        } else {
Review comment:
   See the Solr example again — make them just like those methods there. Instance methods are probably no good in facets; there are many abstractions, which probably just drives the compiler more crazy.







[GitHub] [lucene] gsmiller commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


gsmiller commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r785195850



##
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws IOException {
 
       NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
       if (singleValued != null) {
-        for (int doc = singleValued.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = singleValued.nextDoc()) {
-          if (liveDocs != null && liveDocs.get(doc) == false) {
-            continue;
+        if (liveDocs != null) {
+          for (int doc = singleValued.nextDoc();
+              doc != DocIdSetIterator.NO_MORE_DOCS;
+              doc = singleValued.nextDoc()) {
+            if (liveDocs.get(doc)) {
+              values[(int) singleValued.longValue()]++;
+            }
+          }
+        } else {
Review comment:
   Thanks @rmuir. Just to make sure we're on the same page, the main 
difference is just moving to `static` methods right? I've made that change and 
still seeing serious regressions. Anything else I might be missing here? Thanks 
again for your thoughts!
   
   ```
                              Task    QPS baseline    StdDev    QPS candidate    StdDev                  Pct diff    p-value
             BrowseMonthTaxoFacets           29.96   (14.4%)            12.03    (1.6%)    -59.8% ( -66% -  -51%)      0.000
              BrowseDateTaxoFacets           22.78   (13.5%)            12.33    (5.5%)    -45.9% ( -57% -  -31%)      0.000
         BrowseDayOfYearTaxoFacets           22.77   (13.8%)            12.41    (5.6%)    -45.5% ( -57% -  -30%)      0.000
       BrowseRandomLabelTaxoFacets           19.00   (10.7%)            10.41    (5.1%)    -45.2% ( -55% -  -32%)      0.000
   ```







[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


rmuir commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r785197508



##
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws IOException {
 
       NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
       if (singleValued != null) {
-        for (int doc = singleValued.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = singleValued.nextDoc()) {
-          if (liveDocs != null && liveDocs.get(doc) == false) {
-            continue;
+        if (liveDocs != null) {
+          for (int doc = singleValued.nextDoc();
+              doc != DocIdSetIterator.NO_MORE_DOCS;
+              doc = singleValued.nextDoc()) {
+            if (liveDocs.get(doc)) {
+              values[(int) singleValued.longValue()]++;
+            }
+          }
+        } else {
Review comment:
   We still have the issue of inconsistent loop types between while and for loops? Maybe now that the accumulators are shared, it becomes more of a problem?
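
   Editor's note: the "inconsistent loop types" concern refers to the two equivalent iteration shapes illustrated below. This is a hedged sketch with a hypothetical `Iter` stand-in, not Lucene's actual `DocIdSetIterator`:

   ```java
   public class LoopShapes {
     static final int NO_MORE_DOCS = Integer.MAX_VALUE;

     // Hypothetical stand-in for a doc-id iterator.
     static class Iter {
       private final int maxDoc;
       private int doc = -1;
       Iter(int maxDoc) { this.maxDoc = maxDoc; }
       int nextDoc() { return ++doc < maxDoc ? doc : NO_MORE_DOCS; }
     }

     // for-loop shape, as used in FastTaxonomyFacetCounts#countAll.
     static int sumFor(Iter it) {
       int sum = 0;
       for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
         sum += doc;
       }
       return sum;
     }

     // while-loop shape, common elsewhere in the codebase; semantically
     // identical, but a different bytecode pattern for the JIT to match.
     static int sumWhile(Iter it) {
       int sum = 0;
       int doc;
       while ((doc = it.nextDoc()) != NO_MORE_DOCS) {
         sum += doc;
       }
       return sum;
     }

     public static void main(String[] args) {
       System.out.println(sumFor(new Iter(5)));   // 0+1+2+3+4 = 10
       System.out.println(sumWhile(new Iter(5))); // same result: 10
     }
   }
   ```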







[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops

2022-01-14 Thread GitBox


rmuir commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r785220678



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws 
IOException {
 
   NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
   if (singleValued != null) {
-for (int doc = singleValued.nextDoc();
-doc != DocIdSetIterator.NO_MORE_DOCS;
-doc = singleValued.nextDoc()) {
-  if (liveDocs != null && liveDocs.get(doc) == false) {
-continue;
+if (liveDocs != null) {
+  for (int doc = singleValued.nextDoc();
+  doc != DocIdSetIterator.NO_MORE_DOCS;
+  doc = singleValued.nextDoc()) {
+if (liveDocs.get(doc)) {
+  values[(int) singleValued.longValue()]++;
+}
+  }
+} else {

Review comment:
   Also, is there really a reason anymore to have count vs. countAll? They look the same to me. The only difference is the liveDocs check, which is shown to do nothing. So if we remove the liveDocs specialization, and remove the count-vs-countAll specialization, it should start to be a bit more manageable?
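
   Editor's note: a minimal sketch of the unification being floated here — one counting loop that takes an optional filter instead of separate count/countAll paths. The types are simplified hypothetical stand-ins, not Lucene's `NumericDocValues`/`Bits`:

   ```java
   import java.util.function.IntPredicate;

   public class UnifiedCount {
     static final int NO_MORE_DOCS = Integer.MAX_VALUE;

     interface DocValuesIter {
       int nextDoc();
       long longValue();
     }

     // Build a stand-in iterator over a fixed array of per-doc values.
     static DocValuesIter fromArray(long[] vals) {
       return new DocValuesIter() {
         private int doc = -1;
         public int nextDoc() { return ++doc < vals.length ? doc : NO_MORE_DOCS; }
         public long longValue() { return vals[doc]; }
       };
     }

     // Single code path: acceptDocs == null means "all docs live",
     // mirroring how a null Bits is treated in Lucene.
     static void count(DocValuesIter dv, IntPredicate acceptDocs, int[] values) {
       for (int doc = dv.nextDoc(); doc != NO_MORE_DOCS; doc = dv.nextDoc()) {
         if (acceptDocs == null || acceptDocs.test(doc)) {
           values[(int) dv.longValue()]++;
         }
       }
     }

     public static void main(String[] args) {
       int[] values = new int[3];
       count(fromArray(new long[] {0, 1, 1, 2}), null, values);
       System.out.println(values[1]); // value 1 occurs twice -> 2
     }
   }
   ```

   The trade-off is fewer specialized methods to keep warm and maintain, at the cost of keeping a (cheap, predictable) null check inside the hot loop.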



