[GitHub] [lucene] gf2121 commented on pull request #605: LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll
gf2121 commented on pull request #605: URL: https://github.com/apache/lucene/pull/605#issuecomment-1012923242

I'm seeing BrowseMonthTaxoFacets up 30% without any regression in [last night's benchmark](https://home.apache.org/~mikemccand/lucenebench/2022.01.13.18.03.24.html). Thanks @gsmiller for such a great idea!

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 edited a comment on pull request #602: LUCENE-10376: Roll up the loop in vint/vlong in DataInput
gf2121 edited a comment on pull request #602: URL: https://github.com/apache/lucene/pull/602#issuecomment-1012836713

FYI, here are some of my thoughts based on all these benchmark reports:
* Rolling up the loops for vint/vlong doesn't seem to bring a significant speed-up (nor a significant regression). IMO we should move on, because the bug will no longer occur in Java 17?
* The [mmap approach](https://github.com/apache/lucene/pull/592) is showing a stable speed-up, and "too many codes" does not seem to be the major reason preventing inlining, so maybe (as @uschindler said) it is prevented by the try/catch block? I'd suggest continuing on the mmap approach too for the speed-up, but maybe we can also roll up the vint/vlong loops in `ByteBufferIndexInput`? I'll benchmark a rolled-up `ByteBufferIndexInput#readVInt`.
[GitHub] [lucene] gf2121 edited a comment on pull request #541: LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil
gf2121 edited a comment on pull request #541: URL: https://github.com/apache/lucene/pull/541#issuecomment-1012163438

Thanks @iverase !

> I remember in the approach I tried I was batching the docIds by a number (128 or 256) but in general I didn't see much better performance compared to the added complexity.

I saw the benchmark result in https://github.com/apache/lucene-solr/pull/1538. It is similar to the benchmark result mentioned above: around a 20% QPS increase for 1D high-cardinality cases. But it seems the benchmark did not cover all use cases. There can be some cases more amenable to this approach, e.g. low-cardinality fields (it would be great if you could take a look [here](https://github.com/apache/lucene/pull/541#issue-1080062490)). I proposed a BitSet approach several days ago to solve this, but that's not enough, since the optimization can hardly be triggered when cardinality comes to 32+.

> I don't like that the optimisation only works for a specific number of points, it feels trappy.

+1. I made a [change](https://github.com/apache/lucene/pull/541/commits/ccdae2f6920abff52abf6aae42d65b33861d4cf6) to make the BKDForUtil flexible, and the benchmark doesn't show a significant regression.
[GitHub] [lucene] romseygeek commented on pull request #603: LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator()
romseygeek commented on pull request #603: URL: https://github.com/apache/lucene/pull/603#issuecomment-1012964049

> I think you have already done this removal in this PR, haven't you?

Yes, sorry, I meant to comment to call it out - it turned out to be a really easy change so I folded it in here.
[jira] [Commented] (LUCENE-10377) Replace 'sortPos' parameter in SortField.getComparator()
[ https://issues.apache.org/jira/browse/LUCENE-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476049#comment-17476049 ]

Alan Woodward commented on LUCENE-10377:

Given that this change is on an internal API marked as `experimental`, and that it fixes a bug in CheckIndex, I think it's OK to commit this for 9.1?

> Replace 'sortPos' parameter in SortField.getComparator()
>
> Key: LUCENE-10377
> URL: https://issues.apache.org/jira/browse/LUCENE-10377
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> SortField.getComparator() takes two parameters: the number of hits we are collecting, so that the comparator can size its internal priority queue; and the position of this SortField in the overall sort. This second parameter is only actually used to determine whether or not the SortField should enable various skipping operations, building competitive iterators etc. However, it's not clear from the API that this is why it is passed, and as a consequence all sorts of consumers of this API faithfully pass through their sort positions even when they aren't actually using skipping at all. In particular, CheckIndex does this when checking index sorts, and this in fact leads to a bug, where it will attempt to build skipping structures over fields from 8x-created indexes that did not have the same type constraints that 9x indexes do, and incorrectly throw an error.
>
> I think we should replace this second parameter with a simple boolean, `enableSkipping`, that the TopHits collectors can pass as `true` for the first SortField. Other APIs that use sorts but not skipping, like CheckIndex or the grouping collectors, can always pass `false` so we don't waste time building skipping structures that never get used.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
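The API change proposed in the issue above can be sketched as follows. This is a minimal illustration, not Lucene's actual signatures: `SortFieldSketch` and its methods are hypothetical stand-ins that only show how the `int sortPos` parameter collapses into a `boolean enableSkipping`.

```java
// Illustrative sketch of the LUCENE-10377 API change; class and method
// names are hypothetical stand-ins, not Lucene's real API.
class SortFieldSketch {

  // Old shape: the comparator inferred "enable skipping" from its position
  // in the overall sort (only the primary sort, sortPos == 0, skipped).
  static String getComparatorOld(int numHits, int sortPos) {
    boolean enableSkipping = (sortPos == 0);
    return describe(numHits, enableSkipping);
  }

  // Proposed shape: callers state the intent directly. TopHits collectors
  // pass true for the first SortField; CheckIndex and the grouping
  // collectors always pass false, so no unused skipping structures get built.
  static String getComparatorNew(int numHits, boolean enableSkipping) {
    return describe(numHits, enableSkipping);
  }

  private static String describe(int numHits, boolean enableSkipping) {
    return "queueSize=" + numHits + ", skipping=" + (enableSkipping ? "on" : "off");
  }
}
```

The point of the boolean is that consumers like CheckIndex no longer accidentally request skipping just because they pass a sort position of 0.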
[jira] [Commented] (LUCENE-10168) Only test N-2 codecs nightly
[ https://issues.apache.org/jira/browse/LUCENE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476121#comment-17476121 ]

ASF subversion and git services commented on LUCENE-10168:

Commit 81171690b83b567eaad3514f55abce7ee6d971b7 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8117169 ]

LUCENE-10168: Fix typo that would _not_ run nightly tests.

> Only test N-2 codecs nightly
>
> Key: LUCENE-10168
> URL: https://issues.apache.org/jira/browse/LUCENE-10168
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Blocker
> Fix For: 9.1
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Recently it was decided to extend back compat *two* major releases, as a best effort.
> The effort has not been the best, though.
> Test times have doubled (10 minutes vs 5 minutes on my computer); all the slowness is in backwards codecs tests. And 9.0 isn't even released! Imagine how slow the tests would be if we somehow made it to e.g. a 9.10 release or something.
> It is unsustainable.
> This issue is to figure out a way to cut test times back to a number that is reasonable.
[jira] [Commented] (LUCENE-10168) Only test N-2 codecs nightly
[ https://issues.apache.org/jira/browse/LUCENE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476122#comment-17476122 ]

ASF subversion and git services commented on LUCENE-10168:

Commit 457367e9b78fc180cfec2c1041a5ac2f1c0caef3 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=457367e ]

LUCENE-10168: Fix typo that would _not_ run nightly tests.

> Only test N-2 codecs nightly
>
> Key: LUCENE-10168
> URL: https://issues.apache.org/jira/browse/LUCENE-10168
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Blocker
> Fix For: 9.1
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Recently it was decided to extend back compat *two* major releases, as a best effort.
> The effort has not been the best, though.
> Test times have doubled (10 minutes vs 5 minutes on my computer); all the slowness is in backwards codecs tests. And 9.0 isn't even released! Imagine how slow the tests would be if we somehow made it to e.g. a 9.10 release or something.
> It is unsustainable.
> This issue is to figure out a way to cut test times back to a number that is reasonable.
[GitHub] [lucene] uschindler commented on pull request #602: LUCENE-10376: Roll up the loop in vint/vlong in DataInput
uschindler commented on pull request #602: URL: https://github.com/apache/lucene/pull/602#issuecomment-1013117133

Thanks. So unrolling brings no difference, and as the bug is gone, we can replace all of those implementations.

About the ByteBufferIndexInput for MMapDirectory: this is really bad; we should maybe check the assembly code generated by Hotspot to figure out why it is not inlined. So let's keep the other issue open to figure out more.

+1 to commit this. We should also backport to 9.x, but then we should benchmark on Java 11, too.
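For reference, inspecting HotSpot's inlining decisions and generated assembly is usually done with diagnostic JVM flags. The flags below are real HotSpot options, but the benchmark class name is a placeholder, so treat this as an illustrative sketch rather than a ready-to-run command:

```
# Print JIT inlining decisions; grep for the method in question to see
# whether it was inlined or why it was rejected (e.g. "too large").
# MyVIntBench is a placeholder for an actual benchmark main class.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MyVIntBench | grep readVInt

# Full generated assembly additionally requires the hsdis disassembler
# plugin to be installed on the JVM's library path:
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly MyVIntBench
```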
[GitHub] [lucene] uschindler commented on a change in pull request #602: LUCENE-10376: Roll up the loop in vint/vlong in DataInput
uschindler commented on a change in pull request #602: URL: https://github.com/apache/lucene/pull/602#discussion_r784844870

## File path: lucene/core/src/java/org/apache/lucene/store/ByteArrayDataInput.java
## @@ -109,55 +110,20 @@ public long readLong() {

   @Override
   public int readVInt() {
-    byte b = bytes[pos++];
-    if (b >= 0) return b;
-    int i = b & 0x7F;
-    b = bytes[pos++];
-    i |= (b & 0x7F) << 7;
-    if (b >= 0) return i;
-    b = bytes[pos++];
-    i |= (b & 0x7F) << 14;
-    if (b >= 0) return i;
-    b = bytes[pos++];
-    i |= (b & 0x7F) << 21;
-    if (b >= 0) return i;
-    b = bytes[pos++];
-    // Warning: the next ands use 0x0F / 0xF0 - beware copy/paste errors:
-    i |= (b & 0x0F) << 28;
-    if ((b & 0xF0) == 0) return i;
-    throw new RuntimeException("Invalid vInt detected (too many bits)");
+    try {
+      return super.readVInt();
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+  }

Review comment: This should be an AssertionError. This can never happen, so if it happens, the assertion that the super's vint method does not throw IOException was violated.
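The pattern suggested in the review above can be sketched stand-alone like this. The classes are simplified, hypothetical stand-ins for `DataInput`/`ByteArrayDataInput` (the vInt decode is condensed into a loop and omits the too-many-bits check); the point is only the catch clause, where an impossible checked exception surfaces as an `AssertionError` rather than a `RuntimeException`.

```java
import java.io.IOException;

// Sketch of the review suggestion: when a checked IOException is impossible
// by contract, wrap it in AssertionError so that a violated invariant fails
// loudly as a bug, not as an I/O condition.
class ByteArrayReader {
  private final byte[] bytes;
  private int pos;

  ByteArrayReader(byte[] bytes) { this.bytes = bytes; }

  // Plays the role of DataInput#readVInt: it declares IOException because
  // real subclasses may do I/O, but the byte-array case never throws it.
  private int readVIntImpl() throws IOException {
    byte b = bytes[pos++];
    int i = b & 0x7F;
    for (int shift = 7; b < 0; shift += 7) {
      b = bytes[pos++];
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  public int readVInt() {
    try {
      return readVIntImpl();
    } catch (IOException e) {
      // Unreachable for in-memory input; AssertionError documents that.
      throw new AssertionError("readVIntImpl must not throw for byte[]", e);
    }
  }
}
```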
[jira] [Created] (LUCENE-10380) Move liveDocs null check outside the loops in FastTaxonomyFacetCounts#countAll
Greg Miller created LUCENE-10380:

Summary: Move liveDocs null check outside the loops in FastTaxonomyFacetCounts#countAll
Key: LUCENE-10380
URL: https://issues.apache.org/jira/browse/LUCENE-10380
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Greg Miller

As another follow-up to LUCENE-10374, let's try breaking out the null check on liveDocs without changing the loop construct. This is the other piece of LUCENE-10350 that we previously reverted due to a regression in the nightly bench. I suspect we can move liveDocs out as long as we don't change the loop construct.
[jira] [Commented] (LUCENE-10379) Count directly into the values array in FastTaxonomyFacetCounts#countAl
[ https://issues.apache.org/jira/browse/LUCENE-10379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476216#comment-17476216 ]

Greg Miller commented on LUCENE-10379:

Looks like this change worked! The performance improvement carried over into the nightly benchmarks this time with no regressions (e.g., https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html). So I strongly suspect the issue before (in LUCENE-10350) was changing the for-loop to a while-loop. I suspect the compiler is able to optimize the for-loop construct on the nightly benchmark hardware in a way that it can't with the while-loop (SIMD maybe?). I'm going to give this a try (created LUCENE-10380 to track).

> Count directly into the values array in FastTaxonomyFacetCounts#countAl
>
> Key: LUCENE-10379
> URL: https://issues.apache.org/jira/browse/LUCENE-10379
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Greg Miller
> Assignee: Greg Miller
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This breaks out one of the two optimizations proposed in LUCENE-10350. As part of trying to track down the regressions caused by LUCENE-10350, I propose we push just this one optimization independently (see LUCENE-10374).
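The two loop shapes under discussion can be sketched side by side. This is a stand-alone illustration: `IntSupplier` is a hypothetical stand-in for `nextDoc()`-style iteration, and `NO_MORE_DOCS` mirrors `DocIdSetIterator.NO_MORE_DOCS`; both methods count identically, and the hypothesis is only that the JIT treats the two shapes differently on the nightly benchmark hardware.

```java
import java.util.function.IntSupplier;

// Sketch of the for-loop vs while-loop constructs being compared.
class LoopShapes {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // for-loop construct kept in LUCENE-10379 (no regression in nightly bench)
  static long countFor(IntSupplier nextDoc, long[] values, int ord) {
    for (int doc = nextDoc.getAsInt();
        doc != NO_MORE_DOCS;
        doc = nextDoc.getAsInt()) {
      values[ord]++;
    }
    return values[ord];
  }

  // equivalent while-loop construct from LUCENE-10350 that coincided with
  // the nightly regression
  static long countWhile(IntSupplier nextDoc, long[] values, int ord) {
    int doc = nextDoc.getAsInt();
    while (doc != NO_MORE_DOCS) {
      values[ord]++;
      doc = nextDoc.getAsInt();
    }
    return values[ord];
  }
}
```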
[jira] [Comment Edited] (LUCENE-10379) Count directly into the values array in FastTaxonomyFacetCounts#countAl
[ https://issues.apache.org/jira/browse/LUCENE-10379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476216#comment-17476216 ]

Greg Miller edited a comment on LUCENE-10379 at 1/14/22, 3:51 PM:

Looks like this change worked! The performance improvement carried over into the nightly benchmarks this time with no regressions (e.g., https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html). So I strongly suspect the issue before (in LUCENE-10350) was changing the for-loop to a while-loop. I suspect the compiler is able to optimize the for-loop construct on the nightly benchmark hardware in a way that it can't with the while-loop (???). I'm going to give this a try (created LUCENE-10380 to track).

was (Author: gsmiller):
Looks like this change worked! The performance improvement carried over into the nightly benchmarks this time with no regressions (e.g., https://home.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html). So I strongly suspect the issue before (in LUCENE-10350) was changing the for-loop to a while-loop. I suspect the compiler is able to optimize the for-loop construct on the nightly benchmark hardware in a way that it can't with the while-loop (SIMD maybe?). I'm going to give this a try (created LUCENE-10380 to track).

> Count directly into the values array in FastTaxonomyFacetCounts#countAl
>
> Key: LUCENE-10379
> URL: https://issues.apache.org/jira/browse/LUCENE-10379
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Greg Miller
> Assignee: Greg Miller
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This breaks out one of the two optimizations proposed in LUCENE-10350. As part of trying to track down the regressions caused by LUCENE-10350, I propose we push just this one optimization independently (see LUCENE-10374).
[GitHub] [lucene] gsmiller opened a new pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
gsmiller opened a new pull request #606: URL: https://github.com/apache/lucene/pull/606

This change attempts to bring in the other piece of the LUCENE-10350 change without the regressions. See LUCENE-10374 for more details.
[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
rmuir commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r784962069

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
## @@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws IOException {
     NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);

     if (singleValued != null) {
-      for (int doc = singleValued.nextDoc();
-          doc != DocIdSetIterator.NO_MORE_DOCS;
-          doc = singleValued.nextDoc()) {
-        if (liveDocs != null && liveDocs.get(doc) == false) {
-          continue;
+      if (liveDocs != null) {
+        for (int doc = singleValued.nextDoc();
+            doc != DocIdSetIterator.NO_MORE_DOCS;
+            doc = singleValued.nextDoc()) {
+          if (liveDocs.get(doc)) {
+            values[(int) singleValued.longValue()]++;
+          }
+        }
+      } else {

Review comment: I'm also suspicious of making `count()` and `countAll()` bigger and bigger with all these specializations. I would recommend trying to factor out these little "accumulator" loops into separate methods. They could then be shared across `count()` and `countAll()`. At least when I looked at this stuff for Solr DocValuesFacets, it was needed to get performance across the various specializations there (admittedly this was a while ago; maybe the compiler is smarter now). You can see what I mean if you start here in this file and scroll down: https://github.com/apache/solr/blob/0f3893b8e08c7aaa81addda926303f7a0c6ee18c/solr/core/src/java/org/apache/solr/request/DocValuesFacets.java#L274
[GitHub] [lucene] gsmiller commented on pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
gsmiller commented on pull request #606: URL: https://github.com/apache/lucene/pull/606#issuecomment-1013386501

Benchmarking this change locally shows no impact at all. So I don't think it's actually worth pushing this change unless we just want to isolate where the nightly benchmark runs are different (i.e., see if this change regresses in the nightly run). So if I were to merge this, it would just be to see the nightly benchmark results and then likely revert it back out, since it just adds complexity with no apparent value. So I won't merge it right now.

```
                       Task    QPS baseline    StdDev    QPS candidate    StdDev    Pct diff         p-value
      BrowseMonthSSDVFacets          15.90    (23.7%)           15.46    (24.2%)      -2.7%  ( -40% -  59%)  0.717
      BrowseMonthTaxoFacets          27.47    (26.8%)           27.05    (26.5%)      -1.5%  ( -43% -  70%)  0.858
BrowseRandomLabelSSDVFacets           9.58     (7.0%)            9.44     (4.8%)      -1.4%  ( -12% -  11%)  0.455
               OrHighNotLow        1265.12     (3.1%)         1251.37     (2.6%)      -1.1%  (  -6% -   4%)  0.234
              OrNotHighHigh         765.18     (3.2%)          757.45     (3.1%)      -1.0%  (  -7% -   5%)  0.311
               OrNotHighMed        1020.03     (3.0%)         1010.06     (3.3%)      -1.0%  (  -7% -   5%)  0.327
                     IntNRQ         221.28     (1.3%)          219.28     (1.2%)      -0.9%  (  -3% -   1%)  0.019
              OrHighNotHigh         888.59     (3.8%)          880.64     (3.7%)      -0.9%  (  -8% -   6%)  0.452
               OrNotHighLow         828.68     (2.5%)          821.68     (1.7%)      -0.8%  (  -4% -   3%)  0.216
       MedTermDayTaxoFacets          31.65     (4.2%)           31.41     (4.2%)      -0.7%  (  -8% -   7%)  0.574
           HighSloppyPhrase           3.04     (4.7%)            3.02     (4.6%)      -0.7%  (  -9% -   9%)  0.656
               OrHighNotMed        1008.99     (3.8%)         1002.52     (3.8%)      -0.6%  (  -8% -   7%)  0.597
BrowseRandomLabelTaxoFacets          17.52    (19.6%)           17.41    (18.9%)      -0.6%  ( -32% -  47%)  0.916
                 OrHighHigh          17.58     (3.7%)           17.47     (3.8%)      -0.6%  (  -7% -   7%)  0.600
                  OrHighMed         137.10     (3.8%)          136.28     (4.6%)      -0.6%  (  -8% -   8%)  0.655
            LowSloppyPhrase          11.93     (3.8%)           11.88     (3.8%)      -0.4%  (  -7% -   7%)  0.750
                    LowTerm        1631.30     (2.6%)         1625.23     (2.3%)      -0.4%  (  -5% -   4%)  0.630
            MedSloppyPhrase          67.24     (2.5%)           66.99     (2.6%)      -0.4%  (  -5% -   4%)  0.649
                     Fuzzy1          80.85     (1.6%)           80.68     (1.8%)      -0.2%  (  -3% -   3%)  0.699
                    Respell          51.74     (1.5%)           51.65     (1.7%)      -0.2%  (  -3% -   3%)  0.729
                  OrHighLow         861.47     (2.9%)          860.91     (3.0%)      -0.1%  (  -5% -   6%)  0.944
    AndHighMedDayTaxoFacets         110.66     (1.4%)          110.63     (1.7%)      -0.0%  (  -3% -   3%)  0.958
                MedSpanNear          50.07     (3.7%)           50.08     (3.0%)       0.0%  (  -6% -   6%)  0.996
       HighTermTitleBDVSort          67.00    (23.3%)           67.01    (17.4%)       0.0%  ( -32% -  53%)  0.998
               HighSpanNear          10.62     (3.7%)           10.62     (2.9%)       0.0%  (  -6% -   6%)  0.985
                     Fuzzy2          71.03     (1.5%)           71.06     (1.7%)       0.0%  (  -3% -   3%)  0.953
       BrowseDateTaxoFacets          20.83    (22.2%)           20.84    (22.8%)       0.1%  ( -36% -  57%)  0.992
                  LowPhrase         604.18     (2.9%)          604.76     (2.7%)       0.1%  (  -5% -   5%)  0.914
  BrowseDayOfYearTaxoFacets          20.82    (22.4%)           20.85    (23.0%)       0.1%  ( -36% -  58%)  0.984
        MedIntervalsOrdered          79.55     (5.1%)           79.68     (4.5%)       0.2%  (  -8% -  10%)  0.915
                 HighPhrase         163.13     (2.8%)          163.42     (2.6%)       0.2%  (  -5% -   5%)  0.837
     OrHighMedDayTaxoFacets           7.74     (5.3%)            7.76     (4.4%)       0.2%  (  -8% -  10%)  0.888
  BrowseDayOfYearSSDVFacets          12.12    (14.1%)           12.17    (13.8%)       0.4%  ( -24% -  32%)  0.930
        LowIntervalsOrdered         187.35     (8.7%)          188.22     (7.6%)       0.5%  ( -14% -  18%)  0.857
                    MedTerm        2446.71     (4.1%)         2458.80     (4.8%)       0.5%  (  -8% -   9%)  0.728
                 AndHighLow        1427.63     (2.7%)         1435.06     (2.5%)       0.5%  (  -4% -   5%)  0.527
   AndHighHighDayTaxoFacets           9.02     (1.7%)            9.06     (2.4%)       0.5%  (  -3% -   4%)  0.415
                  MedPhrase
```
[GitHub] [lucene] gsmiller commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
gsmiller commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r785091711

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java

Review comment: That's interesting. It's tricky without being able to reproduce that nightly benchmark regression locally, but I'll give it a shot. This change as I have it appears to have no performance impact at all locally, and since it just adds code complexity, it would be silly to move forward with it except as an academic exercise to try to figure out why the nightly benchmarks are regressing. That's interesting and may be worthwhile, but I'll experiment with your idea more before moving forward. Thanks!
[GitHub] [lucene] gsmiller commented on pull request #605: LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll
gsmiller commented on pull request #605: URL: https://github.com/apache/lucene/pull/605#issuecomment-1013393369

@gf2121 glad it worked out. Now I just wish we could understand what exactly was causing the regression with the other part of the change.
[GitHub] [lucene] gsmiller commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
gsmiller commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r785150523

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java

Review comment: Hmm, breaking out separate methods sent QPS tanking in my local benchmarks. Any thoughts @rmuir? Maybe I missed the mark on what you were suggesting (entirely possible)? Here's the change: https://github.com/apache/lucene/pull/606/commits/d084f857f568d650e64d5c40dd9b411d3ce11e85

```
                       Task    QPS baseline    StdDev    QPS candidate    StdDev    Pct diff         p-value
      BrowseMonthTaxoFacets          27.87    (23.7%)           11.73     (1.3%)     -57.9%  ( -67% - -43%)  0.000
       BrowseDateTaxoFacets          21.90    (20.9%)           11.77     (7.9%)     -46.2%  ( -62% - -21%)  0.000
  BrowseDayOfYearTaxoFacets          21.88    (21.1%)           11.83     (8.1%)     -45.9%  ( -62% - -21%)  0.000
BrowseRandomLabelTaxoFacets          18.22    (17.8%)            9.96     (6.8%)     -45.3%  ( -59% - -25%)  0.000
```
[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
rmuir commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r785161786

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java (same diff hunk as quoted above)

Review comment: I would make these simple static methods.
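The static-method extraction being discussed can be sketched roughly as below. This is a hedged, self-contained illustration, not the actual PR code: `DocValueIterator`, `countAll`, `countLive`, and `fromArray` are hypothetical stand-ins for Lucene's `NumericDocValues`/`DocIdSetIterator` idiom, and `IntPredicate` stands in for `Bits liveDocs`. The point is simply that each hot loop becomes a small static method with no null check inside it.

```java
import java.util.function.IntPredicate;

public class CountAllSketch {
    // Sentinel mirroring DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal stand-in for NumericDocValues: iterate docs, one long value each.
    interface DocValueIterator {
        int nextDoc();     // returns NO_MORE_DOCS when exhausted
        long longValue();  // value for the current doc
    }

    // Hot loop with no liveDocs check at all, as a simple static method.
    static void countAll(DocValueIterator it, int[] values) {
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            values[(int) it.longValue()]++;
        }
    }

    // Variant that filters deleted docs, also a simple static method.
    static void countLive(DocValueIterator it, IntPredicate liveDocs, int[] values) {
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            if (liveDocs.test(doc)) {
                values[(int) it.longValue()]++;
            }
        }
    }

    // Tiny array-backed iterator so the sketch runs standalone.
    static DocValueIterator fromArray(long[] vals) {
        return new DocValueIterator() {
            int doc = -1;
            public int nextDoc() { return ++doc < vals.length ? doc : NO_MORE_DOCS; }
            public long longValue() { return vals[doc]; }
        };
    }

    public static void main(String[] args) {
        int[] values = new int[4];
        countAll(fromArray(new long[] {0, 1, 1, 3}), values);
        System.out.println(values[1]); // prints 2
    }
}
```

The caller does the `liveDocs != null` branch once and dispatches to one of the two methods, so each loop body stays small and monomorphic.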
[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
rmuir commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r785162808

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java (same diff hunk as quoted above)

Review comment: See the Solr example again; just like those methods there. Instance methods are probably no good in facets: there are many abstractions, which probably just drives the compiler more crazy.
[GitHub] [lucene] gsmiller commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
gsmiller commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r785195850

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java (same diff hunk as quoted above)

Review comment: Thanks @rmuir. Just to make sure we're on the same page, the main difference is just moving to `static` methods, right? I've made that change and I'm still seeing serious regressions. Anything else I might be missing here? Thanks again for your thoughts!

```
                       Task  QPS baseline  StdDev  QPS candidate  StdDev      Pct diff  p-value
      BrowseMonthTaxoFacets         29.96 (14.4%)          12.03  (1.6%)  -59.8% ( -66% - -51%)  0.000
       BrowseDateTaxoFacets         22.78 (13.5%)          12.33  (5.5%)  -45.9% ( -57% - -31%)  0.000
  BrowseDayOfYearTaxoFacets         22.77 (13.8%)          12.41  (5.6%)  -45.5% ( -57% - -30%)  0.000
BrowseRandomLabelTaxoFacets         19.00 (10.7%)          10.41  (5.1%)  -45.2% ( -55% - -32%)  0.000
```
[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
rmuir commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r785197508

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java (same diff hunk as quoted above)

Review comment: We still have the issue of inconsistent loop types between while and for loops? Maybe now that the accumulators are shared, it becomes more of a problem?
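The two loop shapes in question are semantically identical; the concern is that mixing them across call sites gives the JIT differently shaped loop bodies to profile and inline. A minimal sketch of the two equivalent forms (hypothetical `Docs` interface standing in for Lucene's `DocIdSetIterator` idiom):

```java
public class LoopShapes {
    // Sentinel mirroring DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    interface Docs { int nextDoc(); }

    // The for-loop shape used in FastTaxonomyFacetCounts.
    static int sumFor(Docs docs) {
        int sum = 0;
        for (int doc = docs.nextDoc(); doc != NO_MORE_DOCS; doc = docs.nextDoc()) {
            sum += doc;
        }
        return sum;
    }

    // The equivalent while-loop shape seen elsewhere.
    static int sumWhile(Docs docs) {
        int sum = 0;
        int doc = docs.nextDoc();
        while (doc != NO_MORE_DOCS) {
            sum += doc;
            doc = docs.nextDoc();
        }
        return sum;
    }

    // Tiny array-backed iterator so the sketch runs standalone.
    static Docs of(int... ids) {
        return new Docs() {
            int i = 0;
            public int nextDoc() { return i < ids.length ? ids[i++] : NO_MORE_DOCS; }
        };
    }

    public static void main(String[] args) {
        System.out.println(sumFor(of(1, 2, 3)) == sumWhile(of(1, 2, 3))); // prints true
    }
}
```

Both compile to the same behavior; standardizing on one shape throughout the counting paths is about giving the JIT a consistent pattern, not about correctness.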
[GitHub] [lucene] rmuir commented on a change in pull request #606: LUCENE-10380: Further optimize FastTaxonomyFacetCounts#countAll by moving the liveDocs null check outside the loops
rmuir commented on a change in pull request #606: URL: https://github.com/apache/lucene/pull/606#discussion_r785220678

## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java (same diff hunk as quoted above)

Review comment: Also, is there really a reason anymore to have count vs countAll? They look the same to me. The only difference is the liveDocs check, which is shown to do nothing. So if we remove the liveDocs specialization, and remove the count-vs-countAll specialization, it should start to be a bit more manageable?
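The idea above, dropping both the liveDocs specialization and the count-vs-countAll split, reduces to a single shared counting loop. A hedged sketch (hypothetical names, not the actual Lucene code): the caller decides which ordinals to feed in — all docs for `countAll`, only matching docs for `count` — and the loop itself carries no specialization at all.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;

public class UnifiedCount {
    // One loop serving both paths: the iterator yields the ordinal
    // values of whichever docs the caller wants counted.
    static void countInto(PrimitiveIterator.OfLong ordinals, int[] values) {
        while (ordinals.hasNext()) {
            values[(int) ordinals.nextLong()]++;
        }
    }

    public static void main(String[] args) {
        int[] values = new int[3];
        // "countAll" path: every doc's ordinal.
        countInto(Arrays.stream(new long[] {0, 2, 2}).iterator(), values);
        // "count" path: only ordinals of matching docs.
        countInto(Arrays.stream(new long[] {1}).iterator(), values);
        System.out.println(values[2]); // prints 2
    }
}
```

With one loop body shared by both paths, there is only one shape for the JIT to compile, which is the manageability rmuir is pointing at.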