[GitHub] [lucene] uschindler commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
uschindler commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r873128714 ## lucene/CHANGES.txt: ## @@ -211,7 +211,9 @@ Build * Upgrade forbiddenapis to version 3.3. (Uwe Schindler) -* LUCENE-10532: Remove LuceneTestCase.Slow annotation. ALl tests can be fast. (Robert Muir) +* Detect CI builds on Github or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss) Review Comment: I was not sure if this was already implemented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
uschindler commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r873138044 ## lucene/CHANGES.txt: ## @@ -211,7 +211,9 @@ Build * Upgrade forbiddenapis to version 3.3. (Uwe Schindler) -* LUCENE-10532: Remove LuceneTestCase.Slow annotation. ALl tests can be fast. (Robert Muir) +* Detect CI builds on Github or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss) Review Comment: Done in fcc6de1a1f39c2aed4e69659b9e49976a8cc1626 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
mocobeta commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r873140596 ## lucene/CHANGES.txt: ## @@ -211,7 +211,9 @@ Build * Upgrade forbiddenapis to version 3.3. (Uwe Schindler) -* LUCENE-10532: Remove LuceneTestCase.Slow annotation. ALl tests can be fast. (Robert Muir) +* Detect CI builds on Github or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss) Review Comment: Ah yes, I think few people notice it... although I once announced that in the dev list. I think we'd need leading cases to extend it within devs (we sometimes encounter situations where a pull request is sufficient, I hope reducing minor annoyance in opening a Jira in such cases saves developers' time - and improve experiences a little). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
mocobeta commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r873140812 ## lucene/CHANGES.txt: ## @@ -211,7 +211,9 @@ Build * Upgrade forbiddenapis to version 3.3. (Uwe Schindler) -* LUCENE-10532: Remove LuceneTestCase.Slow annotation. ALl tests can be fast. (Robert Muir) +* Detect CI builds on Github or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss) Review Comment: Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
mocobeta commented on code in PR #884: URL: https://github.com/apache/lucene/pull/884#discussion_r873143938 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -6198,14 +6198,15 @@ public synchronized DocStats getDocStats() { /** DocStats for this index */ public static final class DocStats { /** - * The total number of docs in this index, including docs not yet flushed (still in the RAM - * buffer), not counting deletions. + * The total number of docs in this index, counting docs not yet flushed (still in the RAM + * buffer), and counting deleted docs. NOTE: buffered deletions are not counted. If you Review Comment: Minor suggestion - `and also counting deleted docs` may be a bit clear as suggested in https://github.com/apache/lucene/pull/884#discussion_r871492434. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
mocobeta commented on code in PR #884: URL: https://github.com/apache/lucene/pull/884#discussion_r873144139 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -6199,13 +6199,14 @@ public synchronized DocStats getDocStats() { public static final class DocStats { /** * The total number of docs in this index, including docs not yet flushed (still in the RAM - * buffer), not counting deletions. + * buffer), and including deletions. NOTE: buffered deletions are not counted. If you + * really need these to be counted you should call {@link IndexWriter#commit()} first. */ public final int maxDoc; + /** * The total number of docs in this index, including docs not yet flushed (still in the RAM - * buffer), and including deletions. NOTE: buffered deletions are not counted. If you - * really need these to be counted you should call {@link IndexWriter#commit()} first. + * buffer), not counting deletions. Review Comment: @wormday Sorry for the delay in responding. Looks good to me. I left a minor comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a diff in pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
mocobeta commented on code in PR #884: URL: https://github.com/apache/lucene/pull/884#discussion_r873143938 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -6198,14 +6198,15 @@ public synchronized DocStats getDocStats() { /** DocStats for this index */ public static final class DocStats { /** - * The total number of docs in this index, including docs not yet flushed (still in the RAM - * buffer), not counting deletions. + * The total number of docs in this index, counting docs not yet flushed (still in the RAM + * buffer), and counting deleted docs. NOTE: buffered deletions are not counted. If you Review Comment: Minor suggestion - `and also counting deleted docs` may be a bit clearer as suggested in https://github.com/apache/lucene/pull/884#discussion_r871492434. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] wormday commented on a diff in pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
wormday commented on code in PR #884: URL: https://github.com/apache/lucene/pull/884#discussion_r873248454 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -6198,14 +6198,15 @@ public synchronized DocStats getDocStats() { /** DocStats for this index */ public static final class DocStats { /** - * The total number of docs in this index, including docs not yet flushed (still in the RAM - * buffer), not counting deletions. + * The total number of docs in this index, counting docs not yet flushed (still in the RAM + * buffer), and counting deleted docs. NOTE: buffered deletions are not counted. If you Review Comment: @mocobeta You're right! I have alerdy corrected. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
mocobeta commented on PR #884: URL: https://github.com/apache/lucene/pull/884#issuecomment-1127136929 @wormday looks like the gradle check fails due to the invalid Javadocs formatting. Can you apply `gradlew tidy`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn opened a new pull request, #891: LUCENE-10411: (Backport) Add NN vectors support to ExitableDirectoryReader
zacharymorn opened a new pull request, #891: URL: https://github.com/apache/lucene/pull/891 Backporting https://github.com/apache/lucene/pull/833 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn merged pull request #891: LUCENE-10411: (Backport) Add NN vectors support to ExitableDirectoryReader
zacharymorn merged PR #891: URL: https://github.com/apache/lucene/pull/891 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10411) Add NN vectors support to ExitableDirectoryReader
[ https://issues.apache.org/jira/browse/LUCENE-10411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537300#comment-17537300 ] ASF subversion and git services commented on LUCENE-10411: -- Commit 916da0fdaab6ffecd98fc46f540d8c26a15960f2 in lucene's branch refs/heads/branch_9x from zacharymorn [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=916da0fdaab ] LUCENE-10411: Add NN vectors support to ExitableDirectoryReader (#833) (#891) (cherry picked from commit 96036bca9f667edbdc528bfe95eeb2795526e9fa) > Add NN vectors support to ExitableDirectoryReader > - > > Key: LUCENE-10411 > URL: https://issues.apache.org/jira/browse/LUCENE-10411 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 3.5h > Remaining Estimate: 0h > > This is currently unsupported. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn commented on pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader
zacharymorn commented on PR #833: URL: https://github.com/apache/lucene/pull/833#issuecomment-1127149687 > @zacharymorn do you plan to backport this to `branch_9x`? It'd be nice to do that soon as the 9.2 release is coming up (and we will be cutting a new branch any day now). Hi @jtibshirani, sorry for the delay on this! I have created and merged the backporting PR into `branch_9x`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] wormday commented on pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
wormday commented on PR #884: URL: https://github.com/apache/lucene/pull/884#issuecomment-1127161146 @mocobeta I'll pay attention next time, sorry to inconvenience you. I submitted the code again -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
mocobeta commented on PR #884: URL: https://github.com/apache/lucene/pull/884#issuecomment-1127210543 No problem, CI workflows report problems on behalf of humans. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta merged pull request #884: LUCENE-10568: fix javadocs errors in IndexWriter.DocStats
mocobeta merged PR #884: URL: https://github.com/apache/lucene/pull/884 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10568) Javadocs errors in IndexWriter.DocStats
[ https://issues.apache.org/jira/browse/LUCENE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537332#comment-17537332 ] ASF subversion and git services commented on LUCENE-10568: -- Commit de8a6998d7e9277632f1621f10150f338202383d in lucene's branch refs/heads/main from sunwq [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=de8a6998d7e ] LUCENE-10568: fix javadocs errors in IndexWriter.DocStats (#884) > Javadocs errors in IndexWriter.DocStats > --- > > Key: LUCENE-10568 > URL: https://issues.apache.org/jira/browse/LUCENE-10568 > Project: Lucene - Core > Issue Type: Bug > Components: general/javadocs >Affects Versions: 8.0, 9.1 >Reporter: sun wuqiang >Priority: Trivial > Attachments: Image 007.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > *org.apache.lucene.index.IndexWriter.DocStats* > This class has two fields > The field maxDoc should contain numDeletedDocs, and numDocs does not contain > numDeletedDocs > However, the javadocs are just the opposite. > !Image 007.png! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10568) Javadocs errors in IndexWriter.DocStats
[ https://issues.apache.org/jira/browse/LUCENE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537333#comment-17537333 ] ASF subversion and git services commented on LUCENE-10568: -- Commit 47f960ab8c735dacaa322b0ffeab51931536f32f in lucene's branch refs/heads/branch_9x from sunwq [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=47f960ab8c7 ] LUCENE-10568: fix javadocs errors in IndexWriter.DocStats (#884) > Javadocs errors in IndexWriter.DocStats > --- > > Key: LUCENE-10568 > URL: https://issues.apache.org/jira/browse/LUCENE-10568 > Project: Lucene - Core > Issue Type: Bug > Components: general/javadocs >Affects Versions: 8.0, 9.1 >Reporter: sun wuqiang >Priority: Trivial > Attachments: Image 007.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > *org.apache.lucene.index.IndexWriter.DocStats* > This class has two fields > The field maxDoc should contain numDeletedDocs, and numDocs does not contain > numDeletedDocs > However, the javadocs are just the opposite. > !Image 007.png! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10568) Javadocs errors in IndexWriter.DocStats
[ https://issues.apache.org/jira/browse/LUCENE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida resolved LUCENE-10568. Fix Version/s: 10.0 (main) 9.2 Resolution: Fixed > Javadocs errors in IndexWriter.DocStats > --- > > Key: LUCENE-10568 > URL: https://issues.apache.org/jira/browse/LUCENE-10568 > Project: Lucene - Core > Issue Type: Bug > Components: general/javadocs >Affects Versions: 8.0, 9.1 >Reporter: sun wuqiang >Priority: Trivial > Fix For: 10.0 (main), 9.2 > > Attachments: Image 007.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > *org.apache.lucene.index.IndexWriter.DocStats* > This class has two fields > The field maxDoc should contain numDeletedDocs, and numDocs does not contain > numDeletedDocs > However, the javadocs are just the opposite. > !Image 007.png! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r873326117 ## lucene/facet/src/java/org/apache/lucene/facet/hyperrectangle/DoubleHyperRectangle.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.hyperrectangle; + +import org.apache.lucene.util.NumericUtils; + +/** Stores a hyper rectangle as an array of DoubleRangePairs */ +public class DoubleHyperRectangle extends HyperRectangle { + + /** Stores pair as LongRangePair */ + private final DoubleRangePair[] pairs; + + /** Created DoubleHyperRectangle */ + public DoubleHyperRectangle(String label, DoubleRangePair... pairs) { +super(label, pairs.length); +this.pairs = pairs; + } + + @Override + public LongHyperRectangle.LongRangePair getComparableDimRange(int dim) { +long longMin = NumericUtils.doubleToSortableLong(pairs[dim].min); +long longMax = NumericUtils.doubleToSortableLong(pairs[dim].max); +return new LongHyperRectangle.LongRangePair(longMin, true, longMax, true); + } + + /** Defines a single range in a DoubleHyperRectangle */ + public static class DoubleRangePair { +/** Inclusive min */ +public final double min; + +/** Inclusive max */ +public final double max; + +/** + * Creates a LongRangePair, very similar to the constructor of {@link + * org.apache.lucene.facet.range.DoubleRange} + * + * @param minIn Min value of pair + * @param minInclusive If minIn is inclusive + * @param maxIn Max value of pair + * @param maxInclusive If maxIn is inclusive + */ +public DoubleRangePair(double minIn, boolean minInclusive, double maxIn, boolean maxInclusive) { + if (Double.isNaN(minIn)) { +throw new IllegalArgumentException("min cannot be NaN"); + } + if (!minInclusive) { +minIn = Math.nextUp(minIn); + } + + if (Double.isNaN(maxIn)) { Review Comment: Move this check before `is (!minInclusive)` and I suggest unifying both checks like this: ``` if (Double.isNaN(minIn) || Double.isNaN(maxIn)) { throw new IllegalArgumentException("min and max cannot be NaN: min=" + minIn + ", max=" + maxIn); } ``` ## lucene/facet/src/java/org/apache/lucene/facet/hyperrectangle/HyperRectangleFacetCounts.java: ## @@ -0,0 +1,163 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.hyperrectangle; + +import java.io.IOException; +import java.util.Collections; +import java.util.List; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.Facets; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.index.BinaryDocValues; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.search.DocIdSetIterator; + +/** Get counts given a list of HyperRectangles (which must be of the same type) */ +public class HyperRectangleFacetCounts extends Facets { + /** Hypper rectangles passed to constructor. */ + protected final HyperRectangle[] hyperRectangles; + + /** Counts, initialized in by subclass. */ Review Comment: Either remove "in" or "by", I think one of them is redundant ## lucene/facet/src/test/org/apache/lucene/facet/hyperrectangle/TestHyperRectangleFacetCounts.java: ## @@ -0,0 +1,189 @@ +/* + *
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537353#comment-17537353 ] Dawid Weiss commented on LUCENE-10572: -- As much as I love BE (long live M68k), I think it's practically dead, so I think LE is a fine choice. > Ever tried to type a single word that is 128 chars long? One thing I'd be afraid of is that users index all sorts of non-language tokens and these can grow longer than the default of 128 chars. I have implemented a similar byte-fragment storage class in the past without using explicit length fragments at all - the difference in consecutive element offsets was used to compute the length. This does have potential drawbacks but it was fast in practice. I can dig it out from the closet if you like. > Can we optimize BytesRefHash? > - > > Key: LUCENE-10572 > URL: https://issues.apache.org/jira/browse/LUCENE-10572 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I was poking around in our nightly benchmarks > ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR > profiling that the hottest method is this: > {noformat} > PERCENT CPU SAMPLES STACK > 9.28% 53848 org.apache.lucene.util.BytesRefHash#equals() > at > org.apache.lucene.util.BytesRefHash#findHash() > at org.apache.lucene.util.BytesRefHash#add() > at > org.apache.lucene.index.TermsHashPerField#add() > at > org.apache.lucene.index.IndexingChain$PerField#invert() > at > org.apache.lucene.index.IndexingChain#processField() > at > org.apache.lucene.index.IndexingChain#processDocument() > at > org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat} > This is kinda crazy – comparing if the term to be inserted into the inverted > index hash equals the term already added to {{BytesRefHash}} is the hottest > method during nightly benchmarks. > Discussing offline with [~rcmuir] and [~jpountz] they noticed a few > questionable things about our current implementation: > * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the > inserted term into the hash? Let's just use two bytes always, since IW > limits term length to 32 K (< 64K that an unsigned short can cover) > * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} > (BitUtil.VH_BE_SHORT.get) > * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not > aggressive enough? Or the initial sizing of the hash is too small? > * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too > many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible > "upgrades"? > * If we stick with {{{}MurmurHash{}}}, why are we using the 32 bit version > ({{{}murmurhash3_x86_32{}}})? > * Are we using the JVM's intrinsics to compare multiple bytes in a single > SIMD instruction ([~rcmuir] is quite sure we are indeed)? > * [~jpountz] suggested maybe the hash insert is simply memory bound > * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total > CPU cost) > I pulled these observations from a recent (5/6/22) profiler output: > [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html] > Maybe we can improve our performance on this crazy hotspot? > Or maybe this is a "healthy" hotspot and we should leave it be! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537354#comment-17537354 ] Dawid Weiss commented on LUCENE-10572: -- This is the issue I filed it under, actually - note it's old, old... but the ideas may be worth revisiting. https://issues.apache.org/jira/browse/LUCENE-5854 > Can we optimize BytesRefHash? > - > > Key: LUCENE-10572 > URL: https://issues.apache.org/jira/browse/LUCENE-10572 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I was poking around in our nightly benchmarks > ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR > profiling that the hottest method is this: > {noformat} > PERCENT CPU SAMPLES STACK > 9.28% 53848 org.apache.lucene.util.BytesRefHash#equals() > at > org.apache.lucene.util.BytesRefHash#findHash() > at org.apache.lucene.util.BytesRefHash#add() > at > org.apache.lucene.index.TermsHashPerField#add() > at > org.apache.lucene.index.IndexingChain$PerField#invert() > at > org.apache.lucene.index.IndexingChain#processField() > at > org.apache.lucene.index.IndexingChain#processDocument() > at > org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat} > This is kinda crazy – comparing if the term to be inserted into the inverted > index hash equals the term already added to {{BytesRefHash}} is the hottest > method during nightly benchmarks. > Discussing offline with [~rcmuir] and [~jpountz] they noticed a few > questionable things about our current implementation: > * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the > inserted term into the hash? Let's just use two bytes always, since IW > limits term length to 32 K (< 64K that an unsigned short can cover) > * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} > (BitUtil.VH_BE_SHORT.get) > * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not > aggressive enough? Or the initial sizing of the hash is too small? > * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too > many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible > "upgrades"? > * If we stick with {{{}MurmurHash{}}}, why are we using the 32 bit version > ({{{}murmurhash3_x86_32{}}})? > * Are we using the JVM's intrinsics to compare multiple bytes in a single > SIMD instruction ([~rcmuir] is quite sure we are indeed)? > * [~jpountz] suggested maybe the hash insert is simply memory bound > * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total > CPU cost) > I pulled these observations from a recent (5/6/22) profiler output: > [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html] > Maybe we can improve our performance on this crazy hotspot? > Or maybe this is a "healthy" hotspot and we should leave it be! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org