[GitHub] [lucene] jpountz commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
jpountz commented on code in PR #972: URL: https://github.com/apache/lucene/pull/972#discussion_r909281863 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Comparator; +import java.util.LinkedList; +import java.util.List; + +/** Scorer implementing Block-Max Maxscore algorithm */ +public class BlockMaxMaxscoreScorer extends Scorer { + // current doc ID of the leads + private int doc; + + // doc id boundary that all scorers maxScore are valid + private int upTo = -1; Review Comment: Nit: it's inconsistent that `upTo` gets initialized here while `doc` is initialized in the constructor. ## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Comparator; +import java.util.LinkedList; +import java.util.List; + +/** Scorer implementing Block-Max Maxscore algorithm */ +public class BlockMaxMaxscoreScorer extends Scorer { + // current doc ID of the leads + private int doc; + + // doc id boundary that all scorers maxScore are valid + private int upTo = -1; + + // heap of scorers ordered by doc ID + private final DisiPriorityQueue essentialsScorers; + // list of scorers ordered by maxScore + private final LinkedList maxScoreSortedEssentialScorers; + + private final DisiWrapper[] allScorers; + + // sum of max scores of scorers in nonEssentialScorers list + private float nonEssentialMaxScoreSum; + + private long cost; + + private final MaxScoreSumPropagator maxScoreSumPropagator; + + // scaled min competitive score + private float minCompetitiveScore = 0; + + private int cachedScoredDoc = -1; + private float cachedScore = 0; + + /** + * Constructs a Scorer that scores doc based on Block-Max-Maxscore (BMM) algorithm + * http://engineering.nyu.edu/~suel/papers/bmm.pdf . This algorithm has lower overhead compared to + * WANDScorer, and could be used for simple disjunction queries. + * + * @param weight The weight to be used. 
+ * @param scorers The sub scorers this Scorer should iterate on for optional clauses + */ + public BlockMaxMaxscoreScorer(Weight weight, List scorers) throws IOException { +super(weight); + +this.doc = -1; +this.allScorers = new DisiWrapper[scorers.size()]; +this.essentialsScorers = new DisiPriorityQueue(scorers.size()); +this.maxScoreSortedEssentialScorers = new LinkedList<>(); + +long cost = 0; +for (int i = 0; i < scorers.size(); i++) { + DisiWrapper w = new DisiWrapper(scorers.get(i)); + cost += w.cost; + allScorers[i] = w; +} + +this.cost = cost; +maxScoreSumPropagator = new MaxScoreSumPropagator(scorers); + } + + @Override + public DocIdSetIterator iterator() { +// twoPhaseIterator needed to honor scorer.setMinCompetitiveScore guarantee +return TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator()); + } + + @Override + public TwoPhaseIterator twoPhaseIterator() { +DocIdSetIterator approximation = +new D
[jira] [Created] (LUCENE-10630) error: 'gmtime' was not declared in this scope; did you mean 'getTime'?
Martin Liška created an issue

Lucene - Core / LUCENE-10630
error: 'gmtime' was not declared in this scope; did you mean 'getTime'?

Issue Type: Bug
Assignee: Unassigned
Created: 29/Jun/22 08:57
Priority: Major
Reporter: Martin Liška

This happens with GCC 13 or with the current GCC-12 branch:

cd /home/abuild/rpmbuild/BUILD/clucene-core-2.3.3.4/build/src/core && /usr/bin/c++ -DMAKE_CLUCENE_CORE_LIB -Dclucene_core_EXPORTS -I/home/abuild/rpmbuild/BUILD/clucene-core-2.3.3.4/src/shared -I/home/abuild/rpmbuild/BUILD/clucene-core-2.3.3.4/build/src/shared -I/home/abuild/rpmbuild/BUILD/clucene-core-2.3.3.4/src/core -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -fPIC -ansi -O2 -g -DNDEBUG -fPIC -D_REENTRANT -D_UCS2 -D_UNICODE -MD -MT src/core/CMakeFiles/clucene-core.dir/CLucene/queryParser/MultiFieldQueryParser.o -MF CMakeFiles/clucene-core.dir/CLucene/queryParser/MultiFieldQueryParser.o.d -o CMakeFiles/clucene-core.dir/CLucene/queryParser/MultiFieldQueryParser.o -c /home/abuild/rpmbuild/BUILD/clucene-core-2.3.3.4/src/core/CLucene/queryParser/MultiFieldQueryParser.cpp
...
/home/abuild/rpmbuild/BUILD/clucene-core-2.3.3.4/src/core/CLucene/document/DateTools.cpp: In static member function 'static void lucene::document::DateTools::timeToString(int64_t, Resolution, TCHAR*, size_t)':
/home/abuild/rpmbuild/BUILD/clucene-core-2.3.3.4/src/core/CLucene/document/DateTools.cpp:26:19: error: 'gmtime' was not declared in this scope; did you mean 'getTime'?
   26 |     tm *ptm = gmtime(&secs);
      |               ^~
      |               getTime

The cause is a missing system header, `time.h`; please include it.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira I asked INFRA to create a new repo for archiving attachments (INFRA-23426) and was pointed to the self-service toolset. It seems a new repo can be created here, but I can't fill in the mandatory "Project" field (I see "lucene" in the list but can't select it): https://gitbox.apache.org/boxer/?action= I'm not sure whether this is due to my account role (committer). Could anyone take a look at the tool and, if possible, create a new repository named lucene-jira-archive? This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Done. Your repository has been created and will be available for use within a few minutes. Your project is available on gitbox at: https://gitbox.apache.org/repos/asf/lucene-jira-archive.git Your project is available on GitHub at: https://github.com/apache/lucene-jira-archive.git User permissions should be set up within the next five minutes. If not, please let us know at: us...@infra.apache.org
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Hi Tomoko, I am able to create repos: Will now create the issue.
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler updated an issue Lucene - Core / LUCENE-10557 Migrate to GitHub issue from Jira Change By: Uwe Schindler Attachment: screenshot-1.png
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler edited a comment on LUCENE-10557 Re: Migrate to GitHub issue from Jira Hi Tomoko, I am able to create repos: !screenshot-1.png|width=720! Will now create the repo.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira LOL. I got a message that it already exists.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
[jira] [Commented] (LUCENE-10593) VectorSimilarityFunction reverse removal
ASF subversion and git services commented on LUCENE-10593 Re: VectorSimilarityFunction reverse removal

Commit b3b7098cd9636c5ad2516055f768dd29b795a05d in lucene's branch refs/heads/branch_9x from Alessandro Benedetti [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b3b7098cd96 ]

LUCENE-10593: VectorSimilarityFunction reverse removal (#926)
- Vector Similarity Function reverse property removed
- NeighborQueue tie-breaking fixed (node id + node score encoding)
- NeighborQueue readability refactor
- BoundChecker removal (now it's only in backward-codecs)
[jira] [Updated] (LUCENE-10593) VectorSimilarityFunction reverse removal
Alessandro Benedetti updated an issue Lucene - Core / LUCENE-10593 VectorSimilarityFunction reverse removal Change By: Alessandro Benedetti Fix Version/s: 9.3
[jira] [Assigned] (LUCENE-10593) VectorSimilarityFunction reverse removal
Alessandro Benedetti assigned an issue to Alessandro Benedetti Lucene - Core / LUCENE-10593 VectorSimilarityFunction reverse removal Change By: Alessandro Benedetti Assignee: Alessandro Benedetti
[jira] [Resolved] (LUCENE-10593) VectorSimilarityFunction reverse removal
Alessandro Benedetti resolved as Fixed Lucene - Core / LUCENE-10593 VectorSimilarityFunction reverse removal Change By: Alessandro Benedetti Resolution: Fixed Status: Open → Resolved
[GitHub] [lucene] alessandrobenedetti commented on pull request #926: VectorSimilarityFunction reverse removal
alessandrobenedetti commented on PR #926: URL: https://github.com/apache/lucene/pull/926#issuecomment-1169764171 Done, everything is merged and backported to 9.x, thanks for your support! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Uwe Schindler Dawid Weiss thank you both! I was able to push the first commit. https://github.com/apache/lucene-jira-archive Looks like watchers are inherited from apache/lucene ...
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira I set https://github.com/apache/lucene-jira-archive/blob/main/.asf.yaml not to send notifications to mail groups. It looks like all updates in the repository still trigger notifications to d...@lucene.apache.org (the initial setting when creating the repo?). Could anybody mute this?
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss updated an issue Lucene - Core / LUCENE-10557 Migrate to GitHub issue from Jira Change By: Dawid Weiss Attachment: image-2022-06-29-13-36-57-365.png
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira https://gitbox.apache.org/schemes.cgi?lucene-jira-archive Something seems wrong. According to https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features, the update should be approved via an e-mail sent to the private mailing list. I don't see any such e-mail yet.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Thanks for the information. Could something in the automation system be delayed? I'll check the status page again later.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
> Looks like all updates in the repository are still noticed in d...@lucene.apache.org (initial setting when creating the repo?). Could anybody mute this?
d...@lucene.apache.org and comm...@lucene.apache.org were selected as defaults when creating the repo (see my screenshot above). Actually the PR/issue list should have been issues@lucene.apache.org, but for this case it should be completely silent. There seems to be some delay; maybe ask on Slack's infra channel.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Michael McCandless Just for your information, we now have a public ASF repository https://github.com/apache/lucene-jira-archive for the migration and I pushed the migration scripts there to develop/archive it under Apache. I also opened a few issues for it.
> Tomoko Uchida could you share the source code of the import tool you are working on? Maybe post it in a personal public GitHub repo? We all can try to make PRs / review
[GitHub] [lucene] gsmiller opened a new pull request, #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code
gsmiller opened a new pull request, #995: URL: https://github.com/apache/lucene/pull/995 This migrates the remaining production code to use `SSDV#docValueCount` for iteration, getting us closer to removing support for `NO_MORE_ORDS` in `SSDV#nextOrd`. Test-related code still needs to be updated.
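The two iteration styles mentioned in this PR description can be contrasted with a minimal, self-contained sketch. `ToyOrds` below is a hypothetical stand-in written for this example, not Lucene's `SortedSetDocValues`; it only mimics the three members named above (`nextOrd`, `docValueCount`, `NO_MORE_ORDS`), and the actual changes in the PR may look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; NOT Lucene's SortedSetDocValues. */
final class ToyOrds {
  static final long NO_MORE_ORDS = -1;
  private final long[] ords;
  private int next;

  ToyOrds(long... ords) {
    this.ords = ords.clone();
  }

  /** Number of ordinals available for the current (only) document. */
  int docValueCount() {
    return ords.length;
  }

  /** Returns the next ordinal, or the sentinel once all ordinals are consumed. */
  long nextOrd() {
    return next < ords.length ? ords[next++] : NO_MORE_ORDS;
  }
}

final class OrdIteration {
  /** Sentinel-based style: loop until nextOrd() returns NO_MORE_ORDS. */
  static List<Long> collectUntilSentinel(ToyOrds values) {
    List<Long> out = new ArrayList<>();
    for (long ord = values.nextOrd(); ord != ToyOrds.NO_MORE_ORDS; ord = values.nextOrd()) {
      out.add(ord);
    }
    return out;
  }

  /** Count-based style: call nextOrd() exactly docValueCount() times, never reading the sentinel. */
  static List<Long> collectByCount(ToyOrds values) {
    List<Long> out = new ArrayList<>();
    int count = values.docValueCount();
    for (int i = 0; i < count; i++) {
      out.add(values.nextOrd());
    }
    return out;
  }
}
```

Because the count-based loop never reads the sentinel, callers written this way would keep working if the sentinel were later removed, which appears to be the motivation stated in the PR.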
[GitHub] [lucene] gsmiller commented on pull request #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code
gsmiller commented on PR #995: URL: https://github.com/apache/lucene/pull/995#issuecomment-1169931258 Hmm, something's busted with my changes to `CheckIndex`. Will dig in shortly.
[jira] [Created] (LUCENE-10631) Consolidate java version numbers in one place and reuse them across build parts
Dawid Weiss created an issue

Lucene - Core / LUCENE-10631
Consolidate java version numbers in one place and reuse them across build parts

Issue Type: Sub-task
Assignee: Unassigned
Created: 29/Jun/22 12:43
Priority: Minor
Reporter: Dawid Weiss

[R. Muir / mailing list discussions] Ideally we could consolidate a lot of them in a simple .properties file that contains the min/max major version numbers. It could then be pulled in by:
- gradle logic
- java logic such as checks done in WrapperDownloader
- bash logic such as error messaging in ./gradlew.sh
- python smoketester logic?
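As a rough illustration of the Java side of this idea, the sketch below parses min/max major versions from `.properties` text and checks a version against them. The class name `JavaVersionBounds`, the keys `java.version.min` and `java.version.max`, and the values 11 and 17 are all assumptions made for this example; the issue does not specify a file layout, and this is not how `WrapperDownloader` is implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper; class name and property keys are illustrative, not part of the Lucene build. */
final class JavaVersionBounds {
  final int minMajor;
  final int maxMajor;

  JavaVersionBounds(int minMajor, int maxMajor) {
    if (minMajor > maxMajor) {
      throw new IllegalArgumentException("min " + minMajor + " exceeds max " + maxMajor);
    }
    this.minMajor = minMajor;
    this.maxMajor = maxMajor;
  }

  /** Parses the text of a .properties file holding the two (assumed) keys. */
  static JavaVersionBounds parse(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new JavaVersionBounds(
        requiredInt(props, "java.version.min"), requiredInt(props, "java.version.max"));
  }

  private static int requiredInt(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null) {
      throw new IllegalArgumentException("missing property: " + key);
    }
    return Integer.parseInt(value.trim());
  }

  /** True if the given major version lies within the inclusive bounds. */
  boolean supports(int major) {
    return major >= minMajor && major <= maxMajor;
  }

  public static void main(String[] args) {
    // Illustrative values only; the real bounds would live in the shared file.
    JavaVersionBounds bounds = parse("java.version.min=11\njava.version.max=17\n");
    int running = Runtime.version().feature(); // major version of the running JVM (Java 10+)
    System.out.println("Java " + running + " supported: " + bounds.supports(running));
  }
}
```

A flat key=value file is attractive here because the other consumers listed in the issue (Gradle, bash, Python) can read the same format with very little code.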
[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing
Mayya Sharipova commented on LUCENE-10592 Re: Should we build HNSW graph on the fly during indexing PR: https://github.com/apache/lucene/pull/992
[jira] [Commented] (LUCENE-10630) error: 'gmtime' was not declared in this scope; did you mean 'getTime'?
Alan Woodward commented on LUCENE-10630 Re: error: 'gmtime' was not declared in this scope; did you mean 'getTime'? This is the issue tracker for the Apache Lucene project, which is written in Java. I think you want http://clucene.sourceforge.net/, which is the website for the c++ port.
[jira] [Commented] (LUCENE-10630) error: 'gmtime' was not declared in this scope; did you mean 'getTime'?
Martin Liška commented on LUCENE-10630 Re: error: 'gmtime' was not declared in this scope; did you mean 'getTime'? Oh, you are right. The C++ port bug lives here: https://sourceforge.net/p/clucene/bugs/235/
[jira] [Resolved] (LUCENE-10630) error: 'gmtime' was not declared in this scope; did you mean 'getTime'?
Alan Woodward resolved as Invalid Lucene - Core / LUCENE-10630 error: 'gmtime' was not declared in this scope; did you mean 'getTime'? Change By: Alan Woodward Resolution: Invalid Status: Open → Resolved
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira According to infra, you cannot set a personal email address in a repo's notification settings, so I changed the address to issues@. https://github.com/apache/lucene-jira-archive/blob/main/.asf.yaml Could you please check the private@ list to see whether the notification mail for review was sent there this time (or arrives shortly)?
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Michael McCandless commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
> I think I have addressed attachments.
Woot! I love seeing the attached patch file rendered inline via GitHub like that (versus downloading to my local disk in Jira)! This is awesome progress – thanks Tomoko Uchida!
> Michael McCandless Just for your information, we now have a public ASF repository https://github.com/apache/lucene-jira-archive for the migration and I pushed the migration scripts there to develop/archive it under Apache. I also opened a few issues for it.
YAY!
[GitHub] [lucene] msokolov merged pull request #927: LUCENE-10151: Adding Timeout Support to IndexSearcher
msokolov merged PR #927: URL: https://github.com/apache/lucene/pull/927
[GitHub] [lucene] msokolov commented on pull request #927: LUCENE-10151: Adding Timeout Support to IndexSearcher
msokolov commented on PR #927: URL: https://github.com/apache/lucene/pull/927#issuecomment-1170060397 I'll follow up with a CHANGES.txt and backport to 9.x
[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher
ASF subversion and git services commented on LUCENE-10151 Re: Add timeout support to IndexSearcher Commit af05550ebfe3dc1bc40aeb2318c132a9b12e37a2 in lucene's branch refs/heads/main from Deepika0510 [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=af05550ebfe ] LUCENE-10151: Adding Timeout Support to IndexSearcher (#927) Authored-by: Deepika Sharma
[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher
ASF subversion and git services commented on LUCENE-10151 Re: Add timeout support to IndexSearcher Commit 95de554b65bece9697396eeb4a5e78a8352f58d0 in lucene's branch refs/heads/main from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=95de554b65b ] CHANGES entry for LUCENE-10151
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
Michael McCandless updated an issue Lucene - Core / LUCENE-10557 Migrate to GitHub issue from Jira Change By: Michael McCandless Attachment: Screen Shot 2022-06-05 at 8.13.41 AM.png
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Michael McCandless commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira So cool! I asked for all (open and closed) issues from Tomoko Uchida's latest migration, sorting by oldest and I see all the original issues (LUCENE-1, -2, -3, etc.):
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
Michael McCandless updated an issue Lucene - Core / LUCENE-10557 Migrate to GitHub issue from Jira Change By: Michael McCandless Attachment: Screen Shot 2022-06-29 at 11.02.35 AM.png
[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher
Michael Sokolov commented on LUCENE-10151 Re: Add timeout support to IndexSearcher Thanks, Deepika Sharma. I've now merged this to main and backported it to 9.x. One oddity I noticed was a linter failure that happened on 9.x only, not on main. I don't know whether we may have relaxed some checks on main. In any case I added a patch for both branches, which is this change: https://gitbox.apache.org/repos/asf?p=lucene.git;a=commit;h=e078bc1cd9c1e647f963fbdd55cbcd4ec59fac94
[jira] [Resolved] (LUCENE-10151) Add timeout support to IndexSearcher
Michael Sokolov resolved as Fixed Lucene - Core / LUCENE-10151 Add timeout support to IndexSearcher Change By: Michael Sokolov Fix Version/s: 9.3 Resolution: Fixed Status: Open → Resolved
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira

Tomoko Uchida, this came to private@lao:

Subject: Notification schemes for lucene-jira-archive.git updated
Date: Wed, 29 Jun 2022 14:28:15 -
From: GitBox
Reply-To: priv...@lucene.apache.org
To: priv...@lucene.apache.org

The following notification schemes have been changed on lucene-jira-archive by tomoko:

adding new scheme (commits): 'comm...@lucene.apache.org'
adding new scheme (issues): 'issues@lucene.apache.org'
adding new scheme (pullrequests): 'issues@lucene.apache.org'
adding new scheme (jira_options): 'link label worklog'

With regards, ASF Infra.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira When we do the migration, we should use some "generic" user / bot account? Otherwise we have "mocobeta" linked on all issues. Maybe there's an account at INFRA for doing this. They have tokens and some bot user on GitHub that could be used for the migration. We should ask them whether they can give us a token (maybe they can create a token just for Lucene). I'd really recommend talking with them on Slack; using interfaces is a bit slow for discussing such ad hoc solutions.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Maybe a solution to silence all mails during migration would be to use a fake address under @lucene.apache.org, like nore...@lucene.apache.org. The restriction imposed by the infra automation is possibly tied to the mailing list domain, and mocob...@apache.org has the wrong mail domain.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
> Tomoko Uchida, this came to private@lao:
Thanks Uwe Schindler! Then we'll be able to use the repo to improve the migration scripts.
> nore...@lucene.apache.org.
Sounds good to me - I'll update the yaml once again.
> When we do the migration, we should use some "generic" user / bot account? Otherwise we have "mocobeta" linked on all issues
We can't run the migration job ourselves (and I don't want to use my account for it). The actual migration will be done by an INFRA account. See the Lucene.NET project: https://github.com/apache/lucenenet/issues/280 - it seems that one is still a personal account.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
> We can't run the migration job on ourselves (and I don't want to use my account for it). Actual migration will be done by an INFRA's account. See Lucene.NET project: https://github.com/apache/lucenenet/issues/280 - seems it is still a personal account.
fluxo is Chris Lambertus's private account. I don't want to use that account either; I would prefer some generic "bot" account.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira noreply@lao did not work. This time it gave an error message!
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Maybe we can ask them to manually disable notifications during the import.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira I changed the notification mails to noreply@ so that we can silence them. Could you check the notification in private@ again please, Uwe Schindler? Thanks. https://github.com/apache/lucene-jira-archive/commit/bbcc1b3a77be635b82942971150f37a076ab26b5
> Chris Lambertus (fluxo) is his private account. I don't like to use that account, too. I would prefer some generic "bot" account
I think it's against GitHub's terms of service to have multiple free accounts. I'm not sure it is possible, but if we have a paid organization account that is not tied to a person, we could ask infra whether we can use it for the migration.
> noreply@lao did not work. This time it gave an error message!
Hmm, then I'll revert the change.
> Maybe we can ask them to manually disable notifications during the import.
We verified that no notifications are sent when executing the migration scripts (importing issues and updating issues/comments), thanks to Houston and Dawid.
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida edited a comment on LUCENE-10557 Re: Migrate to GitHub issue from Jira I changed the notification mails to noreply@ so that we can silence them. Could you check the notification in private@ again please, Uwe Schindler? Thanks. https://github.com/apache/lucene-jira-archive/commit/bbcc1b3a77be635b82942971150f37a076ab26b5
> Chris Lambertus (fluxo) is his private account. I don't like to use that account, too. I would prefer some generic "bot" account
I think it's against GitHub's terms of service to have multiple free accounts. I'm not sure it is possible, but if we have a paid organization account that is not tied to a person, we could ask infra whether we can use it for the migration.
> noreply@lao did not work. This time it gave an error message!
Hmm, then I'll revert the change.
> Maybe we can ask them to manually disable notifications during the import.
In the recent migration test where all issues were migrated, we verified that no notifications are sent when executing the migration scripts (importing issues and updating issues/comments), thanks to Houston and Dawid.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira We can't use an arbitrary GitHub account for the migration, because importing/creating issues with the GitHub API requires not only an access token but also admin access to the repo, which developers are not allowed to have.
[GitHub] [lucene] jpountz opened a new pull request, #996: LUCENE-10151: Some fixes to query timeouts.
jpountz opened a new pull request, #996: URL: https://github.com/apache/lucene/pull/996 I noticed some minor bugs in the original PR #927 that this PR should fix:
- When a timeout is set, we would no longer catch `CollectionTerminatedException`.
- I added randomization to `LuceneTestCase` to randomly set a timeout, it would have caught the above bug.
- Fixed visibility of `TimeLimitingBulkScorer`.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
I found this issue is an excellent sample for testing - this includes:
- Cross-issue link
- Pull Request link
- External link (to an image)
- Attachments (images) and references to them
- Mention to Jira IDs
- Bullet list
- Code block
- Quote
So I would add a numbered list and a fake table in this comment to make this more convenient for testing. Please ignore this comment.
|| Jira || GitHub ||
| LUCENE-1 | #251 |
| LUCENE-2 | #252 |
| LUCENE-3 | #253 |
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira I know from the past that INFRA had some video conferences with GitHub representatives, so ASF is not just "some arbitrary customer". I think there were a lot of discussions going on. The LUCENE.NET import was long before they had close contact with GitHub. I would really prefer to keep all original contributors; the change of names to some private account is a real blocker to me. If we can't modify the comment/issue creator mail address to use the person's official ASF one, or use some generic bot account, I would now vote -1 on the migration. P.S.: Spring used a generic user for the import "spring-projects-issues": https://github.com/spring-projects/spring-framework/issues/created_by/spring-projects-issues
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Uwe Schindler edited a comment on LUCENE-10557 Re: Migrate to GitHub issue from Jira Spring also have a cool redirector in their webserver. It only redirects if you don't have some special param: https://jira.spring.io/browse/SPR-17649?redirect=false And they also added a comment at end of all their issues (also by the bot).
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Uwe Schindler commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira Spring also have a cool redirector in their webserver. It only redirects if you don't have some special param: https://jira.spring.io/browse/SPR-17649?redirect=false
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira P.S.: Spring used a generic user for the import "spring-projects-issues" Yes, I like it. It should be an organization account - maybe we can ask infra if we have one?
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Uwe Schindler edited a comment on LUCENE-10557 Re: Migrate to GitHub issue from Jira Spring also have a cool redirector in their webserver. It only redirects if you don't have some special param: https://jira.spring.io/browse/SPR-17639?redirect=false And they also added a comment at the end of all their issues (also by the bot).
[GitHub] [lucene] jpountz commented on a diff in pull request #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code
jpountz commented on code in PR #995: URL: https://github.com/apache/lucene/pull/995#discussion_r910164457 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3382,6 +3383,7 @@ private static void checkSortedSetDocValues( seenOrds.set(ord); ordCount++; } + Review Comment: At this point `ord` is going to be the last ord while it used to always be NO_MORE_ORDS, which I suspect may cause CheckIndex failures. ## lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java: ## @@ -304,25 +306,19 @@ public int lookupTerm(BytesRef key) throws IOException { private void setOrd() throws IOException { if (docID() != NO_MORE_DOCS) { -int upto = 0; -while (true) { - long nextOrd = in.nextOrd(); - if (nextOrd == NO_MORE_ORDS) { -break; - } - if (upto == ords.length) { -ords = ArrayUtil.grow(ords); - } - ords[upto++] = (int) nextOrd; -} - -if (upto == 0) { +int docValueCount = in.docValueCount(); +if (docValueCount == 0) { Review Comment: likewise here ## lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java: ## @@ -304,25 +306,19 @@ public int lookupTerm(BytesRef key) throws IOException { private void setOrd() throws IOException { if (docID() != NO_MORE_DOCS) { -int upto = 0; -while (true) { - long nextOrd = in.nextOrd(); - if (nextOrd == NO_MORE_ORDS) { -break; - } - if (upto == ords.length) { -ords = ArrayUtil.grow(ords); - } - ords[upto++] = (int) nextOrd; -} - -if (upto == 0) { +int docValueCount = in.docValueCount(); +if (docValueCount == 0) { // iterator should not have returned this docID if it has no ords: assert false; ord = (int) NO_MORE_ORDS; -} else { - ord = ords[(upto - 1) >>> 1]; + return; +} + +ords = ArrayUtil.grow(ords, docValueCount); +for (int i = 0; i < docValueCount; i++) { + ords[i] = (int) in.nextOrd(); } +ord = ords[(docValueCount - 1) >>> 1]; Review Comment: we don't even need to buffer ords now that we know their number I think? 
We could compute the index of the ord we're interested in, then consume this number of ords, and the next ord would be the median ord? ## lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java: ## @@ -226,12 +226,14 @@ public int lookupTerm(BytesRef key) throws IOException { private void setOrd() throws IOException { if (docID() != NO_MORE_DOCS) { -while (true) { - long nextOrd = in.nextOrd(); - if (nextOrd == NO_MORE_ORDS) { -break; +int docValueCount = in.docValueCount(); +if (docValueCount == 0) { Review Comment: docValueCount may never return 0, we can drop this branch ## lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java: ## @@ -394,25 +390,19 @@ public int lookupTerm(BytesRef key) throws IOException { private void setOrd() throws IOException { if (docID() != NO_MORE_DOCS) { -int upto = 0; -while (true) { - long nextOrd = in.nextOrd(); - if (nextOrd == NO_MORE_ORDS) { -break; - } - if (upto == ords.length) { -ords = ArrayUtil.grow(ords); - } - ords[upto++] = (int) nextOrd; -} - -if (upto == 0) { +int docValueCount = in.docValueCount(); +if (docValueCount == 0) { // iterator should not have returned this docID if it has no ords: assert false; ord = (int) NO_MORE_ORDS; -} else { - ord = ords[upto >>> 1]; + return; +} + +ords = ArrayUtil.grow(ords, docValueCount); +for (int i = 0; i < docValueCount; i++) { + ords[i] = (int) in.nextOrd(); } +ord = ords[docValueCount >>> 1]; Review Comment: and likewise here? 
## lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java: ## @@ -394,25 +390,19 @@ public int lookupTerm(BytesRef key) throws IOException { private void setOrd() throws IOException { if (docID() != NO_MORE_DOCS) { -int upto = 0; -while (true) { - long nextOrd = in.nextOrd(); - if (nextOrd == NO_MORE_ORDS) { -break; - } - if (upto == ords.length) { -ords = ArrayUtil.grow(ords); - } - ords[upto++] = (int) nextOrd; -} - -if (upto == 0) { +int docValueCount = in.docValueCount(); +if (docValueCount == 0) { Review Comment: we can drop this branch -- This is an automated message fro
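The review suggestion in this thread (once the value count is known, pick the middle ord without buffering) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code that was merged: `OrdSource` is a hypothetical stand-in for the two calls the diff relies on, `docValueCount()` and `nextOrd()`, and `ArrayOrdSource` is an illustrative in-memory implementation. It also assumes the count is at least 1, as noted in the review.

```java
import java.util.Arrays;

public class MiddleOrdSketch {

  /** Hypothetical stand-in for the iterator used in the diff; not a Lucene type. */
  interface OrdSource {
    int docValueCount();

    long nextOrd();
  }

  /** Illustrative in-memory source that returns one document's ords in sorted order. */
  static final class ArrayOrdSource implements OrdSource {
    private final long[] ords;
    private int pos;

    ArrayOrdSource(long... ords) {
      this.ords = ords.clone();
      Arrays.sort(this.ords);
    }

    @Override
    public int docValueCount() {
      return ords.length;
    }

    @Override
    public long nextOrd() {
      return ords[pos++];
    }
  }

  /** Lower middle: the same index as ords[(count - 1) >>> 1] in the buffered version. */
  static long middleMin(OrdSource in) {
    return skipTo(in, (in.docValueCount() - 1) >>> 1);
  }

  /** Upper middle: the same index as ords[count >>> 1] in the buffered version. */
  static long middleMax(OrdSource in) {
    return skipTo(in, in.docValueCount() >>> 1);
  }

  /** Consumes and discards `index` ords, then returns the next one. */
  private static long skipTo(OrdSource in, int index) {
    for (int i = 0; i < index; i++) {
      in.nextOrd();
    }
    return in.nextOrd();
  }

  public static void main(String[] args) {
    System.out.println(middleMin(new ArrayOrdSource(3, 7, 11, 42)));
    System.out.println(middleMax(new ArrayOrdSource(3, 7, 11, 42)));
  }
}
```

Neither method calls `nextOrd()` more than `docValueCount()` times, which matters because the thread notes there is no guarantee about the returned value beyond that point.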
[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #4: Which GitHub account should we use for migration?
mocobeta opened a new issue, #4: URL: https://github.com/apache/lucene-jira-archive/issues/4 To import/create issues with the GitHub API, you need admin access to the repo, and we developers are not allowed to have it. The actual migration will be done by infra; it seems a personal account was used for the import job when the Lucene.NET project migrated their issues to GitHub. See https://github.com/apache/lucenenet/issues/280. For example, Spring uses an organization account that is not tied to a person (https://github.com/spring-projects/spring-framework/issues/22178). Can we do the same? What organization account is available to us?
[GitHub] [lucene] Yuti-G closed pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
Yuti-G closed pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts URL: https://github.com/apache/lucene/pull/974
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Title: Message Title Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira It looks like we have too many topics to deal with in one issue. We could break them up into Jira sub-tasks, but instead I created a few GitHub issues in the lucene-jira-archive repo. For example, https://github.com/apache/lucene-jira-archive/issues/4 ("Which GitHub account should we use for migration?"). Notifications were sent to issues@ this time. Looks fine.
[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #5: Prepare complete migration script to GitHub issue from Jira (best effort)
mocobeta opened a new issue, #5: URL: https://github.com/apache/lucene-jira-archive/issues/5 This is the umbrella to improve migration scripts. Sub tasks are: - #1 - #2 - #3
[jira] [Resolved] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)
Title: Message Title Tomoko Uchida resolved as Duplicate Moved to https://github.com/apache/lucene-jira-archive/issues/5 Lucene - Core / LUCENE-10622 Prepare complete migration script to GitHub issue from Jira (best effort) Change By: Tomoko Uchida Resolution: Duplicate Status: Open -> Resolved
[GitHub] [lucene] gsmiller commented on a diff in pull request #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code
gsmiller commented on code in PR #995: URL: https://github.com/apache/lucene/pull/995#discussion_r910315408 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3382,6 +3383,7 @@ private static void checkSortedSetDocValues( seenOrds.set(ord); ordCount++; } + Review Comment: Ack. Yeah that's the issue. I don't think the equality check between `ord` and `ord2` after this loop makes sense anymore given that there's no guarantee about what value `ord` will be if calling `nextOrd()` more times than advertised by `docValueCount()`, so I removed the check.
[GitHub] [lucene] gsmiller commented on a diff in pull request #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code
gsmiller commented on code in PR #995: URL: https://github.com/apache/lucene/pull/995#discussion_r910344898 ## lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java: ## @@ -226,12 +226,14 @@ public int lookupTerm(BytesRef key) throws IOException { private void setOrd() throws IOException { if (docID() != NO_MORE_DOCS) { -while (true) { - long nextOrd = in.nextOrd(); - if (nextOrd == NO_MORE_ORDS) { -break; +int docValueCount = in.docValueCount(); +if (docValueCount == 0) { Review Comment: Ah right. Thanks! Addressed this in all four places.
[GitHub] [lucene] gsmiller commented on a diff in pull request #995: LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in production code
gsmiller commented on code in PR #995: URL: https://github.com/apache/lucene/pull/995#discussion_r910345129 ## lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java: ## @@ -304,25 +306,19 @@ public int lookupTerm(BytesRef key) throws IOException { private void setOrd() throws IOException { if (docID() != NO_MORE_DOCS) { -int upto = 0; -while (true) { - long nextOrd = in.nextOrd(); - if (nextOrd == NO_MORE_ORDS) { -break; - } - if (upto == ords.length) { -ords = ArrayUtil.grow(ords); - } - ords[upto++] = (int) nextOrd; -} - -if (upto == 0) { +int docValueCount = in.docValueCount(); +if (docValueCount == 0) { // iterator should not have returned this docID if it has no ords: assert false; ord = (int) NO_MORE_ORDS; -} else { - ord = ords[(upto - 1) >>> 1]; + return; +} + +ords = ArrayUtil.grow(ords, docValueCount); +for (int i = 0; i < docValueCount; i++) { + ords[i] = (int) in.nextOrd(); } +ord = ords[(docValueCount - 1) >>> 1]; Review Comment: Good point. Tweaked this (and the other location).
[GitHub] [lucene] gsmiller merged pull request #983: Some refactoring/cleanup of AbstractSortedSetDocValueFacetCounts
gsmiller merged PR #983: URL: https://github.com/apache/lucene/pull/983
[GitHub] [lucene] gsmiller merged pull request #984: Switch Float/IntTaxonomyFacets to primitive list data structures in getAllChildren
gsmiller merged PR #984: URL: https://github.com/apache/lucene/pull/984
[GitHub] [lucene] gsmiller commented on pull request #984: Switch Float/IntTaxonomyFacets to primitive list data structures in getAllChildren
gsmiller commented on PR #984: URL: https://github.com/apache/lucene/pull/984#issuecomment-1170449045 Thanks @shaie !
[GitHub] [lucene] gsmiller commented on pull request #983: Some refactoring/cleanup of AbstractSortedSetDocValueFacetCounts
gsmiller commented on PR #983: URL: https://github.com/apache/lucene/pull/983#issuecomment-1170449195 Thanks @shaie !
[GitHub] [lucene] gsmiller opened a new pull request, #997: Backport GH#983 and GH#984
gsmiller opened a new pull request, #997: URL: https://github.com/apache/lucene/pull/997 Using a PR to backport for convenience. No review required.
[GitHub] [lucene] gsmiller merged pull request #997: Backport GH#983 and GH#984
gsmiller merged PR #997: URL: https://github.com/apache/lucene/pull/997
[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
gsmiller commented on code in PR #974: URL: https://github.com/apache/lucene/pull/974#discussion_r910480915 ## lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java: ## @@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException return new FacetResult(dim, path, totCount, labelValues, labelValues.length); } - // The current getTopChildren method is not returning "top" ranges. Instead, it returns all - // user-provided ranges in - // the order the user specified them when instantiating. This concept is being introduced and - // supported in the - // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling - // getAllChildren to maintain its - // current behavior, and the current implementation will be replaced by an actual "top children" - // implementation - // in LUCENE-10614 - // TODO: fix getTopChildren in LUCENE-10614 @Override public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException { validateTopN(topN); -return getAllChildren(dim, path); +validateDimAndPathForGetChildren(dim, path); + +int resultSize = Math.min(topN, counts.length); +PriorityQueue<LabelAndValue> pq = +new PriorityQueue<>(resultSize) { + @Override + protected boolean lessThan(LabelAndValue a, LabelAndValue b) { +int cmp = Integer.compare(a.value.intValue(), b.value.intValue()); +if (cmp == 0) { + cmp = b.label.compareTo(a.label); +} +return cmp < 0; + } +}; + +for (int i = 0; i < counts.length; i++) { + if (pq.size() < resultSize) { +pq.add(new LabelAndValue(ranges[i].label, counts[i])); Review Comment: Perfect, thank you!
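The quoted `getTopChildren` hunk keeps at most `topN` entries in a priority queue whose "least" element has the lowest count, with ties broken by label. A minimal, self-contained sketch of that selection follows. It is an illustration rather than the PR's implementation: it uses `java.util.PriorityQueue` with a comparator instead of Lucene's own `PriorityQueue` with `lessThan`, and `LabelCount` and `topN` are hypothetical names standing in for `LabelAndValue` and the method in the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopRangesSketch {

  /** Illustrative label/count pair; stands in for the LabelAndValue used in the diff. */
  static final class LabelCount {
    final String label;
    final int count;

    LabelCount(String label, int count) {
      this.label = label;
      this.count = count;
    }
  }

  // "Weakest first": lower count first; on equal counts, the lexicographically larger label
  // is weaker. This mirrors the lessThan logic in the quoted hunk.
  static final Comparator<LabelCount> WEAKEST_FIRST =
      (a, b) -> {
        int cmp = Integer.compare(a.count, b.count);
        if (cmp == 0) {
          cmp = b.label.compareTo(a.label);
        }
        return cmp;
      };

  /** Returns up to topN entries, highest count first, ties ordered by ascending label. */
  static List<LabelCount> topN(String[] labels, int[] counts, int topN) {
    int resultSize = Math.min(topN, counts.length);
    List<LabelCount> result = new ArrayList<>();
    if (resultSize <= 0) {
      return result; // java.util.PriorityQueue rejects an initial capacity below 1
    }
    PriorityQueue<LabelCount> pq = new PriorityQueue<>(resultSize, WEAKEST_FIRST);
    for (int i = 0; i < counts.length; i++) {
      LabelCount candidate = new LabelCount(labels[i], counts[i]);
      if (pq.size() < resultSize) {
        pq.add(candidate);
      } else if (WEAKEST_FIRST.compare(pq.peek(), candidate) < 0) {
        // The candidate beats the current weakest entry, so it replaces it.
        pq.poll();
        pq.add(candidate);
      }
    }
    // The heap yields the weakest entry first; reverse to get the strongest first.
    while (!pq.isEmpty()) {
      result.add(pq.poll());
    }
    Collections.reverse(result);
    return result;
  }

  public static void main(String[] args) {
    String[] labels = {"a", "b", "c", "d"};
    int[] counts = {5, 9, 5, 1};
    for (LabelCount lc : topN(labels, counts, 2)) {
      System.out.println(lc.label + " " + lc.count);
    }
  }
}
```

Bounding the heap at `resultSize` keeps memory proportional to the number of results requested instead of the number of ranges.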
[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
gsmiller commented on code in PR #974: URL: https://github.com/apache/lucene/pull/974#discussion_r910485371 ## lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java: ## @@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery( } /** User runs a query and counts facets. */ - public FacetResult search() throws IOException { + public FacetResult searchAllChildren() throws IOException { + +FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new FacetsCollectorManager()); + +Facets facets = +new DoubleRangeFacetCounts( +"field", +getDistanceValueSource(), +fc, +getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0), +ONE_KM, +TWO_KM, +FIVE_KM, +TEN_KM); + +return facets.getAllChildren("field"); + } + + /** User runs a query and counts facets. */ + public FacetResult searchTopChildren() throws IOException { Review Comment: Yeah maybe. I think if you can come up with a real-world example that has a somewhat high cardinality of children but where you only want a small subset, then building an example around that could be useful. Here's one I just thought of, but maybe you can come up with something else? What if, as an example, you indexed error messages in a service log so you could do analysis over them. Each document could be an error log entry that contains the log message string and also a timestamp for when it occurred. Then let's say you wanted to find the top five one-hour periods that had the most errors over the past week. To do this, you could create 168 ranges (each for a one hour time period; 7 * 24 = 168) and facet on them. Then you could ask for the top-5 by count. That would give you the five one-hour periods over the last week with the most errors.
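The error-log idea in this comment (168 one-hour ranges over a week, then the top five by count) can be sketched without any Lucene types. This is only an illustration of the arithmetic: a real demo would presumably build range objects and call the facets API, whereas the class and method names below are hypothetical and the counting is done over a plain array of timestamps in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

public class HourlyErrorBucketsSketch {
  static final int HOURS_PER_WEEK = 7 * 24; // 168 one-hour ranges
  static final long HOUR_MILLIS = 60L * 60L * 1000L;

  /** Counts timestamps per one-hour bucket in [weekStart, weekStart + 168 hours). */
  static int[] countPerHour(long weekStartMillis, long[] errorTimestamps) {
    int[] counts = new int[HOURS_PER_WEEK];
    for (long ts : errorTimestamps) {
      long offset = ts - weekStartMillis;
      // Timestamps outside the week are ignored.
      if (offset >= 0 && offset < HOURS_PER_WEEK * HOUR_MILLIS) {
        counts[(int) (offset / HOUR_MILLIS)]++;
      }
    }
    return counts;
  }

  /** Returns the indices of the n busiest hours, highest count first; ties keep the earlier hour. */
  static List<Integer> topHours(int[] counts, int n) {
    List<Integer> hours = new ArrayList<>();
    for (int h = 0; h < counts.length; h++) {
      hours.add(h);
    }
    // List.sort is stable, so equal counts stay in ascending hour order.
    hours.sort((a, b) -> Integer.compare(counts[b], counts[a]));
    return new ArrayList<>(hours.subList(0, Math.min(n, hours.size())));
  }

  public static void main(String[] args) {
    long h = HOUR_MILLIS;
    long[] errors = {10 * h, 10 * h + 1, 10 * h + 2, 100 * h, 100 * h + 5, 0, 168 * h};
    System.out.println(topHours(countPerHour(0L, errors), 5));
  }
}
```

Sorting all 168 buckets is fine at this size; with many more ranges, a bounded heap would avoid the full sort.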
[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on code in PR #972: URL: https://github.com/apache/lucene/pull/972#discussion_r910551704 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Comparator; +import java.util.LinkedList; +import java.util.List; + +/** Scorer implementing Block-Max Maxscore algorithm */ +public class BlockMaxMaxscoreScorer extends Scorer { + // current doc ID of the leads + private int doc; + + // doc id boundary that all scorers maxScore are valid + private int upTo = -1; Review Comment: Moved `upTo` as well as a few others into constructor. ## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Comparator; +import java.util.LinkedList; +import java.util.List; + +/** Scorer implementing Block-Max Maxscore algorithm */ +public class BlockMaxMaxscoreScorer extends Scorer { + // current doc ID of the leads + private int doc; + + // doc id boundary that all scorers maxScore are valid + private int upTo = -1; + + // heap of scorers ordered by doc ID + private final DisiPriorityQueue essentialsScorers; + // list of scorers ordered by maxScore + private final LinkedList maxScoreSortedEssentialScorers; + + private final DisiWrapper[] allScorers; + + // sum of max scores of scorers in nonEssentialScorers list + private float nonEssentialMaxScoreSum; + + private long cost; + + private final MaxScoreSumPropagator maxScoreSumPropagator; + + // scaled min competitive score + private float minCompetitiveScore = 0; + + private int cachedScoredDoc = -1; + private float cachedScore = 0; + + /** + * Constructs a Scorer that scores doc based on Block-Max-Maxscore (BMM) algorithm + * http://engineering.nyu.edu/~suel/papers/bmm.pdf . This algorithm has lower overhead compared to + * WANDScorer, and could be used for simple disjunction queries. + * + * @param weight The weight to be used. 
+ * @param scorers The sub scorers this Scorer should iterate on for optional clauses + */ + public BlockMaxMaxscoreScorer(Weight weight, List scorers) throws IOException { +super(weight); + +this.doc = -1; +this.allScorers = new DisiWrapper[scorers.size()]; +this.essentialsScorers = new DisiPriorityQueue(scorers.size()); +this.maxScoreSortedEssentialScorers = new LinkedList<>(); + +long cost = 0; +for (int i = 0; i < scorers.size(); i++) { + DisiWrapper w = new DisiWrapper(scorers.get(i)); + cost += w.cost; + allScorers[i] = w; +} + +this.cost = cost; +maxScoreSumPropagator = new MaxScoreSumPropagator(scorers); + } + + @Override + public DocIdSetIterator iterator() { +// twoPhaseIterator needed to honor scorer.setMinCompetitiveScore guarantee +return TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator()); + } + + @Override + public TwoPhaseIterator twoPhaseIterator() { +DocIdSetIterator approximation = +new DocIdSetIterator() { + + @Override +
[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r910551829

## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ##
@@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.LinkedList;
+import java.util.List;
+
+/** Scorer implementing Block-Max Maxscore algorithm */
+public class BlockMaxMaxscoreScorer extends Scorer {
+  // current doc ID of the leads
+  private int doc;
+
+  // doc id boundary that all scorers maxScore are valid
+  private int upTo = -1;
+
+  // heap of scorers ordered by doc ID
+  private final DisiPriorityQueue essentialsScorers;
+  // list of scorers ordered by maxScore
+  private final LinkedList maxScoreSortedEssentialScorers;
+
+  private final DisiWrapper[] allScorers;
+
+  // sum of max scores of scorers in nonEssentialScorers list
+  private float nonEssentialMaxScoreSum;
+
+  private long cost;
+
+  private final MaxScoreSumPropagator maxScoreSumPropagator;
+
+  // scaled min competitive score
+  private float minCompetitiveScore = 0;
+
+  private int cachedScoredDoc = -1;
+  private float cachedScore = 0;
+
+  /**
+   * Constructs a Scorer that scores doc based on Block-Max-Maxscore (BMM) algorithm
+   * http://engineering.nyu.edu/~suel/papers/bmm.pdf . This algorithm has lower overhead compared to
+   * WANDScorer, and could be used for simple disjunction queries.
+   *
+   * @param weight The weight to be used.
+   * @param scorers The sub scorers this Scorer should iterate on for optional clauses
+   */
+  public BlockMaxMaxscoreScorer(Weight weight, List scorers) throws IOException {
+    super(weight);
+
+    this.doc = -1;
+    this.allScorers = new DisiWrapper[scorers.size()];
+    this.essentialsScorers = new DisiPriorityQueue(scorers.size());
+    this.maxScoreSortedEssentialScorers = new LinkedList<>();
+
+    long cost = 0;
+    for (int i = 0; i < scorers.size(); i++) {
+      DisiWrapper w = new DisiWrapper(scorers.get(i));
+      cost += w.cost;
+      allScorers[i] = w;
+    }
+
+    this.cost = cost;
+    maxScoreSumPropagator = new MaxScoreSumPropagator(scorers);
+  }
+
+  @Override
+  public DocIdSetIterator iterator() {
+    // twoPhaseIterator needed to honor scorer.setMinCompetitiveScore guarantee
+    return TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator());
+  }
+
+  @Override
+  public TwoPhaseIterator twoPhaseIterator() {
+    DocIdSetIterator approximation =
+        new DocIdSetIterator() {
+
+          @Override
+          public int docID() {
+            return doc;
+          }
+
+          @Override
+          public int nextDoc() throws IOException {
+            return advance(doc + 1);
+          }
+
+          @Override
+          public int advance(int target) throws IOException {
+            while (true) {
+
+              if (target > upTo) {
+                updateMaxScoresAndLists(target);
+              } else {
+                // minCompetitiveScore might have increased,
+                // move potentially no-longer-competitive scorers from essential to non-essential
+                // list
+                movePotentiallyNonCompetitiveScorers();
+              }
+
+              assert target <= upTo;
+
+              DisiWrapper top = essentialsScorers.top();
+
+              if (top == null) {
+                // all scorers in non-essential list, skip to next boundary or return no_more_docs
+                if (upTo == NO_MORE_DOCS) {
+                  return doc = NO_MORE_DOCS;
+                } else {
+                  target = upTo + 1;
+                }
+              } else {
+                // position all scorers in essential list to on or after target
+                while (top.doc < target) {
+                  top.doc = top.iterator.advance(target);
+                  top = essentialsScorers.updateTop();
+                }
+
+                if (top.doc == NO_MORE_DOCS) {
+                  return doc = NO_MORE_DOCS;
+                } else if (
[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r910552757

## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ##
@@ -0,0 +1,322 @@
[GitHub] [lucene] zacharymorn commented on a diff in pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on code in PR #972:
URL: https://github.com/apache/lucene/pull/972#discussion_r910552971

## lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java: ##
@@ -0,0 +1,322 @@
[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1170684358

> With this change, I suspect that some scorers created in `TestWANDScorer` would now use your new `BlockMaxMaxScoreScorer`, which is going to decrease the coverage of WANDScorer. Can we somehow make sure that `TestWANDScorer` always gets a `WANDScorer`? E.g. I spotted this query under `TestWANDScorer#testBasics` which likely uses your new scorer:
>
> ```java
> // test a filtered disjunction
> query =
>     new BooleanQuery.Builder()
>         .add(
>             new BooleanQuery.Builder()
>                 .add(
>                     new BoostQuery(
>                         new ConstantScoreQuery(new TermQuery(new Term("foo", "A"))), 2),
>                     Occur.SHOULD)
>                 .add(new ConstantScoreQuery(new TermQuery(new Term("foo", "B"))), Occur.SHOULD)
>                 .build(),
>             Occur.MUST)
>         .add(new TermQuery(new Term("foo", "C")), Occur.FILTER)
>         .build();
> ```

Yeah, this is a good question. In my newly added tests I have used something like this to confirm that the test exercises the right scorer, but I'm not totally happy with this approach myself:

```
if (scorer instanceof AssertingScorer) {
  assertTrue(((AssertingScorer) scorer).getIn() instanceof BlockMaxMaxscoreScorer);
} else {
  assertTrue(scorer instanceof BlockMaxMaxscoreScorer);
}
```

One alternative could be to instantiate `WANDScorer` directly inside the test for the lower-level tests, and to move the higher-level tests into another test class that doesn't care about the specific scorer implementation used for disjunctions. This may require duplicating some code from `BooleanWeight`, `AssertingWeight`, etc., but should be doable.

On the other hand, if we don't plan on instantiating `WANDScorer` directly in the test, I feel that varying the query clauses and asserting as above might be the best we can do. As you suggested, this risks decreasing test coverage, so it may not be ideal either.

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
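The unwrap-then-check assertion from the comment above could be factored into one small helper so that each test does not repeat the `instanceof` branches. The following is only a sketch under stated assumptions: `StubScorer`, `StubAssertingScorer`, `StubBmmScorer`, `StubWandScorer` and `ScorerAssertSketch` are hypothetical stand-ins invented for illustration, not Lucene classes; the only detail taken from the comment is that the asserting wrapper exposes its delegate through `getIn()`.

```java
// Hypothetical stand-ins for illustration only; none of these are Lucene classes.
interface StubScorer {}

final class StubBmmScorer implements StubScorer {}

final class StubWandScorer implements StubScorer {}

// Mimics a test-framework wrapper that exposes its delegate via getIn().
final class StubAssertingScorer implements StubScorer {
  private final StubScorer in;

  StubAssertingScorer(StubScorer in) {
    this.in = in;
  }

  StubScorer getIn() {
    return in;
  }
}

final class ScorerAssertSketch {
  private ScorerAssertSketch() {}

  // Returns the wrapped scorer if the argument is an asserting wrapper,
  // otherwise returns the argument unchanged.
  static StubScorer unwrap(StubScorer scorer) {
    if (scorer instanceof StubAssertingScorer) {
      return ((StubAssertingScorer) scorer).getIn();
    }
    return scorer;
  }

  // True when the (possibly wrapped) scorer is an instance of the expected class.
  static boolean isScorerOfType(StubScorer scorer, Class<? extends StubScorer> expected) {
    return expected.isInstance(unwrap(scorer));
  }
}
```

A test would then call something like `assertTrue(ScorerAssertSketch.isScorerOfType(scorer, StubBmmScorer.class))`. This tidies the check but does not address the coverage concern raised in the quoted review.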
[jira] [Created] (LUCENE-10632) Change getAllChildren to return all children regardless of the count
Yuting Gan created LUCENE-10632:
---

Summary: Change getAllChildren to return all children regardless of the count
Key: LUCENE-10632
URL: https://issues.apache.org/jira/browse/LUCENE-10632
Project: Lucene - Core
Issue Type: Improvement
Reporter: Yuting Gan

Currently, getAllChildren is implemented similarly to getTopChildren: both return only children whose count is greater than zero. However, the original getTopChildren in RangeFacetCounts returned all children whether or not the count was zero. This has good use cases, and we should continue supporting it in getAllChildren so that we do not lose it after properly supporting getTopChildren in RangeFacetCounts.

As discussed with [~gsmiller] in the [LUCENE-10614 pr|https://github.com/apache/lucene/pull/974], allowing getAllChildren to behave differently from getTopChildren can be more helpful for users. If users want only children with a positive count, getTopChildren already supports this behavior. Therefore, the getAllChildren API should return all children in all of the implementations, whether or not the count is zero.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
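The behavioral difference proposed in the issue above can be illustrated with a small stand-alone sketch. It assumes a hypothetical `RangeCountsSketch` class backed by a plain map of range label to count; it is not Lucene's `RangeFacetCounts`, and its method signatures are simplified for illustration. `getAllChildren` returns every label, including those with a zero count, while `getTopChildren` returns at most `topN` labels with a positive count, highest count first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Lucene's RangeFacetCounts API.
final class RangeCountsSketch {
  // Range label -> count, kept in the order the ranges were declared.
  private final Map<String, Integer> counts = new LinkedHashMap<>();

  RangeCountsSketch(Map<String, Integer> counts) {
    this.counts.putAll(counts);
  }

  // Proposed behavior: every child is returned, even when its count is zero.
  List<Map.Entry<String, Integer>> getAllChildren() {
    List<Map.Entry<String, Integer>> result = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      result.add(Map.entry(e.getKey(), e.getValue()));
    }
    return result;
  }

  // Only children with a positive count, sorted by descending count, at most topN.
  List<Map.Entry<String, Integer>> getTopChildren(int topN) {
    List<Map.Entry<String, Integer>> positive = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (e.getValue() > 0) {
        positive.add(Map.entry(e.getKey(), e.getValue()));
      }
    }
    // List.sort is stable, so ties keep declaration order.
    positive.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    return new ArrayList<>(positive.subList(0, Math.min(topN, positive.size())));
  }
}
```

With counts `{"0-10": 3, "10-20": 0, "20-30": 5}`, `getAllChildren()` in this sketch yields all three ranges in declaration order, whereas `getTopChildren(10)` yields only `20-30` and `0-10`.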