[GitHub] [lucene-jira-archive] mocobeta commented on issue #29: Can/should we make Jira read-only on migration to GitHub issues?
mocobeta commented on issue #29: URL: https://github.com/apache/lucene-jira-archive/issues/29#issuecomment-1186138763 I just wanted to let you know that I'm not able to edit the Jira configuration such as workflow or issue template (I don't have permission). So anyway, I have to pass it to you after GitHub issue is lifted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-jira-archive] mikemccand commented on issue #29: Can/should we make Jira read-only on migration to GitHub issues?
mikemccand commented on issue #29: URL: https://github.com/apache/lucene-jira-archive/issues/29#issuecomment-1186145578 Hmm OK let me see if I have permissions ;)
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567465#comment-17567465 ] Michael McCandless commented on LUCENE-10557: - bq. [TEST] This was moved to GitHub issue: https://github.com/mocobeta/migration-test-3/issues/196. Oooh that looks promising!! > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: Screen Shot 2022-06-29 at 11.02.35 AM.png, > image-2022-06-29-13-36-57-365.png, screenshot-1.png > > Time Spent: 40m > Remaining Estimate: 0h > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * (/) Choose issues that should be moved to GitHub - We'll migrate all > issues towards an atomic switch to GitHub if no major technical obstacles > show up. > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** -Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub.- > ** Write a prototype migration script - the decision could be made on that. > Things to consider: > *** version numbers - labels or milestones? 
> *** add a comment/ prepend a link to the source Jira issue on github side, > *** add a comment/ prepend a link on the jira side to the new issue on > github side (for people who access jira from blogs, mailing list archives and > other sources that will have stale links), > *** convert cross-issue automatic links in comments/ descriptions (as > suggested by Robert), > *** strategy to deal with sub-issues (hierarchies), > *** maybe prefix (or postfix) the issue title on github side with the > original LUCENE-XYZ key so that it is easier to search for a particular issue > there? > *** how to deal with user IDs (author, reporter, commenters)? Do they have > to be github users? Will information about people not registered on github be > lost? > *** create an extra mapping file of old-issue-new-issue URLs for any > potential future uses. > *** what to do with issue numbers in git/svn commits? These could be > rewritten but it'd change the entire git history tree - I don't think this is > practical, while doable. 
> * Prepare a complete migration tool > ** See https://github.com/apache/lucene-jira-archive/issues/5 > * Build the convention for issue label/milestone management > ** See [https://github.com/apache/lucene-jira-archive/issues/6] > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * (/) Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > ** See [https://github.com/apache/lucene-jira-archive/issues/7] > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
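Editor's sketch for two items in the task list above (converting cross-issue links and keeping an old-issue-to-new-issue mapping). This is only a minimal illustration, not the migration tool from the lucene-jira-archive repository: the class name `JiraLinkRewriter`, the `rewrite` method, and the sample mapping are assumptions made for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: annotates Jira keys in a text with the GitHub issue number
// taken from a key-to-number mapping. Keys without a mapping are left untouched.
final class JiraLinkRewriter {
  private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-\\d+\\b");

  static String rewrite(String text, Map<String, Integer> keyToIssueNumber) {
    return JIRA_KEY
        .matcher(text)
        .replaceAll(
            match -> {
              Integer number = keyToIssueNumber.get(match.group());
              String replacement =
                  number == null ? match.group() : match.group() + " (#" + number + ")";
              // quoteReplacement guards against '$' or '\' in the replacement text.
              return Matcher.quoteReplacement(replacement);
            });
  }

  public static void main(String[] args) {
    // Illustrative mapping only; a real one would be loaded from the mapping file.
    Map<String, Integer> mapping = Map.of("LUCENE-10557", 196);
    System.out.println(rewrite("See LUCENE-10557 and LUCENE-1.", mapping));
  }
}
```

Keeping the original key in the rewritten text is one way to address the "easier to search" concern raised in the list; whether to keep it or replace it outright is a design choice this sketch does not settle.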
[GitHub] [lucene-jira-archive] mocobeta commented on issue #7: Make a detailed migration plan
mocobeta commented on issue #7: URL: https://github.com/apache/lucene-jira-archive/issues/7#issuecomment-1186146650 Once the migration is started, issues opened in Jira have to be manually migrated to GitHub by the authors afterward and it'd be bothersome. I wanted to add some texts that say, ``` We are switching from Jira to GitHub issues, and data migration is now in progress. Although you can still open a Jira issue, you may want to wait until the migration is finished and open a GitHub issue after that, if you are not in a hurry. Migration will be completed within a few days. ``` to the Jira issue template (wording could be refined). But it looks like I don't have permission to browse/edit the issue templates... Could someone who is able to edit the issue template help me with it? Thank you.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567466#comment-17567466 ] Michael McCandless commented on LUCENE-10557: - OK I am able to administer our Jira instance. There are some wrinkles – apparently because some of our workflows are shared across two projects (Lucene and Solr), the workflows themselves are read-only! So we cannot change them unless we work the workflows. But there is much discussion about this problem, e.g.: [https://community.atlassian.com/t5/Jira-questions/Fastest-way-to-make-JIRA-read-only/qaq-p/1261492] I'll try to find the simplest way that works for us.
[GitHub] [lucene] mocobeta commented on pull request #1024: LUCENE-10557: Add GitHub issue templates
mocobeta commented on PR #1024: URL: https://github.com/apache/lucene/pull/1024#issuecomment-1186152291
There are five predefined issue templates (forms) written in YAML:
- Bug Report
- Test Improvement / Failure Report
- Enhance Request/Suggestions
- Task
- Documentation Improvement
Other fields/components (checkbox, dropdown, and so on) can be added if we'd like.
[GitHub] [lucene] jpountz commented on a diff in pull request #987: LUCENE-10627: Using CompositeByteBuf to Reduce Memory Copy
jpountz commented on code in PR #987:
URL: https://github.com/apache/lucene/pull/987#discussion_r922672203

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java: ##
@@ -247,21 +249,18 @@ private void flush(boolean force) throws IOException {
     writeHeader(docBase, numBufferedDocs, numStoredFields, lengths, sliced, dirtyChunk);

     // compress stored fields to fieldsStream.
-    //
-    // TODO: do we need to slice it since we already have the slices in the buffer? Perhaps
-    // we should use max-block-bits restriction on the buffer itself, then we won't have to check it
-    // here.
-    byte[] content = bufferedDocs.toArrayCopy();
-    bufferedDocs.reset();
-
     if (sliced) {
-      // big chunk, slice it
-      for (int compressed = 0; compressed < content.length; compressed += chunkSize) {
-        compressor.compress(
-            content, compressed, Math.min(chunkSize, content.length - compressed), fieldsStream);
+      // big chunk, slice it, using ByteBuffersDataInput ignore memory copy
+      ByteBuffersDataInput bytebuffers = bufferedDocs.toDataInput();
+      final int capacity = (int) bytebuffers.size();
+      for (int compressed = 0; compressed < capacity; compressed += chunkSize) {
+        int l = Math.min(chunkSize, capacity - compressed);
+        ByteBuffersDataInput bbdi = bytebuffers.slice(compressed, l);
+        compressor.compress(bbdi, fieldsStream);
       }
     } else {
-      compressor.compress(content, 0, content.length, fieldsStream);
+      ByteBuffersDataInput bytebuffers = bufferedDocs.toDataInput();

Review Comment:
Maybe move this before the `if` statement since we create `byteBuffers` the same way on both branches?
## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java: ##
@@ -519,7 +518,13 @@ private void copyOneDoc(Lucene90CompressingStoredFieldsReader reader, int docID)
     assert reader.getVersion() == VERSION_CURRENT;
     SerializedDocument doc = reader.document(docID);
     startDocument();
-    bufferedDocs.copyBytes(doc.in, doc.length);
+
+    if (doc.in instanceof ByteArrayDataInput) {
+      // reuse ByteArrayDataInput to reduce memory copy
+      bufferedDocs.copyBytes((ByteArrayDataInput) doc.in, doc.length);
+    } else {
+      bufferedDocs.copyBytes(doc.in, doc.length);
+    }

Review Comment:
I think that we could avoid this `instanceof` check by overriding `ByteBuffersDataOutput#copyBytes` to read directly into its internal buffers when they are not direct (ie. backed by a `byte[]`)? (Maybe in a separate change?)

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/DeflateWithPresetDictCompressionMode.java: ##
@@ -163,12 +165,16 @@ private static class DeflateWithPresetDictCompressor extends Compressor {
     final Deflater compressor;
     final BugfixDeflater_JDK8252739 deflaterBugfix;
     byte[] compressed;
+    byte[] bufferDict;
+    byte[] bufferBlock;
     boolean closed;

     DeflateWithPresetDictCompressor(int level) {
       compressor = new Deflater(level, true);
       deflaterBugfix = BugfixDeflater_JDK8252739.createBugfix(compressor);
       compressed = new byte[64];
+      bufferDict = BytesRef.EMPTY_BYTES;
+      bufferBlock = BytesRef.EMPTY_BYTES;
     }

     private void doCompress(byte[] bytes, int off, int len, DataOutput out) throws IOException {

Review Comment:
Can we remove this one and require callers to use the ByteBuffer variant instead?
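Editor's sketch of what a ByteBuffer-only compression path can look like with the plain JDK, assuming Java 11 or later, where `Deflater.setInput(ByteBuffer)` is available. It is not the code from the PR: `ByteBufferDeflateSketch`, `deflate`, and `inflate` are hypothetical names, and the output array is grown with `Arrays.copyOf` rather than Lucene's `ArrayUtil.grow`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ByteBufferDeflateSketch {
  /** Compresses the remaining bytes of {@code input} without first copying them to a byte[]. */
  static byte[] deflate(ByteBuffer input) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
      deflater.setInput(input); // ByteBuffer overload, Java 11+
      deflater.finish();
      byte[] out = new byte[64];
      int total = 0;
      while (true) {
        total += deflater.deflate(out, total, out.length - total);
        if (deflater.finished()) {
          break;
        }
        out = Arrays.copyOf(out, out.length * 2); // output buffer too small, grow and retry
      }
      return Arrays.copyOf(out, total);
    } finally {
      deflater.end();
    }
  }

  /** Round-trip helper, only here so that the sketch can be checked. */
  static byte[] inflate(byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater(true);
    try {
      // The Inflater(true) javadoc asks for one extra dummy input byte in raw mode.
      inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
      byte[] out = new byte[originalLength];
      int n = 0;
      while (n < out.length) {
        int read = inflater.inflate(out, n, out.length - n);
        if (read == 0) {
          break; // finished, or no more input available
        }
        n += read;
      }
      if (n != originalLength) {
        throw new IllegalStateException("expected " + originalLength + " bytes, got " + n);
      }
      return out;
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
    byte[] packed = deflate(ByteBuffer.wrap(data));
    System.out.println(new String(inflate(packed, data.length), StandardCharsets.UTF_8));
  }
}
```

The same `deflate` method accepts heap and direct buffers, which is the property that would let callers drop a separate `byte[]` overload.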
## lucene/core/src/java/org/apache/lucene/codecs/lucene90/DeflateWithPresetDictCompressionMode.java: ##
@@ -199,23 +205,65 @@ private void doCompress(byte[] bytes, int off, int len, DataOutput out) throws I
       out.writeBytes(compressed, totalCount);
     }

+    private void doCompress(ByteBuffer bytes, int len, DataOutput out) throws IOException {
+      if (len == 0) {
+        out.writeVInt(0);
+        return;
+      }
+      compressor.setInput(bytes);
+      compressor.finish();
+      if (compressor.needsInput()) {
+        throw new IllegalStateException();
+      }
+
+      int totalCount = 0;
+      for (; ; ) {
+        final int count =
+            compressor.deflate(compressed, totalCount, compressed.length - totalCount);
+        totalCount += count;
+        assert totalCount <= compressed.length;
+        if (compressor.finished()) {
+          break;
+        } else {
+          compressed = ArrayUtil.grow(compressed);
+        }
+      }
+
+      out.writeVInt(totalCount);
+      out.writeBytes(compressed, totalCount);
+    }
+
     @Override
-    public void compress(byte[] bytes, int off, int len, DataOutput out) throws IOException {
+    public void compress(ByteBuffersDataInput buffersInput, DataOutput out) throws IOException {
+      final int len = (int) (buffersInput.size() - buffersInput.position());
+      final int en
[GitHub] [lucene] jpountz commented on a diff in pull request #1018: LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions
jpountz commented on code in PR #1018:
URL: https://github.com/apache/lucene/pull/1018#discussion_r922674843

## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new ArrayList<>();
+      for (ScorerSupplier ss : optional) {
+        optionalScorers.add(ss.get(Long.MAX_VALUE));
+      }
+
+      return new BulkScorer() {

Review Comment:
It shouldn't be slower than the current code in `main` since `main` is using `DefaultBulkScorer`, is it?
[GitHub] [lucene] jpountz commented on a diff in pull request #1018: LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions
jpountz commented on code in PR #1018:
URL: https://github.com/apache/lucene/pull/1018#discussion_r922675268

## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new ArrayList<>();
+      for (ScorerSupplier ss : optional) {
+        optionalScorers.add(ss.get(Long.MAX_VALUE));
+      }
+
+      return new BulkScorer() {
+        final Scorer bmmScorer = new BlockMaxMaxscoreScorer(BooleanWeight.this, optionalScorers);
+        final int maxDoc = context.reader().maxDoc();
+        final DocIdSetIterator iterator = bmmScorer.iterator();
+
+        @Override
+        public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+            throws IOException {
+          max = Math.min(max, maxDoc);

Review Comment:
I don't think we need this, do tests fail without it?
## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new ArrayList<>();
+      for (ScorerSupplier ss : optional) {
+        optionalScorers.add(ss.get(Long.MAX_VALUE));
+      }
+
+      return new BulkScorer() {
+        final Scorer bmmScorer = new BlockMaxMaxscoreScorer(BooleanWeight.this, optionalScorers);
+        final int maxDoc = context.reader().maxDoc();
+        final DocIdSetIterator iterator = bmmScorer.iterator();
+
+        @Override
+        public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+            throws IOException {
+          max = Math.min(max, maxDoc);
+          collector.setScorer(bmmScorer);
+
+          for (int doc = min; doc < max; ) {
+            int advancedDoc = iterator.advance(doc);
+            if (advancedDoc == DocIdSetIterator.NO_MORE_DOCS) {
+              return DocIdSetIterator.NO_MORE_DOCS;
+            } else if (advancedDoc >= max) {
+              return max;
+            }
+
+            if (acceptDocs == null || acceptDocs.get(advancedDoc)) {
+              collector.collect(advancedDoc);
+            }
+
+            doc = advancedDoc + 1;
+          }
+
+          return max == maxDoc ? DocIdSetIterator.NO_MORE_DOCS : max;

Review Comment:
Maybe we could remove the end condition from the for loop, so that we would hit the `if (advanceDoc >= max)` condition instead, and remove the above line?
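Editor's sketch of the loop shape suggested in the review comment above, written without Lucene so that it runs standalone. `RangeScoringSketch`, its `advance` helper over a sorted `int[]`, and the `IntConsumer` collector are hypothetical stand-ins for `DocIdSetIterator` and `LeafCollector`; the `acceptDocs` filter is left out for brevity. Only the constant value `Integer.MAX_VALUE` is taken from Lucene's `DocIdSetIterator.NO_MORE_DOCS`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

final class RangeScoringSketch {
  // Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Stand-in for an iterator's advance: first doc >= target, or NO_MORE_DOCS. */
  static int advance(int[] sortedDocs, int target) {
    for (int doc : sortedDocs) {
      if (doc >= target) {
        return doc;
      }
    }
    return NO_MORE_DOCS;
  }

  /** Collects matches in [min, max); the loop has no end condition of its own. */
  static int score(int[] sortedDocs, int min, int max, IntConsumer collector) {
    for (int doc = min; ; ) {
      int advancedDoc = advance(sortedDocs, doc);
      if (advancedDoc == NO_MORE_DOCS) {
        return NO_MORE_DOCS;
      } else if (advancedDoc >= max) {
        return max; // the only other exit, so no trailing return is needed
      }
      collector.accept(advancedDoc);
      doc = advancedDoc + 1;
    }
  }

  public static void main(String[] args) {
    List<Integer> hits = new ArrayList<>();
    int next = score(new int[] {1, 5, 9}, 0, 6, hits::add);
    System.out.println(hits + " next=" + next);
  }
}
```

One behavioural difference from the loop with `doc < max`: when `min >= max` this version still calls `advance` once before returning, which may or may not matter for the real scorer.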
## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new Ar
[jira] [Commented] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?
[ https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567482#comment-17567482 ]

Adrien Grand commented on LUCENE-10655:
---
I've been wondering if using a simple int hash set would help. FixedBitSet is super efficient CPU-wise, but it also requires lots of memory on large segments while we typically only set a limited number of bits, so it can quickly become memory-bound for random access, like we do when building the graph. An int hash set should also be cheaper to clear.

> can we optimize visited bitset usage in HNSW graph search/indexing?
> ---
>
> Key: LUCENE-10655
> URL: https://issues.apache.org/jira/browse/LUCENE-10655
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/hnsw
> Reporter: Michael Sokolov
> Priority: Major
>
> When running {{luceneutil}} I noticed that {{FixedBitSet.clear()}} dominates
> the CPU profiler output. I had a few ideas:
> # In upper graph layers, the occupied nodes are very sparse - maybe
> {{SparseFixedBitSet}} would be a better fit for those
> # We are caching these bitsets, but they are only used for a single search
> (single document insert, during indexing). Should we cache across searches?
> We would need to pool them though, and they would vary by field since fields
> can have different numbers of vector nodes. This starts to get complex
> # Are we sure that clearing a bitset is more efficient than allocating a new
> one? Maybe the JDK maintains a pool of already-zeroed memory for us
> I think we could try specializing the bitset type by graph level, and then I
> think we ought to measure the performance of allocation vs the limited reuse
> that we currently have.
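Editor's sketch of the "simple int hash set" idea from the comment above, as a minimal open-addressing set of non-negative node ids. `IntVisitedSet` and all of its methods are hypothetical and not a Lucene class; the load factor and hash mix are arbitrary choices for illustration. The point it shows is that `clear()` costs time proportional to the table size, which tracks the number of visited nodes rather than the number of nodes in the graph. Whether that beats `FixedBitSet` in practice would still have to be measured.

```java
import java.util.Arrays;

final class IntVisitedSet {
  private static final int EMPTY = -1; // node ids are assumed to be >= 0

  private int[] table;
  private int mask;
  private int size;

  IntVisitedSet(int expectedSize) {
    int capacity = 16;
    while (capacity < 2L * expectedSize && capacity < (1 << 30)) {
      capacity <<= 1; // power of two, at most half full
    }
    table = new int[capacity];
    Arrays.fill(table, EMPTY);
    mask = capacity - 1;
  }

  /** Returns true if the node was not visited before. */
  boolean add(int node) {
    if (node < 0) {
      throw new IllegalArgumentException("node ids must be >= 0");
    }
    if (2 * (size + 1) > table.length) {
      grow();
    }
    return insert(node);
  }

  boolean contains(int node) {
    for (int slot = slot(node); table[slot] != EMPTY; slot = (slot + 1) & mask) {
      if (table[slot] == node) {
        return true;
      }
    }
    return false;
  }

  /** Cost is proportional to the table size, not to the total number of graph nodes. */
  void clear() {
    Arrays.fill(table, EMPTY);
    size = 0;
  }

  int size() {
    return size;
  }

  private boolean insert(int node) {
    int slot = slot(node);
    while (table[slot] != EMPTY) {
      if (table[slot] == node) {
        return false;
      }
      slot = (slot + 1) & mask; // linear probing
    }
    table[slot] = node;
    size++;
    return true;
  }

  private int slot(int node) {
    int h = node * 0x9E3779B9; // multiplicative mix to spread sequential ids
    return (h ^ (h >>> 16)) & mask;
  }

  private void grow() {
    int[] old = table;
    table = new int[old.length * 2];
    Arrays.fill(table, EMPTY);
    mask = table.length - 1;
    size = 0;
    for (int value : old) {
      if (value != EMPTY) {
        insert(value);
      }
    }
  }

  public static void main(String[] args) {
    IntVisitedSet visited = new IntVisitedSet(8);
    System.out.println(visited.add(42) + " " + visited.add(42) + " " + visited.size());
  }
}
```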
[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field
[ https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567485#comment-17567485 ]

Adrien Grand commented on LUCENE-10633:
---
Indeed the speedup is impressive. :) I should have noted that I had to tweak luceneutil to also index fields that were used for sorting so that the inverted index could be used to skip hits. This change is very similar to LUCENE-9280, which led to annotation DD on [https://home.apache.org/~mikemccand/lucenebench/TermDayOfYearSort.html] and https://home.apache.org/~mikemccand/lucenebench/TermDTSort.html.

> Dynamic pruning for queries sorted by SORTED(_SET) field
> ---
>
> Key: LUCENE-10633
> URL: https://issues.apache.org/jira/browse/LUCENE-10633
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits
> when sorting by a numeric field, by leveraging the points index to skip
> documents that do not compare better than the top of the priority queue
> maintained by the field comparator.
> However queries sorted by a SORTED(_SET) field still look at all hits, which
> is disappointing. Could we leverage the terms index to skip hits?
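Editor's sketch of the competitiveness check that the issue description refers to: once the priority queue holds k hits, any hit whose sort ordinal does not beat the queue's current worst entry cannot enter the top k. This toy version still visits every hit and only shows the comparison; the actual optimization discussed in the issue uses an index structure to avoid visiting such hits at all. `CompetitiveOrdinalSketch` and `topK` are hypothetical names, and sorting is assumed to be ascending by ordinal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

final class CompetitiveOrdinalSketch {
  /** Returns the k smallest ordinals in ascending order. */
  static int[] topK(int[] ordinals, int k) {
    if (k <= 0) {
      return new int[0];
    }
    // Max-heap: the head is the worst hit currently kept.
    PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
    for (int ord : ordinals) {
      if (queue.size() < k) {
        queue.add(ord);
      } else if (ord < queue.peek()) {
        queue.poll(); // evict the current worst hit
        queue.add(ord);
      }
      // else: non-competitive, a real implementation would try to skip such hits
    }
    int[] result = new int[queue.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = queue.poll(); // poll returns the largest remaining ordinal
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(topK(new int[] {7, 3, 9, 1, 4}, 2)));
  }
}
```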
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567486#comment-17567486 ] Michael McCandless commented on LUCENE-10557: - Hi [~tomoko] – I added you as a Jira Administrator so you can poke around if you want to. But I'll still try to figure out how to make Jira read-only.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567492#comment-17567492 ]

Tomoko Uchida commented on LUCENE-10557:

[~mikemccand] thank you, I'm now able to edit the configuration. However, I am struggling with figuring out how to tweak the issue creation panel. I just wanted to set a placeholder or default value to the "Description" field...

> Migrate to GitHub issue from Jira
> ---------------------------------
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Tomoko Uchida
> Assignee: Tomoko Uchida
> Priority: Major
> Attachments: Screen Shot 2022-06-29 at 11.02.35 AM.png,
> image-2022-06-29-13-36-57-365.png, screenshot-1.png
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> A few (not the majority) Apache projects already use the GitHub issue instead
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible that we move to GitHub issue. I have
> little knowledge of how to proceed with it, I'd like to discuss whether we
> should migrate to it, and if so, how to smoothly handle the migration.
> The major tasks would be:
> * (/) Get a consensus about the migration among committers
> * (/) Choose issues that should be moved to GitHub - We'll migrate all
> issues towards an atomic switch to GitHub if no major technical obstacles
> show up.
> ** Discussion thread
> [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
> ** -Conclusion for now: We don't migrate any issues. Only new issues should
> be opened on GitHub.-
> ** Write a prototype migration script - the decision could be made on that.
> Things to consider:
> *** version numbers - labels or milestones?
> *** add a comment/ prepend a link to the source Jira issue on github side,
> *** add a comment/ prepend a link on the jira side to the new issue on
> github side (for people who access jira from blogs, mailing list archives and
> other sources that will have stale links),
> *** convert cross-issue automatic links in comments/ descriptions (as
> suggested by Robert),
> *** strategy to deal with sub-issues (hierarchies),
> *** maybe prefix (or postfix) the issue title on github side with the
> original LUCENE-XYZ key so that it is easier to search for a particular issue
> there?
> *** how to deal with user IDs (author, reporter, commenters)? Do they have
> to be github users? Will information about people not registered on github be
> lost?
> *** create an extra mapping file of old-issue-new-issue URLs for any
> potential future uses.
> *** what to do with issue numbers in git/svn commits? These could be
> rewritten but it'd change the entire git history tree - I don't think this is
> practical, while doable.
> * Prepare a complete migration tool
> ** See https://github.com/apache/lucene-jira-archive/issues/5
> * Build the convention for issue label/milestone management
> ** See [https://github.com/apache/lucene-jira-archive/issues/6]
> ** Do some experiments on a sandbox repository
> [https://github.com/mocobeta/sandbox-lucene-10557]
> ** Make documentation for metadata (label/milestone) management
> * (/) Enable Github issue on the lucene's repository
> ** Raise an issue on INFRA
> ** (Create an issue-only private repository for sensitive issues if it's
> needed and allowed)
> ** Set a mail hook to
> [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to
> the general mail group name)
> * Set a schedule for migration
> ** See [https://github.com/apache/lucene-jira-archive/issues/7]
> ** Give some time to committers to play around with issues/labels/milestones
> before the actual migration
> ** Make an announcement on the mail lists
> ** Show some text messages when opening a new Jira issue (in issue template?)
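Two of the "things to consider" in the quoted task list (converting cross-issue links, and keeping a mapping of old to new issues) can be pictured with a small sketch. This is not the project's migration tool; the class name `JiraLinkRewriter`, the in-memory map, and the output format `#N (LUCENE-XYZ)` are all assumptions made for illustration, and the issue numbers in the usage note are made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of any Lucene or Jira tooling. */
final class JiraLinkRewriter {
  // Matches Jira keys such as LUCENE-10557 as whole words.
  private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-\\d+\\b");

  /**
   * Replaces each known Jira key with a GitHub issue reference and keeps the
   * original key next to it so the text stays searchable by the old ID.
   * Keys that are missing from the mapping are left untouched.
   */
  static String rewrite(String text, Map<String, Integer> jiraToGitHub) {
    Matcher m = JIRA_KEY.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String key = m.group();
      Integer issue = jiraToGitHub.get(key);
      String replacement = (issue == null) ? key : "#" + issue + " (" + key + ")";
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

With a mapping of `LUCENE-10557` to a hypothetical GitHub issue 196, `JiraLinkRewriter.rewrite("See LUCENE-10557 and LUCENE-1.", mapping)` keeps `LUCENE-1` as-is because it has no entry. Leaving unmapped keys alone is a deliberate choice in this sketch, so a partial mapping file never produces a broken reference.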
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567495#comment-17567495 ] Tomoko Uchida commented on LUCENE-10557: it seems "Project admin" is not really allowed to do meaningful things, almost all components are shared between projects, and only Jira administrators can change them.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567504#comment-17567504 ] Michael McCandless commented on LUCENE-10557: OK hmm we will likely need Infra's help for this then.
[GitHub] [lucene] hcqs33 opened a new pull request, #1026: Fix error in TieredMergePolicy
hcqs33 opened a new pull request, #1026: URL: https://github.com/apache/lucene/pull/1026 Fix an error in the comparison between the candidate's bytes and the max merge bytes. It is wrong to compare `candidateSize`, rather than `currentCandidateBytes`, with `maxMergeBytes`. This is a minor change to fix it.
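The kind of mistake described in this pull request can be shown with a small stand-alone sketch. The message does not show what `candidateSize` holds in `TieredMergePolicy`; the sketch assumes it is a count of segments, which is only a guess, and none of the code below is Lucene's. The class `MergeCandidateCheck` and its methods are hypothetical.

```java
import java.util.List;

/** Hypothetical stand-in for a merge candidate check; not Lucene's TieredMergePolicy. */
final class MergeCandidateCheck {
  /** Total size of the candidate in bytes (plays the role of currentCandidateBytes). */
  static long candidateBytes(List<Long> segmentSizesInBytes) {
    long total = 0;
    for (long size : segmentSizesInBytes) {
      total += size;
    }
    return total;
  }

  /** Buggy variant: compares a segment count (assumed meaning) against a byte budget. */
  static boolean exceedsBudgetBuggy(List<Long> segmentSizesInBytes, long maxMergeBytes) {
    int candidateSize = segmentSizesInBytes.size();
    return candidateSize > maxMergeBytes;
  }

  /** Fixed variant: compares bytes with bytes. */
  static boolean exceedsBudgetFixed(List<Long> segmentSizesInBytes, long maxMergeBytes) {
    return candidateBytes(segmentSizesInBytes) > maxMergeBytes;
  }
}
```

Under that assumption, the buggy variant almost never reports an oversized candidate, because a handful of segments is always a far smaller number than a byte limit.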
[GitHub] [lucene] zacharymorn commented on a diff in pull request #1018: LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions
zacharymorn commented on code in PR #1018:
URL: https://github.com/apache/lucene/pull/1018#discussion_r922765220

## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new ArrayList<>();
+      for (ScorerSupplier ss : optional) {
+        optionalScorers.add(ss.get(Long.MAX_VALUE));
+      }
+
+      return new BulkScorer() {

Review Comment:
> It shouldn't be slower than the current code in main since main is using DefaultBulkScorer, is it?

The baseline of all of the above benchmark results are still using the head prior to all BMM changes. Since this approach (anonymous bulk scorer + BMM scorer) still has similar performance boost with the previous one (just BMM scorer) for top-level disjunctions, but no impact to nested boolean queries, I would think so? I'm not sure I'm fully understanding this question though.
[GitHub] [lucene] zacharymorn commented on a diff in pull request #1018: LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions
zacharymorn commented on code in PR #1018:
URL: https://github.com/apache/lucene/pull/1018#discussion_r922765998

## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new ArrayList<>();
+      for (ScorerSupplier ss : optional) {
+        optionalScorers.add(ss.get(Long.MAX_VALUE));
+      }
+
+      return new BulkScorer() {
+        final Scorer bmmScorer = new BlockMaxMaxscoreScorer(BooleanWeight.this, optionalScorers);
+        final int maxDoc = context.reader().maxDoc();
+        final DocIdSetIterator iterator = bmmScorer.iterator();
+
+        @Override
+        public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+            throws IOException {
+          max = Math.min(max, maxDoc);

Review Comment:
Yup this is indeed optional and tests didn't fail without it. I've removed it.

## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new ArrayList<>();
+      for (ScorerSupplier ss : optional) {
+        optionalScorers.add(ss.get(Long.MAX_VALUE));
+      }
+
+      return new BulkScorer() {
+        final Scorer bmmScorer = new BlockMaxMaxscoreScorer(BooleanWeight.this, optionalScorers);
+        final int maxDoc = context.reader().maxDoc();
+        final DocIdSetIterator iterator = bmmScorer.iterator();
+
+        @Override
+        public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+            throws IOException {
+          max = Math.min(max, maxDoc);
+          collector.setScorer(bmmScorer);
+
+          for (int doc = min; doc < max; ) {
+            int advancedDoc = iterator.advance(doc);
+            if (advancedDoc == DocIdSetIterator.NO_MORE_DOCS) {
+              return DocIdSetIterator.NO_MORE_DOCS;
+            } else if (advancedDoc >= max) {
+              return max;
+            }

Review Comment:
Updated.

## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ##
@@ -191,6 +191,69 @@ public long cost() {
   // or null if it is not applicable
   // pkg-private for forcing use of BooleanScorer in tests
   BulkScorer optionalBulkScorer(LeafReaderContext context) throws IOException {
+    if (scoreMode == ScoreMode.TOP_SCORES) {
+      if (query.getMinimumNumberShouldMatch() > 1 || weightedClauses.size() > 2) {
+        return null;
+      }
+
+      List<ScorerSupplier> optional = new ArrayList<>();
+      for (WeightedBooleanClause wc : weightedClauses) {
+        Weight w = wc.weight;
+        BooleanClause c = wc.clause;
+        if (c.getOccur() != Occur.SHOULD) {
+          continue;
+        }
+        ScorerSupplier scorer = w.scorerSupplier(context);
+        if (scorer != null) {
+          optional.add(scorer);
+        }
+      }
+
+      if (optional.size() <= 1) {
+        return null;
+      }
+
+      List<Scorer> optionalScorers = new ArrayList<>();
+      for (ScorerSupplier ss : optional) {
+        optionalScorers.add(ss.get(Long.MAX_VALUE));
+      }
+
+      return new BulkScorer() {
+        final Scorer bmmScorer = new BlockMaxMaxscoreScorer(BooleanWeight.this, optionalScorers);
+        final int maxDoc = context.reader().maxDoc();
+        final DocIdSetIterator iterator = bmmScorer.iterator();
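The window loop in the quoted `score(collector, acceptDocs, min, max)` diff can be tried out without Lucene using the simplified sketch below. It mirrors only the part of the loop shown in the review (advance, stop on exhaustion, stop at the window end). The collection step and the `doc = advanced + 1` update are not visible in the quoted diff and are assumptions here, as are the stand-in types `SortedDocs` and `WindowedScoringSketch`; deleted-document filtering via `acceptDocs` is left out.

```java
import java.util.List;

/** Simplified, Lucene-free sketch of scoring one [min, max) window of doc IDs. */
final class WindowedScoringSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Minimal iterator over a sorted array of matching doc IDs (hypothetical type). */
  static final class SortedDocs {
    private final int[] docs;
    private int pos = 0;

    SortedDocs(int... docs) {
      this.docs = docs;
    }

    /** Moves to the first doc ID >= target, or returns NO_MORE_DOCS when exhausted. */
    int advance(int target) {
      while (pos < docs.length && docs[pos] < target) {
        pos++;
      }
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
  }

  /** Collects matches in [min, max); returns where the next window should resume. */
  static int score(SortedDocs iterator, List<Integer> collected, int min, int max) {
    for (int doc = min; doc < max; ) {
      int advancedDoc = iterator.advance(doc);
      if (advancedDoc == NO_MORE_DOCS) {
        return NO_MORE_DOCS; // iterator is exhausted
      } else if (advancedDoc >= max) {
        return max; // next match lies beyond this window
      }
      collected.add(advancedDoc); // assumed collection step
      doc = advancedDoc + 1;      // assumed: continue after the collected doc
    }
    return max;
  }
}
```

Calling `score` with consecutive windows, for example `[0, 6)` and then `[6, 100)`, lets a caller process a segment in chunks while the iterator keeps its position between calls.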
[GitHub] [lucene] zacharymorn commented on pull request #1018: LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions
zacharymorn commented on PR #1018:
URL: https://github.com/apache/lucene/pull/1018#issuecomment-1186390917

> Thanks for explaining the motivation for the dedicated bulk scorer, I left some suggestions.

No problem and thanks for the suggestions! I have incorporated them and like how clean the bulk scorer looks now!
[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #48: test issue with component
mocobeta opened a new issue, #48:
URL: https://github.com/apache/lucene-jira-archive/issues/48

### Description

test

### Version and Environments

_No response_

### Lucene Component

component:module/analysis
[GitHub] [lucene-jira-archive] mocobeta closed issue #48: test issue with component
mocobeta closed issue #48: test issue with component URL: https://github.com/apache/lucene-jira-archive/issues/48
[GitHub] [lucene] zacharymorn commented on pull request #1018: LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions
zacharymorn commented on PR #1018:
URL: https://github.com/apache/lucene/pull/1018#issuecomment-1186398587

Here are the latest `wikinightly` benchmark results:

```
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
BrowseDateSSDVFacets            3.98  (34.1%)     3.73  (29.8%)    -6.2%  ( -52% -   87%)  0.541
OrHighMedDayTaxoFacets         24.64   (5.9%)    23.96   (9.5%)    -2.7%  ( -17% -   13%)  0.271
TermDTSort                    342.77   (7.8%)   336.36   (4.7%)    -1.9%  ( -13% -   11%)  0.359
BrowseRandomLabelSSDVFacets    20.43   (9.3%)    20.06   (9.4%)    -1.8%  ( -18% -   18%)  0.539
TermBGroup1M1P                 37.19   (7.0%)    36.72   (5.2%)    -1.3%  ( -12% -   11%)  0.521
AndHighHighDayTaxoFacets       12.29   (3.1%)    12.13   (2.9%)    -1.3%  (  -7% -    4%)  0.191
MedTermDayTaxoFacets           75.53   (5.2%)    75.06   (5.3%)    -0.6%  ( -10% -   10%)  0.706
TermMonthSort                 351.78   (6.0%)   349.61   (2.6%)    -0.6%  (  -8% -    8%)  0.675
Fuzzy1                         79.12   (2.5%)    78.71   (2.4%)    -0.5%  (  -5% -    4%)  0.509
IntervalsOrdered               13.21   (3.1%)    13.14   (3.4%)    -0.5%  (  -6% -    6%)  0.625
TermDateFacets                 72.10   (5.6%)    71.78   (5.5%)    -0.4%  ( -10% -   11%)  0.797
TermTitleSort                 350.94   (6.0%)   349.80   (2.8%)    -0.3%  (  -8% -    8%)  0.826
PKLookup                      322.25   (5.8%)   321.46   (4.3%)    -0.2%  (  -9% -   10%)  0.879
SpanNear                      166.41   (3.5%)   166.06   (2.1%)    -0.2%  (  -5% -    5%)  0.821
SloppyPhrase                    4.74   (4.4%)     4.75   (3.7%)     0.1%  (  -7% -    8%)  0.942
Term                         3394.26   (5.0%)  3398.22   (5.5%)     0.1%  (  -9% -   11%)  0.944
AndMedOrHighHigh               70.98   (5.5%)    71.07   (5.5%)     0.1%  ( -10% -   11%)  0.945
AndHighMedDayTaxoFacets       121.81   (2.5%)   122.12   (2.3%)     0.3%  (  -4% -    5%)  0.737
Phrase                         38.19   (2.5%)    38.29   (2.2%)     0.3%  (  -4% -    5%)  0.724
AndHighOrMedMed               120.53   (5.4%)   120.92   (5.4%)     0.3%  (  -9% -   11%)  0.849
Respell                        91.05   (2.9%)    91.55   (2.5%)     0.5%  (  -4% -    6%)  0.522
Fuzzy2                        120.74   (2.5%)   121.46   (2.5%)     0.6%  (  -4% -    5%)  0.453
AndHighHigh                    99.32   (3.3%)   100.24   (3.7%)     0.9%  (  -5% -    8%)  0.403
IntNRQ                       1188.88   (3.2%)  1200.31   (3.4%)     1.0%  (  -5% -    7%)  0.361
Wildcard                      163.38   (7.0%)   165.12   (4.5%)     1.1%  (  -9% -   13%)  0.566
AndHighMed                    156.13   (5.2%)   158.09   (5.1%)     1.3%  (  -8% -   12%)  0.439
TermDayOfYearSort             140.35   (3.1%)   142.36   (4.7%)     1.4%  (  -6% -    9%)  0.255
BrowseDayOfYearSSDVFacets      26.19  (12.8%)    26.60  (11.7%)     1.6%  ( -20% -   29%)  0.686
TermGroup100                   65.78   (2.5%)    66.85   (3.7%)     1.6%  (  -4% -    8%)  0.109
BrowseMonthTaxoFacets          28.68  (34.4%)    29.16  (37.1%)     1.7%  ( -52% -  111%)  0.883
Prefix3                        85.54   (6.6%)    87.24   (5.6%)     2.0%  (  -9% -   15%)  0.301
BrowseDayOfYearTaxoFacets      28.90  (30.4%)    29.64  (33.9%)     2.6%  ( -47% -   96%)  0.800
TermGroup10K                   40.11   (3.8%)    41.31   (4.0%)     3.0%  (  -4% -   11%)  0.017
TermGroup1M                    38.63   (3.8%)    39.82   (3.7%)     3.1%  (  -4% -   10%)  0.009
TermBGroup1M                   46.33   (3.8%)    47.77   (4.5%)     3.1%  (  -5% -   11%)  0.019
BrowseDateTaxoFacets           28.50  (30.4%)    29.46  (34.6%)     3.4%  ( -47% -   98%)  0.745
BrowseMonthSSDVFacets          28.27  (14.7%)    29.70  (15.4%)     5.0%  ( -21% -   41%)  0.292
BrowseRandomLabelTaxoFacets    28.78  (50.1%)    30.70  (52.7%)     6.7%  ( -64% -  219%)  0.680
OrHighHigh                     25.55   (5.9%)    37.99   (6.8%)    48.7%  (  34% -   65%)  0.000
OrHighMed                      92.43   (6.4%)   210.19  (11.3%)   127.4%  ( 103% -  155%)  0.000
```

```
Task                        QPS baseline  StdDev  QPS my_modified_vers
[GitHub] [lucene] zacharymorn commented on pull request #1018: LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions
zacharymorn commented on PR #1018: URL: https://github.com/apache/lucene/pull/1018#issuecomment-1186399338 @jpountz If this approach to limiting BMM scorer to top-level disjunctions looks good to you, I can go ahead and update the corresponding tests to make this PR ready?
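For readers checking the `wikinightly` numbers posted earlier in this PR thread, the "Pct diff" column appears consistent with a plain relative change of mean QPS. That reading is an assumption based on the printed values, not something the thread states, and the helper `PctDiff` below is hypothetical. Small mismatches are expected because the table shows rounded means.

```java
/** Hypothetical helper for sanity-checking a "Pct diff" column. */
final class PctDiff {
  /** Relative change of the modified QPS over the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double modifiedQps) {
    return (modifiedQps - baselineQps) / baselineQps * 100.0;
  }
}
```

For the `OrHighMed` row (92.43 to 210.19 QPS) this gives roughly 127.4, and for `OrHighHigh` (25.55 to 37.99 QPS) roughly 48.7, matching the reported values for the two tasks that changed significantly.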
[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #49: Enable mention to comment authors
mocobeta opened a new pull request, #49: URL: https://github.com/apache/lucene-jira-archive/pull/49 #27  should be  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org