[GitHub] [lucene-jira-archive] mocobeta commented on issue #12: Make a test set for improving markup conversion quality
mocobeta commented on issue #12:
URL: https://github.com/apache/lucene-jira-archive/issues/12#issuecomment-1180040078

> Indeed, there was at least one comment (I think?) where the author used Markdown (which does not work in Jira, yet many of us forget and use it anyway, just like seeing a naked `bq.` here on GitHub or in emails!) and then the rendering worked on migration! A surprising benefit of migration ;)

I reviewed the converter library's code again. Your insight is correct: there seems to be no escaping for Markdown, so _as long as no extra space characters interfere with the frontend_, Markdown written in Jira is rendered on GitHub (as the authors might expect).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-jira-archive] mocobeta commented on issue #27: Improve the `Jira Information` header?
mocobeta commented on issue #27:
URL: https://github.com/apache/lucene-jira-archive/issues/27#issuecomment-1180049119

For prototyping, the easiest way for me was to embed a fixed template for the Jira information in the conversion script. I agree that there are more sophisticated ways to generate the Jira information paragraph flexibly.
[jira] [Updated] (LUCENE-10645) Wrong autocomplete suggestion
[ https://issues.apache.org/jira/browse/LUCENE-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emiliyan Sinigerov updated LUCENE-10645:

Description:

I have a problem with autocomplete suggestions. I use your test to show where the bug is: [https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java].

This is your test, and everything works fine:

    public void testBothExactAndPrefix() throws Exception {
      Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
      AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
      suggester.build(new InputArrayIterator(new Input[0]));
      suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
      suggester.refresh();

      List<LookupResult> results = suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
      assertEquals(1, results.size());
      assertEquals("the pen is pretty", results.get(0).key);
      assertEquals("the pen is pretty", results.get(0).highlightKey);
      assertEquals(10, results.get(0).value);
      assertEquals(new BytesRef("foobaz"), results.get(0).payload);
      suggester.close();
      a.close();
    }

But if I add this row to the test, the test goes wrong:

    suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));

    public void testBothExactAndPrefix() throws Exception {
      Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
      AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
      suggester.build(new InputArrayIterator(new Input[0]));
      suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
      suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
      suggester.refresh();

      List<LookupResult> results = suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
      assertEquals(1, results.size());
      assertEquals("the pen is pretty", results.get(0).key);
      assertEquals("the pen is pretty", results.get(0).highlightKey);
      assertEquals(10, results.get(0).value);
      assertEquals(new BytesRef("foobaz"), results.get(0).payload);
      suggester.close();
      a.close();
    }

We want to find everything that contains "pen p", and only one entry should match, "the pen is pretty", but the results contain two matches: "the pen is pretty" and "the pen is fretty".

I think that when we search for a word (here "pen") followed by a single letter that is the same as the first letter of that word (here "p"), the suggester first matches the word "pen" and then matches "p" against "pen" again, which is incorrect. We want "p" to match a word other than "pen".
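The behaviour described in this report can be modelled with a small sketch. It assumes (this is not confirmed in the thread) that the suggester requires every query token except the last to equal some indexed term, and treats the last token as a prefix that may match any term, including one that was already matched. The function names `matches` and `matches_distinct` are hypothetical and are not Lucene APIs; the sketch is in Python only to keep it short.

```python
def matches(query: str, suggestion: str) -> bool:
    """Simplified model (an assumption): leading tokens must equal some term,
    the last token only has to be a prefix of *any* term."""
    q_tokens = query.lower().split()
    terms = suggestion.lower().split()
    if not q_tokens:
        return False
    *exact, prefix = q_tokens
    return all(t in terms for t in exact) and any(
        term.startswith(prefix) for term in terms
    )


def matches_distinct(query: str, suggestion: str) -> bool:
    """Stricter variant the reporter seems to expect: the prefix token must
    match a term at a position not already used by an exact token."""
    q_tokens = query.lower().split()
    terms = suggestion.lower().split()
    if not q_tokens:
        return False
    *exact, prefix = q_tokens
    if not all(t in terms for t in exact):
        return False
    used = {terms.index(t) for t in exact}  # first occurrence of each exact token
    return any(
        term.startswith(prefix) for i, term in enumerate(terms) if i not in used
    )


for entry in ("the pen is pretty", "the pen is fretty"):
    print(entry, matches("pen p", entry), matches_distinct("pen p", entry))
```

Under the first model both entries match "pen p", because "p" is also a prefix of "pen"; under the stricter variant only "the pen is pretty" matches.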
[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #31: Make converter script work without account mapping file
mocobeta opened a new pull request, #31:
URL: https://github.com/apache/lucene-jira-archive/pull/31

I had second thoughts about this. It may be better for the converter script to work regardless of whether an account mapping file exists (it's not a critical part of the converter).
[GitHub] [lucene-jira-archive] mocobeta merged pull request #31: Make converter script work without account mapping file
mocobeta merged PR #31:
URL: https://github.com/apache/lucene-jira-archive/pull/31
[GitHub] [lucene] shaie opened a new pull request, #1015: [LUCENE-10629]: Add fast match query support to FacetSets
shaie opened a new pull request, #1015:
URL: https://github.com/apache/lucene/pull/1015

### Description (or a Jira issue link if you have one)

Add `fastMatchQuery` support to `MatchingFacetSetCounts` to improve counting efficiency when there are many possible indexed facet-set combinations and only a small subset of them is of interest during counting.
[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #32: only escape HTML tags
mocobeta opened a new pull request, #32:
URL: https://github.com/apache/lucene-jira-archive/pull/32

Follow-up of #23. To avoid unintentional escaping, escape only HTML tag-like text (``) and preserve other `<`, `>`, and `&`.
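A minimal sketch of what "escape only tag-like text" could look like, assuming a tag-like span is `<`, an optional `/`, a letter, and then anything up to the next `>`. The pattern and the function name `escape_tag_like` are illustrative assumptions, not the code from this PR.

```python
import re

# Assumption: "tag-like" means '<', optional '/', a letter, then any run of
# characters other than '<' or '>', then '>'.
TAG_LIKE = re.compile(r"<(/?[A-Za-z][^<>]*)>")


def escape_tag_like(text: str) -> str:
    """Escape only the angle brackets of tag-like spans; leave every other
    '<', '>' and '&' untouched."""
    return TAG_LIKE.sub(r"&lt;\1&gt;", text)


print(escape_tag_like("use <b>bold</b> when a < b && c > d"))
```

Comparison operators and bare ampersands pass through unchanged because a `<` followed by a space never starts a match. A generic type such as `List<String>` would still be escaped under this pattern, which may or may not be what the converter wants.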
[GitHub] [lucene-jira-archive] mocobeta merged pull request #32: only escape HTML tags
mocobeta merged PR #32:
URL: https://github.com/apache/lucene-jira-archive/pull/32
[GitHub] [lucene-jira-archive] mocobeta commented on issue #14: Investigate import failure of LUCENE-1498
mocobeta commented on issue #14:
URL: https://github.com/apache/lucene-jira-archive/issues/14#issuecomment-1180072639

The quick workaround (manual recovery) should work. I'm closing this.
[GitHub] [lucene-jira-archive] mocobeta closed issue #14: Investigate import failure of LUCENE-1498
mocobeta closed issue #14: Investigate import failure of LUCENE-1498
URL: https://github.com/apache/lucene-jira-archive/issues/14
[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts
[ https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564830#comment-17564830 ]

Shai Erera commented on LUCENE-10629:

Oh [~stefanvodita], I hadn't refreshed the issue for a while and missed your PR! I pushed my PR and only then refreshed this page, sorry about that. Let's find a way to merge our PRs, since I've also added an example to the demo package and more tests.

> Add fastMatchQuery param to MatchingFacetSetCounts
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Marc D'Mello
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a {{fastMatchQuery}} parameter in order to quickly and efficiently filter out documents in the passed in match set. We should create this same parameter in {{MatchingFacetSetCounts}} as well.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [lucene] shaie commented on a diff in pull request #1001: LUCENE-10629: Add fastMatchQuery to MatchingFacetSetCounts
shaie commented on code in PR #1001:
URL: https://github.com/apache/lucene/pull/1001#discussion_r917641237

## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java: ##

@@ -76,8 +92,12 @@ private int count(String field, List matchingDocs)
 BinaryDocValues binaryDocValues = DocValues.getBinary(hits.context.reader(), field);
- final DocIdSetIterator it =
- ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), binaryDocValues));
+ DocIdSetIterator it = createIterator(hits);

Review Comment:
Yes I agree, for that reason I decided not to do it in [my PR](https://github.com/apache/lucene/pull/1015). I don't think the base collector helps us much at this point. It's not a lot of code duplication and as you note it prevents us from optimizing the conjunction?

BTW, as I wrote on the issue I totally missed this PR when I pushed my PR. Let's find a way to merge the two!
[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions
[ https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564885#comment-17564885 ]

Adrien Grand commented on LUCENE-10480:

I haven't tried to reproduce it, but the steps you took by running on wikibigall with the nightly tasks file sound good to me. Another thing that sometimes changes performance is the doc ID order; were you perhaps using multiple indexing threads?

Ignoring the fact that we cannot reproduce the slowdown, if I try to think of the main differences between WANDScorer and BlockMaxMaxscoreScorer for AndHighOrMedMed, I think the main one is the way that {{advanceShallow}} is computed. Conjunctions use block boundaries of the clause that has the lowest cost, so this could explain why we are seeing a slowdown with AndHighOrMedMed (since the conjunction uses block boundaries of OrMedMed) and not AndMedOrHighHigh (since the conjunction uses block boundaries of Med). Maybe we could explore other approaches for {{advanceShallow}}, such as taking the minimum block boundary across essential clauses only instead of all clauses.

> Specialize 2-clauses disjunctions
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its invariants: one linked list for the current candidates, one priority queue of scorers that are behind, another one for scorers that are ahead. All this could be simplified in the 2-clauses case, which feels worth specializing for as it's very common that end users enter queries that only have two terms?
[GitHub] [lucene] jpountz merged pull request #1014: Add comment for no pauses in RateLimitedIndexOutput.writeBytes
jpountz merged PR #1014:
URL: https://github.com/apache/lucene/pull/1014
[GitHub] [lucene] jpountz merged pull request #1011: LUCENE-10647: Fix TestMergeSchedulerExternal failures
jpountz merged PR #1011:
URL: https://github.com/apache/lucene/pull/1011
[jira] [Commented] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564896#comment-17564896 ]

ASF subversion and git services commented on LUCENE-10647:

Commit 128869d63aef6a448af991fa2768113a560a8dbc in lucene's branch refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=128869d63ae ]

LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011)

Ensure mergeScheduler.sync() gets called before we rollback the writer.

> Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
>
> Key: LUCENE-10647
> URL: https://issues.apache.org/jira/browse/LUCENE-10647
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Vigya Sharma
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Recent builds are intermittently failing on TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler. Example:
> https://jenkins.thetaphi.de/job/Lucene-main-Linux/35576/testReport/junit/org.apache.lucene/TestMergeSchedulerExternal/testSubclassConcurrentMergeScheduler/
[jira] [Commented] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564900#comment-17564900 ]

ASF subversion and git services commented on LUCENE-10647:

Commit 190cfbc65c66be807d6c61291500a6fdcf9a975e in lucene's branch refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=190cfbc65c6 ]

LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011)

Ensure mergeScheduler.sync() gets called before we rollback the writer.
[GitHub] [lucene-jira-archive] mikemccand opened a new pull request, #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand opened a new pull request, #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33

This is a start at #27 but I expect to iterate some more. Progress not perfection!

Now the header is more compact and looks like this for issues w/ no attachments, PRs, etc:

[screenshot]

And then with PRs and attachments:

[screenshot]

I've only tested on 100 issues so far ... I'll run the full export and conversion to confirm I didn't break anything.
[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand commented on PR #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180310425

Oh also note that I added another dependency (`dateutil`), very helpful for parsing ISO-8601 dates. I couldn't (quickly) figure out how to reliably do this with Python's `datetime`.
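For comparison, here is a standard-library-only sketch. It assumes the timestamps look like `2022-07-11T07:57:25.815+0000` (fractional seconds plus a numeric UTC offset without a colon); the thread does not show the actual format, so both the format string and the name `parse_jira_timestamp` are assumptions.

```python
from datetime import datetime, timezone

# Assumed timestamp layout: date, 'T', time with fractional seconds, numeric offset.
JIRA_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_jira_timestamp(value: str) -> datetime:
    """Parse one timestamp in the assumed layout into an aware datetime."""
    return datetime.strptime(value, JIRA_TS_FORMAT)


ts = parse_jira_timestamp("2022-07-11T07:57:25.815+0000")
print(ts.astimezone(timezone.utc).isoformat())
```

A fixed format string like this rejects any input that deviates from it, for example a timestamp without fractional seconds, which is one reason a dedicated ISO-8601 parser such as `dateutil` can be the more reliable choice.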
[jira] [Resolved] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-10647.

Fix Version/s: 9.3
Resolution: Fixed
[GitHub] [lucene-jira-archive] mocobeta commented on a diff in pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mocobeta commented on code in PR #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33#discussion_r917859701

## migration/src/jira2github_import.py: ##

@@ -69,45 +70,53 @@ def convert_issue(num: int, dump_dir: Path, output_dir: Path, account_map: dict[
 attachment_list_items = []
 att_replace_map = {}
 for (filename, cnt) in attachments:
-attachment_list_items.append(f"- [{filename}]({attachment_url(num, filename, att_repo, att_branch)})" + (f" (versions: {cnt})\n" if cnt > 1 else "\n"))
+attachment_list_items.append(f"[{filename}]({attachment_url(num, filename, att_repo, att_branch)})" + (f" (versions: {cnt})" if cnt > 1 else ""))
 att_replace_map[filename] = attachment_url(num, filename, att_repo, att_branch)
+print(f'{jira_id}: attachments: {attachment_list_items}')

Review Comment:
I think this print() was added for debugging and should be suppressed?

```suggestion
# print(f'{jira_id}: attachments: {attachment_list_items}')
```
[GitHub] [lucene-jira-archive] mikemccand commented on a diff in pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand commented on code in PR #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33#discussion_r917872038

## migration/src/jira2github_import.py: ##

Review Comment:
Woops sorry yes I'll remove all the prints I added!
[jira] [Created] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
Nathan Meisels created LUCENE-10650:

Summary: "after_effect": "no" was removed what replaces it?
Key: LUCENE-10650
URL: https://issues.apache.org/jira/browse/LUCENE-10650
Project: Lucene - Core
Issue Type: Wish
Reporter: Nathan Meisels

Hi!

We have been using an old version of Elasticsearch with the following settings:

{code:java}
"default": {
  "queryNorm": "1",
  "type": "DFR",
  "basic_model": "in",
  "after_effect": "no",
  "normalization": "no"
}{code}

I see here that "after_effect": "no" was removed.

In the [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version the score was:

{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}

In the [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's:

    long N = stats.getNumberOfDocuments();
    long n = stats.getDocFreq();
    double A = log2((N + 1) / (n + 0.5));
    // basic model I(n) should return A * tfn
    // which we rewrite to A * (1 + tfn) - A
    // so that it can be combined with the after effect while still guaranteeing
    // that the result is non-decreasing with tfn
    return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));

I tried changing to "l", but the scoring is different from what we are used to. (We depend heavily on the exact scoring.)

Do you have any advice on how we can keep the same scoring as before?

Thanks
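The two formulas quoted in this report can be compared numerically. The sketch below transcribes them into Python and assumes, from the variable name only, that `aeTimes1pTfn` is the after effect multiplied by `(1 + tfn)`. Under that assumption an after effect of exactly 1 gives `A * (1 + tfn) * (1 - 1 / (1 + tfn)) = A * tfn`, which is the old formula. The function names and the sample numbers are illustrative, and whether such an after effect can be configured is not covered here.

```python
import math


def old_score(tfn: float, num_docs: int, doc_freq: int) -> float:
    """Old formula quoted above: tfn * log2((N + 1) / (n + 0.5))."""
    return tfn * math.log2((num_docs + 1) / (doc_freq + 0.5))


def new_score(tfn: float, num_docs: int, doc_freq: int, ae_times_1p_tfn: float) -> float:
    """New formula quoted above: A * aeTimes1pTfn * (1 - 1 / (1 + tfn))."""
    a = math.log2((num_docs + 1) / (doc_freq + 0.5))
    return a * ae_times_1p_tfn * (1 - 1 / (1 + tfn))


tfn, num_docs, doc_freq = 3.0, 1000, 10  # illustrative values only
# Assumption: an after effect of 1 means aeTimes1pTfn == 1 * (1 + tfn).
print(old_score(tfn, num_docs, doc_freq))
print(new_score(tfn, num_docs, doc_freq, 1 + tfn))
```

Note that the old Java code casts the logarithm to `float`, so real scores may still differ in the last digits even when the formulas agree algebraically.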
[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Meisels updated LUCENE-10650:

Description:

Hi!

We have been using an old version of elasticsearch with the following settings:

{code:java}
"default": {
  "queryNorm": "1",
  "type": "DFR",
  "basic_model": "in",
  "after_effect": "no",
  "normalization": "no"
}{code}

I see here that "after_effect": "no" was removed.

In [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version score was:

{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}

In [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's:

{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}

I tried changing to "l" but the scoring is different than what we are used to. (We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks

> "after_effect": "no" was removed what replaces it?
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Nathan Meisels
> Priority: Major
[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand commented on PR #33: URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180351403

BTW, as I run the full Jira download, I see errors like this:

```
[2022-07-11 07:57:25,815] WARNING:download_jira: Can't download LUCENE-498. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 07:59:10,096] WARNING:download_jira: Can't download LUCENE-613. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 07:59:10,978] WARNING:download_jira: Can't download LUCENE-614. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 07:59:13,615] WARNING:download_jira: Can't download LUCENE-617. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 08:10:36,059] WARNING:download_jira: Can't download LUCENE-1362. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 08:10:36,932] WARNING:download_jira: Can't download LUCENE-1363. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 08:10:37,798] WARNING:download_jira: Can't download LUCENE-1364. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 08:26:22,112] WARNING:download_jira: Can't download LUCENE-2375. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
[2022-07-11 08:27:02,304] WARNING:download_jira: Can't download LUCENE-2418. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
```

Indeed, when I search Jira itself, my email, the internet, `LUCENE-2418` seems not to exist. I wonder what happened?
[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Meisels updated LUCENE-10650: Description: Hi! We have been using an old version of elasticsearch with the following settings: {code:java} "default": { "queryNorm": "1", "type": "DFR", "basic_model": "in", "after_effect": "no", "normalization": "no" }{code} I see here that "after_effect": "no" was removed. In [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version score was: {code:java} return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} In [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's: {code:java} long N = stats.getNumberOfDocuments(); long n = stats.getDocFreq(); double A = log2((N + 1) / (n + 0.5)); // basic model I should return A * tfn // which we rewrite to A * (1 + tfn) - A // so that it can be combined with the after effect while still guaranteeing // that the result is non-decreasing with tfn return A * aeTimes1pTfn * (1 - 1 / (1 + tfn)); {code} I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is different than what we are used to. (We depend heavily on the exact scoring). Do you have any advice how we can keep the same scoring as before? Thanks was: Hi! We have been using an old version of elasticsearch with the following settings: {code:java} "default": { "queryNorm": "1", "type": "DFR", "basic_model": "in", "after_effect": "no", "normalization": "no" }{code} I see here that "after_effect": "no" was removed. 
In [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version score was: {{}} {code:java} return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} {{}} In [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's: {code:java} long N = stats.getNumberOfDocuments(); long n = stats.getDocFreq(); double A = log2((N + 1) / (n + 0.5)); // basic model I should return A * tfn // which we rewrite to A * (1 + tfn) - A // so that it can be combined with the after effect while still guaranteeing // that the result is non-decreasing with tfn return A * aeTimes1pTfn * (1 - 1 / (1 + tfn)); {code} I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is different than what we are used to. (We depend heavily on the exact scoring). Do you have any advice how we can keep the same scoring as before? Thanks
[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Meisels updated LUCENE-10650: Description: Hi! We have been using an old version of elasticsearch with the following settings: {code:java} "default": { "queryNorm": "1", "type": "DFR", "basic_model": "in", "after_effect": "no", "normalization": "no" }{code} I see here that "after_effect": "no" was removed. In [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version score was: {{}} {code:java} return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} {{}} In [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's: {code:java} long N = stats.getNumberOfDocuments(); long n = stats.getDocFreq(); double A = log2((N + 1) / (n + 0.5)); // basic model I should return A * tfn // which we rewrite to A * (1 + tfn) - A // so that it can be combined with the after effect while still guaranteeing // that the result is non-decreasing with tfn return A * aeTimes1pTfn * (1 - 1 / (1 + tfn)); {code} I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is different than what we are used to. (We depend heavily on the exact scoring). Do you have any advice how we can keep the same scoring as before? Thanks was: Hi! We have been using an old version of elasticsearch with the following settings: {code:java} "default": { "queryNorm": "1", "type": "DFR", "basic_model": "in", "after_effect": "no", "normalization": "no" }{code} I see here that "after_effect": "no" was removed. 
In [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version score was: {{}} {code:java} return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} {{}} In [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's: {code:java} long N = stats.getNumberOfDocuments(); long n = stats.getDocFreq(); double A = log2((N + 1) / (n + 0.5)); // basic model I should return A * tfn // which we rewrite to A * (1 + tfn) - A // so that it can be combined with the after effect while still guaranteeing // that the result is non-decreasing with tfn return A * aeTimes1pTfn * (1 - 1 / (1 + tfn)); {code} I tried changing to "l" but the scoring is different than what we are used to. (We depend heavily on the exact scoring). Do you have any advice how we can keep the same scoring as before? Thanks
[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Meisels updated LUCENE-10650: Description: Hi! We have been using an old version of elasticsearch with the following settings: {code:java} "default": { "queryNorm": "1", "type": "DFR", "basic_model": "in", "after_effect": "no", "normalization": "no" }{code} I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that "after_effect": "no" was removed. In [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version score was: {code:java} return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} In [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's: {code:java} long N = stats.getNumberOfDocuments(); long n = stats.getDocFreq(); double A = log2((N + 1) / (n + 0.5)); // basic model I should return A * tfn // which we rewrite to A * (1 + tfn) - A // so that it can be combined with the after effect while still guaranteeing // that the result is non-decreasing with tfn return A * aeTimes1pTfn * (1 - 1 / (1 + tfn)); {code} I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is different than what we are used to. (We depend heavily on the exact scoring). Do you have any advice how we can keep the same scoring as before? Thanks was: Hi! We have been using an old version of elasticsearch with the following settings: {code:java} "default": { "queryNorm": "1", "type": "DFR", "basic_model": "in", "after_effect": "no", "normalization": "no" }{code} I see here that "after_effect": "no" was removed. 
In [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33] version score was: {code:java} return tfn * (float)(log2((N + 1) / (n + 0.5)));{code} In [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43] version it's: {code:java} long N = stats.getNumberOfDocuments(); long n = stats.getDocFreq(); double A = log2((N + 1) / (n + 0.5)); // basic model I should return A * tfn // which we rewrite to A * (1 + tfn) - A // so that it can be combined with the after effect while still guaranteeing // that the result is non-decreasing with tfn return A * aeTimes1pTfn * (1 - 1 / (1 + tfn)); {code} I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is different than what we are used to. (We depend heavily on the exact scoring). Do you have any advice how we can keep the same scoring as before? Thanks
[GitHub] [lucene-jira-archive] mocobeta commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mocobeta commented on PR #33: URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180354136 Yes, I also noticed several issues do not exist (not sure why); in that case, the script just emits an error and proceeds to the next issue as you see.
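The skip-and-continue behaviour described above can be sketched roughly as follows. This is only an illustration in Java, not the actual `download_jira` script: the class `SkipMissingIssues`, the method `download`, and the `statusOf` callback are hypothetical names standing in for a real HTTP call, and the sketch assumes that any non-200 status means the issue should be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

class SkipMissingIssues {
    /**
     * Tries every issue number in [first, last]. statusOf is a hypothetical
     * stand-in for the HTTP request and returns a status code per issue number.
     * Issues that do not answer 200 are logged and skipped; the loop goes on.
     */
    static List<Integer> download(int first, int last, IntUnaryOperator statusOf) {
        List<Integer> downloaded = new ArrayList<>();
        for (int number = first; number <= last; number++) {
            int status = statusOf.applyAsInt(number);
            if (status != 200) {
                // Emit a warning and proceed to the next issue instead of aborting.
                System.err.printf(
                    "WARNING: Can't download LUCENE-%d. status code=%d%n", number, status);
                continue;
            }
            downloaded.add(number);
        }
        return downloaded;
    }

    public static void main(String[] args) {
        // Pretend that 613, 614 and 617 are missing, as in the log quoted earlier.
        IntUnaryOperator fakeStatus = n -> (n == 613 || n == 614 || n == 617) ? 404 : 200;
        System.out.println(download(612, 618, fakeStatus));
    }
}
```

Passing the status lookup in as a function keeps the sketch runnable without network access; a real downloader would also need to tell a 404 apart from transient failures that deserve a retry.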
[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564988#comment-17564988 ]

Adrien Grand commented on LUCENE-10650:
---

Hi Nathan. When we introduced dynamic pruning to Lucene, we also introduced the requirement that similarities produce scores that are non-decreasing when tf increases or when the length norm decreases (all other things equal). Unfortunately, this property could not be retained while keeping DFR similarities pluggable as they were, so we removed support for the "no" after effect and only retained L and B.

It looks like this specific similarity that you are looking for could still be implemented in a way that scores are non-decreasing with increasing tf or decreasing norm, so you should be able to re-implement it using a scripted similarity for instance (https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html#scripted_similarity) with something like below (untested):

{code}
"similarity": {
  "my_dfr_sim": {
    "type": "scripted",
    "weight_script": {
      "source": "return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);"
    },
    "script": {
      "source": "return weight * doc.freq;"
    }
  }
}
{code}
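To see how the suggested scripts line up with the old formula, the arithmetic can be sketched in plain Java. This is a rough check, not Lucene or Elasticsearch code: the class and method names are made up for illustration, the statistics in `main` are invented, and the sketch assumes that with `"normalization": "no"` the old `tfn` is simply the raw term frequency (`doc.freq` in the script). The division by `Math.log(2)` is the change of base from natural logarithm to log base 2. The sketch ignores the `(float)` cast in the 5.5.0 code, and whether `field.docCount` equals the `N` the old code saw is not established in this thread.

```java
class DfrInNoAfterEffect {
    /** log base 2 via change of base: ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Formula quoted from the 5.5.0 BasicModelIn: tfn * log2((N + 1) / (n + 0.5)). */
    static double oldScore(double tfn, long N, long n) {
        return tfn * log2((N + 1) / (n + 0.5));
    }

    /** Mirrors the weight_script: boost * ln((docCount + 1) / (docFreq + 0.5)) / ln(2). */
    static double weight(double boost, long docCount, long docFreq) {
        return boost * Math.log((docCount + 1.0) / (docFreq + 0.5)) / Math.log(2);
    }

    /** Mirrors the per-document script: weight * doc.freq. */
    static double scriptedScore(double weight, double freq) {
        return weight * freq;
    }

    public static void main(String[] args) {
        long N = 1_000_000L; // invented document count
        long n = 1_000L;     // invented document frequency
        double w = weight(1.0, N, n); // boost 1, assumed to play the role of queryNorm = 1
        for (int tf = 1; tf <= 5; tf++) {
            System.out.printf("tf=%d old=%.6f scripted=%.6f%n",
                tf, oldScore(tf, N, n), scriptedScore(w, tf));
        }
    }
}
```

Because the weight is positive whenever `n <= N`, the score grows with the term frequency, which is the non-decreasing property mentioned in the comment above.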
[jira] [Resolved] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10650. --- Resolution: Won't Fix
[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565003#comment-17565003 ]

Nathan Meisels commented on LUCENE-10650:
-

Thanks for the answer! Just to clarify:

query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?

Thanks!
[jira] [Comment Edited] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565003#comment-17565003 ]

Nathan Meisels edited comment on LUCENE-10650 at 7/11/22 1:26 PM:
--

Thanks for the answer! Just to clarify:

{code:java}
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?{code}

Thanks!

was (Author: JIRAUSER292626):

Thanks for the answer! Just to clarify:

query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?

Thanks!
[jira] [Reopened] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Meisels reopened LUCENE-10650: -
[jira] [Comment Edited] (LUCENE-10650) "after_effect": "no" was removed what replaces it?
[ https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565003#comment-17565003 ]

Nathan Meisels edited comment on LUCENE-10650 at 7/11/22 1:54 PM:
--

Thanks for the answer!

1. Will latency be higher using a scripted similarity?
2. Just to clarify:

{code:java}
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?{code}

Thanks!

was (Author: JIRAUSER292626):

Thanks for the answer! Just to clarify:

{code:java}
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?{code}

Thanks!
[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #34: Add a tool to generate account mapping
mocobeta opened a new pull request, #34: URL: https://github.com/apache/lucene-jira-archive/pull/34 #3 This adds a helper tool to create a Jira user - GitHub account mapping file; this is used in the "Convert Jira issues to GitHub issues" step.
[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r918110247

## lucene/demo/src/java/org/apache/lucene/demo/facet/RangeFacetsExample.java:
##
@@ -73,6 +76,35 @@ public void index() throws IOException {
       indexWriter.addDocument(doc);
     }
 
+    // Add documents with a fake timestamp, 3600 sec (1 hour) after "now", 7200 sec (2
+    // hours) after "now", ...:
+    long startTime = 0;
+    // Index error messages since a week (24 * 7 = 168 hours) ago
+    for (int i = 0; i < 168; i++) {
+      long endTime = startTime + (i + 1) * 3600;
+
+      // Choose a relatively larger number, e,g., "35", in order to create variation in count for
+      // the top-n children, so that getTopChildren(10) in the searchTopChildren functionality
+      // can return children with different counts
+      for (int j = 0; j < i % 35; j++) {
+        Document doc = new Document();
+        // index document at a different timestamp by using endTime - i * j

Review Comment:
OK, thank you! I like the idea and I think it's close, but I think we can come up with a simpler way. I think you could randomly distribute the "data points" within each hour when you index without impacting testing at all. The facet counts should remain the same regardless of how the data points are distributed, so testing should be stable I think? So maybe we hit a compromise that uses a stable number of data points per hour time period (which you could do with your modulus operation if you like) but then randomly jitter the data within each hour block?

But yeah, let's take it up as a follow on issue. Would you mind linking that here once you create it so the conversation is easier to follow for future readers?

Thanks again for all the hard work!
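The compromise suggested in this review (a stable number of data points per hour, jittered randomly within the hour) could look roughly like the Python sketch below. It is an illustration only, not the demo code: the function names, the fixed seed, and the choice of treating hour `i` as the block `[i * 3600, (i + 1) * 3600)` are assumptions.

```python
import random
from collections import Counter

SECONDS_PER_HOUR = 3600


def jittered_timestamps(hours=168, modulus=35, seed=0):
    """Return i % modulus timestamps for each hour i, placed randomly inside that hour."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    timestamps = []
    for i in range(hours):
        block_start = i * SECONDS_PER_HOUR
        for _ in range(i % modulus):
            # The jitter stays strictly inside the hour block.
            timestamps.append(block_start + rng.randrange(SECONDS_PER_HOUR))
    return timestamps


def counts_per_hour(timestamps):
    """Count data points per one-hour range, as an hourly range facet would."""
    return Counter(ts // SECONDS_PER_HOUR for ts in timestamps)
```

Because the jitter never leaves its hour block, the per-hour counts depend only on `i % modulus` and not on the seed, which is the property that would keep tests stable.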
[GitHub] [lucene] gsmiller merged pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
gsmiller merged PR #974: URL: https://github.com/apache/lucene/pull/974
[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts
[ https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565075#comment-17565075 ]

ASF subversion and git services commented on LUCENE-10614:
--

Commit 5ef7e5025def61cf20442806486c8f6102ebcdc4 in lucene's branch refs/heads/main from Yuting Gan
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5ef7e5025de ]

LUCENE-10614: Properly support getTopChildren in RangeFacetCounts (#974)

> Properly support getTopChildren in RangeFacetCounts
> ---
>
> Key: LUCENE-10614
> URL: https://issues.apache.org/jira/browse/LUCENE-10614
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 10.0 (main)
> Reporter: Greg Miller
> Priority: Minor
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> As mentioned in LUCENE-10538, {{RangeFacetCounts}} is not implementing {{getTopChildren}}. Instead of returning "top" ranges, it returns all user-provided ranges in the order the user specified them when instantiating. This is probably more useful functionality, but it would be nice to support {{getTopChildren}} as well.
> LUCENE-10550 is introducing the concept of {{getAllChildren}}, so once that lands, we can replace the current implementation of {{getTopChildren}} with an actual "top children" implementation and direct users to {{getAllChildren}} if they want to maintain the current behavior.
[GitHub] [lucene-jira-archive] mocobeta commented on pull request #34: Add a tool to generate account mapping
mocobeta commented on PR #34: URL: https://github.com/apache/lucene-jira-archive/pull/34#issuecomment-1180601283 FYI @mikemccand @dweiss I will keep this open for a while and do some more extensive tests on that (this is a helper tool that should not block/conflict with the main scripts). If you have suggestions for generating account mapping, please review this when you have some time. I think there is room to improve in this simplistic approach.
[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts
[ https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565081#comment-17565081 ] ASF subversion and git services commented on LUCENE-10614: -- Commit d6dbe4374a5229b827613b85066f3a4da91d5f27 in lucene's branch refs/heads/main from Greg Miller [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d6dbe4374a5 ] Move LUCENE-10614 CHANGES entry to 10.0 and add MIGRATE entry
[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts
[ https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565084#comment-17565084 ] Greg Miller commented on LUCENE-10614: -- Thanks again [~yutinggan] !
[jira] [Resolved] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts
[ https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller resolved LUCENE-10614. -- Fix Version/s: 10.0 (main) Resolution: Fixed
[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts
[ https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565083#comment-17565083 ] Greg Miller commented on LUCENE-10614: -- Just merged this to {{main}}. I don't think we should backport this to 9.x since it is a functional change to an existing API. Because of this, I moved the CHANGES entry under 10.0 and added an entry to MIGRATE describing the difference and how to retain the 9.x functionality if desired.
[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts
[ https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565089#comment-17565089 ] Yuting Gan commented on LUCENE-10614: - Thank you so much [~gsmiller] !
[GitHub] [lucene] tang-hi opened a new pull request, #1016: LUCENE-10646: Add some comment on LevenshteinAutomata
tang-hi opened a new pull request, #1016:
URL: https://github.com/apache/lucene/pull/1016

[JIRA](https://issues.apache.org/jira/browse/LUCENE-10646)

1. I have added some comments on Lev1ParametricDescription; I hope they will help others better understand the code of Lev2ParametricDescription and Lev2TParametricDescription.
2. I use breadth-first search to prettify the output of Automaton#toDot. For example, the LevenshteinAutomata of "abcd":

before: [image]

after: [image]
[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points
shahrs87 commented on PR #907: URL: https://github.com/apache/lucene/pull/907#issuecomment-1180628206 @jpountz Hi Adrien, can you please make one more pass over the PR and provide your feedback? Thank you.
[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand commented on PR #33: URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180758900 Good thing I tested on all issues -- I hit a couple fun exceptions -- so please don't push this PR just yet!
[jira] [Created] (LUCENE-10651) SimpleQueryParser stack overflow for large nested queries.
Marc created LUCENE-10651:
-

Summary: SimpleQueryParser stack overflow for large nested queries.
Key: LUCENE-10651
URL: https://issues.apache.org/jira/browse/LUCENE-10651
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 9.2, 8.10, 9.1, 9.3
Reporter: Marc

The OpenSearch project received an issue [1] where stack overflow can occur for large nested boolean queries during rewrite. In trying to reproduce this error I've also encountered SO during parsing where queries expand beyond the default 1024 clause limit. This unit test will fail with SO:
{code:java}
public void testSimpleQueryParserWithTooManyClauses() {
  StringBuilder queryString = new StringBuilder("foo");
  for (int i = 0; i < 1024; i++) {
    queryString.append(" | bar").append(i).append(" + baz");
  }
  expectThrows(IndexSearcher.TooManyClauses.class, () -> parse(queryString.toString()));
}
{code}
I would expect this case to also fail with TooManyClauses, is my understanding correct? If so, I've attempted a fix [2] that during parsing increments a counter whenever a clause is added.

[1] [https://github.com/opensearch-project/OpenSearch/issues/3760]
[2] [https://github.com/mch2/lucene/commit/6a558f17f448b92ae4cf8c43e0b759ff7425acdf]
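The idea of incrementing a counter whenever a clause is added during parsing can be illustrated with a toy parser. The Python sketch below is not SimpleQueryParser and does not reproduce the linked fix: the grammar (only `|` and `+` over bare terms), the names `parse`, `build_query` and `TooManyClauses`, and the limit handling are all assumptions made for illustration.

```python
class TooManyClauses(Exception):
    """Raised when a query expands past the configured clause limit."""


def parse(query, max_clause_count=1024):
    """Parse 'a | b + c' into [['a'], ['b', 'c']], counting clauses as they are added."""
    clause_count = 0
    disjuncts = []
    for part in query.split("|"):
        conjuncts = []
        for term in part.split("+"):
            term = term.strip()
            if not term:
                continue
            # Fail fast at the moment the limit is crossed, before building more.
            clause_count += 1
            if clause_count > max_clause_count:
                raise TooManyClauses(f"more than {max_clause_count} clauses")
            conjuncts.append(term)
        if conjuncts:
            disjuncts.append(conjuncts)
    return disjuncts


def build_query(repetitions):
    """Build the same query string as the unit test quoted above."""
    parts = ["foo"]
    for i in range(repetitions):
        parts.append(f" | bar{i} + baz")
    return "".join(parts)
```

With 1024 repetitions the query holds 2049 terms, so this toy parser raises `TooManyClauses` long before the input has been fully consumed.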
[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand commented on PR #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180771389

Somehow I am hitting a stack overflow when trying to convert [LUCENE-550](https://issues.apache.org/jira/browse/LUCENE-550)! It doesn't look like a particularly challenging issue to convert :)

```
(.venv) beast3:migration[polish_legacy_jira]$ python src/jira2github_import.py --min 1 --max 10649
[2022-07-11 15:01:02,826] INFO:jira2github_import: Converting Jira issues to GitHub issues in /l/jira-github-migration/migration/github-import-data
[2022-07-11 15:10:25,306] WARNING:jira2github_import: Jira dump file not found: /l/jira-github-migration/migration/jira-dump/LUCENE-498.json
ERROR: unhandled exception while converting LUCENE-550
Traceback (most recent call last):
  File "/l/jira-github-migration/migration/src/jira2github_import.py", line 229, in
    convert_issue(num, dump_dir, output_dir, account_map, github_att_repo, github_att_branch)
  File "/l/jira-github-migration/migration/src/jira2github_import.py", line 133, in convert_issue
    comment_body = f"""{convert_text(comment_body, att_replace_map, account_map)}
  File "/l/jira-github-migration/migration/src/jira_util.py", line 216, in convert_text
    text = jira2markdown.convert(text, elements=elements)
  File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/jira2markdown/parser.py", line 20, in convert
    return markup.transformString(text)
  File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 2059, in transformString
    for t, s, e in self.scanString(instring):
  File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 2007, in scanString
    nextLoc, tokens = parseFn(instring, preloc, callPreParse=False)
  File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 1683, in _parseNoCache
    loc, tokens = self.parseImpl(instring, preloc, doActions)
  File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 4462, in parseImpl
    return self.expr._parse(
```
[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand commented on PR #33: URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180785030 Hmm this seems to be an issue on `main` as well. This is what I'm running to trigger it: `python src/jira2github_import.py --min 550`. I'll catch the exception (trying to convert one comment text) and try to best-effort continue.
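Catching the exception per comment and continuing best-effort might look like the sketch below. This is not the migration script: `convert_comments` and its `convert` callback are hypothetical, and it assumes the stack overflow surfaces as Python's built-in `RecursionError` (the traceback in this thread is cut off before the exception type is shown).

```python
import logging

logger = logging.getLogger("jira2github_import_sketch")


def convert_comments(comments, convert):
    """Convert each comment; on a stack overflow keep the original text and move on."""
    converted = []
    failures = 0
    for index, text in enumerate(comments):
        try:
            converted.append(convert(text))
        except RecursionError:
            # Assumption: a parser that recurses too deeply raises RecursionError.
            failures += 1
            logger.warning("comment %d: conversion overflowed the stack; keeping raw text", index)
            converted.append(text)
    return converted, failures
```

Keeping the raw text for the one failing comment means the rest of the issue is still converted, at the cost of some unconverted Jira markup in that comment.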
[GitHub] [lucene] stefanvodita commented on a diff in pull request #1015: [LUCENE-10629]: Add fast match query support to FacetSets
stefanvodita commented on code in PR #1015:
URL: https://github.com/apache/lucene/pull/1015#discussion_r918289599

## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java:
##
@@ -52,8 +52,10 @@ public MatchingFacetSetsCounts(
       String field,
       FacetsCollector hits,
       FacetSetDecoder facetSetDecoder,
+      Query fastMatchQuery,

Review Comment:
What do you think of preserving the constructor without `fastMatchQuery`? It would avoid adding that `null` to all existing (and possibly some future) uses.
[jira] [Created] (LUCENE-10652) Add a top-n range faceting example to RangeFacetsExample
Yuting Gan created LUCENE-10652:
---

Summary: Add a top-n range faceting example to RangeFacetsExample
Key: LUCENE-10652
URL: https://issues.apache.org/jira/browse/LUCENE-10652
Project: Lucene - Core
Issue Type: Improvement
Reporter: Yuting Gan

In LUCENE-10614, we modified the behavior of getTopChildren to actually return top-n ranges ordered by count. The original behavior of getTopChildren in RangeFacetCounts was to return all ranges ordered by constructor-specified range order, and this behavior is now retained in the getAllChildren API (LUCENE-10550). Therefore, it would be helpful to add an example in RangeFacetsExample to demo this change.

I replaced the original example of getTopChildren with getAllChildren, and will add an example of the current getTopChildren API soon.
[GitHub] [lucene] Yuti-G commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
Yuti-G commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r918297532

## lucene/demo/src/java/org/apache/lucene/demo/facet/RangeFacetsExample.java:
##
@@ -73,6 +76,35 @@ public void index() throws IOException {
       indexWriter.addDocument(doc);
     }
 
+    // Add documents with a fake timestamp, 3600 sec (1 hour) after "now", 7200 sec (2
+    // hours) after "now", ...:
+    long startTime = 0;
+    // Index error messages since a week (24 * 7 = 168 hours) ago
+    for (int i = 0; i < 168; i++) {
+      long endTime = startTime + (i + 1) * 3600;
+
+      // Choose a relatively larger number, e,g., "35", in order to create variation in count for
+      // the top-n children, so that getTopChildren(10) in the searchTopChildren functionality
+      // can return children with different counts
+      for (int j = 0; j < i % 35; j++) {
+        Document doc = new Document();
+        // index document at a different timestamp by using endTime - i * j

Review Comment:
I've created a spin-off [issue](https://issues.apache.org/jira/browse/LUCENE-10652) to add a top-n range faceting example to the demo. Thanks again for reviewing my PR and leaving such detailed and constructive feedback!
[GitHub] [lucene] MarcusSorealheis commented on pull request #940: Use similarity.tf() in MoreLikeThis
MarcusSorealheis commented on PR #940: URL: https://github.com/apache/lucene/pull/940#issuecomment-1180816282 Is there anything else needed here? Is there something we can add to improve the robustness of the quality check? Please advise us @rmuir and @mocobeta
[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mikemccand commented on PR #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180836771

This is the comment that stack overflows during conversion:

```
A note on, and output from contrib/benchmark:

I'm getting really poor results compared to my own test and live enviroment stats. At query time I expected maximum 1/6th time spent in InstantiatedIndex than RAMDirectory, but it turns out that in the benchmarker the speed is almost the same as RAMDirectory. Retrieving documents is only 1/5th of the speed rather than maximum 1/60th as expected. Investigated the code a bit and noticed that ReadTask creates a new instance of IndexReader and IndexSearcher for each query. Could this be the reason?

Memory consumption is 3x of a RAMDirectory, but half of the memory is spent on keeping the Document instances in heap. Perhaps it would be interesting to use the same persistency for these as in the Directory implementations.

The merge factor sweet spot is around 2500, where it turns out to be a little bit faster than the RAMDirectory sweet spot. At defualt 10 InstantiatedIndex consumes about 5x more time than a RAMDirectory. If I fix the locklessness as suggested in previous comment, it most probably will be much faster than a RAMDirectory at any setting.

/**
 * The sweet spot for this implementation is at 2500.
 *
 * Benchmark output:
 *
 * > Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 160153)
 * Operation round mrg buf cmpnd runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
 * MAddDocs_2 0 10 10 true12 81,4 245,68 200 325 152268 156 928
 * MAddDocs_2 - 1 1000 10 true - - 1 - - 2 - - 494,1 - - 40,47 - 247 119 072 - 347 025 408
 * MAddDocs_2 2 10 100 true12104,8 190,81 233 895 552363 720 704
```
[jira] [Created] (LUCENE-10653) Should BlockMaxMaxscoreScorer rebuild its heap in bulk?
Greg Miller created LUCENE-10653:

Summary: Should BlockMaxMaxscoreScorer rebuild its heap in bulk?
Key: LUCENE-10653
URL: https://issues.apache.org/jira/browse/LUCENE-10653
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Greg Miller

BMMScorer has to frequently rebuild its heap, and does so by clearing and then iteratively calling {{add}}. It would be more efficient to heapify in bulk. This is more academic than anything right now though since BMMScorer is only used with two-clause disjunctions, so it's sort of a silly optimization if it's not supporting a greater number of clauses.
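The difference between rebuilding a heap by repeated `add` calls and heapifying in bulk can be sketched with Python's standard `heapq` module. This is a generic binary-heap illustration, not the priority queue that BMMScorer uses; the helper names are hypothetical.

```python
import heapq


def rebuild_by_adding(items):
    """Clear-and-add rebuild: n pushes at O(log n) each, O(n log n) overall."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)
    return heap


def rebuild_in_bulk(items):
    """Bulk rebuild: copy the items, then heapify once in O(n)."""
    heap = list(items)
    heapq.heapify(heap)
    return heap


def is_min_heap(heap):
    """Check the binary-heap invariant: every parent is <= each of its children."""
    return all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))
```

Both functions yield a valid heap over the same elements, although the internal array layout may differ. For two-element heaps, as in the two-clause case mentioned above, the asymptotic difference would indeed be negligible.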
[GitHub] [lucene] msokolov commented on pull request #947: LUCENE-10577: enable quantization of HNSW vectors to 8 bits
msokolov commented on PR #947:
URL: https://github.com/apache/lucene/pull/947#issuecomment-1180963745

I'm looking to address various comments; just pushed a commit that makes the vector encoding explicit by adding a new enum and parameter "vectorEncoding", splitting this out from "similarityFunction".

> During merging when writing a merged vector field it looks like we first expand vector values only to compress them again later? Would be nice to avoid this.

Oh good catch, @mayya-sharipova I will look into addressing this.

> Not sure if possible at this stage, but it would be nice if HnswGraphBuilder and HnswGraphSearcher were not aware of different calculations needed for different similarity functions, and refer all these calculations (dotProduct(BytesRef a..)) to VectorSimilarityFunction

I don't see how to do this efficiently (without many conversions from byte to float) and neatly (without code duplication in tricky algorithmic areas) and with complete API purity, so I sacrificed some purity. If you have any ideas how to do it better, I'm open to changing it though.
[GitHub] [lucene] msokolov commented on pull request #947: LUCENE-10577: enable quantization of HNSW vectors to 8 bits
msokolov commented on PR #947: URL: https://github.com/apache/lucene/pull/947#issuecomment-1180965832 Also - if anybody has advice about how to rebase while maintaining this PR I'd be interested. Should I `git merge` from `main`??
[GitHub] [lucene] msokolov commented on pull request #947: LUCENE-10577: enable quantization of HNSW vectors to 8 bits
msokolov commented on PR #947:
URL: https://github.com/apache/lucene/pull/947#issuecomment-1181017377

> During merging when writing a merged vector field it looks like we first expand vector values only to compress them again later? Would be nice to avoid this.

In fact after checking, I don't think we are doing this expand/compress step *even though getVectorValues() returns `ExpandingVectorValues` to the merger*. This is because the merger uses the `binaryValue()` call to write the vectors themselves, and that value is unchanged by EVV, and the graph is created by the (now) polymorphic hnsw utils that also call `binaryValue()` when they are dealing with a field encoded as bytes.
[jira] [Commented] (LUCENE-10653) Should BlockMaxMaxscoreScorer rebuild its heap in bulk?
[ https://issues.apache.org/jira/browse/LUCENE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565195#comment-17565195 ]

Greg Miller commented on LUCENE-10653:
--

Here's essentially what I'm thinking: https://github.com/gsmiller/lucene/commit/597a760d6c0b0524ba1d72c290689e4dc4b4b9e9

> Should BlockMaxMaxscoreScorer rebuild its heap in bulk?
> ---
>
> Key: LUCENE-10653
> URL: https://issues.apache.org/jira/browse/LUCENE-10653
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Greg Miller
> Priority: Minor
>
> BMMScorer has to frequently rebuild its heap, and does so by clearing and
> then iteratively calling {{add}}. It would be more efficient to heapify
> in bulk. This is more academic than anything right now though since BMMScorer
> is only used with two-clause disjunctions, so it's sort of a silly
> optimization if it's not supporting a greater number of clauses.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
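To illustrate the trade-off raised in the issue, here is a minimal sketch in Python rather than Lucene's Java. It is not the BMMScorer code or the linked commit: it uses the standard `heapq` module, and the function names (`rebuild_iteratively`, `rebuild_in_bulk`, `drain`) are hypothetical. Adding n elements one at a time costs O(n log n) in the worst case, while a bulk heapify is O(n); both produce a valid heap over the same elements.

```python
import heapq


def rebuild_iteratively(scores):
    """Rebuild a heap by starting empty and pushing each element (O(n log n))."""
    heap = []
    for score in scores:
        heapq.heappush(heap, score)
    return heap


def rebuild_in_bulk(scores):
    """Rebuild a heap by copying all elements, then heapifying once (O(n))."""
    heap = list(scores)
    heapq.heapify(heap)
    return heap


def drain(heap):
    """Pop everything from a copy of the heap, smallest first."""
    heap = list(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

With only two clauses, as the issue notes, the difference is negligible; the bulk variant would only start to matter if the scorer handled many clauses.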
[GitHub] [lucene] gsmiller commented on a diff in pull request #1013: LUCENE-10644: Facets#getAllChildren testing should ignore child order
gsmiller commented on code in PR #1013:
URL: https://github.com/apache/lucene/pull/1013#discussion_r918423465

## lucene/facet/src/test/org/apache/lucene/facet/FacetTestCase.java: ##
@@ -264,4 +264,24 @@ protected void assertFloatValuesEquals(FacetResult a, FacetResult b) {
           a.labelValues[i].value.floatValue() / 1e5);
     }
   }
+
+  protected void assertNumericValuesEquals(Number a, Number b) {
+    assertTrue(a.getClass().isInstance(b));
+    if (a instanceof Float) {
+      assertEquals(a.floatValue(), b.floatValue(), a.floatValue() / 1e5);
+    } else if (a instanceof Double) {
+      assertEquals(a.doubleValue(), b.doubleValue(), a.doubleValue() / 1e5);
+    } else {
+      assertEquals(a, b);
+    }
+  }
+
+  protected void assertAllChildrenEqualsWithoutOrdering(FacetResult a, FacetResult b) {

Review Comment:
The naming of this method leads me to believe it's only going to validate the children, but it's checking dims, paths, etc. I wonder if we shouldn't name it something more generic? Also, it feels a little weird to me that callers have to create a `FacetResult` for their expected data to use this method. I wonder if it would be easier to have a signature like this:

```
protected void assertFacetResult(String expectedDim, String[] expectedPath, int expectedChildCount, Number expectedValue, LabelAndValue... expectedChildren)
```
[GitHub] [lucene-jira-archive] mocobeta commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer
mocobeta commented on PR #33: URL: https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1181218297

It looks like a bug introduced in https://github.com/apache/lucene-jira-archive/commit/cfbc821390859a7053e43028325b6bc616ec2b5b. (I have postponed testing it with the whole Jira dump.) I'll take a look at it.

> I'll catch the exception (trying to convert one comment text) and try to best-effort continue.

Sorry, there should have been a "catch all" try-except clause.
[jira] [Commented] (LUCENE-10471) Increase the number of dims for KNN vectors to 2048
[ https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565240#comment-17565240 ]

Stanislav commented on LUCENE-10471:

I don't think there is a trend to increase dimensionality. Only a few models have feature dimensions greater than 2048. Most modern neural networks (ViT and the whole BERT family) have fewer than 1k dimensions. However, there are still many models, like ms-resnet or EfficientNet, that operate in the range from 1k to 2048, and they are the most common models for image embedding and vector search. The current limit forces dimensionality reduction for pretty standard shapes.

> Increase the number of dims for KNN vectors to 2048
> ---
>
> Key: LUCENE-10471
> URL: https://issues.apache.org/jira/browse/LUCENE-10471
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Mayya Sharipova
> Priority: Trivial
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The current maximum allowed number of dimensions is equal to 1024. But we see
> in practice a couple well-known models that produce vectors with > 1024
> dimensions (e.g
> [mobilenet_v2|https://tfhub.dev/google/imagenet/mobilenet_v2_035_224/feature_vector/1]
> uses 1280d vectors, OpenAI / GPT-3 Babbage uses 2048d vectors). Increasing
> max dims to `2048` will satisfy these use cases.
> I am wondering if anybody has strong objections against this.
[jira] [Commented] (LUCENE-10628) Enable MatchingFacetSetCounts to use space partitioning data structures
[ https://issues.apache.org/jira/browse/LUCENE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565241#comment-17565241 ]

Marc D'Mello commented on LUCENE-10628:
---

I started work on this issue but I was informed that [~ivera] is experienced with space partitioning algorithms and might have some pointers in Lucene where I can find examples of KD trees and R trees, so I'm just tagging you here in case you have any tips/pointers for this issue before I get too deep into it :). Thanks!

> Enable MatchingFacetSetCounts to use space partitioning data structures
> ---
>
> Key: LUCENE-10628
> URL: https://issues.apache.org/jira/browse/LUCENE-10628
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Marc D'Mello
> Priority: Minor
>
> Currently, {{MatchingFacetSetCounts}} iterates over {{FacetSetMatcher}}
> instances passed into it linearly. While this is fine in some cases, if we
> have a large amount of {{FacetSetMatcher}}'s, this can be inefficient. We
> should provide the option to users to enable the use of space partitioning
> data structures (namely R trees and KD trees) so we can potentially scan over
> these {{FacetSetMatcher}}'s in sub-linear time.
[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #35: Catch all exceptions (and proceed to the next issue) in jira2github_import.py
mocobeta opened a new pull request, #35:
URL: https://github.com/apache/lucene-jira-archive/pull/35

Added try-catch so that it does not stop with a conversion failure/error.

```
(.venv) migration $ python src/jira2github_import.py --issues 550
[2022-07-12 12:09:06,759] INFO:jira2github_import: Converting Jira issues to GitHub issues in /mnt/hdd/repo/lucene-jira-archive/migration/github-import-data
[2022-07-12 12:09:35,785] ERROR:jira2github_import: Traceback (most recent call last):
  File "/mnt/hdd/repo/lucene-jira-archive/migration/src/jira2github_import.py", line 216, in <module>
    convert_issue(num, dump_dir, output_dir, account_map, github_att_repo, github_att_branch)
  File "/mnt/hdd/repo/lucene-jira-archive/migration/src/jira2github_import.py", line 121, in convert_issue
    "body": f"""{convert_text(comment_body, att_replace_map, account_map)}
  File "/mnt/hdd/repo/lucene-jira-archive/migration/src/jira_util.py", line 216, in convert_text
    text = jira2markdown.convert(text, elements=elements)
  File "/mnt/hdd/repo/lucene-jira-archive/migration/.venv/lib/python3.9/site-packages/jira2markdown/parser.py", line 20, in convert
    return markup.transformString(text)
  File "/mnt/hdd/repo/lucene-jira-archive/migration/.venv/lib/python3.9/site-packages/pyparsing.py", line 2059, in transformString
    for t, s, e in self.scanString(instring):
  File "/mnt/hdd/repo/lucene-jira-archive/migration/.venv/lib/python3.9/site-packages/pyparsing.py", line 2007, in scanString
    nextLoc, tokens = parseFn(instring, preloc, callPreParse=False)
  ...
RecursionError: maximum recursion depth exceeded
[2022-07-12 12:09:35,786] ERROR:jira2github_import: Failed to convert Jira issue. An error 'maximum recursion depth exceeded' occurred; skipped LUCENE-550.
[2022-07-12 12:09:35,786] INFO:jira2github_import: Done.
```
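The "catch all and proceed to the next issue" behaviour described in this pull request can be sketched roughly as follows. This is an illustration only, not the code from `jira2github_import.py`: the function `convert_all`, its `convert_one` callback, and the logger name are hypothetical. It relies on the fact that `RecursionError` is a subclass of `Exception`, so a broad `except Exception` catches the failure shown in the log above.

```python
import logging

logger = logging.getLogger("convert_sketch")  # hypothetical logger name


def convert_all(issue_numbers, convert_one):
    """Convert each issue in turn; log and skip any issue whose conversion raises.

    `convert_one` is a caller-supplied function taking an issue number.
    Returns the list of issue numbers that failed.
    """
    failed = []
    for num in issue_numbers:
        try:
            convert_one(num)
        except Exception as e:  # deliberately broad: one bad issue must not stop the run
            logger.error("Failed to convert issue %s: %s; skipped.", num, e)
            failed.append(num)
    return failed
```

Returning the failed issue numbers, rather than only logging them, makes it easy to retry or inspect the skipped issues after a run over the whole dump.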
[GitHub] [lucene-jira-archive] mocobeta merged pull request #35: Catch all exceptions (and proceed to the next issue) in jira2github_import.py
mocobeta merged PR #35: URL: https://github.com/apache/lucene-jira-archive/pull/35
[GitHub] [lucene] shaie commented on a diff in pull request #1015: [LUCENE-10629]: Add fast match query support to FacetSets
shaie commented on code in PR #1015:
URL: https://github.com/apache/lucene/pull/1015#discussion_r918535845

## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java: ##
@@ -52,8 +52,10 @@ public MatchingFacetSetsCounts(
       String field,
       FacetsCollector hits,
       FacetSetDecoder facetSetDecoder,
+      Query fastMatchQuery,

Review Comment:
Sure, added it back
[jira] [Comment Edited] (LUCENE-10480) Specialize 2-clauses disjunctions
[ https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565261#comment-17565261 ]

Zach Chen edited comment on LUCENE-10480 at 7/12/22 4:27 AM:
-

{quote}Another thing that changes performance sometimes is the doc ID order, were you using multiple indexing threads maybe?{quote}

OK, this is actually the case for me. I was previously using 10 threads to index (INDEX_NUM_THREADS = 10), and after I commented that out and reindexed with the default setting, I was able to reproduce the slowdown:

{code:java}
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
AndHighOrMedMed                     91.27   (4.3%)                    85.52   (4.3%)     -6.3%  ( -14% - 2%)      0.000
PKLookup                           333.25   (4.3%)                   329.48   (3.8%)     -1.1%  ( -8% - 7%)       0.380
AndHighHigh                        104.25   (2.9%)                   103.11   (3.0%)     -1.1%  ( -6% - 5%)       0.247
SpanNear                            16.52   (3.8%)                    16.36   (3.1%)     -0.9%  ( -7% - 6%)       0.396
TermGroup10K                        23.99   (3.3%)                    23.78   (3.0%)     -0.9%  ( -6% - 5%)       0.384
Phrase                             234.74   (2.7%)                   232.71   (1.8%)     -0.9%  ( -5% - 3%)       0.235
AndHighMed                         163.80   (3.5%)                   162.42   (4.3%)     -0.8%  ( -8% - 7%)       0.496
TermBGroup1M                        48.02   (3.5%)                    47.65   (3.7%)     -0.8%  ( -7% - 6%)       0.496
SloppyPhrase                         4.82   (3.4%)                     4.78   (2.7%)     -0.7%  ( -6% - 5%)       0.460
TermGroup100                        41.90   (3.9%)                    41.63   (3.3%)     -0.7%  ( -7% - 6%)       0.569
Term                              2680.42   (4.7%)                  2664.05   (3.3%)     -0.6%  ( -8% - 7%)       0.632
TermGroup1M                         39.95   (2.9%)                    39.71   (3.2%)     -0.6%  ( -6% - 5%)       0.531
TermBGroup1M1P                      84.21   (6.1%)                    83.82   (5.7%)     -0.5%  ( -11% - 12%)     0.801
Respell                            113.78   (1.9%)                   113.44   (1.7%)     -0.3%  ( -3% - 3%)       0.603
BrowseRandomLabelSSDVFacets         20.75   (8.2%)                    20.74  (10.3%)     -0.0%  ( -17% - 20%)     0.989
Fuzzy2                              83.12   (1.8%)                    83.11   (1.1%)     -0.0%  ( -2% - 2%)       0.976
BrowseDayOfYearSSDVFacets           26.69  (12.0%)                    26.70  (11.6%)      0.0%  ( -21% - 26%)     0.995
Wildcard                           115.84   (5.1%)                   115.96   (5.8%)      0.1%  ( -10% - 11%)     0.951
TermDayOfYearSort                  260.70   (5.4%)                   260.99   (2.8%)      0.1%  ( -7% - 8%)       0.937
AndHighMedDayTaxoFacets            136.32   (2.6%)                   136.63   (2.3%)      0.2%  ( -4% - 5%)       0.773
IntervalsOrdered                   128.13   (7.5%)                   128.45   (7.7%)      0.3%  ( -13% - 16%)     0.916
AndHighHighDayTaxoFacets            13.82   (2.8%)                    13.87   (2.6%)      0.4%  ( -4% - 5%)       0.657
Fuzzy1                              79.16   (2.7%)                    79.60   (1.8%)      0.6%  ( -3% - 5%)       0.433
TermMonthSort                      360.17   (6.4%)                   362.83   (7.1%)      0.7%  ( -11% - 15%)     0.728
TermTitleSort                      191.21   (6.8%)                   192.70   (7.1%)      0.8%  ( -12% - 15%)     0.723
TermDTSort                         208.40   (2.9%)                   210.39   (2.9%)      1.0%  ( -4% - 7%)       0.301
MedTermDayTaxoFacets                78.66   (5.2%)                    79.59   (4.4%)      1.2%  ( -7% - 11%)      0.436
TermDateFacets                      41.04   (5.4%)                    41.61   (4.7%)      1.4%  ( -8% - 12%)      0.385
IntNRQ                             122.00   (8.1%)                   124.08   (8.3%)      1.7%  ( -13% - 19%)     0.513
OrHighMedDayTaxoFacets              23.16   (8.4%)                    23.71   (4.9%)      2.4%  ( -10% - 17%)     0.272
BrowseMonthSSDVFacets               28.68  (13.8%)                    29.55  (16.8%)      3.0%  ( -24% - 39%)     0.531
BrowseDayOfYearTaxoFacets           30.40  (32.2%)                    31.67  (34.2%)      4.2%  ( -47% - 103%)    0.690
BrowseDateTaxoFacets                30.26  (32.2%)                    31.57  (34.4%)      4.3%  ( -47% - 104%)    0.680
Prefix3                            402.14   (8.6%)                   419.96   (8.9%)      4.4%  ( -12% - 23%)     0.109
AndMedOrHighHigh                    94.79   (4.0%)                    99.03   (4.5%)      4.5%  ( -3% - 13%)      0.001
BrowseRandomLabelTaxoFacets         32.45  (49.2%)                    35.05  (53.4%)      8.0%  ( -63% - 217%)    0.622
BrowseMonthTaxoFacets               28.68  (35.3%)                    31.37  (39.1%)      9.4%  ( -48% - 129%)    0.425
BrowseDateSSDVFacets                 3.96  (28.1%)                     4.54
[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions
[ https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565261#comment-17565261 ]

Zach Chen commented on LUCENE-10480:

{quote}Another thing that changes performance sometimes is the doc ID order, were you using multiple indexing threads maybe?{quote}

OK, this is actually the case for me. I was previously using 10 threads to index (INDEX_NUM_THREADS = 10), and after I commented that out and reindexed with the default setting, I was able to reproduce the slowdown:

{code:java}
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
AndHighOrMedMed                     91.27   (4.3%)                    85.52   (4.3%)     -6.3%  ( -14% - 2%)      0.000
PKLookup                           333.25   (4.3%)                   329.48   (3.8%)     -1.1%  ( -8% - 7%)       0.380
AndHighHigh                        104.25   (2.9%)                   103.11   (3.0%)     -1.1%  ( -6% - 5%)       0.247
SpanNear                            16.52   (3.8%)                    16.36   (3.1%)     -0.9%  ( -7% - 6%)       0.396
TermGroup10K                        23.99   (3.3%)                    23.78   (3.0%)     -0.9%  ( -6% - 5%)       0.384
Phrase                             234.74   (2.7%)                   232.71   (1.8%)     -0.9%  ( -5% - 3%)       0.235
AndHighMed                         163.80   (3.5%)                   162.42   (4.3%)     -0.8%  ( -8% - 7%)       0.496
TermBGroup1M                        48.02   (3.5%)                    47.65   (3.7%)     -0.8%  ( -7% - 6%)       0.496
SloppyPhrase                         4.82   (3.4%)                     4.78   (2.7%)     -0.7%  ( -6% - 5%)       0.460
TermGroup100                        41.90   (3.9%)                    41.63   (3.3%)     -0.7%  ( -7% - 6%)       0.569
Term                              2680.42   (4.7%)                  2664.05   (3.3%)     -0.6%  ( -8% - 7%)       0.632
TermGroup1M                         39.95   (2.9%)                    39.71   (3.2%)     -0.6%  ( -6% - 5%)       0.531
TermBGroup1M1P                      84.21   (6.1%)                    83.82   (5.7%)     -0.5%  ( -11% - 12%)     0.801
Respell                            113.78   (1.9%)                   113.44   (1.7%)     -0.3%  ( -3% - 3%)       0.603
BrowseRandomLabelSSDVFacets         20.75   (8.2%)                    20.74  (10.3%)     -0.0%  ( -17% - 20%)     0.989
Fuzzy2                              83.12   (1.8%)                    83.11   (1.1%)     -0.0%  ( -2% - 2%)       0.976
BrowseDayOfYearSSDVFacets           26.69  (12.0%)                    26.70  (11.6%)      0.0%  ( -21% - 26%)     0.995
Wildcard                           115.84   (5.1%)                   115.96   (5.8%)      0.1%  ( -10% - 11%)     0.951
TermDayOfYearSort                  260.70   (5.4%)                   260.99   (2.8%)      0.1%  ( -7% - 8%)       0.937
AndHighMedDayTaxoFacets            136.32   (2.6%)                   136.63   (2.3%)      0.2%  ( -4% - 5%)       0.773
IntervalsOrdered                   128.13   (7.5%)                   128.45   (7.7%)      0.3%  ( -13% - 16%)     0.916
AndHighHighDayTaxoFacets            13.82   (2.8%)                    13.87   (2.6%)      0.4%  ( -4% - 5%)       0.657
Fuzzy1                              79.16   (2.7%)                    79.60   (1.8%)      0.6%  ( -3% - 5%)       0.433
TermMonthSort                      360.17   (6.4%)                   362.83   (7.1%)      0.7%  ( -11% - 15%)     0.728
TermTitleSort                      191.21   (6.8%)                   192.70   (7.1%)      0.8%  ( -12% - 15%)     0.723
TermDTSort                         208.40   (2.9%)                   210.39   (2.9%)      1.0%  ( -4% - 7%)       0.301
MedTermDayTaxoFacets                78.66   (5.2%)                    79.59   (4.4%)      1.2%  ( -7% - 11%)      0.436
TermDateFacets                      41.04   (5.4%)                    41.61   (4.7%)      1.4%  ( -8% - 12%)      0.385
IntNRQ                             122.00   (8.1%)                   124.08   (8.3%)      1.7%  ( -13% - 19%)     0.513
OrHighMedDayTaxoFacets              23.16   (8.4%)                    23.71   (4.9%)      2.4%  ( -10% - 17%)     0.272
BrowseMonthSSDVFacets               28.68  (13.8%)                    29.55  (16.8%)      3.0%  ( -24% - 39%)     0.531
BrowseDayOfYearTaxoFacets           30.40  (32.2%)                    31.67  (34.2%)      4.2%  ( -47% - 103%)    0.690
BrowseDateTaxoFacets                30.26  (32.2%)                    31.57  (34.4%)      4.3%  ( -47% - 104%)    0.680
Prefix3                            402.14   (8.6%)                   419.96   (8.9%)      4.4%  ( -12% - 23%)     0.109
AndMedOrHighHigh                    94.79   (4.0%)                    99.03   (4.5%)      4.5%  ( -3% - 13%)      0.001
BrowseRandomLabelTaxoFacets         32.45  (49.2%)                    35.05  (53.4%)      8.0%  ( -63% - 217%)    0.622
BrowseMonthTaxoFacets               28.68  (35.3%)                    31.37  (39.1%)      9.4%  ( -48% - 129%)    0.425
BrowseDateSSDVFacets                 3.96  (28.1%)                     4.54  (26.3%)     14.7%  ( -31% - 96%)     0.089
[GitHub] [lucene] mocobeta commented on pull request #940: Use similarity.tf() in MoreLikeThis
mocobeta commented on PR #940: URL: https://github.com/apache/lucene/pull/940#issuecomment-1181301533 Personally, I'd love to commit this to the upstream branch. I think we'd need a reproducible quality check (or regression test?) in Lucene as Robert suggested; I just haven't been able to take enough time to look at it.
[GitHub] [lucene] stefanvodita commented on a diff in pull request #1015: [LUCENE-10629]: Add fast match query support to FacetSets
stefanvodita commented on code in PR #1015:
URL: https://github.com/apache/lucene/pull/1015#discussion_r918597529

## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java: ##
@@ -52,8 +52,10 @@ public MatchingFacetSetsCounts(
       String field,
       FacetsCollector hits,
       FacetSetDecoder facetSetDecoder,
+      Query fastMatchQuery,

Review Comment:
Thanks! I'm happy with the PR as it is now.