date:20220711

[GitHub] [lucene-jira-archive] mocobeta commented on issue #12: Make a test set for improving markup conversion quality

2022-07-11 Thread GitBox



mocobeta commented on issue #12:
URL: 
https://github.com/apache/lucene-jira-archive/issues/12#issuecomment-1180040078

   > Indeed, there was at least one comment (I think?) where the author used 
Markdown (which does not work in Jira, yet many of us forget and use it anyway, 
just like seeing a naked `bq.` here on GitHub or in emails!) and then the 
rendering worked on migration! A surprising benefit of migration ;)
   
   I reviewed the converter library's code again. Your insight is correct: it 
seems the library does not escape Markdown, so _if there are no extra space 
characters that interfere with the frontend_, Markdown written in Jira is 
rendered on GitHub (as the authors might expect).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-jira-archive] mocobeta commented on issue #27: Improve the `Jira Information` header?

2022-07-11 Thread GitBox



mocobeta commented on issue #27:
URL: 
https://github.com/apache/lucene-jira-archive/issues/27#issuecomment-1180049119

   For prototyping, embedding a fixed template for the Jira information in the 
conversion script was the easiest way for me... I agree that there are more 
sophisticated ways to generate the Jira information paragraph flexibly.



[jira] [Updated] (LUCENE-10645) Wrong autocomplete suggestion

2022-07-11 Thread Emiliyan Sinigerov (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emiliyan Sinigerov updated LUCENE-10645:

Description: 
I have a problem with autocomplete suggestions (I use your test to show where 
the bug is: 
[https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java]).

This is your test and everything works fine:

public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the pen is pretty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
}

 

But if I add this line to the test ({*}suggester.add(new BytesRef("the pen is 
fretty"), null, 10, new BytesRef("foobaz")){*}), the test fails.

public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    *suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));*
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the pen is pretty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
}

We want to find everything that matches "pen p", and only one entry should 
match ("the pen is pretty"), but the results contain two matches: "the pen is 
pretty" and "the pen is fretty".

I think that when the query consists of a word (here "pen") followed by a 
one-letter prefix that is the same as the first letter of that word (here "p"), 
the suggester first matches the word "pen" and then matches the prefix "p" 
against "pen" again, which is incorrect. The prefix "p" should have to match a 
word other than "pen".

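The hypothesis in the description can be illustrated with a toy model. This is only a sketch of the suspected matching logic, not Lucene's implementation: the helper names `matches_independently` and `matches_distinct_tokens` are hypothetical, tokens are split on whitespace, and the last query term is treated as a prefix.

```python
def tokens(text):
    # Whitespace tokenization, lower-cased (a simplification of the analyzer).
    return text.lower().split()


def matches_independently(query, doc):
    """Each query term is checked on its own, so two terms may hit the same token."""
    *exact, prefix = tokens(query)
    doc_tokens = tokens(doc)
    all_exact_found = all(term in doc_tokens for term in exact)
    prefix_found = any(tok.startswith(prefix) for tok in doc_tokens)
    return all_exact_found and prefix_found


def matches_distinct_tokens(query, doc):
    """The prefix must match a token position not already used by an exact term."""
    *exact, prefix = tokens(query)
    doc_tokens = tokens(doc)
    used = set()
    for term in exact:
        # Greedily claim the first unused position holding this exact term.
        pos = next(
            (i for i, tok in enumerate(doc_tokens) if tok == term and i not in used),
            None,
        )
        if pos is None:
            return False
        used.add(pos)
    return any(
        tok.startswith(prefix) for i, tok in enumerate(doc_tokens) if i not in used
    )


for doc in ("the pen is pretty", "the pen is fretty"):
    print(doc, matches_independently("pen p", doc), matches_distinct_tokens("pen p", doc))
```

Under independent matching both entries match, because the prefix "p" is satisfied by "pen" itself. Requiring distinct token positions keeps only "the pen is pretty". Whether the suggester should behave the second way is a question for the issue discussion.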
  was:
I have problem with autocomplete suggestion (I use your test to show you where 
is the bug 
https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java).

This is your test and everything works fine:

public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester = new 
AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new 
BytesRef("foobaz"));
    suggester.refresh();

    List results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, 
true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the pen is pretty", 
results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
 }

 

But if I add this row to the test {*}suggester.add(new BytesRef("the pen is 
fretty"), null, 10, new BytesRef("foobaz")){*}, the test goes wrong.

public void testBothExactAndPrefix() throws Exception {
  Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
  AnalyzingInfixSuggester suggester = new 
AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
  suggester.build(new InputArrayIterator(new Input[0]));
  suggester.add(new BytesRef("the pen is pretty"), null, 10, new 
BytesRef("foobaz"));
  *suggester.add(new BytesRef("the pen is fretty"), null, 10, new 
BytesRef("foobaz"));*

  suggester.refresh();

  List results =
      suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, 
true, true);
  assertEquals(1, results.size());
  assertEquals("the pen is pretty", results.get(0).key);
  assertEquals("the pen is pretty", results.

[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #31: Make converter script work without account mapping file

2022-07-11 Thread GitBox



mocobeta opened a new pull request, #31:
URL: https://github.com/apache/lucene-jira-archive/pull/31

   I had second thoughts about this. It may be better for the converter script 
to work regardless of whether there is an account mapping file (it's not a 
critical part of the converter).
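
   A minimal sketch of that idea, not the code in this PR. It assumes a 
hypothetical JSON mapping file and hypothetical helper names 
(`load_account_map`, `github_mention`); the real script's file format and 
function names may differ.

```python
import json
from pathlib import Path


def load_account_map(path: Path) -> dict[str, str]:
    """Return the Jira-to-GitHub account mapping, or an empty dict if the file is missing."""
    if not path.exists():
        # No mapping file: conversion continues, users are just left unmapped.
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def github_mention(jira_user: str, account_map: dict[str, str]) -> str:
    """Use the mapped GitHub handle when known, otherwise keep the Jira name."""
    gh_user = account_map.get(jira_user)
    return f"@{gh_user}" if gh_user else jira_user
```

   With an empty mapping every lookup falls back to the Jira name, so the rest 
of the conversion does not need a special case for the missing file.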



[GitHub] [lucene-jira-archive] mocobeta merged pull request #31: Make converter script work without account mapping file

2022-07-11 Thread GitBox



mocobeta merged PR #31:
URL: https://github.com/apache/lucene-jira-archive/pull/31



[GitHub] [lucene] shaie opened a new pull request, #1015: [LUCENE-10629]: Add fast match query support to FacetSets

2022-07-11 Thread GitBox



shaie opened a new pull request, #1015:
URL: https://github.com/apache/lucene/pull/1015

   ### Description (or a Jira issue link if you have one)
   
   Add `fastMatchQuery` support to `MatchingFacetSetCounts` to improve counting 
efficiency when there are many possible indexed facet-set combinations but only 
a small subset of them is of interest during counting.



[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #32: only escape HTML tags

2022-07-11 Thread GitBox



mocobeta opened a new pull request, #32:
URL: https://github.com/apache/lucene-jira-archive/pull/32

   Follow-up to #23.
   To avoid unintentional escaping, escape only HTML tag-like text (``) and preserve other `<`, `>`, and `&`.
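
   A rough sketch of this kind of selective escaping, not the PR's actual 
code. The regular expression and the name `escape_tag_like` are assumptions: 
"tag-like" is taken here to mean `<`, an optional `/`, a letter, and then 
anything up to the next `>`.

```python
import re

# Assumed definition of "tag-like": "<", optional "/", a letter, then any
# characters other than "<" or ">" up to the closing ">".
TAG_LIKE = re.compile(r"</?[A-Za-z][^<>]*>")


def escape_tag_like(text: str) -> str:
    """Escape only the angle brackets of tag-like spans; leave everything else alone."""

    def _escape(match: re.Match) -> str:
        return match.group(0).replace("<", "&lt;").replace(">", "&gt;")

    return TAG_LIKE.sub(_escape, text)


print(escape_tag_like("use <b>bold</b> when a < b && c > d"))
```

   A pattern this simple still misfires on text such as `a<b and c>d`, which 
looks like a tag to the regex, so it trades some accuracy for leaving ordinary 
comparison operators and `&` untouched.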



[GitHub] [lucene-jira-archive] mocobeta merged pull request #32: only escape HTML tags

2022-07-11 Thread GitBox



mocobeta merged PR #32:
URL: https://github.com/apache/lucene-jira-archive/pull/32



[GitHub] [lucene-jira-archive] mocobeta commented on issue #14: Investigate import failure of LUCENE-1498

2022-07-11 Thread GitBox



mocobeta commented on issue #14:
URL: 
https://github.com/apache/lucene-jira-archive/issues/14#issuecomment-1180072639

   The quick workaround (manual recovery) should work. I'm closing this.



[GitHub] [lucene-jira-archive] mocobeta closed issue #14: Investigate import failure of LUCENE-1498

2022-07-11 Thread GitBox



mocobeta closed issue #14: Investigate import failure of LUCENE-1498
URL: https://github.com/apache/lucene-jira-archive/issues/14



[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-07-11 Thread Shai Erera (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564830#comment-17564830
 ] 

Shai Erera commented on LUCENE-10629:
-

Oh [~stefanvodita] I didn't refresh the issue for a while and missed your PR! I 
pushed my PR and only then refreshed this page, sorry about that. Let's find a 
way to merge our PRs since I've also added an example to the demo package and 
more tests.

> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] shaie commented on a diff in pull request #1001: LUCENE-10629: Add fastMatchQuery to MatchingFacetSetCounts

2022-07-11 Thread GitBox



shaie commented on code in PR #1001:
URL: https://github.com/apache/lucene/pull/1001#discussion_r917641237


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java:
##
@@ -76,8 +92,12 @@ private int count(String field, 
List matchingDocs)
 
   BinaryDocValues binaryDocValues = 
DocValues.getBinary(hits.context.reader(), field);
 
-  final DocIdSetIterator it =
-  
ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), 
binaryDocValues));
+  DocIdSetIterator it = createIterator(hits);

Review Comment:
   Yes I agree, for that reason I decided not to do it in [my 
PR](https://github.com/apache/lucene/pull/1015). I don't think the base 
collector helps us much at this point. It's not a lot of code duplication and 
as you note it prevents us from optimizing the conjunction?
   
   BTW, as I wrote on the issue I totally missed this PR when I pushed my PR. 
Let's find a way to merge the two!




[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-11 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564885#comment-17564885
 ] 

Adrien Grand commented on LUCENE-10480:
---

I haven't tried to reproduce it but the steps you took by running on wikibigall 
with the nightly tasks file sound good to me. Another thing that changes 
performance sometimes is the doc ID order, were you using multiple indexing 
threads maybe?

Ignoring the fact that we cannot reproduce the slowdown, if I try to think of 
the main differences between WANDScorer and BlockMaxMaxscoreScorer for 
AndHighOrMedMed, I think the main one is the way that {{advanceShallow}} is 
computed. Conjunctions use block boundaries of the clause that has the lowest 
cost, so this could explain why we are seeing a slowdown with AndHighOrMedMed 
(since the conjunction uses block boundaries of OrMedMed) and not 
AndMedOrHighHigh (since the conjunction uses block boundaries of Med). Maybe we 
could explore other approaches for {{advanceShallow}} such as taking the 
minimum block boundary across essential clauses only instead of all clauses.
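
The difference between the two boundary choices mentioned above can be modeled with a toy example. The `Clause` class, its fields, and the numbers are hypothetical and only illustrate the idea; they are not Lucene's scorer API.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    name: str
    block_end: int   # last doc ID of the clause's current block (made-up value)
    essential: bool  # whether the clause is currently essential


def boundary_all(clauses):
    """Minimum block boundary across all clauses."""
    return min(c.block_end for c in clauses)


def boundary_essential_only(clauses):
    """Minimum block boundary across essential clauses, falling back to all clauses."""
    essential = [c for c in clauses if c.essential]
    return min(c.block_end for c in (essential or clauses))


clauses = [
    Clause("frequent term", block_end=500, essential=True),
    Clause("rare term", block_end=120, essential=False),
]
print(boundary_all(clauses), boundary_essential_only(clauses))
```

In this made-up case the non-essential clause has the nearest block boundary, so restricting the minimum to essential clauses yields a larger window before score upper bounds must be recomputed.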

> Specialize 2-clauses disjunctions
> -
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue of 
> scorers that are behind, another one for scorers that are ahead. All this 
> could be simplified in the 2-clauses case, which feels worth specializing for 
> as it's very common that end users enter queries that only have two terms?




[GitHub] [lucene] jpountz merged pull request #1014: Add comment for no pauses in RateLimitedIndexOutput.writeBytes

2022-07-11 Thread GitBox



jpountz merged PR #1014:
URL: https://github.com/apache/lucene/pull/1014



[GitHub] [lucene] jpountz merged pull request #1011: LUCENE-10647: Fix TestMergeSchedulerExternal failures

2022-07-11 Thread GitBox



jpountz merged PR #1011:
URL: https://github.com/apache/lucene/pull/1011



[jira] [Commented] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler

2022-07-11 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564896#comment-17564896
 ] 

ASF subversion and git services commented on LUCENE-10647:
--

Commit 128869d63aef6a448af991fa2768113a560a8dbc in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=128869d63ae ]

LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011)

Ensure mergeScheduler.sync() gets called before we rollback the writer.

> Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
> --
>
> Key: LUCENE-10647
> URL: https://issues.apache.org/jira/browse/LUCENE-10647
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recent builds are intermittently failing on 
> TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler. Example:
> https://jenkins.thetaphi.de/job/Lucene-main-Linux/35576/testReport/junit/org.apache.lucene/TestMergeSchedulerExternal/testSubclassConcurrentMergeScheduler/




[jira] [Commented] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler

2022-07-11 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564900#comment-17564900
 ] 

ASF subversion and git services commented on LUCENE-10647:
--

Commit 190cfbc65c66be807d6c61291500a6fdcf9a975e in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=190cfbc65c6 ]

LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011)

Ensure mergeScheduler.sync() gets called before we rollback the writer.

> Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
> --
>
> Key: LUCENE-10647
> URL: https://issues.apache.org/jira/browse/LUCENE-10647
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recent builds are intermittently failing on 
> TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler. Example:
> https://jenkins.thetaphi.de/job/Lucene-main-Linux/35576/testReport/junit/org.apache.lucene/TestMergeSchedulerExternal/testSubclassConcurrentMergeScheduler/




[GitHub] [lucene-jira-archive] mikemccand opened a new pull request, #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand opened a new pull request, #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33

   This is a start at #27 but I expect to iterate some more.  Progress not 
perfection!
   
   Now the header is more compact and looks like this for issues w/ no 
attachments, PRs, etc:
   
   ![Screen Shot 2022-07-11 at 7 13 24 
AM](https://user-images.githubusercontent.com/796508/178256766-58d59f29-816f-4d61-aedd-354ccd82fba0.png)
   
   And then with PRs and attachments:
   
   ![Screen Shot 2022-07-11 at 7 27 53 
AM](https://user-images.githubusercontent.com/796508/178256828-f07dd6fb-21bd-4139-b3a8-ff433fd246f1.png)
   
   I've only tested on 100 issues so far ... I'll run the full export and 
conversion to confirm I didn't break anything.
   



[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180310425

   Oh also note that I added another dependency (`dateutil`), very helpful for 
parsing ISO-8601 dates.  I couldn't (quickly) figure out how to reliably do 
this with Python's `datetime`.
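
   For comparison, a standard-library sketch that handles one fixed timestamp 
layout. This is not what the PR does (it uses `dateutil`). The format string 
and the sample value `2022-07-11T07:13:24.000+0000` are assumptions about 
what the exported timestamps look like.

```python
from datetime import datetime, timezone

# Assumed layout: date, "T", time with fractional seconds, numeric UTC offset.
JIRA_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_jira_timestamp(value: str) -> datetime:
    """Parse one fixed ISO-8601-style layout into a timezone-aware datetime."""
    return datetime.strptime(value, JIRA_TIMESTAMP_FORMAT)


created = parse_jira_timestamp("2022-07-11T07:13:24.000+0000")
print(created.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"))
```

   This only works while every timestamp follows that exact layout. General 
ISO-8601 input has many more variants, which is where a dedicated parser such 
as `dateutil` is more forgiving.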



[jira] [Resolved] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler

2022-07-11 Thread Adrien Grand (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10647.
---
Fix Version/s: 9.3
   Resolution: Fixed

> Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
> --
>
> Key: LUCENE-10647
> URL: https://issues.apache.org/jira/browse/LUCENE-10647
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Vigya Sharma
>Priority: Major
> Fix For: 9.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recent builds are intermittently failing on 
> TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler. Example:
> https://jenkins.thetaphi.de/job/Lucene-main-Linux/35576/testReport/junit/org.apache.lucene/TestMergeSchedulerExternal/testSubclassConcurrentMergeScheduler/




[GitHub] [lucene-jira-archive] mocobeta commented on a diff in pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mocobeta commented on code in PR #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33#discussion_r917859701


##
migration/src/jira2github_import.py:
##
@@ -69,45 +70,53 @@ def convert_issue(num: int, dump_dir: Path, output_dir: 
Path, account_map: dict[
 attachment_list_items = []
 att_replace_map = {}
 for (filename, cnt) in attachments:
-attachment_list_items.append(f"- [{filename}]({attachment_url(num, 
filename, att_repo, att_branch)})" + (f" (versions: {cnt})\n" if cnt > 1 else 
"\n"))
+attachment_list_items.append(f"[{filename}]({attachment_url(num, 
filename, att_repo, att_branch)})" + (f" (versions: {cnt})" if cnt > 1 else ""))
 att_replace_map[filename] = attachment_url(num, filename, 
att_repo, att_branch)
+print(f'{jira_id}: attachments: {attachment_list_items}')

Review Comment:
   I think this print() is added for debugging and should be suppressed?
   ```suggestion
   # print(f'{jira_id}: attachments: {attachment_list_items}')
   ```
   




[GitHub] [lucene-jira-archive] mikemccand commented on a diff in pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand commented on code in PR #33:
URL: https://github.com/apache/lucene-jira-archive/pull/33#discussion_r917872038


##
migration/src/jira2github_import.py:
##
@@ -69,45 +70,53 @@ def convert_issue(num: int, dump_dir: Path, output_dir: 
Path, account_map: dict[
 attachment_list_items = []
 att_replace_map = {}
 for (filename, cnt) in attachments:
-attachment_list_items.append(f"- [{filename}]({attachment_url(num, 
filename, att_repo, att_branch)})" + (f" (versions: {cnt})\n" if cnt > 1 else 
"\n"))
+attachment_list_items.append(f"[{filename}]({attachment_url(num, 
filename, att_repo, att_branch)})" + (f" (versions: {cnt})" if cnt > 1 else ""))
 att_replace_map[filename] = attachment_url(num, filename, 
att_repo, att_branch)
+print(f'{jira_id}: attachments: {attachment_list_items}')

Review Comment:
   Woops sorry yes I'll remove all the prints I added!




[jira] [Created] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)

Nathan Meisels created LUCENE-10650:
---

 Summary: "after_effect": "no" was removed what replaces it?
 Key: LUCENE-10650
 URL: https://issues.apache.org/jira/browse/LUCENE-10650
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Nathan Meisels


Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:
{{}}
{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
{{}}
In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));

// basic model I(n) should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn

return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
I tried changing "after_effect" to "l", but the scoring differs from what we 
are used to (we depend heavily on the exact scoring).

Do you have any advice on how we can keep the same scoring as before?

Thanks
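
The two formulas quoted above can be compared numerically. This is a sketch under one explicit assumption: that `aeTimes1pTfn` equals `1 + tfn`, i.e. an after effect of exactly 1. The issue does not say whether any available after effect yields that value. The function names and sample statistics are made up for illustration.

```python
import math


def score_old(tfn: float, N: int, n: int) -> float:
    # Formula quoted from the 5.5.0 BasicModelIn: tfn * log2((N + 1) / (n + 0.5))
    return tfn * math.log2((N + 1) / (n + 0.5))


def score_new(tfn: float, N: int, n: int, ae_times_1p_tfn: float) -> float:
    # Formula quoted from the 8.11.2 BasicModelIn
    A = math.log2((N + 1) / (n + 0.5))
    return A * ae_times_1p_tfn * (1 - 1 / (1 + tfn))


N, n, tfn = 1000, 10, 2.5  # made-up statistics
old = score_old(tfn, N, n)
new = score_new(tfn, N, n, ae_times_1p_tfn=1 + tfn)  # assumed after effect of 1
print(math.isclose(old, new))
```

Algebraically, A * (1 + tfn) * (1 - 1 / (1 + tfn)) = A * (1 + tfn) - A = A * tfn, which is the rewrite described in the code comment. So the new formula reduces to the old one only when the after-effect factor is 1; any other after effect changes the score.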




[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Meisels updated LUCENE-10650:

Description: 
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:
{{}}
{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
{{}}
In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}

I tried changing "after_effect" to "l", but the scoring differs from what we 
are used to (we depend heavily on the exact scoring).

Do you have any advice on how we can keep the same scoring as before?

Thanks

  was:
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:
{{}}
{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
{{}}
In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));

// basic model I(n) should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn

return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
I tried changing to "l" but the scoring is different than what we are used to. 
(We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks


> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see here that "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {{}}
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> {{}}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing to "l" but the scoring is different than what we are used 
> to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180351403

   BTW, as I run the full Jira download, I see errors like this:
   
   ```
   [2022-07-11 07:57:25,815] WARNING:download_jira: Can't download LUCENE-498. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 07:59:10,096] WARNING:download_jira: Can't download LUCENE-613. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 07:59:10,978] WARNING:download_jira: Can't download LUCENE-614. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 07:59:13,615] WARNING:download_jira: Can't download LUCENE-617. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 08:10:36,059] WARNING:download_jira: Can't download LUCENE-1362. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 08:10:36,932] WARNING:download_jira: Can't download LUCENE-1363. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 08:10:37,798] WARNING:download_jira: Can't download LUCENE-1364. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 08:26:22,112] WARNING:download_jira: Can't download LUCENE-2375. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   [2022-07-11 08:27:02,304] WARNING:download_jira: Can't download LUCENE-2418. status code=404, message={"errorMessages":["Issue Does Not Exist"],"errors":{}}
   ```
   
   Indeed, when I search Jira itself, my email, the internet, `LUCENE-2418` 
seems not to exist.  I wonder what happened?



[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Meisels updated LUCENE-10650:

Description: 
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:


{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}

In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}
I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
different than what we are used to. (We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks

  was:
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:
{{}}
{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
{{}}
In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}
I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
different than what we are used to. (We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks


> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see here that "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Meisels updated LUCENE-10650:

Description: 
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:
{{}}
{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
{{}}
In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}
I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
different than what we are used to. (We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks

  was:
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:
{{}}
{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
{{}}
In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}

I tried changing to "l" but the scoring is different than what we are used to. 
(We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks


> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see here that "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {{}}
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> {{}}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[jira] [Updated] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Meisels updated LUCENE-10650:

Description: 
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
"after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:
{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}
I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
different than what we are used to. (We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks

  was:
Hi!

We have been using an old version of elasticsearch with the following settings:

 
{code:java}
        "default": {
          "queryNorm": "1",
          "type": "DFR",
          "basic_model": "in",
          "after_effect": "no",
          "normalization": "no"
        }{code}
 

I see here that "after_effect": "no" was removed.

In 
[old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
 version score was:


{code:java}
return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}

In 
[new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
 version it's:
{code:java}
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
double A = log2((N + 1) / (n + 0.5));
// basic model I should return A * tfn
// which we rewrite to A * (1 + tfn) - A
// so that it can be combined with the after effect while still guaranteeing
// that the result is non-decreasing with tfn
return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
{code}
I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
different than what we are used to. (We depend heavily on the exact scoring).

Do you have any advice how we can keep the same scoring as before?

Thanks


> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[GitHub] [lucene-jira-archive] mocobeta commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mocobeta commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180354136

   Yes, I also noticed several issues do not exist (not sure why); in that 
case, the script just emits an error and proceeds to the next issue as you see.



[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564988#comment-17564988
 ] 

Adrien Grand commented on LUCENE-10650:
---

Hi Nathan. When we introduced dynamic pruning to Lucene, we also introduced the 
requirement that similarities produce scores that are non-decreasing when tf 
increases or when the length norm decreases (all other things equal). 
Unfortunately, this property could not be retained while keeping DFR 
similarities pluggable as they were, so we removed support for the "no" after 
effect and only retained L and B.
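
The effect of that change on the formula quoted in the issue can be worked out by hand. This is only a sketch: it assumes `aeTimes1pTfn` stands for the after effect multiplied by (1 + tfn), as the variable name and the code comment suggest, and that after effect L is the usual Laplace factor 1 / (1 + tfn).

```latex
\begin{align*}
A &= \log_2\frac{N+1}{n+0.5} \\
\text{score} &= A \cdot \mathit{aeTimes1pTfn} \cdot \left(1 - \frac{1}{1+\mathit{tfn}}\right) \\
\text{no after effect: } \mathit{aeTimes1pTfn} = 1+\mathit{tfn}
  &\;\Rightarrow\; \text{score} = A \cdot \mathit{tfn} \\
\text{after effect L: } \mathit{aeTimes1pTfn} = \frac{1+\mathit{tfn}}{1+\mathit{tfn}} = 1
  &\;\Rightarrow\; \text{score} = A \cdot \frac{\mathit{tfn}}{1+\mathit{tfn}}
\end{align*}
```

Under these assumptions, switching to "l" scales the old score by 1 / (1 + tfn), which would explain why the reported scores differ from the old "no" configuration.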

It looks like this specific similarity that you are looking for could still be 
implemented in a way that scores are non-decreasing with increasing tf or 
decreasing norm, so you should be able to re-implement it using a scripted 
similarity for instance 
(https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html#scripted_similarity)
 with something like below (untested):

{code}
"similarity": {
  "my_dfr_sim": {
    "type": "scripted",
    "weight_script": {
      "source": "return query.boost * Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);"
    },
    "script": {
      "source": "return weight * doc.freq;"
    }
  }
}
{code}
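
As a rough check of the suggestion above, the arithmetic of the two scripts can be compared with the old formula quoted in the issue. The sketch below is plain Java, not Lucene or Elasticsearch code: the class and method names are hypothetical. It assumes that `field.docCount` plays the role of N, that `term.docFreq` plays the role of n, and that with "normalization": "no" the value tfn is simply the raw term frequency (`doc.freq`).

```java
// Hypothetical stand-alone sketch; none of these names come from Lucene or Elasticsearch.
class DfrInSketch {

    // log base 2 expressed via natural logs, as in the weight_script
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Old formula as quoted in the issue (basic model "in", no after effect).
    static float oldScore(float tfn, long N, long n) {
        return tfn * (float) log2((N + 1) / (n + 0.5));
    }

    // Mirrors the weight_script: computed once per term, independent of the document.
    static double weight(double boost, long docCount, long docFreq) {
        return boost * Math.log((docCount + 1.0) / (docFreq + 0.5)) / Math.log(2);
    }

    // Mirrors the script: the per-document part, weight times term frequency.
    static double scriptedScore(double weight, float freq) {
        return weight * freq;
    }

    // New formula quoted in the issue with after effect L,
    // assuming aeTimes1pTfn = (1 / (1 + tfn)) * (1 + tfn) = 1.
    static double scoreWithL(float tfn, long N, long n) {
        double A = log2((N + 1) / (n + 0.5));
        double aeTimes1pTfn = 1.0;
        return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
    }

    public static void main(String[] args) {
        long N = 1000, n = 10;   // made-up collection statistics
        float tf = 3f;           // made-up term frequency
        double w = weight(1.0, N, n);
        System.out.println("old 'no' score: " + oldScore(tf, N, n));
        System.out.println("scripted score: " + scriptedScore(w, tf));
        System.out.println("'l' score:      " + scoreWithL(tf, N, n));
    }
}
```

With a boost of 1 the first two values should agree up to float rounding, while the "l" value is smaller by the factor 1 / (1 + tf). Whether N in the old version was computed exactly the way `field.docCount` is today is not confirmed here, so small differences on real indices are possible.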

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[jira] [Resolved] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Adrien Grand (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10650.
---
Resolution: Won't Fix

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565003#comment-17565003
 ] 

Nathan Meisels commented on LUCENE-10650:
-

Thanks for the answer!

Just to clarify:
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?

Thanks!

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[jira] [Comment Edited] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565003#comment-17565003
 ] 

Nathan Meisels edited comment on LUCENE-10650 at 7/11/22 1:26 PM:
--

Thanks for the answer!

Just to clarify:
{code:java}
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?{code}
Thanks!


was (Author: JIRAUSER292626):
Thanks for the answer!

Just to clarify:
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 
1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?

Thanks!

> "after_effect": "no" was removed what replaces it?
> --
>
> Key: LUCENE-10650
> URL: https://issues.apache.org/jira/browse/LUCENE-10650
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Nathan Meisels
>Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks




[jira] [Reopened] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Meisels reopened LUCENE-10650:
-


[jira] [Comment Edited] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

2022-07-11 Thread Nathan Meisels (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565003#comment-17565003
 ] 

Nathan Meisels edited comment on LUCENE-10650 at 7/11/22 1:54 PM:
--

Thanks for the answer!

1. Will latency be higher using a scripted similarity?

2. Just to clarify:
{code:java}
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?{code}
Thanks!


was (Author: JIRAUSER292626):
Thanks for the answer!

Just to clarify:
{code:java}
query.boost * # Which part is this?
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) # This is (float)(log2((N + 1) / (n + 0.5)))
/Math.log(2); # Is this equal to tfn?{code}
Thanks!


[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #34: Add a tool to generate account mapping

2022-07-11 Thread GitBox



mocobeta opened a new pull request, #34:
URL: https://github.com/apache/lucene-jira-archive/pull/34

   #3 
   
   This adds a helper tool to create a Jira user - GitHub account mapping file; 
this is used in "Convert Jira issues to GitHub issues" step. 



[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread GitBox



gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r918110247


##
lucene/demo/src/java/org/apache/lucene/demo/facet/RangeFacetsExample.java:
##
@@ -73,6 +76,35 @@ public void index() throws IOException {
   indexWriter.addDocument(doc);
 }
 
+// Add documents with a fake timestamp, 3600 sec (1 hour) after "now", 7200 sec (2
+// hours) after "now", ...:
+long startTime = 0;
+// Index error messages since a week (24 * 7 = 168 hours) ago
+for (int i = 0; i < 168; i++) {
+  long endTime = startTime + (i + 1) * 3600;
+
+  // Choose a relatively larger number, e,g., "35", in order to create variation in count for
+  // the top-n children, so that getTopChildren(10) in the searchTopChildren functionality
+  // can return children with different counts
+  for (int j = 0; j < i % 35; j++) {
+Document doc = new Document();
+// index document at a different timestamp by using endTime - i * j

Review Comment:
   OK, thank you! I like the idea and I think it's close, but I think we can 
come up with a simpler way. I think you could randomly distribute the "data 
points" within each hour when you index without impacting testing at all. The 
facet counts should remain the same regardless of how the data points are 
distributed, so testing should be stable I think? So maybe we hit a compromise 
that uses a stable number of data points per hour time period (which you could 
do with your modulus operation if you like) but then randomly jitter the data 
within each hour block?
   
   But yeah, let's take it up as a follow-on issue. Would you mind linking that 
here once you create it so the conversation is easier to follow for future 
readers? Thanks again for all the hard work!
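
The stability argument above can be checked with a small simulation. This is only a sketch, in Python rather than the demo's Java: it assumes disjoint one-hour buckets and reuses the `i % 35` document count from the diff, and the helper names are hypothetical.

```python
import random
from collections import Counter

SECONDS_PER_HOUR = 3600


def index_timestamps(hours, rng=None):
    """Return one timestamp per document, with (hour % 35) documents per hour.

    Without rng every document sits at the start of its hour; with rng each
    document is jittered to a random second inside the same hour.
    """
    timestamps = []
    for hour in range(hours):
        start = hour * SECONDS_PER_HOUR
        for _ in range(hour % 35):
            offset = rng.randrange(SECONDS_PER_HOUR) if rng is not None else 0
            timestamps.append(start + offset)
    return timestamps


def hourly_counts(timestamps):
    """Count documents per one-hour bucket."""
    return Counter(ts // SECONDS_PER_HOUR for ts in timestamps)


fixed = hourly_counts(index_timestamps(168))
jittered = hourly_counts(index_timestamps(168, random.Random(42)))
print(fixed == jittered)
```

Because the jitter never moves a document out of its hour, the per-bucket counts are identical in both runs, so any assertion on facet counts would not depend on the random seed.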




[GitHub] [lucene] gsmiller merged pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread GitBox



gsmiller merged PR #974:
URL: https://github.com/apache/lucene/pull/974



[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565075#comment-17565075
 ] 

ASF subversion and git services commented on LUCENE-10614:
--

Commit 5ef7e5025def61cf20442806486c8f6102ebcdc4 in lucene's branch 
refs/heads/main from Yuting Gan
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5ef7e5025de ]

LUCENE-10614: Properly support getTopChildren in RangeFacetCounts (#974)



> Properly support getTopChildren in RangeFacetCounts
> ---
>
> Key: LUCENE-10614
> URL: https://issues.apache.org/jira/browse/LUCENE-10614
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 10.0 (main)
>Reporter: Greg Miller
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> As mentioned in LUCENE-10538, {{RangeFacetCounts}} is not implementing 
> {{getTopChildren}}. Instead of returning "top" ranges, it returns all 
> user-provided ranges in the order the user specified them when instantiating. 
> This is probably more useful functionality, but it would be nice to support 
> {{getTopChildren}} as well.
> LUCENE-10550 is introducing the concept of {{getAllChildren}}, so once that 
> lands, we can replace the current implementation of {{getTopChildren}} with 
> an actual "top children" implementation and direct users to 
> {{getAllChildren}} if they want to maintain the current behavior.




[GitHub] [lucene-jira-archive] mocobeta commented on pull request #34: Add a tool to generate account mapping

2022-07-11 Thread GitBox



mocobeta commented on PR #34:
URL: 
https://github.com/apache/lucene-jira-archive/pull/34#issuecomment-1180601283

   FYI @mikemccand @dweiss 
   I will keep this open for a while and do some more extensive tests on that 
(this is a helper tool that should not block/conflict with the main scripts). 
If you have suggestions for generating account mapping, please review this when 
you have some time. I think there is room to improve in this simplistic 
approach.



[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565081#comment-17565081
 ] 

ASF subversion and git services commented on LUCENE-10614:
--

Commit d6dbe4374a5229b827613b85066f3a4da91d5f27 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d6dbe4374a5 ]

Move LUCENE-10614 CHANGES entry to 10.0 and add MIGRATE entry



[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565084#comment-17565084
 ] 

Greg Miller commented on LUCENE-10614:
--

Thanks again [~yutinggan] !


[jira] [Resolved] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread Greg Miller (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller resolved LUCENE-10614.
--
Fix Version/s: 10.0 (main)
   Resolution: Fixed


[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565083#comment-17565083
 ] 

Greg Miller commented on LUCENE-10614:
--

Just merged this to {{main}}. I don't think we should backport this to 9.x 
since it is a functional change to an existing API. Because of this, I moved 
the CHANGES entry under 10.0 and added an entry to MIGRATE describing the 
difference and how to retain the 9.x functionality if desired.


[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread Yuting Gan (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565089#comment-17565089
 ] 

Yuting Gan commented on LUCENE-10614:
-

Thank you so much [~gsmiller] !


[GitHub] [lucene] tang-hi opened a new pull request, #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-07-11 Thread GitBox



tang-hi opened a new pull request, #1016:
URL: https://github.com/apache/lucene/pull/1016

   [JIRA](https://issues.apache.org/jira/browse/LUCENE-10646)
   1. I have added some comments to Lev1ParametricDescription; I hope they will 
help others better understand the code of Lev2ParametricDescription and 
Lev2TParametricDescription.
   2. I use breadth-first search to prettify the output of Automaton#toDot.
   For example, the LevenshteinAutomata of "abcd":
   before
   
![before](https://user-images.githubusercontent.com/72755185/178311971-2c52f6bd-6474-4608-a6bb-641417d0a2da.png)
   after
   
![after](https://user-images.githubusercontent.com/72755185/178312022-06e214db-24ab-4bcf-9972-7a1b8338376d.png)
   
   



[GitHub] [lucene] shahrs87 commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

2022-07-11 Thread GitBox



shahrs87 commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1180628206

   @jpountz  Hi Adrien, can you please make one more pass over the PR and 
provide your feedback? Thank you.



[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180758900

   Good thing I tested on all issues -- I hit a couple fun exceptions -- so 
please don't push this PR just yet!



[jira] [Created] (LUCENE-10651) SimpleQueryParser stack overflow for large nested queries.

2022-07-11 Thread Marc (Jira)

Marc created LUCENE-10651:
-

 Summary: SimpleQueryParser stack overflow for large nested queries.
 Key: LUCENE-10651
 URL: https://issues.apache.org/jira/browse/LUCENE-10651
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 9.2, 8.10, 9.1, 9.3
Reporter: Marc


The OpenSearch project received an issue [1] in which a stack overflow can occur 
for large nested boolean queries during rewrite.  While trying to reproduce this 
error, I also encountered a stack overflow during parsing, when queries expand 
beyond the default 1024 clause limit.  This unit test fails with a stack overflow:
{code:java}
public void testSimpleQueryParserWithTooManyClauses() {
  StringBuilder queryString = new StringBuilder("foo");
  for (int i = 0; i < 1024; i++) {
    queryString.append(" | bar").append(i).append(" + baz");
  }
  expectThrows(IndexSearcher.TooManyClauses.class, () -> parse(queryString.toString()));
}
 {code}
I would expect this case to fail with TooManyClauses instead; is my 
understanding correct?  If so, I've attempted a fix [2] that increments a 
counter during parsing whenever a clause is added.

 [1] [https://github.com/opensearch-project/OpenSearch/issues/3760]

 [2] 
[https://github.com/mch2/lucene/commit/6a558f17f448b92ae4cf8c43e0b759ff7425acdf]
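
The counter idea described above can be illustrated with a toy example. This is a hedged sketch in Python, not the linked fix and not SimpleQueryParser's grammar: `ClauseCounter`, `parse_flat` and the flat tokenizer are hypothetical, and only the fail-fast counting is the point.

```python
import re


class TooManyClauses(Exception):
    """Raised when a query adds more clauses than the configured limit."""


class ClauseCounter:
    def __init__(self, max_clauses=1024):
        self.max_clauses = max_clauses
        self.count = 0

    def add(self):
        # Called every time the parser adds a clause; fails as soon as the
        # limit is exceeded instead of letting the parser keep going.
        self.count += 1
        if self.count > self.max_clauses:
            raise TooManyClauses(f"more than {self.max_clauses} clauses")


def parse_flat(query, max_clauses=1024):
    """Toy parser: split on '|' and '+' and count each term as one clause."""
    counter = ClauseCounter(max_clauses)
    terms = []
    for token in re.split(r"\s*[|+]\s*", query.strip()):
        if token:
            counter.add()
            terms.append(token)
    return terms


# Same query shape as the unit test above: "foo | bar0 + baz | bar1 + baz ..."
big_query = "foo" + "".join(f" | bar{i} + baz" for i in range(1024))
try:
    parse_flat(big_query)
except TooManyClauses as exc:
    print("rejected:", exc)
```

Checking the limit at the point where a clause is added means an oversized query is rejected with a descriptive exception well before the call stack can become a problem.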

 




[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180771389

   Somehow I am hitting a stack overflow when trying to convert 
[LUCENE-550](https://issues.apache.org/jira/browse/LUCENE-550)!  It doesn't 
look like a particularly challenging issue to convert :)
   
   ```
   (.venv) beast3:migration[polish_legacy_jira]$ python src/jira2github_import.py --min 1 --max 10649
   [2022-07-11 15:01:02,826] INFO:jira2github_import: Converting Jira issues to GitHub issues in /l/jira-github-migration/migration/github-import-data
   [2022-07-11 15:10:25,306] WARNING:jira2github_import: Jira dump file not found: /l/jira-github-migration/migration/jira-dump/LUCENE-498.json
   ERROR: unhandled exception while converting LUCENE-550
   Traceback (most recent call last):
     File "/l/jira-github-migration/migration/src/jira2github_import.py", line 229, in
       convert_issue(num, dump_dir, output_dir, account_map, github_att_repo, github_att_branch)
     File "/l/jira-github-migration/migration/src/jira2github_import.py", line 133, in convert_issue
       comment_body = f"""{convert_text(comment_body, att_replace_map, account_map)}
     File "/l/jira-github-migration/migration/src/jira_util.py", line 216, in convert_text
       text = jira2markdown.convert(text, elements=elements)
     File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/jira2markdown/parser.py", line 20, in convert
       return markup.transformString(text)
     File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 2059, in transformString
       for t, s, e in self.scanString(instring):
     File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 2007, in scanString
       nextLoc, tokens = parseFn(instring, preloc, callPreParse=False)
     File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 1683, in _parseNoCache
       loc, tokens = self.parseImpl(instring, preloc, doActions)
     File "/l/jira-github-migration/.venv/lib/python3.10/site-packages/pyparsing.py", line 4462, in parseImpl
       return self.expr._parse(

[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180785030

   Hmm this seems to be an issue on `main` as well.  This is what I'm running 
to trigger it: `python src/jira2github_import.py --min 550`.
   
   I'll catch the exception (trying to convert one comment text) and try to 
best-effort continue.
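
A best-effort wrapper of the kind described above might look roughly like the following. This is a sketch under assumptions, not the change that went into the migration scripts: `convert_best_effort` and `convert_comments` are hypothetical names, the converter is passed in as a plain callable, and falling back to the unconverted Jira text is just one possible policy.

```python
import logging

logger = logging.getLogger("jira2github_sketch")


def convert_best_effort(text, convert):
    """Convert one comment; on any failure keep the original text.

    Returns (body, ok). RecursionError, which Python raises when the call
    stack gets too deep, is a subclass of Exception and is caught here too.
    """
    try:
        return convert(text), True
    except Exception:
        logger.exception("conversion failed; keeping original markup")
        return text, False


def convert_comments(comments, convert):
    """Convert every comment and report how many needed the fallback."""
    bodies = []
    failed = 0
    for text in comments:
        body, ok = convert_best_effort(text, convert)
        if not ok:
            failed += 1
        bodies.append(body)
    return bodies, failed
```

Catching the exception per comment rather than per issue means one problematic comment does not prevent the rest of the issue from being converted, and the failure count gives a quick way to find the comments that need a manual look.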



[GitHub] [lucene] stefanvodita commented on a diff in pull request #1015: [LUCENE-10629]: Add fast match query support to FacetSets

2022-07-11 Thread GitBox



stefanvodita commented on code in PR #1015:
URL: https://github.com/apache/lucene/pull/1015#discussion_r918289599


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java:
##
@@ -52,8 +52,10 @@ public MatchingFacetSetsCounts(
   String field,
   FacetsCollector hits,
   FacetSetDecoder facetSetDecoder,
+  Query fastMatchQuery,

Review Comment:
   What do you think of preserving the constructor without `fastMatchQuery`? It 
would avoid adding that `null` to all existing (and possibly some future) uses.




[jira] [Created] (LUCENE-10652) Add a top-n range faceting example to RangeFacetsExample

2022-07-11 Thread Yuting Gan (Jira)

Yuting Gan created LUCENE-10652:
---

 Summary: Add a top-n range faceting example to RangeFacetsExample
 Key: LUCENE-10652
 URL: https://issues.apache.org/jira/browse/LUCENE-10652
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yuting Gan


In LUCENE-10614, we modified the behavior of getTopChildren to actually return 
top-n ranges ordered by count. The original behavior of getTopChildren in 
RangeFacetCounts was to return all ranges in the order specified in the 
constructor, and this behavior is now retained in the getAllChildren API 
(LUCENE-10550).

Therefore, it would be helpful to add an example in RangeFacetsExample to demo 
this change. I replaced the original example of getTopChildren with 
getAllChildren, and will add an example of the current getTopChildren API soon.
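
The difference between the two behaviors can be sketched independently of the facet module. The following Python snippet is only an illustration of the two orderings described above; the function names mirror the API for readability but are hypothetical, and the tie-breaking rule (ties keep the supplied order) is an assumption, not something taken from Lucene.

```python
def get_all_children(ranges, counts):
    """All ranges with their counts, in the order the ranges were supplied."""
    return [(label, counts.get(label, 0)) for label in ranges]


def get_top_children(top_n, ranges, counts):
    """The top_n ranges by descending count.

    sorted() is stable, so ranges with equal counts keep the supplied order
    (an assumption made for this sketch).
    """
    ordered = sorted(get_all_children(ranges, counts), key=lambda pair: -pair[1])
    return ordered[:top_n]


ranges = ["hour 1", "hour 2", "hour 3", "hour 4"]
counts = {"hour 1": 4, "hour 2": 10, "hour 3": 7, "hour 4": 10}

print(get_all_children(ranges, counts))
print(get_top_children(2, ranges, counts))
```

The first call lists every range in its original order, while the second returns only the two most populated ranges, which is the distinction the demo is meant to show.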




[GitHub] [lucene] Yuti-G commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread GitBox



Yuti-G commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r918297532


##
lucene/demo/src/java/org/apache/lucene/demo/facet/RangeFacetsExample.java:
##
@@ -73,6 +76,35 @@ public void index() throws IOException {
   indexWriter.addDocument(doc);
 }
 
+// Add documents with a fake timestamp, 3600 sec (1 hour) after "now", 7200 sec (2
+// hours) after "now", ...:
+long startTime = 0;
+// Index error messages since a week (24 * 7 = 168 hours) ago
+for (int i = 0; i < 168; i++) {
+  long endTime = startTime + (i + 1) * 3600;
+
+  // Choose a relatively larger number, e,g., "35", in order to create variation in count for
+  // the top-n children, so that getTopChildren(10) in the searchTopChildren functionality
+  // can return children with different counts
+  for (int j = 0; j < i % 35; j++) {
+Document doc = new Document();
+// index document at a different timestamp by using endTime - i * j

Review Comment:
   I've created a spin-off 
[issue](https://issues.apache.org/jira/browse/LUCENE-10652) to add a top-n 
range faceting example to the demo. Thanks again for reviewing my PR and 
leaving such detailed and constructive feedback! 




[GitHub] [lucene] MarcusSorealheis commented on pull request #940: Use similarity.tf() in MoreLikeThis

2022-07-11 Thread GitBox



MarcusSorealheis commented on PR #940:
URL: https://github.com/apache/lucene/pull/940#issuecomment-1180816282

   Is there anything else needed here? Is there something we can add to improve 
the robustness of the quality check? Please advise, @rmuir and @mocobeta.



[GitHub] [lucene-jira-archive] mikemccand commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mikemccand commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1180836771

   This is the comment that causes a stack overflow during conversion:
   
   ```
   A note on, and output from contrib/benchmark:





   I'm getting really poor results compared to my own test and live enviroment 
stats. At query time I expected maximum 1/6th time spent in InstantiatedIndex 
than RAMDirectory, but it turns out that in the be\
   nchmarker the speed is almost the same as RAMDirectory. Retrieving documents 
is only 1/5th of the speed rather than maximum 1/60th as expected.  




   Investigated the code a bit and noticed that ReadTask creates a new instance 
of IndexReader and IndexSearcher for each query. Could this be the reason?  




   Memory consumption is 3x of a RAMDirectory, but half of the memory is spent 
on keeping the Document instances in heap. Perhaps it would be interesting to 
use the same persistency for these as in the Direc\
   tory implementations.





   The merge factor sweet spot is around 2500, where it turns out to be a 
little bit faster than the RAMDirectory sweet spot. At defualt 10 
InstantiatedIndex consumes about 5x more time than a RAMDirectory. \
   If I fix the locklessness as suggested in previous comment, it most probably 
will be much faster than a RAMDirectory at any setting. 




   /**  


  * The sweet spot for this implementation is at 2500.  


  * 


  * Benchmark output:   


  *


  *  > Report sum by Prefix (MAddDocs) and Round (8 about 8 out 
of 160153)  

  *  Operation  round  mrg buf cmpnd   runCnt   recsPerRunrec/s 
 elapsedSecavgUsedMemavgTotalMem

  *  MAddDocs_2 0   10  10  true12 81,4 
 245,68   200 325 152268 156 928

  *  MAddDocs_2 -   1 1000  10  true -  -   1 -  -   2 -  -   494,1 
-  -  40,47 - 247 119 072 -  347 025 408

  *  MAddDocs_2 2   10 100  true12104,8 
 190,81   233 895 552363 720 704

[jira] [Created] (LUCENE-10653) Should BlockMaxMaxscoreScorer rebuild its heap in bulk?

2022-07-11 Thread Greg Miller (Jira)

Greg Miller created LUCENE-10653:


 Summary: Should BlockMaxMaxscoreScorer rebuild its heap in bulk?
 Key: LUCENE-10653
 URL: https://issues.apache.org/jira/browse/LUCENE-10653
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Greg Miller


BMMScorer has to frequently rebuild its heap, and does so by clearing and then 
iteratively calling {{add}}. It would be more efficient to heapify in bulk. 
This is more academic than anything right now, though, since BMMScorer is only 
used with two-clause disjunctions, so it's sort of a silly optimization if it's 
not supporting a greater number of clauses.
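
The difference between the two rebuild strategies can be sketched in Python,
with the standard heapq module standing in for the scorer's own heap. This is
an illustration only, not BlockMaxMaxscoreScorer's code, and both function
names are hypothetical.

```python
import heapq


def rebuild_by_adding(items):
    """Clear-and-add rebuild: n pushes, O(n log n) in the worst case."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)
    return heap


def rebuild_in_bulk(items):
    """Bulk rebuild: copy once, then heapify in place in O(n)."""
    heap = list(items)
    heapq.heapify(heap)
    return heap
```

Both produce a valid heap over the same elements. With only two elements, as
in two-clause disjunctions, the cost difference is negligible, which matches
the "academic" caveat above.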



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [lucene] msokolov commented on pull request #947: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-07-11 Thread GitBox



msokolov commented on PR #947:
URL: https://github.com/apache/lucene/pull/947#issuecomment-1180963745

   I'm looking to address various comments; just pushed a commit that makes the 
vector encoding explicit by adding a new enum and parameter "vectorEncoding", 
splitting this out from "similarityFunction".
   
   > During merging when writing a merged vector field it looks like we first 
expand vector values only to compress them again later? Would be nice to 
avoid this.
   
   Oh good catch, @mayya-sharipova I will look into addressing this.
   
   > Not sure if possible at this stage, but it would be nice if 
HnswGraphBuilder and HnswGraphSearcher were not aware of the different calculations 
needed for different similarity functions, and delegated all these calculations 
(dotProduct(BytesRef a..)) to VectorSimilarityFunction
   
   I don't see how to do this efficiently (without many conversions from byte 
to float) and neatly (without code duplication in tricky algorithmic areas) and 
with complete API purity, so I sacrificed some purity. If you have any ideas 
how to do it better, I'm open to changing it though.
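
For readers unfamiliar with the idea, here is a generic Python sketch of scalar
quantization to 8 bits with a dot product computed directly on the quantized
values. It is not the PR's encoding scheme: the scale factor, the [-1.0, 1.0]
input range, and both function names are assumptions made for illustration.

```python
def quantize(vector, scale=127):
    """Map floats in [-1.0, 1.0] to signed 8-bit integers in [-127, 127].

    Values outside the assumed input range are clamped first.
    """
    quantized = []
    for x in vector:
        clamped = max(-1.0, min(1.0, x))
        quantized.append(int(round(clamped * scale)))
    return quantized


def dot_product_bytes(a, b):
    """Dot product computed on the integer-encoded values, no float conversion."""
    if len(a) != len(b):
        raise ValueError("vector dimensions differ")
    return sum(x * y for x, y in zip(a, b))
```

Keeping the arithmetic in the byte domain is the efficiency concern raised
above: converting every value back to float before each comparison would give
away much of the benefit of the compact encoding.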



[GitHub] [lucene] msokolov commented on pull request #947: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-07-11 Thread GitBox



msokolov commented on PR #947:
URL: https://github.com/apache/lucene/pull/947#issuecomment-1180965832

   Also - if anybody has advice about how to rebase while maintaining this PR 
I'd be interested. Should I `git merge` from `main`??



[GitHub] [lucene] msokolov commented on pull request #947: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-07-11 Thread GitBox



msokolov commented on PR #947:
URL: https://github.com/apache/lucene/pull/947#issuecomment-1181017377

   >  During merging when writing a merged vector field it looks like we first 
expand vector values only to compress them again later? Would be nice to 
avoid this.
   
   In fact after checking, I don't think we are doing this expand/compress step 
*even though getVectorValues() returns `ExpandingVectorValues` to the merger*. 
This is because the merger uses the `binaryValue()` call to write the vectors 
themselves, and that value is unchanged by EVV, and the graph is created by the 
(now) polymorphic hnsw utils that also call `binaryValue()` when they are 
dealing with a field encoded as bytes.



[jira] [Commented] (LUCENE-10653) Should BlockMaxMaxscoreScorer rebuild its heap in bulk?

2022-07-11 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565195#comment-17565195
 ] 

Greg Miller commented on LUCENE-10653:
--

Here's essentially what I'm thinking: 
https://github.com/gsmiller/lucene/commit/597a760d6c0b0524ba1d72c290689e4dc4b4b9e9

> Should BlockMaxMaxscoreScorer rebuild its heap in bulk?
> ---
>
> Key: LUCENE-10653
> URL: https://issues.apache.org/jira/browse/LUCENE-10653
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
>
> BMMScorer has to frequently rebuild its heap, and does so by clearing and 
> then iteratively calling {{add}}. It would be more efficient to heapify 
> in bulk. This is more academic than anything right now, though, since BMMScorer 
> is only used with two-clause disjunctions, so it's sort of a silly 
> optimization if it's not supporting a greater number of clauses.




[GitHub] [lucene] gsmiller commented on a diff in pull request #1013: LUCENE-10644: Facets#getAllChildren testing should ignore child order

2022-07-11 Thread GitBox



gsmiller commented on code in PR #1013:
URL: https://github.com/apache/lucene/pull/1013#discussion_r918423465


##
lucene/facet/src/test/org/apache/lucene/facet/FacetTestCase.java:
##
@@ -264,4 +264,24 @@ protected void assertFloatValuesEquals(FacetResult a, FacetResult b) {
   a.labelValues[i].value.floatValue() / 1e5);
 }
   }
+
+  protected void assertNumericValuesEquals(Number a, Number b) {
+assertTrue(a.getClass().isInstance(b));
+if (a instanceof Float) {
+  assertEquals(a.floatValue(), b.floatValue(), a.floatValue() / 1e5);
+} else if (a instanceof Double) {
+  assertEquals(a.doubleValue(), b.doubleValue(), a.doubleValue() / 1e5);
+} else {
+  assertEquals(a, b);
+}
+  }
+
+  protected void assertAllChildrenEqualsWithoutOrdering(FacetResult a, FacetResult b) {

Review Comment:
   The naming of this method leads me to believe it's only going to validate 
the children, but it's checking dims, paths, etc. I wonder if we shouldn't name 
it something more generic?
   
   Also, it feels a little weird to me that callers have to create a 
`FacetResult` for their expected data to use this method. I wonder if it would 
be easier to have a signature like this:
   
   ```
 protected void assertFacetResult(String expectedDim,
  String[] expectedPath,
  int expectedChildCount,
  Number expectedValue,
  LabelAndValue... expectedChildren)
   ```
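
As a rough illustration of how such a helper could behave, here is a Python
sketch, not Lucene test code. A plain dict stands in for `FacetResult`,
children are `(label, value)` pairs, and every name is hypothetical. Floats
are compared with a relative tolerance of 1e-5, mirroring the `/ 1e5` delta in
the diff above.

```python
import math


def assert_numeric_values_equal(expected, actual):
    """Exact match for non-floats; relative tolerance of 1e-5 for floats."""
    assert type(expected) is type(actual), "value types differ"
    if isinstance(expected, float):
        assert math.isclose(expected, actual, rel_tol=1e-5), (expected, actual)
    else:
        assert expected == actual, (expected, actual)


def assert_facet_result(result, expected_dim, expected_path,
                        expected_child_count, expected_value,
                        expected_children):
    """Check dim, path, child count and value, then compare children ignoring order."""
    assert result["dim"] == expected_dim
    assert list(result["path"]) == list(expected_path)
    assert result["child_count"] == expected_child_count
    assert_numeric_values_equal(expected_value, result["value"])
    # Sort both sides by label so the comparison does not depend on child order.
    actual = sorted(result["children"], key=lambda child: child[0])
    expected = sorted(expected_children, key=lambda child: child[0])
    assert len(actual) == len(expected)
    for (exp_label, exp_value), (act_label, act_value) in zip(expected, actual):
        assert exp_label == act_label
        assert_numeric_values_equal(exp_value, act_value)
```

With this shape, callers pass the expected pieces directly and never build an
expected result object themselves.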




[GitHub] [lucene-jira-archive] mocobeta commented on pull request #33: Polish wording of Legacy Jira details header, and each comment footer

2022-07-11 Thread GitBox



mocobeta commented on PR #33:
URL: 
https://github.com/apache/lucene-jira-archive/pull/33#issuecomment-1181218297

   It looks like a bug introduced in 
https://github.com/apache/lucene-jira-archive/commit/cfbc821390859a7053e43028325b6bc616ec2b5b.
 (I have postponed testing it with the whole Jira dump.)
   I'll take a look at it.
   
   > I'll catch the exception (trying to convert one comment text) and try to 
best-effort continue.
   
   Sorry, there should have been a catch-all try/except clause.




[jira] [Commented] (LUCENE-10471) Increase the number of dims for KNN vectors to 2048

2022-07-11 Thread Stanislav (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565240#comment-17565240
 ] 

Stanislav commented on LUCENE-10471:


I don't think there is a trend toward higher dimensionality. Only a few models 
have feature dimensions above 2048.

Most modern neural networks (ViT and the whole BERT family) have fewer than 1k 
dimensions.

However, there are still many models, such as ms-resnet or EfficientNet, that 
operate in the range from 1k to 2048.

And they are the most common models for image embedding and vector search.

The current limit forces dimensionality reduction for pretty standard shapes.

> Increase the number of dims for KNN vectors to 2048
> ---
>
> Key: LUCENE-10471
> URL: https://issues.apache.org/jira/browse/LUCENE-10471
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Mayya Sharipova
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current maximum allowed number of dimensions is equal to 1024. But we see 
> in practice a couple well-known models that produce vectors with > 1024 
> dimensions (e.g 
> [mobilenet_v2|https://tfhub.dev/google/imagenet/mobilenet_v2_035_224/feature_vector/1]
>  uses 1280d vectors, OpenAI / GPT-3 Babbage uses 2048d vectors). Increasing 
> max dims to `2048` will satisfy these use cases.
> I am wondering if anybody has strong objections against this.




[jira] [Commented] (LUCENE-10628) Enable MatchingFacetSetCounts to use space partitioning data structures

2022-07-11 Thread Marc D'Mello (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565241#comment-17565241
 ] 

Marc D'Mello commented on LUCENE-10628:
---

I started work on this issue but I was informed that [~ivera] is experienced 
with space partitioning algorithms and might have some pointers in Lucene where 
I can find examples of KD trees and R trees, so I'm just tagging you here in 
case you have any tips/pointers for this issue before I get too deep into it 
:). Thanks!

> Enable MatchingFacetSetCounts to use space partitioning data structures
> ---
>
> Key: LUCENE-10628
> URL: https://issues.apache.org/jira/browse/LUCENE-10628
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>
> Currently, {{MatchingFacetSetCounts}} iterates over {{FacetSetMatcher}} 
> instances passed into it linearly. While this is fine in some cases, if we 
> have a large amount of {{FacetSetMatcher}}'s, this can be inefficient. We 
> should provide the option to users to enable the use of space partitioning 
> data structures (namely R trees and KD trees) so we can potentially scan over 
> these {{FacetSetMatcher}}'s in sub-linear time.
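
To make the linear-versus-partitioned trade-off concrete, here is a Python
sketch. It uses a uniform grid, which is simpler than the R trees and KD trees
named in the issue, and it is not Lucene code. Inclusive 2D rectangles with
integer coordinates stand in for FacetSetMatcher instances, and all names are
hypothetical.

```python
from collections import defaultdict


def matches(rect, point):
    """True if the 2D point lies inside the inclusive rectangle."""
    (x_lo, y_lo), (x_hi, y_hi) = rect
    return x_lo <= point[0] <= x_hi and y_lo <= point[1] <= y_hi


def linear_scan(rects, point):
    """Baseline: test every matcher, so cost grows with the number of matchers."""
    return [i for i, rect in enumerate(rects) if matches(rect, point)]


class GridIndex:
    """Uniform grid: each cell lists the rectangles that overlap it."""

    def __init__(self, rects, cell_size):
        self.rects = rects
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for i, ((x_lo, y_lo), (x_hi, y_hi)) in enumerate(rects):
            for cx in range(x_lo // cell_size, x_hi // cell_size + 1):
                for cy in range(y_lo // cell_size, y_hi // cell_size + 1):
                    self.cells[(cx, cy)].append(i)

    def query(self, point):
        """Only test the rectangles registered in the point's cell."""
        cell = (point[0] // self.cell_size, point[1] // self.cell_size)
        candidates = self.cells.get(cell, [])
        return [i for i in candidates if matches(self.rects[i], point)]
```

The grid returns the same matches as the linear scan but only inspects the
candidates in one cell. Tree structures pursue the same goal while adapting to
unevenly distributed matchers, which a fixed grid does not.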




[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #35: Catch all exceptions (and proceed to the next issue) in jira2github_import.py

2022-07-11 Thread GitBox



mocobeta opened a new pull request, #35:
URL: https://github.com/apache/lucene-jira-archive/pull/35

   Added a try/except so that the script does not stop on a conversion failure/error.
   
   ```
   (.venv) migration $ python src/jira2github_import.py --issues 550
   [2022-07-12 12:09:06,759] INFO:jira2github_import: Converting Jira issues to GitHub issues in /mnt/hdd/repo/lucene-jira-archive/migration/github-import-data
   [2022-07-12 12:09:35,785] ERROR:jira2github_import: Traceback (most recent call last):
     File "/mnt/hdd/repo/lucene-jira-archive/migration/src/jira2github_import.py", line 216, in <module>
       convert_issue(num, dump_dir, output_dir, account_map, github_att_repo, github_att_branch)
     File "/mnt/hdd/repo/lucene-jira-archive/migration/src/jira2github_import.py", line 121, in convert_issue
       "body": f"""{convert_text(comment_body, att_replace_map, account_map)}
     File "/mnt/hdd/repo/lucene-jira-archive/migration/src/jira_util.py", line 216, in convert_text
       text = jira2markdown.convert(text, elements=elements)
     File "/mnt/hdd/repo/lucene-jira-archive/migration/.venv/lib/python3.9/site-packages/jira2markdown/parser.py", line 20, in convert
       return markup.transformString(text)
     File "/mnt/hdd/repo/lucene-jira-archive/migration/.venv/lib/python3.9/site-packages/pyparsing.py", line 2059, in transformString
       for t, s, e in self.scanString(instring):
     File "/mnt/hdd/repo/lucene-jira-archive/migration/.venv/lib/python3.9/site-packages/pyparsing.py", line 2007, in scanString
       nextLoc, tokens = parseFn(instring, preloc, callPreParse=False)
   ...
   RecursionError: maximum recursion depth exceeded
   
   [2022-07-12 12:09:35,786] ERROR:jira2github_import: Failed to convert Jira issue. An error 'maximum recursion depth exceeded' occurred; skipped LUCENE-550.
   [2022-07-12 12:09:35,786] INFO:jira2github_import: Done.
   ```
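
The catch-and-continue pattern behind this log can be sketched as below. This
is a minimal illustration rather than the actual contents of
jira2github_import.py: `convert_all` and its one-argument `convert_issue`
callable are hypothetical names, and only the logger name is taken from the
log output above.

```python
import logging

logger = logging.getLogger("jira2github_import")


def convert_all(issue_numbers, convert_issue):
    """Convert each issue, logging and skipping any that raise.

    `convert_issue` is a caller-supplied callable taking one issue number
    (a hypothetical signature). Returns the list of skipped issue numbers.
    """
    skipped = []
    for num in issue_numbers:
        try:
            convert_issue(num)
        except Exception as e:  # RecursionError is a subclass of Exception
            # logger.exception also records the traceback of the failure.
            logger.exception(
                "Failed to convert Jira issue. An error '%s' occurred; "
                "skipped LUCENE-%d.", e, num)
            skipped.append(num)
    logger.info("Done.")
    return skipped
```

Catching `Exception` rather than using a bare `except:` keeps
`KeyboardInterrupt` working, so a long migration run can still be stopped by
hand.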



[GitHub] [lucene-jira-archive] mocobeta merged pull request #35: Catch all exceptions (and proceed to the next issue) in jira2github_import.py

2022-07-11 Thread GitBox



mocobeta merged PR #35:
URL: https://github.com/apache/lucene-jira-archive/pull/35



[GitHub] [lucene] shaie commented on a diff in pull request #1015: [LUCENE-10629]: Add fast match query support to FacetSets

2022-07-11 Thread GitBox



shaie commented on code in PR #1015:
URL: https://github.com/apache/lucene/pull/1015#discussion_r918535845


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java:
##
@@ -52,8 +52,10 @@ public MatchingFacetSetsCounts(
   String field,
   FacetsCollector hits,
   FacetSetDecoder facetSetDecoder,
+  Query fastMatchQuery,

Review Comment:
   Sure, added it back




[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-11 Thread Zach Chen (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565261#comment-17565261
 ] 

Zach Chen commented on LUCENE-10480:


{quote}Another thing that changes performance sometimes is the doc ID order, 
were you using multiple indexing threads maybe?
{quote}
OK, this is actually the case for me. I was previously using 10 threads to index 
(INDEX_NUM_THREADS = 10), and after I commented that out and reindexed with 
the default setting, I was able to reproduce the slowdown:

 
{code:java}
                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                 AndHighOrMedMed       91.27      (4.3%)       85.52      (4.3%)   -6.3% ( -14% -    2%) 0.000
                        PKLookup      333.25      (4.3%)      329.48      (3.8%)   -1.1% (  -8% -    7%) 0.380
                     AndHighHigh      104.25      (2.9%)      103.11      (3.0%)   -1.1% (  -6% -    5%) 0.247
                        SpanNear       16.52      (3.8%)       16.36      (3.1%)   -0.9% (  -7% -    6%) 0.396
                    TermGroup10K       23.99      (3.3%)       23.78      (3.0%)   -0.9% (  -6% -    5%) 0.384
                          Phrase      234.74      (2.7%)      232.71      (1.8%)   -0.9% (  -5% -    3%) 0.235
                      AndHighMed      163.80      (3.5%)      162.42      (4.3%)   -0.8% (  -8% -    7%) 0.496
                    TermBGroup1M       48.02      (3.5%)       47.65      (3.7%)   -0.8% (  -7% -    6%) 0.496
                    SloppyPhrase        4.82      (3.4%)        4.78      (2.7%)   -0.7% (  -6% -    5%) 0.460
                    TermGroup100       41.90      (3.9%)       41.63      (3.3%)   -0.7% (  -7% -    6%) 0.569
                            Term     2680.42      (4.7%)     2664.05      (3.3%)   -0.6% (  -8% -    7%) 0.632
                     TermGroup1M       39.95      (2.9%)       39.71      (3.2%)   -0.6% (  -6% -    5%) 0.531
                  TermBGroup1M1P       84.21      (6.1%)       83.82      (5.7%)   -0.5% ( -11% -   12%) 0.801
                         Respell      113.78      (1.9%)      113.44      (1.7%)   -0.3% (  -3% -    3%) 0.603
     BrowseRandomLabelSSDVFacets       20.75      (8.2%)       20.74     (10.3%)   -0.0% ( -17% -   20%) 0.989
                          Fuzzy2       83.12      (1.8%)       83.11      (1.1%)   -0.0% (  -2% -    2%) 0.976
       BrowseDayOfYearSSDVFacets       26.69     (12.0%)       26.70     (11.6%)    0.0% ( -21% -   26%) 0.995
                        Wildcard      115.84      (5.1%)      115.96      (5.8%)    0.1% ( -10% -   11%) 0.951
               TermDayOfYearSort      260.70      (5.4%)      260.99      (2.8%)    0.1% (  -7% -    8%) 0.937
         AndHighMedDayTaxoFacets      136.32      (2.6%)      136.63      (2.3%)    0.2% (  -4% -    5%) 0.773
                IntervalsOrdered      128.13      (7.5%)      128.45      (7.7%)    0.3% ( -13% -   16%) 0.916
        AndHighHighDayTaxoFacets       13.82      (2.8%)       13.87      (2.6%)    0.4% (  -4% -    5%) 0.657
                          Fuzzy1       79.16      (2.7%)       79.60      (1.8%)    0.6% (  -3% -    5%) 0.433
                   TermMonthSort      360.17      (6.4%)      362.83      (7.1%)    0.7% ( -11% -   15%) 0.728
                   TermTitleSort      191.21      (6.8%)      192.70      (7.1%)    0.8% ( -12% -   15%) 0.723
                      TermDTSort      208.40      (2.9%)      210.39      (2.9%)    1.0% (  -4% -    7%) 0.301
            MedTermDayTaxoFacets       78.66      (5.2%)       79.59      (4.4%)    1.2% (  -7% -   11%) 0.436
                  TermDateFacets       41.04      (5.4%)       41.61      (4.7%)    1.4% (  -8% -   12%) 0.385
                          IntNRQ      122.00      (8.1%)      124.08      (8.3%)    1.7% ( -13% -   19%) 0.513
          OrHighMedDayTaxoFacets       23.16      (8.4%)       23.71      (4.9%)    2.4% ( -10% -   17%) 0.272
           BrowseMonthSSDVFacets       28.68     (13.8%)       29.55     (16.8%)    3.0% ( -24% -   39%) 0.531
       BrowseDayOfYearTaxoFacets       30.40     (32.2%)       31.67     (34.2%)    4.2% ( -47% -  103%) 0.690
            BrowseDateTaxoFacets       30.26     (32.2%)       31.57     (34.4%)    4.3% ( -47% -  104%) 0.680
                         Prefix3      402.14      (8.6%)      419.96      (8.9%)    4.4% ( -12% -   23%) 0.109
                AndMedOrHighHigh       94.79      (4.0%)       99.03      (4.5%)    4.5% (  -3% -   13%) 0.001
     BrowseRandomLabelTaxoFacets       32.45     (49.2%)       35.05     (53.4%)    8.0% ( -63% -  217%) 0.622
           BrowseMonthTaxoFacets       28.68     (35.3%)       31.37     (39.1%)    9.4% ( -48% -  129%) 0.425
            BrowseDateSSDVFacets        3.96     (28.1%)        4.54     (26.3%)   14.7% ( -31% -   96%) 0.089

[jira] [Comment Edited] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-11 Thread Zach Chen (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565261#comment-17565261
 ] 

Zach Chen edited comment on LUCENE-10480 at 7/12/22 4:27 AM:
-

{quote}Another thing that changes performance sometimes is the doc ID order, 
were you using multiple indexing threads maybe?
{quote}
Ok this is actually the case for me. I was previously using 10 threads to index 
(INDEX_NUM_THREADS = 10) , and after I commented that out and reindexed with 
default setting, I was able to reproduce the slowdown:
{code:java}
                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                 AndHighOrMedMed       91.27      (4.3%)       85.52      (4.3%)   -6.3% ( -14% -    2%) 0.000
                        PKLookup      333.25      (4.3%)      329.48      (3.8%)   -1.1% (  -8% -    7%) 0.380
                     AndHighHigh      104.25      (2.9%)      103.11      (3.0%)   -1.1% (  -6% -    5%) 0.247
                        SpanNear       16.52      (3.8%)       16.36      (3.1%)   -0.9% (  -7% -    6%) 0.396
                    TermGroup10K       23.99      (3.3%)       23.78      (3.0%)   -0.9% (  -6% -    5%) 0.384
                          Phrase      234.74      (2.7%)      232.71      (1.8%)   -0.9% (  -5% -    3%) 0.235
                      AndHighMed      163.80      (3.5%)      162.42      (4.3%)   -0.8% (  -8% -    7%) 0.496
                    TermBGroup1M       48.02      (3.5%)       47.65      (3.7%)   -0.8% (  -7% -    6%) 0.496
                    SloppyPhrase        4.82      (3.4%)        4.78      (2.7%)   -0.7% (  -6% -    5%) 0.460
                    TermGroup100       41.90      (3.9%)       41.63      (3.3%)   -0.7% (  -7% -    6%) 0.569
                            Term     2680.42      (4.7%)     2664.05      (3.3%)   -0.6% (  -8% -    7%) 0.632
                     TermGroup1M       39.95      (2.9%)       39.71      (3.2%)   -0.6% (  -6% -    5%) 0.531
                  TermBGroup1M1P       84.21      (6.1%)       83.82      (5.7%)   -0.5% ( -11% -   12%) 0.801
                         Respell      113.78      (1.9%)      113.44      (1.7%)   -0.3% (  -3% -    3%) 0.603
     BrowseRandomLabelSSDVFacets       20.75      (8.2%)       20.74     (10.3%)   -0.0% ( -17% -   20%) 0.989
                          Fuzzy2       83.12      (1.8%)       83.11      (1.1%)   -0.0% (  -2% -    2%) 0.976
       BrowseDayOfYearSSDVFacets       26.69     (12.0%)       26.70     (11.6%)    0.0% ( -21% -   26%) 0.995
                        Wildcard      115.84      (5.1%)      115.96      (5.8%)    0.1% ( -10% -   11%) 0.951
               TermDayOfYearSort      260.70      (5.4%)      260.99      (2.8%)    0.1% (  -7% -    8%) 0.937
         AndHighMedDayTaxoFacets      136.32      (2.6%)      136.63      (2.3%)    0.2% (  -4% -    5%) 0.773
                IntervalsOrdered      128.13      (7.5%)      128.45      (7.7%)    0.3% ( -13% -   16%) 0.916
        AndHighHighDayTaxoFacets       13.82      (2.8%)       13.87      (2.6%)    0.4% (  -4% -    5%) 0.657
                          Fuzzy1       79.16      (2.7%)       79.60      (1.8%)    0.6% (  -3% -    5%) 0.433
                   TermMonthSort      360.17      (6.4%)      362.83      (7.1%)    0.7% ( -11% -   15%) 0.728
                   TermTitleSort      191.21      (6.8%)      192.70      (7.1%)    0.8% ( -12% -   15%) 0.723
                      TermDTSort      208.40      (2.9%)      210.39      (2.9%)    1.0% (  -4% -    7%) 0.301
            MedTermDayTaxoFacets       78.66      (5.2%)       79.59      (4.4%)    1.2% (  -7% -   11%) 0.436
                  TermDateFacets       41.04      (5.4%)       41.61      (4.7%)    1.4% (  -8% -   12%) 0.385
                          IntNRQ      122.00      (8.1%)      124.08      (8.3%)    1.7% ( -13% -   19%) 0.513
          OrHighMedDayTaxoFacets       23.16      (8.4%)       23.71      (4.9%)    2.4% ( -10% -   17%) 0.272
           BrowseMonthSSDVFacets       28.68     (13.8%)       29.55     (16.8%)    3.0% ( -24% -   39%) 0.531
       BrowseDayOfYearTaxoFacets       30.40     (32.2%)       31.67     (34.2%)    4.2% ( -47% -  103%) 0.690
            BrowseDateTaxoFacets       30.26     (32.2%)       31.57     (34.4%)    4.3% ( -47% -  104%) 0.680
                         Prefix3      402.14      (8.6%)      419.96      (8.9%)    4.4% ( -12% -   23%) 0.109
                AndMedOrHighHigh       94.79      (4.0%)       99.03      (4.5%)    4.5% (  -3% -   13%) 0.001
     BrowseRandomLabelTaxoFacets       32.45     (49.2%)       35.05     (53.4%)    8.0% ( -63% -  217%) 0.622
           BrowseMonthTaxoFacets       28.68     (35.3%)       31.37     (39.1%)    9.4% ( -48% -  129%) 0.425
            BrowseDateSSDVFacets        3.96     (28.1%)        4.54

[GitHub] [lucene] mocobeta commented on pull request #940: Use similarity.tf() in MoreLikeThis

2022-07-11 Thread GitBox



mocobeta commented on PR #940:
URL: https://github.com/apache/lucene/pull/940#issuecomment-1181301533

   Personally, I'd love to commit this to the upstream branch.
   I think we'd need a reproducible quality check (or regression test?) in 
Lucene, as Robert suggested; I just haven't been able to find enough time to 
look at it.



[GitHub] [lucene] stefanvodita commented on a diff in pull request #1015: [LUCENE-10629]: Add fast match query support to FacetSets

2022-07-11 Thread GitBox



stefanvodita commented on code in PR #1015:
URL: https://github.com/apache/lucene/pull/1015#discussion_r918597529


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java:
##
@@ -52,8 +52,10 @@ public MatchingFacetSetsCounts(
   String field,
   FacetsCollector hits,
   FacetSetDecoder facetSetDecoder,
+  Query fastMatchQuery,

Review Comment:
   Thanks! I'm happy with the PR as it is now.



