[GitHub] [lucene] ldkjdk commented on pull request #730: Create ConjunctionDISI:patcher

2022-06-17 Thread GitBox


ldkjdk commented on PR #730:
URL: https://github.com/apache/lucene/pull/730#issuecomment-1158691084

   Yes, for example a search like this:
   
   BooleanQuery.Builder bQuery = new BooleanQuery.Builder();
   TermQuery contents = new TermQuery(new Term("contents", "hello"));
   bQuery.add(contents, BooleanClause.Occur.MUST);
   Query idq = IntPoint.newRangeQuery("id", 140, 150);
   bQuery.add(idq, BooleanClause.Occur.FILTER);
   Query q = bQuery.build(); // MultiFieldQueryParser.parse(key, fields, flags);
   TopDocs td = searcher.search(q, 10);
   
   I think that if the keyword "hello" matches a lot of records, this may increase 
the computational cost of "skipper.skipTo(target) + 1".
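
   To make the cost question concrete, here is a minimal stdlib-only sketch of 
a leapfrog-style conjunction over two sorted doc-id lists. It is an assumption 
about the general technique, not Lucene's ConjunctionDISI code: the class and 
method names are hypothetical, and plain int arrays with binary search stand in 
for postings and skip lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: leapfrog intersection of two sorted, duplicate-free doc-id arrays. */
final class LeapfrogSketch {

  /** Index of the first element >= target in docs[from..], or docs.length if there is none. */
  static int advance(int[] docs, int from, int target) {
    int idx = Arrays.binarySearch(docs, from, docs.length, target);
    return idx >= 0 ? idx : -idx - 1;
  }

  /** Doc ids present in both arrays; each side skips ahead to the other side's current doc. */
  static List<Integer> intersect(int[] lead, int[] other) {
    List<Integer> hits = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < lead.length && j < other.length) {
      if (lead[i] == other[j]) {
        hits.add(lead[i]);
        i++;
        j++;
      } else if (lead[i] < other[j]) {
        i = advance(lead, i, other[j]); // skip over non-matching docs on the lead side
      } else {
        j = advance(other, j, lead[i]);
      }
    }
    return hits;
  }
}
```

   In this sketch a frequent term such as "hello" would be the long array, and 
each mismatch costs one advance call on it, which is the kind of skipping cost 
the comment above is asking about.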
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges

2022-06-17 Thread Thomas Hoffmann (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1789#comment-1789
 ] 

Thomas Hoffmann commented on LUCENE-10583:
--

Hello Vigya,

Adding the warning to the most-used classes sounds reasonable.

Another improvement would be to prevent the infinite loop (infinity is always 
bad except in math) by adding a maximum waiting time with a reasonable default 
value.

The exception thrown when the maximum waiting time is reached should include the 
same hint: do not synchronize on Lucene classes.
That way a programmer might still run into this problem, but can easily fix it 
after reading the message.

 

Greetings, Thomas
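
The maximum-waiting-time idea suggested above could look roughly like the 
following sketch. It is not IndexWriter code: the class name, method name and 
exception message are hypothetical, and it only shows a wait loop that gives up 
after a deadline and reports a hint.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a wait loop bounded by a maximum waiting time. */
final class BoundedWaitSketch {

  /** Waits on the monitor until done is true, or throws once maxWaitMillis has elapsed. */
  static void awaitOrTimeout(Object monitor, BooleanSupplier done, long maxWaitMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    synchronized (monitor) {
      while (!done.getAsBoolean()) {
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos <= 0) {
          throw new TimeoutException(
              "Gave up waiting after " + maxWaitMillis + " ms; check whether the caller "
                  + "holds a lock (for example on a Directory) that the worker needs");
        }
        // wait(0) would block without a timeout, so always wait at least 1 ms.
        monitor.wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
      }
    }
  }
}
```

A caller that hits the timeout would then see the hint in the exception message 
instead of hanging forever.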

> Deadlock with MMapDirectory while waitForMerges
> ---
>
> Key: LUCENE-10583
> URL: https://issues.apache.org/jira/browse/LUCENE-10583
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 8.11.1
> Environment: Java 17
> OS: Windows 2016
>Reporter: Thomas Hoffmann
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
> a deadlock situation happened in our application. We are using MMapDirectory 
> on Windows 2016 and got the following stacktrace:
> {code:java}
> "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms 
> elapsed=81248.18s tid=0x2860af10 nid=0x237c in Object.wait()  
> [0x413fc000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(java.base@17.0.2/Native Method)
>     - waiting on 
>     at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983)
>     - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at 
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697)
>     - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236)
>     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278)
>     at 
> com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723)
>     - locked <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>     at 
> com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142)
> ...{code}
> All threads were waiting to lock <0x0006d5c00208>, which was never 
> released.
> A Lucene thread was also blocked; I don't know if this is relevant:
> {code:java}
> "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms 
> elapsed=3499.07s tid=0x459453e0 nid=0x1f8 waiting for monitor entry  
> [0x5da9e000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at 
> org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346)
>     - waiting to lock <0x0006d5c00208> (a 
> org.apache.lucene.store.MMapDirectory)
>     at 
> org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363)
>     at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248)
>     at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289)
>     at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:121)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130)
>     at 
> org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141)
>     at 
> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
>     at 
> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code}
> It looks like the merge operation never finished and never released the lock.
> Is there any option to prevent this deadlock, or a way to investigate it further?
> Unfortunately, a load test didn't reproduce this problem.
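
The two stack traces above show one thread holding the MMapDirectory monitor 
while it waits for merges, and the merge thread blocked on that same monitor. 
The following stdlib-only sketch reproduces that pattern with a bounded wait so 
that it terminates. All names are hypothetical and no Lucene classes are 
involved; a plain Object stands in for the directory.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: waiting for a worker while holding the monitor the worker needs. */
final class MonitorDeadlockSketch {

  /** Returns whether the worker finished while the caller still held the shared monitor. */
  static boolean workerFinishedWhileLockHeld(long waitMillis) throws InterruptedException {
    Object directory = new Object(); // stands in for the shared directory instance
    CountDownLatch workerDone = new CountDownLatch(1);
    Thread worker;
    boolean finished;
    synchronized (directory) { // the application code locks the directory ...
      worker = new Thread(() -> {
        synchronized (directory) { // ... and the worker needs the same monitor
          workerDone.countDown();
        }
      });
      worker.start();
      // Waiting here does not release the directory monitor, so the worker stays blocked.
      finished = workerDone.await(waitMillis, TimeUnit.MILLISECONDS);
    }
    worker.join(); // once the monitor is released the worker can complete
    return finished;
  }
}
```

With an unbounded wait in place of the timed one, neither thread could make 
progress, which matches the hang described in the report.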



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [lucene] jpountz commented on pull request #964: LUCENE-10620: Pass the Weight to Collectors.

2022-06-17 Thread GitBox


jpountz commented on PR #964:
URL: https://github.com/apache/lucene/pull/964#issuecomment-1158911923

   Thanks for looking @romseygeek. To make sure this new API would effectively 
have more than one use-case, I migrated `TopScoreDocCollector` and 
`TopFieldCollector` to it too. The immediate benefit is that collectors that 
pass a `totalHitsThreshold` of `Integer.MAX_VALUE` will still be able to skip 
non-competitive hits if the weight supports counting hits. In addition to that, 
I fixed some tests that were assuming that `TotalHitCountCollector` would 
naively iterate over matches by using a new `DummyTotalHitCountCollector` 
instead.
   
   I verified that there is no performance impact on luceneutil using 
`wikimedium10m`:
   
   ```
                    Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                p-value
                HighTerm       2374.78  (5.1%)                  2297.55  (5.2%)   -3.3% ( -12% -    7%)  0.047
                 MedTerm       2795.30  (5.4%)                  2704.66  (5.6%)   -3.2% ( -13% -    8%)  0.063
            OrNotHighMed       1448.25  (3.9%)                  1427.48  (4.5%)   -1.4% (  -9% -    7%)  0.286
           OrNotHighHigh        996.35  (3.1%)                   982.37  (4.6%)   -1.4% (  -8% -    6%)  0.255
            OrHighNotMed       1898.69  (3.8%)                  1876.02  (4.7%)   -1.2% (  -9% -    7%)  0.375
              AndHighLow       1049.40  (3.3%)                  1042.92  (3.8%)   -0.6% (  -7% -    6%)  0.583
        HighSloppyPhrase         21.77  (4.0%)                    21.66  (4.8%)   -0.5% (  -8% -    8%)  0.716
                 LowTerm       2640.20  (6.3%)                  2629.11  (4.2%)   -0.4% ( -10% -   10%)  0.803
            OrHighNotLow       1667.62  (4.2%)                  1660.75  (5.6%)   -0.4% (  -9% -    9%)  0.794
            OrNotHighLow       1663.32  (3.0%)                  1658.41  (4.2%)   -0.3% (  -7% -    7%)  0.801
         LowSloppyPhrase         54.27  (3.1%)                    54.15  (3.6%)   -0.2% (  -6% -    6%)  0.834
           OrHighNotHigh       1259.39  (3.7%)                  1257.03  (4.7%)   -0.2% (  -8% -    8%)  0.889
         MedSloppyPhrase        115.91  (4.3%)                   115.79  (6.1%)   -0.1% ( -10% -   10%)  0.952
                PKLookup        249.41  (1.2%)                   249.32  (1.5%)   -0.0% (  -2% -    2%)  0.934
                  Fuzzy2        118.47  (1.1%)                   118.75  (1.2%)    0.2% (  -2% -    2%)  0.538
                 Respell         74.59  (1.1%)                    74.90  (1.5%)    0.4% (  -2% -    3%)  0.323
                  IntNRQ        682.36  (2.8%)                   685.81  (3.7%)    0.5% (  -5% -    7%)  0.628
                  Fuzzy1        124.32  (1.1%)                   125.09  (1.1%)    0.6% (  -1% -    2%)  0.079
               MedPhrase        623.13  (3.3%)                   627.26  (3.0%)    0.7% (  -5% -    7%)  0.502
               OrHighMed        130.02  (3.7%)                   130.94  (4.2%)    0.7% (  -6% -    8%)  0.571
               LowPhrase        110.49  (3.6%)                   111.30  (2.5%)    0.7% (  -5% -    7%)  0.459
                Wildcard         40.65  (1.6%)                    40.95  (1.8%)    0.7% (  -2% -    4%)  0.167
               OrHighLow       1092.12  (3.0%)                  1101.15  (2.7%)    0.8% (  -4% -    6%)  0.360
              AndHighMed        234.73  (4.5%)                   236.77  (5.3%)    0.9% (  -8% -   11%)  0.575
             MedSpanNear         28.83  (4.1%)                    29.14  (3.3%)    1.1% (  -6% -    8%)  0.369
             LowSpanNear         16.20  (4.2%)                    16.38  (3.4%)    1.1% (  -6% -    9%)  0.363
            HighSpanNear          7.51  (4.7%)                     7.59  (3.5%)    1.1% (  -6% -    9%)  0.405
             AndHighHigh         70.69  (5.3%)                    71.60  (6.4%)    1.3% (  -9% -   13%)  0.486
              OrHighHigh         30.64  (3.2%)                    31.07  (4.3%)    1.4% (  -5% -    9%)  0.244
              HighPhrase         22.89  (3.8%)                    23.25  (3.6%)    1.6% (  -5% -    9%)  0.178
                 Prefix3        421.34  (3.5%)                   430.69  (4.4%)    2.2% (  -5% -   10%)  0.078
     LowIntervalsOrdered         67.14  (4.8%)                    69.35  (5.5%)    3.3% (  -6% -   14%)  0.043
    HighIntervalsOrdered          6.49  (7.8%)                     6.73  (7.1%)    3.7% ( -10% -   20%)  0.112
     MedIntervalsOrdered         37.02  (7.8%)                    38.45  (7.3%)    3.9% ( -10% -   20%)  0.108
   HighTermDayOfYearSort        144.92  (3.7%)                   150.78  (4.6%)    4.0% (  -4% -   12%)  0.002
              TermDTSort        204.11  (7.0%)                   213.24  (7.7%)    4.5% (
   ```

[GitHub] [lucene] jpountz commented on pull request #730: Create ConjunctionDISI:patcher

2022-06-17 Thread GitBox


jpountz commented on PR #730:
URL: https://github.com/apache/lucene/pull/730#issuecomment-1158932157

   I worry that such a change would add a little overhead all the time only 
to help in some rare cases; it's not clear to me that it would be a good 
trade-off. I'd be interested in more data about performance, e.g. latency 
before and after the change, number of segments, etc.





[GitHub] [lucene] jpountz commented on a diff in pull request #965: LUCENE-10618: Implement BooleanQuery rewrite rules based for minimumShouldMatch

2022-06-17 Thread GitBox


jpountz commented on code in PR #965:
URL: https://github.com/apache/lucene/pull/965#discussion_r900181624


##
lucene/CHANGES.txt:
##
@@ -86,6 +86,8 @@ Improvements
 * LUCENE-10585: Facet module code cleanup (copy/paste scrubbing, 
simplification and some very minor
   optimization tweaks). (Greg Miller)
 
+* LUCENE-10618: Implement BooleanQuery rewrite rules based for 
minimumShouldMatch. (Fang Hou)

Review Comment:
   nit: I would put this change under `Optimizations`






[jira] [Commented] (LUCENE-10619) Optimize the writeBytes in TermsHashPerField

2022-06-17 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555646#comment-17555646
 ] 

Adrien Grand commented on LUCENE-10619:
---

This looks like an interesting idea!

> Optimize the writeBytes in TermsHashPerField
> 
>
> Key: LUCENE-10619
> URL: https://issues.apache.org/jira/browse/LUCENE-10619
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.2
>Reporter: tangdh
>Priority: Major
>
> Because we don't know the length of a slice, writeBytes always writes bytes 
> one at a time instead of writing a block of bytes.
> Maybe we could return both offset and length from ByteBlockPool#allocSlice?
> 1. BYTE_BLOCK_SIZE is 32768, so the offset needs at most 15 bits.
> 2. The slice size is at most 200, so it fits in 8 bits.
> So we could put them together into one int: offset | length.
> This function is used in only two places, so the cost of changing it 
> is relatively small.
> If allocSlice returned the offset and length of the new slice, we could 
> change writeBytes as below:
> {code:java}
> // write a block of bytes each time
> while (remaining > 0) {
>   int offsetAndLength = allocSlice(bytes, offset);
>   length = min(remaining, (offsetAndLength & 0xff) - 1);
>   offset = offsetAndLength >> 8;
>   System.arraycopy(src, srcPos, bytePool.buffer, offset, length);
>   remaining -= length;
>   offset += (length + 1);
> }
> {code}
> If it could work, I'd like to raise a pr.
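
The packing described in the issue (offset in the upper bits, length in the low 
8 bits, decoded with `& 0xff` and `>> 8`) can be sketched as below. This is only 
an illustration under the limits stated in the issue; the class and method names 
are hypothetical and are not part of ByteBlockPool.

```java
/** Hypothetical sketch: packing a slice offset and a slice length into a single int. */
final class SliceHandleSketch {
  static final int BYTE_BLOCK_SIZE = 32768; // from the issue: offsets need at most 15 bits
  static final int LENGTH_BITS = 8;         // from the issue: slice sizes (at most 200) fit in 8 bits
  static final int LENGTH_MASK = (1 << LENGTH_BITS) - 1;

  /** Combines offset and length; the offset lives above the low LENGTH_BITS bits. */
  static int pack(int offset, int length) {
    if (offset < 0 || offset >= BYTE_BLOCK_SIZE) {
      throw new IllegalArgumentException("offset out of range: " + offset);
    }
    if (length < 0 || length > LENGTH_MASK) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
    return (offset << LENGTH_BITS) | length;
  }

  /** Extracts the offset, mirroring "offsetAndLength >> 8" in the proposal. */
  static int offset(int packed) {
    return packed >>> LENGTH_BITS;
  }

  /** Extracts the length, mirroring "offsetAndLength & 0xff" in the proposal. */
  static int length(int packed) {
    return packed & LENGTH_MASK;
  }
}
```

With 15 offset bits and 8 length bits the packed value uses 23 bits, so it stays 
a non-negative int.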






[GitHub] [lucene] jpountz commented on pull request #907: LUCENE-10357 Ghost fields and postings/points

2022-06-17 Thread GitBox


jpountz commented on PR #907:
URL: https://github.com/apache/lucene/pull/907#issuecomment-1158977379

   Thanks @shahrs87, could you now try to remove all instances of `if (terms == 
Terms.EMPTY)`? Hopefully existing logic should work with Terms instances 
regardless of whether they are empty or not.





[GitHub] [lucene] jpountz commented on pull request #959: LUCENE-10507: Make it more likely to perform concurrent search in tests

2022-06-17 Thread GitBox


jpountz commented on PR #959:
URL: https://github.com/apache/lucene/pull/959#issuecomment-1158980744

   > The one thing I think (?) can change is the estimated total hit count if 
BMW kicked in. That can change even if you search the same segments, serially, 
but in a different order, I think.
   
   This is correct.





[GitHub] [lucene] jpountz merged pull request #959: LUCENE-10507: Make it more likely to perform concurrent search in tests

2022-06-17 Thread GitBox


jpountz merged PR #959:
URL: https://github.com/apache/lucene/pull/959





[GitHub] [lucene] javanna commented on pull request #959: LUCENE-10507: Make it more likely to perform concurrent search in tests

2022-06-17 Thread GitBox


javanna commented on PR #959:
URL: https://github.com/apache/lucene/pull/959#issuecomment-1159003847

   thanks all for the feedback. Please keep me in the loop if you see things go 
wrong with this change, I am happy to make further adjustments.





[jira] [Resolved] (LUCENE-10617) Investigate recent Jenkins build failures in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler

2022-06-17 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10617.
---
Fix Version/s: 9.3
   Resolution: Fixed

This one looks addressed.

> Investigate recent Jenkins build failures in 
> TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
> 
>
> Key: LUCENE-10617
> URL: https://issues.apache.org/jira/browse/LUCENE-10617
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Gautam Worah
>Priority: Minor
> Fix For: 9.3
>
>
> Sample failures: [https://jenkins.thetaphi.de/job/Lucene-9.x-MacOSX/692/, 
> https://jenkins.thetaphi.de/job/Lucene-main-MacOSX/8177/|https://jenkins.thetaphi.de/job/Lucene-9.x-MacOSX/692/]
>  






[jira] [Commented] (LUCENE-10078) Enable merge-on-refresh by default?

2022-06-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555672#comment-17555672
 ] 

ASF subversion and git services commented on LUCENE-10078:
--

Commit b180a8a97e7ad14df84196005ba0ac2581dd08a0 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b180a8a97e7 ]

LUCENE-10078: Fix more TestIndexWriterWithThreads failures.


> Enable merge-on-refresh by default?
> ---
>
> Key: LUCENE-10078
> URL: https://issues.apache.org/jira/browse/LUCENE-10078
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
> Fix For: 9.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a spinoff from the discussion in LUCENE-10073.
> The newish merge-on-refresh ([crazy origin 
> story|https://blog.mikemccandless.com/2021/03/open-source-collaboration-or-how-we.html])
>  feature is a powerful way to reduce searched segment counts, especially 
> helpful for applications using many indexing threads.  Such usage will write 
> many tiny segments on each refresh, which could quickly be merged up during 
> the {{refresh}} operation.
> We would have to implement a default for {{findFullFlushMerges}} 
> (LUCENE-10064 is open for this), and then we would need 
> {{IndexWriterConfig.getMaxFullFlushMergeWaitMillis}} to default to a non-zero 
> value (this issue).






[jira] [Commented] (LUCENE-10078) Enable merge-on-refresh by default?

2022-06-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555673#comment-17555673
 ] 

ASF subversion and git services commented on LUCENE-10078:
--

Commit 288cf4385aacede30b61bb9b7ba52ac0884dd4f1 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=288cf4385aa ]

LUCENE-10078: Fix more TestIndexWriterWithThreads failures.


> Enable merge-on-refresh by default?
> ---
>
> Key: LUCENE-10078
> URL: https://issues.apache.org/jira/browse/LUCENE-10078
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
> Fix For: 9.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a spinoff from the discussion in LUCENE-10073.
> The newish merge-on-refresh ([crazy origin 
> story|https://blog.mikemccandless.com/2021/03/open-source-collaboration-or-how-we.html])
>  feature is a powerful way to reduce searched segment counts, especially 
> helpful for applications using many indexing threads.  Such usage will write 
> many tiny segments on each refresh, which could quickly be merged up during 
> the {{refresh}} operation.
> We would have to implement a default for {{findFullFlushMerges}} 
> (LUCENE-10064 is open for this), and then we would need 
> {{IndexWriterConfig.getMaxFullFlushMergeWaitMillis}} to default to a non-zero 
> value (this issue).






[jira] [Commented] (LUCENE-8806) WANDScorer should support two-phase iterator

2022-06-17 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555683#comment-17555683
 ] 

Adrien Grand commented on LUCENE-8806:
--

Sorry [~denimorim] I'm not getting your question.

> WANDScorer should support two-phase iterator
> 
>
> Key: LUCENE-8806
> URL: https://issues.apache.org/jira/browse/LUCENE-8806
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Major
> Attachments: LUCENE-8806.patch, LUCENE-8806.patch
>
>
> Following https://issues.apache.org/jira/browse/LUCENE-8770 the WANDScorer 
> should leverage two-phase iterators in order to be faster when used in 
> conjunctions.






[GitHub] [lucene] gsmiller commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-17 Thread GitBox


gsmiller commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1159091616

   @shaie 
   > I struggled back-and-forth between introducing a public final long[] 
comparableLongs on the abstract FacetSet to the getComparableLongs() method.
   
   +1 to the approach you went with. We can always change it, but I like how 
you have it personally.
   
   @mdmarshmallow 
   > I think we should also have some subpackages [...]
   
   I generally disagree with this. I _used_ to like breaking down functional 
areas into packages for organization, but it limits your ability to make 
classes/methods pkg-visible in order to expose a clean API. I now greatly 
prefer flatter packages with very limited APIs exposed.
   
   As for the `RangeMatching` interface, I'm not sure we need it? I think it's 
easy enough for a user to construct `LongRange` instances to pass to 
`RangeFacetSetMatcher` using the static factory methods (`fromDoubles`, 
`fromFloats`, etc.). It feels overly complicated to introduce 
`FacetSetRange<...>` and then require the different `FacetSet` implementations 
to implement these methods to deal with inclusive/exclusive boundaries. My only 
suggestion here might be to rename `LongRange` to just `Range`. The API may 
make a little more sense that way if dealing with something other than longs.
   
   Also, should we add `double` and `float` methods to `FacetSetDecoder`?
   
   And finally, +1 to having demo code that shows how to mix-and-match types 
within a single point. That would be interesting to write so we make sure this 
thing is really fully fleshed out. I think it is though.
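
   As background for the `fromDoubles` style factory methods mentioned above, 
here is a stdlib-only sketch of one common way to map a double to a long whose 
ordering matches the double ordering. It is an assumption about the general 
encoding technique; the class and method names are hypothetical and this is not 
the code from the PR.

```java
/** Hypothetical sketch: encoding doubles as longs that compare in the same order. */
final class ComparableLongSketch {

  /** Flips the low 63 bits of negative values so that signed long order matches double order. */
  static long toComparableLong(double value) {
    long bits = Double.doubleToLongBits(value);
    return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
  }

  /** Inverse of toComparableLong; the same bit flip undoes itself. */
  static double fromComparableLong(long encoded) {
    return Double.longBitsToDouble(encoded ^ ((encoded >> 63) & 0x7fffffffffffffffL));
  }
}
```

   With such an encoding, a matcher that only understands long ranges can still 
be given ranges that were specified as doubles.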





[GitHub] [lucene] gsmiller commented on a diff in pull request #922: Index only the docs for FacetField posting list

2022-06-17 Thread GitBox


gsmiller commented on code in PR #922:
URL: https://github.com/apache/lucene/pull/922#discussion_r900382196


##
lucene/CHANGES.txt:
##
@@ -67,6 +67,8 @@ Other
 
 * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to 
analysis-common. (Tomoko Uchida)
 
+* GITHUB#922: Remove unused and confusing FacetField indexing options (Gautam 
Worah)

Review Comment:
   Sorry @gautamworah96, I should have noticed this earlier, but should we put 
this under 9.3? This should be safe to backport.






[GitHub] [lucene] gsmiller merged pull request #954: LUCENE-10603: Change iteration methodology for SSDV ordinals in the f…

2022-06-17 Thread GitBox


gsmiller merged PR #954:
URL: https://github.com/apache/lucene/pull/954





[GitHub] [lucene] gsmiller commented on pull request #954: LUCENE-10603: Change iteration methodology for SSDV ordinals in the f…

2022-06-17 Thread GitBox


gsmiller commented on PR #954:
URL: https://github.com/apache/lucene/pull/954#issuecomment-1159099868

   @jpountz no problem. I wasn't in any rush with this, and since you'd had a 
look, I just wanted to make sure you didn't have additional feedback. Thanks!





[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555714#comment-17555714
 ] 

ASF subversion and git services commented on LUCENE-10603:
--

Commit 6ba759df866289db485d44fd1f75b3eb00f8d99f in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6ba759df866 ]

LUCENE-10603: Change iteration methodology for SSDV ordinals in the faceting 
module (#954)



> Improve iteration of ords for SortedSetDocValues
> 
>
> Key: LUCENE-10603
> URL: https://issues.apache.org/jira/browse/LUCENE-10603
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Trivial
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Now that SortedSetDocValues#docValueCount has been added in Lucene 9.2, should we 
> refactor the implementation of ords iteration to use docValueCount instead of 
> NO_MORE_ORDS?
> This is similar to what SortedNumericDocValues does.
> From 
> {code:java}
> for (long ord = values.nextOrd();ord != SortedSetDocValues.NO_MORE_ORDS; ord 
> = values.nextOrd()) {
> }{code}
> to
> {code:java}
> for (int i = 0; i < values.docValueCount(); i++) {
>   long ord = values.nextOrd();
> }{code}
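
The two loop shapes quoted above can be compared with a small stdlib-only 
sketch. The DocOrds class below is a hypothetical stand-in for the per-document 
view of SortedSetDocValues, and the sentinel value is an assumption made for the 
illustration; only the loop structure is the point.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: sentinel-based versus count-based iteration over a document's ordinals. */
final class OrdIterationSketch {
  static final long NO_MORE_ORDS = -1L; // sentinel used by this sketch

  /** Stand-in for the ordinals of one document. */
  static final class DocOrds {
    private final long[] ords;
    private int next;

    DocOrds(long... ords) {
      this.ords = ords;
    }

    int docValueCount() {
      return ords.length;
    }

    long nextOrd() {
      return next < ords.length ? ords[next++] : NO_MORE_ORDS;
    }
  }

  /** Old style: keep calling nextOrd() until the sentinel comes back. */
  static List<Long> collectBySentinel(DocOrds values) {
    List<Long> out = new ArrayList<>();
    for (long ord = values.nextOrd(); ord != NO_MORE_ORDS; ord = values.nextOrd()) {
      out.add(ord);
    }
    return out;
  }

  /** New style: read the count once and call nextOrd() exactly that many times. */
  static List<Long> collectByCount(DocOrds values) {
    List<Long> out = new ArrayList<>();
    int count = values.docValueCount();
    for (int i = 0; i < count; i++) {
      out.add(values.nextOrd());
    }
    return out;
  }
}
```

Both loops visit the same ordinals; the count-based form avoids the extra 
nextOrd() call that only returns the sentinel.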






[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555722#comment-17555722
 ] 

ASF subversion and git services commented on LUCENE-10603:
--

Commit 2265b7109b2c7f79671306096a05d9a37306e7ed in lucene's branch 
refs/heads/branch_9x from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2265b7109b2 ]

LUCENE-10603: Change iteration methodology for SSDV ordinals in the faceting 
module


> Improve iteration of ords for SortedSetDocValues
> 
>
> Key: LUCENE-10603
> URL: https://issues.apache.org/jira/browse/LUCENE-10603
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Trivial
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After SortedSetDocValues#docValueCount added since Lucene 9.2, should we 
> refactor the implementation of ords iterations using docValueCount instead of 
> NO_MORE_ORDS?
> Similar how SortedNumericDocValues did
> From 
> {code:java}
> for (long ord = values.nextOrd();ord != SortedSetDocValues.NO_MORE_ORDS; ord 
> = values.nextOrd()) {
> }{code}
> to
> {code:java}
> for (int i = 0; i < values.docValueCount(); i++) {
>   long ord = values.nextOrd();
> }{code}






[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

2022-06-17 Thread GitBox


Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r900427716


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) 
throws IOException {
 return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+DimConfig dimConfig = verifyDim(dim);
+FacetLabel cp = new FacetLabel(dim, path);
+int dimOrd = taxoReader.getOrdinal(cp);
+if (dimOrd == -1) {
+  return null;
+}
+
+int aggregatedValue = 0;
+int childCount = 0;
+
+List<Integer> ordinals = new ArrayList<>();
+List<Integer> ordValues = new ArrayList<>();
+
+if (sparseValues != null) {
+  for (IntIntCursor c : sparseValues) {
+int value = c.value;
+int ord = c.key;
+if (parents[ord] == dimOrd && value > 0) {
+  aggregatedValue = aggregationFunction.aggregate(aggregatedValue, 
value);
+  childCount++;
+  ordinals.add(ord);
+  ordValues.add(value);
+}
+  }
+} else {
+  int[] children = getChildren();
+  int[] siblings = getSiblings();
+  int ord = children[dimOrd];
+  while (ord != TaxonomyReader.INVALID_ORDINAL) {
+int value = values[ord];
+if (value > 0) {
+  aggregatedValue = aggregationFunction.aggregate(aggregatedValue, 
value);
+  childCount++;
+  ordinals.add(ord);
+  ordValues.add(value);
+}
+ord = siblings[ord];
+  }
+}
+
+if (aggregatedValue == 0) {
+  return null;
+}
+
+if (dimConfig.multiValued) {
+  if (dimConfig.requireDimCount) {
+aggregatedValue = getValue(dimOrd);
+  } else {
+// Our sum'd value is not correct, in general:
+aggregatedValue = -1;
+  }
+} else {
+  // Our sum'd dim value is accurate, so we keep it
+}
+
+int[] ordinalArray = new int[ordinals.size()];
+for (int i = 0; i < ordinals.size(); i++) {
+  ordinalArray[i] = ordinals.get(i);
+}

Review Comment:
   `getBulkPath` only takes in int array and I would need to cast List 
to int[] here. Please advise me if there is a cleaner way to do so. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

2022-06-17 Thread GitBox


Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r900427946


##
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##
@@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I
     return createFacetResult(topChildrenForPath, dim, path);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+
+    if (dimConfig.hierarchical) {
+      int pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd < 0) {
+        // path was never indexed
+        return null;
+      }
+      SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim);
+      return getPathResult(dimConfig, dim, path, pathOrd, dimTree.iterator(pathOrd));
+    } else {
+      if (path.length > 0) {
+        throw new IllegalArgumentException(
+            "Field is not configured as hierarchical, path should be 0 length");
+      }
+      OrdRange ordRange = state.getOrdRange(dim);
+      if (ordRange == null) {
+        // means dimension was never indexed
+        return null;
+      }
+      int dimOrd = ordRange.start;
+      PrimitiveIterator.OfInt childIt = ordRange.iterator();
+      if (dimConfig.multiValued && dimConfig.requireDimCount) {
+        // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed
+        // the dimension and we need to skip past it so the iterator is positioned on the first
+        // child:
+        childIt.next();
+      }
+      return getPathResult(dimConfig, dim, null, dimOrd, childIt);
+    }
+  }

Review Comment:
   Thanks for the suggestion!






[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

2022-06-17 Thread GitBox


Yuti-G commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r900427716


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;
+
+    List<Integer> ordinals = new ArrayList<>();
+    List<Integer> ordValues = new ArrayList<>();
+
+    if (sparseValues != null) {
+      for (IntIntCursor c : sparseValues) {
+        int value = c.value;
+        int ord = c.key;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+      }
+    } else {
+      int[] children = getChildren();
+      int[] siblings = getSiblings();
+      int ord = children[dimOrd];
+      while (ord != TaxonomyReader.INVALID_ORDINAL) {
+        int value = values[ord];
+        if (value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+        ord = siblings[ord];
+      }
+    }
+
+    if (aggregatedValue == 0) {
+      return null;
+    }
+
+    if (dimConfig.multiValued) {
+      if (dimConfig.requireDimCount) {
+        aggregatedValue = getValue(dimOrd);
+      } else {
+        // Our sum'd value is not correct, in general:
+        aggregatedValue = -1;
+      }
+    } else {
+      // Our sum'd dim value is accurate, so we keep it
+    }
+
+    int[] ordinalArray = new int[ordinals.size()];
+    for (int i = 0; i < ordinals.size(); i++) {
+      ordinalArray[i] = ordinals.get(i);
+    }

Review Comment:
   `getBulkPath` only accepts an `int[]`, so I need to convert the 
`List<Integer>` to an `int[]` here and can't pass `ordinals` to `getBulkPath` 
directly. Please advise if there is a cleaner way to do this. Thanks!
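
On the conversion question above, a minimal sketch of two JDK-only ways to end up with an `int[]` of ordinals. `OrdinalArraySketch`, `toIntArray`, and `IntAccumulator` are hypothetical names for illustration; they are not part of the PR or of Lucene.

```java
import java.util.Arrays;
import java.util.List;

class OrdinalArraySketch {
  /** Option 1: unbox an existing List<Integer> into an int[] with a stream. */
  static int[] toIntArray(List<Integer> ordinals) {
    return ordinals.stream().mapToInt(Integer::intValue).toArray();
  }

  /** Option 2: collect into a growable primitive buffer so no boxing happens at all. */
  static final class IntAccumulator {
    private int[] buf = new int[8];
    private int size = 0;

    void add(int value) {
      if (size == buf.length) {
        buf = Arrays.copyOf(buf, size * 2); // double the capacity when full
      }
      buf[size++] = value;
    }

    /** Returns a copy trimmed to the number of values added. */
    int[] toArray() {
      return Arrays.copyOf(buf, size);
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(toIntArray(List.of(4, 8, 15))));

    IntAccumulator acc = new IntAccumulator();
    for (int ord = 0; ord < 20; ord++) {
      acc.add(ord);
    }
    System.out.println(acc.toArray().length);
  }
}
```

The stream version keeps the existing `List<Integer>` and only replaces the copy loop. The accumulator avoids boxing from the start; a primitive collection such as hppc's `IntArrayList` might serve the same purpose, though whether that fits this PR is a call for the reviewers.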






[GitHub] [lucene] Yuti-G commented on pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

2022-06-17 Thread GitBox


Yuti-G commented on PR #914:
URL: https://github.com/apache/lucene/pull/914#issuecomment-1159161102

   Thanks @gsmiller for spending time reviewing my PR and leaving great 
feedback! I addressed all of the comments except for the one that requires 
converting the `List<Integer>` `ordinals` to an `int[]` in order to call 
`getBulkPath` in `IntTaxonomyFacets` and `FloatTaxonomyFacets`. Please let me 
know if there is a better way to do it. Thanks!





[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-17 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida updated LUCENE-10557:
---
Description: 
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that
 * Build the convention for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milestone) management 
 * Enable Github issue on the lucene's repository
 ** Raise an issue on INFRA
 ** (Create an issue-only private repository for sensitive issues if it's 
needed and allowed)
 ** Set a mail hook to 
[issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the 
general mail group name)
 * Set a schedule for migration
 ** Give some time to committers to play around with issues/labels/milestones 
before the actual migration
 ** Make an announcement on the mail lists
 ** Show some text messages when opening a new Jira issue (in issue template?)

  was:
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * (/) Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.
 * Build the convention for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milestone) management 
 * Enable Github issue on the lucene's repository
 ** Raise an issue on INFRA
 ** (Create an issue-only private repository for sensitive issues if it's 
needed and allowed)
 ** Set a mail hook to 
[issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the 
general mail group name)
 * Set a schedule for migration
 ** Give some time to committers to play around with issues/labels/milestones 
before the actual migration
 ** Make an announcement on the mail lists
 ** Show some text messages when opening a new Jira issue (in issue template?)


> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major

[jira] [Created] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-17 Thread Tomoko Uchida (Jira)
Tomoko Uchida created LUCENE-10622:
--

 Summary: Prepare complete migration script to GitHub issue from 
Jira (best effort)
 Key: LUCENE-10622
 URL: https://issues.apache.org/jira/browse/LUCENE-10622
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Tomoko Uchida
Assignee: Tomoko Uchida


If we intend to move the history to GitHub, it should be as close to perfect 
as possible; significantly degraded copies of history are harmful rather than 
helpful for future contributors, I think.






[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-17 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555803#comment-17555803
 ] 

Tomoko Uchida commented on LUCENE-10622:


Can we have (read-only) access keys to Jira via APIs?

> Prepare complete migration script to GitHub issue from Jira (best effort)
> -
>
> Key: LUCENE-10622
> URL: https://issues.apache.org/jira/browse/LUCENE-10622
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>
> If we intend to move the history to GitHub, it should be as close to perfect 
> as possible; significantly degraded copies of history are harmful rather than 
> helpful for future contributors, I think.






[jira] [Commented] (LUCENE-10618) Implement BooleanQuery rewrite rules based for minimumShouldMatch

2022-06-17 Thread fang hou (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555815#comment-17555815
 ] 

fang hou commented on LUCENE-10618:
---

This should be resolved by this PR: https://github.com/apache/lucene/pull/965

> Implement BooleanQuery rewrite rules based for minimumShouldMatch
> -
>
> Key: LUCENE-10618
> URL: https://issues.apache.org/jira/browse/LUCENE-10618
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While looking into a test failure I noticed that we sometimes create weights 
> for boolean queries with no SHOULD clauses and a non-zero 
> minimumNumberShouldMatch.
> We could rewrite BooleanQuery to MatchNoDocsQuery when the number of SHOULD 
> clauses is less than minimumNumberShouldMatch, and make SHOULD clauses 
> required when the number of SHOULD clauses is equal to 
> minimumNumberShouldMatch.
> This feels a bit like a degenerate case (why would the user create such a 
> query in the first place?) but this case can also happen to non-degenerate 
> queries if some SHOULD clauses rewrite to a MatchNoDocsQuery and get removed 
> through rewrite.
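
The two rewrite rules described in the issue can be sketched as a small decision function. This is only an illustration of the rule as stated, not Lucene's `BooleanQuery.rewrite` implementation; the `Rewrite` enum and `decide` method are hypothetical names, and leaving the query unchanged when `minimumNumberShouldMatch` is 0 is an assumption of this sketch.

```java
class MinShouldMatchRuleSketch {
  /** Hypothetical outcome labels for the rules described in the issue. */
  enum Rewrite {
    MATCH_NO_DOCS, // fewer SHOULD clauses than required: nothing can match
    SHOULD_BECOMES_MUST, // every SHOULD clause is needed: treat them as required
    UNCHANGED
  }

  static Rewrite decide(int shouldClauseCount, int minimumNumberShouldMatch) {
    if (shouldClauseCount < minimumNumberShouldMatch) {
      return Rewrite.MATCH_NO_DOCS;
    }
    // Assumption: with minimumNumberShouldMatch == 0 there is nothing to promote.
    if (minimumNumberShouldMatch > 0 && shouldClauseCount == minimumNumberShouldMatch) {
      return Rewrite.SHOULD_BECOMES_MUST;
    }
    return Rewrite.UNCHANGED;
  }

  public static void main(String[] args) {
    System.out.println(decide(0, 1));
    System.out.println(decide(2, 2));
    System.out.println(decide(3, 2));
  }
}
```

The first branch also covers the non-degenerate case mentioned in the issue: if SHOULD clauses are removed during rewrite because they became `MatchNoDocsQuery`, the remaining count can drop below the minimum.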






[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-17 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555819#comment-17555819
 ] 

Tomoko Uchida commented on LUCENE-10622:


I can create a Personal Access Token 
(https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html),
 but the GET REST API can be called without any authentication.

{code}
$ curl -s https://issues.apache.org/jira/rest/api/latest/issue/LUCENE-10557 | jq
{
  "expand": "renderedFields,names,schema,operations,editmeta,changelog,versionedRepresentations",
  "id": "13443092",
  "self": "https://issues.apache.org/jira/rest/api/latest/issue/13443092",
  "key": "LUCENE-10557",
  "fields": {
    "parent": {
      "id": "13442225",
      "key": "LUCENE-10543",
      "self": "https://issues.apache.org/jira/rest/api/2/issue/13442225",
      "fields": {
        "summary": "Achieve contribution workflow perfection (with progress)",
        "status": {
          "self": "https://issues.apache.org/jira/rest/api/2/status/1",
          "description": "The issue is open and ready for the assignee to start work on it.",
          "iconUrl": "https://issues.apache.org/jira/images/icons/statuses/open.png",
          "name": "Open",
          "id": "1",
          "statusCategory": {
            "self": "https://issues.apache.org/jira/rest/api/2/statuscategory/2",
            "id": 2,
            "key": "new",
            "colorName": "blue-gray",
            "name": "To Do"
          }
        },
        "priority": {
          "self": "https://issues.apache.org/jira/rest/api/2/priority/3",
          "iconUrl": "https://issues.apache.org/jira/images/icons/priorities/major.svg",
          "name": "Major",
          "id": "3"
        },
..
{code}
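
For a migration script, the same unauthenticated GET could be issued from Java. This is a rough sketch using the JDK's `java.net.http` client (Java 11+); the class and method names are hypothetical, the key-format check is an assumption, and only the URL prefix comes from the curl call above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class JiraIssueFetchSketch {
  // URL prefix taken from the curl example; everything else is illustrative.
  static final String BASE = "https://issues.apache.org/jira/rest/api/latest/issue/";

  /** Builds the REST URI for an issue key such as LUCENE-10557. */
  static URI issueUri(String key) {
    if (!key.matches("[A-Z][A-Z0-9]*-\\d+")) {
      throw new IllegalArgumentException("Not a Jira issue key: " + key);
    }
    return URI.create(BASE + key);
  }

  /** Fetches the raw JSON for one issue without any authentication header. */
  static String fetchIssueJson(String key) throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request =
        HttpRequest.newBuilder(issueUri(key)).header("Accept", "application/json").GET().build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("Unexpected HTTP status " + response.statusCode());
    }
    return response.body();
  }

  public static void main(String[] args) throws Exception {
    // Requires network access to issues.apache.org.
    System.out.println(fetchIssueJson("LUCENE-10557"));
  }
}
```

Parsing the JSON is left out on purpose; a real script would need a JSON library and some rate limiting, neither of which is specified in this thread.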




[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-17 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555839#comment-17555839
 ] 

Tomoko Uchida commented on LUCENE-10622:


For experiments, I'd choose "difficult" issues for porting. E.g., ones with 
long histories (comments), frequent status changes, attached files, 
markups/code snippets, links, sub-tasks, etc. All fields/data can contain 
important information and shouldn't be discarded if we want to preserve the 
history, not just to move the platform.
Examples could be:
- https://issues.apache.org/jira/browse/LUCENE-2562
- https://issues.apache.org/jira/browse/LUCENE-9077
- https://issues.apache.org/jira/browse/LUCENE-4100

Honestly I'm skeptical that it can be done with a satisfactory level of quality 
- but we should give it a try before drawing a conclusion.




[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-17 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555844#comment-17555844
 ] 

Tomoko Uchida commented on LUCENE-10622:


A fair warning: the migration will be significantly delayed by this. Bulk 
migration should be done before cutting over to GitHub issues (if we decide to 
do so).




[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-17 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10557:
-
Description: 
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that. 
Things to consider:
 *** version numbers - labels or milestones?
 *** add a comment/ prepend a link to the source Jira issue on github side,
 *** add a comment/ prepend a link on the jira side to the new issue on github 
side (for people who access jira from blogs, mailing list archives and other 
sources that will have stale links),
 *** convert cross-issue automatic links in comments/ descriptions (as 
suggested by Robert),
 *** maybe prefix (or postfix) the issue title on github side with the original 
LUCENE-XYZ key so that it is easier to search for a particular issue there?
 *** how to deal with user IDs (author, reporter, commenters)? Do they have to 
be github users? Will information about people not registered on github be lost?
 *** create an extra mapping file of old-issue-new-issue URLs for any potential 
future uses. 
 *** what to do with issue numbers in git/svn commits? These could be rewritten 
but it'd change the entire git history tree - I don't think this is practical, 
while doable.
 * Build the convention for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milestone) management 
 * Enable Github issue on the lucene's repository
 ** Raise an issue on INFRA
 ** (Create an issue-only private repository for sensitive issues if it's 
needed and allowed)
 ** Set a mail hook to 
[issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the 
general mail group name)
 * Set a schedule for migration
 ** Give some time to committers to play around with issues/labels/milestones 
before the actual migration
 ** Make an announcement on the mail lists
 ** Show some text messages when opening a new Jira issue (in issue template?)
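
Two of the considerations listed above, the old-issue to new-issue mapping and the conversion of cross-issue links, could be prototyped roughly as follows. This is a sketch under assumptions: the GitHub issue numbers are made up, the `#123 (LUCENE-456)` output format is just one possible convention, and `IssueLinkRewriteSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IssueLinkRewriteSketch {
  private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-(\\d+)\\b");

  /**
   * Replaces each known Jira key with a GitHub issue reference that keeps the original key
   * for searchability. Keys missing from the mapping are left untouched.
   */
  static String rewriteLinks(String text, Map<String, Integer> jiraToGithub) {
    Matcher m = JIRA_KEY.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      Integer githubNumber = jiraToGithub.get(m.group());
      String replacement =
          githubNumber == null ? m.group() : "#" + githubNumber + " (" + m.group() + ")";
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // Hypothetical mapping; real numbers would come from the migration run.
    Map<String, Integer> mapping = new LinkedHashMap<>();
    mapping.put("LUCENE-2562", 101);
    System.out.println(rewriteLinks("See LUCENE-2562 and LUCENE-9077.", mapping));
  }
}
```

One known limitation: the pattern also matches keys inside full Jira URLs, so a real script would probably need to handle URLs separately before rewriting bare keys.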


> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major

[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-17 Thread GitBox


shaie commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1159374588

   > My only suggestion here might be to rename `LongRange` to just `Range`
   
   Yeah `LongRange` now feels like there are missing `Int/Float/DoubleRange` 
which is not the case. But maybe in order to give it a more purposeful name we 
can name it `Dim/DimensionRange`?
   
   
   
   > Also, should we add `double` and `float` methods to `FacetSetDecoder`?
   
   I don't think so? `FSD` is currently about decoding the encoded `byte[]` 
into a `long[]` for `FacetSetMatcher` purposes. I assume you're thinking about 
a user-level API which can decode the values back into a `FacetSet`, right? 
It feels to me like we can do that later too, and I think we'll need a 
different API for that, maybe a `FacetSetReader` with 
`FacetSet[] fromBytes(BytesRef)`, or maybe an `unpackValues` method added to 
`FacetSet`? I'd prefer, though, that we focus in this PR on the indexing + 
matching, both to get this PR to completion and because it doesn't currently 
feel to me like a necessary API for using facet sets. 





[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-17 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10557:
-
Description: 
A few (not the majority) Apache projects already use the GitHub issue instead 
of Jira. For example,

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it'd be technically possible that we move to GitHub issue. I have 
little knowledge of how to proceed with it, I'd like to discuss whether we 
should migrate to it, and if so, how to smoothly handle the migration.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that. 
Things to consider:
 *** version numbers - labels or milestones?
 *** add a comment or prepend a link to the source Jira issue on the GitHub side,
 *** add a comment or prepend a link on the Jira side to the new issue on the 
GitHub side (for people who access Jira from blogs, mailing list archives and 
other sources that will have stale links),
 *** convert automatic cross-issue links in comments and descriptions (as 
suggested by Robert),
 *** strategy to deal with sub-issues (hierarchies),
 *** maybe prefix (or postfix) the issue title on the GitHub side with the 
original LUCENE-XYZ key so that it is easier to search for a particular issue 
there?
 *** how to deal with user IDs (author, reporter, commenters)? Do they have to 
be GitHub users? Will information about people not registered on GitHub be lost?
 *** create an extra mapping file of old-issue-to-new-issue URLs for any 
potential future use,
 *** what to do with issue numbers in git/svn commits? These could be rewritten, 
but that would change the entire git history tree. I don't think this is 
practical, although it is doable.
 * Establish conventions for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Write documentation for metadata (label/milestone) management 
 * Enable GitHub issues on Lucene's repository
 ** Raise an issue on INFRA
 ** (Create an issue-only private repository for sensitive issues if it's 
needed and allowed)
 ** Set a mail hook to 
[issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the 
general mail group name)
 * Set a schedule for migration
 ** Give committers some time to play around with issues/labels/milestones 
before the actual migration
 ** Make an announcement on the mailing lists
 ** Show a text message when opening a new Jira issue (in the issue template?)
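
A few of the "things to consider" for the prototype migration script can be 
sketched in code. The following is only a minimal illustration, not the actual 
migration tooling: the class name, the method names, the "[KEY] title" format, 
the tab-separated mapping line and the GitHub issue numbers are all 
hypothetical. It shows prefixing a title with the original LUCENE-XYZ key, 
annotating cross-issue references with the new issue number, and producing one 
line of an old-URL-to-new-URL mapping file.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the real migration script.
final class JiraMigrationSketch {
  // Matches Jira keys such as LUCENE-10557 inside free text.
  private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-(\\d+)\\b");

  // Prefix the GitHub issue title with the original Jira key so it stays searchable.
  static String prefixTitle(String jiraKey, String title) {
    return "[" + jiraKey + "] " + title;
  }

  // Append "(#N)" to every Jira key that has a known GitHub issue number;
  // keys without a mapping are left untouched.
  static String rewriteLinks(String text, Map<String, Integer> keyToIssue) {
    Matcher m = JIRA_KEY.matcher(text);
    StringBuilder sb = new StringBuilder();
    while (m.find()) {
      Integer issue = keyToIssue.get(m.group());
      String replacement = issue == null ? m.group() : m.group() + " (#" + issue + ")";
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  // One line of the old-issue-to-new-issue mapping file (tab separated, assumed format).
  static String mappingLine(String jiraKey, int issue) {
    return "https://issues.apache.org/jira/browse/" + jiraKey
        + "\t" + "https://github.com/apache/lucene/issues/" + issue;
  }
}
```

Keeping the original key in the rewritten text, rather than replacing it, means 
old references remain searchable even where the mapping is incomplete.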

  was:
A few Apache projects (not the majority) already use GitHub issues instead of 
Jira. For example:

Airflow: [https://github.com/apache/airflow/issues]

BookKeeper: [https://github.com/apache/bookkeeper/issues]

So I think it would be technically possible for us to move to GitHub issues. I 
have little knowledge of how to proceed, so I'd like to discuss whether we 
should migrate and, if so, how to handle the migration smoothly.

The major tasks would be:
 * (/) Get a consensus about the migration among committers
 * Choose issues that should be moved to GitHub
 ** Discussion thread 
[https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
 ** -Conclusion for now: We don't migrate any issues. Only new issues should be 
opened on GitHub.-
 ** Write a prototype migration script - the decision could be made on that. 
Things to consider:
 *** version numbers - labels or milestones?
 *** add a comment or prepend a link to the source Jira issue on the GitHub side,
 *** add a comment or prepend a link on the Jira side to the new issue on the 
GitHub side (for people who access Jira from blogs, mailing list archives and 
other sources that will have stale links),
 *** convert automatic cross-issue links in comments and descriptions (as 
suggested by Robert),
 *** maybe prefix (or postfix) the issue title on the GitHub side with the 
original LUCENE-XYZ key so that it is easier to search for a particular issue 
there?
 *** how to deal with user IDs (author, reporter, commenters)? Do they have to 
be GitHub users? Will information about people not registered on GitHub be lost?
 *** create an extra mapping file of old-issue-to-new-issue URLs for any 
potential future use,
 *** what to do with issue numbers in git/svn commits? These could be rewritten, 
but that would change the entire git history tree. I don't think this is 
practical, although it is doable.
 * Establish conventions for issue label/milestone management
 ** Do some experiments on a sandbox repository 
[https://github.com/mocobeta/sandbox-lucene-10557]
 ** Make documentation for metadata (label/milesto

[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-17 Thread GitBox


shaie commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1159376910

   > It feels overly complicated to introduce `FacetSetRange<...>` and then 
require the different `FacetSet` implementations to implement these methods to 
deal with inclusive/exclusive boundaries.
   
   I agree, it's a good point. These ranges are not an attribute of a 
`FacetSet` and we shouldn't force custom implementations to implement them. I 
feel that the `fromInts/Longs/Floats/Double` methods give users what they need 
to use `RangeFSM`.
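
   To illustrate the idea, here is a minimal sketch of a factory that handles 
inclusive/exclusive boundaries once, so that no `FacetSet` implementation has 
to. The class `InclusiveLongRange` and its methods are hypothetical names for 
this example only, not the API in this PR; only the `long` case is shown.

```java
// Hypothetical sketch: normalizes exclusive bounds to an inclusive [min, max] range.
final class InclusiveLongRange {
  final long min;
  final long max;

  private InclusiveLongRange(long min, long max) {
    this.min = min;
    this.max = max;
  }

  // Converts exclusive bounds to inclusive ones by stepping one value inward.
  static InclusiveLongRange fromLongs(
      long min, boolean minInclusive, long max, boolean maxInclusive) {
    if (!minInclusive) {
      if (min == Long.MAX_VALUE) {
        throw new IllegalArgumentException("exclusive min would overflow");
      }
      min++;
    }
    if (!maxInclusive) {
      if (max == Long.MIN_VALUE) {
        throw new IllegalArgumentException("exclusive max would underflow");
      }
      max--;
    }
    if (min > max) {
      throw new IllegalArgumentException("empty range: min=" + min + " max=" + max);
    }
    return new InclusiveLongRange(min, max);
  }

  // A matcher only ever needs this inclusive check.
  boolean contains(long value) {
    return value >= min && value <= max;
  }
}
```

   With this shape, `InclusiveLongRange.fromLongs(1, false, 5, true)` covers 
the values 2 through 5, and the matching code never sees an exclusive bound.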

