[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1157313525 Actually there weren't many conflicts, so I pushed my commit; we can now compare the two options side-by-side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #961: Handle more cases in `BooleanWeight#count`.
jpountz commented on PR #961: URL: https://github.com/apache/lucene/pull/961#issuecomment-1157403188 Thanks for the review, I pushed a commit to clarify that more cases could be handled.
[jira] [Created] (LUCENE-10620) Can we pass the Weight to Collector?
Adrien Grand created LUCENE-10620: - Summary: Can we pass the Weight to Collector? Key: LUCENE-10620 URL: https://issues.apache.org/jira/browse/LUCENE-10620 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Today collectors cannot know about the Weight, and thus they cannot leverage {{Weight#count}}. {{IndexSearcher#count}} works around it by extending {{TotalHitCountCollector}} in order to shortcut counting the number of hits on a segment via {{Weight#count}} whenever possible. It works, but I would prefer this shortcut to work for all users of TotalHitCountCollector. For instance the faceting module creates a MultiCollector over a TotalHitCountCollector and a FacetCollector, and today it doesn't benefit from quick counts, which would enable it to only collect matches into a FacetCollector. I'm considering adding a new {{Collector#setWeight}} API to allow collectors to leverage {{Weight#count}}. I gave {{TotalHitCountCollector}} as an example above, but this could have applications for our top-docs collectors too, which could skip counting hits at all if the weight can provide them with the hit count up-front. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
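As a rough illustration of the proposed `Collector#setWeight` shortcut, here is a minimal sketch with simplified stand-in types; `Weight` and `TotalHitCountCollector` below are toy versions, not the real Lucene classes, and the convention that a count method returns -1 when no cheap count is available is an assumption for illustration.

```java
// Hedged sketch of the Collector#setWeight idea from LUCENE-10620,
// using simplified stand-in types rather than the real Lucene classes.
public class SetWeightSketch {
  // Stand-in for Weight#count: returns the exact hit count for a segment,
  // or -1 when it cannot be computed cheaply.
  interface Weight {
    int count(int segment);
  }

  // Simplified hit-count collector that can shortcut counting once a Weight is set.
  static class TotalHitCountCollector {
    private Weight weight;
    private int totalHits;

    void setWeight(Weight weight) {
      this.weight = weight;
    }

    // Per-segment collection: use the quick count when available,
    // otherwise fall back to counting matches one by one.
    void collectSegment(int segment, int[] matchingDocs) {
      int quick = (weight == null) ? -1 : weight.count(segment);
      if (quick >= 0) {
        totalHits += quick; // shortcut: no per-doc iteration needed
      } else {
        totalHits += matchingDocs.length; // fallback: count each hit
      }
    }

    int getTotalHits() {
      return totalHits;
    }
  }

  static int demo() {
    TotalHitCountCollector c = new TotalHitCountCollector();
    c.setWeight(seg -> seg == 0 ? 42 : -1); // segment 0 has a quick count
    c.collectSegment(0, new int[0]);        // uses the quick count: +42
    c.collectSegment(1, new int[] {1, 2, 3}); // falls back: +3
    return c.getTotalHits();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // 45
  }
}
```

The point of the sketch is that once the collector knows the `Weight`, it can skip per-document collection entirely on segments where a cheap count exists.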
[GitHub] [lucene] jpountz opened a new pull request, #964: LUCENE-10620: Pass the Weight to Collectors.
jpountz opened a new pull request, #964: URL: https://github.com/apache/lucene/pull/964 This allows `Collector`s to use `Weight#count` when appropriate. See [LUCENE-10620](https://issues.apache.org/jira/browse/LUCENE-10620).
[jira] [Commented] (LUCENE-10620) Can we pass the Weight to Collector?
[ https://issues.apache.org/jira/browse/LUCENE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555001#comment-17555001 ] Adrien Grand commented on LUCENE-10620: --- I opened a draft PR that demonstrates the idea: https://github.com/apache/lucene/pull/964.
[GitHub] [lucene] kaivalnp commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher
kaivalnp commented on code in PR #958: URL: https://github.com/apache/lucene/pull/958#discussion_r898955669 ## lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java: ## @@ -498,7 +498,7 @@ public void testRandom() throws IOException { /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */ public void testRandomWithFilter() throws IOException { -int numDocs = 200; +int numDocs = 2000; Review Comment: Yes, makes sense
[GitHub] [lucene] kaivalnp commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher
kaivalnp commented on code in PR #958: URL: https://github.com/apache/lucene/pull/958#discussion_r898967077 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -87,10 +87,14 @@ public static NeighborQueue search( int numVisited = 0; for (int level = graph.numLevels() - 1; level >= 1; level--) { results = graphSearcher.searchLevel(query, 1, level, eps, vectors, graph, null, visitedLimit); - eps[0] = results.pop(); numVisited += results.visitedCount(); visitedLimit -= results.visitedCount(); + + if (results.incomplete()) { Review Comment: I had done this to prevent some duplicate code (as `searchLevel` won't do anything when `visitedLimit` <= 0). However, it also makes sense from a readability perspective to return `results` right there.
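The early-exit behavior under discussion can be sketched as follows, using simplified stand-in types rather than the real `HnswGraphSearcher` internals; the shape of `searchLevel` and its result object here is an assumption for illustration. The coarse descent through the upper graph levels stops as soon as a level search is incomplete (visited budget exhausted), instead of popping from a possibly empty result heap.

```java
// Hedged sketch of the LUCENE-10611 fix: bail out of the upper-level HNSW
// descent when the visited budget runs out, so the caller can fall back to
// exact search. Types are simplified stand-ins, not the real Lucene classes.
public class HnswEarlyExitSketch {
  static class LevelResult {
    final boolean incomplete;  // true when visitedLimit was hit mid-search
    final int visitedCount;    // nodes visited at this level
    final int bestNode;        // entry point for the next level (valid iff complete)

    LevelResult(boolean incomplete, int visitedCount, int bestNode) {
      this.incomplete = incomplete;
      this.visitedCount = visitedCount;
      this.bestNode = bestNode;
    }
  }

  interface LevelSearcher {
    LevelResult searchLevel(int level, int entryPoint, int visitedLimit);
  }

  // Returns the entry point for level 0, or -1 if the budget ran out in the
  // upper levels (the caller should then switch to exact search rather than
  // popping from an empty heap).
  static int descend(LevelSearcher searcher, int numLevels, int visitedLimit) {
    int ep = 0; // assumed graph entry node
    for (int level = numLevels - 1; level >= 1; level--) {
      LevelResult r = searcher.searchLevel(level, ep, visitedLimit);
      visitedLimit -= r.visitedCount;
      if (r.incomplete) {
        return -1; // budget exhausted: stop instead of reading r.bestNode
      }
      ep = r.bestNode;
    }
    return ep;
  }

  public static void main(String[] args) {
    // Each complete level visits 10 nodes; with a limit of 15 the second
    // searched level runs out of budget and is incomplete.
    LevelSearcher s = (level, ep, limit) -> limit >= 10
        ? new LevelResult(false, 10, ep + 1)
        : new LevelResult(true, Math.max(limit, 0), -1);
    System.out.println(descend(s, 3, 15));  // -1: incomplete, fall back
    System.out.println(descend(s, 3, 100)); // 2: reached level 0 entry point
  }
}
```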
[GitHub] [lucene] romseygeek commented on a diff in pull request #964: LUCENE-10620: Pass the Weight to Collectors.
romseygeek commented on code in PR #964: URL: https://github.com/apache/lucene/pull/964#discussion_r898969670 ## lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java: ## @@ -16,13 +16,17 @@ */ package org.apache.lucene.search; +import java.io.IOException; +import org.apache.lucene.index.LeafReaderContext; + /** * Just counts the total number of hits. For cases when this is the only collector used, {@link * IndexSearcher#count(Query)} should be called instead of {@link IndexSearcher#search(Query, Review Comment: I don't think this javadoc comment is accurate anymore with these changes? ## lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingCollector.java: ## @@ -65,4 +68,11 @@ public void collect(int doc) throws IOException { } }; } + + @Override + public void setWeight(Weight weight) { +weightSet = true; Review Comment: Should we assert that the Weight is only set once as well?
[jira] [Created] (LUCENE-10621) Upgrade to OpenNLP 2.0 and add
Jeff Zemerick created LUCENE-10621: -- Summary: Upgrade to OpenNLP 2.0 and add Key: LUCENE-10621 URL: https://issues.apache.org/jira/browse/LUCENE-10621 Project: Lucene - Core Issue Type: Task Components: modules/analysis Reporter: Jeff Zemerick Apache OpenNLP 2.0.0 has been released. This [version|https://opennlp.apache.org/news/release-200.html] contains new implementations of TokenNameFinder and DocumentCategorizer that support models in the ONNX format. This task is to update the OpenNLP dependency version to 2.0 and to add support for the new interface implementations in the OpenNLP analysis module that was added in LUCENE-2899.
[jira] [Updated] (LUCENE-10621) Upgrade to OpenNLP 2.0 and add
[ https://issues.apache.org/jira/browse/LUCENE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zemerick updated LUCENE-10621: --- Description: Apache OpenNLP 2.0.0 has been released. This [version|https://opennlp.apache.org/news/release-200.html] contains new implementations of TokenNameFinder and DocumentCategorizer that support models in the ONNX format. (TokenNameFinder is used in NLPNERTaggerOp; DocumentCategorizer is not currently exposed through Lucene.) This task is to update the OpenNLP dependency version to 2.0 and to add support for the new interface implementations in the OpenNLP analysis module that was added in LUCENE-2899.
[jira] [Updated] (LUCENE-10619) Optimize the writeBytes in TermsHashPerField
[ https://issues.apache.org/jira/browse/LUCENE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tangdh updated LUCENE-10619: --- Description: Because we don't know the length of a slice, writeBytes will always write bytes one after another instead of writing a block of bytes. Maybe we could return both the offset and the length from ByteBlockPool#allocSlice?
1. BYTE_BLOCK_SIZE is 32768, so the offset fits in at most 15 bits.
2. A slice is at most 200 bytes, so its length fits in 8 bits.
So we could pack them together into a single int as offset | length. There are only two places where this function is used, so the cost of changing it is relatively small. When allocSlice could return the offset and length of the new slice, we could change writeBytes like below:
{code:java}
// write a block of bytes each time
while (remaining > 0) {
  int offsetAndLength = allocSlice(bytes, offset);
  int length = Math.min(remaining, (offsetAndLength & 0xff) - 1);
  offset = offsetAndLength >> 8;
  System.arraycopy(src, srcPos, bytePool.buffer, offset, length);
  srcPos += length;
  remaining -= length;
  offset += length + 1;
}
{code}
If it could work, I'd like to raise a PR.
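The packing scheme proposed above can be sketched in isolation as follows; `pack`, `offset`, and `length` are illustrative helper names, not existing Lucene APIs. With BYTE_BLOCK_SIZE = 32768 the offset needs at most 15 bits and the slice length (at most 200) fits in 8 bits, so both fit comfortably in one int.

```java
// Hedged sketch of the offset|length int packing from LUCENE-10619.
// Layout: bits 8..22 hold the offset, bits 0..7 hold the slice length,
// matching the `offsetAndLength >> 8` and `& 0xff` usage in the proposal.
public class SlicePacking {
  static int pack(int offset, int length) {
    return (offset << 8) | length;
  }

  static int offset(int packed) {
    return packed >>> 8;
  }

  static int length(int packed) {
    return packed & 0xff;
  }

  public static void main(String[] args) {
    int packed = pack(32767, 200); // max offset, max slice size
    System.out.println(offset(packed)); // 32767
    System.out.println(length(packed)); // 200
  }
}
```

Since a slice size is at most 200, the 8-bit length field never overflows, and a 15-bit offset shifted left by 8 stays well within a positive int.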
[GitHub] [lucene] LuXugang merged pull request #962: LUCENE-10600: (backport)SortedSetDocValues#docValueCount should be an int, not long (#960)
LuXugang merged PR #962: URL: https://github.com/apache/lucene/pull/962
[jira] [Commented] (LUCENE-10600) SortedSetDocValues#docValueCount should be an int, not long
[ https://issues.apache.org/jira/browse/LUCENE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555127#comment-17555127 ] ASF subversion and git services commented on LUCENE-10600: -- Commit d79c30b524d036e2e615673371b18b3f3d75a606 in lucene's branch refs/heads/branch_9x from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d79c30b524d ] LUCENE-10600: SortedSetDocValues#docValueCount should be an int, not long (#960) > SortedSetDocValues#docValueCount should be an int, not long > --- > Key: LUCENE-10600 > URL: https://issues.apache.org/jira/browse/LUCENE-10600 > Project: Lucene - Core > Issue Type: Bug > Reporter: Adrien Grand > Assignee: Lu Xugang > Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h
[GitHub] [lucene] gsmiller commented on a diff in pull request #922: Index only the docs for FacetField posting list
gsmiller commented on code in PR #922: URL: https://github.com/apache/lucene/pull/922#discussion_r899250685 ## lucene/CHANGES.txt: ## @@ -67,6 +67,8 @@ Other * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida) +* Remove unused and confusing FacetField indexing options (Gautam Worah) Review Comment: Can you change this to: ```suggestion * GITHUB#922: Remove unused and confusing FacetField indexing options (Gautam Worah) ``` You probably saw that we now allow changes without corresponding Jira issues, but we use the PR reference in place of the issue ID in this case. Thanks!
[jira] [Commented] (LUCENE-10577) Quantize vector values
[ https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555170#comment-17555170 ] Michael Sokolov commented on LUCENE-10577: -- I'm open to doing this with a different API. I tried to avoid massive code duplication and extra boilerplate, which is where I think creating yet another codec would lead, but I'd be happy to be proven wrong. That's why I tried to keep the HNSW util classes well-factored rather than introducing a byte-oriented version and a float-oriented version, which I think would be nightmarish to maintain since almost all code would be identical. Kind of analogous to the way FST allows you to work with different datatypes. If we want to pull out the comparison function into somewhere else, that seems fine, but I don't see how that would work. The API [~julietibs] proposed above (VectorValues#similarity(float[])) would have to re-convert (the query vector) from float[]->byte[] for every document it compares against, wouldn't it? > Quantize vector values > -- > > Key: LUCENE-10577 > URL: https://issues.apache.org/jira/browse/LUCENE-10577 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Michael Sokolov >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The {{KnnVectorField}} api handles vectors with 4-byte floating point values. > These fields can be used (via {{KnnVectorsReader}}) in two main ways: > 1. The {{VectorValues}} iterator enables retrieving values > 2. Approximate nearest-neighbor search > The main point of this addition was to provide the search capability, and to > support that it is not really necessary to store vectors in full precision. > Perhaps users may also be willing to retrieve values in lower precision for > whatever purpose those serve, if they are able to store more samples. 
We know > that 8 bits is enough to provide a very near approximation to the same > recall/performance tradeoff that is achieved with the full-precision vectors. > I'd like to explore how we could enable 4:1 compression of these fields by > reducing their precision. > A few ways I can imagine this would be done: > 1. Provide a parallel byte-oriented API. This would allow users to provide > their data in reduced-precision format and give control over the quantization > to them. It would have a major impact on the Lucene API surface though, > essentially requiring us to duplicate all of the vector APIs. > 2. Automatically quantize the stored vector data when we can. This would > require no or perhaps very limited change to the existing API to enable the > feature. > I've been exploring (2), and what I find is that we can achieve very good > recall results using dot-product similarity scoring by simple linear scaling > + quantization of the vector values, so long as we choose the scale that > minimizes the quantization error. Dot-product is amenable to this treatment > since vectors are required to be unit-length when used with that similarity > function. > Even still there is variability in the ideal scale over different data sets. > A good choice seems to be max(abs(min-value), abs(max-value)), but of course > this assumes that the data set doesn't have a few outlier data points. A > theoretical range can be obtained by 1/sqrt(dimension), but this is only > useful when the samples are normally distributed. We could in theory > determine the ideal scale when flushing a segment and manage this > quantization per-segment, but then numerical error could creep in when > merging. > I'll post a patch/PR with an experimental setup I've been using for > evaluation purposes. It is pretty self-contained and simple, but has some > drawbacks that need to be addressed: > 1. 
No automated mechanism for determining quantization scale (it's a constant > that I have been playing with) > 2. Converts from byte/float when computing dot-product instead of directly > computing on byte values > I'd like to get people's feedback on the approach and whether in general we > should think about doing this compression under the hood, or expose a > byte-oriented API. Whatever we do I think a 4:1 compression ratio is pretty > compelling and we should pursue something.
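The linear scaling + quantization idea from the issue can be sketched like this; the scale choice max(abs(min-value), abs(max-value)) follows the description above, while the helper names and the mapping to the [-127, 127] byte range are illustrative assumptions, not Lucene's actual implementation.

```java
// Hedged sketch of simple linear scaling + quantization for float vectors
// (LUCENE-10577): choose scale = max component magnitude, then map each
// component from [-scale, scale] onto signed bytes in [-127, 127].
public class VectorQuantizeSketch {
  static byte[] quantize(float[] v) {
    // scale = max(abs(min-value), abs(max-value)), i.e. the largest magnitude
    float scale = 0f;
    for (float x : v) {
      scale = Math.max(scale, Math.abs(x));
    }
    byte[] out = new byte[v.length];
    if (scale == 0f) {
      return out; // all-zero vector quantizes to all zeros
    }
    for (int i = 0; i < v.length; i++) {
      // Linear mapping; quantization error is minimized when scale matches
      // the data's actual range (outliers inflate scale and waste precision).
      out[i] = (byte) Math.round(v[i] / scale * 127f);
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] q = quantize(new float[] {0.5f, -0.25f, 0.125f});
    // The largest-magnitude component maps to +/-127; others scale linearly.
    System.out.println(q[0] + " " + q[1] + " " + q[2]);
  }
}
```

This gives the 4:1 compression discussed above (four-byte floats to one byte each); reconstructing an approximate float value just multiplies back by scale / 127.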
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #922: Index only the docs for FacetField posting list
gautamworah96 commented on code in PR #922: URL: https://github.com/apache/lucene/pull/922#discussion_r899384834 ## lucene/CHANGES.txt: ## @@ -67,6 +67,8 @@ Other * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida) +* Remove unused and confusing FacetField indexing options (Gautam Worah) Review Comment: Ugh. Sorry about this. I did not see any GITHUB issues in the vicinity and assumed that this should work.
[GitHub] [lucene] jtibshirani merged pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher
jtibshirani merged PR #958: URL: https://github.com/apache/lucene/pull/958
[jira] [Commented] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555258#comment-17555258 ] ASF subversion and git services commented on LUCENE-10611: -- Commit 6df6cb093cca7f93075bad131fbc4ad6a8ce5fef in lucene's branch refs/heads/main from Kaival Parikh [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6df6cb093cc ] LUCENE-10611: Fix Heap Error in HnswGraphSearcher (#958) The HNSW graph search does not consider that visitedLimit may be reached in the upper levels of graph search itself This occurs when the pre-filter is too restrictive (and its count sets the visitedLimit). So instead of switching over to exactSearch, it tries to pop from an empty heap and throws an error. We can check if results are incomplete after searching in upper levels, and break out accordingly. This way it won't throw heap errors, and gracefully switch to exactSearch instead > KnnVectorQuery throwing Heap Error for Restrictive Filters > -- > > Key: LUCENE-10611 > URL: https://issues.apache.org/jira/browse/LUCENE-10611 > Project: Lucene - Core > Issue Type: Bug >Reporter: Kaival Parikh >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > The HNSW graph search does not consider that visitedLimit may be reached in > the upper levels of graph search itself > This occurs when the pre-filter is too restrictive (and its count sets the > visitedLimit). 
So instead of switching over to exactSearch, it tries to [pop > from an empty > heap|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90] > and throws an error > > To reproduce this error, we can increase the numDocs > [here|https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L500] > to 20,000+ (so that nodes have more neighbors, and visitedLimit is reached > faster) > > Stacktrace: > {code:java} > The heap is empty > java.lang.IllegalStateException: The heap is empty > at __randomizedtesting.SeedInfo.seed([D7BC2F56048D9D1A:A1F576DD0E795BBF]:0) > at org.apache.lucene.util.LongHeap.pop(LongHeap.java:111) > at org.apache.lucene.util.hnsw.NeighborQueue.pop(NeighborQueue.java:98) > at > org.apache.lucene.util.hnsw.HnswGraphSearcher.search(HnswGraphSearcher.java:90) > at > org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsReader.search(Lucene92HnswVectorsReader.java:236) > at > org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:272) > at > org.apache.lucene.index.CodecReader.searchNearestVectors(CodecReader.java:235) > at > org.apache.lucene.search.KnnVectorQuery.approximateSearch(KnnVectorQuery.java:159) > {code}
[jira] [Commented] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555259#comment-17555259 ] ASF subversion and git services commented on LUCENE-10611: -- Commit 450ee81154b4443d0060521f42aba1ac8b7c1db2 in lucene's branch refs/heads/main from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=450ee81154b ] LUCENE-10611: Tweak the CHANGES description
[jira] [Commented] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555263#comment-17555263 ] ASF subversion and git services commented on LUCENE-10611: -- Commit 1e808ae6238fc2e73615e34f02258ff0383e7296 in lucene's branch refs/heads/branch_9x from Kaival Parikh [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1e808ae6238 ] LUCENE-10611: Fix Heap Error in HnswGraphSearcher (#958)
[jira] [Resolved] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani resolved LUCENE-10611. --- Fix Version/s: 9.3 Resolution: Fixed > KnnVectorQuery throwing Heap Error for Restrictive Filters > -- > > Key: LUCENE-10611 > URL: https://issues.apache.org/jira/browse/LUCENE-10611 > Project: Lucene - Core > Issue Type: Bug >Reporter: Kaival Parikh >Priority: Minor > Fix For: 9.3 > > Time Spent: 1h > Remaining Estimate: 0h > > The HNSW graph search does not consider that visitedLimit may be reached in > the upper levels of graph search itself > This occurs when the pre-filter is too restrictive (and its count sets the > visitedLimit). So instead of switching over to exactSearch, it tries to [pop > from an empty > heap|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90] > and throws an error > > To reproduce this error, we can +increase the numDocs > [here|https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L500] > to 20,000+ (so that nodes have more neighbors, and visitedLimit is reached > faster) > > Stacktrace: > {code:java} > The heap is empty > java.lang.IllegalStateException: The heap is empty > at __randomizedtesting.SeedInfo.seed([D7BC2F56048D9D1A:A1F576DD0E795BBF]:0) > at org.apache.lucene.util.LongHeap.pop(LongHeap.java:111) > at org.apache.lucene.util.hnsw.NeighborQueue.pop(NeighborQueue.java:98) > at > org.apache.lucene.util.hnsw.HnswGraphSearcher.search(HnswGraphSearcher.java:90) > at > org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsReader.search(Lucene92HnswVectorsReader.java:236) > at > org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:272) > at > org.apache.lucene.index.CodecReader.searchNearestVectors(CodecReader.java:235) > at > 
org.apache.lucene.search.KnnVectorQuery.approximateSearch(KnnVectorQuery.java:159) > {code}
[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges
[ https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555272#comment-17555272 ] Vigya Sharma commented on LUCENE-10583: --- Created [PR #963|https://github.com/apache/lucene/pull/963] with docstring changes. There are many more Lucene objects that should not be locked by applications. Adding a warning to all of them seems repetitive and impractical. We could handpick the common classes where users run into traps and add it there, like we're doing for this Jira. I wonder if there is a better way to avoid such errors, like some efficient way to check that objects are lock-free at the start of public APIs. Also, maybe we should add this warning to some Getting Started tutorial for Lucene? > Deadlock with MMapDirectory while waitForMerges > --- > > Key: LUCENE-10583 > URL: https://issues.apache.org/jira/browse/LUCENE-10583 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 8.11.1 > Environment: Java 17 > OS: Windows 2016 >Reporter: Thomas Hoffmann >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Hello, > a deadlock situation happened in our application.
We are using MMapDirectory > on Windows 2016 and got the following stacktrace: > {code:java} > "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms > elapsed=81248.18s tid=0x2860af10 nid=0x237c in Object.wait() > [0x413fc000] > java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(java.base@17.0.2/Native Method) > - waiting on > at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983) > - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter) > at > org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697) > - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter) > at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236) > at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278) > at > com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723) > - locked <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory) > at > com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142) > ...{code} > All threads were waiting to lock <0x0006d5c00208> which never got > released.
> A Lucene thread was also blocked, I don't know if this is relevant: > {code:java} > "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms > elapsed=3499.07s tid=0x459453e0 nid=0x1f8 waiting for monitor entry > [0x5da9e000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346) > - waiting to lock <0x0006d5c00208> (a > org.apache.lucene.store.MMapDirectory) > at > org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363) > at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:121) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130) > at > org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141) > at > org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105) > at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361) > at > org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code} > It looks like the merge operation never finished and never released the lock.
> Is there any option to prevent this deadlock, or a way to investigate it further? > A load-test didn't show this problem, unfortunately.
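The locking pattern behind this deadlock can be reconstructed from the two stacktraces above. The following is a hypothetical sketch (names are modeled on the stacktraces, not the application's real code): the application thread holds the Directory's monitor while `IndexWriter.close()` waits for merges, and the merge thread needs that same monitor inside `FSDirectory`, so neither can proceed.

```java
// Hypothetical reconstruction of the deadlock (names modeled on the
// stacktraces above, not the application's real code). The application thread
// holds the Directory's monitor while IndexWriter.close() waits for merges;
// the merge thread needs that same monitor inside FSDirectory, so both block.
class SearchServiceSketch {
  private final Object directory = new Object(); // stands in for the MMapDirectory instance

  // Anti-pattern: synchronizing on the Directory across writer.close().
  void updateSearchIndexDeadlockProne(AutoCloseable writer) throws Exception {
    synchronized (directory) { // merge thread blocks on this monitor...
      writer.close();          // ...while close() waits for merges: deadlock
    }
  }

  // Safer: serialize index updates with a dedicated lock object, so Lucene's
  // internal synchronization on the Directory is never contended by callers.
  private final Object indexUpdateLock = new Object();

  void updateSearchIndexSafe(AutoCloseable writer) throws Exception {
    synchronized (indexUpdateLock) {
      writer.close();
    }
  }
}
```

The general rule: never take the monitor of an object that Lucene also synchronizes on internally; use a private lock object owned by the application instead.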
[GitHub] [lucene] JoeHF opened a new pull request, #965: LUCENE-10618: Implement BooleanQuery rewrite rules based for minimumShouldMatch
JoeHF opened a new pull request, #965: URL: https://github.com/apache/lucene/pull/965 ### Description (or a Jira issue link if you have one) Detailed discussion see: https://issues.apache.org/jira/browse/LUCENE-10618 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
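Based only on the issue title (the PR itself may implement different or additional rules), the kind of minimumShouldMatch rewrite in question can be sketched as a decision table: with n SHOULD clauses and minimumShouldMatch = m, m > n means no document can match, and m == n means every SHOULD clause is effectively required.

```java
// Guess at the kind of rewrite LUCENE-10618 describes, based only on the
// issue title (the PR may implement different or additional rules). With n
// SHOULD clauses and minimumShouldMatch = m:
//   m > n  -> no document can match: rewrite to a match-no-docs query
//   m == n -> every SHOULD clause is effectively required: treat as MUST
class MinimumShouldMatchRewriteSketch {
  enum Rewrite { MATCH_NO_DOCS, SHOULD_BECOMES_MUST, UNCHANGED }

  static Rewrite rewrite(int numShouldClauses, int minimumShouldMatch) {
    if (minimumShouldMatch > numShouldClauses) {
      return Rewrite.MATCH_NO_DOCS;
    }
    if (minimumShouldMatch == numShouldClauses && minimumShouldMatch > 0) {
      return Rewrite.SHOULD_BECOMES_MUST;
    }
    return Rewrite.UNCHANGED;
  }
}
```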
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Description: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * (/) Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to issues@lucene.apache.org * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) was: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. 
I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * Get a consensus about the migration among committers * Enable Github issue on the lucene's repository (currently, it is disabled on it) * Build the convention or rules for issue label/milestone management * Choose issues that should be moved to GitHub (I think too old or obsolete issues can remain Jira.) > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * (/) Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub. 
> * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to issues@lucene.apache.org > * Set a schedule for migration > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?)
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Description: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * (/) Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the general mail group name) * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) was: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. 
I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * (/) Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to issues@lucene.apache.org * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. 
> The major tasks would be: > * (/) Get a consensus about the migration among committers > * (/) Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub. > * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > *
[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets
Yuti-G commented on code in PR #914: URL: https://github.com/apache/lucene/pull/914#discussion_r899773538

## lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java:

@@ -346,6 +346,43 @@ private void increment(long value) {
     }
   }

+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+
+    List<LabelAndValue> labelValues = new ArrayList<>();
+    boolean countsAdded = false;
+    if (hashCounts.size() != 0) {
+      for (LongIntCursor c : hashCounts) {
+        int count = c.value;
+        if (count != 0) {
+          if (countsAdded == false && c.key >= counts.length) {
+            countsAdded = true;
+            appendCounts(labelValues);
+          }
+          labelValues.add(new LabelAndValue(Long.toString(c.key), count));
+        }
+      }
+    }
+
+    if (countsAdded == false) {
+      appendCounts(labelValues);
+    }
+
+    return new FacetResult(
+        field,
+        new String[0],
+        totCount,
+        labelValues.toArray(new LabelAndValue[0]),
+        labelValues.size());

Review Comment: Thank you so much for providing a simplified logic here!
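The dense-array-plus-hash-map counting scheme that this `getAllChildren` diff iterates over can be sketched independently of Lucene. This is an illustrative class only (not the real `LongValueFacetCounts`, and it omits the sorted merge of the two stores that the real code performs via `appendCounts`): small non-negative values are counted in a dense `int[]`, anything else overflows into a hash map, and listing all children walks both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the counting scheme the diff iterates over (not the
// real LongValueFacetCounts, and without its sorted merge of the two stores):
// small non-negative values are counted in a dense int[], anything else
// overflows into a hash map, and listing all children walks both.
class DenseSparseCountsSketch {
  private final int[] dense;
  private final Map<Long, Integer> sparse = new HashMap<>();

  DenseSparseCountsSketch(int denseLimit) {
    this.dense = new int[denseLimit];
  }

  void increment(long value) {
    if (value >= 0 && value < dense.length) {
      dense[(int) value]++; // fast path for small values
    } else {
      sparse.merge(value, 1, Integer::sum); // overflow path
    }
  }

  /** Returns "value=count" labels for every value seen at least once. */
  List<String> allChildren() {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < dense.length; i++) {
      if (dense[i] != 0) {
        out.add(i + "=" + dense[i]);
      }
    }
    sparse.forEach((k, v) -> out.add(k + "=" + v));
    return out;
  }
}
```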