[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1157313525 Actually there weren't many conflicts, so I pushed my commit; we can now compare the two options side-by-side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #961: Handle more cases in `BooleanWeight#count`.
jpountz commented on PR #961: URL: https://github.com/apache/lucene/pull/961#issuecomment-1157403188 Thanks for the review, I pushed a commit to clarify that more cases could be handled.
[jira] [Created] (LUCENE-10620) Can we pass the Weight to Collector?
Adrien Grand created LUCENE-10620: - Summary: Can we pass the Weight to Collector? Key: LUCENE-10620 URL: https://issues.apache.org/jira/browse/LUCENE-10620 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Today collectors cannot know about the Weight, and thus they cannot leverage {{Weight#count}}. {{IndexSearcher#count}} works around it by extending {{TotalHitCountCollector}} in order to shortcut counting the number of hits on a segment via {{Weight#count}} whenever possible. It works, but I would prefer this shortcut to work for all users of TotalHitCountCollector. For instance the faceting module creates a MultiCollector over a TotalHitCountCollector and a FacetCollector, and today it doesn't benefit from quick counts, which would enable it to only collect matches into a FacetCollector. I'm considering adding a new {{Collector#setWeight}} API to allow collectors to leverage {{Weight#count}}. I gave {{TotalHitCountCollector}} as an example above, but this could have applications for our top-docs collectors too, which could skip counting hits at all if the weight can provide them with the hit count up-front. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
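As a rough illustration of the proposed `Collector#setWeight` shortcut, here is a minimal sketch with simplified stand-in types; `Weight` and `TotalHitCountCollector` below are toy versions, not the real Lucene classes, and the convention that a count method returns -1 when no cheap count is available is an assumption for illustration.

```java
// Hedged sketch of the Collector#setWeight idea from LUCENE-10620,
// using simplified stand-in types rather than the real Lucene classes.
public class SetWeightSketch {
  // Stand-in for Weight#count: returns the exact hit count for a segment,
  // or -1 when it cannot be computed cheaply.
  interface Weight {
    int count(int segment);
  }

  // Simplified hit-count collector that can shortcut counting once a Weight is set.
  static class TotalHitCountCollector {
    private Weight weight;
    private int totalHits;

    void setWeight(Weight weight) {
      this.weight = weight;
    }

    // Per-segment collection: use the quick count when available,
    // otherwise fall back to counting matches one by one.
    void collectSegment(int segment, int[] matchingDocs) {
      int quick = (weight == null) ? -1 : weight.count(segment);
      if (quick >= 0) {
        totalHits += quick; // shortcut: no per-doc iteration needed
      } else {
        totalHits += matchingDocs.length; // fallback: count each hit
      }
    }

    int getTotalHits() {
      return totalHits;
    }
  }

  static int demo() {
    TotalHitCountCollector c = new TotalHitCountCollector();
    c.setWeight(seg -> seg == 0 ? 42 : -1); // segment 0 has a quick count
    c.collectSegment(0, new int[0]);        // uses the quick count: +42
    c.collectSegment(1, new int[] {1, 2, 3}); // falls back: +3
    return c.getTotalHits();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // 45
  }
}
```

The point of the sketch is that once the collector knows the `Weight`, it can skip per-document collection entirely on segments where a cheap count exists.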
[GitHub] [lucene] jpountz opened a new pull request, #964: LUCENE-10620: Pass the Weight to Collectors.
jpountz opened a new pull request, #964: URL: https://github.com/apache/lucene/pull/964 This allows `Collector`s to use `Weight#count` when appropriate. See [LUCENE-10620](https://issues.apache.org/jira/browse/LUCENE-10620).
[jira] [Commented] (LUCENE-10620) Can we pass the Weight to Collector?
[ https://issues.apache.org/jira/browse/LUCENE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555001#comment-17555001 ] Adrien Grand commented on LUCENE-10620: --- I opened a draft PR that demonstrates the idea: https://github.com/apache/lucene/pull/964.
[GitHub] [lucene] kaivalnp commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher
kaivalnp commented on code in PR #958: URL: https://github.com/apache/lucene/pull/958#discussion_r898955669 ## lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java: ## @@ -498,7 +498,7 @@ public void testRandom() throws IOException { /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */ public void testRandomWithFilter() throws IOException { -int numDocs = 200; +int numDocs = 2000; Review Comment: Yes, makes sense
[GitHub] [lucene] kaivalnp commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher
kaivalnp commented on code in PR #958: URL: https://github.com/apache/lucene/pull/958#discussion_r898967077 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -87,10 +87,14 @@ public static NeighborQueue search( int numVisited = 0; for (int level = graph.numLevels() - 1; level >= 1; level--) { results = graphSearcher.searchLevel(query, 1, level, eps, vectors, graph, null, visitedLimit); - eps[0] = results.pop(); numVisited += results.visitedCount(); visitedLimit -= results.visitedCount(); + + if (results.incomplete()) { Review Comment: I had done this to prevent some duplicate code (as `searchLevel` won't do anything when `visitedLimit` <= 0). However, it also makes sense from a readability perspective to return `results` right there.
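The early-exit behavior under discussion can be sketched as follows, using simplified stand-in types rather than the real `HnswGraphSearcher` internals; the shape of `searchLevel` and its result object here is an assumption for illustration. The coarse descent through the upper graph levels stops as soon as a level search is incomplete (visited budget exhausted), instead of popping from a possibly empty result heap.

```java
// Hedged sketch of the LUCENE-10611 fix: bail out of the upper-level HNSW
// descent when the visited budget runs out, so the caller can fall back to
// exact search. Types are simplified stand-ins, not the real Lucene classes.
public class HnswEarlyExitSketch {
  static class LevelResult {
    final boolean incomplete;  // true when visitedLimit was hit mid-search
    final int visitedCount;    // nodes visited at this level
    final int bestNode;        // entry point for the next level (valid iff complete)

    LevelResult(boolean incomplete, int visitedCount, int bestNode) {
      this.incomplete = incomplete;
      this.visitedCount = visitedCount;
      this.bestNode = bestNode;
    }
  }

  interface LevelSearcher {
    LevelResult searchLevel(int level, int entryPoint, int visitedLimit);
  }

  // Returns the entry point for level 0, or -1 if the budget ran out in the
  // upper levels (the caller should then switch to exact search rather than
  // popping from an empty heap).
  static int descend(LevelSearcher searcher, int numLevels, int visitedLimit) {
    int ep = 0; // assumed graph entry node
    for (int level = numLevels - 1; level >= 1; level--) {
      LevelResult r = searcher.searchLevel(level, ep, visitedLimit);
      visitedLimit -= r.visitedCount;
      if (r.incomplete) {
        return -1; // budget exhausted: stop instead of reading r.bestNode
      }
      ep = r.bestNode;
    }
    return ep;
  }

  public static void main(String[] args) {
    // Each complete level visits 10 nodes; with a limit of 15 the second
    // searched level runs out of budget and is incomplete.
    LevelSearcher s = (level, ep, limit) -> limit >= 10
        ? new LevelResult(false, 10, ep + 1)
        : new LevelResult(true, Math.max(limit, 0), -1);
    System.out.println(descend(s, 3, 15));  // -1: incomplete, fall back
    System.out.println(descend(s, 3, 100)); // 2: reached level 0 entry point
  }
}
```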
[GitHub] [lucene] romseygeek commented on a diff in pull request #964: LUCENE-10620: Pass the Weight to Collectors.
romseygeek commented on code in PR #964: URL: https://github.com/apache/lucene/pull/964#discussion_r898969670 ## lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java: ## @@ -16,13 +16,17 @@ */ package org.apache.lucene.search; +import java.io.IOException; +import org.apache.lucene.index.LeafReaderContext; + /** * Just counts the total number of hits. For cases when this is the only collector used, {@link * IndexSearcher#count(Query)} should be called instead of {@link IndexSearcher#search(Query, Review Comment: I don't think this javadoc comment is accurate anymore with these changes? ## lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingCollector.java: ## @@ -65,4 +68,11 @@ public void collect(int doc) throws IOException { } }; } + + @Override + public void setWeight(Weight weight) { +weightSet = true; Review Comment: Should we assert that the Weight is only set once as well?
[jira] [Created] (LUCENE-10621) Upgrade to OpenNLP 2.0 and add
Jeff Zemerick created LUCENE-10621: -- Summary: Upgrade to OpenNLP 2.0 and add Key: LUCENE-10621 URL: https://issues.apache.org/jira/browse/LUCENE-10621 Project: Lucene - Core Issue Type: Task Components: modules/analysis Reporter: Jeff Zemerick Apache OpenNLP 2.0.0 has been released. This [version|https://opennlp.apache.org/news/release-200.html] contains new implementations of TokenNameFinder and DocumentCategorizer that support models in the ONNX format. This task is to update the OpenNLP dependency version to 2.0 and to add support for the new interface implementations in the OpenNLP analysis module that was added in LUCENE-2899.
[jira] [Updated] (LUCENE-10621) Upgrade to OpenNLP 2.0 and add
[ https://issues.apache.org/jira/browse/LUCENE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zemerick updated LUCENE-10621: --- Description: Apache OpenNLP 2.0.0 has been released. This [version|https://opennlp.apache.org/news/release-200.html] contains new implementations of TokenNameFinder and DocumentCategorizer that support models in the ONNX format. (TokenNameFinder is used in NLPNERTaggerOp; DocumentCategorizer is not currently exposed through Lucene.) This task is to update the OpenNLP dependency version to 2.0 and to add support for the new interface implementations in the OpenNLP analysis module that was added in LUCENE-2899.
[jira] [Updated] (LUCENE-10619) Optimize the writeBytes in TermsHashPerField
[ https://issues.apache.org/jira/browse/LUCENE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tangdh updated LUCENE-10619: --- Description: Because we don't know the length of a slice, writeBytes will always write bytes one after another instead of writing a block of bytes. Maybe we could return both the offset and the length from ByteBlockPool#allocSlice?
1. BYTE_BLOCK_SIZE is 32768, so the offset fits in at most 15 bits.
2. A slice is at most 200 bytes, so its length fits in 8 bits.
So we could pack them together into a single int as offset | length. There are only two places where this function is used, so the cost of changing it is relatively small. When allocSlice could return the offset and length of the new slice, we could change writeBytes like below:
{code:java}
// write a block of bytes each time
while (remaining > 0) {
  int offsetAndLength = allocSlice(bytes, offset);
  int length = Math.min(remaining, (offsetAndLength & 0xff) - 1);
  offset = offsetAndLength >> 8;
  System.arraycopy(src, srcPos, bytePool.buffer, offset, length);
  srcPos += length;
  remaining -= length;
  offset += length + 1;
}
{code}
If it could work, I'd like to raise a PR.
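The packing scheme proposed above can be sketched in isolation as follows; `pack`, `offset`, and `length` are illustrative helper names, not existing Lucene APIs. With BYTE_BLOCK_SIZE = 32768 the offset needs at most 15 bits and the slice length (at most 200) fits in 8 bits, so both fit comfortably in one int.

```java
// Hedged sketch of the offset|length int packing from LUCENE-10619.
// Layout: bits 8..22 hold the offset, bits 0..7 hold the slice length,
// matching the `offsetAndLength >> 8` and `& 0xff` usage in the proposal.
public class SlicePacking {
  static int pack(int offset, int length) {
    return (offset << 8) | length;
  }

  static int offset(int packed) {
    return packed >>> 8;
  }

  static int length(int packed) {
    return packed & 0xff;
  }

  public static void main(String[] args) {
    int packed = pack(32767, 200); // max offset, max slice size
    System.out.println(offset(packed)); // 32767
    System.out.println(length(packed)); // 200
  }
}
```

Since a slice size is at most 200, the 8-bit length field never overflows, and a 15-bit offset shifted left by 8 stays well within a positive int.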
[GitHub] [lucene] LuXugang merged pull request #962: LUCENE-10600: (backport)SortedSetDocValues#docValueCount should be an int, not long (#960)
LuXugang merged PR #962: URL: https://github.com/apache/lucene/pull/962
[jira] [Commented] (LUCENE-10600) SortedSetDocValues#docValueCount should be an int, not long
[ https://issues.apache.org/jira/browse/LUCENE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555127#comment-17555127 ] ASF subversion and git services commented on LUCENE-10600: -- Commit d79c30b524d036e2e615673371b18b3f3d75a606 in lucene's branch refs/heads/branch_9x from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d79c30b524d ] LUCENE-10600: SortedSetDocValues#docValueCount should be an int, not long (#960) > SortedSetDocValues#docValueCount should be an int, not long > --- > Key: LUCENE-10600 > URL: https://issues.apache.org/jira/browse/LUCENE-10600 > Project: Lucene - Core > Issue Type: Bug > Reporter: Adrien Grand > Assignee: Lu Xugang > Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h
[GitHub] [lucene] gsmiller commented on a diff in pull request #922: Index only the docs for FacetField posting list
gsmiller commented on code in PR #922: URL: https://github.com/apache/lucene/pull/922#discussion_r899250685 ## lucene/CHANGES.txt: ## @@ -67,6 +67,8 @@ Other * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida) +* Remove unused and confusing FacetField indexing options (Gautam Worah) Review Comment: Can you change this to: ```suggestion * GITHUB#922: Remove unused and confusing FacetField indexing options (Gautam Worah) ``` You probably saw that we now allow changes without corresponding Jira issues, but we use the PR reference in place of the issue ID in this case. Thanks!
[jira] [Commented] (LUCENE-10577) Quantize vector values
[ https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555170#comment-17555170 ] Michael Sokolov commented on LUCENE-10577: -- I'm open to doing this with a different API. I tried to avoid massive code duplication and extra boilerplate, which is where I think creating yet another codec would lead, but I'd be happy to be proven wrong. That's why I tried to keep the HNSW util classes well-factored rather than introducing a byte-oriented version and a float-oriented version, which I think would be nightmarish to maintain since almost all code would be identical. Kind of analogous to the way FST allows you to work with different datatypes. If we want to pull out the comparison function into somewhere else, that seems fine, but I don't see how that would work. The API [~julietibs] proposed above (VectorValues#similarity(float[])) would have to re-convert (the query vector) from float[]->byte[] for every document it compares against, wouldn't it? > Quantize vector values > -- > > Key: LUCENE-10577 > URL: https://issues.apache.org/jira/browse/LUCENE-10577 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Michael Sokolov >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The {{KnnVectorField}} api handles vectors with 4-byte floating point values. > These fields can be used (via {{KnnVectorsReader}}) in two main ways: > 1. The {{VectorValues}} iterator enables retrieving values > 2. Approximate nearest-neighbor search > The main point of this addition was to provide the search capability, and to > support that it is not really necessary to store vectors in full precision. > Perhaps users may also be willing to retrieve values in lower precision for > whatever purpose those serve, if they are able to store more samples. 
We know > that 8 bits is enough to provide a very near approximation to the same > recall/performance tradeoff that is achieved with the full-precision vectors. > I'd like to explore how we could enable 4:1 compression of these fields by > reducing their precision. > A few ways I can imagine this would be done: > 1. Provide a parallel byte-oriented API. This would allow users to provide > their data in reduced-precision format and give control over the quantization > to them. It would have a major impact on the Lucene API surface though, > essentially requiring us to duplicate all of the vector APIs. > 2. Automatically quantize the stored vector data when we can. This would > require no or perhaps very limited change to the existing API to enable the > feature. > I've been exploring (2), and what I find is that we can achieve very good > recall results using dot-product similarity scoring by simple linear scaling > + quantization of the vector values, so long as we choose the scale that > minimizes the quantization error. Dot-product is amenable to this treatment > since vectors are required to be unit-length when used with that similarity > function. > Even still there is variability in the ideal scale over different data sets. > A good choice seems to be max(abs(min-value), abs(max-value)), but of course > this assumes that the data set doesn't have a few outlier data points. A > theoretical range can be obtained by 1/sqrt(dimension), but this is only > useful when the samples are normally distributed. We could in theory > determine the ideal scale when flushing a segment and manage this > quantization per-segment, but then numerical error could creep in when > merging. > I'll post a patch/PR with an experimental setup I've been using for > evaluation purposes. It is pretty self-contained and simple, but has some > drawbacks that need to be addressed: > 1. 
No automated mechanism for determining quantization scale (it's a constant > that I have been playing with) > 2. Converts from byte/float when computing dot-product instead of directly > computing on byte values > I'd like to get people's feedback on the approach and whether in general we > should think about doing this compression under the hood, or expose a > byte-oriented API. Whatever we do I think a 4:1 compression ratio is pretty > compelling and we should pursue something.
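The linear scaling + quantization idea from the issue can be sketched like this; the scale choice max(abs(min-value), abs(max-value)) follows the description above, while the helper names and the mapping to the [-127, 127] byte range are illustrative assumptions, not Lucene's actual implementation.

```java
// Hedged sketch of simple linear scaling + quantization for float vectors
// (LUCENE-10577): choose scale = max component magnitude, then map each
// component from [-scale, scale] onto signed bytes in [-127, 127].
public class VectorQuantizeSketch {
  static byte[] quantize(float[] v) {
    // scale = max(abs(min-value), abs(max-value)), i.e. the largest magnitude
    float scale = 0f;
    for (float x : v) {
      scale = Math.max(scale, Math.abs(x));
    }
    byte[] out = new byte[v.length];
    if (scale == 0f) {
      return out; // all-zero vector quantizes to all zeros
    }
    for (int i = 0; i < v.length; i++) {
      // Linear mapping; quantization error is minimized when scale matches
      // the data's actual range (outliers inflate scale and waste precision).
      out[i] = (byte) Math.round(v[i] / scale * 127f);
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] q = quantize(new float[] {0.5f, -0.25f, 0.125f});
    // The largest-magnitude component maps to +/-127; others scale linearly.
    System.out.println(q[0] + " " + q[1] + " " + q[2]);
  }
}
```

This gives the 4:1 compression discussed above (four-byte floats to one byte each); reconstructing an approximate float value just multiplies back by scale / 127.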
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #922: Index only the docs for FacetField posting list
gautamworah96 commented on code in PR #922: URL: https://github.com/apache/lucene/pull/922#discussion_r899384834 ## lucene/CHANGES.txt: ## @@ -67,6 +67,8 @@ Other * LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida) +* Remove unused and confusing FacetField indexing options (Gautam Worah) Review Comment: Ugh. Sorry about this. I did not see any GITHUB issues in the vicinity and assumed that this should work.
[GitHub] [lucene] jtibshirani merged pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher
jtibshirani merged PR #958: URL: https://github.com/apache/lucene/pull/958
[jira] [Commented] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555258#comment-17555258 ] ASF subversion and git services commented on LUCENE-10611: -- Commit 6df6cb093cca7f93075bad131fbc4ad6a8ce5fef in lucene's branch refs/heads/main from Kaival Parikh [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6df6cb093cc ] LUCENE-10611: Fix Heap Error in HnswGraphSearcher (#958) The HNSW graph search does not consider that visitedLimit may be reached in the upper levels of graph search itself This occurs when the pre-filter is too restrictive (and its count sets the visitedLimit). So instead of switching over to exactSearch, it tries to pop from an empty heap and throws an error. We can check if results are incomplete after searching in upper levels, and break out accordingly. This way it won't throw heap errors, and gracefully switch to exactSearch instead > KnnVectorQuery throwing Heap Error for Restrictive Filters > -- > > Key: LUCENE-10611 > URL: https://issues.apache.org/jira/browse/LUCENE-10611 > Project: Lucene - Core > Issue Type: Bug >Reporter: Kaival Parikh >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > The HNSW graph search does not consider that visitedLimit may be reached in > the upper levels of graph search itself > This occurs when the pre-filter is too restrictive (and its count sets the > visitedLimit). 
So instead of switching over to exactSearch, it tries to [pop > from an empty > heap|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90] > and throws an error > > To reproduce this error, we can increase the numDocs > [here|https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L500] > to 20,000+ (so that nodes have more neighbors, and visitedLimit is reached > faster) > > Stacktrace: > {code:java} > The heap is empty > java.lang.IllegalStateException: The heap is empty > at __randomizedtesting.SeedInfo.seed([D7BC2F56048D9D1A:A1F576DD0E795BBF]:0) > at org.apache.lucene.util.LongHeap.pop(LongHeap.java:111) > at org.apache.lucene.util.hnsw.NeighborQueue.pop(NeighborQueue.java:98) > at > org.apache.lucene.util.hnsw.HnswGraphSearcher.search(HnswGraphSearcher.java:90) > at > org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsReader.search(Lucene92HnswVectorsReader.java:236) > at > org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:272) > at > org.apache.lucene.index.CodecReader.searchNearestVectors(CodecReader.java:235) > at > org.apache.lucene.search.KnnVectorQuery.approximateSearch(KnnVectorQuery.java:159) > {code}
[jira] [Commented] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555259#comment-17555259 ] ASF subversion and git services commented on LUCENE-10611: -- Commit 450ee81154b4443d0060521f42aba1ac8b7c1db2 in lucene's branch refs/heads/main from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=450ee81154b ] LUCENE-10611: Tweak the CHANGES description
[jira] [Commented] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555263#comment-17555263 ] ASF subversion and git services commented on LUCENE-10611: -- Commit 1e808ae6238fc2e73615e34f02258ff0383e7296 in lucene's branch refs/heads/branch_9x from Kaival Parikh [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1e808ae6238 ] LUCENE-10611: Fix Heap Error in HnswGraphSearcher (#958)
[jira] [Resolved] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani resolved LUCENE-10611. --- Fix Version/s: 9.3 Resolution: Fixed > KnnVectorQuery throwing Heap Error for Restrictive Filters > -- > > Key: LUCENE-10611 > URL: https://issues.apache.org/jira/browse/LUCENE-10611 > Project: Lucene - Core > Issue Type: Bug >Reporter: Kaival Parikh >Priority: Minor > Fix For: 9.3 > > Time Spent: 1h > Remaining Estimate: 0h > > The HNSW graph search does not consider that visitedLimit may be reached in > the upper levels of graph search itself > This occurs when the pre-filter is too restrictive (and its count sets the > visitedLimit). So instead of switching over to exactSearch, it tries to [pop > from an empty > heap|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90] > and throws an error > > To reproduce this error, we can +increase the numDocs > [here|https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L500] > to 20,000+ (so that nodes have more neighbors, and visitedLimit is reached > faster) > > Stacktrace: > {code:java} > The heap is empty > java.lang.IllegalStateException: The heap is empty > at __randomizedtesting.SeedInfo.seed([D7BC2F56048D9D1A:A1F576DD0E795BBF]:0) > at org.apache.lucene.util.LongHeap.pop(LongHeap.java:111) > at org.apache.lucene.util.hnsw.NeighborQueue.pop(NeighborQueue.java:98) > at > org.apache.lucene.util.hnsw.HnswGraphSearcher.search(HnswGraphSearcher.java:90) > at > org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsReader.search(Lucene92HnswVectorsReader.java:236) > at > org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:272) > at > org.apache.lucene.index.CodecReader.searchNearestVectors(CodecReader.java:235) > at > 
org.apache.lucene.search.KnnVectorQuery.approximateSearch(KnnVectorQuery.java:159) > {code}
[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges
[ https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555272#comment-17555272 ] Vigya Sharma commented on LUCENE-10583: --- Created [PR #963|https://github.com/apache/lucene/pull/963] with docstring changes. There are many more Lucene objects that should not be locked by applications. Adding a warning to all of them seems repetitive and impractical. We could handpick the common classes where users run into traps and add it there, like we're doing for this Jira. I wonder if there is a better way to avoid such errors, like some efficient way to check that objects are lock-free at the start of public APIs. Also, maybe we should add this warning to some Getting Started tutorial for Lucene? > Deadlock with MMapDirectory while waitForMerges > --- > > Key: LUCENE-10583 > URL: https://issues.apache.org/jira/browse/LUCENE-10583 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 8.11.1 > Environment: Java 17 > OS: Windows 2016 >Reporter: Thomas Hoffmann >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Hello, > a deadlock situation happened in our application.
We are using MMapDirectory > on Windows 2016 and got the following stacktrace: > {code:java} > "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms > elapsed=81248.18s tid=0x2860af10 nid=0x237c in Object.wait() > [0x413fc000] > java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(java.base@17.0.2/Native Method) > - waiting on > at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983) > - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter) > at > org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697) > - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter) > at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236) > at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278) > at > com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723) > - locked <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory) > at > com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142) > ...{code} > All threads were waiting to lock <0x0006d5c00208> which never got > released.
> A Lucene thread was also blocked, I don't know if this is relevant: > {code:java} > "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms > elapsed=3499.07s tid=0x459453e0 nid=0x1f8 waiting for monitor entry > [0x5da9e000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346) > - waiting to lock <0x0006d5c00208> (a > org.apache.lucene.store.MMapDirectory) > at > org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363) > at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:121) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130) > at > org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141) > at > org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105) > at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361) > at > org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code} > It looks like the merge operation never finished and never released the lock.
> Is there any option to prevent this deadlock, or a way to investigate it further? > A load-test didn't show this problem, unfortunately.
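The locking pattern behind this deadlock can be reconstructed from the two stacktraces above. The following is a hypothetical sketch (names are modeled on the stacktraces, not the application's real code): the application thread holds the Directory's monitor while `IndexWriter.close()` waits for merges, and the merge thread needs that same monitor inside `FSDirectory`, so neither can proceed.

```java
// Hypothetical reconstruction of the deadlock (names modeled on the
// stacktraces above, not the application's real code). The application thread
// holds the Directory's monitor while IndexWriter.close() waits for merges;
// the merge thread needs that same monitor inside FSDirectory, so both block.
class SearchServiceSketch {
  private final Object directory = new Object(); // stands in for the MMapDirectory instance

  // Anti-pattern: synchronizing on the Directory across writer.close().
  void updateSearchIndexDeadlockProne(AutoCloseable writer) throws Exception {
    synchronized (directory) { // merge thread blocks on this monitor...
      writer.close();          // ...while close() waits for merges: deadlock
    }
  }

  // Safer: serialize index updates with a dedicated lock object, so Lucene's
  // internal synchronization on the Directory is never contended by callers.
  private final Object indexUpdateLock = new Object();

  void updateSearchIndexSafe(AutoCloseable writer) throws Exception {
    synchronized (indexUpdateLock) {
      writer.close();
    }
  }
}
```

The general rule: never take the monitor of an object that Lucene also synchronizes on internally; use a private lock object owned by the application instead.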
[GitHub] [lucene] JoeHF opened a new pull request, #965: LUCENE-10618: Implement BooleanQuery rewrite rules based for minimumShouldMatch
JoeHF opened a new pull request, #965: URL: https://github.com/apache/lucene/pull/965 ### Description (or a Jira issue link if you have one) Detailed discussion see: https://issues.apache.org/jira/browse/LUCENE-10618 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
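Based only on the issue title (the PR itself may implement different or additional rules), the kind of minimumShouldMatch rewrite in question can be sketched as a decision table: with n SHOULD clauses and minimumShouldMatch = m, m > n means no document can match, and m == n means every SHOULD clause is effectively required.

```java
// Guess at the kind of rewrite LUCENE-10618 describes, based only on the
// issue title (the PR may implement different or additional rules). With n
// SHOULD clauses and minimumShouldMatch = m:
//   m > n  -> no document can match: rewrite to a match-no-docs query
//   m == n -> every SHOULD clause is effectively required: treat as MUST
class MinimumShouldMatchRewriteSketch {
  enum Rewrite { MATCH_NO_DOCS, SHOULD_BECOMES_MUST, UNCHANGED }

  static Rewrite rewrite(int numShouldClauses, int minimumShouldMatch) {
    if (minimumShouldMatch > numShouldClauses) {
      return Rewrite.MATCH_NO_DOCS;
    }
    if (minimumShouldMatch == numShouldClauses && minimumShouldMatch > 0) {
      return Rewrite.SHOULD_BECOMES_MUST;
    }
    return Rewrite.UNCHANGED;
  }
}
```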
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Description: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * (/) Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to issues@lucene.apache.org * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) was: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. 
I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * Get a consensus about the migration among committers * Enable Github issue on the lucene's repository (currently, it is disabled on it) * Build the convention or rules for issue label/milestone management * Choose issues that should be moved to GitHub (I think too old or obsolete issues can remain Jira.) > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * (/) Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub. 
> * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to issues@lucene.apache.org > * Set a schedule for migration > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?)
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Description: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * (/) Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to the general mail group name) * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) was: A few (not the majority) Apache projects already use the GitHub issue instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it'd be technically possible that we move to GitHub issue. 
I have little knowledge of how to proceed with it, I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * (/) Get a consensus about the migration among committers * (/) Choose issues that should be moved to GitHub ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] ** Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub. * Build the convention for issue label/milestone management ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557] ** Make documentation for metadata (label/milestone) management * Enable Github issue on the lucene's repository ** Raise an issue on INFRA ** (Create an issue-only private repository for sensitive issues if it's needed and allowed) ** Set a mail hook to issues@lucene.apache.org * Set a schedule for migration ** Give some time to committers to play around with issues/labels/milestones before the actual migration ** Make an announcement on the mail lists ** Show some text messages when opening a new Jira issue (in issue template?) > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. 
> The major tasks would be: > * (/) Get a consensus about the migration among committers > * (/) Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub. > * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > *
[GitHub] [lucene] Yuti-G commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets
Yuti-G commented on code in PR #914: URL: https://github.com/apache/lucene/pull/914#discussion_r899773538

## lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java:

@@ -346,6 +346,43 @@ private void increment(long value) {
     }
   }

+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+
+    List<LabelAndValue> labelValues = new ArrayList<>();
+    boolean countsAdded = false;
+    if (hashCounts.size() != 0) {
+      for (LongIntCursor c : hashCounts) {
+        int count = c.value;
+        if (count != 0) {
+          if (countsAdded == false && c.key >= counts.length) {
+            countsAdded = true;
+            appendCounts(labelValues);
+          }
+          labelValues.add(new LabelAndValue(Long.toString(c.key), count));
+        }
+      }
+    }
+
+    if (countsAdded == false) {
+      appendCounts(labelValues);
+    }
+
+    return new FacetResult(
+        field,
+        new String[0],
+        totCount,
+        labelValues.toArray(new LabelAndValue[0]),
+        labelValues.size());

Review Comment: Thank you so much for providing a simplified logic here!
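The dense-array-plus-hash-map counting scheme that this `getAllChildren` diff iterates over can be sketched independently of Lucene. This is an illustrative class only (not the real `LongValueFacetCounts`, and it omits the sorted merge of the two stores that the real code performs via `appendCounts`): small non-negative values are counted in a dense `int[]`, anything else overflows into a hash map, and listing all children walks both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the counting scheme the diff iterates over (not the
// real LongValueFacetCounts, and without its sorted merge of the two stores):
// small non-negative values are counted in a dense int[], anything else
// overflows into a hash map, and listing all children walks both.
class DenseSparseCountsSketch {
  private final int[] dense;
  private final Map<Long, Integer> sparse = new HashMap<>();

  DenseSparseCountsSketch(int denseLimit) {
    this.dense = new int[denseLimit];
  }

  void increment(long value) {
    if (value >= 0 && value < dense.length) {
      dense[(int) value]++; // fast path for small values
    } else {
      sparse.merge(value, 1, Integer::sum); // overflow path
    }
  }

  /** Returns "value=count" labels for every value seen at least once. */
  List<String> allChildren() {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < dense.length; i++) {
      if (dense[i] != 0) {
        out.add(i + "=" + dense[i]);
      }
    }
    sparse.forEach((k, v) -> out.add(k + "=" + v));
    return out;
  }
}
```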