[GitHub] [lucene] jpountz commented on pull request #954: LUCENE-10603: Change iteration methodology for SSDV ordinals in the f…
jpountz commented on PR #954: URL: https://github.com/apache/lucene/pull/954#issuecomment-1154822015 This is exactly the testing that I had in mind, thanks for running these benchmarks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #950: LUCENE-10608: Implement Weight#count on pure conjunctions.
jpountz merged PR #950: URL: https://github.com/apache/lucene/pull/950
[jira] [Commented] (LUCENE-10608) Implement Weight#count for pure conjunctions
[ https://issues.apache.org/jira/browse/LUCENE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553949#comment-17553949 ] ASF subversion and git services commented on LUCENE-10608: -- Commit 83461601adb08ff410c32a870cb0381b6b0857f2 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=83461601adb ] LUCENE-10608: Implement Weight#count on pure conjunctions. (#950) > Implement Weight#count for pure conjunctions > > > Key: LUCENE-10608 > URL: https://issues.apache.org/jira/browse/LUCENE-10608 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > It's common for Elasticsearch to ingest time-based data where newer segments > contain recent data and older segments contain older data. On such indices, > it's common for range queries on the time field to match either all of or > none of the documents in the segment. > We could implement Weight#count on pure conjunctions to take advantage of > this by either returning 0 if any of the clauses has a match count of 0, or > the count of the only clause that doesn't have a match count that is equal to > maxDoc. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
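The counting rule described in the issue above can be sketched in plain Java. This is only an illustration of the idea, not the code merged in #950: the class and method names are hypothetical, the per-clause counts are passed in as plain ints, and -1 is used to mean "count not cheaply known", following the convention of Weight#count. Deleted documents are ignored for simplicity.

```java
final class ConjunctionCountSketch {
  static final int UNKNOWN = -1;

  /**
   * Combines per-clause match counts for one segment of a pure conjunction.
   * Hypothetical helper: assumes at least one clause and no deleted documents.
   */
  static int count(int maxDoc, int... clauseCounts) {
    boolean anyUnknown = false;
    int partialClauses = 0; // clauses matching some, but not all, documents
    int partialCount = maxDoc; // result if every clause matches all documents
    for (int c : clauseCounts) {
      if (c == 0) {
        return 0; // one empty clause empties the whole conjunction
      } else if (c == UNKNOWN) {
        anyUnknown = true;
      } else if (c != maxDoc) {
        partialClauses++;
        partialCount = c;
      }
    }
    if (anyUnknown || partialClauses > 1) {
      return UNKNOWN; // cannot be derived from the counts alone
    }
    return partialCount;
  }
}
```

When two or more clauses match only part of the segment, the counts alone say nothing about their overlap, so the sketch gives up and returns -1 rather than guessing.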
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553953#comment-17553953 ]

Tomoko Uchida commented on LUCENE-10557:

Vote thread: [https://lists.apache.org/thread/124nfzjmz2vqtw7kl6xohd2jct57m6tr]
Vote count: [https://docs.google.com/spreadsheets/d/1MnRO-Kfbglj00liFDqaboyAvseI19_jWj5QwxuLiXWE/edit?usp=sharing]

> Migrate to GitHub issue from Jira?
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Tomoko Uchida
> Assignee: Tomoko Uchida
> Priority: Major
>
> A few (not the majority) Apache projects already use the GitHub issue instead
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible that we move to GitHub issue. I have
> little knowledge of how to proceed with it, I'd like to discuss whether we
> should migrate to it, and if so, how to smoothly handle the migration.
> The major tasks would be:
> * Get a consensus about the migration among committers
> * Enable Github issue on the lucene's repository (currently, it is disabled
> on it)
> * Build the convention or rules for issue label/milestone management
> * Choose issues that should be moved to GitHub (I think too old or obsolete
> issues can remain Jira.)
[jira] [Commented] (LUCENE-10608) Implement Weight#count for pure conjunctions
[ https://issues.apache.org/jira/browse/LUCENE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553970#comment-17553970 ] ASF subversion and git services commented on LUCENE-10608: -- Commit 4da1a16835d36b322bbd359e5ddc21f71c4fe3aa in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4da1a16835d ] LUCENE-10608: Implement Weight#count on pure conjunctions. (#950)
[GitHub] [lucene] kaivalnp opened a new pull request, #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher
kaivalnp opened a new pull request, #958: URL: https://github.com/apache/lucene/pull/958

### Description

Link to [Jira](https://issues.apache.org/jira/browse/LUCENE-10611)

The HNSW graph search does not account for the possibility that visitedLimit is reached while still searching the upper levels of the graph. This occurs when the pre-filter is too restrictive (its count sets the visitedLimit). Instead of switching over to exactSearch, the search tries to [pop from an empty heap](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90) and throws an error.

### Solution

We can check whether the results are incomplete after searching the upper levels, and break out accordingly. This way the search no longer throws heap errors and gracefully switches to exactSearch instead.
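The guard described in this pull request can be illustrated with a toy model. The sketch below is not Lucene's HnswGraphSearcher: the class, the LevelResult type, and both methods are hypothetical stand-ins, each level is just an array of node ids, and every level is assumed to be non-empty. It only shows why checking an "incomplete" flag after each upper level avoids popping from an empty queue once the visit budget is spent.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class VisitedLimitSketch {
  /** Toy per-level result; a stand-in, not Lucene's NeighborQueue. */
  static final class LevelResult {
    final Deque<Integer> candidates = new ArrayDeque<>();
    int visited;
    boolean incomplete;
  }

  /** Hypothetical level search that visits nodes until the budget runs out. */
  static LevelResult searchLevel(int[] nodesOnLevel, int budget) {
    LevelResult result = new LevelResult();
    for (int node : nodesOnLevel) {
      if (result.visited >= budget) {
        result.incomplete = true; // stopped before seeing every node
        break;
      }
      result.visited++;
      result.candidates.push(node);
    }
    return result;
  }

  /**
   * Walks the upper levels and returns the entry point for the bottom level,
   * or -1 to tell the caller to fall back to an exact search.
   */
  static int descend(int[][] upperLevels, int visitedLimit) {
    int remaining = visitedLimit;
    int entryPoint = -1;
    for (int[] level : upperLevels) {
      LevelResult result = searchLevel(level, remaining);
      remaining -= result.visited;
      if (result.incomplete) {
        return -1; // guard: never pop from a possibly empty queue
      }
      entryPoint = result.candidates.pop();
    }
    return entryPoint;
  }
}
```

Without the `incomplete` check, a level searched with a remaining budget of zero would leave `candidates` empty and the `pop()` call would throw, which is the toy analogue of the "The heap is empty" failure.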
[jira] [Commented] (LUCENE-10611) KnnVectorQuery throwing Heap Error for Restrictive Filters
[ https://issues.apache.org/jira/browse/LUCENE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553973#comment-17553973 ]

Kaival Parikh commented on LUCENE-10611:

Thanks, I have added the fix! As for the test, I feel that *testRandomWithFilter* is ideal for this (it checks switching over to {*}exactSearch{*}, and we should extend it to higher levels as well). If we increase *numDocs* to something reasonably high (~2000), we start getting heap errors (as the *visitedLimit* is exhausted in the upper levels). With the fix, we can check that it still switches to *exactSearch*. Here is the [PR|https://github.com/apache/lucene/pull/958]

> KnnVectorQuery throwing Heap Error for Restrictive Filters
>
> Key: LUCENE-10611
> URL: https://issues.apache.org/jira/browse/LUCENE-10611
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Kaival Parikh
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The HNSW graph search does not consider that visitedLimit may be reached in
> the upper levels of graph search itself
> This occurs when the pre-filter is too restrictive (and its count sets the
> visitedLimit). So instead of switching over to exactSearch, it tries to [pop
> from an empty
> heap|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90]
> and throws an error
>
> To reproduce this error, we can increase the numDocs
> [here|https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L500]
> to 20,000+ (so that nodes have more neighbors, and visitedLimit is reached
> faster)
>
> Stacktrace:
> {code:java}
> The heap is empty
> java.lang.IllegalStateException: The heap is empty
>     at __randomizedtesting.SeedInfo.seed([D7BC2F56048D9D1A:A1F576DD0E795BBF]:0)
>     at org.apache.lucene.util.LongHeap.pop(LongHeap.java:111)
>     at org.apache.lucene.util.hnsw.NeighborQueue.pop(NeighborQueue.java:98)
>     at org.apache.lucene.util.hnsw.HnswGraphSearcher.search(HnswGraphSearcher.java:90)
>     at org.apache.lucene.codecs.lucene92.Lucene92HnswVectorsReader.search(Lucene92HnswVectorsReader.java:236)
>     at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader.search(PerFieldKnnVectorsFormat.java:272)
>     at org.apache.lucene.index.CodecReader.searchNearestVectors(CodecReader.java:235)
>     at org.apache.lucene.search.KnnVectorQuery.approximateSearch(KnnVectorQuery.java:159)
> {code}
[GitHub] [lucene] jpountz commented on pull request #950: LUCENE-10608: Implement Weight#count on pure conjunctions.
jpountz commented on PR #950: URL: https://github.com/apache/lucene/pull/950#issuecomment-1154864922 Thanks @zhaih!
[jira] [Resolved] (LUCENE-10608) Implement Weight#count for pure conjunctions
[ https://issues.apache.org/jira/browse/LUCENE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10608. --- Fix Version/s: 9.3 Resolution: Fixed
[GitHub] [lucene] jpountz commented on a diff in pull request #907: LUCENE-10357 Ghost fields and postings/points
jpountz commented on code in PR #907: URL: https://github.com/apache/lucene/pull/907#discussion_r896529395

## lucene/codecs/src/java/org/apache/lucene/codecs/bloom/BloomFilteringPostingsFormat.java: ##

@@ -200,8 +200,8 @@ public Terms terms(String field) throws IOException {
         return delegateFieldsProducer.terms(field);
       } else {
         Terms result = delegateFieldsProducer.terms(field);
-        if (result == null) {
-          return null;
+        if (result == null || result == Terms.EMPTY) {

Review Comment: This case is a bit special indeed, but I think we should fix it too to make sure that it only returns a `null` `Terms` instance if the field doesn't exist (fieldInfo == null) or if the field doesn't index terms (indexOptions == NONE).
[jira] [Commented] (LUCENE-10600) SortedSetDocValues#docValueCount should be an int, not long
[ https://issues.apache.org/jira/browse/LUCENE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553987#comment-17553987 ] Lu Xugang commented on LUCENE-10600: Hi [~jpountz], should we also make SortedSetDocValues#nextOrd() return an int, since SSDV values are represented by term IDs (of type int) in SortedDocValuesWriter? > SortedSetDocValues#docValueCount should be an int, not long > --- > > Key: LUCENE-10600 > URL: https://issues.apache.org/jira/browse/LUCENE-10600 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Lu Xugang >Priority: Minor >
[jira] [Commented] (LUCENE-10600) SortedSetDocValues#docValueCount should be an int, not long
[ https://issues.apache.org/jira/browse/LUCENE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553993#comment-17553993 ] Lu Xugang commented on LUCENE-10600: {quote}Is it that the unique count of ordinals will always fit inside an int {quote} Hi [~vigyas], yes. During the indexing phase, SSDV values are represented as term IDs and only non-duplicate term IDs are collected; see SortedSetDocValuesWriter#finishCurrentDoc for the detailed implementation. {quote}I guess it stores integer ordinals compressed as PackedLongValues? Should this also be changed to an int {quote} We could first change long to int and then think about what you mentioned. Have you started on this work? If not, I will fix it in the next few days.
[jira] [Commented] (LUCENE-10600) SortedSetDocValues#docValueCount should be an int, not long
[ https://issues.apache.org/jira/browse/LUCENE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554004#comment-17554004 ]

Adrien Grand commented on LUCENE-10600:

> should we also make SortedSetDocValues#nextOrd() returns int

No, SORTED_SET doc values could have more than Integer.MAX_VALUE unique values overall. SortedSetDocValuesWriter does indeed use ints to represent term IDs, but this class is only used for flushes, and flushes have a hard bound of ~2GB per thread, so you can't have more than Integer.MAX_VALUE unique terms in a flush. However, the unique count of terms can grow beyond Integer.MAX_VALUE through merges.
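The arithmetic behind this answer can be shown with a small sketch. It is purely illustrative: the class and method names are hypothetical, the segment sizes are made-up numbers, and the term sets of the merged segments are assumed to be fully disjoint (the worst case for the merged ordinal space).

```java
final class OrdWidthSketch {
  /** Unique terms after a merge, assuming the segments share no terms. */
  static long mergedUniqueTerms(long... uniqueTermsPerSegment) {
    long total = 0;
    for (long n : uniqueTermsPerSegment) {
      total += n;
    }
    return total;
  }

  /** Whether a term count (and hence its largest ordinal) fits in an int. */
  static boolean fitsInInt(long value) {
    return value <= Integer.MAX_VALUE;
  }
}
```

Two segments with 1.5 billion unique terms each fit in an int individually, yet their merge has 3 billion unique terms, which does not. That is why ordinals stay long even though a single flush can use int term IDs.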
[jira] [Created] (LUCENE-10616) Moving to dictionaries has made stored fields slower at skipping
Adrien Grand created LUCENE-10616: - Summary: Moving to dictionaries has made stored fields slower at skipping Key: LUCENE-10616 URL: https://issues.apache.org/jira/browse/LUCENE-10616 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand

[~ywelsch] has been digging into a regression of stored fields retrieval that is caused by LUCENE-9486. Say your documents have two stored fields, one that is 100B and is stored first, and another one that is 100kB, and you are only interested in the first one.

While the idea behind blocks of stored fields is to store multiple documents in the same block to leverage redundancy across documents, sometimes documents are larger than the block size. As soon as documents are larger than 2x the block size, our stored fields format splits such large documents into multiple blocks, so that you wouldn't need to decompress everything only to retrieve a couple of small fields.

Before LUCENE-9486, BEST_SPEED had a block size of 16kB, so retrieving only the first field value required decompressing just 16kB of data. With the move to preset dictionaries in LUCENE-9486 and then LUCENE-9917, we now have blocks of 80kB, so stored fields now need to decompress 80kB of data, 5x more than before.

With dictionaries, our blocks are now split into 10 sub blocks. We happen to eagerly decompress all sub blocks that intersect with the stored document, which is why we would decompress 80kB of data, but this is an implementation detail. It should be possible to decompress these sub blocks lazily, so that we would only decompress those that intersect with one of the field values that the user is interested in retrieving.
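The saving from lazy decompression can be estimated with a short sketch. It rests on assumptions that are not stated in the issue: the 10 sub blocks are taken to be of equal size (8kB each, with 1kB = 1024 bytes), a field is modelled as a contiguous byte range within the uncompressed block, and the class and method names are hypothetical.

```java
final class SubBlockSketch {
  /** Index of the first sub block touched by a field starting at byte `start`. */
  static int firstSubBlock(long start, int subBlockSize) {
    return (int) (start / subBlockSize);
  }

  /** Index of the last sub block touched by a field of `length` bytes (length > 0). */
  static int lastSubBlock(long start, long length, int subBlockSize) {
    return (int) ((start + length - 1) / subBlockSize);
  }

  /** Bytes to decompress if only the sub blocks intersecting the field are read. */
  static long lazyBytes(long start, long length, int subBlockSize) {
    int touched =
        lastSubBlock(start, length, subBlockSize) - firstSubBlock(start, subBlockSize) + 1;
    return (long) touched * subBlockSize;
  }
}
```

Under these assumptions, the 100B field stored at offset 0 touches only sub block 0, so a lazy reader would decompress 8kB instead of the 80kB decompressed eagerly today.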
[jira] [Resolved] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-10615. -- Resolution: Invalid Please don't use jira for questions like this. We won't be adding unnecessary stuff to NOTICE.txt. Look at the source code files if you want to see the license. > Add license information for SmartChineseAnalyzer to NOTICE.txt > -- > > Key: LUCENE-10615 > URL: https://issues.apache.org/jira/browse/LUCENE-10615 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Jan Dornseifer >Priority: Trivial > > The Lucene NOTICE file contains the statement > The SmartChineseAnalyzer source code (smartcn) was > provided by Xiaoping Gao and copyright 2009 by > [www.imdict.net.|http://www.imdict.net./] > without providing license information. Can this information be supplemented > or is it even outdated? > We are using Apache Lucene v8.4.1. We are currently subject to a license > audit of our software, where also 3rd party FOSS components are checked for > usage. Among other things, this part came to our attention. I would be very > grateful for information.
[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554039#comment-17554039 ] Dawid Weiss commented on LUCENE-10615: -- I think the reference you're looking for is here: https://github.com/apache/lucene/blob/main/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.java#L44-L45 although these web sites and their associated resources vanish over time.
[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554067#comment-17554067 ] Jan Dornseifer commented on LUCENE-10615: - [~rcmuir] thanks for taking the time. I don't want the Lucene team to add unnecessary stuff to NOTICE.txt but in my opinion this is sensitive information. My intent was more of a question, as I cannot verify this information (dead link) and do not know under what terms this contribution was made. Because we, as the users, have to comply. [~dweiss] thanks. Yes, I checked the source code before creating the issue and found this information, too. This is where my thoughts came from, that the other information from the NOTICE file may be outdated. However, this part only refers to the dictionary data. I see, the source code files of the package itself contain the Apache-2.0 header, so my thoughts now are the following: The source code of the org/apache/lucene/analysis/cn/smart package is licensed under Apache-2.0 and (either the code was contributed by Xiaoping Gao and copyright [www.imdict.net|http://www.imdict.net/] or this information from the NOTICE.txt is outdated and the copyright of the code is Apache Software Foundation). Additionally, dictionary data is copyright ICTCLAS and also licensed under Apache-2.0. Can either one or the other be confirmed? Licensing issues are always annoying and I am not a lawyer, but we as users depend on this information being complete to stay out of trouble. Hence my question. 
[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554070#comment-17554070 ] Robert Muir commented on LUCENE-10615: -- > Please don't use jira for questions like this.
[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554072#comment-17554072 ] Jan Dornseifer commented on LUCENE-10615: - Where should such questions be asked instead?
[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554077#comment-17554077 ] Tomoko Uchida commented on LUCENE-10615: Hi, it looks like the webpage (dead link) was moved to [http://ictclas.nlpir.org/index_e.html]. Also, you can find the original license file (ALv2) in this repository [https://github.com/NLPIR-team/nlpir-analysis-cn-ictclas] (this can be reached from the above website). I don't think we can help any further: the licensing of language models or dictionaries is sometimes very complicated and difficult to decouple from the source code. If you need more help, please contact the site owner.
[jira] [Comment Edited] (LUCENE-10612) Add parameters for HNSW codec in Lucene93Codec
[ https://issues.apache.org/jira/browse/LUCENE-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553593#comment-17553593 ]

Elia Porciani edited comment on LUCENE-10612 at 6/14/22 12:32 PM:

However, I understand the concern about backward compatibility. I don't think it is harmful at this time to have custom HNSW parameters, but things might be different in future releases. Even if we decide not to move forward, I have created this PR to make the proposal clearer: [https://github.com/apache/lucene/pull/955|https://github.com/apache/lucene/pull/955]

> Add parameters for HNSW codec in Lucene93Codec
>
> Key: LUCENE-10612
> URL: https://issues.apache.org/jira/browse/LUCENE-10612
> Project: Lucene - Core
> Issue Type: Task
> Components: core/codecs
> Reporter: Elia Porciani
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, it is possible to specify only the compression mode for stored
> fields in the LuceneXXCodec constructors.
> With the introduction of HNSW graph, and the LuceneXXHnswCodecFormat,
> LuceneXXCodec should provide an easy way to specify custom parameters for
> HNSW graph layout:
> * maxConn
> * beamWidth
[jira] [Commented] (LUCENE-10615) Add license information for SmartChineseAnalyzer to NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554092#comment-17554092 ] Jan Dornseifer commented on LUCENE-10615: - [~tomoko] thanks for providing this information. I will update the license information in our distribution of Apache Lucene.
[GitHub] [lucene] msokolov commented on a diff in pull request #927: LUCENE-10151: Adding Timeout Support to IndexSearcher
msokolov commented on code in PR #927: URL: https://github.com/apache/lucene/pull/927#discussion_r896898656 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -766,18 +778,29 @@ protected void search(List leaves, Weight weight, Collector c } BulkScorer scorer = weight.bulkScorer(ctx); if (scorer != null) { -try { - scorer.score(leafCollector, ctx.reader().getLiveDocs()); -} catch ( -@SuppressWarnings("unused") -CollectionTerminatedException e) { - // collection was terminated prematurely - // continue with the following leaf +if (isTimeoutEnabled) { + TimeLimitingBulkScorer timeLimitingBulkScorer = + new TimeLimitingBulkScorer(scorer, queryTimeout); + try { +timeLimitingBulkScorer.score(leafCollector, ctx.reader().getLiveDocs()); + } catch ( + @SuppressWarnings("unused") + TimeLimitingBulkScorer.TimeExceededException e) { +partialResult = true; Review Comment: perhaps we could return the time anyway? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
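Editorial sketch of the control flow discussed in this review thread: a per-leaf scoring loop that checks a timeout between fixed-size windows of documents and flags the result as partial when the timeout fires. This is a minimal, self-contained model and not the code from PR #927. The names `TimeLimitedScoringSketch`, `scoreLeaf`, `search`, and `Result`, and the choice to stop at the first timeout, are assumptions made for illustration only.

```java
import java.util.function.BooleanSupplier;

public class TimeLimitedScoringSketch {
  /** Hypothetical stand-in for a "time exceeded" signal raised while scoring a leaf. */
  static final class TimeExceededException extends RuntimeException {}

  /** Hypothetical result holder: how many docs were collected, and whether the search was cut short. */
  record Result(int collected, boolean partial) {}

  /** Scores docs in [0, maxDoc) in windows of {@code interval} docs, checking the timeout before each window. */
  static void scoreLeaf(int maxDoc, int interval, BooleanSupplier timedOut, int[] collected) {
    for (int start = 0; start < maxDoc; start += interval) {
      if (timedOut.getAsBoolean()) {
        throw new TimeExceededException();
      }
      int end = Math.min(maxDoc, start + interval);
      collected[0] += end - start; // "collect" every doc in the window
    }
  }

  /** Visits each leaf in turn; on timeout, keeps what was collected so far and marks the result partial. */
  static Result search(int[] leafSizes, int interval, BooleanSupplier timedOut) {
    int[] collected = {0};
    boolean partial = false;
    for (int maxDoc : leafSizes) {
      try {
        scoreLeaf(maxDoc, interval, timedOut, collected);
      } catch (TimeExceededException e) {
        partial = true;
        break; // assumption: stop at the first timeout instead of visiting the remaining leaves
      }
    }
    return new Result(collected[0], partial);
  }
}
```

Passing the timeout as a `BooleanSupplier` keeps the sketch deterministic under test; a real implementation would more likely consult a clock.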
[jira] [Resolved] (LUCENE-8193) Deprecate LowercaseTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-8193. --- Resolution: Duplicate It is indeed a duplicate, thanks [~asalamon74] > Deprecate LowercaseTokenizer > > > Key: LUCENE-8193 > URL: https://issues.apache.org/jira/browse/LUCENE-8193 > Project: Lucene - Core > Issue Type: Task > Components: modules/analysis >Reporter: Tim Allison >Priority: Minor > > On LUCENE-8186, discussion favored deprecating and eventually removing > LowercaseTokenizer. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani merged pull request #956: Make sure KnnVectorQuery applies search boost
jtibshirani merged PR #956: URL: https://github.com/apache/lucene/pull/956
[jira] [Commented] (LUCENE-10612) Add parameters for HNSW codec in Lucene93Codec
[ https://issues.apache.org/jira/browse/LUCENE-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554232#comment-17554232 ] Michael Sokolov commented on LUCENE-10612: -- > Actually, the change I'm proposing is to make it possible to specify the > parameters for HNSW without the need to know which HNSW codec is used > underneath. I think the idea is that we might choose in the future to use a different nearest-neighbor algorithm that would not support the same configuration parameters as HNSW. The public-facing API is deliberately not specific to HNSW. > Add parameters for HNSW codec in Lucene93Codec > -- > > Key: LUCENE-10612 > URL: https://issues.apache.org/jira/browse/LUCENE-10612 > Project: Lucene - Core > Issue Type: Task > Components: core/codecs >Reporter: Elia Porciani >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently, it is possible to specify only the compression mode for stored > fields in the LuceneXXCodec constructors. > With the introduction of HNSW graph, and the LuceneXXHnswCodecFormat, > LuceneXXCodec should provide an easy way to specify custom parameters for > HNSW graph layout: > * maxConn > * beamWidth -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
gsmiller commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897216959 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/ExactFacetSetMatcher.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +/** + * A {@link FacetSetMatcher} which considers a set as a match only if all dimension values are equal + * to the given one. + * + * @lucene.experimental + */ +public class ExactFacetSetMatcher extends FacetSetMatcher { Review Comment: Should we include `Long` as part of the naming scheme for this (and `RangeFacetSetMatcher`) to note that it expects long points? I imagine we may want to create a "double" version of this in the future as well. Since we have different point types (`LongPoint`, `DoublePoint`, `IntPoint`, `FloatPoint`), we might need corresponding versions of these matchers for all of them right? ## lucene/facet/src/java/org/apache/lucene/facet/facetset/RangeFacetSetMatcher.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.util.Arrays; + +/** + * A {@link FacetSetMatcher} which considers a set as a match if all dimensions fall within the + * given corresponding range. + * + * @lucene.experimental + */ +public class RangeFacetSetMatcher extends FacetSetMatcher { + + private final long[] lowerRanges; + private final long[] upperRanges; + + /** + * Constructs and instance to match facet sets with dimensions that fall within the given ranges. Review Comment: typo: "an instance" not "and instance" ## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.Facets; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.index.BinaryDocValues; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.search.ConjunctionUtils; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.util.BytesRef; + +/** + * Returns the counts for each given {@link FacetSet} + * + * @lucene.experimental + */ +public class MatchingFacetSetsCounts extends Facets { + + private final FacetSetMatcher[] facetSetMatchers; + private fi
[GitHub] [lucene] javanna opened a new pull request, #959: LUCENE-10507: Make it more likely to perform concurrent search in tests
javanna opened a new pull request, #959: URL: https://github.com/apache/lucene/pull/959

I took a stab at this. These are the changes that I made:

1) Replace default useThreads value: rarely() -> randomBoolean()
2) Apply lower slices thresholds more frequently: randomBoolean() -> frequently
3) Lower the maxDocsPerSlice and maxSegmentsPerSlice thresholds when applied
4) Apply lower maxSegments and maxSegmentsPerSlice also when wrapWithAssertions is true

Please let me know what you think. Would it be better to make one change at a time, or to make less aggressive changes?
[GitHub] [lucene] vigyasharma commented on pull request #738: LUCENE-10448: Avoid instant rate write bursts by writing bytes buffer in chunks
vigyasharma commented on PR #738: URL: https://github.com/apache/lucene/pull/738#issuecomment-1155751419 Based on @jpountz's response in [Lucene-10448](https://issues.apache.org/jira/browse/LUCENE-10448), looks like it is unusual for lucene to write byte[] arrays that are longer than the `chunk` size in this PR. Since chunking with pauses in between would add memory pressure by delaying gc on these arrays, and it is probably not an expected normal scenario, I am planning to close this PR. Will wait for a couple days in case there are follow up comments on this. On a related note, since `writeBytes()` is the only API that doesn't pause, it may be useful to add this note in comments or docstring somewhere, perhaps with a reference to the jira comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1155985365 > but the one bigger question I'd like to discuss is how we envision handing different point types? I think there are two sides of supporting additional numeric types: indexing and aggregation. IMO it's still fine if `FSM` handles a `long[]`: indexing `doubles` will be done as `toSortableLong` and reading `int` and `float` into `long` is doable. Therefore on the aggregation side I feel like it's fine to keep the `long[]` matching API. For indexing we just need to convert the values to `byte[]`. We can do that by making `FacetSet` abstract with a `toBytes()` method and the current impl will be changed to `LongFacetSet`. To complement that on the aggregation side we will need to pass a _reader_ which can convert the `BytesRef` to a `long[]`. I'm thinking that the `Int/Float/Long/DoubleFacetSet` impls will do that. As for "mix-and-match" I think this provides a solution too, since the user will be able to implement their own `FacetSet` and convert their, as example, `int, long, long, float` facet set to `byte[]` and decode that back. I'll give it a try to see how it works. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
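Editorial sketch of the encoding idea in the comment above: doubles are mapped to longs whose signed order matches the numeric order of the doubles, so a matcher working on `long[]` can still compare them, and the `long[]` is then serialized to `byte[]` for indexing and decoded again on the aggregation side. The class and method names below are hypothetical, and the plain big-endian layout is a simplification chosen for illustration; it is not claimed to be the layout the PR uses.

```java
import java.nio.ByteBuffer;

public class SortableFacetValues {
  /** Maps a double to a long whose signed ordering matches the double's numeric ordering. */
  static long doubleToSortableLong(double value) {
    long bits = Double.doubleToLongBits(value);
    // For negative values (sign bit set) flip the lower 63 bits so larger magnitudes sort first.
    return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
  }

  /** Inverse of {@link #doubleToSortableLong(double)}; the sign bit is untouched, so the same mask applies. */
  static double sortableLongToDouble(long encoded) {
    long bits = encoded ^ ((encoded >> 63) & 0x7fffffffffffffffL);
    return Double.longBitsToDouble(bits);
  }

  /** Serializes the dimension values as consecutive big-endian longs (assumed layout). */
  static byte[] toBytes(long[] values) {
    ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
    for (long v : values) {
      buffer.putLong(v);
    }
    return buffer.array();
  }

  /** Reads back what {@link #toBytes(long[])} wrote. */
  static long[] fromBytes(byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    long[] values = new long[bytes.length / Long.BYTES];
    for (int i = 0; i < values.length; i++) {
      values[i] = buffer.getLong();
    }
    return values;
  }
}
```

With this mapping, a range check on the encoded longs gives the same answer as a range check on the original doubles, which is why the matching API can stay `long[]`-based.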
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897533011 ## lucene/facet/docs/FacetSets.adoc: ## @@ -0,0 +1,90 @@ += FacetSets Overview +:toc: + +This document describes the `FacetSets` capability, which allows to aggregate on multi dimensional values. It starts +with outlining a few example use cases to showcase the motivation for this capability and follows with an API +walk through. + +== Motivation + +[#movie-actors] +=== Movie Actors DB + +Suppose that you want to build a search engine for movie actors which allows you to search for actors by name and see +movie titles they appeared in. You might want to index standard fields such as `actorName`, `genre` and `releaseYear` +which will let you search by the actor's name or see all actors who appeared in movies during 2021. Similarly, you can +index facet fields that will let you aggregate by “Genre” and “Year” so that you can show how many actors appeared in +each year or genre. Few example documents: + +[source] + +{ "name": "Tom Hanks", "genre": ["Comedy", "Drama", …], "year": [1988, 2000,…] } +{ "name": "Harrison Ford", "genre": ["Action", "Adventure", …], "year": [1977, 1981, …] } + + +However, these facet fields do not allow you to show the following aggregation: + +.Number of Actors performing in movies by Genre and Year +[cols="4*"] +|=== +| | 2020 | 2021 | 2022 +| Thriller | 121 | 43 | 97 +| Action | 145 | 52 | 130 +| Adventure | 87 | 21 | 32 +|=== + +The reason is that each “genre” or “releaseYear” facet field is indexed in its own data structure, and therefore if an +actor appeared in a "Thriller" movie in "2020" and "Action" movie in "2021", there's no way for you to tell that they +didn't appear in an "Action" movie in "2020". + +[#automotive-parts] +=== Automotive Parts Store + +Say you're building a search engine for an automotive parts store where customers can search for different car parts. 
+For simplicity let's assume that each item in the catalog contains a searchable “type” field and “car model” it fits +which consists of two separate fields: “manufacturer” and “year”. This lets you search for parts by their type as well +as filter parts that fit only a certain manufacturer or year. A few example documents: + +[source] + +{ + "type": "Wiper Blades V1", + "models": [ +{ "manufacturer": "Ford", "year": 2010 }, +{ "manufacturer": "Chevy", "year": 2011 } + ] +} +{ + "type": "Wiper Blades V2", + "models": [ +{ "manufacturer": "Ford", "year": 2011 }, +{ "manufacturer": "Chevy", "year": 2010 } + ] +} + + +By breaking up the "models" field into its sub-fields "manufacturer" and "year", you can easily aggregate on parts that +fit a certain manufacturer or year. However, if a user would like to aggregate on parts that can fit either a "Ford +2010" or "Chevy 2011", then aggregating on the sub-fields will lead to a wrong count of 2 (in the above example) instead +of 1. + +[#movie-awards] +=== Movie Awards + +To showcase a 3-D multi-dimensional aggregation, let's expand the <<movie-actors>> example with awards an actor has +received over the years. For this aggregation we will use four dimensions: Award Type ("Oscar", "Grammy", "Emmy"), +Award Category ("Best Actor", "Best Supporting Actress"), Year and Genre. One interesting aggregation is to show how +many "Best Actor" vs "Best Supporting Actor" awards one has received in the "Oscar" or "Emmy" for each year. Another +aggregation is slicing the number of these awards by Genre over all the years. + +Building on these examples, one might be able to come up with an interesting use case for an N-dimensional aggregation +(where `N > 3`). The higher `N` is, the harder it is to aggregate all the dimensions correctly and efficiently without +`FacetSets`. + +== FacetSets API + +TBD + +== FacetSets Under the Hood + +TBD Review Comment: I intended to do that, just wanted us to finalize the API first. 
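Editorial sketch of the "Automotive Parts Store" miscount described in the quoted document, using the two wiper-blade documents from the example. It contrasts matching the sub-fields independently with matching a whole (manufacturer, year) tuple, here for the single pair "Ford 2010". The class, record, and method names are hypothetical and only model the idea; nothing here is the `FacetSets` API, which the document itself still marks as TBD. The `record` syntax assumes Java 16 or later.

```java
import java.util.List;

public class PartsCountSketch {
  record Model(String manufacturer, int year) {}

  record Part(String type, List<Model> models) {}

  /** The two example documents from the text. */
  static final List<Part> CATALOG =
      List.of(
          new Part("Wiper Blades V1", List.of(new Model("Ford", 2010), new Model("Chevy", 2011))),
          new Part("Wiper Blades V2", List.of(new Model("Ford", 2011), new Model("Chevy", 2010))));

  /** Sub-field matching: manufacturer and year are tested independently, possibly on different models. */
  static long countBySubFields(List<Part> parts, String manufacturer, int year) {
    return parts.stream()
        .filter(
            p ->
                p.models().stream().anyMatch(m -> m.manufacturer().equals(manufacturer))
                    && p.models().stream().anyMatch(m -> m.year() == year))
        .count();
  }

  /** Tuple matching: both values must come from the same model entry. */
  static long countByTuple(List<Part> parts, String manufacturer, int year) {
    return parts.stream()
        .filter(
            p ->
                p.models().stream()
                    .anyMatch(m -> m.manufacturer().equals(manufacturer) && m.year() == year))
        .count();
  }

  public static void main(String[] args) {
    System.out.println(countBySubFields(CATALOG, "Ford", 2010)); // prints 2 (V2 matches via Ford + Chevy's 2010)
    System.out.println(countByTuple(CATALOG, "Ford", 2010)); // prints 1 (only V1 fits a Ford 2010)
  }
}
```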
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897535013 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.facet.facetset; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.Facets; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.index.BinaryDocValues; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.search.ConjunctionUtils; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.util.BytesRef; + +/** + * Returns the counts for each given {@link FacetSet} + * + * @lucene.experimental + */ +public class MatchingFacetSetsCounts extends Facets { + + private final FacetSetMatcher[] facetSetMatchers; + private final int[] counts; + private final String field; + private final int totCount; + + /** + * Constructs a new instance of matching facet set counts which calculates the countBytes for each + * given facet set matcher. + */ + public MatchingFacetSetsCounts( + String field, FacetsCollector hits, FacetSetMatcher... facetSetMatchers) throws IOException { +if (facetSetMatchers == null || facetSetMatchers.length == 0) { + throw new IllegalArgumentException("facetSetMatchers cannot be null or empty"); +} +if (areFacetSetMatcherDimensionsInconsistent(facetSetMatchers)) { + throw new IllegalArgumentException("All facet set matchers must be the same dimensionality"); +} +this.field = field; +this.facetSetMatchers = facetSetMatchers; +this.counts = new int[facetSetMatchers.length]; +this.totCount = count(field, hits.getMatchingDocs()); + } + + /** Counts from the provided field. */ + private int count(String field, List matchingDocs) Review Comment: I see your point. I did that mainly to keep fields `final` to denote that are not changing after initialization. 
I realize there's a "side effect" of populating the counts array in the method which sucks (cause we can't return two values from a method). Is it better though over having all fields `final`?
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897535157 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.facet.facetset; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.Facets; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.index.BinaryDocValues; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.search.ConjunctionUtils; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.util.BytesRef; + +/** + * Returns the counts for each given {@link FacetSet} + * + * @lucene.experimental + */ +public class MatchingFacetSetsCounts extends Facets { + + private final FacetSetMatcher[] facetSetMatchers; + private final int[] counts; + private final String field; + private final int totCount; + + /** + * Constructs a new instance of matching facet set counts which calculates the countBytes for each Review Comment: This is an IDEA refactor side-effect, obviously an error :). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang merged pull request #957: LUCENE-10598: (backport) SortedSetDocValues#docValueCount() should be always greater than zero
LuXugang merged PR #957: URL: https://github.com/apache/lucene/pull/957
[jira] [Commented] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero
[ https://issues.apache.org/jira/browse/LUCENE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554367#comment-17554367 ] ASF subversion and git services commented on LUCENE-10598: -- Commit 90b5d5383f1ced8d567dc02462ac7632a5e5949d in lucene's branch refs/heads/branch_9x from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=90b5d5383f1 ] LUCENE-10598: (backport) SortedSetDocValues#docValueCount() should be always greater than zero (#957) * LUCENE-10598: SortedSetDocValues#docValueCount() should be always greater than zero (#934) * LUCENE-10598: Use count to record docValueCount similar to SortedNumericDocValues did (#942) * Fix docValueCount() on Lucene70 sorted set doc values. > SortedSetDocValues#docValueCount() should be always greater than zero > - > > Key: LUCENE-10598 > URL: https://issues.apache.org/jira/browse/LUCENE-10598 > Project: Lucene - Core > Issue Type: Bug >Reporter: Lu Xugang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This test runs failed. 
> {code:java}
> public void testDocValueCount() throws IOException {
>   try (Directory d = newDirectory()) {
>     try (IndexWriter w = new IndexWriter(d, new IndexWriterConfig())) {
>       for (int j = 0; j < 1; j++) {
>         Document doc = new Document();
>         doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
>         doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
>         doc.add(new SortedSetDocValuesField("field", new BytesRef("b")));
>         w.addDocument(doc);
>       }
>     }
>     try (IndexReader reader = DirectoryReader.open(d)) {
>       assertEquals(1, reader.leaves().size());
>       for (LeafReaderContext leaf : reader.leaves()) {
>         SortedSetDocValues docValues = leaf.reader().getSortedSetDocValues("field");
>         for (int doc1 = docValues.nextDoc();
>             doc1 != DocIdSetIterator.NO_MORE_DOCS;
>             doc1 = docValues.nextDoc()) {
>           assert docValues.docValueCount() > 0;
>         }
>       }
>     }
>   }
> }
> {code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897535559 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/RangeFacetSetMatcher.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.util.Arrays; + +/** + * A {@link FacetSetMatcher} which considers a set as a match if all dimensions fall within the + * given corresponding range. + * + * @lucene.experimental + */ +public class RangeFacetSetMatcher extends FacetSetMatcher { + + private final long[] lowerRanges; + private final long[] upperRanges; + + /** + * Constructs and instance to match facet sets with dimensions that fall within the given ranges. Review Comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
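Editorial sketch of the matching rule stated in the quoted Javadoc ("a match if all dimensions fall within the given corresponding range"). Only the field names `lowerRanges` and `upperRanges` come from the quoted diff; the class name, the method signature, and the choice of inclusive bounds are assumptions, since the actual method body is not shown in this thread.

```java
public class RangeMatchSketch {
  /**
   * Returns true only if every dimension value lies within its corresponding range. Both bounds are
   * treated as inclusive here, which is an assumption of this sketch.
   */
  static boolean matches(long[] dimValues, long[] lowerRanges, long[] upperRanges) {
    if (dimValues.length != lowerRanges.length || dimValues.length != upperRanges.length) {
      throw new IllegalArgumentException("all arrays must have the same number of dimensions");
    }
    for (int i = 0; i < dimValues.length; i++) {
      if (dimValues[i] < lowerRanges[i] || dimValues[i] > upperRanges[i]) {
        return false; // one dimension out of range rejects the whole set
      }
    }
    return true;
  }
}
```

Geometrically, the lower and upper arrays describe opposite corners of a hyperrectangle, and a facet set matches when its point falls inside it.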
[jira] [Commented] (LUCENE-10600) SortedSetDocValues#docValueCount should be an int, not long
[ https://issues.apache.org/jira/browse/LUCENE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554374#comment-17554374 ] Lu Xugang commented on LUCENE-10600: {quote}but this class is only used for flushes and flushes have a hard bound of ~2GB per thread so you can't have more than Integer.MAX_VALUE unique terms in a flush. However, the unique count of terms can grow through merges beyond Integer.MAX_VALUE{quote} Thanks for the explanation! > SortedSetDocValues#docValueCount should be an int, not long > --- > > Key: LUCENE-10600 > URL: https://issues.apache.org/jira/browse/LUCENE-10600 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Lu Xugang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897538192 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.util.Arrays; +import org.apache.lucene.document.BinaryDocValuesField; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.util.BytesRef; + +/** + * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding + * scheme consists of a packed {@code long[]} where the first value denotes the number of dimensions + * in all the sets, followed by each set's values. + * + * @lucene.experimental + */ +public class FacetSetsField extends BinaryDocValuesField { + + /** + * Create a new FacetSets field. + * + * @param name field name + * @param facetSets the {@link FacetSet} to index in that field. All must have the same number of + * dimensions + * @throws IllegalArgumentException if the field name is null or the given facet sets are invalid + */ + public static FacetSetsField create(String name, FacetSet... 
facetSets) { +validateFacetSets(facetSets); + +return new FacetSetsField(name, toPackedLongs(facetSets)); + } + + private FacetSetsField(String name, BytesRef value) { +super(name, value); + } + + private static void validateFacetSets(FacetSet... facetSets) { Review Comment: Good idea
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897537151 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.util.Arrays; +import org.apache.lucene.document.BinaryDocValuesField; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.util.BytesRef; + +/** + * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding + * scheme consists of a packed {@code long[]} where the first value denotes the number of dimensions + * in all the sets, followed by each set's values. + * + * @lucene.experimental + */ +public class FacetSetsField extends BinaryDocValuesField { + + /** + * Create a new FacetSets field. + * + * @param name field name + * @param facetSets the {@link FacetSet} to index in that field. All must have the same number of + * dimensions + * @throws IllegalArgumentException if the field name is null or the given facet sets are invalid + */ + public static FacetSetsField create(String name, FacetSet... 
facetSets) { +validateFacetSets(facetSets); + +return new FacetSetsField(name, toPackedLongs(facetSets)); + } + + private FacetSetsField(String name, BytesRef value) { +super(name, value); + } + + private static void validateFacetSets(FacetSet... facetSets) { +if (facetSets == null || facetSets.length == 0) { + throw new IllegalArgumentException("FacetSets cannot be null or empty!"); +} + +int dims = facetSets[0].values.length; +if (!Arrays.stream(facetSets).allMatch(facetSet -> facetSet.values.length == dims)) { Review Comment: Wasn't aware of this preference in the code base, will change to `noneMatch`
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897537151 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.util.Arrays; +import org.apache.lucene.document.BinaryDocValuesField; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.util.BytesRef; + +/** + * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding + * scheme consists of a packed {@code long[]} where the first value denotes the number of dimensions + * in all the sets, followed by each set's values. + * + * @lucene.experimental + */ +public class FacetSetsField extends BinaryDocValuesField { + + /** + * Create a new FacetSets field. + * + * @param name field name + * @param facetSets the {@link FacetSet} to index in that field. All must have the same number of + * dimensions + * @throws IllegalArgumentException if the field name is null or the given facet sets are invalid + */ + public static FacetSetsField create(String name, FacetSet... 
facetSets) { +validateFacetSets(facetSets); + +return new FacetSetsField(name, toPackedLongs(facetSets)); + } + + private FacetSetsField(String name, BytesRef value) { +super(name, value); + } + + private static void validateFacetSets(FacetSet... facetSets) { +if (facetSets == null || facetSets.length == 0) { + throw new IllegalArgumentException("FacetSets cannot be null or empty!"); +} + +int dims = facetSets[0].values.length; +if (!Arrays.stream(facetSets).allMatch(facetSet -> facetSet.values.length == dims)) { Review Comment: Wasn't aware of this preference in the code base, will change to `anyMatch`
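For readers following the `allMatch` / `anyMatch` exchange above, here is a minimal sketch of the two forms of the dimension check. It does not reproduce the code that ended up in the PR: `SimpleSet` and both method names are hypothetical stand-ins chosen so the example runs without Lucene on the classpath. The point is only that a negated `allMatch` with `==` and a plain `anyMatch` with `!=` answer the same question.

```java
import java.util.Arrays;

public class DimsCheckSketch {

  /** Hypothetical stand-in for the PR's FacetSet: just a holder of long values. */
  static final class SimpleSet {
    final long[] values;

    SimpleSet(long... values) {
      this.values = values;
    }
  }

  /** Form shown in the quoted diff: negated allMatch with an equality predicate. */
  static boolean hasMismatchNegatedAllMatch(SimpleSet... sets) {
    int dims = sets[0].values.length;
    return !Arrays.stream(sets).allMatch(s -> s.values.length == dims);
  }

  /** Equivalent form without the leading negation: anyMatch with an inequality predicate. */
  static boolean hasMismatchAnyMatch(SimpleSet... sets) {
    int dims = sets[0].values.length;
    return Arrays.stream(sets).anyMatch(s -> s.values.length != dims);
  }

  public static void main(String[] args) {
    SimpleSet twoDims = new SimpleSet(1L, 2L);
    SimpleSet alsoTwoDims = new SimpleSet(3L, 4L);
    SimpleSet threeDims = new SimpleSet(5L, 6L, 7L);

    // Both forms agree on consistent and inconsistent inputs.
    System.out.println(hasMismatchNegatedAllMatch(twoDims, alsoTwoDims));
    System.out.println(hasMismatchAnyMatch(twoDims, alsoTwoDims));
    System.out.println(hasMismatchNegatedAllMatch(twoDims, threeDims));
    System.out.println(hasMismatchAnyMatch(twoDims, threeDims));
  }
}
```

Note that `noneMatch(p)` is the negation of `anyMatch(p)`, so it is not a drop-in replacement for `!allMatch(p)` unless the predicate is inverted and the leading `!` is kept.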
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r897535976 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/ExactFacetSetMatcher.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +/** + * A {@link FacetSetMatcher} which considers a set as a match only if all dimension values are equal + * to the given one. + * + * @lucene.experimental + */ +public class ExactFacetSetMatcher extends FacetSetMatcher { + + private final long[] values; + + /** Constructs an instance to match the given facet set. 
*/ + public ExactFacetSetMatcher(String label, FacetSet facetSet) { +super(label, facetSet.values.length); +this.values = facetSet.values; + } + + @Override + public boolean matches(long[] dimValues) { +assert dimValues.length == dims +: "Encoded dimensions (dims=" ++ dimValues.length ++ ") is incompatible with FacetSet dimensions (dims=" ++ dims ++ ")"; + +for (int i = 0; i < dimValues.length; i++) { + if (dimValues[i] != values[i]) { +// Field's dimension value is not equal to given dimension, the entire set is rejected +return false; + } +} +return true; Review Comment: I thought we want to avoid calling other methods from such hot code, but yeah, `Arrays.equals` may even be more optimal. 👍
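As a rough illustration of the `Arrays.equals` suggestion, the sketch below puts the hand-written loop from the quoted diff next to the library call. The class and method names are hypothetical, and this is not the code that was committed. One behavioural difference to keep in mind: `Arrays.equals(long[], long[])` also compares array lengths and returns false on a mismatch, whereas the loop relies on the preceding `assert` to guarantee equal lengths.

```java
import java.util.Arrays;

public class ExactMatchSketch {

  /** Manual comparison, mirroring the loop in the quoted diff; assumes equal lengths. */
  static boolean matchesLoop(long[] values, long[] dimValues) {
    for (int i = 0; i < dimValues.length; i++) {
      if (dimValues[i] != values[i]) {
        // One differing dimension rejects the whole set.
        return false;
      }
    }
    return true;
  }

  /** Library-based comparison; additionally returns false when the lengths differ. */
  static boolean matchesArraysEquals(long[] values, long[] dimValues) {
    return Arrays.equals(dimValues, values);
  }

  public static void main(String[] args) {
    long[] expected = {10L, 20L, 30L};
    long[] same = {10L, 20L, 30L};
    long[] different = {10L, 21L, 30L};

    // For equal-length inputs the two methods agree.
    System.out.println(matchesLoop(expected, same) == matchesArraysEquals(expected, same));
    System.out.println(
        matchesLoop(expected, different) == matchesArraysEquals(expected, different));
  }
}
```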
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #922: Index only the docs for FacetField posting list
gautamworah96 commented on code in PR #922: URL: https://github.com/apache/lucene/pull/922#discussion_r897570270 ## lucene/facet/src/java/org/apache/lucene/facet/FacetField.java: ## @@ -30,14 +30,12 @@ */ public class FacetField extends Field { - /** Field type used for storing facet values: docs, freqs, and positions. */ + /** + * Field type used for storing facet values. Actual field type used for indexing is determined in + * {@link FacetsConfig#build(TaxonomyWriter, Document)} + */ public static final FieldType TYPE = new FieldType(); - static { Review Comment: Yeah, tbh, I was debating whether this change was even needed or not. Let's keep it.
[GitHub] [lucene] gautamworah96 commented on pull request #922: Index only the docs for FacetField posting list
gautamworah96 commented on PR #922: URL: https://github.com/apache/lucene/pull/922#issuecomment-1156032542 > I think we might be doing the right thing already? If you look at StringField, we are setting: setIndexOptions(IndexOptions.DOCS) Yes, that is indeed the case. Thanks for taking a look at this change @gsmiller. I had committed my changes so as to not lose them with time. I'll explicitly request a review through the UI the next time I commit my partial work :)
[jira] [Created] (LUCENE-10617) Investigate recent Jenkins build failures in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
Gautam Worah created LUCENE-10617: Summary: Investigate recent Jenkins build failures in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler Key: LUCENE-10617 URL: https://issues.apache.org/jira/browse/LUCENE-10617 Project: Lucene - Core Issue Type: Bug Reporter: Gautam Worah Sample failures: https://jenkins.thetaphi.de/job/Lucene-9.x-MacOSX/692/, https://jenkins.thetaphi.de/job/Lucene-main-MacOSX/8177/