[GitHub] [lucene] wjp719 commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
wjp719 commented on code in PR #780: URL: https://github.com/apache/lucene/pull/780#discussion_r842427168 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -94,6 +95,7 @@ public void disableSkipping() { private long iteratorCost; private int maxDocVisited = -1; private int updateCounter = 0; +private boolean disableSegmentInternalSkip = false; Review Comment: Ok, we should keep skipping, it help more than the skipping overhead -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
jpountz commented on code in PR #780: URL: https://github.com/apache/lucene/pull/780#discussion_r842434529 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -94,6 +95,7 @@ public void disableSkipping() { private long iteratorCost; private int maxDocVisited = -1; private int updateCounter = 0; +private boolean disableSegmentInternalSkip = false; Review Comment: OK, then my suggestion would be to close this PR if that works for you too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] wjp719 closed pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
wjp719 closed pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction URL: https://github.com/apache/lucene/pull/780 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] wjp719 commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
wjp719 commented on code in PR #780: URL: https://github.com/apache/lucene/pull/780#discussion_r842438315 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -94,6 +95,7 @@ public void disableSkipping() { private long iteratorCost; private int maxDocVisited = -1; private int updateCounter = 0; +private boolean disableSegmentInternalSkip = false; Review Comment: I will close this pr. thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery
jpountz merged PR #731: URL: https://github.com/apache/lucene/pull/731 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10456) Implement Weight#count for MultiRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517272#comment-17517272 ] ASF subversion and git services commented on LUCENE-10456: -- Commit 898ec1659d07d4ea953b8eaf6332d1a036b151e4 in lucene's branch refs/heads/main from xiaoping [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=898ec1659d0 ] LUCENE-10456: Implement Weight#count for MultiRangeQuery (#731) > Implement Weight#count for MultiRangeQuery > -- > > Key: LUCENE-10456 > URL: https://issues.apache.org/jira/browse/LUCENE-10456 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: jianping weng >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > for one dimension MultiRangeQuery, we can firstly merge overlapping ranges > to have some unconnected ranges, then the doc count of this multiRangeQuery > is the sum of each doc count of these unconnected range. For each unconnected > range, we can take advantage of PointRangeQuery count() method. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10456) Implement Weight#count for MultiRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10456. --- Fix Version/s: 9.2 Resolution: Fixed > Implement Weight#count for MultiRangeQuery > -- > > Key: LUCENE-10456 > URL: https://issues.apache.org/jira/browse/LUCENE-10456 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: jianping weng >Priority: Major > Fix For: 9.2 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > for one dimension MultiRangeQuery, we can firstly merge overlapping ranges > to have some unconnected ranges, then the doc count of this multiRangeQuery > is the sum of each doc count of these unconnected range. For each unconnected > range, we can take advantage of PointRangeQuery count() method. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] pquentin opened a new pull request, #784: LUCENE-10085: Fix flaky test by always deleting docs
pquentin opened a new pull request, #784: URL: https://github.com/apache/lucene/pull/784 As noticed by @jtibshirani in https://issues.apache.org/jira/browse/LUCENE-10085?focusedCommentId=17516122&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17516122 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn commented on a diff in pull request #767: LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery
zacharymorn commented on code in PR #767: URL: https://github.com/apache/lucene/pull/767#discussion_r842463582 ## lucene/core/src/test/org/apache/lucene/search/TestFieldExistsQuery.java: ## @@ -65,20 +65,21 @@ public void testDocValuesRewriteWithTermsPresent() throws IOException { dir.close(); } - public void testDocValuesRewriteWithPointValuesPresent() throws IOException { + public void testDocValuesRewriteWithDocValuesPresent() throws IOException { Directory dir = newDirectory(); RandomIndexWriter iw = new RandomIndexWriter(random(), dir); final int numDocs = atLeast(100); for (int i = 0; i < numDocs; ++i) { Document doc = new Document(); - doc.add(new BinaryPoint("dim", new byte[4], new byte[4])); + doc.add(new DoubleDocValuesField("f", 2.0)); + doc.add(new StringField("f", random().nextBoolean() ? "yes" : "no", Store.NO)); Review Comment: Updated in https://github.com/apache/lucene/pull/767/commits/61fedf349fd56d0a49cbb076bdbc7f13919d30f1 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn commented on pull request #767: LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery
zacharymorn commented on PR #767: URL: https://github.com/apache/lucene/pull/767#issuecomment-1088372398 > Thank you! No problem, thanks @jpountz for all the reviews and suggestions as well! I'll open a follow-up PR after merging this as you suggested to remove the deprecated classes for `main`. @mikemccand, please let me know if you have further feedback for this PR as well. I plan to merge this in the new few days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10456) Implement Weight#count for MultiRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517290#comment-17517290 ] ASF subversion and git services commented on LUCENE-10456: -- Commit c4da9adcfeb35426d056d1b2df20d42e6d7fa127 in lucene's branch refs/heads/branch_9x from xiaoping [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c4da9adcfeb ] LUCENE-10456: Implement Weight#count for MultiRangeQuery (#731) > Implement Weight#count for MultiRangeQuery > -- > > Key: LUCENE-10456 > URL: https://issues.apache.org/jira/browse/LUCENE-10456 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: jianping weng >Priority: Major > Fix For: 9.2 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > for one dimension MultiRangeQuery, we can firstly merge overlapping ranges > to have some unconnected ranges, then the doc count of this multiRangeQuery > is the sum of each doc count of these unconnected range. For each unconnected > range, we can take advantage of PointRangeQuery count() method. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #761: LUCENE-10002: replace TopFieldCollector usages in tests with collector manager
jpountz merged PR #761: URL: https://github.com/apache/lucene/pull/761 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna opened a new pull request, #785: LUCENE-10002: move MemoryIndex to search(Query, CollectorManager)
javanna opened a new pull request, #785: URL: https://github.com/apache/lucene/pull/785 MemoryIndex exposes a search method that returns the score of the matching doc if it does match, otherwise 0.0f. It internally uses a custom collector and calls IndexSearcher#search(Query, Collector). We can easily replace that with a collector manager, that does not need to handle concurrency as the memory index will only ever hold a single document. - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [x] I have run `./gradlew check`. - [ ] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517325#comment-17517325 ] Balmukund Mandal commented on LUCENE-9625: -- [~sokolov] As you've mentioned that jars should be copied but not very clear that which all Lucene jars we've to copy? > Benchmark KNN search with ann-benchmarks > > > Key: LUCENE-9625 > URL: https://issues.apache.org/jira/browse/LUCENE-9625 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In addition to benchmarking with luceneutil, it would be good to be able to > make use of ann-benchmarks, which is publishing results from many approximate > knn algorithms, including the hnsw implementation from its authors. We don't > expect to challenge the performance of these native code libraries, however > it would be good to know just how far off we are. > I started looking into this and posted a fork of ann-benchmarks that uses > KnnGraphTester class to run these: > https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to > manually copy jars and the KnnGraphTester.class to the test host machine > rather than downloading from a distribution. KnnGraphTester needs some > modifications in order to support this process - this issue is mostly about > that. > One thing I noticed is that some of the index builds with higher fanout > (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is > concerning and I'll open a separate issue for trying to improve that. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang commented on pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
LuXugang commented on PR #649: URL: https://github.com/apache/lucene/pull/649#issuecomment-1088480302 > I wonder if it would be a better trade-off to keep ints uncompressed, but read them from disk directly instead of loading giant arrays in memory? Or possibly switch to something like DirectMonotonicReader if it doesn't slow down searches. @jpountz, could we use IndexedDISI to store docIds and DirectMonotonicWriter to store `ordToDoc` mapping like doc values did. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10499) eliminate unnecessary copy data overhead when grow array size
jianping weng created LUCENE-10499: -- Summary: eliminate unnecessary copy data overhead when grow array size Key: LUCENE-10499 URL: https://issues.apache.org/jira/browse/LUCENE-10499 Project: Lucene - Core Issue Type: Improvement Reporter: jianping weng ArrayUtil#grow will copy origin array data to the new array after growing size in default, but in many places, the new array needn't the origin data. we should eliminate the extra copy data overhead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] wjp719 opened a new pull request, #786: LUCENE-10499: eliminate unnecessary copy data overhead when growing array size
wjp719 opened a new pull request, #786: URL: https://github.com/apache/lucene/pull/786 ArrayUtil#grow will copy origin array data to the new array after growing size in default, but in many places, the new array needn't the origin data. This PR eliminates the extra copy data overhead. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `main` branch. - [ ] I have run `./gradlew check`. - [ ] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10499) reduce unnecessary copy data overhead when grow array size
[ https://issues.apache.org/jira/browse/LUCENE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10499: --- Summary: reduce unnecessary copy data overhead when grow array size (was: eliminate unnecessary copy data overhead when grow array size) > reduce unnecessary copy data overhead when grow array size > -- > > Key: LUCENE-10499 > URL: https://issues.apache.org/jira/browse/LUCENE-10499 > Project: Lucene - Core > Issue Type: Improvement >Reporter: jianping weng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > ArrayUtil#grow will copy origin array data to the new array after growing > size in default, but in many places, the new array needn't the origin data. > we should eliminate the extra copy data overhead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10499) reduce unnecessary copy data overhead when grow array size
[ https://issues.apache.org/jira/browse/LUCENE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10499: --- Description: ArrayUtil#grow will copy origin array data to the new array after growing size in default, but in many places, the new array needn't the origin data. we should reduce the extra copy data overhead (was: ArrayUtil#grow will copy origin array data to the new array after growing size in default, but in many places, the new array needn't the origin data. we should eliminate the extra copy data overhead) > reduce unnecessary copy data overhead when grow array size > -- > > Key: LUCENE-10499 > URL: https://issues.apache.org/jira/browse/LUCENE-10499 > Project: Lucene - Core > Issue Type: Improvement >Reporter: jianping weng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > ArrayUtil#grow will copy origin array data to the new array after growing > size in default, but in many places, the new array needn't the origin data. > we should reduce the extra copy data overhead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #786: LUCENE-10499: reduce unnecessary copy data overhead when growing array size
rmuir commented on PR #786: URL: https://github.com/apache/lucene/pull/786#issuecomment-1088581156 Can we instead add a new method to `ArrayUtil`, with a different name than `grow`, that doesn't copy data? I don't think we should add a boolean parameter to grow that gives it radically different behavior. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #783: LUCENE-10497: add a base Token class to analysis-common
mocobeta commented on PR #783: URL: https://github.com/apache/lucene/pull/783#issuecomment-1088582275 I checked that this preserves the current behavior. I'd merge this (only in main) now and will refine it in the following PRs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta merged pull request #783: LUCENE-10497: add a base Token class to analysis-common
mocobeta merged PR #783: URL: https://github.com/apache/lucene/pull/783 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10497) Unify "Token" interface in Kuromoji and Nori
[ https://issues.apache.org/jira/browse/LUCENE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517386#comment-17517386 ] ASF subversion and git services commented on LUCENE-10497: -- Commit bb4a0dc19ba911ac488e3ea7d6fcb3b17a38832f in lucene's branch refs/heads/main from Tomoko Uchida [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=bb4a0dc19ba ] LUCENE-10497: Add a base Token class to analysis-common (for kuromoji and nori) (#783) > Unify "Token" interface in Kuromoji and Nori > > > Key: LUCENE-10497 > URL: https://issues.apache.org/jira/browse/LUCENE-10497 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > "Token" classes are the output of the viterbi algorithm in kuromoji and nori, > and its implementation is slightly different between them (Nori's Token class > looks more expressive or extendable). > To proceed [LUCENE-10493], I think we need the unified "Token" class or at > least a common interface for it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] wjp719 commented on pull request #786: LUCENE-10499: reduce unnecessary copy data overhead when growing array size
wjp719 commented on PR #786: URL: https://github.com/apache/lucene/pull/786#issuecomment-1088618967 > Can we instead add a new method to `ArrayUtil`, with a different name than `grow`, that doesn't copy data? > > I don't think we should add a boolean parameter to grow that gives it radically different behavior. @rmuir Hi, I add ArrayUtil#growSizeOnly method that only grow size and not copy array data, please review again, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
jpountz commented on PR #649: URL: https://github.com/apache/lucene/pull/649#issuecomment-1088641170 Yes, something like that sounds like a good fit to store the ordToDoc mapping indeed :+1: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7522) Make the Lucene jar an OSGi bundle
[ https://issues.apache.org/jira/browse/LUCENE-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517425#comment-17517425 ] Alexander Kurtakov commented on LUCENE-7522: Is there hope for this one? > Make the Lucene jar an OSGi bundle > -- > > Key: LUCENE-7522 > URL: https://issues.apache.org/jira/browse/LUCENE-7522 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Affects Versions: 6.2.1 >Reporter: Michal Hlavac >Priority: Major > > Add support for OSGi. LUCENE-1344 added this feature to previous versions, > but now lucene jars are not OSGi bundles. There are OSGi bundles from > Servicemix, but I think lucene should add this feature. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna opened a new pull request, #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests
javanna opened a new pull request, #787: URL: https://github.com/apache/lucene/pull/787 This commit replaces more usages of search(Query, Collector) with calling the corresponding search(Query, CollectorManager) instead. This round focuses on tests that implement custom collector, that need a corresponding collector manager. - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [x] I have run `./gradlew check`. - [ ] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests
javanna commented on code in PR #787: URL: https://github.com/apache/lucene/pull/787#discussion_r842783772 ## lucene/core/src/test/org/apache/lucene/search/TestMultiCollector.java: ## @@ -99,7 +100,7 @@ public void testCollectionTerminatedExceptionHandling() throws IOException { } final IndexReader reader = w.getReader(); w.close(); - final IndexSearcher searcher = newSearcher(reader); + final IndexSearcher searcher = newSearcher(reader, true, true, false); Review Comment: In this case, it did not feel necessary to write a collector manager able to handle concurrency. I went for executing the test from a single threaded searcher instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests
javanna commented on code in PR #787: URL: https://github.com/apache/lucene/pull/787#discussion_r842785447 ## lucene/core/src/test/org/apache/lucene/search/TestTotalHitCountCollector.java: ## @@ -38,10 +38,10 @@ public void testBasics() throws Exception { IndexReader reader = writer.getReader(); writer.close(); -IndexSearcher searcher = newSearcher(reader); -TotalHitCountCollector c = new TotalHitCountCollector(); -searcher.search(new MatchAllDocsQuery(), c); -assertEquals(5, c.getTotalHits()); +IndexSearcher searcher = newSearcher(reader, true, true, random().nextBoolean()); +TotalHitCountCollectorManager collectorManager = new TotalHitCountCollectorManager(); +int totalHits = searcher.search(new MatchAllDocsQuery(), collectorManager); +assertEquals(5, totalHits); Review Comment: here, instead of having a separate test class for the collector manager, I merged the two and used a collector manager in the test class for the collector. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna commented on a diff in pull request #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests
javanna commented on code in PR #787: URL: https://github.com/apache/lucene/pull/787#discussion_r842786871 ## lucene/queryparser/src/test/org/apache/lucene/queryparser/surround/query/BooleanQueryTestFacade.java: ## @@ -121,14 +125,24 @@ public void doTest() throws Exception { Query query = lq.makeLuceneQueryField(fieldName, qf); /* if (verbose) System.out.println("Lucene: " + query.toString()); */ -TestCollector tc = new TestCollector(); -IndexReader reader = DirectoryReader.open(dBase.getDb()); -IndexSearcher searcher = new IndexSearcher(reader); -try { - searcher.search(query, tc); -} finally { - reader.close(); +SetOnce collector = new SetOnce<>(); +try (IndexReader reader = DirectoryReader.open(dBase.getDb())) { + IndexSearcher searcher = new IndexSearcher(reader); Review Comment: no executor is set to the searcher, hence it does not seem worthwhile to write a collector manager that handles concurrency. I am even wondering if enforcing single thread through the setonce is worthwhile -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517471#comment-17517471 ] Michael Sokolov commented on LUCENE-9625: - > As you've mentioned that jars should be copied but not very clear that which > all Lucene jars we've to copy? I think lucene-core should be enough. What have you tried? > Benchmark KNN search with ann-benchmarks > > > Key: LUCENE-9625 > URL: https://issues.apache.org/jira/browse/LUCENE-9625 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In addition to benchmarking with luceneutil, it would be good to be able to > make use of ann-benchmarks, which is publishing results from many approximate > knn algorithms, including the hnsw implementation from its authors. We don't > expect to challenge the performance of these native code libraries, however > it would be good to know just how far off we are. > I started looking into this and posted a fork of ann-benchmarks that uses > KnnGraphTester class to run these: > https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to > manually copy jars and the KnnGraphTester.class to the test host machine > rather than downloading from a distribution. KnnGraphTester needs some > modifications in order to support this process - this issue is mostly about > that. > One thing I noticed is that some of the index builds with higher fanout > (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is > concerning and I'll open a separate issue for trying to improve that. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517488#comment-17517488 ] Balmukund Mandal commented on LUCENE-9625: -- Thank you Michael for your response. Even, i tried the same but getting below error. Also, i copied jar and KnnGraphTester.class files into lib and classes folder which are created inside algorithms folder. Error: Unable to initialize main class org.apache.lucene.util.hnsw.KnnGraphTester Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/util/hnsw/KnnGraphTester$1 > Benchmark KNN search with ann-benchmarks > > > Key: LUCENE-9625 > URL: https://issues.apache.org/jira/browse/LUCENE-9625 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In addition to benchmarking with luceneutil, it would be good to be able to > make use of ann-benchmarks, which is publishing results from many approximate > knn algorithms, including the hnsw implementation from its authors. We don't > expect to challenge the performance of these native code libraries, however > it would be good to know just how far off we are. > I started looking into this and posted a fork of ann-benchmarks that uses > KnnGraphTester class to run these: > https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to > manually copy jars and the KnnGraphTester.class to the test host machine > rather than downloading from a distribution. KnnGraphTester needs some > modifications in order to support this process - this issue is mostly about > that. > One thing I noticed is that some of the index builds with higher fanout > (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is > concerning and I'll open a separate issue for trying to improve that. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517495#comment-17517495 ] Michael Sokolov commented on LUCENE-9625: - So, clearly it's not finding KnnGraphTester. Probably it needs to go in the proper folder structure (org/apache/lucene/util/KnnGraphTester.class) relative to the classpath root. TBH I don't remember the way files are set up in this ann-benchmarks environment, but I hope that helps you figure it out! > Benchmark KNN search with ann-benchmarks > > > Key: LUCENE-9625 > URL: https://issues.apache.org/jira/browse/LUCENE-9625 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In addition to benchmarking with luceneutil, it would be good to be able to > make use of ann-benchmarks, which is publishing results from many approximate > knn algorithms, including the hnsw implementation from its authors. We don't > expect to challenge the performance of these native code libraries, however > it would be good to know just how far off we are. > I started looking into this and posted a fork of ann-benchmarks that uses > KnnGraphTester class to run these: > https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to > manually copy jars and the KnnGraphTester.class to the test host machine > rather than downloading from a distribution. KnnGraphTester needs some > modifications in order to support this process - this issue is mostly about > that. > One thing I noticed is that some of the index builds with higher fanout > (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is > concerning and I'll open a separate issue for trying to improve that. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges
iverase commented on PR #756: URL: https://github.com/apache/lucene/pull/756#issuecomment-1088873741 Thanks @yinux! your dataset is very interesting because it seems your geometries have many parallel edges which are challenging the tessellator. In order to fix that I went back to the PR that changed the behaviour of the tessellator. In that PR we added the capability of labelling the edges of the triangles to know if they belong to the original polygon or they have been introduced by the tessellation process. One side effect is that the method `filterPoints` which removes collinear points is more picky now because it can only remove collinear points when they have the same label (belonging to the original polygon or not). In this case the issue was a collinear edge where one of the edges was returning over the previous edge and had a point on top of that edge. We can actually remove that collinearity as well so we end up with only one edge. I added that change in line 1306 and now all test are passing. I run this version over my dataset (over 400 million polygons) and all passed too. Let me know if you find more issues. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller merged pull request #751: LUCENE-10467: Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
gsmiller merged PR #751: URL: https://github.com/apache/lucene/pull/751 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
[ https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517542#comment-17517542 ] ASF subversion and git services commented on LUCENE-10467: -- Commit 6b82e600a8f8e2dcfd182bcaf58a3434e9a05b3f in lucene's branch refs/heads/main from Yuting Gan [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6b82e600a8f ] LUCENE-10467: Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0 (#751) > Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0 > -- > > Key: LUCENE-10467 > URL: https://issues.apache.org/jira/browse/LUCENE-10467 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Yuting Gan >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, there are different behaviors from subclass that implements and > overrides getAllDims and getTopChildren when passing in an invalid TopN > parameter (topN <= 0). Some overridden implementations throw a > NullPointerException, some throw an IllegalArgumentException, and others > throw no exception. > It would provide a better user experience by consistently throwing an > IllegalArgumentException when requesting topN <= 0 for these two > functionalities across all implementations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
[ https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517545#comment-17517545 ] ASF subversion and git services commented on LUCENE-10467: -- Commit a071180a806d1bb7ae11ae30a07e43e452bea810 in lucene's branch refs/heads/main from Greg Miller [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a071180a806 ] Add CHANGES entry for LUCENE-10467 > Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0 > -- > > Key: LUCENE-10467 > URL: https://issues.apache.org/jira/browse/LUCENE-10467 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Yuting Gan >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, there are different behaviors from subclass that implements and > overrides getAllDims and getTopChildren when passing in an invalid TopN > parameter (topN <= 0). Some overridden implementations throw a > NullPointerException, some throw an IllegalArgumentException, and others > throw no exception. > It would provide a better user experience by consistently throwing an > IllegalArgumentException when requesting topN <= 0 for these two > functionalities across all implementations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
[ https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517557#comment-17517557 ] ASF subversion and git services commented on LUCENE-10467: -- Commit d1bd59f6bc6effe04f802e0d9a8f88fc7920f2cd in lucene's branch refs/heads/branch_9x from Yuting Gan [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d1bd59f6bc6 ] LUCENE-10467: Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0 > Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0 > -- > > Key: LUCENE-10467 > URL: https://issues.apache.org/jira/browse/LUCENE-10467 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Yuting Gan >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, there are different behaviors from subclass that implements and > overrides getAllDims and getTopChildren when passing in an invalid TopN > parameter (topN <= 0). Some overridden implementations throw a > NullPointerException, some throw an IllegalArgumentException, and others > throw no exception. > It would provide a better user experience by consistently throwing an > IllegalArgumentException when requesting topN <= 0 for these two > functionalities across all implementations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
[ https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller resolved LUCENE-10467. -- Fix Version/s: 9.2 Resolution: Fixed Merged and backported. Thanks [~yutinggan]! > Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0 > -- > > Key: LUCENE-10467 > URL: https://issues.apache.org/jira/browse/LUCENE-10467 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Yuting Gan >Priority: Minor > Fix For: 9.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, there are different behaviors from subclass that implements and > overrides getAllDims and getTopChildren when passing in an invalid TopN > parameter (topN <= 0). Some overridden implementations throw a > NullPointerException, some throw an IllegalArgumentException, and others > throw no exception. > It would provide a better user experience by consistently throwing an > IllegalArgumentException when requesting topN <= 0 for these two > functionalities across all implementations. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a diff in pull request #778: LUCENE-10495: Fix bug in TaxonomyFacets
gsmiller commented on code in PR #778: URL: https://github.com/apache/lucene/pull/778#discussion_r843051837 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyFacets.java: ## @@ -109,7 +109,7 @@ public boolean childrenLoaded() { * @lucene.experimental */ public boolean siblingsLoaded() { -return children != null; +return siblings != null; Review Comment: Oh, that does look like a silly bug. Can you create a unit test that actually demonstrates the bug and then validates the fix? This looks super obvious but it's a nice opportunity to make our testing a little better to avoid a future regression. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani merged pull request #784: LUCENE-10085: Fix flaky test by always deleting docs
jtibshirani merged PR #784: URL: https://github.com/apache/lucene/pull/784 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517595#comment-17517595 ] ASF subversion and git services commented on LUCENE-10085: -- Commit 6062ba0b3b80085f07fc66b32a3f32dc32a78acd in lucene's branch refs/heads/main from Quentin Pradet [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6062ba0b3b8 ] LUCENE-10085: Fix rare failure in TestDocValuesFieldExistsQuery (#784) In rare cases, this test could delete all documents and cause a failure. > Implement Weight#count on DocValuesFieldExistsQuery > --- > > Key: LUCENE-10085 > URL: https://issues.apache.org/jira/browse/LUCENE-10085 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could implement {{Weight#count}} to return docCount if either terms or points > are indexed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517607#comment-17517607 ] ASF subversion and git services commented on LUCENE-10085: -- Commit be1c4850441f7cff07f5ce20c9ed4ad203cabb21 in lucene's branch refs/heads/branch_9x from Quentin Pradet [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=be1c4850441 ] LUCENE-10085: Fix rare failure in TestDocValuesFieldExistsQuery (#784) In rare cases, this test could delete all documents and cause a failure. > Implement Weight#count on DocValuesFieldExistsQuery > --- > > Key: LUCENE-10085 > URL: https://issues.apache.org/jira/browse/LUCENE-10085 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 9.1 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Now that we require all documents to use the same features (LUCENE-9334) we > could implement {{Weight#count}} to return docCount if either terms or points > are indexed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10500) StringValueFacetCounts relies on sequential collection
Luca Cavanna created LUCENE-10500: - Summary: StringValueFacetCounts relies on sequential collection Key: LUCENE-10500 URL: https://issues.apache.org/jira/browse/LUCENE-10500 Project: Lucene - Core Issue Type: Bug Reporter: Luca Cavanna We recently moved some of the facets tests to use IndexSearcher#search(Query, CollectorManager) providing a FacetsCollectorManager instead of a FacetsCollector. Whenever newIndexSearcher(IndexReader) is used in tests, concurrent search may now be exercised while it was not before. This caused some build failures on TestStringValueFacetCounts: {code:java} java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1 at __randomizedtesting.SeedInfo.seed([ED8BF8281FCE5C02:9FC7DD27AEAEEA71]:0) at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.util.packed.Packed64.get(Packed64.java:81) at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.OrdinalMap$2.get(OrdinalMap.java:346) at org.apache.lucene.facet.StringValueFacetCounts.countOneSegment(StringValueFacetCounts.java:440) at org.apache.lucene.facet.StringValueFacetCounts.count(StringValueFacetCounts.java:295) at org.apache.lucene.facet.StringValueFacetCounts.(StringValueFacetCounts.java:123) at org.apache.lucene.facet.TestStringValueFacetCounts.checkFacetResult(TestStringValueFacetCounts.java:349) at org.apache.lucene.facet.TestStringValueFacetCounts.testRandom(TestStringValueFacetCounts.java:325) {code} This looks like a real bug, as StringValueFacetCounts#countOneSegment is called once providing the index of the current loop instead of the ordinal taken from the matching hits that we are analyzing. That works fine with single threaded collection as we will go sequentially and the two indices will always be the same. With multi-threaded search, the order of the returned matching hits (one per segment) is not deterministic. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] javanna opened a new pull request, #788: LUCENE-10500: StringValueFacetCounts to not rely on sequential collection
javanna opened a new pull request, #788: URL: https://github.com/apache/lucene/pull/788 StringValueFacetCounts should use the segment ordinal instead of the current index when looping through the matching hits, as when search is multi-threaded the order of the matching hits (one per segment) is not deterministic. - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [x] I have run `./gradlew check`. - [ ] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #788: LUCENE-10500: StringValueFacetCounts to not rely on sequential collection
jpountz merged PR #788: URL: https://github.com/apache/lucene/pull/788 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10500) StringValueFacetCounts relies on sequential collection
[ https://issues.apache.org/jira/browse/LUCENE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517693#comment-17517693 ] ASF subversion and git services commented on LUCENE-10500: -- Commit 796a19b457691249b0f3c2fc837f82a378d1d2c6 in lucene's branch refs/heads/main from Luca Cavanna [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=796a19b4576 ] LUCENE-10500: StringValueFacetCounts to not rely on sequential collection (#788) StringValueFacetCounts should use the segment ordinal instead of the current index when looping through the matching hits, as when search is multi-threaded the order of the matching hits (one per segment) is not deterministic. > StringValueFacetCounts relies on sequential collection > -- > > Key: LUCENE-10500 > URL: https://issues.apache.org/jira/browse/LUCENE-10500 > Project: Lucene - Core > Issue Type: Bug >Reporter: Luca Cavanna >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > We recently moved some of the facets tests to use IndexSearcher#search(Query, > CollectorManager) providing a FacetsCollectorManager instead of a > FacetsCollector. Whenever newIndexSearcher(IndexReader) is used in tests, > concurrent search may now be exercised while it was not before. > This caused some build failures on TestStringValueFacetCounts: > {code:java} > java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1 > at > __randomizedtesting.SeedInfo.seed([ED8BF8281FCE5C02:9FC7DD27AEAEEA71]:0) > at > org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.util.packed.Packed64.get(Packed64.java:81) > at > org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.OrdinalMap$2.get(OrdinalMap.java:346) > at > org.apache.lucene.facet.StringValueFacetCounts.countOneSegment(StringValueFacetCounts.java:440) > at > org.apache.lucene.facet.StringValueFacetCounts.count(StringValueFacetCounts.java:295) > at > org.apache.lucene.facet.StringValueFacetCounts.(StringValueFacetCounts.java:123) > at > org.apache.lucene.facet.TestStringValueFacetCounts.checkFacetResult(TestStringValueFacetCounts.java:349) > at > org.apache.lucene.facet.TestStringValueFacetCounts.testRandom(TestStringValueFacetCounts.java:325) > {code} > This looks like a real bug, as StringValueFacetCounts#countOneSegment is > called once providing the index of the current loop instead of the ordinal > taken from the matching hits that we are analyzing. That works fine with > single threaded collection as we will go sequentially and the two indices > will always be the same. With multi-threaded search, the order of the > returned matching hits (one per segment) is not deterministic. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10500) StringValueFacetCounts relies on sequential collection
[ https://issues.apache.org/jira/browse/LUCENE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517694#comment-17517694 ] ASF subversion and git services commented on LUCENE-10500: -- Commit d2b3141e361636a36646b3a45fc322abecd2e957 in lucene's branch refs/heads/branch_9x from Luca Cavanna [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d2b3141e361 ] LUCENE-10500: StringValueFacetCounts to not rely on sequential collection (#788) StringValueFacetCounts should use the segment ordinal instead of the current index when looping through the matching hits, as when search is multi-threaded the order of the matching hits (one per segment) is not deterministic. > StringValueFacetCounts relies on sequential collection > -- > > Key: LUCENE-10500 > URL: https://issues.apache.org/jira/browse/LUCENE-10500 > Project: Lucene - Core > Issue Type: Bug >Reporter: Luca Cavanna >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > We recently moved some of the facets tests to use IndexSearcher#search(Query, > CollectorManager) providing a FacetsCollectorManager instead of a > FacetsCollector. Whenever newIndexSearcher(IndexReader) is used in tests, > concurrent search may now be exercised while it was not before. > This caused some build failures on TestStringValueFacetCounts: > {code:java} > java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1 > at > __randomizedtesting.SeedInfo.seed([ED8BF8281FCE5C02:9FC7DD27AEAEEA71]:0) > at > org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.util.packed.Packed64.get(Packed64.java:81) > at > org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.OrdinalMap$2.get(OrdinalMap.java:346) > at > org.apache.lucene.facet.StringValueFacetCounts.countOneSegment(StringValueFacetCounts.java:440) > at > org.apache.lucene.facet.StringValueFacetCounts.count(StringValueFacetCounts.java:295) > at > org.apache.lucene.facet.StringValueFacetCounts.(StringValueFacetCounts.java:123) > at > org.apache.lucene.facet.TestStringValueFacetCounts.checkFacetResult(TestStringValueFacetCounts.java:349) > at > org.apache.lucene.facet.TestStringValueFacetCounts.testRandom(TestStringValueFacetCounts.java:325) > {code} > This looks like a real bug, as StringValueFacetCounts#countOneSegment is > called once providing the index of the current loop instead of the ordinal > taken from the matching hits that we are analyzing. That works fine with > single threaded collection as we will go sequentially and the two indices > will always be the same. With multi-threaded search, the order of the > returned matching hits (one per segment) is not deterministic. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani opened a new pull request, #789: Add release wizard step around build failures
jtibshirani opened a new pull request, #789: URL: https://github.com/apache/lucene/pull/789 This PR proposes adding a preparation step to look at bui...@lucene.apache.org and address recurring failures. This helps make sure we catch and fix known bugs before spinning the release candidate. It also prevents flaky tests from failing during the release vote (which adds confusion). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #789: Add release wizard step around build failures
jtibshirani commented on PR #789: URL: https://github.com/apache/lucene/pull/789#issuecomment-1089323655 This was inspired by my experience releasing Lucene 9.1, where addressing build failures early would've helped save us time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
zhaih commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843265236 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import java.nio.file.Path; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { Review Comment: Thank you for adding the test case, it looks awesome! ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/AlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.IOUtils; + +/** + * A modified DirectoryTaxonomyReader that always recreates a new {@link + * AlwaysRefreshDirectoryTaxonomyReader} instance when {@link + * AlwaysRefreshDirectoryTaxonomyReader#doOpenIfChanged()} is called. This enables us to easily go + * forward or backward in time by re-computing the ordinal space during each refresh. Review Comment: Maybe add a note/warn here saying this will result in an always `O(#facet_label)` taxonomy array construction time when refresh? ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843271841 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.IOUtils; +import org.junit.Test; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + @Test + /** + * Tests the expert constructors in {@link DirectoryTaxonomyReader} and checks the {@link + * DirectoryTaxonomyReader#getInternalIndexReader()} API. Also demonstrates the need for the + * constructor and the API. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { +Directory dir = newDirectory(); +DirectoryTaxonomyWriter tw = new DirectoryTaxonomyWriter(dir); +tw.addCategory(new FacetLabel("a")); +tw.commit(); + +DirectoryTaxonomyReader dtr1 = new AlwaysRefreshDirectoryTaxonomyReader(dir); +tw.addCategory(new FacetLabel("b")); +tw.commit(); +DirectoryTaxonomyReader dtr2 = dtr1.doOpenIfChanged(); Review Comment: > his is not really testing the unique capability of AlwaysRefreshDTR Yes and that was the cause of my hesitation in the first comment (I had also mentioned this in the first commit javadoc). The next commit adds the correct test case where DTR fails to pass the use case. > maybe by creating a method returning taxoArray == null? That was another way but I did not want to add another API to just be able to demonstrate a rare/expert use case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843272080 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.IOUtils; +import org.junit.Test; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { Review Comment: The next commit does that exactly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843272398 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java: ## @@ -78,10 +78,20 @@ private volatile TaxonomyIndexArrays taxoArrays; /** - * Called only from {@link #doOpenIfChanged()}. If the taxonomy has been recreated, you should - * pass {@code null} as the caches and parent/children arrays. + * Expert: Use this method to explicitly force the {@link DirectoryTaxonomyReader} to use specific + * parent/children arrays and caches. + * + * Called from {@link #doOpenIfChanged()}. If the taxonomy has been recreated, you should pass + * {@code null} as the caches and parent/children arrays. + * + * @param indexReader An indexReader that is opened in the desired Directory + * @param taxoWriter The {@link DirectoryTaxonomyWriter} from which to obtain newly added + * categories, in real-time. + * @param ordinalCache a FacetLabel to Integer ordinal mapping if it already exists + * @param categoryCache an ordinal to FacetLabel mapping if it already exists + * @param taxoArrays taxonomy arrays that store the parent, siblings, children information */ - DirectoryTaxonomyReader( + public DirectoryTaxonomyReader( Review Comment: Changed it to protected -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843275500 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import java.nio.file.Path; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { +final Path taxoPath1 = createTempDir("dir1"); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir("commit1"); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + commit1.copyFrom(dir1, file, file, IOContext.READ); +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +// using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception Review Comment: Is there a way in Java to send a class, exception type as a param to this test case and then check the error thrown? So for DTR we could "actually" verify that an AIOOB is thrown instead of me mentioning it here. Something like ``` // ClassParam is sent as a param to the test case, once as DTR and next time as AlwaysRefreshDTR final DirectoryTaxonomyReader dtr1 = new ClassParam(dir1); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843275500 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import java.nio.file.Path; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { +final Path taxoPath1 = createTempDir("dir1"); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir("commit1"); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + commit1.copyFrom(dir1, file, file, IOContext.READ); +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +// using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception Review Comment: Is there a way in Java to send a class, exception type as a param to this test case and then check the error thrown? So for DTR we could "actually" verify that an AIOOB is thrown instead of me mentioning it here. Something like ``` // DTRClass is sent as a param to the test case, once as DTR and next time as AlwaysRefreshDTR final DirectoryTaxonomyReader dtr1 = new DTRClass(dir1); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843272398 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java: ## @@ -78,10 +78,20 @@ private volatile TaxonomyIndexArrays taxoArrays; /** - * Called only from {@link #doOpenIfChanged()}. If the taxonomy has been recreated, you should - * pass {@code null} as the caches and parent/children arrays. + * Expert: Use this method to explicitly force the {@link DirectoryTaxonomyReader} to use specific + * parent/children arrays and caches. + * + * Called from {@link #doOpenIfChanged()}. If the taxonomy has been recreated, you should pass + * {@code null} as the caches and parent/children arrays. + * + * @param indexReader An indexReader that is opened in the desired Directory + * @param taxoWriter The {@link DirectoryTaxonomyWriter} from which to obtain newly added + * categories, in real-time. + * @param ordinalCache a FacetLabel to Integer ordinal mapping if it already exists + * @param categoryCache an ordinal to FacetLabel mapping if it already exists + * @param taxoArrays taxonomy arrays that store the parent, siblings, children information */ - DirectoryTaxonomyReader( + public DirectoryTaxonomyReader( Review Comment: Changed it to protected to enable subclasses to be able to customize their usage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10501) StackOverflow when RegExp encounters a very large string
Kartik Ganesh created LUCENE-10501: -- Summary: StackOverflow when RegExp encounters a very large string Key: LUCENE-10501 URL: https://issues.apache.org/jira/browse/LUCENE-10501 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 9.1 Reporter: Kartik Ganesh When RegExp encounters a very large string, it hits a Stack Overflow exception when parsing it. Simple program to repro: {{$ ls}} {{RegExpTest.java lucene-core-9.1.0.jar}} {{$ cat RegExpTest.java}} {{class RegExpTest {}} {{ public static void main(String[] args) {}} {{ StringBuilder strBuilder = new StringBuilder();}} {{ for (int i = 0; i < 5; i++) {}} {{ strBuilder.append("a");}} {{ }}} {{ try {}} {{ new org.apache.lucene.util.automaton.RegExp(strBuilder.toString());}} {{ } catch (StackOverflowError e) {}} {{ System.out.println("Stack overflow");}} {{ System.exit(-1);}} {{ }}} {{ System.out.println("Success");}} {{ }}} {{}}} {{$ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java}} {{$ java -cp './lucene-core-9.1.0.jar:.' RegExpTest}} {{Stack overflow}} {{$ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest}} {{Success}} Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to be due to the recursive parsing implementation. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #788: LUCENE-10500: StringValueFacetCounts to not rely on sequential collection
gsmiller commented on PR #788: URL: https://github.com/apache/lucene/pull/788#issuecomment-1089423815 Oh no. Good find and thanks for fixing! (and apologies for slipping that bug in) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10501) StackOverflow when RegExp encounters a very large string
[ https://issues.apache.org/jira/browse/LUCENE-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kartik Ganesh updated LUCENE-10501: --- Description: When RegExp encounters a very large string, it hits a Stack Overflow exception when parsing it. Simple program to repro: {noformat} $ ls RegExpTest.java lucene-core-9.1.0.jar $ cat RegExpTest.java class RegExpTest { public static void main(String[] args) { StringBuilder strBuilder = new StringBuilder(); for (int i = 0; i < 5; i++) { strBuilder.append("a"); } try { new org.apache.lucene.util.automaton.RegExp(strBuilder.toString()); } catch (StackOverflowError e) { System.out.println("Stack overflow"); System.exit(-1); } System.out.println("Success"); } } $ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java $ java -cp './lucene-core-9.1.0.jar:.' RegExpTest Stack overflow $ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest Success{noformat} Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to be due to the recursive parsing implementation. was: When RegExp encounters a very large string, it hits a Stack Overflow exception when parsing it. Simple program to repro: {{$ ls}} {{RegExpTest.java lucene-core-9.1.0.jar}} {{$ cat RegExpTest.java}} {{class RegExpTest {}} {{ public static void main(String[] args) {}} {{ StringBuilder strBuilder = new StringBuilder();}} {{ for (int i = 0; i < 5; i++) {}} {{ strBuilder.append("a");}} {{ }}} {{ try {}} {{ new org.apache.lucene.util.automaton.RegExp(strBuilder.toString());}} {{ } catch (StackOverflowError e) {}} {{ System.out.println("Stack overflow");}} {{ System.exit(-1);}} {{ }}} {{ System.out.println("Success");}} {{ }}} {{}}} {{$ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java}} {{$ java -cp './lucene-core-9.1.0.jar:.' RegExpTest}} {{Stack overflow}} {{$ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest}} {{Success}} Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to be due to the recursive parsing implementation. > StackOverflow when RegExp encounters a very large string > > > Key: LUCENE-10501 > URL: https://issues.apache.org/jira/browse/LUCENE-10501 > Project: Lucene - Core > Issue Type: Bug > Components: core/queryparser >Affects Versions: 9.1 >Reporter: Kartik Ganesh >Priority: Major > > When RegExp encounters a very large string, it hits a Stack Overflow > exception when parsing it. > Simple program to repro: > {noformat} > $ ls > RegExpTest.java lucene-core-9.1.0.jar > $ cat RegExpTest.java > class RegExpTest { > public static void main(String[] args) { > StringBuilder strBuilder = new StringBuilder(); > for (int i = 0; i < 5; i++) { > strBuilder.append("a"); > } > try { > new > org.apache.lucene.util.automaton.RegExp(strBuilder.toString()); > } catch (StackOverflowError e) { > System.out.println("Stack overflow"); > System.exit(-1); > } > System.out.println("Success"); > } > } > $ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java > $ java -cp './lucene-core-9.1.0.jar:.' RegExpTest > Stack overflow > $ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest > Success{noformat} > Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to > be due to the recursive parsing implementation. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843312709 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import java.nio.file.Path; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { +final Path taxoPath1 = createTempDir("dir1"); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir("commit1"); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + commit1.copyFrom(dir1, file, file, IOContext.READ); +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +// using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception Review Comment: I am also okay with just plainly duplicating all the code and creating a new test case. It is important that we have a test case that tests if an AIOOB is thrown. What do other people think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
[ https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Worah updated LUCENE-10482: -- Status: Patch Available (was: Open) > Allow users to create their own DirectoryTaxonomyReaders with empty > taxoArrays instead of letting the taxoEpoch decide > -- > > Key: LUCENE-10482 > URL: https://issues.apache.org/jira/browse/LUCENE-10482 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 9.1 >Reporter: Gautam Worah >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} > in my day job where we were trying to replace the index underneath a reader > asynchronously and then call the {{doOpenIfChanged}} call on it. > It turns out that the taxonomy index uses its own index based counter (the > {{{}taxonomyIndexEpoch{}}}) to determine if the index was opened in write > mode after the last time it was written and if not, it directly tries to > reuse the previous {{taxoArrays}} it had created. This logic fails in a > scenario where both the old and new index were opened just once but the index > itself is completely different in both the cases. > In such a case, it would be good to give the user the flexibility to inform > the DTR to recreate its {{{}taxoArrays{}}}, {{ordinalCache}} and > {{{}categoryCache{}}} (not refreshing these arrays causes it to fail in > various ways). Luckily, such a constructor already exists! But it is private > today! The idea here is to allow subclasses of DTR to use this constructor. > Curious to see what other folks think about this idea. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
zhaih commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843318155 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import java.nio.file.Path; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { +final Path taxoPath1 = createTempDir("dir1"); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir("commit1"); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + commit1.copyFrom(dir1, file, file, IOContext.READ); +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +// using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception Review Comment: I believe you can use `expectThrows`? https://lucene.apache.org/core/7_3_1/test-framework/org/apache/lucene/util/LuceneTestCase.html#expectThrows-java.lang.Class-org.apache.lucene.util.LuceneTestCase.ThrowingRunnable- -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
[ https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated LUCENE-10292: Attachment: LUCENE-10292.patch Assignee: Chris M. Hostetter Status: Open (was: Open) {quote}seems like it should be possible/better to change {{AnalyzingInfixSuggester.build()}} so that the {{searcherMgr}} is only repalced *after* we build the new index (and/or stop using a new {{IndexWriter}} on every {{AnalyzingInfixSuggester.build()}} call and just do a {{writer.deleteAll()}} instead?) {quote} Digging into the code a bit more i realized: * Using {{writer.deleteAll()}} isn't really a viable "fix" because (by default) the writer doesn't exist (or is closed after {{build()}} * Another aspect of the existing impl is that if thread T1 enters the {{lookup()}} method and makes it past the {{if (searcherMgr == null)}} check, but then another thread T2 calls {{build()}} before the T1 reaches the synchronized block, then T1 will be blocked for the entire durration of the {{build()}} ... which is certainly less then ideal. I worked up a patch with a test case to try and demonstrate the problem, as well as a fix -- note that as written the test case won't always "fail" cleaning with the existing impl, it can also deadlock because of how I'm using a Semaphore to "slow" down the {{build()}} -- triggering the blocked lookup thread situation described above. The basic idea of the fix s to decouple the synchronization/locking needed for changing the writer from the synchronization/locking needed to change the SearchManager. and replace the SearchManager only once the {{build()}} is complete. I originally tried to replace the "R/W" locking of SearcherManager with an AtomicReference so we wouldn't need to have any synchronization blocks in {{lookup()}} at all; but I couldn't figure out a "safe" way to do that w/o ref counting the SearcherManager itself (to ensure that no build/add calls close the SearcherManager that {{lookup()}} gets a ref to before it has a change to {{acquire()}} on it (thank you {{testRandomNRT}} for catching that for me) Feedback on this approach (or suggestions for better ways to solve this) welcome > AnalyzingInfixSuggester thread safety: lookup() fails during (re)build() > > > Key: LUCENE-10292 > URL: https://issues.apache.org/jira/browse/LUCENE-10292 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: LUCENE-10292.patch > > > I'm filing this based on anecdotal information from a Solr user w/o > experiencing it first hand (and I don't have a test case to demonstrate it) > but based on a reading of the code the underlying problem seems self > evident... > With all other Lookup implementations I've examined, it is possible to call > {{lookup()}} regardless of whether another thread is concurrently calling > {{build()}} – in all cases I've seen, it is even possible to call > {{lookup()}} even if {{build()}} has never been called: the result is just an > "empty" {{List}} > Typically this is works because the {{build()}} method uses temporary > datastructures until it's "build logic" is complete, at which point it > atomically replaces the datastructures used by the {{lookup()}} method. In > the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method > starts by closing & null'ing out the {{protected SearcherManager > searcherMgr}} (which it only populates again once it's completed building up > it's index) and then the lookup method starts with... > {code:java} > if (searcherMgr == null) { > throw new IllegalStateException("suggester was not built"); > } > {code} > ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any > situation where another thread may be calling > {{AnalyzingInfixSuggester.build()}} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843363466 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import java.nio.file.Path; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { +final Path taxoPath1 = createTempDir("dir1"); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir("commit1"); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + commit1.copyFrom(dir1, file, file, IOContext.READ); +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +// using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception Review Comment: I meant something like ``` void testCase(DTRClass, exceptionType) { // dir1 is created here final DirectoryTaxonomyReader dtr1 = new DTRClass(dir1); // again some logic here if (DTRClass instanceOf AlwaysRefreshDTR) { mgr.maybeRefresh(); } else { expectThrows(AIOOB, mgr.maybeRefresh()); } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r843363466 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.taxonomy.directory; + +import java.io.IOException; +import java.nio.file.Path; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or no. We are + * (correctly) hiding away that complexity away from the user. + */ + public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException { +final Path taxoPath1 = createTempDir("dir1"); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir("commit1"); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + commit1.copyFrom(dir1, file, file, IOContext.READ); +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +// using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception Review Comment: Something like ``` void testCase(DTRClass, exceptionType) { // dir1 is created here final DirectoryTaxonomyReader dtr1 = new DTRClass(dir1); // again some logic here if (DTRClass instanceOf AlwaysRefreshDTR) { mgr.maybeRefresh(); } else { expectThrows(AIOOB, mgr.maybeRefresh()); } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517825#comment-17517825 ] Balmukund Mandal commented on LUCENE-9625: -- Thank you Michael, it works. I think its easy to include only two jars i.e. core and test rather including KnnGraphTester.class which requires proper directories . Hence, little modification in luceneknn.py file and read all from lib/* only. > Benchmark KNN search with ann-benchmarks > > > Key: LUCENE-9625 > URL: https://issues.apache.org/jira/browse/LUCENE-9625 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In addition to benchmarking with luceneutil, it would be good to be able to > make use of ann-benchmarks, which is publishing results from many approximate > knn algorithms, including the hnsw implementation from its authors. We don't > expect to challenge the performance of these native code libraries, however > it would be good to know just how far off we are. > I started looking into this and posted a fork of ann-benchmarks that uses > KnnGraphTester class to run these: > https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to > manually copy jars and the KnnGraphTester.class to the test host machine > rather than downloading from a distribution. KnnGraphTester needs some > modifications in order to support this process - this issue is mostly about > that. > One thing I noticed is that some of the index builds with higher fanout > (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is > concerning and I'll open a separate issue for trying to improve that. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn merged pull request #767: LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery
zacharymorn merged PR #767: URL: https://github.com/apache/lucene/pull/767 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10436) Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery into a single FieldExistsQuery?
[ https://issues.apache.org/jira/browse/LUCENE-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517860#comment-17517860 ] ASF subversion and git services commented on LUCENE-10436: -- Commit 91e29405d8e4017e40e613323d6ef8f598167588 in lucene's branch refs/heads/main from zacharymorn [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=91e29405d8e ] LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery (#767) > Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and > KnnVectorFieldExistsQuery into a single FieldExistsQuery? > -- > > Key: LUCENE-10436 > URL: https://issues.apache.org/jira/browse/LUCENE-10436 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 4h 10m > Remaining Estimate: 0h > > Now that we require consistency across data structures, we could merge > DocValuesFieldExistsQuery, NormsFieldExistsQuery and > KnnVectorFieldExistsQuery together into a FieldExistsQuery that would require > that the field indexes either norms, doc values or vectors? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn opened a new pull request, #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery
zacharymorn opened a new pull request, #790: URL: https://github.com/apache/lucene/pull/790 # Description Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery # Tests Passed existing tests. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [x] I have run `./gradlew check`. - [ ] I have added tests for my changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zacharymorn commented on pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery
zacharymorn commented on PR #790: URL: https://github.com/apache/lucene/pull/790#issuecomment-1089864280 For the change entry, I assume this should go into version `10.0.0`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery
jpountz commented on code in PR #790: URL: https://github.com/apache/lucene/pull/790#discussion_r843530787 ## lucene/core/src/java/org/apache/lucene/search/UsageTrackingQueryCachingPolicy.java: ## @@ -58,12 +58,6 @@ private static boolean shouldNeverCache(Query query) { return true; } -if (query instanceof DocValuesFieldExistsQuery) { - // We do not bother caching DocValuesFieldExistsQuery queries since they are already plenty - // fast. - return true; -} Review Comment: We should have a similar instanceof check for `FieldExistsQuery`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery
jpountz commented on PR #790: URL: https://github.com/apache/lucene/pull/790#issuecomment-1089886965 > For the change entry, I assume this should go into version 10.0.0? Yes, we need a CHANGES entry under 10.0.0 and a new entry in `lucene/MIGRATE.txt` that recommends replacing `DocValueFieldExistsQuery` and others with `FieldExistsQuery`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org