[jira] [Updated] (LUCENE-10511) IntersectIterators is not necessary under matchAll case in Facet
[ https://issues.apache.org/jira/browse/LUCENE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xugang updated LUCENE-10511: --- Description: If the number of hits in FacetsCollector equals reader.maxDoc() and the DocValues cost() is precise, could we skip ConjunctionUtils.intersectIterators(List)? was: If the number of hits in FacetsCollector equals reader.maxDoc() and the DocValues cost() is precise, could we use DocIdSetIterator.all(int maxDoc) instead of ConjunctionUtils.intersectIterators(List)? > IntersectIterators is not necessary under matchAll case in Facet > > > Key: LUCENE-10511 > URL: https://issues.apache.org/jira/browse/LUCENE-10511 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Lu Xugang >Priority: Trivial > > If the number of hits in FacetsCollector equals reader.maxDoc() and the DocValues > cost() is precise, could we skip > ConjunctionUtils.intersectIterators(List)? > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
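A minimal sketch of the check the description proposes, assuming each doc-values iterator's `cost()` is an exact count of the documents it visits. `MatchAllShortcut` and `canSkipIntersection` are hypothetical names, not Lucene API. When the check passes, a caller could iterate `DocIdSetIterator.all(maxDoc)` instead of calling `ConjunctionUtils.intersectIterators(List)`.

```java
/** Hypothetical helper (not Lucene API): decides whether a conjunction is redundant. */
final class MatchAllShortcut {
  private MatchAllShortcut() {}

  /**
   * Returns true when the collector matched every document in the segment and every iterator's
   * cost equals maxDoc. Assumption: cost() is exact, so cost == maxDoc means the iterator visits
   * every doc and the intersection is simply 0..maxDoc-1.
   */
  static boolean canSkipIntersection(int totalHits, int maxDoc, long[] iteratorCosts) {
    if (totalHits != maxDoc) {
      return false; // some documents did not match, so the conjunction still filters
    }
    for (long cost : iteratorCosts) {
      if (cost != maxDoc) {
        return false; // this iterator may skip some documents
      }
    }
    return true;
  }
}
```

The exactness assumption matters: if `cost()` were only an estimate, a cost equal to `maxDoc` would not prove that every document has a value.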
[jira] [Commented] (LUCENE-10511) IntersectIterators is not necessary under matchAll case in Facet
[ https://issues.apache.org/jira/browse/LUCENE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521028#comment-17521028 ] Lu Xugang commented on LUCENE-10511: ConjunctionUtils lacks the ability to prune DocIdSetIterators when all of them are the "same". For example, if two elements in the List are DocIdSetIterator.all(int maxDoc), we currently iterate over (2 * maxDoc) docs, which could be reduced to (1 * maxDoc) docs. Could we add a new method to ConjunctionUtils that tells the user whether all DocIdSetIterators are equal, so that the user can pick a single DocIdSetIterator from the List to iterate over?
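To make the 2 * maxDoc versus 1 * maxDoc estimate concrete, here is a toy leapfrog conjunction over sorted int arrays that counts iterator calls. `ToyDocIterator` and `ToyConjunction` are hypothetical stand-ins, not Lucene's `DocIdSetIterator` or `ConjunctionUtils`, whose bookkeeping differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy iterator over sorted doc ids; counts nextDoc/advance calls as "work". */
final class ToyDocIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;
  private int idx = -1;
  int calls = 0;

  ToyDocIterator(int[] docs) {
    this.docs = docs;
  }

  /** Analogue of a match-all iterator: every doc in [0, maxDoc). */
  static ToyDocIterator all(int maxDoc) {
    int[] docs = new int[maxDoc];
    for (int i = 0; i < maxDoc; i++) {
      docs[i] = i;
    }
    return new ToyDocIterator(docs);
  }

  int docID() {
    if (idx < 0) {
      return -1;
    }
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  int nextDoc() {
    calls++;
    idx = Math.min(idx + 1, docs.length);
    return docID();
  }

  /** Moves past the current position to the first doc >= target. */
  int advance(int target) {
    calls++;
    do {
      idx = Math.min(idx + 1, docs.length);
    } while (idx < docs.length && docs[idx] < target);
    return docID();
  }
}

/** Hypothetical two-way leapfrog intersection, not Lucene's implementation. */
final class ToyConjunction {
  private ToyConjunction() {}

  static List<Integer> intersect(ToyDocIterator lead, ToyDocIterator other) {
    List<Integer> matches = new ArrayList<>();
    int doc = lead.nextDoc();
    while (doc != ToyDocIterator.NO_MORE_DOCS) {
      int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
      if (otherDoc == doc) {
        matches.add(doc);
        doc = lead.nextDoc();
      } else if (otherDoc == ToyDocIterator.NO_MORE_DOCS) {
        break; // the other iterator is exhausted, so there are no more matches
      } else {
        doc = lead.advance(otherDoc); // leapfrog: the lead catches up to the other iterator
      }
    }
    return matches;
  }
}
```

Intersecting two `all(10)` iterators this way costs 21 calls (11 `nextDoc` calls on the lead, including the one that hits `NO_MORE_DOCS`, plus 10 `advance` calls on the other), while iterating a single `all(10)` iterator costs 11.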
[jira] [Created] (LUCENE-10514) Some Component2D#within* implementations inconsistent with Component2D#relate
Ignacio Vera created LUCENE-10514: - Summary: Some Component2D#within* implementations inconsistent with Component2D#relate Key: LUCENE-10514 URL: https://issues.apache.org/jira/browse/LUCENE-10514 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera During a contains query we have inconsistent behaviour for geometries that are within the query geometry, depending on whether we detect it in an inner node or in a leaf node: in an inner node we use Component2D#relate, and if the query shape fully contains the node, we consider all the documents in that node to be NOTWITHIN. On the other hand, when the documents below that inner node are checked one by one, some of them may result in a DISJOINT relationship. In some cases this leads to inconsistent results.
[GitHub] [lucene] iverase opened a new pull request, #809: LUCENE-10514: Component2D#Within methods should return NOTWITHIN when the query geometry contains the triangle
iverase opened a new pull request, #809: URL: https://github.com/apache/lucene/pull/809 We currently might return DISJOINT when a query geometry fully contains a triangle / line / point. This causes issues because, when an inner node is fully contained in the query shape, we mark those documents as NOTWITHIN. This PR makes the two behaviours consistent by making sure we always return NOTWITHIN for fully contained triangles. cc: @nknize could you have a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks
[ https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521062#comment-17521062 ] Alan Woodward commented on LUCENE-10510: This has been triggering failures in our elasticsearch CI, and I think it's because the new check is running on `./gradlew clean test` when we don't really need it to? AIUI it should only be added to the task graph if spotless or errorprone are being run. > Check module access prior to running gjf/spotless/errorprone tasks > -- > > Key: LUCENE-10510 > URL: https://issues.apache.org/jira/browse/LUCENE-10510 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 9.2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > PR at: [https://github.com/apache/lucene/pull/802]
[GitHub] [lucene] mikemccand commented on pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide
mikemccand commented on PR #762: URL: https://github.com/apache/lucene/pull/762#issuecomment-1096620694 > Ooof this new commit was quite a journey. The test case sporadically started failing after I added the two use cases (of testing both the DTR and ARDTR). This led me to debugging and figuring out that the test-framework randomly adds files and folders to empty directories just to chaos test the setup. It then also randomly disables deletes in directories through VirusChecker and WindowsFS. Accounting for these factors finally made the test case work. > > Directly modifying index files is not easy :) > > > > We now also check explicitly for the older label "a" and ensure that the new commit label "b" can't be found. Thanks for persisting @gautamworah96! Indeed the randomized file system behavior (the occasional `extraN` files!) is exciting when it strikes. And simulated virus checkers holding files open are also exciting. Lucene's randomized testing infra is awesome.
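Because the test framework may drop `extraN` files (and folders) into directories, code that copies an index by hand has to skip them. A minimal sketch with plain `java.nio.file`, assuming injected entries are recognizable by an `extra` name prefix; `isExtraFile` and `copySkippingExtras` are hypothetical helpers, whereas the PR's test uses `ExtrasFS.isExtra` and `Directory#copyFrom`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not Lucene API) for copying index files while ignoring injected extras. */
final class CopyWithoutExtras {
  private CopyWithoutExtras() {}

  /** Assumed naming rule for injected entries; the real test delegates to ExtrasFS.isExtra. */
  static boolean isExtraFile(String name) {
    return name.startsWith("extra");
  }

  /** Copies regular files from src to dst, skipping extras and sub-folders; returns the count. */
  static int copySkippingExtras(Path src, Path dst) throws IOException {
    int copied = 0;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(src)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        if (isExtraFile(name) || !Files.isRegularFile(entry)) {
          continue; // injected file, or an injected folder
        }
        Files.copy(entry, dst.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        copied++;
      }
    }
    return copied;
  }
}
```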
[GitHub] [lucene] mocobeta merged pull request #808: LUCENE-10513: Run `gradlew tidy` first
mocobeta merged PR #808: URL: https://github.com/apache/lucene/pull/808
[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
[ https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521112#comment-17521112 ] ASF subversion and git services commented on LUCENE-10513: -- Commit e9789afb39b9003248650afaf19d4ba9672f2994 in lucene's branch refs/heads/main from Rich Bowen [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e9789afb39b ] LUCENE-10513: Run `gradlew tidy` first (#808) Co-authored-by: Dawid Weiss > Make it more obvious how to fix Spotless issues for new users > - > > Key: LUCENE-10513 > URL: https://issues.apache.org/jira/browse/LUCENE-10513 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > I just made my first PR to Lucene (yay me!) and in the process stumbled on > various things that were non-obvious. > I request, for The Next Person, that the error messaging in `gradlew` make it > more obvious that one should run `./gradlew tidy` the first time around, so > as to avoid the low-hanging formatting problems that cause everything else to > fail. > During the course of my fumbling around, I was encouraged to run: > ./gradlew :lucene:suggest:spotlessJavaCheck > ./gradlew :lucene:suggest:spotlessApply > ./gradlew :lucene:test-framework:spotlessApply > and > ./gradlew check -Ptests.nightly=true > various times, by the error messages in `./gradlew check`, and while I got > there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew > tidy` first may have saved some frustration. > That said, I cannot overstate how impressed I am with the thoroughness of the > testing/verification tools, and wish more projects had this kind of tooling. > Thank you. >
[GitHub] [lucene] rbowen commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
rbowen commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1096660870 Thank you!
[jira] [Resolved] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users
[ https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Bowen resolved LUCENE-10513. - Resolution: Fixed Thank you. PR merged. Closing. > Make it more obvious how to fix Spotless issues for new users > - > > Key: LUCENE-10513 > URL: https://issues.apache.org/jira/browse/LUCENE-10513 > Project: Lucene - Core > Issue Type: Task >Reporter: Rich Bowen >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > I just made my first PR to Lucene (yay me!) and in the process stumbled on > various things that were non-obvious. > I request, for The Next Person, that the error messaging in `gradlew` make it > more obvious that one should run `./gradlew tidy` the first time around, so > as to avoid the low-hanging formatting problems that cause everything else to > fail. > During the course of my fumbling around, I was encouraged to run: > ./gradlew :lucene:suggest:spotlessJavaCheck > ./gradlew :lucene:suggest:spotlessApply > ./gradlew :lucene:test-framework:spotlessApply > and > ./gradlew check -Ptests.nightly=true > various times, by the error messages in `./gradlew check`, and while I got > there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew > tidy` first may have saved some frustration. > That said, I cannot overstate how impressed I am with the thoroughness of the > testing/verification tools, and wish more projects had this kind of tooling. > Thank you. >
[GitHub] [lucene] uschindler commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.
uschindler commented on PR #807: URL: https://github.com/apache/lucene/pull/807#issuecomment-1096665479 Tidy is the best task name for this!
[GitHub] [lucene] mikemccand commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch deci
mikemccand commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r848422770 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.facet.taxonomy.directory; + +import static com.carrotsearch.randomizedtesting.RandomizedTest.sleep; +import static org.apache.lucene.tests.mockfile.ExtrasFS.isExtra; + +import java.io.IOException; +import java.nio.file.Path; +import java.time.Instant; +import java.util.List; +import java.util.function.Function; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.tests.util.TestUtil; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or not. We are + * (correctly) hiding that complexity away from the user.
+ */ + private void testAlwaysRefreshDirectoryTaxonomyReader( + Function dtrProducer, Class exceptionType) + throws IOException { +final Path taxoPath1 = createTempDir(String.valueOf(Instant.now())); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir(String.valueOf(Instant.now())); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + if (isExtra(file) == false) { +// the test framework creates these devious extra files just to chaos test the edge cases +commit1.copyFrom(dir1, file, file, IOContext.READ); + } +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +final DirectoryTaxonomyReader dtr1 = dtrProducer.apply(dir1); +final SearcherTaxonomyManager mgr = new SearcherTaxonomyManager(dr1, dtr1, null); + +final FacetsConfig config = new FacetsConfig(); +SearcherTaxonomyManager.SearcherAndTaxonomy pair = mgr.acquire(); +final FacetsCollector sfc = new FacetsCollector(); +/** + * the call flow here initializes {@link DirectoryTaxonomyReader#taxoArrays}. 
These reused + * `taxoArrays` form the basis of the inconsistency * + */ +getTaxonomyFacetCounts(pair.taxonomyReader, config, sfc); + +// now try to go back to checkpoint 1 and refresh the SearcherTaxonomyManager + +// delete all files from commit2 +for (String file : dir1.listAll()) { + dir1.deleteFile(file); +} + +while (dir1.getPendingDeletions().isEmpty() == false) { + // make the test more robust to the OS taking more time to actually delete files + if (TestUtil.hasVirusChecker(dir1) || TestUtil.hasWindowsFS(dir1)) { +// nefarious FS will delay/stop deletion of index files +return; + } + sleep(5); +} + +// copy all index files from commit1 +for (String file : commit1.listAll()) { + if (isExtra(file) == false) { +dir1.copyFrom(commit1, file, file, IOContext.R
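The loop above waits for pending deletions by polling with `sleep(5)`, and returns early when a simulated virus checker or WindowsFS may hold files indefinitely. The same polling pattern with an explicit timeout can be sketched as follows; `Polling.waitUntil` is a hypothetical helper, not part of Lucene's test framework.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper (not part of Lucene's test framework). */
final class Polling {
  private Polling() {}

  /**
   * Re-checks condition every pollMillis until it holds or timeoutMillis elapses.
   * Returns true if the condition became true, false on timeout.
   */
  static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
      throws InterruptedException {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      // compare via subtraction so the check stays correct if nanoTime wraps around
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up instead of polling forever
      }
      Thread.sleep(pollMillis);
    }
    return true;
  }
}
```

A caller can then decide what a timeout means (skip the rest of the test, or fail it) instead of looping without bound.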
[GitHub] [lucene] mocobeta opened a new pull request, #810: (trivial) revice contributing.md?
mocobeta opened a new pull request, #810: URL: https://github.com/apache/lucene/pull/810 This kind of documentation tends to become bloated. Perhaps it would be better to maintain only the files in [lucene/help](https://github.com/apache/lucene/tree/main/help) and have the guide just link to the relevant help docs, instead of adding specific commands to it as needed? It can't cover every use case anyway... You can check this for a preview: https://github.com/mocobeta/lucene/blob/slimdown-contributing-guide/CONTRIBUTING.md
[GitHub] [lucene] yixunx commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges
yixunx commented on PR #756: URL: https://github.com/apache/lucene/pull/756#issuecomment-1096841552 @iverase The latest run of our indexing pipeline revealed some more shapes that fail to tessellate. This time I made changes so that we get all failing shapes instead of just one. There are 1.3 million failing shapes, and I'm planning to submit a sample of them this week. However, I haven't looked into the shapes yet, so I'm not sure whether the failures are related to this issue. I can open a separate ticket with the new shapes if you are going to merge this PR soon. Thanks!
[GitHub] [lucene] nknize commented on a diff in pull request #809: LUCENE-10514: Component2D#Within methods should return NOTWITHIN when the query geometry contains the triangle
nknize commented on code in PR #809: URL: https://github.com/apache/lucene/pull/809#discussion_r848570743 ## lucene/core/src/java/org/apache/lucene/geo/Polygon2D.java: ## @@ -257,10 +257,13 @@ public WithinRelation withinLine( boolean ab, double bX, double bY) { -if (ab == true Review Comment: I don't remember whether order matters here. It seems that before, we would bail out early if `ab == false` and skip the costly checks. Does that logic still hold if we move an `ab == false` check up front? ## lucene/core/src/java/org/apache/lucene/document/LatLonShapeBoundingBoxQuery.java: ## @@ -485,8 +485,11 @@ boolean containsTriangle(int aX, int aY, int bX, int bY, int cX, int cY) { } /** Returns the Within relation to the provided triangle */ -Component2D.WithinRelation withinLine(int ax, int ay, boolean ab, int bx, int by) { - if (ab == true && edgeIntersectsBox(ax, ay, bx, by, minX, maxX, minY, maxY) == true) { +Component2D.WithinRelation withinLine(int aX, int aY, boolean ab, int bX, int bY) { Review Comment: :+1: I think this is cleaner
[GitHub] [lucene] iverase commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges
iverase commented on PR #756: URL: https://github.com/apache/lucene/pull/756#issuecomment-1096983965 Oops, I was not expecting so many failures. I'd prefer to wait until we find what is hopefully a pathological issue that can be fixed.
[GitHub] [lucene] iverase commented on a diff in pull request #809: LUCENE-10514: Component2D#Within methods should return NOTWITHIN when the query geometry contains the triangle
iverase commented on code in PR #809: URL: https://github.com/apache/lucene/pull/809#discussion_r848675478 ## lucene/core/src/java/org/apache/lucene/geo/Polygon2D.java: ## @@ -257,10 +257,13 @@ public WithinRelation withinLine( boolean ab, double bX, double bY) { -if (ab == true Review Comment: I don't think we can do that now as we need to distinguish better between DISJOINT and WITHIN. The previous logic was not right.
[jira] [Commented] (LUCENE-10204) Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)
[ https://issues.apache.org/jira/browse/LUCENE-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521295#comment-17521295 ] Marc D'Mello commented on LUCENE-10204: --- So I talked more with [~gsmiller] about this and we (Amazon) actually have been facing a lot of issues with our internal fork of {{ToParentBlockJoinQuery}} that attempts to do what is described in this issue. There are a lot of problems with getting submatch tracking to work properly with early termination in disjunctive queries, so it seems that replaying the child query is really the best way to go and this issue is not really worth pursuing further. > Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / > ToChildBlockJoinQuery) > - > > Key: LUCENE-10204 > URL: https://issues.apache.org/jira/browse/LUCENE-10204 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/join >Reporter: Greg Miller >Priority: Minor > > It would be nice to be able to iterate over the "sub-matches" in these join > queries for the purpose of faceting (or possibly other use-cases?). > For example, we have a use-case where our query matches on "child" docs, > using a {{ToParentBlockJoinQuery}} to "emit" the associated parents, which > are ultimately added to our match set. But, we want to iterate over the > matching "children" for the purpose of faceting. > To make it concrete, consider searching over a product catalog where "offers" > and "items" are indexed side-by-side, with the offers being represented as > "children" of the parent items. An offer contains information like > "condition" (new vs. used), selling price, etc. for the parent item. If we > want to facet on "condition", we want to observe all children that matched > the query to know if the parent item had a "new" or "used" offer (or both). 
> This requires iterating over the child matches when faceting, which we cannot > do today since the child hit information isn't retained anywhere. > We can support this by "caching" the child hits in a bitset but there is some > complexity when multiple join queries appear in a query structure (would need > to logically combine various "cached" bitsets using the same boolean > operations as in the original query structure).
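A minimal sketch of the bitset idea from the description, using `java.util.BitSet` rather than Lucene's own bitset classes. It assumes the block-join layout, where each parent is indexed immediately after its children, so the parent of child `c` is the next set bit in the parent bitset at or after `c`. `ChildHitCache`, `parentsOf`, `countConditions`, and `or` are hypothetical names; `or` stands in for combining cached bitsets the way the original query combines its clauses.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helpers (not Lucene's join module API) for retained child hits. */
final class ChildHitCache {
  private ChildHitCache() {}

  /** Maps each matching child to its parent: the first parent bit at or after the child. */
  static BitSet parentsOf(BitSet childHits, BitSet parentBits) {
    BitSet parents = new BitSet();
    for (int c = childHits.nextSetBit(0); c >= 0; c = childHits.nextSetBit(c + 1)) {
      int parent = parentBits.nextSetBit(c);
      if (parent >= 0) {
        parents.set(parent);
      }
    }
    return parents;
  }

  /** Counts a per-child facet value (e.g. offer condition) over the retained child hits. */
  static Map<String, Integer> countConditions(BitSet childHits, String[] conditionByDoc) {
    Map<String, Integer> counts = new TreeMap<>();
    for (int c = childHits.nextSetBit(0); c >= 0; c = childHits.nextSetBit(c + 1)) {
      counts.merge(conditionByDoc[c], 1, Integer::sum);
    }
    return counts;
  }

  /** Combines two cached child-hit sets as a disjunction of the two join clauses would. */
  static BitSet or(BitSet a, BitSet b) {
    BitSet combined = (BitSet) a.clone();
    combined.or(b);
    return combined;
  }
}
```

For example, with offers at docs 0, 1 and 3 and items at docs 2 and 4, child hits {1, 3} map to parents {2, 4}, and the condition facet sees one "new" and one "used" offer.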
[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks
[ https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521323#comment-17521323 ] Dawid Weiss commented on LUCENE-10510: -- Hi Alan. The task graph is fine. When you run 'gradlew clean test', the new task is not included. If you take a look at the dependencies, it is only included if either spotless is actually part of the execution graph or you run Java compilation with -Ptests.slow=true (in which case it is needed because error-prone requires those VM opening settings). I think everything is set up correctly. I believe your CI jobs were passing on 9x with JDKs older than 17 because those JDKs only emitted a warning about package accesses. The right way to fix the problem would be to add the right exports or, even better, run gradlew help or an explicit gradlew localSettings to make sure everything is set up correctly in gradle.properties.
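At the JVM level, "the right exports" decide whether JDK-internal packages (for error-prone, the compiler's `com.sun.tools.javac.*` packages) are accessible to code on the class path. The JDK's `java.lang.Module` API can report this at runtime. The sketch below is a hypothetical helper, not the check LUCENE-10510 added to the Gradle build; the test uses `java.base` packages only because they behave the same on any JDK 9+ runtime.

```java
/** Hypothetical helper (not Lucene's build check): is a package exported to this code? */
final class ModuleAccessCheck {
  private ModuleAccessCheck() {}

  /**
   * Returns true if packageName in moduleName is exported to the module of this class (the
   * unnamed module when running from the class path), for example via --add-exports.
   */
  static boolean isExportedToCaller(String moduleName, String packageName) {
    Module caller = ModuleAccessCheck.class.getModule();
    return ModuleLayer.boot()
        .findModule(moduleName) // empty if the module is not in the boot layer
        .map(m -> m.isExported(packageName, caller))
        .orElse(false);
  }
}
```

On a full JDK, `isExportedToCaller("jdk.compiler", "com.sun.tools.javac.api")` would be expected to return false by default and true when the JVM is started with `--add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED`.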
[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d
gautamworah96 commented on code in PR #762: URL: https://github.com/apache/lucene/pull/762#discussion_r848944541 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java: ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.facet.taxonomy.directory; + +import static com.carrotsearch.randomizedtesting.RandomizedTest.sleep; +import static org.apache.lucene.tests.mockfile.ExtrasFS.isExtra; + +import java.io.IOException; +import java.nio.file.Path; +import java.time.Instant; +import java.util.List; +import java.util.function.Function; +import org.apache.lucene.facet.FacetTestCase; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.tests.util.TestUtil; +import org.apache.lucene.util.IOUtils; + +public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase { + + /** + * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the + * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the + * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be + * self-sufficient and should allow serving search queries when {@link + * SearcherTaxonomyManager#maybeRefresh()} is called. + * + * It does not check whether the private taxoArrays were actually recreated or not. We are + * (correctly) hiding that complexity away from the user.
+ */ + private void testAlwaysRefreshDirectoryTaxonomyReader( + Function dtrProducer, Class exceptionType) + throws IOException { +final Path taxoPath1 = createTempDir(String.valueOf(Instant.now())); +final Directory dir1 = newFSDirectory(taxoPath1); +final DirectoryTaxonomyWriter tw1 = +new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE); +tw1.addCategory(new FacetLabel("a")); +tw1.commit(); // commit1 + +final Path taxoPath2 = createTempDir(String.valueOf(Instant.now())); +final Directory commit1 = newFSDirectory(taxoPath2); +// copy all index files from dir1 +for (String file : dir1.listAll()) { + if (isExtra(file) == false) { +// the test framework creates these devious extra files just to chaos test the edge cases +commit1.copyFrom(dir1, file, file, IOContext.READ); + } +} + +tw1.addCategory(new FacetLabel("b")); +tw1.commit(); // commit2 +tw1.close(); + +final DirectoryReader dr1 = DirectoryReader.open(dir1); +final DirectoryTaxonomyReader dtr1 = dtrProducer.apply(dir1); +final SearcherTaxonomyManager mgr = new SearcherTaxonomyManager(dr1, dtr1, null); + +final FacetsConfig config = new FacetsConfig(); +SearcherTaxonomyManager.SearcherAndTaxonomy pair = mgr.acquire(); +final FacetsCollector sfc = new FacetsCollector(); +/** + * the call flow here initializes {@link DirectoryTaxonomyReader#taxoArrays}. These reused + * `taxoArrays` form the basis of the inconsistency * + */ +getTaxonomyFacetCounts(pair.taxonomyReader, config, sfc); + +// now try to go back to checkpoint 1 and refresh the SearcherTaxonomyManager + +// delete all files from commit2 +for (String file : dir1.listAll()) { + dir1.deleteFile(file); +} + +while (dir1.getPendingDeletions().isEmpty() == false) { + // make the test more robust to the OS taking more time to actually delete files + if (TestUtil.hasVirusChecker(dir1) || TestUtil.hasWindowsFS(dir1)) { Review Comment: Done. 
It does not look too neat but I think it is good enough
[GitHub] [lucene] mocobeta closed pull request #810: (trivial) revice contributing.md?
mocobeta closed pull request #810: (trivial) revice contributing.md? URL: https://github.com/apache/lucene/pull/810
[GitHub] [lucene] mocobeta commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori
mocobeta commented on PR #805: URL: https://github.com/apache/lucene/pull/805#issuecomment-1097541490 @rmuir would you mind reviewing this, or do you think we shouldn't proceed this way? I know it's a somewhat radical refactoring, but I cannot come up with a better way to unify the two tokenizers, sorry... Let me know if it would be better to stop pursuing this approach.
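For readers unfamiliar with the shared piece: kuromoji and nori both pick the lowest-cost path through a lattice of dictionary words, where a path's cost is the sum of word costs plus connection costs between adjacent words. The toy below shows that Viterbi search in isolation. The dictionary, the costs, and the `ToyViterbi.segment` name are made up for illustration, and it omits unknown-word handling and the left-id/right-id connection matrix the real tokenizers use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

/** Hypothetical toy Viterbi segmenter (not the class factored out in the PR). */
final class ToyViterbi {
  private ToyViterbi() {}

  /** A lattice node: one dictionary word ending at some offset. */
  private static final class Node {
    final String surface;
    int bestCost = Integer.MAX_VALUE; // cheapest cost of any path from BOS to this node
    Node back; // predecessor on that cheapest path

    Node(String surface) {
      this.surface = surface;
    }
  }

  /**
   * Returns the lowest-cost segmentation of text (empty if none exists). connCost(prev, next) is
   * the cost of next following prev, with "" standing for the beginning or end of the text.
   */
  static List<String> segment(
      String text, Map<String, Integer> wordCost, ToIntBiFunction<String, String> connCost) {
    int n = text.length();
    List<List<Node>> endingAt = new ArrayList<>(); // endingAt.get(i): nodes whose word ends at i
    for (int i = 0; i <= n; i++) {
      endingAt.add(new ArrayList<>());
    }
    Node bos = new Node("");
    bos.bestCost = 0;
    endingAt.get(0).add(bos);

    // Forward pass in offset order, so every node ending at `start` is final before it is used.
    for (int start = 0; start < n; start++) {
      if (endingAt.get(start).isEmpty()) {
        continue; // no path reaches this offset
      }
      for (int end = start + 1; end <= n; end++) {
        String word = text.substring(start, end);
        Integer cost = wordCost.get(word);
        if (cost == null) {
          continue; // not a dictionary word
        }
        Node node = new Node(word);
        for (Node prev : endingAt.get(start)) {
          int total = prev.bestCost + connCost.applyAsInt(prev.surface, word) + cost;
          if (total < node.bestCost) {
            node.bestCost = total;
            node.back = prev;
          }
        }
        endingAt.get(end).add(node);
      }
    }

    // Close each complete path with the end-of-text connection cost and keep the cheapest.
    Node best = null;
    int bestTotal = Integer.MAX_VALUE;
    for (Node last : endingAt.get(n)) {
      if (last == bos) {
        continue;
      }
      int total = last.bestCost + connCost.applyAsInt(last.surface, "");
      if (total < bestTotal) {
        bestTotal = total;
        best = last;
      }
    }

    // Backtrack from the best final node to BOS.
    Deque<String> words = new ArrayDeque<>();
    for (Node cur = best; cur != null && cur != bos; cur = cur.back) {
      words.addFirst(cur.surface);
    }
    return new ArrayList<>(words);
  }
}
```

With word costs a=5, b=5, c=5, ab=3, bc=4, abc=20 and a constant connection cost of 1, "abc" segments as [ab, c] (total 11, versus 12 for [a, bc]); raising the cost of "ab" followed by "c" to 11 flips the result to [a, bc], which is the role connection costs play.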
[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first
mocobeta commented on PR #808: URL: https://github.com/apache/lucene/pull/808#issuecomment-1097561253 I made minor edits to the contribution guide without losing the information added here. https://github.com/apache/lucene/commit/e6fb74f9090db2bc274af94c17d80739697bdc01 I think I'm too accustomed to the development workflow on this project, so I can't tell what minimum information should be shown there to introduce new contributors without unnecessary pain or having to ask committers "newbie" questions. Feedback is welcome.