date:20220412

[jira] [Updated] (LUCENE-10511) IntersectIterators is not necessary under matchAll case in Facet

2022-04-12 Thread Lu Xugang (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang updated LUCENE-10511:
---
Description: 
If the number of hits in FacetsCollector equals reader.maxDoc(), and the 
DocValues' cost() is precise, could we skip 
ConjunctionUtils.intersectIterators(List)?

 

  was:
If number of hits in FacetsCollector equals reader.maxDoc() and DocValues's 
cost() which is precise, we may not do 
ConjunctionUtils.intersectIterators(List) instead of 
DocIdSetIterator.all(int maxDoc)?

 


> IntersectIterators is not necessary under matchAll case in Facet
> 
>
> Key: LUCENE-10511
> URL: https://issues.apache.org/jira/browse/LUCENE-10511
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Trivial
>
> If the number of hits in FacetsCollector equals reader.maxDoc(), and the 
> DocValues' cost() is precise, could we skip 
> ConjunctionUtils.intersectIterators(List)?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10511) IntersectIterators is not necessary under matchAll case in Facet

2022-04-12 Thread Lu Xugang (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521028#comment-17521028
 ] 

Lu Xugang commented on LUCENE-10511:


ConjunctionUtils lacks the ability to prune DocIdSetIterators when all of 
them are the "same". For example, if two elements in the 
List<DocIdSetIterator> are DocIdSetIterator.all(int maxDoc), we currently 
iterate over (2 * maxDoc) docs, which could be reduced to (1 * maxDoc) docs.

Could we add a new method to ConjunctionUtils that tells the user whether all 
DocIdSetIterators are equal, so that the user can pick a single 
DocIdSetIterator from the List to do the iteration? 
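
To make the cost described above concrete, here is a minimal, self-contained 
sketch. It does not use Lucene: AllDocs is a hypothetical stand-in for the 
iterator returned by DocIdSetIterator.all(int maxDoc), and countConjunction, 
countPruned and totalCalls are illustrative names, not ConjunctionUtils 
methods. It assumes a simple leapfrog-style intersection and only counts how 
many iterator calls are made with and without the shortcut.

```java
import java.util.List;

/** Sketch only: models the cost of intersecting several match-all iterators. */
final class MatchAllConjunctionSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for a match-all iterator; counts every advance call. */
  static final class AllDocs {
    final int maxDoc;
    int doc = -1;
    long calls = 0;

    AllDocs(int maxDoc) {
      this.maxDoc = maxDoc;
    }

    int advance(int target) {
      calls++;
      doc = target >= maxDoc ? NO_MORE_DOCS : target;
      return doc;
    }

    int nextDoc() {
      return advance(doc + 1);
    }
  }

  /** Leapfrog intersection: every other iterator is advanced to each lead doc. */
  static int countConjunction(List<AllDocs> iterators) {
    AllDocs lead = iterators.get(0);
    int count = 0;
    int doc = lead.nextDoc();
    while (doc != NO_MORE_DOCS) {
      boolean match = true;
      for (int i = 1; i < iterators.size(); i++) {
        AllDocs other = iterators.get(i);
        int otherDoc = other.doc < doc ? other.advance(doc) : other.doc;
        if (otherDoc != doc) {
          // The other iterator is ahead: move the lead forward and retry.
          match = false;
          doc = lead.advance(otherDoc);
          break;
        }
      }
      if (match) {
        count++;
        doc = lead.nextDoc();
      }
    }
    return count;
  }

  /** The proposed shortcut: if all iterators cover the same range, walk only one. */
  static int countPruned(List<AllDocs> iterators) {
    int maxDoc = iterators.get(0).maxDoc;
    for (AllDocs it : iterators) {
      if (it.maxDoc != maxDoc) {
        return countConjunction(iterators); // not all "the same": fall back
      }
    }
    return countConjunction(iterators.subList(0, 1));
  }

  /** Sums the advance calls made on all iterators in the list. */
  static long totalCalls(List<AllDocs> iterators) {
    long total = 0;
    for (AllDocs it : iterators) {
      total += it.calls;
    }
    return total;
  }

  public static void main(String[] args) {
    int maxDoc = 1000;
    List<AllDocs> full = List.of(new AllDocs(maxDoc), new AllDocs(maxDoc));
    List<AllDocs> pruned = List.of(new AllDocs(maxDoc), new AllDocs(maxDoc));
    System.out.println(countConjunction(full) + " hits, " + totalCalls(full) + " calls");
    System.out.println(countPruned(pruned) + " hits, " + totalCalls(pruned) + " calls");
  }
}
```

In this model the pruned variant returns the same hit count while touching 
only one iterator. In real code the "are they all the same" check would be the 
hard part, since arbitrary iterators cannot be compared as cheaply as here.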





[jira] [Created] (LUCENE-10514) Some Component2D#within* implementations inconsistent with Component2D#relate

2022-04-12 Thread Ignacio Vera (Jira)

Ignacio Vera created LUCENE-10514:
-

 Summary: Some Component2D#within* implementations inconsistent 
with Component2D#relate
 Key: LUCENE-10514
 URL: https://issues.apache.org/jira/browse/LUCENE-10514
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ignacio Vera


During a contains query we have inconsistent behaviour for geometries that 
are within the query geometry, depending on whether we detect this in an inner 
node or in a leaf node:

In an inner node we use the method Component2D#relate. If the query shape fully 
contains the node, then we consider all the documents in that node to be 
NOTWITHIN.

On the other hand, when checking the documents below that inner node one by 
one, some of them may result in a DISJOINT relationship. In some cases that 
leads to inconsistent results.
 




[GitHub] [lucene] iverase opened a new pull request, #809: LUCENE-10514: Component2D#Within methods should return NOTWITHIN when the query geometry contains the triangle

2022-04-12 Thread GitBox



iverase opened a new pull request, #809:
URL: https://github.com/apache/lucene/pull/809

   We currently might return DISJOINT when a query geometry fully contains a 
triangle / line / point. This causes issues because, when an inner node is 
fully contained in the query shape, we mark those documents as NOTWITHIN. This 
PR brings these behaviours together by making sure we always return NOTWITHIN 
for fully contained triangles.
   
   
   cc: @nknize could you have a look?
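
The intended behaviour can be illustrated with a small, self-contained sketch. 
It is not the code from this PR: WithinRelationSketch, its two-value enum and 
the double-based rectangle are hypothetical simplifications, and the `ab` flag 
is assumed to mean "this segment is an edge of the original shape". The point 
it shows is the ordering: containment by the query geometry is tested first 
and yields NOTWITHIN regardless of `ab`, and only then is the edge-crossing 
test applied.

```java
/** Sketch only: a rectangular query geometry and its within-relation to points and lines. */
final class WithinRelationSketch {
  /** Only the two outcomes discussed in this thread are modelled here. */
  enum WithinRelation {
    NOTWITHIN,
    DISJOINT
  }

  final double minX, maxX, minY, maxY;

  WithinRelationSketch(double minX, double maxX, double minY, double maxY) {
    this.minX = minX;
    this.maxX = maxX;
    this.minY = minY;
    this.maxY = maxY;
  }

  boolean contains(double x, double y) {
    return x >= minX && x <= maxX && y >= minY && y <= maxY;
  }

  /** Liang-Barsky style clipping test: does the segment a-b touch the box at all? */
  boolean segmentIntersectsBox(double aX, double aY, double bX, double bY) {
    double dX = bX - aX;
    double dY = bY - aY;
    double[] p = {-dX, dX, -dY, dY};
    double[] q = {aX - minX, maxX - aX, aY - minY, maxY - aY};
    double t0 = 0;
    double t1 = 1;
    for (int i = 0; i < 4; i++) {
      if (p[i] == 0) {
        if (q[i] < 0) {
          return false; // parallel to this side of the box and outside it
        }
      } else {
        double t = q[i] / p[i];
        if (p[i] < 0) {
          if (t > t1) {
            return false;
          }
          t0 = Math.max(t0, t);
        } else {
          if (t < t0) {
            return false;
          }
          t1 = Math.min(t1, t);
        }
      }
    }
    return true;
  }

  /** A point contained in the query geometry is NOTWITHIN, never DISJOINT. */
  WithinRelation withinPoint(double x, double y) {
    return contains(x, y) ? WithinRelation.NOTWITHIN : WithinRelation.DISJOINT;
  }

  WithinRelation withinLine(double aX, double aY, boolean ab, double bX, double bY) {
    // Containment comes first and does not depend on the ab flag.
    if (contains(aX, aY) || contains(bX, bY)) {
      return WithinRelation.NOTWITHIN;
    }
    // Assumption: a crossing only counts when the segment is an original edge.
    if (ab && segmentIntersectsBox(aX, aY, bX, bY)) {
      return WithinRelation.NOTWITHIN;
    }
    return WithinRelation.DISJOINT;
  }

  public static void main(String[] args) {
    WithinRelationSketch box = new WithinRelationSketch(0, 10, 0, 10);
    System.out.println(box.withinPoint(5, 5));
    System.out.println(box.withinLine(2, 2, false, 8, 8));
    System.out.println(box.withinLine(-5, 5, false, 15, 5));
  }
}
```

With this ordering, a segment that lies fully inside the query rectangle gets 
the same answer as an inner node that is fully contained in the query shape, 
which is the consistency the issue asks for.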


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks

2022-04-12 Thread Alan Woodward (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521062#comment-17521062
 ] 

Alan Woodward commented on LUCENE-10510:


This has been triggering failures in our elasticsearch CI, and I think it's 
because the new check is running on `./gradlew clean test` when we don't really 
need it to? AIUI it should only be added to the task graph if spotless or 
errorprone are being run.

> Check module access prior to running gjf/spotless/errorprone tasks
> --
>
> Key: LUCENE-10510
> URL: https://issues.apache.org/jira/browse/LUCENE-10510
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 9.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PR at: [https://github.com/apache/lucene/pull/802]




[GitHub] [lucene] mikemccand commented on pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-12 Thread GitBox



mikemccand commented on PR #762:
URL: https://github.com/apache/lucene/pull/762#issuecomment-1096620694

   > Ooof this new commit was quite a journey. The test case sporadically 
started failing after I added the two use cases (of testing both the DTR and 
ARDTR). This led me down to debugging and figuring out that the test-framework 
randomly adds files and folders to empty directories just to chaos test the 
setup. It then also randomly disables deletes in directories through 
VirusChecker and WindowsFS. Accounting for these factors finally made the test 
case work.
   > 
   > Directly modifying index files is not easy :) 
   > 
   > 
   > 
   > We now also check explicitly for the older label "a" and ensure that the 
new commit label "b" can't be found.
   
   Thanks for persisting @gautamworah96!  Indeed the randomized file system 
behavior (the occasional `extraN` files!) is exciting when it strikes.  And 
simulated virus checkers holding files open is also exciting.  Lucene's 
randomized testing infra is awesome.



[GitHub] [lucene] mocobeta merged pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-12 Thread GitBox



mocobeta merged PR #808:
URL: https://github.com/apache/lucene/pull/808



[jira] [Commented] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-12 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521112#comment-17521112
 ] 

ASF subversion and git services commented on LUCENE-10513:
--

Commit e9789afb39b9003248650afaf19d4ba9672f2994 in lucene's branch 
refs/heads/main from Rich Bowen
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e9789afb39b ]

LUCENE-10513: Run `gradlew tidy` first (#808)

Co-authored-by: Dawid Weiss 

> Make it more obvious how to fix Spotless issues for new users
> -
>
> Key: LUCENE-10513
> URL: https://issues.apache.org/jira/browse/LUCENE-10513
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I just made my first PR to Lucene (yay me!) and in the process stumbled on 
> various things that were non-obvious.
> I request, for The Next Person, that the error messaging in `gradlew` make it 
> more obvious that one should run `./gradlew tidy` the first time around, so 
> as to avoid the low-hanging formatting problems that cause everything else to 
> fail.
> During the course of my fumbling around, I was encouraged to run:
> ./gradlew :lucene:suggest:spotlessJavaCheck
> ./gradlew :lucene:suggest:spotlessApply
> ./gradlew :lucene:test-framework:spotlessApply
> and
> ./gradlew check -Ptests.nightly=true
> various times, by the error messages in `./gradlew check`, and while I got 
> there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew 
> tidy` first may have saved some frustration.
> That said, I cannot overstate how impressed I am with the thoroughness of the 
> testing/verification tools, and wish more projects had this kind of tooling. 
> Thank you.
>  




[GitHub] [lucene] rbowen commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-12 Thread GitBox



rbowen commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1096660870

   Thank you!



[jira] [Resolved] (LUCENE-10513) Make it more obvious how to fix Spotless issues for new users

2022-04-12 Thread Rich Bowen (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Bowen resolved LUCENE-10513.
-
Resolution: Fixed

Thank you. PR merged. Closing.

> Make it more obvious how to fix Spotless issues for new users
> -
>
> Key: LUCENE-10513
> URL: https://issues.apache.org/jira/browse/LUCENE-10513
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Rich Bowen
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I just made my first PR to Lucene (yay me!) and in the process stumbled on 
> various things that were non-obvious.
> I request, for The Next Person, that the error messaging in `gradlew` make it 
> more obvious that one should run `./gradlew tidy` the first time around, so 
> as to avoid the low-hanging formatting problems that cause everything else to 
> fail.
> During the course of my fumbling around, I was encouraged to run:
> ./gradlew :lucene:suggest:spotlessJavaCheck
> ./gradlew :lucene:suggest:spotlessApply
> ./gradlew :lucene:test-framework:spotlessApply
> and
> ./gradlew check -Ptests.nightly=true
> various times, by the error messages in `./gradlew check`, and while I got 
> there eventually (again, yay me!) perhaps encouraging folks to run `./gradlew 
> tidy` first may have saved some frustration.
> That said, I cannot overstate how impressed I am with the thoroughness of the 
> testing/verification tools, and wish more projects had this kind of tooling. 
> Thank you.
>  




[GitHub] [lucene] uschindler commented on pull request #807: LUCENE-10512: Grammar: Remove incidents of "the the" in comments.

2022-04-12 Thread GitBox



uschindler commented on PR #807:
URL: https://github.com/apache/lucene/pull/807#issuecomment-1096665479

   Tidy is the best task name for this!



[GitHub] [lucene] mikemccand commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch deci

2022-04-12 Thread GitBox



mikemccand commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r848422770


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import static com.carrotsearch.randomizedtesting.RandomizedTest.sleep;
+import static org.apache.lucene.tests.mockfile.ExtrasFS.isExtra;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import java.time.Instant;
+import java.util.List;
+import java.util.function.Function;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or not. We are
+   * (correctly) hiding that complexity away from the user.
+   */
+  private  void testAlwaysRefreshDirectoryTaxonomyReader(
+  Function dtrProducer, Class 
exceptionType)
+  throws IOException {
+final Path taxoPath1 = createTempDir(String.valueOf(Instant.now()));
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir(String.valueOf(Instant.now()));
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  if (isExtra(file) == false) {
+// the test framework creates these devious extra files just to chaos 
test the edge cases
+commit1.copyFrom(dir1, file, file, IOContext.READ);
+  }
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+final DirectoryTaxonomyReader dtr1 = dtrProducer.apply(dir1);
+final SearcherTaxonomyManager mgr = new SearcherTaxonomyManager(dr1, dtr1, 
null);
+
+final FacetsConfig config = new FacetsConfig();
+SearcherTaxonomyManager.SearcherAndTaxonomy pair = mgr.acquire();
+final FacetsCollector sfc = new FacetsCollector();
+/**
+ * the call flow here initializes {@link 
DirectoryTaxonomyReader#taxoArrays}. These reused
+ * `taxoArrays` form the basis of the inconsistency *
+ */
+getTaxonomyFacetCounts(pair.taxonomyReader, config, sfc);
+
+// now try to go back to checkpoint 1 and refresh the 
SearcherTaxonomyManager
+
+// delete all files from commit2
+for (String file : dir1.listAll()) {
+  dir1.deleteFile(file);
+}
+
+while (dir1.getPendingDeletions().isEmpty() == false) {
+  // make the test more robust to the OS taking more time to actually 
delete files
+  if (TestUtil.hasVirusChecker(dir1) || TestUtil.hasWindowsFS(dir1)) {
+// nefarious FS will delay/stop deletion of index files
+return;
+  }
+  sleep(5);
+}
+
+// copy all index files from commit1
+for (String file : commit1.listAll()) {
+  if (isExtra(file) == false) {
+dir1.copyFrom(commit1, file, file, IOContext.R

[GitHub] [lucene] mocobeta opened a new pull request, #810: (trivial) revice contributing.md?

2022-04-12 Thread GitBox



mocobeta opened a new pull request, #810:
URL: https://github.com/apache/lucene/pull/810

   This kind of documentation tends to become bloated.
   I think it might be good if we maintain only files in 
[lucene/help](https://github.com/apache/lucene/tree/main/help) and make the 
guide include just links to proper help docs, instead of adding specific 
commands to it as needed? It can't cover various use-cases anyway...
   
   You can check this for a preview:
   
https://github.com/mocobeta/lucene/blob/slimdown-contributing-guide/CONTRIBUTING.md
   
   



[GitHub] [lucene] yixunx commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges

2022-04-12 Thread GitBox



yixunx commented on PR #756:
URL: https://github.com/apache/lucene/pull/756#issuecomment-1096841552

   @iverase The latest run of our indexing pipeline revealed some more shapes 
that fail to tessellate. This time I made changes so that we get all failing 
shapes instead of just one. There are 1.3 million failing shapes, and I'm 
planning to submit a sample of them this week. However, I haven't looked into 
the shapes yet so I'm not sure if the failures are related to this issue. I can 
open a separate ticket with the new shapes if you are going to merge this PR 
soon. Thanks!



[GitHub] [lucene] nknize commented on a diff in pull request #809: LUCENE-10514: Component2D#Within methods should return NOTWITHIN when the query geometry contains the triangle

2022-04-12 Thread GitBox



nknize commented on code in PR #809:
URL: https://github.com/apache/lucene/pull/809#discussion_r848570743


##
lucene/core/src/java/org/apache/lucene/geo/Polygon2D.java:
##
@@ -257,10 +257,13 @@ public WithinRelation withinLine(
   boolean ab,
   double bX,
   double bY) {
-if (ab == true

Review Comment:
   I don't remember if order matters here? Seems before we would bail early if 
`ab == false` and skip the costly checks. Does that logic still stand if we 
move an `ab == false` check up front?



##
lucene/core/src/java/org/apache/lucene/document/LatLonShapeBoundingBoxQuery.java:
##
@@ -485,8 +485,11 @@ boolean containsTriangle(int aX, int aY, int bX, int bY, 
int cX, int cY) {
 }
 
 /** Returns the Within relation to the provided triangle */
-Component2D.WithinRelation withinLine(int ax, int ay, boolean ab, int bx, 
int by) {
-  if (ab == true && edgeIntersectsBox(ax, ay, bx, by, minX, maxX, minY, 
maxY) == true) {
+Component2D.WithinRelation withinLine(int aX, int aY, boolean ab, int bX, 
int bY) {

Review Comment:
   :+1:  I think this is cleaner




[GitHub] [lucene] iverase commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges

2022-04-12 Thread GitBox



iverase commented on PR #756:
URL: https://github.com/apache/lucene/pull/756#issuecomment-1096983965

   Oops, I was not expecting so many failures. I prefer to wait until we find 
what is hopefully some pathological issue that can be fixed.



[GitHub] [lucene] iverase commented on a diff in pull request #809: LUCENE-10514: Component2D#Within methods should return NOTWITHIN when the query geometry contains the triangle

2022-04-12 Thread GitBox



iverase commented on code in PR #809:
URL: https://github.com/apache/lucene/pull/809#discussion_r848675478


##
lucene/core/src/java/org/apache/lucene/geo/Polygon2D.java:
##
@@ -257,10 +257,13 @@ public WithinRelation withinLine(
   boolean ab,
   double bX,
   double bY) {
-if (ab == true

Review Comment:
   I don't think we can do that now as we need to distinguish better between 
DISJOINT and WITHIN. The previous logic was not right.




[jira] [Commented] (LUCENE-10204) Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)

2022-04-12 Thread Marc D'Mello (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521295#comment-17521295
 ] 

Marc D'Mello commented on LUCENE-10204:
---

So I talked more with [~gsmiller] about this and we (Amazon) actually have been 
facing a lot of issues with our internal fork of {{ToParentBlockJoinQuery}} 
that attempts to do what is described in this issue. There are a lot of 
problems with getting submatch tracking to work properly with early termination 
in disjunctive queries, so it seems that replaying the child query is really 
the best way to go and this issue is not really worth pursuing further.
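
As a rough illustration of what "replaying the child query" could look like, 
here is a self-contained sketch that uses java.util.BitSet instead of Lucene 
classes. Everything in it is an assumption made for the example: 
ReplayChildMatchesSketch and facetChildren are hypothetical names, the parents 
bit set marks parent docs with their children stored directly before them (as 
in block-join indexing), childMatches stands for the result of running the 
child query a second time, and condition holds a per-document facet value.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: facet over the child docs that matched, for a set of parent hits. */
final class ReplayChildMatchesSketch {

  /**
   * For each parent hit, walks the replayed child matches inside that parent's block (the docs
   * between the previous parent and this one) and counts their facet values.
   */
  static Map<String, Integer> facetChildren(
      BitSet parents, BitSet childMatches, int[] parentHits, String[] condition) {
    Map<String, Integer> counts = new TreeMap<>();
    for (int parent : parentHits) {
      // previousSetBit returns -1 when there is no earlier parent, so the block starts at 0.
      int firstChild = parents.previousSetBit(parent - 1) + 1;
      for (int child = childMatches.nextSetBit(firstChild);
          child >= 0 && child < parent;
          child = childMatches.nextSetBit(child + 1)) {
        counts.merge(condition[child], 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Docs 0-1 are offers of item 2; docs 3-5 are offers of item 6.
    BitSet parents = new BitSet();
    parents.set(2);
    parents.set(6);
    BitSet childMatches = new BitSet();
    childMatches.set(0);
    childMatches.set(3);
    childMatches.set(4);
    String[] condition = {"new", "used", null, "new", "new", "used", null};
    System.out.println(facetChildren(parents, childMatches, new int[] {2, 6}, condition));
  }
}
```

The sketch keeps no state from the original join query: the child matches are 
recomputed and then restricted to the blocks of the parents that ended up in 
the match set, which sidesteps the submatch-tracking problems mentioned above 
at the price of evaluating the child query twice.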

> Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / 
> ToChildBlockJoinQuery)
> -
>
> Key: LUCENE-10204
> URL: https://issues.apache.org/jira/browse/LUCENE-10204
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/join
>Reporter: Greg Miller
>Priority: Minor
>
> It would be nice to be able to iterate over the "sub-matches" in these join 
> queries for the purpose of faceting (or possibly other use-cases?).
> For example, we have a use-case where our query matches on "child" docs, 
> using a {{ToParentBlockJoinQuery}} to "emit" the associated parents, which 
> are ultimately added to our match set. But, we want to iterate over the 
> matching "children" for the purpose of faceting.
> To make it concrete, consider searching over a product catalog where "offers" 
> and "items" are indexed side-by-side, with the offers being represented as 
> "children" of the parent items. An offer contains information like 
> "condition" (new vs. used), selling price, etc. for the parent item. If we 
> want to facet on "condition", we want to observe all children that matched 
> the query to know if the parent item had a "new" or "used" offer (or both). 
> This requires iterating over the child matches when faceting, which we cannot 
> do today since the child hit information isn't retained anywhere.
> We can support this by "caching" the child hits in a bitset but there is some 
> complexity when multiple join queries appear in a query structure (would need 
> to logically combine various "cached" bitsets using the same boolean 
> operations as in the original query structure).




[jira] [Commented] (LUCENE-10510) Check module access prior to running gjf/spotless/errorprone tasks

2022-04-12 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521323#comment-17521323
 ] 

Dawid Weiss commented on LUCENE-10510:
--

Hi Alan. The task graph is fine. When you run 'gradlew clean test' the new task 
would not be included. If you take a look at the dependencies, it is only 
included if either spotless is actually part of the execution graph or you run 
java compilation with -Ptests.slow=true (in which case it is needed because 
error-prone does require those vm opening settings). I think everything is set 
up correctly. I believe your CI jobs were passing on 9x with JDKs older than 17 
because those JDKs emitted a warning about package accesses. The right way to 
fix the problem would be to add the right exports or, even better, run gradlew 
help or an explicit gradlew localSettings to make sure everything is set up 
correctly in gradle.properties.





[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-12 Thread GitBox



gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r848944541


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import static com.carrotsearch.randomizedtesting.RandomizedTest.sleep;
+import static org.apache.lucene.tests.mockfile.ExtrasFS.isExtra;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import java.time.Instant;
+import java.util.List;
+import java.util.function.Function;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually recreated or not. We are
+   * (correctly) hiding that complexity away from the user.
+   */
+  private <T extends Throwable> void testAlwaysRefreshDirectoryTaxonomyReader(
+  Function<Directory, DirectoryTaxonomyReader> dtrProducer, Class<T> exceptionType)
+  throws IOException {
+final Path taxoPath1 = createTempDir(String.valueOf(Instant.now()));
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir(String.valueOf(Instant.now()));
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  if (isExtra(file) == false) {
+// the test framework creates these devious extra files just to chaos test the edge cases
+commit1.copyFrom(dir1, file, file, IOContext.READ);
+  }
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+final DirectoryTaxonomyReader dtr1 = dtrProducer.apply(dir1);
+final SearcherTaxonomyManager mgr = new SearcherTaxonomyManager(dr1, dtr1, null);
+
+final FacetsConfig config = new FacetsConfig();
+SearcherTaxonomyManager.SearcherAndTaxonomy pair = mgr.acquire();
+final FacetsCollector sfc = new FacetsCollector();
+/**
+ * the call flow here initializes {@link DirectoryTaxonomyReader#taxoArrays}. These reused
+ * `taxoArrays` form the basis of the inconsistency *
+ */
+getTaxonomyFacetCounts(pair.taxonomyReader, config, sfc);
+
+// now try to go back to checkpoint 1 and refresh the SearcherTaxonomyManager
+
+// delete all files from commit2
+for (String file : dir1.listAll()) {
+  dir1.deleteFile(file);
+}
+
+while (dir1.getPendingDeletions().isEmpty() == false) {
+  // make the test more robust to the OS taking more time to actually delete files
+  if (TestUtil.hasVirusChecker(dir1) || TestUtil.hasWindowsFS(dir1)) {

Review Comment:
   Done. It does not look too neat but I think it is good enough



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr..

[GitHub] [lucene] mocobeta closed pull request #810: (trivial) revice contributing.md?

2022-04-12 Thread GitBox



mocobeta closed pull request #810: (trivial) revice contributing.md?
URL: https://github.com/apache/lucene/pull/810


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mocobeta commented on pull request #805: LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori

2022-04-12 Thread GitBox



mocobeta commented on PR #805:
URL: https://github.com/apache/lucene/pull/805#issuecomment-1097541490

   @rmuir would you mind reviewing this, or do you think we shouldn't proceed this way?
   I know it's a fairly radical refactoring, but I can't come up with a better way to unify the two tokenizers, sorry... Let me know if it would be better to stop pursuing this approach.



[GitHub] [lucene] mocobeta commented on pull request #808: LUCENE-10513: Run `gradlew tidy` first

2022-04-12 Thread GitBox



mocobeta commented on PR #808:
URL: https://github.com/apache/lucene/pull/808#issuecomment-1097561253

   I made minor edits to the contribution guide without losing the information added here.
   
   https://github.com/apache/lucene/commit/e6fb74f9090db2bc274af94c17d80739697bdc01
   
   I think I'm too accustomed to the development workflow on this project, so I can't tell what minimum information should be shown there to onboard new contributors without unnecessary pain or "newbie" questions to committers. Feedback is welcome.


