Re: [PR] jgit/ clean status check should ignore any 'untracked folders' [lucene]

2024-09-06 Thread via GitHub


dweiss merged PR #13728:
URL: https://github.com/apache/lucene/pull/13728


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add unit-of-least-precision float comparison [lucene]

2024-09-06 Thread via GitHub


aherbert commented on code in PR #13723:
URL: https://github.com/apache/lucene/pull/13723#discussion_r1746618586


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -881,27 +881,30 @@ public static void assumeNoException(String msg, 
Exception e) {
* @param maxUlps {@code (maxUlps - 1)} is the number of floating point 
values between {@code x}
* and {@code y}.
*/
-  public static void assertUlpEquals(final float x, final float y, final int 
maxUlps) {
-assertFalse(Float.isNaN(x));
-assertFalse(Float.isNaN(y));
-
+  public static void assertUlpEquals(final float x, final float y, final short 
maxUlps) {
 final int xInt = Float.floatToRawIntBits(x);
 final int yInt = Float.floatToRawIntBits(y);
 
-final boolean isEqual;
 if ((xInt ^ yInt) < 0) {
   // Numbers have opposite signs, take care of overflow.
   // Remove the sign bit to obtain the absolute ULP above zero.
   final int deltaPlus = xInt & Integer.MAX_VALUE;
   final int deltaMinus = yInt & Integer.MAX_VALUE;
 
-  // Avoid possible overflow from adding the deltas by using a long.
-  isEqual = (long) deltaPlus + deltaMinus <= maxUlps;
-} else {
-  // Numbers have same sign, there is no risk of overflow.
-  isEqual = Math.abs(xInt - yInt) <= maxUlps;
+  // Note:
+  // If either value is NaN, the exponent bits are set to (255 << 23) and 
the
+  // distance above 0.0 is always above a short ULP error. So omit the test
+  // for NaN and return directly.
+
+  // Avoid possible overflow from adding the deltas by splitting the 
comparison
+  assertTrue(deltaPlus <= maxUlps);
+  assertTrue(deltaMinus <= (maxUlps - deltaPlus));

Review Comment:
   You should add a return inside this if conditional. Otherwise you fall 
through to a condition check that may return an incorrect result if `xInt - 
yInt` overflows.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] PayloadScoreQuery javadoc update w.r.t. SpanQuery use [lucene]

2024-09-06 Thread via GitHub


cpoerschke opened a new pull request, #13731:
URL: https://github.com/apache/lucene/pull/13731

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] jgit/ clean status check should ignore any 'untracked folders' [lucene]

2024-09-06 Thread via GitHub


uschindler commented on PR #13728:
URL: https://github.com/apache/lucene/pull/13728#issuecomment-2333707595

   Thanks. I wasn't aware of this extra File walk. Who added this? Thanks for 
removing it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add unit-of-least-precision float comparison [lucene]

2024-09-06 Thread via GitHub


uschindler commented on code in PR #13723:
URL: https://github.com/apache/lucene/pull/13723#discussion_r1746885657


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetAssociations.java:
##
@@ -654,7 +654,7 @@ private void assertFloatFacetResultsEqual(List 
expected, List

Re: [I] Should the static search methods in FacetsCollector take a FacetsCollector as last argument? [lucene]

2024-09-06 Thread via GitHub


javanna commented on issue #13725:
URL: https://github.com/apache/lucene/issues/13725#issuecomment-2333752360

   I agree @gsmiller , I am inclined to be stricter. Also, while we make this 
breaking change, we can go directly from `Collector` to 
`FacetsCollectorManager`. I am also considering moving these static methods to 
`FacetsCollectorManager`. It feels weird to have them exposed to 
`FacetsCollector` but require the corresponding manager.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]

2024-09-06 Thread via GitHub


dweiss commented on issue #13719:
URL: https://github.com/apache/lucene/issues/13719#issuecomment-2333762856

   > The check on a developers computer was not really wanted, so originally 
the task was only executed on CI runs.
   
   I don't know about that - I think it's one of the original reasons I added 
it to precommit... But it's been a while.
   
   Also, both modes you mention are actually very similar: nothing in the code 
ever calls "git add" so anything touched will be either untracked or modified 
by definition. I'm not sure I understand the difference. A notable exception is 
if something created a directory (empty or with something that is in 
.gitignore) - then this change would go unnoticed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] jgit/ clean status check should ignore any 'untracked folders' [lucene]

2024-09-06 Thread via GitHub


dweiss commented on PR #13728:
URL: https://github.com/apache/lucene/pull/13728#issuecomment-2333765565

   > Thanks. I wasn't aware of this extra File walk. Who added this? Thanks for 
removing it.
   
   Can't remember. Could have been me. Clearly not working in all cases though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-09-06 Thread via GitHub


uschindler commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-2333805272

   Hi the whole setup of the calls to native code are not correct. In Lucene we 
don't use or need "--enable-preview", because we have a special way to compile 
the code ("you added it marked as "hacky"). In fact, the Java code must be 
placed in the `src/java21` folder which gets a special compilation (using 
apijar) files that also ensure it works with Java 22 and later.
   
   So basically the code would need to be added to the VectorizationProvider.
   
   Anyways: At moment we do not want to have native code in Lucene Core.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


ChrisHegarty opened a new pull request, #13732:
URL: https://github.com/apache/lucene/pull/13732

   This commit adds a migration note about the removal of optional RegExp 
complement support.
   
   I just added a sentence to an existing section, but if we have further 
advice it may be worth separating into a new section. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


ChrisHegarty commented on PR #13732:
URL: https://github.com/apache/lucene/pull/13732#issuecomment-2333840754

   The removal of the optional complement syntax is technically a "breaking" 
change. It is of course fine to do such things in a major release, no issue 
there. Just want to ensure that it is intentional, and we provide whatever 
guidance is necessary in the migration, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace static FacetsCollector#search methods [lucene]

2024-09-06 Thread via GitHub


javanna commented on code in PR #13733:
URL: https://github.com/apache/lucene/pull/13733#discussion_r1746991600


##
lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java:
##
@@ -54,4 +79,135 @@ public ReducedFacetsCollector(final 
Collection facetsCollectors
   facetsCollector -> 
matchingDocs.addAll(facetsCollector.getMatchingDocs()));
 }
   }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult search(
+  IndexSearcher searcher, Query q, int n, FacetsCollectorManager fcm) 
throws IOException {
+return doSearch(searcher, null, q, n, null, false, fcm);
+  }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult search(
+  IndexSearcher searcher, Query q, int n, Sort sort, 
FacetsCollectorManager fcm)
+  throws IOException {
+if (sort == null) {
+  throw new IllegalArgumentException("sort must not be null");
+}
+return doSearch(searcher, null, q, n, sort, false, fcm);
+  }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult search(
+  IndexSearcher searcher,
+  Query q,
+  int n,
+  Sort sort,
+  boolean doDocScores,
+  FacetsCollectorManager fcm)
+  throws IOException {
+if (sort == null) {
+  throw new IllegalArgumentException("sort must not be null");
+}
+return doSearch(searcher, null, q, n, sort, doDocScores, fcm);
+  }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult searchAfter(
+  IndexSearcher searcher, ScoreDoc after, Query q, int n, 
FacetsCollectorManager fcm)
+  throws IOException {
+return doSearch(searcher, after, q, n, null, false, fcm);
+  }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult searchAfter(
+  IndexSearcher searcher, ScoreDoc after, Query q, int n, Sort sort, 
FacetsCollectorManager fcm)
+  throws IOException {
+if (sort == null) {
+  throw new IllegalArgumentException("sort must not be null");
+}
+return doSearch(searcher, after, q, n, sort, false, fcm);
+  }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult searchAfter(
+  IndexSearcher searcher,
+  ScoreDoc after,
+  Query q,
+  int n,
+  Sort sort,
+  boolean doDocScores,
+  FacetsCollectorManager fcm)
+  throws IOException {
+if (sort == null) {
+  throw new IllegalArgumentException("sort must not be null");
+}
+return doSearch(searcher, after, q, n, sort, doDocScores, fcm);
+  }
+
+  private static FacetsResult doSearch(
+  IndexSearcher searcher,
+  ScoreDoc after,
+  Query q,
+  int n,
+  Sort sort,
+  boolean doDocScores,
+  FacetsCollectorManager fcm)
+  throws IOException {
+
+int limit = searcher.getIndexReader().maxDoc();
+if (limit == 0) {
+  limit = 1;
+}
+n = Math.min(n, limit);
+
+if (after != null && after.doc >= limit) {
+  throw new IllegalArgumentException(
+  "after.doc exceeds the number of documents in the reader: after.doc="
+  + after.doc
+  + " limit="
+  + limit);
+}
+
+final TopDocs topDocs;
+final FacetsCollector facetsCollector;
+if (n == 0) {
+  TotalHitCountCollectorManager hitCountCollectorManager = new 
TotalHitCountCollectorManager();
+  MultiCollectorManager multiCollectorManager =
+  new MultiCollectorManager(hitCountCollectorManager, fcm);
+  Object[] result = searcher.search(q, multiCollectorManager);
+  topDocs =
+  new TopDocs(
+  new TotalHits((Integer) result[0], TotalHits.Relation.EQUAL_TO), 
new ScoreDoc[0]);
+  facetsCollector = (FacetsCollector) result[1];
+} else {
+  final MultiCollectorManager multiCollectorManager;
+  if (sort != null) {
+if (after != null && !(after instanceof FieldDoc)) {
+  // TODO: if we fix type safety of TopFieldDocs we can
+  // remove this
+  throw new IllegalArgumentException("after must be a FieldDoc; got " 
+ after);
+}
+TopFieldCollectorManager topFieldCollectorManager =
+new TopFieldCollectorManager(sort, n, (FieldDoc) after, 
Integer.MAX_VALUE, true);
+multiCollectorManager = new 
MultiCollectorManager(topFieldCollectorManager, fcm);
+  } else {
+TopScoreDocCollectorManager topScoreDocCollectorManager =
+new TopScoreDocCollectorManager(n, after, Integer.MAX_VALUE, true);
+multiCollectorManager = new 
MultiCollectorManager(topScoreDocCollectorManager, fcm);
+  }
+  Object[] result = s

Re: [PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


mikemccand commented on PR #13732:
URL: https://github.com/apache/lucene/pull/13732#issuecomment-2333897038

   Thanks @ChrisHegarty for catching this!  It's important to document all 
breaks in `MIGRATE.md`.
   
   I'm trying to understand why/when we removed this operator.  Was it really 
during [adding NFA support](https://issues.apache.org/jira/browse/LUCENE-10010) 
(linked from your linked Elasticsearch issue)?  I don't think that was 
intentional, or (more likely) I at least don't remember it -- I thought that 
change was purely additive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


rmuir commented on PR #13732:
URL: https://github.com/apache/lucene/pull/13732#issuecomment-2333960253

   maybe this small post with diagram helps explain the situation: I hope it is 
more clear that there's no way we could support "let caller decide NFA vs DFA" 
with this operator: https://eugene-eeo.github.io/blog/nfa-complement.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


rmuir commented on PR #13732:
URL: https://github.com/apache/lucene/pull/13732#issuecomment-2333973731

   and the title of this video can't be beat: 
https://www.youtube.com/watch?v=bbMku8ZAoBk


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


mikemccand commented on PR #13732:
URL: https://github.com/apache/lucene/pull/13732#issuecomment-2334106736

   Phew, thank you for the historical context @rmuir!!  I forgot about the 
horrors of complementing NFAs... I love that talk title.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


rmuir commented on PR #13732:
URL: https://github.com/apache/lucene/pull/13732#issuecomment-2334113812

   Sorry, i was a bit offended that it might have been accidental :)
   
   I tried to make it possible for us to support NFA query in straightforward 
way, without bringing insanity to APIs, in a way we can maintain. 
   
   I don't wish to break any functionality for any user: but I draw the line at 
"mathematically impossible". In such a case, breaking change in a major release 
is a reasonable last-resort.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove leftover search(Query, Collector) usages in TestTaxonomyFacetAssociations [lucene]

2024-09-06 Thread via GitHub


javanna commented on PR #13726:
URL: https://github.com/apache/lucene/pull/13726#issuecomment-2334115120

   It seems like I missed something, maybe that's why these methods were not 
migrated yet :) I got a failure with seed `8DDC50A89D6B6EC9`. Digging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace static FacetsCollector#search methods [lucene]

2024-09-06 Thread via GitHub


gsmiller commented on code in PR #13733:
URL: https://github.com/apache/lucene/pull/13733#discussion_r1747170856


##
lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java:
##
@@ -54,4 +79,138 @@ public ReducedFacetsCollector(final 
Collection facetsCollectors
   facetsCollector -> 
matchingDocs.addAll(facetsCollector.getMatchingDocs()));
 }
   }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult search(
+  IndexSearcher searcher, Query q, int n, FacetsCollectorManager fcm) 
throws IOException {

Review Comment:
   Mostly just thinking out loud here, but now that these methods are 
responsible for creating the `FacetsCollector` that gets returned in the 
wrapped `FacetsResult` (instead of previously where the user would provide it), 
it feels a little strange that the user would be responsible for passing in a 
`FacetsCollectorManager`. Another option would be to have these methods just 
setup the `FacetsCollectorManager` internally. Is there a need for users to 
provide their own? I suppose it's useful if users have some use-case where they 
want to extend FCM with their own customization? And it's also the only way to 
specify `keepScores`? So maybe we keep this. At the same time, these methods 
feel aimed at users that want out-of-the-box functionality, so maybe we 
simplify and take the parameter away? If we did that, I'm not sure how we'd 
deal with `keepScores` though unless we added yet another parameter to these 
static methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-06 Thread via GitHub


msokolov commented on PR #13542:
URL: https://github.com/apache/lucene/pull/13542#issuecomment-2334328695

   Adding a new optional constructor makes sense to me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove usage of IndexSearcher#Search(Query, Collector) from monitor package [lucene]

2024-09-06 Thread via GitHub


gsmiller commented on PR #13735:
URL: https://github.com/apache/lucene/pull/13735#issuecomment-2334400226

   OK, I _think_ this is still a safe implementation but the change is that 
multiple collection threads will now be concurrently calling 
`CandidateMatcher#addMatch`. I believe this is safe but need another set of 
eyes on it. The reason I think this works is that we should never be 
interleaving calls with the same `doc` value since these are global docIDs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Remove all deprecated IndexSearcher#search(Query, Collector) usage / methods in the next major release [lucene]

2024-09-06 Thread via GitHub


gsmiller commented on issue #12892:
URL: https://github.com/apache/lucene/issues/12892#issuecomment-2334402205

   Opened #13735 for the `monitor` usage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] monitor: CollectingMatcher [lucene]

2024-09-06 Thread via GitHub


gsmiller closed issue #13736: monitor: CollectingMatcher
URL: https://github.com/apache/lucene/issues/13736


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add migration note about the removal of optional RegExp complement syntax [lucene]

2024-09-06 Thread via GitHub


ChrisHegarty commented on PR #13732:
URL: https://github.com/apache/lucene/pull/13732#issuecomment-2334426390

   Thank @rmuir. I included that advice in the migration note.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace static FacetsCollector#search methods [lucene]

2024-09-06 Thread via GitHub


javanna commented on code in PR #13733:
URL: https://github.com/apache/lucene/pull/13733#discussion_r1747548475


##
lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java:
##
@@ -54,4 +79,138 @@ public ReducedFacetsCollector(final 
Collection facetsCollectors
   facetsCollector -> 
matchingDocs.addAll(facetsCollector.getMatchingDocs()));
 }
   }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */
+  public static FacetsResult search(
+  IndexSearcher searcher, Query q, int n, FacetsCollectorManager fcm) 
throws IOException {

Review Comment:
   Yes! I was going to make the same comment. I was going to ask if we need to 
care for scenarios where the manager is a subclass of `FacetsCollectorManager`. 
But the `keepScores` flag made it kind of necessary to accept the collector 
manager, unless we want to accept the flag instead which is a little cryptic?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove usage of IndexSearcher#Search(Query, Collector) from monitor package [lucene]

2024-09-06 Thread via GitHub


javanna commented on PR #13735:
URL: https://github.com/apache/lucene/pull/13735#issuecomment-2334629622

   I am not familiar with this code but I believe you are correct. This should 
work despite we use a plain `ArrayList`, because each thread should only get 
and compute its own docs. Testing would verify that, but I believe we never 
provide an executor to the searcher we use, neither in test nor in the prod 
code. Should we try and add that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-06 Thread via GitHub


Mikep86 commented on code in PR #13697:
URL: https://github.com/apache/lucene/pull/13697#discussion_r1747612996


##
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java:
##
@@ -0,0 +1,450 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.join;
+
+import com.carrotsearch.randomizedtesting.generators.RandomPicks;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.BooleanClause;
+import org.apache.lucene.search.BooleanQuery;
+import org.apache.lucene.search.BoostQuery;
+import org.apache.lucene.search.BulkScorer;
+import org.apache.lucene.search.ConstantScoreQuery;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.LeafCollector;
+import org.apache.lucene.search.Scorable;
+import org.apache.lucene.search.ScorerSupplier;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.search.Weight;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.index.RandomIndexWriter;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.TestUtil;
+
+public class TestBlockJoinBulkScorer extends LuceneTestCase {
+  private static final String TYPE_FIELD_NAME = "type";
+  private static final String VALUE_FIELD_NAME = "value";
+  private static final String PARENT_FILTER_VALUE = "parent";
+  private static final String CHILD_FILTER_VALUE = "child";
+
+  private enum MatchValue {
+MATCH_A("A", 1),
+MATCH_B("B", 2),
+MATCH_C("C", 3),
+MATCH_D("D", 4);
+
+private static final List VALUES = List.of(values());
+
+private final String text;
+private final int score;
+
+MatchValue(String text, int score) {
+  this.text = text;
+  this.score = score;
+}
+
+public String getText() {
+  return text;
+}
+
+public int getScore() {
+  return score;
+}
+
+@Override
+public String toString() {
+  return text;
+}
+
+public static MatchValue random() {
+  return RandomPicks.randomFrom(LuceneTestCase.random(), VALUES);
+}
+  }
+
+  private record ChildDocMatch(int docId, List matches) {
+public ChildDocMatch(int docId, List matches) {
+  this.docId = docId;
+  this.matches = Collections.unmodifiableList(matches);
+}
+  }
+
+  private static Map> populateRandomIndex(
+  RandomIndexWriter writer, int maxParentDocCount, int maxChildDocCount, 
int maxChildDocMatches)
+  throws IOException {
+Map> expectedMatches = new HashMap<>();
+
+final int parentDocCount = random().nextInt(1, maxParentDocCount + 1);
+int currentDocId = 0;
+for (int i = 0; i < parentDocCount; i++) {
+  final int childDocCount = random().nextInt(maxChildDocCount + 1);
+  List docs = new ArrayList<>(childDocCount);
+  List childDocMatches = new ArrayList<>(childDocCount);
+
+  for (int j = 0; j < childDocCount; j++) {
+// Build a child doc
+Document childDoc = new Document();
+childDoc.add(newStringField(TYPE_FIELD_NAME, CHILD_FILTER_VALUE, 
Field.Store.NO));
+
+final int matchCount = random().nextInt(maxChildDocMatches + 1);
+List matchValues = new ArrayList<>(matchCount);
+for (int k = 0; k < matchCount; k++) {
+  // Add a match to the child doc
+  MatchValue matchValue = MatchValue.random();
+  matchValues.add(matchValue);
+  childDoc.add(newStringField(VALUE_FIELD_NAME, matchValue.getText(), 
Field.Store.NO));
+}
+
+docs.add(childDoc);
+childDocMatches.add(new ChildDocMatch(currentDocId++, matchValues));
+  }
+
+  // Build a parent doc
+  Document parentDoc = new Document();
+  parentDoc.add(newStringField(TYPE_FIELD_NAME, PARENT_FILTER_VALUE, 
Fi

Re: [PR] Replace static FacetsCollector#search methods [lucene]

2024-09-06 Thread via GitHub


javanna commented on PR #13733:
URL: https://github.com/apache/lucene/pull/13733#issuecomment-2334786138

   Thanks a lot for the speedy review @gsmiller !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace static FacetsCollector#search methods [lucene]

2024-09-06 Thread via GitHub


javanna merged PR #13733:
URL: https://github.com/apache/lucene/pull/13733


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Should the static search methods in FacetsCollector take a FacetsCollector as last argument? [lucene]

2024-09-06 Thread via GitHub


javanna closed issue #13725: Should the static search methods in 
FacetsCollector take a FacetsCollector as last argument?
URL: https://github.com/apache/lucene/issues/13725


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]

2024-09-06 Thread via GitHub


dweiss commented on issue #13719:
URL: https://github.com/apache/lucene/issues/13719#issuecomment-2334831855

   About right. These are my top three:
   ```
   git status
   git add -A .
   git commit -m "foo"
   ```
   and these are used much, much more rarely:
   ```
   git cherry-pick
   git branch
   git log
   git reset (--hard)
   git merge --no-ff ...
   git reflog
   ```
   
   This about covers my git knowledge. Anything else I have to go and RTFM. 
   
   > latest version has 161 subcommands
   
   Isn't it like TeX and LaTeX - that there is a very fundamental core and lots 
of other commands built around it? I remember this was amusing when I came 
across it a while ago: https://github.com/chrisdickinson/git-rs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Deprecate FacetsCollector#search utility methods [lucene]

2024-09-06 Thread via GitHub


gsmiller commented on code in PR #13737:
URL: https://github.com/apache/lucene/pull/13737#discussion_r1747783583


##
lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java:
##
@@ -54,4 +79,167 @@ public ReducedFacetsCollector(final 
Collection facetsCollectors
   facetsCollector -> 
matchingDocs.addAll(facetsCollector.getMatchingDocs()));
 }
   }
+
+  /** Utility method, to search and also collect all hits into the provided 
{@link Collector}. */

Review Comment:
   Shoot, I think I missed this in the earlier review but looks like we need a 
small javadoc cleanup here (need to mention `FacetsCollectorManager` instead of 
`Collector` right?). Same comment applies to all the new methods. Sorry for 
overlooking that.
   
   I just updated these on `main` with a quick commit 
(dc47adbbe736ef55f56feb8d66dc768eaba05fff). Could you cherry-pick this down 
into the back-port as well (and feel free to tweak the language if you think of 
ways to improve on it). Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org