[GitHub] [lucene] dweiss commented on pull request #11872: Update java version to 17 for Lucene 10 in the release wizard.

2022-10-25 Thread GitBox


dweiss commented on PR #11872:
URL: https://github.com/apache/lucene/pull/11872#issuecomment-1290386554

   Updated Lucene 10's minimum JDK requirement in the release wizard, as per 
Jan's suggestion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss merged pull request #11872: Update java version to 17 for Lucene 10 in the release wizard.

2022-10-25 Thread GitBox


dweiss merged PR #11872:
URL: https://github.com/apache/lucene/pull/11872





[GitHub] [lucene] harishankar-gopalan commented on issue #11354: Reuse HNSW graphs when merging segments? [LUCENE-10318]

2022-10-25 Thread GitBox


harishankar-gopalan commented on issue #11354:
URL: https://github.com/apache/lucene/issues/11354#issuecomment-1290668547

   > Update: Sorry for the delay, I am still working on this but got a little 
sidetracked with other work.
   > 
   > 
   > 
   > Hi @harishankar-gopalan, yes what currently happens is the graph gets 
reconstructed from scratch. In https://github.com/apache/lucene/pull/11719, I 
am working on selecting the largest graph from a segment and using that to 
initialize the newly created segment's graph. Posted above are my initial 
benchmark results. However, I am running into some issues where the recall is 
slightly lower with the test setup and the merge time is higher. I have been 
debugging a little bit why this is happening, but have not yet made progress. I 
am going to take another try at it this week or next week.
   
   Hi @jmazanec15, thanks for the update. Are there any public stats available 
for the current segment merges of HNSW-based graph indexes in Lucene? To be 
clearer, I am looking for performance benchmarks that compare Lucene segment 
merges for documents with and without KnnVectorFields indexed as an HNSW graph. 
If you are aware of any initial benchmarks that you are using as a reference, I 
would be grateful if you could share links to them.





[GitHub] [lucene] uschindler commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module

2022-10-25 Thread GitBox


uschindler commented on PR #11873:
URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290731341

   > This looks fine to me. A migration entry would probably be good here since 
some folks may be collecting JUL logs and parsing messages from there (although 
for what reason - I've no idea).
   
   Actually, by default there would be no change for end users because Java 
Platform Logging will feed messages to JUL anyway.





[GitHub] [lucene] rmuir commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module

2022-10-25 Thread GitBox


rmuir commented on PR #11873:
URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290734616

   Technically it makes sense.
   
   The only confusion I have is from the JEP:
   
   > Non-Goals
   > It is not a goal to define a general-purpose interface for logging. The 
service interface contains only the minimal set of methods that the JDK needs 
for its own usage.
   
   The way I read it, this is almost an internal hack to work around 
module issues. Ideally we'd avoid any "internal API".





[GitHub] [lucene] uschindler commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module

2022-10-25 Thread GitBox


uschindler commented on PR #11873:
URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290737592

   > Technically it makes sense.
   > 
   > The only confusion I have is from the JEP:
   > 
   > > Non-Goals
   > > It is not a goal to define a general-purpose interface for logging. The 
service interface contains only the minimal set of methods that the JDK needs 
for its own usage.
   > 
   > The way I read it, this is almost an internal hack to work around 
module issues. Ideally we'd avoid any "internal API".
   
   Actually, it is not an internal hack. It is part of java.lang.System! Yes, it 
is there to work around module issues, but it also splits the logging facade 
from the implementation.
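
For readers who have not used the facade under discussion, here is a minimal sketch of logging through `java.lang.System.Logger` (JEP 264). The class name and the logger name `org.example.sketch` are made up for illustration; this is not code from the PR. When the `java.logging` module is present, the default backend forwards to java.util.logging, which matches the "no change by default" point made earlier in this thread.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

public class PlatformLoggingSketch {
  // Hypothetical logger name; a library would normally use its own package or class name.
  static final Logger LOG = System.getLogger("org.example.sketch");

  public static void main(String[] args) {
    // The message uses java.text.MessageFormat-style placeholders ({0}, {1}, ...).
    LOG.log(Level.WARNING, "Something unusual happened: {0}", "details");

    // Guard expensive message construction behind a level check.
    if (LOG.isLoggable(Level.DEBUG)) {
      LOG.log(Level.DEBUG, "Verbose diagnostics here");
    }
  }
}
```

The calling code only depends on `java.base`; which backend receives the messages is decided at runtime by whichever `System.LoggerFinder` is installed.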





[GitHub] [lucene] rmuir commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module

2022-10-25 Thread GitBox


rmuir commented on PR #11873:
URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290739788

   Yeah, I just mean some of their wording in the JEP hints strongly that this 
is "internal for our use only". Even the first line of the summary: "Define a 
minimal logging API which *platform* classes can use to log messages".
   
   I'm not trying to block the change, just mentioning my confusion. The way I 
read this, Lucene shouldn't be pushing messages to it :)





[GitHub] [lucene] uschindler commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module

2022-10-25 Thread GitBox


uschindler commented on PR #11873:
URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290740794

   I will keep this open for a while. I also have mixed feelings. Logging is a 
disaster and Java has just thrown another part into the game!





[GitHub] [lucene] reta opened a new issue, #11874: Usability improvements for timeout support in IndexSearcher

2022-10-25 Thread GitBox


reta opened a new issue, #11874:
URL: https://github.com/apache/lucene/issues/11874

   ### Description
   
   In OpenSearch, we used to rely on a custom implementation for query timeout 
support. Since `9.3`, Apache Lucene offers timeout support in 
`IndexSearcher` [1], but the implementation is quite restrictive. In 
OpenSearch we would like to benefit from this new feature, preferably without 
duplicating the code. 
   
   The suggested usability improvements are low risk and do not open up any 
internals:
- add getter for timeout value to `IndexSearcher` (only setter exists)
- open up `TimeLimitingBulkScorer` and 
`TimeLimitingBulkScorer.TimeExceededException`
   
   @msokolov, I would appreciate your opinion, thank you.
   
   [1] https://issues.apache.org/jira/browse/LUCENE-10151
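
To illustrate why a getter next to the existing setter is useful, here is a self-contained sketch. All types in it (`TimeoutCheck`, `Searcher`, `afterChecks`, `either`) are hypothetical stand-ins written for this example; they are not Lucene's `IndexSearcher` or `QueryTimeout`, and the loop is only a simplified model of a search that polls a timeout check.

```java
public class TimeoutGetterSketch {
  /** Hypothetical stand-in for a timeout check; this is not Lucene's QueryTimeout. */
  interface TimeoutCheck {
    boolean shouldExit();
  }

  /** Hypothetical stand-in for a searcher with a timeout setter and getter. */
  static final class Searcher {
    private TimeoutCheck timeout; // null means "no timeout configured"
    private boolean timedOut;

    void setTimeout(TimeoutCheck timeout) {
      this.timeout = timeout;
    }

    // The kind of getter the issue asks for.
    TimeoutCheck getTimeout() {
      return timeout;
    }

    boolean timedOut() {
      return timedOut;
    }

    /** Visits documents until done or until the timeout check asks to stop. */
    int visit(int maxDoc) {
      int visited = 0;
      for (int doc = 0; doc < maxDoc; doc++) {
        if (timeout != null && timeout.shouldExit()) {
          timedOut = true; // results so far are partial
          break;
        }
        visited++;
      }
      return visited;
    }
  }

  /** Deterministic check for the demo: asks to stop once polled more than n times. */
  static TimeoutCheck afterChecks(int n) {
    int[] polls = {0};
    return () -> ++polls[0] > n;
  }

  /** Combines an existing check (possibly null) with an extra one. */
  static TimeoutCheck either(TimeoutCheck existing, TimeoutCheck extra) {
    if (existing == null) {
      return extra;
    }
    return () -> existing.shouldExit() || extra.shouldExit();
  }

  public static void main(String[] args) {
    Searcher searcher = new Searcher();
    searcher.setTimeout(afterChecks(5));
    // With a getter, a caller can layer a stricter check on top of the configured one.
    searcher.setTimeout(either(searcher.getTimeout(), afterChecks(3)));
    System.out.println(searcher.visit(100) + " docs visited, timedOut=" + searcher.timedOut());
  }
}
```

Without the getter, downstream code that wants to wrap or inspect the configured timeout has to track it separately, which is the duplication the issue hopes to avoid.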





[GitHub] [lucene] dweiss commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module

2022-10-25 Thread GitBox


dweiss commented on PR #11873:
URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290776703

   > Define a minimal logging API which platform classes can use to log messages
   
   This is confusing as hell, isn't it? I admit I didn't even know about this 
JEP...





[GitHub] [lucene] reta opened a new pull request, #11875: Usability improvements for timeout support in IndexSearcher

2022-10-25 Thread GitBox


reta opened a new pull request, #11875:
URL: https://github.com/apache/lucene/pull/11875

   Signed-off-by: Andriy Redko 
   
   ### Description
   Closes https://github.com/apache/lucene/issues/11874
   





[GitHub] [lucene] reta commented on a diff in pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-10-25 Thread GitBox


reta commented on code in PR #11875:
URL: https://github.com/apache/lucene/pull/11875#discussion_r1004673321


##
lucene/core/src/java/org/apache/lucene/search/TimeLimitingBulkScorer.java:
##
@@ -28,14 +28,14 @@
  *
  * @see org.apache.lucene.index.ExitableDirectoryReader
  */
-final class TimeLimitingBulkScorer extends BulkScorer {
+public final class TimeLimitingBulkScorer extends BulkScorer {

Review Comment:
   It used to be `public` but the visibility was reduced later on






[GitHub] [lucene] reta commented on a diff in pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-10-25 Thread GitBox


reta commented on code in PR #11875:
URL: https://github.com/apache/lucene/pull/11875#discussion_r1004673850


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -519,10 +524,15 @@ public void search(Query query, Collector results) throws IOException {
     search(leafContexts, createWeight(query, results.scoreMode(), 1), results);
   }
 
-  /** Returns true if any search hit the {@link #setTimeout(QueryTimeout) timeout}. */
+  /** Return true if any search hit the {@link #setTimeout(QueryTimeout) timeout}. */

Review Comment:
   Not related, fixing comment to keep it concise 






[GitHub] [lucene] gf2121 opened a new pull request, #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor

2022-10-25 Thread GitBox


gf2121 opened a new pull request, #11876:
URL: https://github.com/apache/lucene/pull/11876

   This PR proposes to use `ByteArrayComparator` to speed up 
`PointInSetQuery#MergePointVisitor`





[GitHub] [lucene] rmuir commented on pull request #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor

2022-10-25 Thread GitBox


rmuir commented on PR #11876:
URL: https://github.com/apache/lucene/pull/11876#issuecomment-1290840923

   Looks like the build is angry about spotless formatting.
   
   High level, this makes sense to me. We're just comparing fixed length arrays 
and we know this length up front (bytesPerDim) so we can use the comparator 
already optimized for that. cc @jpountz 
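
As a rough illustration of the idea (not the code in this PR), the sketch below picks a comparison routine once, based on a fixed byte length known up front, and then reuses it for every comparison. The names `FixedLengthComparator` and `forLength` are invented for this example; Lucene's own comparator is not used here, and only JDK calls are relied on.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class FixedLengthCompareSketch {
  /** Hypothetical interface: compares numBytes bytes starting at the given offsets. */
  interface FixedLengthComparator {
    int compare(byte[] a, int aOffset, byte[] b, int bOffset);
  }

  // Big-endian views, so numeric order matches unsigned byte-by-byte order.
  private static final VarHandle BE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
  private static final VarHandle BE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

  /** Chooses the comparison strategy once, from the length known up front. */
  static FixedLengthComparator forLength(int numBytes) {
    if (numBytes == Integer.BYTES) {
      // Four bytes: a single unsigned int comparison.
      return (a, aOff, b, bOff) ->
          Integer.compareUnsigned((int) BE_INT.get(a, aOff), (int) BE_INT.get(b, bOff));
    }
    if (numBytes == Long.BYTES) {
      // Eight bytes: a single unsigned long comparison.
      return (a, aOff, b, bOff) ->
          Long.compareUnsigned((long) BE_LONG.get(a, aOff), (long) BE_LONG.get(b, bOff));
    }
    // Any other length: generic unsigned range comparison from the JDK.
    return (a, aOff, b, bOff) ->
        Arrays.compareUnsigned(a, aOff, aOff + numBytes, b, bOff, bOff + numBytes);
  }

  public static void main(String[] args) {
    FixedLengthComparator cmp = forLength(4);
    byte[] small = {0, 0, 0, 1};
    byte[] large = {(byte) 0xFF, 0, 0, 0};
    System.out.println(cmp.compare(small, 0, large, 0) < 0);
  }
}
```

The benefit comes from hoisting the length decision out of the hot loop: a visitor that compares many packed values of the same width pays for the dispatch once.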





[GitHub] [lucene-solr] Trey314159 commented on a diff in pull request #187: LUCENE-7785: Move dictionary for Ukrainan analyzer to external dependency

2022-10-25 Thread GitBox


Trey314159 commented on code in PR #187:
URL: https://github.com/apache/lucene-solr/pull/187#discussion_r1004863464


##
lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java:
##
@@ -107,11 +107,18 @@ public UkrainianMorfologikAnalyzer(CharArraySet 
stopwords, CharArraySet stemExcl
   @Override
   protected Reader initReader(String fieldName, Reader reader) {
 NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
+// different apostrophes
 builder.add("\u2019", "'");
+builder.add("\u0218", "'");
 builder.add("\u02BC", "'");
+builder.add("`", "'");
+builder.add("´", "'");
+// ignored characters
 builder.add("\u0301", "");
-NormalizeCharMap normMap = builder.build();
+builder.add("\u00AD", "");
+builder.add("\uFEFF", "");

Review Comment:
   Just stumbled across this discussion today. I'm the Wikimedia guy that arysin 
linked to above. And while it is _way_ too late, I just wanted to point out 
that while U+FEFF is used as the byte order mark at the beginning of a text 
stream, it is [also used](https://en.wikipedia.org/wiki/Byte_order_mark#Usage) 
as a "zero-width non-breaking space" within a text. That use was deprecated as 
of Unicode 3.2, but it is still quite common in practice on lots of Wikipedias. 
ICU normalization converts it to the empty string, which is how it usually gets 
handled on the wikis where we have had the chance to customize the analyzers, 
though that requires being able to open them up, so to speak.
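
A small sketch of this kind of normalization in plain Java follows, assuming only a subset of the mappings shown in the diff: a few apostrophe variants are folded to `'`, and the combining acute accent (U+0301), soft hyphen (U+00AD), and U+FEFF are dropped wherever they occur, not just at the start of the text. The class and method names are made up, and this does not use Lucene's `NormalizeCharMap` or ICU.

```java
public class InvisibleCharSketch {
  /** Folds apostrophe variants to ' and removes a few characters that should be ignored. */
  static String normalize(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        // Apostrophe look-alikes: right single quote, modifier apostrophe, backtick, acute.
        case '\u2019':
        case '\u02BC':
        case '`':
        case '\u00B4':
          out.append('\'');
          break;
        // Ignored: combining acute accent, soft hyphen, BOM / zero-width no-break space.
        case '\u0301':
        case '\u00AD':
        case '\uFEFF':
          break;
        default:
          out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize("it\u2019s co\u00ADop\uFEFFerative"));
  }
}
```

Because the loop looks at every position, a U+FEFF used mid-text as a zero-width non-breaking space is removed in the same way as a leading byte order mark.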






[GitHub] [lucene] gf2121 commented on pull request #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor

2022-10-25 Thread GitBox


gf2121 commented on PR #11876:
URL: https://github.com/apache/lucene/pull/11876#issuecomment-1291530725

   Thanks @rmuir !





[GitHub] [lucene] gf2121 merged pull request #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor

2022-10-25 Thread GitBox


gf2121 merged PR #11876:
URL: https://github.com/apache/lucene/pull/11876





[GitHub] [lucene] gf2121 merged pull request #11877: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor (Backport 9x)

2022-10-25 Thread GitBox


gf2121 merged PR #11877:
URL: https://github.com/apache/lucene/pull/11877





[GitHub] [lucene] jpountz commented on pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-10-25 Thread GitBox


jpountz commented on PR #11875:
URL: https://github.com/apache/lucene/pull/11875#issuecomment-1291573525

   Adding a getter works for me, but I'd prefer not to make other 
implementation details like the custom bulk scorer public. Why do you need this?

