[GitHub] [lucene] dweiss commented on pull request #11872: Update java version to 17 for Lucene 10 in the release wizard.
dweiss commented on PR #11872: URL: https://github.com/apache/lucene/pull/11872#issuecomment-1290386554 Updated Lucene 10's minimum JDK requirement in the release wizard, as per Jan's suggestion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss merged pull request #11872: Update java version to 17 for Lucene 10 in the release wizard.
dweiss merged PR #11872: URL: https://github.com/apache/lucene/pull/11872
[GitHub] [lucene] harishankar-gopalan commented on issue #11354: Reuse HNSW graphs when merging segments? [LUCENE-10318]
harishankar-gopalan commented on issue #11354: URL: https://github.com/apache/lucene/issues/11354#issuecomment-1290668547

> Update: Sorry for the delay, I am still working on this but got a little sidetracked with other work. > > > > Hi @harishankar-gopalan, yes what currently happens is the graph gets reconstructed from scratch. In https://github.com/apache/lucene/pull/11719, I am working on selecting the largest graph from a segment and using that to initialize the newly created segment's graph. Posted above are my initial benchmark results. However, I am running into some issues where the recall is slightly lower with the test setup and the merge time is higher. I have been debugging a little bit why this is happening, but have not yet made progress. I am going to take another try at it this week or next week.

Hi @jmazanec15, thanks for the update. Are there any public stats available for the current segment merges for HNSW-based graph indexes in Lucene? To be clearer: are there any performance benchmarks that compare Lucene segment merges for documents with and without KnnVectorFields indexed as an HNSW graph? If you are aware of any initial benchmarks that you are using as a reference, I would be grateful if you could share links to them.
[GitHub] [lucene] uschindler commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module
uschindler commented on PR #11873: URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290731341

> This looks fine to me. A migration entry would probably be good here, since some folks may be collecting JUL logs and parsing messages from there (although for what reason - I've no idea).

Actually, by default there would be no change for end users, because Java Platform Logging will feed messages to JUL anyway.
[GitHub] [lucene] rmuir commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module
rmuir commented on PR #11873: URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290734616

Technically it makes sense. The only confusion I have is from the JEP:

> Non-Goals
> It is not a goal to define a general-purpose interface for logging. The service interface contains only the minimal set of methods that the JDK needs for its own usage

The way I read it, this is almost an internal hack to work around module issues. Ideally we'd avoid any "internal API".
[GitHub] [lucene] uschindler commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module
uschindler commented on PR #11873: URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290737592

> Technically it makes sense.
>
> The only confusion I have is from the JEP:
>
> > Non-Goals
> > It is not a goal to define a general-purpose interface for logging. The service interface contains only the minimal set of methods that the JDK needs for its own usage
>
> The way I read it, this is almost an internal hack to work around module issues. Ideally we'd avoid any "internal API".

Actually it is not an internal hack. It is part of java.lang.System! Yes, it is there to work around module issues, but it also splits the logging facade from the implementation.
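
For readers unfamiliar with JEP 264, here is a minimal sketch of what logging through the `java.lang.System` facade looks like. It uses only JDK classes; the class name `PlatformLoggingSketch` and the logger name `org.example.sketch` are made up for illustration and are not taken from the PR. When the `java.logging` module is present, the default backend routes these messages to java.util.logging, which matches the "no change by default" remark in this thread.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

// Hypothetical class and logger names; this is not code from the PR.
final class PlatformLoggingSketch {

  // System.getLogger lives in java.base, so this compiles without a
  // dependency on the java.logging module.
  private static final Logger LOG = System.getLogger("org.example.sketch");

  /** Returns the name the logger was created with. */
  static String loggerName() {
    return LOG.getName();
  }

  /** Logs the message if the level is enabled; reports whether it was logged. */
  static boolean logIfEnabled(Level level, String message) {
    if (!LOG.isLoggable(level)) {
      return false;
    }
    LOG.log(level, message);
    return true;
  }

  public static void main(String[] args) {
    logIfEnabled(Level.INFO, "hello from the platform logging facade");
  }
}
```

An application that prefers another backend can supply its own `System.LoggerFinder` service; the calling code above would stay the same.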
[GitHub] [lucene] rmuir commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module
rmuir commented on PR #11873: URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290739788 Yeah, I just mean some of their wording in the JEP hints strongly that this is "internal for our use only". Even the first line of the summary: "Define a minimal logging API which *platform* classes can use to log messages". I'm not trying to block the change, just mentioning my confusion. The way I read this, Lucene shouldn't be pushing messages to it :)
[GitHub] [lucene] uschindler commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module
uschindler commented on PR #11873: URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290740794 I will keep this open for a while. I also have mixed feelings. Logging is a disaster and Java has just thrown another part into the game!
[GitHub] [lucene] reta opened a new issue, #11874: Usability improvements for timeout support in IndexSearcher
reta opened a new issue, #11874: URL: https://github.com/apache/lucene/issues/11874

### Description

In OpenSearch, we used to rely on a custom implementation for query timeout support. Since `9.3`, Apache Lucene offers timeout support in `IndexSearcher` [1], but the implementation is quite restrictive. In OpenSearch we would like to benefit from this new feature, preferably without duplicating the code. The suggested usability improvements are low risk and do not open up any internals:

- add a getter for the timeout value to `IndexSearcher` (only a setter exists)
- open up `TimeLimitingBulkScorer` and `TimeLimitingBulkScorer.TimeExceededException`

@msokolov would appreciate your opinion, thank you.

[1] https://issues.apache.org/jira/browse/LUCENE-10151
[GitHub] [lucene] dweiss commented on pull request #11873: DISCUSSION: Move Lucene Core's log support to Java Platform Logging (JEP 264) facade instead of java.util.logging implementation module
dweiss commented on PR #11873: URL: https://github.com/apache/lucene/pull/11873#issuecomment-1290776703

> Define a minimal logging API which platform classes can use to log messages

This is confusing as hell, isn't it? I admit I didn't even know about this JEP...
[GitHub] [lucene] reta opened a new pull request, #11875: Usability improvements for timeout support in IndexSearcher
reta opened a new pull request, #11875: URL: https://github.com/apache/lucene/pull/11875

Signed-off-by: Andriy Redko

### Description

Closes https://github.com/apache/lucene/issues/11874
[GitHub] [lucene] reta commented on a diff in pull request #11875: Usability improvements for timeout support in IndexSearcher
reta commented on code in PR #11875: URL: https://github.com/apache/lucene/pull/11875#discussion_r1004673321

## lucene/core/src/java/org/apache/lucene/search/TimeLimitingBulkScorer.java: ##
@@ -28,14 +28,14 @@
  *
  * @see org.apache.lucene.index.ExitableDirectoryReader
  */
-final class TimeLimitingBulkScorer extends BulkScorer {
+public final class TimeLimitingBulkScorer extends BulkScorer {

Review Comment: It used to be `public` but the visibility was reduced later on
[GitHub] [lucene] reta commented on a diff in pull request #11875: Usability improvements for timeout support in IndexSearcher
reta commented on code in PR #11875: URL: https://github.com/apache/lucene/pull/11875#discussion_r1004673850

## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ##
@@ -519,10 +524,15 @@ public void search(Query query, Collector results) throws IOException {
     search(leafContexts, createWeight(query, results.scoreMode(), 1), results);
   }

-  /** Returns true if any search hit the {@link #setTimeout(QueryTimeout) timeout}. */
+  /** Return true if any search hit the {@link #setTimeout(QueryTimeout) timeout}. */

Review Comment: Not related, fixing comment to keep it concise
[GitHub] [lucene] gf2121 opened a new pull request, #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor
gf2121 opened a new pull request, #11876: URL: https://github.com/apache/lucene/pull/11876 This PR proposes to use `ByteArrayComparator` to speed up `PointInSetQuery#MergePointVisitor`.
[GitHub] [lucene] rmuir commented on pull request #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor
rmuir commented on PR #11876: URL: https://github.com/apache/lucene/pull/11876#issuecomment-1290840923 Looks like the build is angry about spotless formatting. At a high level, this makes sense to me. We're just comparing fixed-length arrays, and we know this length up front (bytesPerDim), so we can use the comparator already optimized for that. cc @jpountz
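
The idea in the comment above (the slice length is fixed and known up front, so a comparator specialized for that length can be chosen once) can be sketched with JDK classes alone. The names `FixedLengthUnsignedCompareSketch`, `OffsetComparator` and `forLength` are hypothetical, and this is not Lucene's `ByteArrayComparator` implementation; it only illustrates the technique.

```java
import java.util.Arrays;

// Hypothetical names throughout; this is not Lucene's ByteArrayComparator.
final class FixedLengthUnsignedCompareSketch {

  /** Compares two fixed-length slices that start at the given offsets. */
  interface OffsetComparator {
    int compare(byte[] a, int aOffset, byte[] b, int bOffset);
  }

  /** Picks a comparator once, based on a length that is known up front. */
  static OffsetComparator forLength(int numBytes) {
    if (numBytes == Long.BYTES) {
      // 8-byte values: assemble big-endian longs and compare them as unsigned.
      return (a, aOff, b, bOff) ->
          Long.compareUnsigned(readLongBE(a, aOff), readLongBE(b, bOff));
    }
    // General case: unsigned lexicographic comparison over numBytes bytes.
    return (a, aOff, b, bOff) ->
        Arrays.compareUnsigned(a, aOff, aOff + numBytes, b, bOff, bOff + numBytes);
  }

  /** Assembles 8 bytes, most significant first, into a long. */
  private static long readLongBE(byte[] bytes, int offset) {
    long value = 0;
    for (int i = 0; i < Long.BYTES; i++) {
      value = (value << 8) | (bytes[offset + i] & 0xFFL);
    }
    return value;
  }
}
```

Only the sign of the result is meaningful: both branches order slices the same way, but the magnitudes they return may differ.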
[GitHub] [lucene-solr] Trey314159 commented on a diff in pull request #187: LUCENE-7785: Move dictionary for Ukrainan analyzer to external dependency
Trey314159 commented on code in PR #187: URL: https://github.com/apache/lucene-solr/pull/187#discussion_r1004863464

## lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java: ##
@@ -107,11 +107,18 @@ public UkrainianMorfologikAnalyzer(CharArraySet stopwords, CharArraySet stemExcl
   @Override
   protected Reader initReader(String fieldName, Reader reader) {
     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
+    // different apostrophes
     builder.add("\u2019", "'");
+    builder.add("\u0218", "'");
     builder.add("\u02BC", "'");
+    builder.add("`", "'");
+    builder.add("´", "'");
+    // ignored characters
     builder.add("\u0301", "");
-    NormalizeCharMap normMap = builder.build();
+    builder.add("\u00AD", "");
+    builder.add("\uFEFF", "");

Review Comment: Just stumbled across this discussion today. I'm the Wikimedia guy that arysin linked to above. And while it is _way_ too late, I just wanted to point out that while U+FEFF is used as the byte order mark at the beginning of a text stream, it is [also used](https://en.wikipedia.org/wiki/Byte_order_mark#Usage) as a "zero-width non-breaking space" within a text. That use was deprecated as of Unicode 3.2, but it is still quite common in practice on lots of Wikipedias. ICU normalization converts it to the empty string, which is how it usually gets handled on the wikis where we have had the chance to customize the analyzers, though that requires being able to open them up, so to speak.
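
To make the effect of these mappings concrete, here is a small JDK-only sketch that applies a subset of them (U+2019, U+02BC, the backtick and U+00B4 become an ASCII apostrophe; U+0301, U+00AD and U+FEFF are dropped, wherever they occur in the text). The class name `ApostropheNormalizationSketch` is hypothetical, and the real analyzer does this through `NormalizeCharMap` as shown in the diff, not through code like this.

```java
// Hypothetical helper; the analyzer itself uses NormalizeCharMap instead.
final class ApostropheNormalizationSketch {

  /** Maps apostrophe variants to an ASCII apostrophe and drops ignorable characters. */
  static String normalize(String input) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      switch (c) {
        case '\u2019': // right single quotation mark
        case '\u02BC': // modifier letter apostrophe
        case '`':      // grave accent used as an apostrophe
        case '\u00B4': // acute accent used as an apostrophe
          out.append('\'');
          break;
        case '\u0301': // combining acute accent
        case '\u00AD': // soft hyphen
        case '\uFEFF': // byte order mark, also seen mid-text as a zero-width no-break space
          break;       // dropped, i.e. mapped to the empty string
        default:
          out.append(c);
      }
    }
    return out.toString();
  }
}
```

For example, `normalize("a\u02BCb\uFEFFc")` yields `a'bc`: the U+FEFF inside the text is removed, not just a leading one.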
[GitHub] [lucene] gf2121 commented on pull request #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor
gf2121 commented on PR #11876: URL: https://github.com/apache/lucene/pull/11876#issuecomment-1291530725 Thanks @rmuir!
[GitHub] [lucene] gf2121 merged pull request #11876: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor
gf2121 merged PR #11876: URL: https://github.com/apache/lucene/pull/11876
[GitHub] [lucene] gf2121 merged pull request #11877: Use ByteArrayComparator for PointInSetQuery#MergePointVisitor (Backport 9x)
gf2121 merged PR #11877: URL: https://github.com/apache/lucene/pull/11877
[GitHub] [lucene] jpountz commented on pull request #11875: Usability improvements for timeout support in IndexSearcher
jpountz commented on PR #11875: URL: https://github.com/apache/lucene/pull/11875#issuecomment-1291573525 Adding a getter works for me, but I'd prefer not to make other implementation details like the custom bulk scorer public. Why do you need this?