Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

2023-12-06 Thread via GitHub
vsop-479 commented on PR #12846: URL: https://github.com/apache/lucene/pull/12846#issuecomment-1842384044 > is this new approach helping cover more cases maybe? Yes. The previous patch only applied the copy to level 0's acc in ``Lucene99SkipWriter.bufferSkip``; the current patch applies the copy for

Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-06 Thread via GitHub
uschindler commented on issue #12863: URL: https://github.com/apache/lucene/issues/12863#issuecomment-1842448366 > Thank you very much @rmuir, @uschindler. Yes, there are many threads running at the same time and we have a managed reference count to close the reader, maybe there are some is

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-06 Thread via GitHub
dweiss commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1416950035 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-06 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1416980241 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[PR] Remove some redundant modifiers from code [lucene]

2023-12-06 Thread via GitHub
shubhamvishu opened a new pull request, #12880: URL: https://github.com/apache/lucene/pull/12880 ### Description This PR removes (or cleans up) some of the redundant modifiers from the code. It addresses the following warnings in my IDE: - Modifier ‘static’ is redundant for inner inter

Re: [PR] CheckIndex - Removal of some dead code [lucene]

2023-12-06 Thread via GitHub
slow-J commented on PR #12876: URL: https://github.com/apache/lucene/pull/12876#issuecomment-1842557831 > I'm not sure about the removal of these two functions, they're quite nice, only taking a CodecReader and an InfoStream. Removing them is forcing users to call the more complex functions

Re: [PR] Fix the declared Exceptions of Expression#evaluate to match those of DoubleValues#doubleValue [lucene]

2023-12-06 Thread via GitHub
uschindler merged PR #12878: URL: https://github.com/apache/lucene/pull/12878 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

[PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-06 Thread via GitHub
dweiss opened a new pull request, #12881: URL: https://github.com/apache/lucene/pull/12881 This patch provides a number of small improvements aimed at improving performance of MatchHighlighter (and MatchRegionRetriever), especially in corner cases like: * queries that result in a lar

Re: [I] Speeding up Lucene Vector Similarity through the Java Vector API [lucene]

2023-12-06 Thread via GitHub
benwtrent commented on issue #12091: URL: https://github.com/apache/lucene/issues/12091#issuecomment-1843036132 This has been added. A MR-JAR for vector operations was added and Lucene can now take advantage of the Panama Vector API for vector similarity functions.

Re: [I] Speeding up Lucene Vector Similarity through the Java Vector API [lucene]

2023-12-06 Thread via GitHub
benwtrent closed issue #12091: Speeding up Lucene Vector Similarity through the Java Vector API URL: https://github.com/apache/lucene/issues/12091

Re: [I] Follow-up refactors to 8-bit quantization change [lucene]

2023-12-06 Thread via GitHub
benwtrent commented on issue #11758: URL: https://github.com/apache/lucene/issues/11758#issuecomment-1843041904 There are multiple simplifications now in the quantized byte operations. One key thing is that the graph searcher now takes a "vector scorer" object and doesn't care about

Re: [I] Follow-up refactors to 8-bit quantization change [lucene]

2023-12-06 Thread via GitHub
benwtrent closed issue #11758: Follow-up refactors to 8-bit quantization change URL: https://github.com/apache/lucene/issues/11758

Re: [I] Use hash set for visited nodes in HNSW search? [LUCENE-10404] [lucene]

2023-12-06 Thread via GitHub
benwtrent commented on issue #11440: URL: https://github.com/apache/lucene/issues/11440#issuecomment-1843050638 While exploring other things, I noticed this: https://github.com/apache/lucene/pull/12789 There is definitely room for improvement here. We should test: - when to use Sp

Re: [I] Making vector similarity functions pluggable [lucene]

2023-12-06 Thread via GitHub
benwtrent commented on issue #12219: URL: https://github.com/apache/lucene/issues/12219#issuecomment-1843052002 We can close this, we added panama vector API to Lucene directly, that was my main concern with this issue.

Re: [I] Making vector similarity functions pluggable [lucene]

2023-12-06 Thread via GitHub
benwtrent closed issue #12219: Making vector similarity functions pluggable URL: https://github.com/apache/lucene/issues/12219

Re: [I] Add Scalar Quantization codec for Vectors [lucene]

2023-12-06 Thread via GitHub
benwtrent closed issue #12497: Add Scalar Quantization codec for Vectors URL: https://github.com/apache/lucene/issues/12497

Re: [I] Add Scalar Quantization codec for Vectors [lucene]

2023-12-06 Thread via GitHub
benwtrent commented on issue #12497: URL: https://github.com/apache/lucene/issues/12497#issuecomment-1843057160 This was shipped in Lucene 9.9 as the `Lucene99HnswScalarQuantizedVectorsFormat` codec format!

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-06 Thread via GitHub
romseygeek commented on code in PR #12881: URL: https://github.com/apache/lucene/pull/12881#discussion_r1417479284 ## lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/MatchRegionRetriever.java: ## @@ -199,34 +343,95 @@ public void highlightDocument( Lea

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-06 Thread via GitHub
dweiss commented on code in PR #12881: URL: https://github.com/apache/lucene/pull/12881#discussion_r1417504333 ## lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/MatchRegionRetriever.java: ## @@ -53,99 +58,173 @@ public class MatchRegionRetriever { privat

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-06 Thread via GitHub
dweiss commented on code in PR #12881: URL: https://github.com/apache/lucene/pull/12881#discussion_r1417504745 ## lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/MatchRegionRetriever.java: ## @@ -199,34 +343,95 @@ public void highlightDocument( LeafRea

[PR] clean up smoketester GPG leaks [lucene]

2023-12-06 Thread via GitHub
hurutoriya opened a new pull request, #12882: URL: https://github.com/apache/lucene/pull/12882 ### Description https://github.com/apache/lucene/issues/11948 >>>smoketester leaks a GPG agent on my computer everytime it runs. @risdenk pointed out this fix from solr: https://

Re: [I] Upgrade OpenNLP to 1.9.1 [LUCENE-8659] [lucene]

2023-12-06 Thread via GitHub
hurutoriya commented on issue #9705: URL: https://github.com/apache/lucene/issues/9705#issuecomment-1843259962 Hello @tteofili. It seems OpenNLP has been upgraded to 1.9.1+. https://github.com/apache/lucene/commit/c228e4bb66ca73c8150d8eaebe2bb999bcc6c9b1 Can we close this issue to clean up

Re: [I] Upgrade OpenNLP to 1.9.1 [LUCENE-8659] [lucene]

2023-12-06 Thread via GitHub
cpoerschke closed issue #9705: Upgrade OpenNLP to 1.9.1 [LUCENE-8659] URL: https://github.com/apache/lucene/issues/9705

Re: [I] Upgrade OpenNLP to 1.9.1 [LUCENE-8659] [lucene]

2023-12-06 Thread via GitHub
cpoerschke commented on issue #9705: URL: https://github.com/apache/lucene/issues/9705#issuecomment-1843296060 Thanks @hurutoriya for re-surfacing this issue! Closing as superseded by #11277, which upgrades to version 1.9.4.

Re: [PR] Remove some redundant modifiers from code [lucene]

2023-12-06 Thread via GitHub
gsmiller commented on PR #12880: URL: https://github.com/apache/lucene/pull/12880#issuecomment-1843430023 Thanks @shubhamvishu! I personally like this cleanup, but I'd be curious if others have some reasons I'm not aware of why they may prefer differently? Maybe more importantly though, I w

[I] Reproducible failure in TestUnifiedHighlighter.testOneSentence (and others) - index order [lucene]

2023-12-06 Thread via GitHub
dweiss opened a new issue, #12883: URL: https://github.com/apache/lucene/issues/12883 ### Description I came across this one by stress-testing the package with -Dtests.iters=200. Occasional hiccups in unified highlighter tests, for example: ``` gradlew :lucene:highlighter:test -

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-06 Thread via GitHub
dweiss commented on code in PR #12881: URL: https://github.com/apache/lucene/pull/12881#discussion_r1417928882 ## lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/MatchRegionRetriever.java: ## @@ -199,34 +343,95 @@ public void highlightDocument( LeafRea

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-12-06 Thread via GitHub
javanna commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1843624155 I think that it would be even better to get the deprecation out with 9x, regardless of whether we will effectively remove with 10. At least we let users know that they should use a different

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-06 Thread via GitHub
dweiss commented on code in PR #12881: URL: https://github.com/apache/lucene/pull/12881#discussion_r1417931314 ## lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/MatchRegionRetriever.java: ## @@ -53,99 +58,173 @@ public class MatchRegionRetriever { privat

Re: [PR] Remove some redundant modifiers from code [lucene]

2023-12-06 Thread via GitHub
dweiss commented on PR #12880: URL: https://github.com/apache/lucene/pull/12880#issuecomment-1843774013 I don't think we have any conventions for this. And I don't know if tools can automate this. I remember reading a thread somewhere on the openjdk mailing list that touched on a si

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-06 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1843848755 Hi Adrien, Thanks for the additional insight! > @uschindler FYI this is what I'm getting: https://gist.github.com/jpountz/be81b1eb93c6118aac65c3679911f1d8. There are two files

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-06 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1843857763 The best idea that I have instead of VarHandles: Create an implementation for ByteBuffer (using the methods available there). This implementation would work with: - byte arrays: Us
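
Editor's note on the comment above: as a rough illustration of what a group-varint codec written only against `ByteBuffer` methods could look like, here is a minimal sketch. It is not the code from PR #12841 and not Lucene's implementation; the class name `GroupVarIntSketch`, the method names, and the exact flag-byte layout (2 bits per value, first value in the high bits, value bytes little-endian) are assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Illustrative group-varint codec over ByteBuffer; hypothetical, not Lucene's actual code. */
final class GroupVarIntSketch {

  private GroupVarIntSketch() {}

  /** Number of bytes (1..4) needed to hold v when treated as an unsigned 32-bit value. */
  static int numBytes(int v) {
    // OR with 1 so that v == 0 still needs one byte (numberOfLeadingZeros(1) == 31).
    return 4 - (Integer.numberOfLeadingZeros(v | 1) >>> 3);
  }

  /** Writes exactly four ints: one flag byte (2 bits per value), then the value bytes. */
  static void writeGroup(ByteBuffer out, int[] values, int offset) {
    int flagPos = out.position();
    out.put((byte) 0); // placeholder, patched once all four lengths are known
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[offset + i];
      int n = numBytes(v);
      flag = (flag << 2) | (n - 1); // first value ends up in the two highest bits
      for (int b = 0; b < n; b++) {
        out.put((byte) (v >>> (8 * b))); // little-endian, lowest byte first
      }
    }
    out.put(flagPos, (byte) flag); // absolute put, does not move the position
  }

  /** Reads four ints previously written by writeGroup. */
  static void readGroup(ByteBuffer in, int[] dst, int offset) {
    int flag = in.get() & 0xFF;
    for (int i = 0; i < 4; i++) {
      int n = ((flag >>> (6 - 2 * i)) & 0x3) + 1; // byte length of the i-th value
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in.get() & 0xFF) << (8 * b);
      }
      dst[offset + i] = v;
    }
  }
}
```

Because the sketch only uses relative and absolute `get`/`put`, the same code runs unchanged on a heap buffer from `ByteBuffer.allocate` or `ByteBuffer.wrap` and on a direct buffer from `ByteBuffer.allocateDirect`. A group of four values needs at most 17 bytes (one flag byte plus four 4-byte values).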

Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-06 Thread via GitHub
sosohu commented on issue #12863: URL: https://github.com/apache/lucene/issues/12863#issuecomment-1844190152 Thanks @uschindler for your reply, really appreciate your help!

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-06 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1844310878 @uschindler Thanks for review! > With unaligned random reads you mean that you read with positional reads from the area where the buffer is saved? With default DataInput you can on

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-06 Thread via GitHub
dweiss commented on PR #12881: URL: https://github.com/apache/lucene/pull/12881#issuecomment-1844803975 I'll commit this later today, if there are no objections.

Re: [PR] Remove some redundant modifiers from code [lucene]

2023-12-06 Thread via GitHub
shubhamvishu commented on PR #12880: URL: https://github.com/apache/lucene/pull/12880#issuecomment-1844826409 Thanks @gsmiller @dweiss for taking a look. I see some great points raised here by both of you (and I agree with all of them). A couple of points on why I think we should go with this change: