Re: [PR] Backport SOLR-13749 Cross-collection join filter to 8.x [lucene-solr]

2024-09-18 Thread via GitHub
itygh commented on PR #1175: URL: https://github.com/apache/lucene-solr/pull/1175#issuecomment-2359982307 This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will get back to you as soon as possible after my vacation ends. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] Backport SOLR-13749 Cross-collection join filter to 8.x [lucene-solr]

2024-09-18 Thread via GitHub
danmfox closed pull request #1175: Backport SOLR-13749 Cross-collection join filter to 8.x URL: https://github.com/apache/lucene-solr/pull/1175

Re: [PR] Add reopen method in PerThreadPKLookup [lucene]

2024-09-18 Thread via GitHub
vsop-479 commented on PR #13596: URL: https://github.com/apache/lucene/pull/13596#issuecomment-2359833789 This change looks clearer. Thanks, @jpountz.

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2359667256 I'll post one more iteration here addressing the concerns about dangerous default impls; it adds back impls of copy() and cost(). I also added a test-and-throw ensuring that the vector

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-18 Thread via GitHub
rmuir commented on PR #13806: URL: https://github.com/apache/lucene/pull/13806#issuecomment-2359564950 also, it would be good to get an idea of the use-case. The problem is, this query can hold many terms: * getting an automaton over them isn't really a use-case, highlighting docs is. We

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
benwtrent commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2359448022 > One Q I have is can we remove copy()? Do we need to deprecate it first -- and if so, could we deprecate in 9x branch? You are already removing `RandomAccessVectorValues`, which

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-18 Thread via GitHub
rmuir commented on code in PR #13806: URL: https://github.com/apache/lucene/pull/13806#discussion_r1765753269 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -141,6 +135,11 @@ public long getTermsCount() { return termData.size(); } + public

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-18 Thread via GitHub
rmuir commented on PR #13806: URL: https://github.com/apache/lucene/pull/13806#issuecomment-2359446319 and the question is not for this PR, just a general one. It seems the only "real user" of `consumeTermsMatching` is highlighter, and building an automaton for this thing seems to be... bot

Re: [PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-18 Thread via GitHub
rmuir commented on PR #13806: URL: https://github.com/apache/lucene/pull/13806#issuecomment-2359438720 good solution. could we consider also fixing the visitor to use this approach (vs passing a RunAutomaton or something awful?)

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2359436688 ooh I just saw the Dictionary branch - that looks like a nice approach, I don't think I really understood what you were proposing before. One Q I have is can we remove copy()? Do we nee

[PR] Add BytesRefIterator to TermInSetQuery [lucene]

2024-09-18 Thread via GitHub
cbuescher opened a new pull request, #13806: URL: https://github.com/apache/lucene/pull/13806 Addresses #13778 TermInSetQuery used to have an accessor to its terms that was removed in #12173 to avoid leaking internal encoding details. This introduces an accessor to the term data in

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-18 Thread via GitHub
iverase commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2359389642 See here: https://github.com/apache/lucene/blob/6d987e1ce1c3f3215633a979ce048829fe1bb6ed/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PointsWriter.java#L248

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-18 Thread via GitHub
iverase commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2359380429 Probably we are not remapping the field ordinal properly when merging segments.

Re: [PR] Sometimes intersect the essential clause and the best non-essential clause. [lucene]

2024-09-18 Thread via GitHub
wurui90 commented on PR #12589: URL: https://github.com/apache/lucene/pull/12589#issuecomment-2359243271 Hi Adrien, for this sentence: "moving more and more clauses from the essential list to the non-essential list as the minimum competitive score increases." Is it actually moving more cla

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-18 Thread via GitHub
benwtrent commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2359368164 git bisect might be lying, I don't see how that PR could cause this failure :(

Re: [I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-18 Thread via GitHub
benwtrent commented on issue #13805: URL: https://github.com/apache/lucene/issues/13805#issuecomment-2359326047 Git bisect puts the blame at: 6634b41f42f4e2802048d1e4750e1ce1202652c5 https://github.com/apache/lucene/pull/13686

[I] TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException [lucene]

2024-09-18 Thread via GitHub
ChrisHegarty opened a new issue, #13805: URL: https://github.com/apache/lucene/issues/13805 ``` ERROR: The following test(s) have failed: - org.apache.lucene.codecs.lucene90.TestLucene90DocValuesFormat.testSparseDocValuesVsStoredFields (:lucene:core) Test output: /opt/bui

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-09-18 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2359272883 Here are some more flat index test results. This run was to see how the number of coarse-grained centroids changes recall & speed. | Lucene912BinaryQuantizedVectorsForma

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
msokolov commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1765337333 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -361,33 +385,46 @@ private MergedByteVectorValues(List subs, MergeState mergeS

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-09-18 Thread via GitHub
jpountz commented on code in PR #13763: URL: https://github.com/apache/lucene/pull/13763#discussion_r1765286133 ## lucene/core/src/java/org/apache/lucene/codecs/perfield/PerFieldKnnVectorsFormat.java: ## @@ -239,51 +245,69 @@ public FieldsReader(final SegmentReadState readState)

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-09-18 Thread via GitHub
jpountz commented on code in PR #13763: URL: https://github.com/apache/lucene/pull/13763#discussion_r1765280863 ## lucene/core/src/java/org/apache/lucene/codecs/perfield/PerFieldKnnVectorsFormat.java: ## @@ -239,51 +245,69 @@ public FieldsReader(final SegmentReadState readState)

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2358728676 FWIW I tried removing `copy()` and using caller-supplied storage in `vectorValue`. In many ways this looks nicer, but it leads to substantial slowdown in indexing/merging because of the

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
msokolov commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1765233960 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -303,29 +314,45 @@ private MergedFloat32VectorValues(List subs, MergeState me }

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
jpountz commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2358719345 I iterated a bit on my branch, so that there is no more call site for `FloatVectorValues#copy`: https://github.com/msokolov/lucene/compare/knn-vector-random...jpountz:lucene:knn-vector-r

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
jpountz commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1765122721 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -303,29 +314,45 @@ private MergedFloat32VectorValues(List subs, MergeState me }

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-18 Thread via GitHub
benwtrent commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1765048870 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -303,29 +314,45 @@ private MergedFloat32VectorValues(List subs, MergeState me }

[I] Should EdgeNGramTokenizer's DEFAULT_MAX_GRAM_SIZE be ONE? [lucene]

2024-09-18 Thread via GitHub
YeonghyeonKO opened a new issue, #13802: URL: https://github.com/apache/lucene/issues/13802 ### Description From org.apache.lucene:lucene-analysis-common:9.11.1, the static variable `DEFAULT_MAX_GRAM_SIZE` of EdgeNGramTokenizer is ONE, not TWO. Logically, the maximum n-gram siz
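
To make concrete what the maximum gram size controls, here is a minimal sketch that lists the leading (edge) n-grams of a single term for a given minGram/maxGram pair, assuming a minimum gram size of 1. The class and method names are hypothetical; this is not Lucene's EdgeNGramTokenizer, which works on code points and buffered reader input rather than on a Java String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's EdgeNGramTokenizer.
public class EdgeNGramSketch {

    /** Returns the prefixes of {@code term} with lengths in [minGram, maxGram], shortest first. */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // An edge n-gram cannot be longer than the term itself.
        int upper = Math.min(maxGram, term.length());
        for (int n = minGram; n <= upper; n++) {
            grams.add(term.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", 1, 1)); // [l]
        System.out.println(edgeNGrams("lucene", 1, 2)); // [l, lu]
    }
}
```

With both bounds at 1, only the first character is produced; raising the maximum to 2 adds the two-character prefix, which is the behavior difference the issue is asking about.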

Re: [PR] Disable intra-merge parallelism for all structures but kNN vectors [lucene]

2024-09-18 Thread via GitHub
benwtrent merged PR #13799: URL: https://github.com/apache/lucene/pull/13799

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-18 Thread via GitHub
benwtrent closed issue #13798: TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException URL: https://github.com/apache/lucene/issues/13798

Re: [PR] Correct Point file extensions in Codec javadocs [lucene]

2024-09-18 Thread via GitHub
romseygeek merged PR #13801: URL: https://github.com/apache/lucene/pull/13801

Re: [PR] Make MaxScoreBulkScorer repartition scorers when the min competitive increases. [lucene]

2024-09-18 Thread via GitHub
jpountz commented on PR #13800: URL: https://github.com/apache/lucene/pull/13800#issuecomment-2357876594 luceneutil on wikibigall: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

[PR] Make MaxScoreBulkScorer repartition scorers when the min competitive increases. [lucene]

2024-09-18 Thread via GitHub
jpountz opened a new pull request, #13800: URL: https://github.com/apache/lucene/pull/13800 MaxScoreBulkScorer partitions scorers into a set of essential scorers and a set of non-essential scorers, depending on the maximum scores produced by scorers and on the current minimum competitive sc
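
The essential/non-essential split described above can be sketched as the textbook MaxScore partition: sort clauses by their maximum score, then treat as non-essential the longest prefix whose summed maximum scores still fall short of the minimum competitive score, since a document matching only those clauses cannot be competitive. This is a hedged illustration with hypothetical names; Lucene's MaxScoreBulkScorer may order clauses differently (for example, by taking cost into account) and works on per-window maximum scores.

```java
import java.util.Arrays;

// Hypothetical illustration of the MaxScore partition; not Lucene's MaxScoreBulkScorer.
public class MaxScorePartitionSketch {

    /**
     * Given max scores sorted ascending, returns the index of the first essential clause.
     * Clauses before that index are non-essential: even together they score strictly
     * below minCompetitiveScore. The strict comparison is the conservative choice.
     */
    static int firstEssential(double[] sortedMaxScores, double minCompetitiveScore) {
        double prefixSum = 0;
        int i = 0;
        while (i < sortedMaxScores.length
                && prefixSum + sortedMaxScores[i] < minCompetitiveScore) {
            prefixSum += sortedMaxScores[i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        double[] maxScores = {4.0, 0.5, 2.0, 1.0};
        Arrays.sort(maxScores); // ascending: 0.5, 1.0, 2.0, 4.0
        for (double minScore : new double[] {0.0, 1.0, 2.0, 4.0}) {
            int k = firstEssential(maxScores, minScore);
            System.out.println("min competitive " + minScore + " -> "
                    + k + " non-essential, " + (maxScores.length - k) + " essential");
        }
    }
}
```

As the minimum competitive score rises through 0.0, 1.0, 2.0 and 4.0, the number of non-essential clauses grows 0, 1, 2, 3, which is why recomputing the partition when the threshold increases can leave fewer clauses to drive iteration.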

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-18 Thread via GitHub
vsop-479 commented on PR #13782: URL: https://github.com/apache/lucene/pull/13782#issuecomment-2357709872 > can we find other occurrences using some regex searches? I replaced the loop comparing suffixes with `Arrays#compareUnsigned` in `IDVersionSegmentTermsEnumFrame` and `OrdsSegmentTermsE
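
For context, a hand-rolled unsigned byte comparison loop and its `Arrays.compareUnsigned` (Java 9+) replacement can be sketched as follows. The `loopCompare` method is a hypothetical stand-in, not the code from `IDVersionSegmentTermsEnumFrame`; note that the JDK method only guarantees the sign of its result, so the two are compared via `Integer.signum`.

```java
import java.util.Arrays;

// Hypothetical illustration; loopCompare is not the original Lucene loop.
public class CompareUnsignedSketch {

    /** Manual loop: compares byte ranges as unsigned values, then by length. */
    static int loopCompare(byte[] a, int aFrom, int aTo, byte[] b, int bFrom, int bTo) {
        int aLen = aTo - aFrom;
        int bLen = bTo - bFrom;
        int len = Math.min(aLen, bLen);
        for (int i = 0; i < len; i++) {
            // Mask to 0..255 so 0x80 sorts after 0x7F.
            int cmp = Integer.compare(a[aFrom + i] & 0xFF, b[bFrom + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // Common prefix is equal: the shorter range sorts first.
        return Integer.compare(aLen, bLen);
    }

    /** Same ordering via the JDK; only the sign of the result is meaningful. */
    static int jdkCompare(byte[] a, int aFrom, int aTo, byte[] b, int bFrom, int bTo) {
        return Arrays.compareUnsigned(a, aFrom, aTo, b, bFrom, bTo);
    }

    public static void main(String[] args) {
        byte[] x = {(byte) 0x80};
        byte[] y = {(byte) 0x7F};
        System.out.println(Integer.signum(loopCompare(x, 0, 1, y, 0, 1))); // 1
        System.out.println(Integer.signum(jdkCompare(x, 0, 1, y, 0, 1)));  // 1
        byte[] ab = {'a', 'b'};
        byte[] abc = {'a', 'b', 'c'};
        System.out.println(Integer.signum(jdkCompare(ab, 0, 2, abc, 0, 3))); // -1
    }
}
```

Besides being shorter, the JDK version may be faster because the JDK can compare several bytes at a time internally, though any speedup should be confirmed by benchmarking.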