Re: [PR] Avoid wrap readers without soft-deletes (#13588) [lucene]

2024-07-18 Thread via GitHub
dnhatn merged PR #13590: URL: https://github.com/apache/lucene/pull/13590 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [PR] Avoid wrap readers without soft-deletes [lucene]

2024-07-18 Thread via GitHub
dnhatn merged PR #13588: URL: https://github.com/apache/lucene/pull/13588

[PR] Remove useless todo in PerThreadPKLookup. [lucene]

2024-07-18 Thread via GitHub
vsop-479 opened a new pull request, #13589: URL: https://github.com/apache/lucene/pull/13589 ### Description No difference, see https://github.com/apache/lucene/pull/13557.

Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]

2024-07-18 Thread via GitHub
vsop-479 commented on PR #13557: URL: https://github.com/apache/lucene/pull/13557#issuecomment-2237974082 I also tested it by adding simulation code under `jmh` (just to ensure they get the right optimization); there is no difference either: Benchmark (size) Mo

Re: [PR] Convert more classes to record classes [lucene]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #13328: URL: https://github.com/apache/lucene/pull/13328#issuecomment-2237834830 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Only search soft deleted in SoftDeletesRetentionMergePolicy.applyRetentionQuery [lucene]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #13536: URL: https://github.com/apache/lucene/pull/13536#issuecomment-2237834572 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Avoid wrap readers without soft-deletes [lucene]

2024-07-18 Thread via GitHub
dnhatn commented on PR #13588: URL: https://github.com/apache/lucene/pull/13588#issuecomment-2237702315 @jpountz Thanks for reviewing! I am about to write the description and have updated it.

Re: [PR] Avoid wrap readers without soft-deletes [lucene]

2024-07-18 Thread via GitHub
dnhatn commented on PR #13588: URL: https://github.com/apache/lucene/pull/13588#issuecomment-2237700737 https://github.com/user-attachments/assets/3842b32e-e337-4a90-b76c-4f51b1ee9bfa

Re: [PR] Feature/vector io prefetch [lucene]

2024-07-18 Thread via GitHub
jpountz commented on PR #13586: URL: https://github.com/apache/lucene/pull/13586#issuecomment-2237667492 Thanks for looking into this! It's disappointing that this small change degrades performance so much indeed! I'm curious if you are able to run your benchmark under a profiler to confirm

Re: [PR] Compute facets while collecting [lucene]

2024-07-18 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1683534122 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/abstracts/GetOrd.java: ## @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Compute facets while collecting [lucene]

2024-07-18 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1683523816 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldLeafCollector.java: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Use Max WAND optimizations with ToParentBlockJoinQuery when using ScoreMode.Max [lucene]

2024-07-18 Thread via GitHub
jpountz commented on code in PR #13587: URL: https://github.com/apache/lucene/pull/13587#discussion_r1683500985 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -101,12 +99,7 @@ public Weight createWeight( .rewrite(new Const

[PR] Use Max WAND optimizations with ToParentBlockJoinQuery when using ScoreMode.Max [lucene]

2024-07-18 Thread via GitHub
Mikep86 opened a new pull request, #13587: URL: https://github.com/apache/lucene/pull/13587 Updates `ToParentBlockJoinQuery` to propagate the min competitive score when using `ScoreMode.Max`

Re: [I] Deprecate `COSINE` before Lucene 10 release [lucene]

2024-07-18 Thread via GitHub
jmazanec15 commented on issue #13281: URL: https://github.com/apache/lucene/issues/13281#issuecomment-2237366786 > I am not sure what to do for users who quantize their own vectors & rely on cosine. I think I am on same page as @msokolov. Users could "float_vector -> norm_float_vecto
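
The preview above is cut off, but its visible part points at normalizing float vectors before indexing. As a minimal sketch of why that can stand in for `COSINE`, assuming the idea is the usual "normalize once, then score with dot product" approach: the cosine of two vectors equals the dot product of their unit-length versions. The class and method names below are hypothetical and are not Lucene API.

```java
// Hypothetical illustration only; none of these names come from Lucene.
public final class CosineViaDotProduct {

  // Returns a unit-length copy of v. Rejects the zero vector, which has no direction.
  static float[] normalize(float[] v) {
    double sumSquares = 0;
    for (float x : v) {
      sumSquares += (double) x * x;
    }
    double norm = Math.sqrt(sumSquares);
    if (norm == 0) {
      throw new IllegalArgumentException("cannot normalize a zero vector");
    }
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = (float) (v[i] / norm);
    }
    return out;
  }

  // Plain dot product, accumulated in double precision.
  static double dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) a[i] * b[i];
    }
    return sum;
  }

  // Textbook cosine similarity: dot(a, b) / (|a| * |b|).
  static double cosine(float[] a, float[] b) {
    return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
  }

  public static void main(String[] args) {
    float[] a = {3f, 4f};
    float[] b = {4f, 3f};
    // The two values agree up to float rounding.
    System.out.println("cosine:            " + cosine(a, b));
    System.out.println("dot of normalized: " + dot(normalize(a), normalize(b)));
  }
}
```

This covers float vectors only. The `byte` case raised elsewhere in the same issue is harder, because rescaling to bytes does not generally preserve unit length.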

Re: [PR] Compute facets while collecting [lucene]

2024-07-18 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1683349131 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldLeafCollector.java: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Compute facets while collecting [lucene]

2024-07-18 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1683347154 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldLeafCollector.java: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Compute facets while collecting [lucene]

2024-07-18 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1683321175 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldLeafCollector.java: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Feature/vector io prefetch [lucene]

2024-07-18 Thread via GitHub
benwtrent commented on PR #13586: URL: https://github.com/apache/lucene/pull/13586#issuecomment-2237254527 🤔 my benchmarking is suspicious. I wonder if I am doing something wrong. I have a 4GB index, on a 4GB machine, 1GB set aside for the JVM. So, QPS should be about the same.

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-07-18 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2237194980 > The pattern doesn't work well with ColBERT-esque models. +1. Good question, @navneet1v. I had the same doubts before starting this effort. There is some discussion in [1231

[PR] Feature/vector io prefetch [lucene]

2024-07-18 Thread via GitHub
benwtrent opened a new pull request, #13586: URL: https://github.com/apache/lucene/pull/13586 I am trying out some prefetching for vector search and HNSW. This right now is a dead-simple version that simply prefetches the next neighbor we will explore. I will respond with some

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1683116076 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return fi

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-07-18 Thread via GitHub
jmazanec15 commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2237010997 Makes sense, thanks @benwtrent. I'm working on a PoC and some experiments. Didn't realize that the full-precision vectors for quantized index are exposed via getFloatVectorValues. Th

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
magibney commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1683132959 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return file

Re: [I] Incomplete Javadoc for DirectoryReader#indexExists [lucene]

2024-07-18 Thread via GitHub
uschindler commented on issue #13583: URL: https://github.com/apache/lucene/issues/13583#issuecomment-2236932133 Yeah, it looks like it's missing the second part of the sentence. ...or if an index in the process of committing the return value is not reliable. Actually the code only che

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1683097237 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return fi

Re: [PR] Compute facets while collecting [lucene]

2024-07-18 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1683081812 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
magibney commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1683080897 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return file

Re: [PR] Compute facets while collecting [lucene]

2024-07-18 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1683076964 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-18 Thread via GitHub
rmuir commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2236829406 I definitely want to play around more with @goankur 's PR here and see what performance looks like across machines, but will be out of town for a bit. There is a script to run the be

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
ChrisHegarty commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1683020894 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-18 Thread via GitHub
rmuir commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2236809947 And I see from playing around with compiler versions, the advantage of intrinsics approach: although I worry how many variants we'd maintain. It would give stability across releasing lucen

Re: [PR] HnswLock: access locks via hash and only use for concurrent indexing [lucene]

2024-07-18 Thread via GitHub
benwtrent commented on PR #13581: URL: https://github.com/apache/lucene/pull/13581#issuecomment-2236769005 @msokolov whenever I had to benchmark the original parallel merge change, the way to isolate was reduce the KnnIndexer buffer size dramatically to create multiple segments, then measur

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-18 Thread via GitHub
rmuir commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2236739249 Here is my proposal visually: https://godbolt.org/z/a9T8YrroY As you can see, by passing `-march=cascadelake` it generates VNNI instructions. IMO, no need for any intrinsics anyw

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-18 Thread via GitHub
rmuir commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2236647191 > I avoided it at the time given the toolchain that we were using, but it's a good option which I'll reevaluate. It should work well with any modern gcc (@goankur uses gcc 10 here).

Re: [PR] HnswLock: access locks via hash and only use for concurrent indexing [lucene]

2024-07-18 Thread via GitHub
msokolov commented on PR #13581: URL: https://github.com/apache/lucene/pull/13581#issuecomment-2236632284 I agree it's weird we saw no impact -- I'll retry with -forceMerge -- probably there was not enough merge activity?

[PR] Inline skip data into postings lists [lucene]

2024-07-18 Thread via GitHub
jpountz opened a new pull request, #13585: URL: https://github.com/apache/lucene/pull/13585 This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, bu

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
magibney commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1682897378 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return file

Re: [I] Deprecate `COSINE` before Lucene 10 release [lucene]

2024-07-18 Thread via GitHub
msokolov commented on issue #13281: URL: https://github.com/apache/lucene/issues/13281#issuecomment-2236548966 It would be interesting to know how many actual users of COSINE there are. I agree there may be no workaround, but that does not mean we need to continue to support, either. One qu

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-18 Thread via GitHub
ChrisHegarty commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2236540457 Could Lucene ever have this directly in one of its modules? We currently use the `FlatVectorsScorer` to plugin the "native code optimized" alternative, when scoring Scalar Quantize

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-18 Thread via GitHub
ChrisHegarty commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2236530280 > > With the updated compile flags, the performance of auto-vectorized code is slightly better than explicitly vectorized code (see results). Interesting thing to note is that both

[I] Add support for reading/writing dense vectors to MemoryIndex [lucene]

2024-07-18 Thread via GitHub
benwtrent opened a new issue, #13584: URL: https://github.com/apache/lucene/issues/13584 ### Description Back when knn vectors were introduced, we sort of kicked the can for MemoryIndex support. While it may not make sense to add the approximate search API (debatable), it should at le

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1682781560 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return fi

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
ChrisHegarty commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1682680606 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return

Re: [I] Deprecate `COSINE` before Lucene 10 release [lucene]

2024-07-18 Thread via GitHub
benwtrent commented on issue #13281: URL: https://github.com/apache/lucene/issues/13281#issuecomment-2236364825 I cannot think of an adequate work around at all for `byte` folks. The linear transformation of bytes will indeed cause potentially non-uniform magnitudes and could break scoring

Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-18 Thread via GitHub
iverase commented on code in PR #13563: URL: https://github.com/apache/lucene/pull/13563#discussion_r1682694746 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java: ## @@ -207,65 +210,127 @@ void accumulate(long value) { maxValue = Mat

Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-18 Thread via GitHub
iverase commented on code in PR #13563: URL: https://github.com/apache/lucene/pull/13563#discussion_r1682671882 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java: ## @@ -207,65 +210,127 @@ void accumulate(long value) { maxValue = Mat

Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-18 Thread via GitHub
iverase commented on code in PR #13563: URL: https://github.com/apache/lucene/pull/13563#discussion_r1682656013 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1792,61 +1794,91 @@ public DocValuesSkipper getSkipper(FieldInfo field

Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-18 Thread via GitHub
iverase commented on code in PR #13563: URL: https://github.com/apache/lucene/pull/13563#discussion_r1682654554 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1792,61 +1794,91 @@ public DocValuesSkipper getSkipper(FieldInfo field

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
uschindler commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1682619481 ## lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java: ## @@ -142,6 +143,26 @@ public static String stripSegmentName(String filename) { return fi

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-18 Thread via GitHub
ChrisHegarty commented on code in PR #13570: URL: https://github.com/apache/lucene/pull/13570#discussion_r1682608032 ## lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java: ## @@ -83,6 +86,19 @@ public class MMapDirectory extends FSDirectory { */ public static

Re: [PR] Add a `targetSearchConcurrency` parameter to `LogMergePolicy`. [lucene]

2024-07-18 Thread via GitHub
jpountz commented on PR #13517: URL: https://github.com/apache/lucene/pull/13517#issuecomment-2236096293 > improved the logic to not apply a threshold on the doc count for merges below the min merge size Woops I need to revert this for now. This made sense for TieredMergePolicy but t

Re: [PR] Add a `targetSearchConcurrency` parameter to `LogMergePolicy`. [lucene]

2024-07-18 Thread via GitHub
jpountz merged PR #13517: URL: https://github.com/apache/lucene/pull/13517

Re: [PR] Stop requiring MaxScoreBulkScorer's outer window from having at least INNER_WINDOW_SIZE docs. [lucene]

2024-07-18 Thread via GitHub
jpountz merged PR #13582: URL: https://github.com/apache/lucene/pull/13582

Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-18 Thread via GitHub
jpountz commented on code in PR #13563: URL: https://github.com/apache/lucene/pull/13563#discussion_r1682502283 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java: ## @@ -207,65 +210,127 @@ void accumulate(long value) { maxValue = Mat

Re: [PR] Add a `targetSearchConcurrency` parameter to `LogMergePolicy`. [lucene]

2024-07-18 Thread via GitHub
jpountz commented on PR #13517: URL: https://github.com/apache/lucene/pull/13517#issuecomment-2236001207 Thanks for looking @stefanvodita! I added a CHANGES entry and improved the logic to not apply a threshold on the doc count for merges below the min merge size. I will merge soon.

Re: [PR] Gradle build: cleanup of dependency resolution and consolidation of dependency versions [lucene]

2024-07-18 Thread via GitHub
dweiss commented on code in PR #13484: URL: https://github.com/apache/lucene/pull/13484#discussion_r1682394662 ## versions.lock: ## Review Comment: Oh - the "because" contain a hash key and all configurations/projects which refer to a dependency. These hashes are used next

Re: [PR] Gradle build: cleanup of dependency resolution and consolidation of dependency versions [lucene]

2024-07-18 Thread via GitHub
dweiss commented on code in PR #13484: URL: https://github.com/apache/lucene/pull/13484#discussion_r1682387379 ## versions.lock: ## Review Comment: A plugin does this. It is similar in nature to palantir's but "passive" - it only collects dependencies from selected configu