Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-21 Thread via GitHub
msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2301788507 Hi thanks for that @jpountz, no worries; this was something we all agreed on. I'm able to continue with the "research" part of this by simply increasing heap size - it's not a bloc

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-21 Thread via GitHub
msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2301804631 In the meantime, just to let you know I do have a dirt path implementation of this (multithreading not yet working, totally recomputes centroids on every iteration, etc), but it is

Re: [PR] Display frame types when analyzing top frames. [lucene]

2024-08-21 Thread via GitHub
jpountz commented on PR #13670: URL: https://github.com/apache/lucene/pull/13670#issuecomment-2301811869 Nightly benchmarks picked it up, see e.g. https://home.apache.org/~mikemccand/lucenebench/2024.08.20.18.04.44.html#profiler_searching_4_cpu. cc @mikemccand -- This is an automated me

Re: [PR] Override single byte writes to OutputStreamIndexOutput to remove locking [lucene]

2024-08-21 Thread via GitHub
msokolov commented on PR #13543: URL: https://github.com/apache/lucene/pull/13543#issuecomment-2301813096 > This change should also help with that by cutting the number of calls to BufferedOutputStream#write(int) by a factor of 8192, which cuts the number of calls to growIfNeeded by the sam

Re: [I] [DISCUSS] Could we have a different ANN algorithm for Learned Sparse Vectors? [lucene]

2024-08-21 Thread via GitHub
msokolov commented on issue #13675: URL: https://github.com/apache/lucene/issues/13675#issuecomment-2301818862 What I wonder is: how can Lucene help with this? I feel like we have all the primitives available to enable Splade-style search and retrieval, but maybe there is something missing?

Re: [I] [DISCUSS] Could we have a different ANN algorithm for Learned Sparse Vectors? [lucene]

2024-08-21 Thread via GitHub
benwtrent commented on issue #13675: URL: https://github.com/apache/lucene/issues/13675#issuecomment-2301866689 There might be a better format than just terms. But I would assume the bipartite graph stuff would help here. Additionally, I would expect the most benefits to be made at qu

Re: [I] Eclipse - one or more cycles were detected [lucene]

2024-08-21 Thread via GitHub
dweiss commented on issue #13676: URL: https://github.com/apache/lucene/issues/13676#issuecomment-2302007706 How did you import the project into Eclipse? It should be "Import as an existing project" or something like this. When I run gradlew eclipse, the .classpath file doesn't mention thos

Re: [PR] Compute facets while collecting [lucene]

2024-08-21 Thread via GitHub
mikemccand commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2302206543 > Perhaps this is something we'd want to fix for Lucene 10 if it requires breaking changes? +1, thanks @javanna and @gsmiller.

Re: [I] Flaky Test in TestMergeSchedulerExternal#testSubclassConcurrentMergeScheduler [lucene]

2024-08-21 Thread via GitHub
aoli-al commented on issue #13547: URL: https://github.com/apache/lucene/issues/13547#issuecomment-2302212349 Please use this fork to reproduce the failure: https://github.com/aoli-al/lucene/tree/LUCENE-13547 Command: `./gradlew test --tests "*testSubclassConcurrentMergeScheduler*"`

Re: [I] Test TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce Failed [lucene]

2024-08-21 Thread via GitHub
aoli-al commented on issue #13552: URL: https://github.com/apache/lucene/issues/13552#issuecomment-2302320506 Please use the following fork to reproduce the failure: https://github.com/aoli-al/lucene/tree/LUCENE-13552 Command: `./gradlew test --tests "*testIOExceptionDuringWriteSegmentWi

Re: [I] ConcurrentMergeScheduler may spawn more merge threads than specified [lucene]

2024-08-21 Thread via GitHub
aoli-al commented on issue #13593: URL: https://github.com/apache/lucene/issues/13593#issuecomment-2302344991 Please use the following fork to reproduce the failure: https://github.com/aoli-al/lucene/tree/LUCENE-13593 Note that the patch adds an infinite loop at the end of the test. S

Re: [PR] Take advantage of the doc value skipper when it is primary sort [lucene]

2024-08-21 Thread via GitHub
iverase commented on code in PR #13592: URL: https://github.com/apache/lucene/pull/13592#discussion_r1725315073 ## lucene/core/src/java/org/apache/lucene/index/DocValuesSkipper.java: ## @@ -98,4 +98,29 @@ public abstract class DocValuesSkipper { /** Return the global number

Re: [I] Search Results Filtering Based on Bitwise Operations on Integer Fields [LUCENE-2460] [lucene]

2024-08-21 Thread via GitHub
NavidMitchell commented on issue #3534: URL: https://github.com/apache/lucene/issues/3534#issuecomment-2302405519 It would be nice to have this feature supported.

Re: [PR] Take advantage of the doc value skipper when it is primary sort [lucene]

2024-08-21 Thread via GitHub
gsmiller commented on code in PR #13592: URL: https://github.com/apache/lucene/pull/13592#discussion_r1725418294 ## lucene/core/src/java/org/apache/lucene/index/DocValuesSkipper.java: ## @@ -98,4 +98,29 @@ public abstract class DocValuesSkipper { /** Return the global numbe

Re: [PR] Leverage doc value skip lists in DocValuesRewriteMethod if indexed [lucene]

2024-08-21 Thread via GitHub
gsmiller commented on PR #13672: URL: https://github.com/apache/lucene/pull/13672#issuecomment-2302551656 Cleaned up this PR a bit and added testing. Also addressed Robert's feedback (thanks @rmuir). Should be ready for another review if anyone is interested. Thanks!

Re: [PR] import definition of default parameter values from HnswGraphBuilder [lucene]

2024-08-21 Thread via GitHub
msokolov merged PR #13674: URL: https://github.com/apache/lucene/pull/13674

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-08-21 Thread via GitHub
mayya-sharipova commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2302685585 > possibly switch to LongValues for storing vectorOrd -> centroidOrd mapping I was thinking about adding centroid mappings as LongValues at the end of meta file, but this

Re: [PR] Release memory for cancelled tasks earlier in TaskExecutor [lucene]

2024-08-21 Thread via GitHub
original-brownbear commented on PR #13609: URL: https://github.com/apache/lucene/pull/13609#issuecomment-2302854261 Thanks Luca!

Re: [PR] Release memory for cancelled tasks earlier in TaskExecutor [lucene]

2024-08-21 Thread via GitHub
original-brownbear merged PR #13609: URL: https://github.com/apache/lucene/pull/13609

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-08-21 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2302857403 100MB assumes that even when compressed, it's a single byte per centroid. 100M vectors might only have 2 centroids and thus only need two bits to store. Also, I would expect the

Re: [I] Search Results Filtering Based on Bitwise Operations on Integer Fields [LUCENE-2460] [lucene]

2024-08-21 Thread via GitHub
gsmiller commented on issue #3534: URL: https://github.com/apache/lucene/issues/3534#issuecomment-2302926537 @NavidMitchell I'm not sure if there's a more convenient way to do this, but note that you can do this using `Expressions` compiled from `JavascriptCompiler` since the compiler suppo

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-21 Thread via GitHub
jpountz commented on PR #13658: URL: https://github.com/apache/lucene/pull/13658#issuecomment-2302929875 I ran wikibigall on an M3, which is interesting because it does inline the splitLongs call, both before and after the change (presumably because the generated native code is smaller thus

Re: [PR] Leverage doc value skip lists in DocValuesRewriteMethod if indexed [lucene]

2024-08-21 Thread via GitHub
jpountz commented on code in PR #13672: URL: https://github.com/apache/lucene/pull/13672#discussion_r1725706940 ## lucene/core/src/test/org/apache/lucene/search/TestDocValuesRewriteMethod.java: ## @@ -61,14 +61,19 @@ public void setUp() throws Exception { .setMa

Re: [I] [DISCUSS] Could we have a different ANN algorithm for Learned Sparse Vectors? [lucene]

2024-08-21 Thread via GitHub
jpountz commented on issue #13675: URL: https://github.com/apache/lucene/issues/13675#issuecomment-230374 I found this recent paper by well-known people in the IR efficiency space quite interesting: https://arxiv.org/pdf/2405.01117. It builds on inverted indexes and simple/intuitive ide

Re: [PR] Take advantage of the doc value skipper when it is primary sort [lucene]

2024-08-21 Thread via GitHub
jpountz commented on code in PR #13592: URL: https://github.com/apache/lucene/pull/13592#discussion_r1725906610 ## lucene/core/src/test/org/apache/lucene/search/TestDocValuesQueries.java: ## @@ -42,34 +45,100 @@ public class TestDocValuesQueries extends LuceneTestCase { +

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-21 Thread via GitHub
jpountz commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2303226288 > readVInt is also a hotspot at search time for *VectorQuery We should use group-varint, like for tail postings?

Re: [PR] [KNN] Add comment and remove duplicate code [lucene]

2024-08-21 Thread via GitHub
github-actions[bot] commented on PR #13594: URL: https://github.com/apache/lucene/pull/13594#issuecomment-2303335718 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-21 Thread via GitHub
msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2303338851 > We should use group-varint, like for tail postings? Does this choose a single bit-width for a group of postings? That sounds like it would produce savings here, yes. Also i
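
The question above (whether group-varint picks a single bit width for a whole group) can be illustrated with a small sketch. It assumes the classic group-varint scheme: values are processed four at a time, one header byte holds a 2-bit length code per value, and each value is then written in 1 to 4 bytes. So the width is chosen per value, not per group. The class name `GroupVarintSketch` and its methods are hypothetical, written for illustration only; this is not Lucene's implementation, and Lucene's actual on-disk layout may differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration of classic group-varint; not Lucene code.
final class GroupVarintSketch {

    // Bytes needed to hold v as an unsigned 32-bit value (zero still takes one byte).
    static int numBytes(int v) {
        return Math.max(1, (32 - Integer.numberOfLeadingZeros(v) + 7) / 8);
    }

    // Encodes values four at a time: one header byte, then 1-4 bytes per value.
    static byte[] encode(int[] values) {
        if (values.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i += 4) {
            int header = 0;
            for (int j = 0; j < 4; j++) {
                // 2 bits per value: (byte length - 1), first value in the top bits.
                header |= (numBytes(values[i + j]) - 1) << (6 - 2 * j);
            }
            out.write(header);
            for (int j = 0; j < 4; j++) {
                int v = values[i + j];
                int n = numBytes(v);
                for (int b = 0; b < n; b++) {
                    out.write(v >>> (8 * b)); // low byte first; write(int) keeps the low 8 bits
                }
            }
        }
        return out.toByteArray();
    }

    // Decodes count values (count must be a multiple of 4) produced by encode.
    static int[] decode(byte[] data, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i += 4) {
            int header = data[pos++] & 0xFF;
            for (int j = 0; j < 4; j++) {
                int n = ((header >>> (6 - 2 * j)) & 0x3) + 1;
                int v = 0;
                for (int b = 0; b < n; b++) {
                    v |= (data[pos++] & 0xFF) << (8 * b);
                }
                values[i + j] = v;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        int[] in = {1, 300, 70000, Integer.MAX_VALUE};
        byte[] enc = encode(in);
        // 1 header byte + 1 + 2 + 3 + 4 value bytes
        System.out.println(enc.length + " bytes, round trip ok: "
            + Arrays.equals(in, decode(enc, in.length)));
    }
}
```

Compared with reading one vInt at a time, the decoder learns all four lengths from a single header byte, so it avoids testing a continuation bit on every byte; that is the usual motivation for the scheme.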

Re: [I] org.apache.lucene.index.IndexFormatTooNewException on arm64 [lucene]

2024-08-21 Thread via GitHub
thomasli9895 commented on issue #13452: URL: https://github.com/apache/lucene/issues/13452#issuecomment-2303379046 I also encountered this problem: I upgraded from version 7.8.1 to version 7.17.9, then fell back to version 7.8.1, and this glitch occurred ![image](https://github.com/use

Re: [I] org.apache.lucene.index.IndexFormatTooNewException on arm64 [lucene]

2024-08-21 Thread via GitHub
thomasli9895 commented on issue #13452: URL: https://github.com/apache/lucene/issues/13452#issuecomment-2303380395 I also encountered this problem: I upgraded from version 7.8.1 to version 7.17.9, then fell back to version 7.8.1, and this glitch occurred ![image](https://github.com/use

Re: [I] org.apache.lucene.index.IndexFormatTooNewException on arm64 [lucene]

2024-08-21 Thread via GitHub
thomasli9895 commented on issue #13452: URL: https://github.com/apache/lucene/issues/13452#issuecomment-2303384472 I also encountered this problem: I upgraded from version 7.8.1 to version 7.17.9, then fell back to version 7.8.1, and this glitch occurred `org.elasticsearch.bootstrap.Sta

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2024-08-21 Thread via GitHub
vsop-479 commented on PR #13398: URL: https://github.com/apache/lucene/pull/13398#issuecomment-2303599920 @jpountz Please take a look when you get a chance.

Re: [PR] Only apply deletion one time for unique term update in FrozenBufferedUpdates.applyTermDeletes [lucene]

2024-08-21 Thread via GitHub
vsop-479 commented on PR #13486: URL: https://github.com/apache/lucene/pull/13486#issuecomment-2303600507 > Could Lucene maybe track that a field is actually unique internally and then apply this optimization automatically / always correctly? @jpountz Do you have any idea about t