Re: [PR] Speedup Lucene912PostingsReader nextDoc() impls. [lucene]

2024-10-29 Thread via GitHub
jpountz commented on PR #13963: URL: https://github.com/apache/lucene/pull/13963#issuecomment-2443576894 Here are two runs on `wikibigall`. I wouldn't read too much into the `CountOrHighHigh` and `CountOrHighMed` tasks, it wouldn't be the first time that my machine reports bigger speedups tha

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-29 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2443591275 I plan on merging this change soon, and on looking into moving postings back to int[] arrays next, hopefully to benefit from having 2x more lanes that can be compared at once.
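
The "2x more lanes" point comes down to simple arithmetic: a SIMD register of a given width holds twice as many 32-bit ints as 64-bit longs. The sketch below only does that arithmetic for a few common register widths. The widths are illustrative assumptions, and the sketch uses neither Lucene nor the JDK Vector API.

```java
public class LaneMath {
    // Number of elements of a given bit width that fit in one SIMD register.
    static int lanes(int registerBits, int elementBits) {
        return registerBits / elementBits;
    }

    public static void main(String[] args) {
        // Illustrative register widths (128, 256 and 512 bits).
        int[] registerWidths = {128, 256, 512};
        for (int bits : registerWidths) {
            int longLanes = lanes(bits, Long.SIZE);   // 64-bit elements
            int intLanes = lanes(bits, Integer.SIZE); // 32-bit elements
            System.out.printf("%d-bit register: %d long lanes vs %d int lanes%n",
                bits, longLanes, intLanes);
        }
    }
}
```

For a 256-bit register, this gives 4 long lanes versus 8 int lanes. In principle, one vector comparison can therefore check twice as many doc IDs when they are stored as ints.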

Re: [I] Unable to Tessellate shape for a valid Polygon according to GDAL/OGR and PostGIS [lucene]

2024-10-29 Thread via GitHub
sinuhepop commented on issue #13841: URL: https://github.com/apache/lucene/issues/13841#issuecomment-2444173886 According to [GeoJSONLint](https://geojsonlint.com) this polygon doesn't follow the "right-hand rule". -- This is an automated message from the Apache Git Service. To respond to
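
In GeoJSON (RFC 7946), the right-hand rule means that exterior rings wind counterclockwise and holes wind clockwise. A common way to check a ring's winding is the sign of its shoelace (signed) area. The sketch below is a minimal version of that check. It assumes a closed ring (first point repeated at the end) and treats longitude as x and latitude as y on a flat plane, ignoring antimeridian crossings. It is not how Lucene's tessellator validates polygons.

```java
public class RingOrientation {
    /**
     * Twice the signed area of a closed ring (shoelace formula), with longitude
     * as x and latitude as y. Positive = counterclockwise, negative = clockwise.
     * Assumes the last point repeats the first, as GeoJSON rings do.
     */
    static double signedArea2(double[] lons, double[] lats) {
        double sum = 0;
        for (int i = 0; i < lons.length - 1; i++) {
            sum += lons[i] * lats[i + 1] - lons[i + 1] * lats[i];
        }
        return sum;
    }

    /** Right-hand rule for an exterior ring: it must wind counterclockwise. */
    static boolean exteriorFollowsRightHandRule(double[] lons, double[] lats) {
        return signedArea2(lons, lats) > 0;
    }

    public static void main(String[] args) {
        // Unit square traversed right, up, left, down: counterclockwise.
        double[] ccwLons = {0, 1, 1, 0, 0};
        double[] ccwLats = {0, 0, 1, 1, 0};
        // Same square in the opposite direction: clockwise.
        double[] cwLons = {0, 0, 1, 1, 0};
        double[] cwLats = {0, 1, 1, 0, 0};
        System.out.println(exteriorFollowsRightHandRule(ccwLons, ccwLats)); // true
        System.out.println(exteriorFollowsRightHandRule(cwLons, cwLats));   // false
    }
}
```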

[PR] Account for 0 graph size when initializing HNSW graph [lucene]

2024-10-29 Thread via GitHub
mayya-sharipova opened a new pull request, #13964: URL: https://github.com/apache/lucene/pull/13964 When initializing a joint graph from one of the segments' graphs, we always assume that a segment's graph is present. But later we want to explore an option where some segments will not
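
The core of this PR is that code seeding the merged graph from an existing segment graph cannot assume every segment has one. Below is a minimal sketch of the kind of guard this implies. The `SegmentGraphInfo` record and `selectInitializerGraph` helper are hypothetical and are not Lucene's merger API. "Pick the largest graph" is also just an assumed selection rule, used here for illustration.

```java
import java.util.List;

public class InitGraphSelection {
    /** Hypothetical per-segment summary; not a Lucene class. */
    record SegmentGraphInfo(String segmentName, int graphSize) {}

    /**
     * Returns the index of the largest non-empty graph to initialize the merged
     * graph from, or -1 if every candidate graph is empty (the caller would then
     * build the merged graph from scratch).
     */
    static int selectInitializerGraph(List<SegmentGraphInfo> segments) {
        int best = -1;
        int bestSize = 0;
        for (int i = 0; i < segments.size(); i++) {
            int size = segments.get(i).graphSize();
            // Graphs of size 0 never beat bestSize (initially 0), so they are skipped
            // instead of being assumed present.
            if (size > bestSize) {
                best = i;
                bestSize = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<SegmentGraphInfo> segments = List.of(
            new SegmentGraphInfo("_0", 0),
            new SegmentGraphInfo("_1", 500),
            new SegmentGraphInfo("_2", 120));
        System.out.println(selectInitializerGraph(segments)); // 1
    }
}
```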

Re: [PR] Indicate frontier init length in Lucene90BlockTreeTermsWriter#compileIndex. [lucene]

2024-10-29 Thread via GitHub
github-actions[bot] commented on PR #13916: URL: https://github.com/apache/lucene/pull/13916#issuecomment-2445559376 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add changelog verifier [lucene]

2024-10-29 Thread via GitHub
github-actions[bot] commented on PR #13909: URL: https://github.com/apache/lucene/pull/13909#issuecomment-2445559414 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-29 Thread via GitHub
mdmarshmallow commented on code in PR #13951: URL: https://github.com/apache/lucene/pull/13951#discussion_r1821700862 ## lucene/core/src/java/org/apache/lucene/index/IndexWriterRAMManager.java: ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-29 Thread via GitHub
mdmarshmallow commented on code in PR #13951: URL: https://github.com/apache/lucene/pull/13951#discussion_r1821703257 ## lucene/core/src/java/org/apache/lucene/index/IndexWriterRAMManager.java: ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [I] [Discuss] What should we name the new aggregation engine? [lucene]

2024-10-29 Thread via GitHub
rmuir commented on issue #13965: URL: https://github.com/apache/lucene/issues/13965#issuecomment-2445349922 I voted on a couple of cool suggestions from your link. You can always add "Fast-", "Block-", "Smart-", "Bulk-", in front of any of them for more fun (kidding).

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-10-29 Thread via GitHub
jimczi commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2445295372 The more I think about it, the less I feel like the knn codec is the best choice for this feature (assuming that this issue is focused on late interaction models). > It is possible

Re: [PR] Fix numDeletesToMerge for unchanged segments [lucene]

2024-10-29 Thread via GitHub
dnhatn closed pull request #13324: Fix numDeletesToMerge for unchanged segments URL: https://github.com/apache/lucene/pull/13324

Re: [PR] Remove HitsThresholdChecker. [lucene]

2024-10-29 Thread via GitHub
jpountz commented on PR #13943: URL: https://github.com/apache/lucene/pull/13943#issuecomment-2445073699 Wow, it's been a bigger speedup on nightly benchmarks than on my machine: https://benchmarks.mikemccandless.com/And3Terms.html.

Re: [PR] replace Map with IntObjectHashMap for DV producer [lucene]

2024-10-29 Thread via GitHub
jpountz merged PR #13961: URL: https://github.com/apache/lucene/pull/13961
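
The thread doesn't state the motivation, but a usual reason to prefer a primitive int-keyed map over `Map<Integer, V>` is that `HashMap` boxes every key into an `Integer` and allocates a node object per entry. Below is a minimal open-addressing (linear probing) sketch of the idea. `IntKeyedMap` is a hypothetical class written for illustration and is not Lucene's `IntObjectHashMap`. Lucene code should use the project's own class.

```java
/** Minimal int-keyed map with open addressing and linear probing. Illustrative only. */
public class IntKeyedMap<V> {
    private int[] keys;       // primitive keys: no Integer boxing
    private Object[] values;
    private boolean[] used;   // marks occupied slots, so key 0 needs no special casing
    private int size;

    public IntKeyedMap() {
        this(16);
    }

    public IntKeyedMap(int initialCapacity) {
        // Round up to a power of two (minimum 4) so that slot masking works.
        int cap = Integer.highestOneBit(Math.max(4, initialCapacity) - 1) << 1;
        keys = new int[cap];
        values = new Object[cap];
        used = new boolean[cap];
    }

    private int slot(int key) {
        int h = key * 0x9E3779B9;                   // multiplicative hash spreads sequential keys
        return (h ^ (h >>> 16)) & (keys.length - 1);
    }

    /** Inserts or replaces; returns the previous value, or null if absent. */
    @SuppressWarnings("unchecked")
    public V put(int key, V value) {
        if ((size + 1) * 4 > keys.length * 3) {     // keep load factor <= 0.75
            resize();
        }
        int i = slot(key);
        while (used[i]) {
            if (keys[i] == key) {
                V old = (V) values[i];
                values[i] = value;
                return old;
            }
            i = (i + 1) & (keys.length - 1);        // linear probing
        }
        used[i] = true;
        keys[i] = key;
        values[i] = value;
        size++;
        return null;
    }

    @SuppressWarnings("unchecked")
    public V get(int key) {
        int i = slot(key);
        while (used[i]) {                           // load factor guarantees an empty slot ends the probe
            if (keys[i] == key) {
                return (V) values[i];
            }
            i = (i + 1) & (keys.length - 1);
        }
        return null;
    }

    public int size() {
        return size;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new Object[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], (V) oldValues[j]);  // re-insert into the larger table
            }
        }
    }
}
```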

Re: [PR] replace Map with IntObjectHashMap for DV producer [lucene]

2024-10-29 Thread via GitHub
bugmakerr commented on code in PR #13961: URL: https://github.com/apache/lucene/pull/13961#discussion_r1820891044 ## lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java: ## @@ -377,7 +376,7 @@ private static class SlowCompositeDocValuesProducerW

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-29 Thread via GitHub
ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2444863020 > Maybe we could add a RandomVectorScorer.setTarget(int node) method that would only be implemented by the Scorers returned from ScorerSuppliers? Let's defer a double addressi

Re: [PR] Speed up Lucene912PostingsReader nextDoc() impls. [lucene]

2024-10-29 Thread via GitHub
jpountz merged PR #13963: URL: https://github.com/apache/lucene/pull/13963

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-10-29 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2444980147 Hi @jimczi, the main change in this PR is support for multi-vectors in flat readers and writers, along with a similarity spec for multiple vector values. It is possible that H

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-10-29 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2444982753 > Maybe the first goal should be to incorporate max sim for re-ranking use cases first using a flat format This could be set up using 1) a single-vector field for hnsw matching,
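
"Max sim" here refers to the late-interaction scoring used by ColBERT-style models. For each query vector, take its best similarity against any of the document's vectors, then sum those maxima. A flat re-ranking pass over candidates found by a single-vector HNSW field would compute something like the minimal sketch below. The sketch assumes dot product as the per-vector similarity and a plain `float[][]` per document. It is not the multi-vector similarity spec proposed in the PR.

```java
public class MaxSim {
    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    /**
     * Late-interaction score: sum over query vectors of the maximum dot product
     * against any document vector. Returns -Infinity if docVectors is empty.
     */
    static float maxSim(float[][] queryVectors, float[][] docVectors) {
        float total = 0f;
        for (float[] q : queryVectors) {
            float best = Float.NEGATIVE_INFINITY;
            for (float[] d : docVectors) {
                best = Math.max(best, dot(q, d));
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        float[][] query = {{1f, 0f}, {0f, 1f}};
        // Two hypothetical candidates, e.g. returned by a single-vector HNSW search.
        float[][] docA = {{1f, 0f}, {0f, 1f}}; // matches both query vectors
        float[][] docB = {{1f, 0f}, {1f, 0f}}; // matches only the first
        System.out.println("docA maxSim = " + maxSim(query, docA)); // docA maxSim = 2.0
        System.out.println("docB maxSim = " + maxSim(query, docB)); // docB maxSim = 1.0
    }
}
```

Re-ranking then amounts to sorting the HNSW candidates by this score. Because only the candidates are scored this way, the multi-vector data can live in a flat format rather than in the graph.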

Re: [I] Could Lucene's default Directory (`FSDirectory.open`) somehow preload `.vec` files? [lucene]

2024-10-29 Thread via GitHub
mikemccand commented on issue #13551: URL: https://github.com/apache/lucene/issues/13551#issuecomment-2444797398 > ...but later realized that this would not work for compound files that have vector data in them eg `.cfs`? Tracing `MMapDirectory.openInput` in `main`, it does this:
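
The limitation quoted above can be shown with a trivial name-based rule. A predicate that matches `.vec` file names catches standalone vector files, but not a `.cfs` compound file, even though the compound file may contain vector data. The predicate below is hypothetical and is not wired into any Lucene `Directory` API here.

```java
import java.util.List;
import java.util.function.Predicate;

public class PreloadByExtension {
    // Hypothetical rule: preload only files whose name ends with ".vec".
    static final Predicate<String> PRELOAD_VEC = name -> name.endsWith(".vec");

    public static void main(String[] args) {
        for (String name : List.of("_0.vec", "_0.cfs")) {
            // "_0.cfs" is not matched, even if vector data is packed inside it.
            System.out.println(name + " -> preload=" + PRELOAD_VEC.test(name));
        }
    }
}
```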

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-10-29 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2444990835 As mentioned earlier, here is my rough plan for splitting this change into smaller PRs. Some of these steps could be merged if the impl. warrants it: 1. Multi-Vector similarity

[PR] Speedup Lucene912PostingsReader nextDoc() impls. [lucene]

2024-10-29 Thread via GitHub
jpountz opened a new pull request, #13963: URL: https://github.com/apache/lucene/pull/13963 127 times out of 128, nextDoc() returns the next doc ID in the buffer. Currently, we check if the current doc is equal to the last doc ID in the block to know if we need to refill. We can do better b
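
The sketch below illustrates the existing check as the PR describes it: refill the buffer when the current doc equals the last doc ID in the block. The proposed improvement is cut off in this excerpt, so only the current check is shown. `BlockDocIterator` is a hypothetical class. A plain `int[]` stands in for decoded postings blocks, and doc IDs are assumed to be strictly increasing.

```java
public class BlockDocIterator {
    static final int BLOCK_SIZE = 128;
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] postings;                  // all doc IDs, strictly increasing
    private final int[] buffer = new int[BLOCK_SIZE];
    private int bufferLength = 0;                  // valid entries in buffer
    private int upto = 0;                          // next read position in buffer
    private int nextBlockStart = 0;                // next read position in postings
    private int doc = -1;
    int refillCount = 0;

    BlockDocIterator(int[] postings) {
        this.postings = postings;
    }

    int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc;
        }
        // Refill check: true only when the previous call returned the block's
        // last doc, i.e. once per block (1 time in 128 for full blocks).
        if (bufferLength == 0 || doc == buffer[bufferLength - 1]) {
            if (!refill()) {
                return doc = NO_MORE_DOCS;
            }
        }
        return doc = buffer[upto++];
    }

    // Copies the next (up to) BLOCK_SIZE doc IDs into the buffer.
    private boolean refill() {
        int n = Math.min(BLOCK_SIZE, postings.length - nextBlockStart);
        if (n <= 0) {
            return false;
        }
        System.arraycopy(postings, nextBlockStart, buffer, 0, n);
        nextBlockStart += n;
        bufferLength = n;
        upto = 0;
        refillCount++;
        return true;
    }
}
```

With 300 doc IDs, this refills three times (128 + 128 + 44 docs). On every other call, the refill condition is false and `nextDoc()` simply reads the next buffered value.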

[I] It seems that escaped query characters are not treated as escaped when calling queryParser.Parse() [lucene]

2024-10-29 Thread via GitHub
suchoss opened a new issue, #13962: URL: https://github.com/apache/lucene/issues/13962 ### Description Given the following code: ```java package org.example; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.TokenStream; import org.apache.luc

Re: [PR] replace Map with IntObjectHashMap for DV producer [lucene]

2024-10-29 Thread via GitHub
jpountz commented on code in PR #13961: URL: https://github.com/apache/lucene/pull/13961#discussion_r1820458842 ## lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java: ## @@ -377,7 +376,7 @@ private static class SlowCompositeDocValuesProducerWrapper

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-29 Thread via GitHub
mdmarshmallow commented on PR #13951: URL: https://github.com/apache/lucene/pull/13951#issuecomment-2445766269 Thanks for all the comments guys, I've been pretty busy with some life things/work, but hopefully I'll put out another update by tomorrow!

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-29 Thread via GitHub
mdmarshmallow commented on code in PR #13951: URL: https://github.com/apache/lucene/pull/13951#discussion_r1821702844 ## lucene/core/src/java/org/apache/lucene/index/IndexWriterRAMManager.java: ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o