Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-17 Thread via GitHub
shubhamvishu commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1396959558 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,17 @@ public OnHeapHnswGraph build(int maxOrd) throws IOExce

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-17 Thread via GitHub
mikemccand commented on PR #124: URL: https://github.com/apache/lucene/pull/124#issuecomment-1816528624 OK thank you for bringing closure @ChristophKaser. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[PR] Dry up DirectReader implementations [lucene]

2023-11-17 Thread via GitHub
original-brownbear opened a new pull request, #12823: URL: https://github.com/apache/lucene/pull/12823 This can be written in a much drier way that shouldn't come at any performance cost as far as I can see.

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-17 Thread via GitHub
mikemccand commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1816539511 Thanks @shubhamvishu and @dweiss and @bruno-roustant. Hashing is fun and hard :)

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
Shibi-bala commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816600417 @uschindler hey, thanks for the approval! Read the contributing guidelines, but not entirely sure how to get permissions to merge this PR.

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
uschindler commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816614768 You can't do it. Please add a Changes entry under the 9.9 section, commit it to branch and I will merge and Backport your PR. I am just away from my computer at the moment, s

Re: [PR] LUCENE-10241: Updating OpenNLP to 1.9.4. [lucene]

2023-11-17 Thread via GitHub
cpoerschke merged PR #448: URL: https://github.com/apache/lucene/pull/448

Re: [I] Update OpenNLP to 1.9.4 [LUCENE-10241] [lucene]

2023-11-17 Thread via GitHub
cpoerschke closed issue #11277: Update OpenNLP to 1.9.4 [LUCENE-10241] URL: https://github.com/apache/lucene/issues/11277

Re: [I] Update OpenNLP to 1.9.4 [LUCENE-10241] [lucene]

2023-11-17 Thread via GitHub
cpoerschke commented on issue #11277: URL: https://github.com/apache/lucene/issues/11277#issuecomment-1816701100 #448 is the merged `main` branch pull request and https://github.com/apache/lucene/commit/b8094d49aaf5e5cb5182c0307e25eafa2d332dda is the `branch_9x` commit. Thanks @jzont

Re: [PR] Re-use information from graph traversal during exact search [lucene]

2023-11-17 Thread via GitHub
kaivalnp commented on PR #12820: URL: https://github.com/apache/lucene/pull/12820#issuecomment-1816720340 Thanks @jpountz! I realised something from your comment: My current implementation has a flaw, because it cannot handle the [`OrdinalTranslatedKnnCollector`](https://github.com/ka

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
MarcusSorealheis commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816799469 @Shibi-bala It's here: https://github.com/apache/lucene/blob/c228e4bb66ca73c8150d8eaebe2bb999bcc6c9b1/lucene/CHANGES.txt#L147 You need to include your user and the

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
Shibi-bala commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816818775 Made the changes. Thanks @uschindler @MarcusSorealheis @msfroh 😁

Re: [PR] Fix CheckIndex to detect major corruption with old (not the latest) commit point [lucene]

2023-11-17 Thread via GitHub
mikemccand merged PR #12530: URL: https://github.com/apache/lucene/pull/12530

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
uschindler merged PR #12626: URL: https://github.com/apache/lucene/pull/12626

Re: [I] segmentInfos.replace() doesn't set userData [lucene]

2023-11-17 Thread via GitHub
uschindler closed issue #12637: segmentInfos.replace() doesn't set userData URL: https://github.com/apache/lucene/issues/12637

Re: [I] CheckIndex cannot "fix" indexes that have individual segments with missing or corrupt .si files because sanity checks will fail trying to read the index initially. [LUCENE-6762] [lucene]

2023-11-17 Thread via GitHub
mikemccand commented on issue #7820: URL: https://github.com/apache/lucene/issues/7820#issuecomment-1816857195 I merged the first step in this issue -- detecting when this unique snowflake form of corruption strikes. Step 2 is to enable exorcism when there is an `_X.si` file missing,

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-17 Thread via GitHub
jpountz commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1817146436 Thanks @easyice. I took some time to look into the benchmark and improve a few things, hopefully you don't mind. Here is the output of the benchmark on my machine now: ``` Benc

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-17 Thread via GitHub
jpountz commented on code in PR #12782: URL: https://github.com/apache/lucene/pull/12782#discussion_r1391047570 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVIntWriter.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Log number of visited nodes in knn query [lucene]

2023-11-17 Thread via GitHub
jpountz commented on PR #12819: URL: https://github.com/apache/lucene/pull/12819#issuecomment-1817157857 Logging doesn't sound like a good fit for this, would it be better exposed e.g. via the profiling query?

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-17 Thread via GitHub
jpountz commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1397929325 ## lucene/misc/src/java/org/apache/lucene/misc/search/HumanReadableQuery.java: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-17 Thread via GitHub
vigyasharma commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1397994430 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector ext

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-17 Thread via GitHub
vigyasharma commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1817274998 We seem to consistently see an improvement in recall between single segment, and multi-segment runs (both seq and conc.) on baseline. Is this because with multiple segments, we get m

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-17 Thread via GitHub
vigyasharma commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1817282807 Do you have a mental model on what kind of graphs would see minimal loss of recall between baseline and candidate? Is this change better with denser (higher fanout) graphs? Would it

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-17 Thread via GitHub
slow-J commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1398045523 ## lucene/misc/src/test/org/apache/lucene/misc/search/TestHumanReadableQuery.java: ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-17 Thread via GitHub
slow-J commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1817340280 > I left minor comments but it looks good to me otherwise! Thanks for the feedback! Done the changes.

Re: [PR] Fix Field.java documentation to refer to new IntField/FloatField/LongField/DoubleField #12125 [lucene]

2023-11-17 Thread via GitHub
jpountz commented on PR #12821: URL: https://github.com/apache/lucene/pull/12821#issuecomment-1817419020 Thanks for doing it, it looks like the PR includes unintended changes though?

Re: [PR] Remove delayed seek optimization. [lucene]

2023-11-17 Thread via GitHub
jpountz merged PR #12815: URL: https://github.com/apache/lucene/pull/12815

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-17 Thread via GitHub
jpountz merged PR #12816: URL: https://github.com/apache/lucene/pull/12816

Re: [I] Can/should `KnnByte/FloatVectorQuery` carry some human-meaningful opaque `toString` fragment? [lucene]

2023-11-17 Thread via GitHub
jpountz closed issue #12487: Can/should `KnnByte/FloatVectorQuery` carry some human-meaningful opaque `toString` fragment? URL: https://github.com/apache/lucene/issues/12487