Re: [I] Compute multiple aggregations in one iteration of the match-set [lucene]

2023-10-06 Thread via GitHub
stefanvodita commented on issue #12546: URL: https://github.com/apache/lucene/issues/12546#issuecomment-1750160967 I was thinking about this issue in connection with #12553. If accumulators were generic, that would give us multi-aggregations by default. For example, if we're setting up an

Re: [I] [DISCUSS] Identifying Gaps in Lucene’s Faceting [lucene]

2023-10-06 Thread via GitHub
stefanvodita commented on issue #12553: URL: https://github.com/apache/lucene/issues/12553#issuecomment-1750161091 One dependency I can point out is between the idea of nested aggregations and that of specific aggregation targets. With nested aggregations, we want to target some aggregation

Re: [I] Make IndexWriter#flushNextBuffer flush deletes too? [lucene]

2023-10-06 Thread via GitHub
jpountz commented on issue #12572: URL: https://github.com/apache/lucene/issues/12572#issuecomment-1750526300 Thank you Simon for looking into it and sorry for putting you on the wrong track. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [I] Make IndexWriter#flushNextBuffer flush deletes too? [lucene]

2023-10-06 Thread via GitHub
s1monw commented on issue #12572: URL: https://github.com/apache/lucene/issues/12572#issuecomment-1750542067 nah man, it was the right intention. good exercise for me to refresh knowledge

[PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-10-06 Thread via GitHub
Shibi-bala opened a new pull request, #12626: URL: https://github.com/apache/lucene/pull/12626 ### Description

[I] HnswGraph creates disconnected components [lucene]

2023-10-06 Thread via GitHub
nitirajrathore opened a new issue, #12627: URL: https://github.com/apache/lucene/issues/12627 ### Description I work for Amazon Retail Product search and we are using Lucene KNN for semantic search of products. We recently noticed that the hnsw graphs generated are not always st

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-06 Thread via GitHub
robertvanwinkle1138 commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1750770721 > QDrant's HNSW filter solution is the exact same as Lucene's Interesting thanks. > as candidate posting lists are gathered, ensure they have some candidates

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-06 Thread via GitHub
benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1750806182 I did some benchmarks with the current change, using 400k Cohere embeddings searching over 1k vectors. This was on GCP `c3-standard-8` (Intel Sapphire Rapids), on this machine `byte[]`

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-06 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1750872046 > This search with filter method seems to throw an error. LOL, I thought it was supported, I must have read a github issue and made an assumption. > Couldn't that be

Re: [PR] Add ParentJoin KNN support [lucene]

2023-10-06 Thread via GitHub
alessandrobenedetti commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1750930644 Thanks @benwtrent for this work! I finally had the chance to take a look. It's a lot and I see it's already merged, so I don't have any meaningful comment at the moment, bu

Re: [PR] Add ParentJoin KNN support [lucene]

2023-10-06 Thread via GitHub
benwtrent commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1750978708 > The work here drastically changes the way also my Pull Request should look like right now. Yes, I am sorry about that. But the good news is that the integration for multi-valu

Re: [PR] Add ParentJoin KNN support [lucene]

2023-10-06 Thread via GitHub
alessandrobenedetti commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1751034549 > > The work here drastically changes the way also my Pull Request should look like right now. > > Yes, I am sorry about that. But the good news is that the integration

[PR] Enable rank-unsafe optimization of top-k hit computations by quantizing scores. [lucene]

2023-10-06 Thread via GitHub
jpountz opened a new pull request, #12628: URL: https://github.com/apache/lucene/pull/12628 This adds a `ScoreQuantizingCollector`, which quantizes scores with a configurable number of accuracy bits. This allows dynamic pruning to more efficiently skip hits that would have similar scores. W
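
As a rough illustration of what quantizing a score to a given number of accuracy bits can mean, here is a minimal sketch. It is not taken from the pull request: the class `ScoreQuantizationSketch`, the method `quantize`, and the choice to round toward zero by masking low mantissa bits are all assumptions made for illustration, and the actual `ScoreQuantizingCollector` may work differently.

```java
public class ScoreQuantizationSketch {
    /** Number of explicit mantissa bits in an IEEE 754 single-precision float. */
    private static final int MANTISSA_BITS = 23;

    /**
     * Hypothetical helper: keeps only the top accuracyBits bits of the mantissa,
     * which rounds a non-negative score toward zero.
     */
    static float quantize(float score, int accuracyBits) {
        if (accuracyBits < 1 || accuracyBits > MANTISSA_BITS) {
            throw new IllegalArgumentException("accuracyBits must be in [1, 23]");
        }
        int bits = Float.floatToIntBits(score);
        // Clear the low (23 - accuracyBits) mantissa bits; sign and exponent are kept.
        int mask = -1 << (MANTISSA_BITS - accuracyBits);
        return Float.intBitsToFloat(bits & mask);
    }

    public static void main(String[] args) {
        // With 4 accuracy bits, these two nearby scores collapse to the same value, 1.3125.
        System.out.println(quantize(1.32f, 4));
        System.out.println(quantize(1.37f, 4));
    }
}
```

Once nearby scores collapse to the same quantized value, a hit whose quantized score cannot exceed the current k-th best is no longer competitive, which is presumably what lets dynamic pruning skip more hits.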

Re: [PR] Enable rank-unsafe optimization of top-k hit computations by quantizing scores. [lucene]

2023-10-06 Thread via GitHub
jpountz commented on PR #12628: URL: https://github.com/apache/lucene/pull/12628#issuecomment-1751102600 I ran the Tantivy benchmark with TOP_10 and TOP_100 commands and shared the result at https://jpountz.github.io/quantization. The right column configures quantization with 4 bits of accu

Re: [PR] Enable rank-unsafe optimization of top-k hit computations by quantizing scores. [lucene]

2023-10-06 Thread via GitHub
benwtrent commented on PR #12628: URL: https://github.com/apache/lucene/pull/12628#issuecomment-1751135535 Say WHAT?!?!?! https://github.com/apache/lucene/assets/4357155/08043030-f84d-47e5-ac09-90d59adcf878

Re: [PR] Enable rank-unsafe optimization of top-k hit computations by quantizing scores. [lucene]

2023-10-06 Thread via GitHub
benwtrent commented on PR #12628: URL: https://github.com/apache/lucene/pull/12628#issuecomment-1751149629 Would the typical usage of this scorer be as an initial collection of `k` docs, which would then get re-scored over their full floating point scores? Are there any numbers around

Re: [I] StackOverflow when RegExp encounters a very large string [LUCENE-10501] [lucene]

2023-10-06 Thread via GitHub
snow-lily-warner commented on issue #11537: URL: https://github.com/apache/lucene/issues/11537#issuecomment-1751156372 @zhaih can this fix be released as a patch version of 8.x? Version 9.x is not compatible with elasticsearch which has lucene as a dependency

Re: [I] StackOverflow when RegExp encounters a very large string [LUCENE-10501] [lucene]

2023-10-06 Thread via GitHub
benwtrent commented on issue #11537: URL: https://github.com/apache/lucene/issues/11537#issuecomment-1751166998 What is your concern, @snow-lily-warner? ES should have been protected against this for a while now: https://github.com/elastic/elasticsearch/pull/84624 And the latest

[PR] Refactor Lucene95 to allow off heap vector reader reuse [lucene]

2023-10-06 Thread via GitHub
benwtrent opened a new pull request, #12629: URL: https://github.com/apache/lucene/pull/12629 While going through: https://github.com/apache/lucene/pull/12582 I noticed that for a while now, our offheap vector readers haven't changed at all. We just keep copying them around for no rea

Re: [PR] Improve refresh speed with softdelete enable [lucene]

2023-10-06 Thread via GitHub
easyice commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1751579594 @jpountz Would you please take a look when you get a chance?

[PR] DeletedTerms#clear should reset ByteBlockPool [lucene]

2023-10-06 Thread via GitHub
gf2121 opened a new pull request, #12630: URL: https://github.com/apache/lucene/pull/12630 ### Description This is a bug left by #12573. As we are using a common `ByteBlockPool` across all `BytesRefHash`, the map clear won't help release the memory held by `ByteBlockPool` and the poo
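
The kind of problem described above can be sketched with a toy model. Everything here is hypothetical and written only for illustration: `BlockPool`, `TermMap`, and `DeletedTermsSketch` are stand-ins, not Lucene's `ByteBlockPool`, `BytesRefHash`, or `DeletedTerms`, and the real fix in the pull request may look different.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedPoolSketch {
    /** Hypothetical stand-in for a byte pool shared by several maps. */
    static final class BlockPool {
        private final List<byte[]> blocks = new ArrayList<>();

        int append(byte[] bytes) {
            blocks.add(bytes.clone());
            return blocks.size() - 1;
        }

        long bytesUsed() {
            long total = 0;
            for (byte[] block : blocks) {
                total += block.length;
            }
            return total;
        }

        void reset() {
            blocks.clear();
        }
    }

    /** Hypothetical per-field map that stores its term bytes in the shared pool. */
    static final class TermMap {
        private final BlockPool pool;
        private final Map<String, Integer> slots = new HashMap<>();

        TermMap(BlockPool pool) {
            this.pool = pool;
        }

        void add(String term) {
            slots.put(term, pool.append(term.getBytes(StandardCharsets.UTF_8)));
        }
    }

    /** Hypothetical container holding one TermMap per field over a common pool. */
    static final class DeletedTermsSketch {
        private final BlockPool pool = new BlockPool();
        private final Map<String, TermMap> byField = new HashMap<>();

        void put(String field, String term) {
            byField.computeIfAbsent(field, f -> new TermMap(pool)).add(term);
        }

        /** Drops the maps only; the shared pool still holds every byte written so far. */
        void clearMapsOnly() {
            byField.clear();
        }

        /** Drops the maps and resets the shared pool so its memory can be reclaimed. */
        void clear() {
            byField.clear();
            pool.reset();
        }

        long bytesUsed() {
            return pool.bytesUsed();
        }
    }

    public static void main(String[] args) {
        DeletedTermsSketch terms = new DeletedTermsSketch();
        terms.put("id", "doc-1");
        terms.clearMapsOnly();
        System.out.println(terms.bytesUsed()); // pool is still populated
        terms.clear();
        System.out.println(terms.bytesUsed()); // pool is empty after reset
    }
}
```

In this toy model, clearing the outer map alone leaves the pool's bytes reachable through the container, so only an explicit pool reset lets them be reclaimed.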