[GitHub] [lucene] jpountz closed pull request #227: LUCENE-10033: Encode numeric doc values and ordinals of SORTED(_SET) doc values in blocks.

2022-10-06 Thread GitBox
jpountz closed pull request #227: LUCENE-10033: Encode numeric doc values and ordinals of SORTED(_SET) doc values in blocks. URL: https://github.com/apache/lucene/pull/227

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #11832: Added static factory method for loading VectorValues

2022-10-06 Thread GitBox
shubhamvishu commented on code in PR #11832: URL: https://github.com/apache/lucene/pull/11832#discussion_r988712945 ## lucene/core/src/java/org/apache/lucene/index/SlowCodecReaderWrapper.java: ## @@ -163,7 +163,7 @@ private static KnnVectorsReader readerToVectorReader(LeafReade

[GitHub] [lucene] rmuir commented on issue #11839: gradle aggregate coverage report

2022-10-06 Thread GitBox
rmuir commented on issue #11839: URL: https://github.com/apache/lucene/issues/11839#issuecomment-1269611646 @zhaih please take it!

[GitHub] [lucene] jpountz commented on pull request #11832: Added static factory method for loading VectorValues

2022-10-06 Thread GitBox
jpountz commented on PR #11832: URL: https://github.com/apache/lucene/pull/11832#issuecomment-1269629285 > One reasoning is we can have this to be consistent like how we have similar static functions in DocValues. There is no code using it right now, but this could maybe be used in the future?

[GitHub] [lucene] jpountz commented on pull request #11831: GITHUB-11761: Move minimum TieredMergePolicy delete percentage from 2…

2022-10-06 Thread GitBox
jpountz commented on PR #11831: URL: https://github.com/apache/lucene/pull/11831#issuecomment-1269633237 Why did we have to update the number of allowed deletes back to 33% on some tests? Did they fail otherwise? Is there another way we could improve these tests to cope better with the

[GitHub] [lucene] jpountz commented on pull request #11831: GITHUB-11761: Move minimum TieredMergePolicy delete percentage from 2…

2022-10-06 Thread GitBox
jpountz commented on PR #11831: URL: https://github.com/apache/lucene/pull/11831#issuecomment-1269635726 E.g. maybe it would be better to add more documents in `testNRTIsCurrentAfterDelete` in order for the single delete not to introduce more than 20% deletes, and keep the `TieredMergePolic

[GitHub] [lucene] benwtrent commented on issue #11830: Store HNSW graph connections more compactly

2022-10-06 Thread GitBox
benwtrent commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1270205020 I was able to replicate @jtibshirani's results. Using the `glove-100-angular` dataset from ann-benchmarks, here is the graph data size: ``` Baseline: 154M _0_Lucene94Hn

[GitHub] [lucene] jtibshirani commented on issue #11830: Store HNSW graph connections more compactly

2022-10-06 Thread GitBox
jtibshirani commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1270359029 @benwtrent thanks for looking into this! To clarify, `OffHeapHnswGraph` will always be used for searches. Usually it will be paged in from disk before heavy searching, but it's

[GitHub] [lucene] benwtrent commented on issue #11830: Store HNSW graph connections more compactly

2022-10-06 Thread GitBox
benwtrent commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1270415680 > Could you include a comparison of the search latency and recall numbers you see with this approach? Sometimes with our benchmarks it's easy to miss small differences in performa

[GitHub] [lucene] jtibshirani commented on issue #11838: Adding concurrency to query rewrite?

2022-10-06 Thread GitBox
jtibshirani commented on issue #11838: URL: https://github.com/apache/lucene/issues/11838#issuecomment-1270564173 This is a great question! To me it'd make sense to try to move costly steps into weight creation or scoring. It feels a little "off" to do the bulk of a query's work during rewr