[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-06 Thread GitBox
benwtrent commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1040969450 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -37,6 +37,7 @@ * @param the type of query vector */ public class HnswGraphSea

[GitHub] [lucene] rmuir commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-06 Thread GitBox
rmuir commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1041043232 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java: ## @@ -236,7 +236,13 @@ public VectorValues getVectorValues

[GitHub] [lucene] rmuir commented on pull request #11946: add similarity threshold for hnsw

2022-12-06 Thread GitBox
rmuir commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1339474637 > What I have in mind would be to implement entirely in the > KnnVectorQuery. Since results are sorted by score, they can easily be > post-filtered there: no need to implement anything

[GitHub] [lucene] rmuir commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

2022-12-06 Thread GitBox
rmuir commented on PR #11998: URL: https://github.com/apache/lucene/pull/11998#issuecomment-1339780615 That's fine. Or we could fix `newSearcher` to not wrap with crazy CodecReaders. Or we could fix said CodecReaders (since they are only used for tests) to implement the deprecated document

[GitHub] [lucene] rmuir commented on a diff in pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-06 Thread GitBox
rmuir commented on code in PR #11997: URL: https://github.com/apache/lucene/pull/11997#discussion_r1041309581 ## lucene/core/src/java/org/apache/lucene/document/DoubleField.java: ## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [lucene] rmuir commented on a diff in pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-06 Thread GitBox
rmuir commented on code in PR #11997: URL: https://github.com/apache/lucene/pull/11997#discussion_r1041312439 ## lucene/core/src/java/org/apache/lucene/document/DoubleField.java: ## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [lucene] rmuir commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-06 Thread GitBox
rmuir commented on PR #11997: URL: https://github.com/apache/lucene/pull/11997#issuecomment-1339794352 I like the idea too. Make the .document API simpler for typical use-cases! I added a couple cosmetic comments. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [lucene] rmuir commented on pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

2022-12-06 Thread GitBox
rmuir commented on PR #11998: URL: https://github.com/apache/lucene/pull/11998#issuecomment-1339823026 I got the tests happy for now with 12a5dfaeba954a049675830eabd54bd8f58b51c2 Maybe not the right solution in the end, but makes it easier to iterate when you have passing tests at lea

[GitHub] [lucene] jpountz commented on pull request #11995: enable fully directly copy merge/flush fdt files when index sorting

2022-12-06 Thread GitBox
jpountz commented on PR #11995: URL: https://github.com/apache/lucene/pull/11995#issuecomment-1339962667 Thanks for the explanation of what this PR does. I'm not comfortable with the fact that with your change, stored fields are no longer stored in doc ID order. It's probably a good trade-o

[GitHub] [lucene] jpountz commented on a diff in pull request #11999: Add support for stored fields to MemoryIndex

2022-12-06 Thread GitBox
jpountz commented on code in PR #11999: URL: https://github.com/apache/lucene/pull/11999#discussion_r1041429957 ## lucene/memory/src/java/org/apache/lucene/index/memory/StoredValues.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [lucene] jpountz commented on a diff in pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes

2022-12-06 Thread GitBox
jpountz commented on code in PR #11958: URL: https://github.com/apache/lucene/pull/11958#discussion_r1041440269 ## lucene/core/src/java/org/apache/lucene/store/FilterIndexInput.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [lucene] agorlenko commented on pull request #11946: add similarity threshold for hnsw

2022-12-06 Thread GitBox
agorlenko commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1340151458 I've done some experiments with real data and it seems that it really doesn't work as I expected. If the number of docs that exceed the threshold is significant (for example 20% or more of pr

[GitHub] [lucene] agorlenko commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-06 Thread GitBox
agorlenko commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1041584638 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -37,6 +37,7 @@ * @param the type of query vector */ public class HnswGraphSea

[GitHub] [lucene] wjp719 commented on pull request #11995: enable fully directly copy merge/flush fdt files when index sorting

2022-12-06 Thread GitBox
wjp719 commented on PR #11995: URL: https://github.com/apache/lucene/pull/11995#issuecomment-1340308711 > It's probably a good trade-off in your case and maybe something you can do in a custom codec Thanks for your reply. Does that mean I can add a new custom codec in Lucene?

[GitHub] [lucene] wjp719 closed pull request #11995: enable fully directly copy merge/flush fdt files when index sorting

2022-12-06 Thread GitBox
wjp719 closed pull request #11995: enable fully directly copy merge/flush fdt files when index sorting URL: https://github.com/apache/lucene/pull/11995

[GitHub] [lucene] jpountz merged pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-12-06 Thread GitBox
jpountz merged PR #11860: URL: https://github.com/apache/lucene/pull/11860

[GitHub] [lucene] jpountz closed issue #11830: Store HNSW graph connections more compactly

2022-12-06 Thread GitBox
jpountz closed issue #11830: Store HNSW graph connections more compactly URL: https://github.com/apache/lucene/issues/11830