[GitHub] [lucene] jpountz commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
jpountz commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1025393090 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsFormat.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [lucene] jpountz commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
jpountz commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1026184428 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsReader.java: ## @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [lucene] rmuir commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
rmuir commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1026351356 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsReader.java: ## @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[GitHub] [lucene] jpountz commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
jpountz commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1026383814 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsReader.java: ## @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [lucene] rmuir commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
rmuir commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1026406375 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsReader.java: ## @@ -0,0 +1,497 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
dweiss commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1320064830 I initially wanted to use Python but without additional libraries it overwhelmed me and I wanted to keep it simple and self-contained. There is some extra verbosity (XML processing in Jav

[GitHub] [lucene] dweiss commented on issue #11329: Add an equivalent of ant's stage-maven-artifacts for the release wizard [LUCENE-10293]

2022-11-18 Thread GitBox
dweiss commented on issue #11329: URL: https://github.com/apache/lucene/issues/11329#issuecomment-1320066031 Patch implemented in #11947 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
dweiss commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1320072940 For stand-alone use, it's simple enough too:
```
java dev-tools\scripts\StageArtifacts.java -u dweiss /release/candidate/maven-artifacts
```
will prompt for password for nexus

[GitHub] [lucene] jpountz commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
jpountz commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1320074546 Thanks @dweiss I'll give it a try!

[GitHub] [lucene] madrob commented on a diff in pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
madrob commented on code in PR #11947: URL: https://github.com/apache/lucene/pull/11947#discussion_r1026512732 ## dev-tools/scripts/StageArtifacts.java: ## @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreem

[GitHub] [lucene] gsmiller commented on pull request #11901: Github#11869: Add RangeOnRangeFacetCounts

2022-11-18 Thread GitBox
gsmiller commented on PR #11901: URL: https://github.com/apache/lucene/pull/11901#issuecomment-1320139204 @mdmarshmallow it's similar to FacetSets but a bit different since FacetSets work over stored points, while this would work over stored ranges. I think it would make sense to eventually

[GitHub] [lucene] benwtrent commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
benwtrent commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1026551961 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsFormat.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [lucene] msokolov commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
msokolov commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1320161291 +1 to the awesomeness - thanks for iterating on this fruit! - how high-hanging it is depends on one's perspective I guess. I have to say I am mildly amused that we are now using I

[GitHub] [lucene] rmuir opened a new issue, #11948: clean up smoketester GPG leaks

2022-11-18 Thread GitBox
rmuir opened a new issue, #11948: URL: https://github.com/apache/lucene/issues/11948 ### Description smoketester leaks a GPG agent on my computer every time it runs. @risdenk pointed out this fix from solr: https://github.com/apache/solr/commit/0cfef740617cc40585e3121e0b41e5cc8002471f

[GitHub] [lucene] dweiss commented on a diff in pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
dweiss commented on code in PR #11947: URL: https://github.com/apache/lucene/pull/11947#discussion_r1026571426 ## dev-tools/scripts/StageArtifacts.java: ## @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreem

[GitHub] [lucene] dweiss commented on a diff in pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
dweiss commented on code in PR #11947: URL: https://github.com/apache/lucene/pull/11947#discussion_r1026572395 ## dev-tools/scripts/StageArtifacts.java: ## @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreem

[GitHub] [lucene] dweiss commented on a diff in pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
dweiss commented on code in PR #11947: URL: https://github.com/apache/lucene/pull/11947#discussion_r1026573130 ## dev-tools/scripts/StageArtifacts.java: ## @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreem

[GitHub] [lucene] dweiss opened a new pull request, #11949: Add star import check/validation

2022-11-18 Thread GitBox
dweiss opened a new pull request, #11949: URL: https://github.com/apache/lucene/pull/11949 It's been a few times that I saw a comment on misc. PRs mentioning we want to avoid star imports. Let's just automate it? Seems like we already have tools to help out here.

[GitHub] [lucene] msokolov commented on issue #11830: Store HNSW graph connections more compactly

2022-11-18 Thread GitBox
msokolov commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1320183235 Hey this looks great! Awesome to see the storage gains with no loss in query time On Thu, Nov 17, 2022 at 2:25 PM Benjamin Trent ***@***.***> wrote: > I changed t

[GitHub] [lucene] gsmiller opened a new pull request, #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-18 Thread GitBox
gsmiller opened a new pull request, #11950: URL: https://github.com/apache/lucene/pull/11950 ### Description This fixes a bug where variants of `BinaryRangeFieldRangeQuery` will result in an NPE if the field doesn't exist in a segment.

[GitHub] [lucene] dweiss commented on a diff in pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
dweiss commented on code in PR #11947: URL: https://github.com/apache/lucene/pull/11947#discussion_r1026588147 ## dev-tools/scripts/StageArtifacts.java: ## @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreem

[GitHub] [lucene] dweiss merged pull request #11949: Add star import check/validation

2022-11-18 Thread GitBox
dweiss merged PR #11949: URL: https://github.com/apache/lucene/pull/11949

[GitHub] [lucene] dweiss commented on pull request #11949: Add star import check/validation

2022-11-18 Thread GitBox
dweiss commented on PR #11949: URL: https://github.com/apache/lucene/pull/11949#issuecomment-1320195854 I'll backport to 9x manually.

[GitHub] [lucene] rmuir merged pull request #11936: Lower gradle heap: 3GB is unnecessary

2022-11-18 Thread GitBox
rmuir merged PR #11936: URL: https://github.com/apache/lucene/pull/11936

[GitHub] [lucene] agorlenko commented on pull request #11946: add similarity threshold for hnsw

2022-11-18 Thread GitBox
agorlenko commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320221923 If we use only post-filter in KnnVectorQuery, then we have to set k = Integer.MAX_VALUE (or another very big value) and calculate similarity with all vectors. So the complexity would b

[GitHub] [lucene] rmuir opened a new issue, #11951: TestStressIndexing can sometime take minutes

2022-11-18 Thread GitBox
rmuir opened a new issue, #11951: URL: https://github.com/apache/lucene/issues/11951 ### Description I've seen this happen several times, so I think it may not be hard to reproduce; I have not tried to use the seed yet: ``` > Task :randomizationInfo Running tests with rando

[GitHub] [lucene] gsmiller commented on a diff in pull request #11928: GH#11922: Allow DisjunctionDISIApproximation to short-circuit

2022-11-18 Thread GitBox
gsmiller commented on code in PR #11928: URL: https://github.com/apache/lucene/pull/11928#discussion_r1026694386 ## lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java: ## @@ -45,29 +51,54 @@ public long cost() { @Override public int docID() {

[GitHub] [lucene] gsmiller commented on a diff in pull request #11928: GH#11922: Allow DisjunctionDISIApproximation to short-circuit

2022-11-18 Thread GitBox
gsmiller commented on code in PR #11928: URL: https://github.com/apache/lucene/pull/11928#discussion_r1026697770 ## lucene/MIGRATE.md: ## @@ -102,6 +102,12 @@ Lucene 9.2 or stay with 9.0. See LUCENE-10558 for more details and workarounds. +### DisjunctionDISIApproximation b

[GitHub] [lucene] dweiss merged pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-18 Thread GitBox
dweiss merged PR #11947: URL: https://github.com/apache/lucene/pull/11947

[GitHub] [lucene] dweiss closed issue #11329: Add an equivalent of ant's stage-maven-artifacts for the release wizard [LUCENE-10293]

2022-11-18 Thread GitBox
dweiss closed issue #11329: Add an equivalent of ant's stage-maven-artifacts for the release wizard [LUCENE-10293] URL: https://github.com/apache/lucene/issues/11329

[GitHub] [lucene] gsmiller commented on a diff in pull request #11928: GH#11922: Allow DisjunctionDISIApproximation to short-circuit

2022-11-18 Thread GitBox
gsmiller commented on code in PR #11928: URL: https://github.com/apache/lucene/pull/11928#discussion_r1026737221 ## lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java: ## @@ -45,29 +51,54 @@ public long cost() { @Override public int docID() {

[GitHub] [lucene] gsmiller commented on pull request #11928: GH#11922: Allow DisjunctionDISIApproximation to short-circuit

2022-11-18 Thread GitBox
gsmiller commented on PR #11928: URL: https://github.com/apache/lucene/pull/11928#issuecomment-1320356774 @jpountz thanks for the implementation feedback! I've updated the PR, but still plan to do more benchmarking to really understand the benefit, etc. before looking to actually merge this

[GitHub] [lucene] msokolov commented on pull request #11946: add similarity threshold for hnsw

2022-11-18 Thread GitBox
msokolov commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320401647 > If we use only post-filter in KnnVectorQuery, then we have to set k = Integer.MAX_VALUE (or another very big value) and calculate similarity with all vectors. So the complexity would

[GitHub] [lucene] msokolov commented on pull request #11945: Decrease test time for TestManyKnnDocs.testLargeSegment

2022-11-18 Thread GitBox
msokolov commented on PR #11945: URL: https://github.com/apache/lucene/pull/11945#issuecomment-1320406270 oh nice plan, thanks everyone

[GitHub] [lucene] agorlenko commented on pull request #11946: add similarity threshold for hnsw

2022-11-18 Thread GitBox
agorlenko commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320416549 But we don't know K - that's the problem. The task which I want to solve sounds like this: find documents with similarity >= 0.76 (for example). We don't have the number of such docume

[GitHub] [lucene] msokolov commented on pull request #11946: add similarity threshold for hnsw

2022-11-18 Thread GitBox
msokolov commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320438152 OK, can we start by providing post-filter? I think this will be a more common use case. I want to find the best docs, and ensure that none of them are terrible. It is less disruptiv
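
The two positions in the #11946 thread (post-filtering a top-k result versus returning every document above a similarity threshold) can be contrasted with a small sketch. This is only an illustration under stated assumptions: it uses brute-force dot products over plain arrays instead of an HNSW graph, and the class and method names (`ThresholdSketch`, `allAbove`, `topKThenFilter`) are hypothetical and not part of Lucene or of the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene code and not the PR's implementation. */
final class ThresholdSketch {

  /** Dot product; equals cosine similarity when both vectors have unit length. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Exhaustive scan: every doc id whose similarity to the query is >= threshold. */
  static List<Integer> allAbove(float[][] docs, float[] query, float threshold) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < docs.length; doc++) {
      if (dot(docs[doc], query) >= threshold) {
        hits.add(doc);
      }
    }
    return hits;
  }

  /** Post-filter: pick the k most similar docs first, then drop those below threshold. */
  static List<Integer> topKThenFilter(float[][] docs, float[] query, int k, float threshold) {
    Integer[] order = new Integer[docs.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort doc ids by descending similarity (negated score gives descending order).
    Arrays.sort(order, Comparator.comparingDouble((Integer d) -> -dot(docs[d], query)));
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < Math.min(k, order.length); i++) {
      if (dot(docs[order[i]], query) >= threshold) {
        hits.add(order[i]);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    float[][] docs = {{1f, 0f}, {0.8f, 0.6f}, {0f, 1f}, {0.6f, 0.8f}};
    float[] query = {1f, 0f};
    System.out.println(allAbove(docs, query, 0.76f));
    System.out.println(topKThenFilter(docs, query, 1, 0.76f));
  }
}
```

With a small k the post-filter can miss documents that satisfy the threshold, which is the concern raised about not knowing k in advance; with k as large as the collection it degenerates into the exhaustive scan.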

[GitHub] [lucene] agorlenko commented on pull request #11946: add similarity threshold for hnsw

2022-11-18 Thread GitBox
agorlenko commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1320508166 > Can you explain why you want the "find all docs with score > T"? For example, we want to give each user only the documents suitable for them. We have a custom scorer (based on ml-mod

[GitHub] [lucene] benwtrent commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-18 Thread GitBox
benwtrent commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1320660919 OK, I did some more performance testing @jpountz @rmuir Every once in a while, I see some extreme 100%/99.9% latency spikes in KNN search times. This happened on about half of t

[GitHub] [lucene] mdmarshmallow commented on pull request #11901: Github#11869: Add RangeOnRangeFacetCounts

2022-11-18 Thread GitBox
mdmarshmallow commented on PR #11901: URL: https://github.com/apache/lucene/pull/11901#issuecomment-1320700985 Ah yeah good point on the FacetSets... so I actually already use `LongRangeDocValueFields` here: `public class LongRangeDocValuesFacetField extends LongRangeDocValuesField`. The di