[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982891128 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983415631 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] mikemccand commented on pull request #11828: TermInSetQuery optimization when all docs in a field match a term

2022-09-29 Thread GitBox
mikemccand commented on PR #11828: URL: https://github.com/apache/lucene/pull/11828#issuecomment-1262145491 Awesome use of Lucene's aggregate (corpus) statistics! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [lucene] mikemccand commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

2022-09-29 Thread GitBox
mikemccand commented on code in PR #11822: URL: https://github.com/apache/lucene/pull/11822#discussion_r983427086 ## lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java: ## @@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() { throw

[GitHub] [lucene] mikemccand commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mikemccand commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262156464 > Also @dsmiley that's an interesting suggestion! I'm not as familiar with Lucene as some of the other people commenting here but I would be open to adding this to metadata if there a

[GitHub] [lucene] mikemccand commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mikemccand commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262157624 > I'm considering exposing write amplification separately for flushes (as `flushedBytes / totalIndexSize`), merges (as `(totalIndexSize + mergedBytes) / totalIndexSize`) and temporary

[GitHub] [lucene] mikemccand commented on a diff in pull request #11780: GH#11601: Add ability to compute reader states after refresh

2022-09-29 Thread GitBox
mikemccand commented on code in PR #11780: URL: https://github.com/apache/lucene/pull/11780#discussion_r983451389 ## lucene/core/src/java/org/apache/lucene/search/ReferenceManager.java: ## @@ -219,6 +219,36 @@ public final boolean maybeRefresh() throws IOException { return

[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mikemccand commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983455247 ## lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mikemccand commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983456888 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mikemccand commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983459042 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] rmuir commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
rmuir commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262185696 why does this need to do anything more than `getFilePointer()` on `close()` or something to capture how much was written?
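
The suggestion above can be sketched without any Lucene dependency. The class below is a hypothetical stand-in written for illustration, not the code from PR #11796: it mimics an output that knows its own position (the role `getFilePointer()` plays in the comment) and adds that position to a shared counter only once, when the output is closed. The names `CloseTimeByteCounter`, `totalBytes`, and `demoTotal` are assumptions introduced here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for illustration; not a Lucene class.
final class CloseTimeByteCounter extends OutputStream {
  private final OutputStream delegate;
  private final AtomicLong totalBytes; // shared by all outputs of one "directory"
  private long position;               // plays the role of getFilePointer()
  private boolean closed;

  CloseTimeByteCounter(OutputStream delegate, AtomicLong totalBytes) {
    this.delegate = delegate;
    this.totalBytes = totalBytes;
  }

  // Current write position, i.e. the number of bytes written so far.
  long getFilePointer() {
    return position;
  }

  @Override
  public void write(int b) throws IOException {
    delegate.write(b);
    position++;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    delegate.write(b, off, len);
    position += len;
  }

  @Override
  public void close() throws IOException {
    if (!closed) {
      closed = true;
      // The shared counter is touched exactly once per output, at close time.
      totalBytes.addAndGet(getFilePointer());
    }
    delegate.close();
  }

  // Small usage helper: writes 4 bytes, closes twice, returns the shared total.
  static long demoTotal() {
    AtomicLong total = new AtomicLong();
    try {
      CloseTimeByteCounter out =
          new CloseTimeByteCounter(new ByteArrayOutputStream(), total);
      out.write(new byte[] {1, 2, 3}, 0, 3);
      out.write(7);
      out.close();
      out.close(); // a second close must not double-count
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total.get();
  }
}
```

The trade-off in this sketch is that the shared total lags behind until each output is closed, which is the "live values" question raised later in the thread.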

[GitHub] [lucene] gsmiller merged pull request #11803: DrillSideways optimizations

2022-09-29 Thread GitBox
gsmiller merged PR #11803: URL: https://github.com/apache/lucene/pull/11803

[GitHub] [lucene] shubhamvishu commented on issue #11462: Should we create a static factory method for loading VectorValues? [LUCENE-10426]

2022-09-29 Thread GitBox
shubhamvishu commented on issue #11462: URL: https://github.com/apache/lucene/issues/11462#issuecomment-1262204187 I would like to work on it. I'll work on having a PR for this.

[GitHub] [lucene] gsmiller commented on issue #11462: Should we create a static factory method for loading VectorValues? [LUCENE-10426]

2022-09-29 Thread GitBox
gsmiller commented on issue #11462: URL: https://github.com/apache/lucene/issues/11462#issuecomment-1262243446 Sounds good @shubhamvishu. Thank you!

[GitHub] [lucene] uschindler commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
uschindler commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262284939 Robert is right, why do we need to see the values live? getFilePointer() always works.

[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983549013 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983565903 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] mayya-sharipova commented on issue #11830: Store HNSW graph connections more compactly

2022-09-29 Thread GitBox
mayya-sharipova commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1262311806 Nice idea, @jtibshirani! Have you tested the performance of reading values packed this way during search? Does it make searches any slower?

[GitHub] [lucene] gsmiller merged pull request #11828: TermInSetQuery optimization when all docs in a field match a term

2022-09-29 Thread GitBox
gsmiller merged PR #11828: URL: https://github.com/apache/lucene/pull/11828

[GitHub] [lucene] jpountz commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
jpountz commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262327111 >> I'm considering exposing write amplification separately for flushes (as flushedBytes / totalIndexSize), merges (as (totalIndexSize + mergedBytes) / totalIndexSize) and temporary files

[GitHub] [lucene] jpountz commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-29 Thread GitBox
jpountz commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262350445 If someone opens a PR to decrease the limit from 20% to 5% I'll happily approve the change given the results I shared above.

[GitHub] [lucene] dsmiley commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
dsmiley commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262364880 The former is more intuitive to me -- how much more data do we write beyond the initial segment flush. This is the added cost of a system of immutable files with log-structured merging.

[GitHub] [lucene] msokolov commented on issue #11830: Store HNSW graph connections more compactly

2022-09-29 Thread GitBox
msokolov commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1262415650 I like the idea, although I confess I don't understand how we can make it fixed width! I guess if we know the max number and it is small, we can quantize more cheaply?

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983753267 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mdmarshmallow commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262549378 I agree, the former is more intuitive to me as well, for the same reasons. In your example, 10MB were initially written, and 9MB were written again in merges, so 1.9 to me mean
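
The arithmetic behind the 1.9 figure in the comment above can be written out as a small sketch. It assumes one reading of the example: the factor is all bytes written (initial flushes plus merge rewrites) divided by the bytes written by the initial flushes. The class `WriteAmplification` and its method `factor` are hypothetical names used only for illustration; they are not part of Lucene or of the pull request.

```java
// Hypothetical helper for illustration only; not part of Lucene or of PR #11796.
final class WriteAmplification {
  private WriteAmplification() {}

  /**
   * Assumed definition: (flushed bytes + merged bytes) / flushed bytes.
   * A value of 1.0 means nothing was rewritten after the initial flush.
   */
  static double factor(long flushedBytes, long mergedBytes) {
    if (flushedBytes <= 0) {
      throw new IllegalArgumentException("flushedBytes must be positive");
    }
    if (mergedBytes < 0) {
      throw new IllegalArgumentException("mergedBytes must not be negative");
    }
    return (double) (flushedBytes + mergedBytes) / flushedBytes;
  }

  public static void main(String[] args) {
    long flushed = 10_000_000L; // 10 MB written by initial flushes
    long merged = 9_000_000L;   // 9 MB written again by merges
    System.out.println(factor(flushed, merged));
  }
}
```

With the numbers from the comment, the ratio is (10 + 9) / 10 = 1.9.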

[GitHub] [lucene] mdmarshmallow opened a new pull request, #11831: GITHUB-11761: Move minimum TieredMergePolicy delete percentage from 2…

2022-09-29 Thread GitBox
mdmarshmallow opened a new pull request, #11831: URL: https://github.com/apache/lucene/pull/11831 …0% to 5% ### Description

[GitHub] [lucene] mdmarshmallow commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-29 Thread GitBox
mdmarshmallow commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262618406 Here is a PR: https://github.com/apache/lucene/pull/11831

[GitHub] [lucene] danmuzi commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
danmuzi commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983858651 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983963180 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] rmuir commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-29 Thread GitBox
rmuir commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262741708 > I got some numbers for write amplification for the case tested in TestTieredMergePolicy#testSimulateUpdates: @jpountz based on these numbers wouldn't it also make sense to con

[GitHub] [lucene] shubhamvishu opened a new pull request, #11832: Added static factory method for loading VectorValues

2022-09-29 Thread GitBox
shubhamvishu opened a new pull request, #11832: URL: https://github.com/apache/lucene/pull/11832 I have added a static factory method `getVectorValues` in `VectorValues` which returns the EMPTY `VectorValues` instance if the field is not found in the segment or if it's not a vector field it

[GitHub] [lucene] rmuir commented on a diff in pull request #11832: Added static factory method for loading VectorValues

2022-09-29 Thread GitBox
rmuir commented on code in PR #11832: URL: https://github.com/apache/lucene/pull/11832#discussion_r983993590 ## lucene/core/src/java/org/apache/lucene/index/VectorValues.java: ## @@ -35,6 +35,25 @@ public abstract class VectorValues extends DocIdSetIterator { /** Sole constru

[GitHub] [lucene] gsmiller commented on a diff in pull request #11832: Added static factory method for loading VectorValues

2022-09-29 Thread GitBox
gsmiller commented on code in PR #11832: URL: https://github.com/apache/lucene/pull/11832#discussion_r984025608 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2585,7 +2585,7 @@ public static Status.VectorValuesStatus testVectors( +

[GitHub] [lucene] danmuzi commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox
danmuzi commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r984184997 ## lucene/CHANGES.txt: ## @@ -98,6 +98,11 @@ API Changes * GITHUB#11804: FacetsCollector#collect is no longer final, allowing extension. (Greg Miller) +New Featur