[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982891128 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: Volatile is worse than AtomicLong, because it has a barrier on the first read (to get actual value), then do addition, and then write new value with a barrier again. So the AtomicLong increment is better as it is one atomic operation with one barrier (it is actually a separate CPU instruction). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983415631 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: I would add a local plain `int` field (an int is enough here, as we only write small amounts and it fits better to the parameters passed to the methods) to the IndexOutput, increment it on every write completely unsychronized. After every write call a private method like `maybeUpdateGlobal()` which checks in the plain `int` field if it is larger than a certain value and then does something like `AtomicLong#addAndGet(localIntValue); localIntValue = 0;` Nevertheless we should have a benchmark how much this affects writing. If the filter directory is just for debugging purposes, we can of course go with the full atomic implementation as we have now. Nobody takes care if you want to just watch the amplification factors while debugging your app and it gets slower because of this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983415631 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: I would add a local plain `int` field (an int is enough here, as we only write small amounts and it fits better to the parameters passed to the methods) to the IndexOutput, increment it on every write completely unsychronized. After every write call a private method like `maybeUpdateGlobal()` which checks in the plain `int` field if it is larger than a certain value and then does something like `AtomicLong#addAndGet(localIntValue); localIntValue = 0;`. Of course if somebody is writing a large `byte[]` we can directly update the atomic long (to prevent overflows of the int). Nevertheless we should have a benchmark how much this affects writing. If the filter directory is just for debugging purposes, we can of course go with the full atomic implementation as we have now. Nobody takes care if you want to just watch the amplification factors while debugging your app and it gets slower because of this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983415631 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: I would add a local plain `int` field (an int is enough here, as we only write small amounts and it fits better to the parameters passed to the methods) to the IndexOutput, increment it on every write completely unsychronized. After every write call a private method like `maybeUpdateGlobal()` which checks in the plain `int` field if it is larger than a certain value and then does something like `AtomicLong#addAndGet(localIntValue); localIntValue = 0;`. Of course if somebody is writing a large `byte[]` we can directly update the atomic long (to prevent overflows of the int). I think we should make the writes of single bytes effective as it is often used for other data types like short, int, long unless we optimize in FilterDirectory, too. Nevertheless we should have a benchmark how much this affects writing. If the filter directory is just for debugging purposes, we can of course go with the full atomic implementation as we have now. Nobody takes care if you want to just watch the amplification factors while debugging your app and it gets slower because of this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #11828: TermInSetQuery optimization when all docs in a field match a term
mikemccand commented on PR #11828: URL: https://github.com/apache/lucene/pull/11828#issuecomment-1262145491 Awesome use of Lucene's aggregate (corpus) statistics! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose
mikemccand commented on code in PR #11822: URL: https://github.com/apache/lucene/pull/11822#discussion_r983427086 ## lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java: ## @@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() { throw new AssertionError("missing VERSION_KEY"); } + /** + * @return the number of milliseconds to wait during shutdown for remote replicas to close + */ + public int getRemoteCloseTimeout() { +return remoteCloseTimeout; + } + + /** + * Set the number of milliseconds to wait during shutdown for remote replicas to close. {@code -1} + * (the default) means forever, and {@code 0} means don't wait at all. + */ + public void setRemoteCloseTimeout(int remoteCloseTimeout) { +this.remoteCloseTimeout = remoteCloseTimeout; Review Comment: Could we throw `IllegalArgumentException` if this is < -1? ## lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java: ## @@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() { throw new AssertionError("missing VERSION_KEY"); } + /** + * @return the number of milliseconds to wait during shutdown for remote replicas to close + */ + public int getRemoteCloseTimeout() { +return remoteCloseTimeout; + } + + /** + * Set the number of milliseconds to wait during shutdown for remote replicas to close. {@code -1} + * (the default) means forever, and {@code 0} means don't wait at all. + */ + public void setRemoteCloseTimeout(int remoteCloseTimeout) { Review Comment: In addition to a specific test case, we could also randomize the timeout we pass in our randomized tests. Randomly sometimes pass 0, other times a random range or so. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mikemccand commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262156464 > Also @dsmiley that's an interesting suggestion! I'm not as familiar with Lucene as some of the other people commenting here but I would be open to adding this to metadata if there are no objections. It would indeed be awesome if this was a first class metric in Lucene. But let's first start with this `Directory` wrapper approach and see how it's used / improved, and later upgrade it to first class if it is helpful? Progress not perfection! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mikemccand commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262157624 > I'm considering exposing write amplification separately for flushes (as `flushedBytes / totalIndexSize`), merges (as `(totalIndexSize + mergedBytes) / totalIndexSize`) and temporary files (as `(totalIndexSize + tempBytes) / totalIndexSize`) and pushing the responsibility to users of whether and how they would like to combine these various metrics? +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on a diff in pull request #11780: GH#11601: Add ability to compute reader states after refresh
mikemccand commented on code in PR #11780: URL: https://github.com/apache/lucene/pull/11780#discussion_r983451389 ## lucene/core/src/java/org/apache/lucene/search/ReferenceManager.java: ## @@ -219,6 +219,36 @@ public final boolean maybeRefresh() throws IOException { return doTryRefresh; } + /** Compute some state of the reference using a parameter. */ + @FunctionalInterface + public interface StateCalculator { +R calculate(G current, P param); + } + + /** + * If you need to compute something after the refresh, you can use this method instead of {@link + * #maybeRefresh()}. + * + * @throws IOException if refreshing the resource causes an {@link IOException} + */ + public final R maybeRefreshAndReturnState( + StateCalculator refreshedStateCalculator, P param) throws IOException { +ensureOpen(); + +// Ensure only 1 thread does refresh at once; other threads just return immediately: +final boolean doTryRefresh = refreshLock.tryLock(); +if (doTryRefresh) { + try { +doMaybeRefresh(); +return refreshedStateCalculator.calculate(current, param); Review Comment: Yeah I don't think that's important to guard against? An application typically has a separate refresh thread, and that (one) thread will call refresh, compute top-level data structures like `OrdinalMap`, SSDV states, etc., and then wait some time and refresh again. So the application can easily ensure it's not double-calling refresh, and if it might, it should use its own locking. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mikemccand commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983455247 ## lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; + +/** An {@link IndexOutput} that wraps another instance and tracks the number of bytes written */ +public class ByteTrackingIndexOutput extends IndexOutput { + + private final IndexOutput output; + private final AtomicLong byteTracker; + + protected ByteTrackingIndexOutput(IndexOutput output, AtomicLong byteTracker) { +super( +"Byte tracking wrapper for: " + output.getName(), +"ByteTrackingIndexOutput{" + output.getName() + "}"); +this.output = output; +this.byteTracker = byteTracker; + } + + @Override + public void writeByte(byte b) throws IOException { +byteTracker.incrementAndGet(); +output.writeByte(b); + } + + @Override + public void writeBytes(byte[] b, int offset, int length) throws IOException { +byteTracker.addAndGet(length); +output.writeBytes(b, offset, length); + } + + @Override + public void close() throws IOException { +output.close(); + } + + @Override + public long getFilePointer() { +return output.getFilePointer(); + } + + @Override + public long getChecksum() throws IOException { +return output.getChecksum(); + } + + public String getWrappedName() { +return output.getName(); + } + + public String getWrappedToString() { +return output.toString(); + } +} Review Comment: I think this is why we need `FilterIndexOutput`? To make delegation always 100%. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mikemccand commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983456888 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: If we switch to tracking only on `close` (and explain the small loss of real-time-ness in the javadocs), then 1) we can privately track in a local `long` the byte count written, 2) the risk to performance should be lower, 3) PR gets simpler/smaller? We can "transfer" the `long bytesWritten` to the shared `AtomicLong` on close. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mikemccand commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983459042 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: Actually, on `close` couldn't we just call `getFilePointer()` to get the number of bytes written? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
rmuir commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262185696 why does this need to do anything more than `getFilePointer()` on `close()` or something to capture how much was written? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller merged pull request #11803: DrillSideways optimizations
gsmiller merged PR #11803: URL: https://github.com/apache/lucene/pull/11803 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] shubhamvishu commented on issue #11462: Should we create a static factory method for loading VectorValues? [LUCENE-10426]
shubhamvishu commented on issue #11462: URL: https://github.com/apache/lucene/issues/11462#issuecomment-1262204187 I would like to work on it. I'll work on having a PR for this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on issue #11462: Should we create a static factory method for loading VectorValues? [LUCENE-10426]
gsmiller commented on issue #11462: URL: https://github.com/apache/lucene/issues/11462#issuecomment-1262243446 Sounds good @shubhamvishu. Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
uschindler commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262284939 Robert is right, why do we need to see the values live? getFilePointer() always works. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983549013 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: yes, Mike or Robert are right. you can just capture getFilePointer() on close(). This makes the risk of performance issues zero. IMHO, we don't need the live information while the stuff is written. If we record it at end it should be fine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983565903 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: If you are wondering why getFilePointer() works: The name getFilePointer() is a bit misnomed from former times. Nowadays you can't seek while writing to IndexOutput, the semantics are now identical to an OutputStream (it goes forward only, only one thread can write). So after you have written everything, the file pointer is just pointing after the last byte, which is the number of bytes written. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on issue #11830: Store HNSW graph connections more compactly
mayya-sharipova commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1262311806 Nice idea, @jtibshirani! Have you tested what's the performance of reading this way packed values during search? Does it make searches any slower? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller merged pull request #11828: TermInSetQuery optimization when all docs in a field match a term
gsmiller merged PR #11828: URL: https://github.com/apache/lucene/pull/11828 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
jpountz commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262327111 >> I'm considering exposing write amplification separately for flushes (as flushedBytes / totalIndexSize), merges (as (totalIndexSize + mergedBytes) / totalIndexSize) and temporary files (as (totalIndexSize + tempBytes) / totalIndexSize) and pushing the responsibility to users of whether and how they would like to combine these various metrics? > > We could add these in addition to the write amplification factor methods that are already present? I think that one question I was raising was also whether the current way we're measuring write amplification is correct. Say you write 10 1MB segments, which then get merged into a 9MB segment (smaller the the sum of the sizes of flushed segments, possibly because many terms are no longer duplicated across segments). What is the write amplification of merging? Is it `(10 + 9) / 10 = 1.9` like the formula that this PR is currently using, computing write ampfilication as a function of the total flush size. Or is it `(9 + 9) / 9 = 2`, computing write amplification as a function of the total index size? The latter feels more intuitive to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits
jpountz commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262350445 If someone opens a PR to decrease the limit from 20% to 5% I'll happily approve the change given the results I shared above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dsmiley commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
dsmiley commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262364880 The former is more intuitive to me -- how much more data do we write beyond the initial segment flush. This is the added cost of a system immutable files with log structured merge. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on issue #11830: Store HNSW graph connections more compactly
msokolov commented on issue #11830: URL: https://github.com/apache/lucene/issues/11830#issuecomment-1262415650 I like the idea, although I confess I don't understand how we can make it fixed width! I guess if we know the max number and it is small, we can quantize more cheaply? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983753267 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + private final AtomicLong realTimeFlushedBytes = new AtomicLong(); Review Comment: Sure that sounds really simple then, I'll make a PR to just use `getFilePointer()` on close and just update an `AtomicLong` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mdmarshmallow commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262549378 I agree, the former is also more intuitive to me as well for the same reasons. In your example, 10MB were initially written, and 9MB were written again in merges, so 1.9 to me means that we did 1.9 times the work of initially just writing the 10MB to get to the final 9MB index state. Though since this does seem like it might be a point of confusion for users in that there are multiple ways to calculate this, do you think it would be better to just expose `getFlushedBytes()` and `getMergedBytes()` and let users calculate the write amplification as they see fit? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mdmarshmallow opened a new pull request, #11831: GITHUB-11761: Move minimum TieredMergePolicy delete percentage from 2…
mdmarshmallow opened a new pull request, #11831: URL: https://github.com/apache/lucene/pull/11831 …0% to 5% ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mdmarshmallow commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits
mdmarshmallow commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262618406 Here is a PR: https://github.com/apache/lucene/pull/11831 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] danmuzi commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
danmuzi commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983858651 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + public final boolean trackTempOutput; + + public ByteWritesTrackingDirectoryWrapper(Directory in) { +this(in, false); + } + /** + * Constructor with option to track tempOutput + * + * @param in input Directory + * @param trackTempOutput if true, will also track temporary outputs created by this directory + */ + public ByteWritesTrackingDirectoryWrapper(Directory in, boolean trackTempOutput) { +super(in); +this.trackTempOutput = trackTempOutput; + } + + @Override + public IndexOutput createOutput(String name, IOContext ioContext) throws IOException { +IndexOutput output = in.createOutput(name, ioContext); +ByteTrackingIndexOutput byteTrackingIndexOutput; +if (ioContext.context.equals(IOContext.Context.FLUSH)) { + byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, flushedBytes); +} else if (ioContext.context.equals(IOContext.Context.MERGE)) { + byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, mergedBytes); +} else { + return output; +} +return byteTrackingIndexOutput; + } + + @Override + public IndexOutput createTempOutput(String prefix, String suffix, IOContext ioContext) + throws IOException { +IndexOutput output = in.createTempOutput(prefix, suffix, ioContext); +if (trackTempOutput) { + ByteTrackingIndexOutput byteTrackingIndexOutput; + if (ioContext.context.equals(IOContext.Context.FLUSH)) { +byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, flushedBytes); + } else if (ioContext.context.equals(IOContext.Context.MERGE)) { +byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, mergedBytes); + } else { +return output; + } + return byteTrackingIndexOutput; Review Comment: There are duplicate codes. How about changing it to: ```java @Override public IndexOutput createOutput(String name, IOContext ioContext) throws IOException { IndexOutput output = in.createOutput(name, ioContext); return createByteTrackingOutput(output, ioContext.context); } @Override public IndexOutput createTempOutput(String prefix, String suffix, IOContext ioContext) throws IOException { IndexOutput output = in.createTempOutput(prefix, suffix, ioContext); return trackTempOutput ? createByteTrackingOutput(output, ioContext.context) : output; } private IndexOutput createByteTrackingOutput(IndexOutput output, IOContext.Context context) { // If you want to add this feature to Lucene 9, you should modify it to the old switch-case style. return switch (context) { case FLUSH -> new ByteTrackingIndexOutput(output, flushedBytes); case MERGE -> new ByteTrackingIndexOutput(output, mergedBytes); default -> output; }; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: is
[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983963180 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + public final boolean trackTempOutput; + + public ByteWritesTrackingDirectoryWrapper(Directory in) { +this(in, false); + } + /** + * Constructor with option to track tempOutput + * + * @param in input Directory + * @param trackTempOutput if true, will also track temporary outputs created by this directory + */ + public ByteWritesTrackingDirectoryWrapper(Directory in, boolean trackTempOutput) { +super(in); +this.trackTempOutput = trackTempOutput; + } + + @Override + public IndexOutput createOutput(String name, IOContext ioContext) throws IOException { +IndexOutput output = in.createOutput(name, ioContext); +ByteTrackingIndexOutput byteTrackingIndexOutput; +if (ioContext.context.equals(IOContext.Context.FLUSH)) { + byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, flushedBytes); +} else if (ioContext.context.equals(IOContext.Context.MERGE)) { + byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, mergedBytes); +} else { + return output; +} +return byteTrackingIndexOutput; + } + + @Override + public IndexOutput createTempOutput(String prefix, String suffix, IOContext ioContext) + throws IOException { +IndexOutput output = in.createTempOutput(prefix, suffix, ioContext); +if (trackTempOutput) { + ByteTrackingIndexOutput byteTrackingIndexOutput; + if (ioContext.context.equals(IOContext.Context.FLUSH)) { +byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, flushedBytes); + } else if (ioContext.context.equals(IOContext.Context.MERGE)) { +byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, mergedBytes); + } else { +return output; + } + return byteTrackingIndexOutput; Review Comment: Hi, thanks for the feedback! I made the changes you suggested + added a new CHANGED.txt entry under 9.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits
rmuir commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262741708 > I got some numbers for write amplification for the case tested in TestTieredMergePolicy#testSimulateUpdates: @jpountz based on these numbers wouldn't it also make sense to consider changing the default from `33%` to `20%` as well? Just an idea. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r983963180 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + public final boolean trackTempOutput; + + public ByteWritesTrackingDirectoryWrapper(Directory in) { +this(in, false); + } + /** + * Constructor with option to track tempOutput + * + * @param in input Directory + * @param trackTempOutput if true, will also track temporary outputs created by this directory + */ + public ByteWritesTrackingDirectoryWrapper(Directory in, boolean trackTempOutput) { +super(in); +this.trackTempOutput = trackTempOutput; + } + + @Override + public IndexOutput createOutput(String name, IOContext ioContext) throws IOException { +IndexOutput output = in.createOutput(name, ioContext); +ByteTrackingIndexOutput byteTrackingIndexOutput; +if (ioContext.context.equals(IOContext.Context.FLUSH)) { + byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, flushedBytes); +} else if (ioContext.context.equals(IOContext.Context.MERGE)) { + byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, mergedBytes); +} else { + return output; +} +return byteTrackingIndexOutput; + } + + @Override + public IndexOutput createTempOutput(String prefix, String suffix, IOContext ioContext) + throws IOException { +IndexOutput output = in.createTempOutput(prefix, suffix, ioContext); +if (trackTempOutput) { + ByteTrackingIndexOutput byteTrackingIndexOutput; + if (ioContext.context.equals(IOContext.Context.FLUSH)) { +byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, flushedBytes); + } else if (ioContext.context.equals(IOContext.Context.MERGE)) { +byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, mergedBytes); + } else { +return output; + } + return byteTrackingIndexOutput; Review Comment: Hi, thanks for the feedback! I made the changes you suggested + added a new CHANGES.txt entry under 9.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] shubhamvishu opened a new pull request, #11832: Added static factory method for loading VectorValues
shubhamvishu opened a new pull request, #11832: URL: https://github.com/apache/lucene/pull/11832 I have added a static factory method `getVectorValues` in `VectorValues` which returns the EMPTY `VectorValues` instance if the field is not found in the segment or if its not a vector field it throws an IAE else returns the `VectorValues` instance. Looking forward to the feedback. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11832: Added static factory method for loading VectorValues
rmuir commented on code in PR #11832: URL: https://github.com/apache/lucene/pull/11832#discussion_r983993590 ## lucene/core/src/java/org/apache/lucene/index/VectorValues.java: ## @@ -35,6 +35,25 @@ public abstract class VectorValues extends DocIdSetIterator { /** Sole constructor */ protected VectorValues() {} + /** + * Returns the {@link VectorValues} index for this field, or {@link #EMPTY} if it has none. + * + * @param reader Leaf reader instance + * @param field Field name + * @return VectorValues instance, or an empty instance if {@code field} does not exist in this + * reader + * @throws IOException if the field does not have any vector values + */ + public static VectorValues getVectorValues(LeafReader reader, String field) throws IOException { +VectorValues values = reader.getVectorValues(field); +if (values == null) { + return EMPTY; +} else if (!reader.getFieldInfos().fieldInfo(field).hasVectorValues()) { Review Comment: this line of code is unreachable. It is actually the null case where you want to check if the field exists, and if so, does it have vector values. See DocValues as an example. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a diff in pull request #11832: Added static factory method for loading VectorValues
gsmiller commented on code in PR #11832: URL: https://github.com/apache/lucene/pull/11832#discussion_r984025608 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2585,7 +2585,7 @@ public static Status.VectorValuesStatus testVectors( + "\" has vector values but dimension is " + dimension); } -VectorValues values = reader.getVectorValues(fieldInfo.name); +VectorValues values = VectorValues.getVectorValues(reader, fieldInfo.name); if (values == null) { Review Comment: We shouldn't need to check `values == null` here anymore right since `VectorValues#getVectorValues` never returns null? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] danmuzi commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor
danmuzi commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r984184997 ## lucene/CHANGES.txt: ## @@ -98,6 +98,11 @@ API Changes * GITHUB#11804: FacetsCollector#collect is no longer final, allowing extension. (Greg Miller) +New Features +- +* GITHUB#11795: Add ByteWritesTrackingDirectoryWrapper to expose metrics for bytes merged, flushed, and overall +write amplification factor. (Marc D'Mello) Review Comment: Usually, when a new line is needed for a description in CHANGES.txt, two whitespaces are added in front. (see other multi-line description) write amp... -> {space}{space}write amp... ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.misc.store; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicLong; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FilterDirectory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexOutput; + +/** {@link FilterDirectory} that tracks write amplification factor */ +public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory { + + private final AtomicLong flushedBytes = new AtomicLong(); + private final AtomicLong mergedBytes = new AtomicLong(); + public final boolean trackTempOutput; + + public ByteWritesTrackingDirectoryWrapper(Directory in) { +this(in, false); + } + /** + * Constructor with option to track tempOutput + * + * @param in input Directory + * @param trackTempOutput if true, will also track temporary outputs created by this directory + */ + public ByteWritesTrackingDirectoryWrapper(Directory in, boolean trackTempOutput) { +super(in); +this.trackTempOutput = trackTempOutput; + } + + @Override + public IndexOutput createOutput(String name, IOContext ioContext) throws IOException { +IndexOutput output = in.createOutput(name, ioContext); +return createByteTrackingOutput(output, ioContext.context); + } + + @Override + public IndexOutput createTempOutput(String prefix, String suffix, IOContext ioContext) + throws IOException { +IndexOutput output = in.createTempOutput(prefix, suffix, ioContext); +return trackTempOutput ? createByteTrackingOutput(output, ioContext.context) : output; + } + + private IndexOutput createByteTrackingOutput(IndexOutput output, IOContext.Context context) { +switch (context) { + case FLUSH: +return new ByteTrackingIndexOutput(output, flushedBytes); + case MERGE: +return new ByteTrackingIndexOutput(output, mergedBytes); + case DEFAULT: + case READ: + default: Review Comment: DEFAULT and READ can be handled by default. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org