[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


uschindler commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r982891128


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   Volatile is worse than AtomicLong, because it has a barrier on the first 
read (to get actual value), then do addition, and then write new value with a 
barrier again. So the AtomicLong increment is better as it is one atomic 
operation with one barrier (it is actually a separate CPU instruction).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


uschindler commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983415631


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   I would add a local plain `int` field (an int is enough here, as we only 
write small amounts and it fits better to the parameters passed to the methods) 
to the IndexOutput, increment it on every write completely unsychronized. After 
every write call a private method  like `maybeUpdateGlobal()` which checks in 
the plain `int` field if it is larger than a certain value and then does 
something like `AtomicLong#addAndGet(localIntValue); localIntValue = 0;`
   
   Nevertheless we should have a benchmark how much this affects writing. If 
the filter directory is just for debugging purposes, we can of course go with 
the full atomic implementation as we have now. Nobody takes care if you want to 
just watch the amplification factors while debugging your app and it gets 
slower because of this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


uschindler commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983415631


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   I would add a local plain `int` field (an int is enough here, as we only 
write small amounts and it fits better to the parameters passed to the methods) 
to the IndexOutput, increment it on every write completely unsychronized. After 
every write call a private method  like `maybeUpdateGlobal()` which checks in 
the plain `int` field if it is larger than a certain value and then does 
something like `AtomicLong#addAndGet(localIntValue); localIntValue = 0;`. Of 
course if somebody is writing a large `byte[]` we can directly update the 
atomic long (to prevent overflows of the int).
   
   Nevertheless we should have a benchmark how much this affects writing. If 
the filter directory is just for debugging purposes, we can of course go with 
the full atomic implementation as we have now. Nobody takes care if you want to 
just watch the amplification factors while debugging your app and it gets 
slower because of this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


uschindler commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983415631


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   I would add a local plain `int` field (an int is enough here, as we only 
write small amounts and it fits better to the parameters passed to the methods) 
to the IndexOutput, increment it on every write completely unsychronized. After 
every write call a private method  like `maybeUpdateGlobal()` which checks in 
the plain `int` field if it is larger than a certain value and then does 
something like `AtomicLong#addAndGet(localIntValue); localIntValue = 0;`. Of 
course if somebody is writing a large `byte[]` we can directly update the 
atomic long (to prevent overflows of the int). I think we should make the 
writes of single bytes effective as it is often used for other data types like 
short, int, long unless we optimize in FilterDirectory, too.
   
   Nevertheless we should have a benchmark how much this affects writing. If 
the filter directory is just for debugging purposes, we can of course go with 
the full atomic implementation as we have now. Nobody takes care if you want to 
just watch the amplification factors while debugging your app and it gets 
slower because of this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #11828: TermInSetQuery optimization when all docs in a field match a term

2022-09-29 Thread GitBox


mikemccand commented on PR #11828:
URL: https://github.com/apache/lucene/pull/11828#issuecomment-1262145491

   Awesome use of Lucene's aggregate (corpus) statistics!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

2022-09-29 Thread GitBox


mikemccand commented on code in PR #11822:
URL: https://github.com/apache/lucene/pull/11822#discussion_r983427086


##
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##
@@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() {
 throw new AssertionError("missing VERSION_KEY");
   }
 
+  /**
+   * @return the number of milliseconds to wait during shutdown for remote 
replicas to close
+   */
+  public int getRemoteCloseTimeout() {
+return remoteCloseTimeout;
+  }
+
+  /**
+   * Set the number of milliseconds to wait during shutdown for remote 
replicas to close. {@code -1}
+   * (the default) means forever, and {@code 0} means don't wait at all.
+   */
+  public void setRemoteCloseTimeout(int remoteCloseTimeout) {
+this.remoteCloseTimeout = remoteCloseTimeout;

Review Comment:
   Could we throw `IllegalArgumentException` if this is < -1?



##
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##
@@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() {
 throw new AssertionError("missing VERSION_KEY");
   }
 
+  /**
+   * @return the number of milliseconds to wait during shutdown for remote 
replicas to close
+   */
+  public int getRemoteCloseTimeout() {
+return remoteCloseTimeout;
+  }
+
+  /**
+   * Set the number of milliseconds to wait during shutdown for remote 
replicas to close. {@code -1}
+   * (the default) means forever, and {@code 0} means don't wait at all.
+   */
+  public void setRemoteCloseTimeout(int remoteCloseTimeout) {

Review Comment:
   In addition to a specific test case, we could also randomize the timeout we 
pass in our randomized tests.  Randomly sometimes pass 0, other times a random 
range or so.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mikemccand commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262156464

   > Also @dsmiley that's an interesting suggestion! I'm not as familiar with 
Lucene as some of the other people commenting here but I would be open to 
adding this to metadata if there are no objections.
   
   It would indeed be awesome if this was a first class metric in Lucene.  But 
let's first start with this `Directory` wrapper approach and see how it's used 
/ improved, and later upgrade it to first class if it is helpful?  Progress not 
perfection!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mikemccand commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262157624

   > I'm considering exposing write amplification separately for flushes (as 
`flushedBytes / totalIndexSize`), merges (as `(totalIndexSize + mergedBytes) / 
totalIndexSize`) and temporary files (as `(totalIndexSize + tempBytes) / 
totalIndexSize`) and pushing the responsibility to users of whether and how 
they would like to combine these various metrics?
   
   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on a diff in pull request #11780: GH#11601: Add ability to compute reader states after refresh

2022-09-29 Thread GitBox


mikemccand commented on code in PR #11780:
URL: https://github.com/apache/lucene/pull/11780#discussion_r983451389


##
lucene/core/src/java/org/apache/lucene/search/ReferenceManager.java:
##
@@ -219,6 +219,36 @@ public final boolean maybeRefresh() throws IOException {
 return doTryRefresh;
   }
 
+  /** Compute some state of the reference using a parameter. */
+  @FunctionalInterface
+  public interface StateCalculator {
+R calculate(G current, P param);
+  }
+
+  /**
+   * If you need to compute something after the refresh, you can use this 
method instead of {@link
+   * #maybeRefresh()}.
+   *
+   * @throws IOException if refreshing the resource causes an {@link 
IOException}
+   */
+  public final  R maybeRefreshAndReturnState(
+  StateCalculator refreshedStateCalculator, P param) throws 
IOException {
+ensureOpen();
+
+// Ensure only 1 thread does refresh at once; other threads just return 
immediately:
+final boolean doTryRefresh = refreshLock.tryLock();
+if (doTryRefresh) {
+  try {
+doMaybeRefresh();
+return refreshedStateCalculator.calculate(current, param);

Review Comment:
   Yeah I don't think that's important to guard against?
   
   An application typically has a separate refresh thread, and that (one) 
thread will call refresh, compute top-level data structures like `OrdinalMap`, 
SSDV states, etc., and then wait some time and refresh again.  So the 
application can easily ensure it's not double-calling refresh, and if it might, 
it should use its own locking.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mikemccand commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983455247


##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** An {@link IndexOutput} that wraps another instance and tracks the number 
of bytes written */
+public class ByteTrackingIndexOutput extends IndexOutput {
+
+  private final IndexOutput output;
+  private final AtomicLong byteTracker;
+
+  protected ByteTrackingIndexOutput(IndexOutput output, AtomicLong 
byteTracker) {
+super(
+"Byte tracking wrapper for: " + output.getName(),
+"ByteTrackingIndexOutput{" + output.getName() + "}");
+this.output = output;
+this.byteTracker = byteTracker;
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+byteTracker.incrementAndGet();
+output.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+byteTracker.addAndGet(length);
+output.writeBytes(b, offset, length);
+  }
+
+  @Override
+  public void close() throws IOException {
+output.close();
+  }
+
+  @Override
+  public long getFilePointer() {
+return output.getFilePointer();
+  }
+
+  @Override
+  public long getChecksum() throws IOException {
+return output.getChecksum();
+  }
+
+  public String getWrappedName() {
+return output.getName();
+  }
+
+  public String getWrappedToString() {
+return output.toString();
+  }
+}

Review Comment:
   I think this is why we need `FilterIndexOutput`?  To make delegation always 
100%.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mikemccand commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983456888


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   If we switch to tracking only on `close` (and explain the small loss of 
real-time-ness in the javadocs), then 1) we can privately track in a local 
`long` the byte count written, 2) the risk to performance should be lower, 3) 
PR gets simpler/smaller?  We can "transfer" the `long bytesWritten` to the 
shared `AtomicLong` on close.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mikemccand commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983459042


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   Actually, on `close` couldn't we just call `getFilePointer()` to get the 
number of bytes written?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


rmuir commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262185696

   why does this need to do anything more than `getFilePointer()` on `close()` 
or something to capture how much was written?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller merged pull request #11803: DrillSideways optimizations

2022-09-29 Thread GitBox


gsmiller merged PR #11803:
URL: https://github.com/apache/lucene/pull/11803


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] shubhamvishu commented on issue #11462: Should we create a static factory method for loading VectorValues? [LUCENE-10426]

2022-09-29 Thread GitBox


shubhamvishu commented on issue #11462:
URL: https://github.com/apache/lucene/issues/11462#issuecomment-1262204187

   I would like to work on it. I'll work on having a PR for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on issue #11462: Should we create a static factory method for loading VectorValues? [LUCENE-10426]

2022-09-29 Thread GitBox


gsmiller commented on issue #11462:
URL: https://github.com/apache/lucene/issues/11462#issuecomment-1262243446

   Sounds good @shubhamvishu. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


uschindler commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262284939

   Robert is right, why do we need to see the values live? getFilePointer() 
always works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


uschindler commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983549013


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   yes, Mike or Robert are right. you can just capture getFilePointer() on 
close(). This makes the risk of performance issues zero.
   IMHO, we don't need the live information while the stuff is written. If we 
record it at end it should be fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


uschindler commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983565903


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   If you are wondering why getFilePointer() works: The name getFilePointer() 
is a bit misnomed from former times. Nowadays you can't seek while writing to 
IndexOutput, the semantics are now identical to an OutputStream (it goes 
forward only, only one thread can write). So after you have written everything, 
the file pointer is just pointing after the last byte, which is the number of 
bytes written.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova commented on issue #11830: Store HNSW graph connections more compactly

2022-09-29 Thread GitBox


mayya-sharipova commented on issue #11830:
URL: https://github.com/apache/lucene/issues/11830#issuecomment-1262311806

   Nice idea, @jtibshirani! 
   Have you tested what's the performance of reading this way packed values 
during search?  Does it make searches any slower? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller merged pull request #11828: TermInSetQuery optimization when all docs in a field match a term

2022-09-29 Thread GitBox


gsmiller merged PR #11828:
URL: https://github.com/apache/lucene/pull/11828


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


jpountz commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262327111

   >> I'm considering exposing write amplification separately for flushes (as 
flushedBytes / totalIndexSize), merges (as (totalIndexSize + mergedBytes) / 
totalIndexSize) and temporary files (as (totalIndexSize + tempBytes) / 
totalIndexSize) and pushing the responsibility to users of whether and how they 
would like to combine these various metrics?
   >
   > We could add these in addition to the write amplification factor methods 
that are already present?
   
   I think that one question I was raising was also whether the current way 
we're measuring write amplification is correct. Say you write 10 1MB segments, 
which then get merged into a 9MB segment (smaller the the sum of the sizes of 
flushed segments, possibly because many terms are no longer duplicated across 
segments). What is the write amplification of merging? Is it `(10 + 9) / 10 = 
1.9` like the formula that this PR is currently using, computing write 
ampfilication as a function of the total flush size. Or is it `(9 + 9) / 9 = 
2`, computing write amplification as a function of the total index size? The 
latter feels more intuitive to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-29 Thread GitBox


jpountz commented on issue #11761:
URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262350445

   If someone opens a PR to decrease the limit from 20% to 5% I'll happily 
approve the change given the results I shared above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dsmiley commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


dsmiley commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262364880

   The former is more intuitive to me -- how much more data do we write beyond 
the initial segment flush.  This is the added cost of a system immutable files 
with log structured merge.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on issue #11830: Store HNSW graph connections more compactly

2022-09-29 Thread GitBox


msokolov commented on issue #11830:
URL: https://github.com/apache/lucene/issues/11830#issuecomment-1262415650

   I like the idea, although I confess I don't understand how we can make it 
fixed width! I guess if we know the max number and it is small, we can quantize 
more cheaply?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mdmarshmallow commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983753267


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   Sure that sounds really simple then, I'll make a PR to just use 
`getFilePointer()` on close and just update an `AtomicLong`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mdmarshmallow commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262549378

   I agree, the former is also more intuitive to me as well for the same 
reasons. In your example, 10MB were initially written, and 9MB were written 
again in merges, so 1.9 to me means that we did 1.9 times the work of initially 
just writing the 10MB to get to the final 9MB index state.
   
   Though since this does seem like it might be a point of confusion for users 
in that there are multiple ways to calculate this, do you think it would be 
better to just expose `getFlushedBytes()` and `getMergedBytes()` and let users 
calculate the write amplification as they see fit?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow opened a new pull request, #11831: GITHUB-11761: Move minimum TieredMergePolicy delete percentage from 2…

2022-09-29 Thread GitBox


mdmarshmallow opened a new pull request, #11831:
URL: https://github.com/apache/lucene/pull/11831

   …0% to 5%
   
   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-29 Thread GitBox


mdmarshmallow commented on issue #11761:
URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262618406

   Here is a PR: https://github.com/apache/lucene/pull/11831


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] danmuzi commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


danmuzi commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983858651


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  public final boolean trackTempOutput;
+
+  public ByteWritesTrackingDirectoryWrapper(Directory in) {
+this(in, false);
+  }
+  /**
+   * Constructor with option to track tempOutput
+   *
+   * @param in input Directory
+   * @param trackTempOutput if true, will also track temporary outputs created 
by this directory
+   */
+  public ByteWritesTrackingDirectoryWrapper(Directory in, boolean 
trackTempOutput) {
+super(in);
+this.trackTempOutput = trackTempOutput;
+  }
+
+  @Override
+  public IndexOutput createOutput(String name, IOContext ioContext) throws 
IOException {
+IndexOutput output = in.createOutput(name, ioContext);
+ByteTrackingIndexOutput byteTrackingIndexOutput;
+if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+  byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
flushedBytes);
+} else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+  byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
mergedBytes);
+} else {
+  return output;
+}
+return byteTrackingIndexOutput;
+  }
+
+  @Override
+  public IndexOutput createTempOutput(String prefix, String suffix, IOContext 
ioContext)
+  throws IOException {
+IndexOutput output = in.createTempOutput(prefix, suffix, ioContext);
+if (trackTempOutput) {
+  ByteTrackingIndexOutput byteTrackingIndexOutput;
+  if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
flushedBytes);
+  } else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
mergedBytes);
+  } else {
+return output;
+  }
+  return byteTrackingIndexOutput;

Review Comment:
   There are duplicate codes.
   How about changing it to:
   ```java
 @Override
 public IndexOutput createOutput(String name, IOContext ioContext) throws 
IOException {
   IndexOutput output = in.createOutput(name, ioContext);
   return createByteTrackingOutput(output, ioContext.context);
 }
   
 @Override
 public IndexOutput createTempOutput(String prefix, String suffix, 
IOContext ioContext)
 throws IOException {
   IndexOutput output = in.createTempOutput(prefix, suffix, ioContext);
   return trackTempOutput ? createByteTrackingOutput(output, 
ioContext.context) : output;
 }
   
 private IndexOutput createByteTrackingOutput(IndexOutput output, 
IOContext.Context context) {
  // If you want to add this feature to Lucene 9, you should modify it to 
the old switch-case style.
   return switch (context) {
 case FLUSH -> new ByteTrackingIndexOutput(output, flushedBytes);
 case MERGE -> new ByteTrackingIndexOutput(output, mergedBytes);
 default -> output;
   };
 }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: is

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mdmarshmallow commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983963180


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  public final boolean trackTempOutput;
+
+  public ByteWritesTrackingDirectoryWrapper(Directory in) {
+this(in, false);
+  }
+  /**
+   * Constructor with option to track tempOutput
+   *
+   * @param in input Directory
+   * @param trackTempOutput if true, will also track temporary outputs created 
by this directory
+   */
+  public ByteWritesTrackingDirectoryWrapper(Directory in, boolean 
trackTempOutput) {
+super(in);
+this.trackTempOutput = trackTempOutput;
+  }
+
+  @Override
+  public IndexOutput createOutput(String name, IOContext ioContext) throws 
IOException {
+IndexOutput output = in.createOutput(name, ioContext);
+ByteTrackingIndexOutput byteTrackingIndexOutput;
+if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+  byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
flushedBytes);
+} else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+  byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
mergedBytes);
+} else {
+  return output;
+}
+return byteTrackingIndexOutput;
+  }
+
+  @Override
+  public IndexOutput createTempOutput(String prefix, String suffix, IOContext 
ioContext)
+  throws IOException {
+IndexOutput output = in.createTempOutput(prefix, suffix, ioContext);
+if (trackTempOutput) {
+  ByteTrackingIndexOutput byteTrackingIndexOutput;
+  if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
flushedBytes);
+  } else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
mergedBytes);
+  } else {
+return output;
+  }
+  return byteTrackingIndexOutput;

Review Comment:
   Hi, thanks for the feedback! I made the changes you suggested + added a new 
CHANGED.txt entry under 9.5



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-29 Thread GitBox


rmuir commented on issue #11761:
URL: https://github.com/apache/lucene/issues/11761#issuecomment-1262741708

   > I got some numbers for write amplification for the case tested in 
TestTieredMergePolicy#testSimulateUpdates:
   
   @jpountz based on these numbers wouldn't it also make sense to consider 
changing the default from `33%` to `20%` as well? Just an idea.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


mdmarshmallow commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r983963180


##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  public final boolean trackTempOutput;
+
+  public ByteWritesTrackingDirectoryWrapper(Directory in) {
+this(in, false);
+  }
+  /**
+   * Constructor with option to track tempOutput
+   *
+   * @param in input Directory
+   * @param trackTempOutput if true, will also track temporary outputs created 
by this directory
+   */
+  public ByteWritesTrackingDirectoryWrapper(Directory in, boolean 
trackTempOutput) {
+super(in);
+this.trackTempOutput = trackTempOutput;
+  }
+
+  @Override
+  public IndexOutput createOutput(String name, IOContext ioContext) throws 
IOException {
+IndexOutput output = in.createOutput(name, ioContext);
+ByteTrackingIndexOutput byteTrackingIndexOutput;
+if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+  byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
flushedBytes);
+} else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+  byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
mergedBytes);
+} else {
+  return output;
+}
+return byteTrackingIndexOutput;
+  }
+
+  @Override
+  public IndexOutput createTempOutput(String prefix, String suffix, IOContext 
ioContext)
+  throws IOException {
+IndexOutput output = in.createTempOutput(prefix, suffix, ioContext);
+if (trackTempOutput) {
+  ByteTrackingIndexOutput byteTrackingIndexOutput;
+  if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
flushedBytes);
+  } else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, 
mergedBytes);
+  } else {
+return output;
+  }
+  return byteTrackingIndexOutput;

Review Comment:
   Hi, thanks for the feedback! I made the changes you suggested + added a new 
CHANGES.txt entry under 9.5



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] shubhamvishu opened a new pull request, #11832: Added static factory method for loading VectorValues

2022-09-29 Thread GitBox


shubhamvishu opened a new pull request, #11832:
URL: https://github.com/apache/lucene/pull/11832

   I have added a static factory method `getVectorValues` in `VectorValues` 
which returns the EMPTY `VectorValues` instance if the field is not found in 
the segment or if its not a vector field it throws an IAE else returns the  
`VectorValues` instance. Looking forward to the feedback.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11832: Added static factory method for loading VectorValues

2022-09-29 Thread GitBox


rmuir commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r983993590


##
lucene/core/src/java/org/apache/lucene/index/VectorValues.java:
##
@@ -35,6 +35,25 @@ public abstract class VectorValues extends DocIdSetIterator {
   /** Sole constructor */
   protected VectorValues() {}
 
+  /**
+   * Returns the {@link VectorValues} index for this field, or {@link #EMPTY} 
if it has none.
+   *
+   * @param reader Leaf reader instance
+   * @param field Field name
+   * @return VectorValues instance, or an empty instance if {@code field} does 
not exist in this
+   * reader
+   * @throws IOException if the field does not have any vector values
+   */
+  public static VectorValues getVectorValues(LeafReader reader, String field) 
throws IOException {
+VectorValues values = reader.getVectorValues(field);
+if (values == null) {
+  return EMPTY;
+} else if (!reader.getFieldInfos().fieldInfo(field).hasVectorValues()) {

Review Comment:
   this line of code is unreachable. It is actually the null case where you 
want to check if the field exists, and if so, does it have vector values. See 
DocValues as an example.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #11832: Added static factory method for loading VectorValues

2022-09-29 Thread GitBox


gsmiller commented on code in PR #11832:
URL: https://github.com/apache/lucene/pull/11832#discussion_r984025608


##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -2585,7 +2585,7 @@ public static Status.VectorValuesStatus testVectors(
   + "\" has vector values but dimension is "
   + dimension);
 }
-VectorValues values = reader.getVectorValues(fieldInfo.name);
+VectorValues values = VectorValues.getVectorValues(reader, 
fieldInfo.name);
 if (values == null) {

Review Comment:
   We shouldn't need to check `values == null` here anymore right since 
`VectorValues#getVectorValues` never returns null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] danmuzi commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-29 Thread GitBox


danmuzi commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r984184997


##
lucene/CHANGES.txt:
##
@@ -98,6 +98,11 @@ API Changes
 
 * GITHUB#11804: FacetsCollector#collect is no longer final, allowing 
extension. (Greg Miller)
 
+New Features
+-
+* GITHUB#11795: Add ByteWritesTrackingDirectoryWrapper to expose metrics for 
bytes merged, flushed, and overall
+write amplification factor. (Marc D'Mello)

Review Comment:
   Usually, when a new line is needed for a description in CHANGES.txt, two 
whitespaces are added in front.
   (see other multi-line description)
   
   write amp... -> {space}{space}write amp...



##
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  public final boolean trackTempOutput;
+
+  public ByteWritesTrackingDirectoryWrapper(Directory in) {
+this(in, false);
+  }
+  /**
+   * Constructor with option to track tempOutput
+   *
+   * @param in input Directory
+   * @param trackTempOutput if true, will also track temporary outputs created 
by this directory
+   */
+  public ByteWritesTrackingDirectoryWrapper(Directory in, boolean 
trackTempOutput) {
+super(in);
+this.trackTempOutput = trackTempOutput;
+  }
+
+  @Override
+  public IndexOutput createOutput(String name, IOContext ioContext) throws 
IOException {
+IndexOutput output = in.createOutput(name, ioContext);
+return createByteTrackingOutput(output, ioContext.context);
+  }
+
+  @Override
+  public IndexOutput createTempOutput(String prefix, String suffix, IOContext 
ioContext)
+  throws IOException {
+IndexOutput output = in.createTempOutput(prefix, suffix, ioContext);
+return trackTempOutput ? createByteTrackingOutput(output, 
ioContext.context) : output;
+  }
+
+  private IndexOutput createByteTrackingOutput(IndexOutput output, 
IOContext.Context context) {
+switch (context) {
+  case FLUSH:
+return new ByteTrackingIndexOutput(output, flushedBytes);
+  case MERGE:
+return new ByteTrackingIndexOutput(output, mergedBytes);
+  case DEFAULT:
+  case READ:
+  default:

Review Comment:
   DEFAULT and READ can be handled by default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org