[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2345: Benchmark custom

GitBox Thu, 25 Feb 2021 04:14:50 -0800


mikemccand commented on a change in pull request #2345:
URL: https://github.com/apache/lucene-solr/pull/2345#discussion_r581398351




##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {
+     * if (nextFile >= inputFiles.size()) { // exhausted files, start a new 
round, unless forever set to false.
+     * if (!forever) {
+     *    throw new NoMoreDataException();
+     * }
+     * nextFile = 0;
+     * iteration++;
+     * }
+     * f = inputFiles.get(nextFile++);
+     * name = f.toRealPath() + "_" +iteration;
+     * }
+     */
+    if (!threadIndexCreated) {

Review comment:
       `if (threadIndexCreated == false) {` instead (to reduce chance of 
accidental future refactoring bugs)?  This likely won't pass our code style 
checker (`gradle precommit`).

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {
+     * if (nextFile >= inputFiles.size()) { // exhausted files, start a new 
round, unless forever set to false.
+     * if (!forever) {
+     *    throw new NoMoreDataException();
+     * }
+     * nextFile = 0;
+     * iteration++;
+     * }
+     * f = inputFiles.get(nextFile++);
+     * name = f.toRealPath() + "_" +iteration;
+     * }
+     */
+    if (!threadIndexCreated) {
+      createThreadIndex();
+    }
+
+    int index = (int) Thread.currentThread().getId() % threadIndex.length;
+    int fIndex = index + threadIndex[index] * threadIndex.length;
+    threadIndex[index]++;
+
+    // Sanity check, if # threads is greater than # input files, wrap index
+    if (index >= inputFilesSize) index %= inputFilesSize;

Review comment:
       Can you move the `index %= inputFilesSize` to newline and inside `{ ... 
}` body?

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {

Review comment:
       Just delete this old code?  You are replacing it with a more concurrent 
version, yay!

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -146,4 +172,11 @@ public synchronized void resetInputs() throws IOException {
     nextFile = 0;
     iteration = 0;
   }
+
+  private synchronized void createThreadIndex() {
+    if (!threadIndexCreated) {

Review comment:
       `== false` instead?  Or maybe change to `assert threadIndexCreated == 
false` since you also check this up above with a real `if` already?

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {
+     * if (nextFile >= inputFiles.size()) { // exhausted files, start a new 
round, unless forever set to false.
+     * if (!forever) {
+     *    throw new NoMoreDataException();
+     * }
+     * nextFile = 0;
+     * iteration++;
+     * }
+     * f = inputFiles.get(nextFile++);
+     * name = f.toRealPath() + "_" +iteration;
+     * }
+     */
+    if (!threadIndexCreated) {
+      createThreadIndex();
+    }
+
+    int index = (int) Thread.currentThread().getId() % threadIndex.length;
+    int fIndex = index + threadIndex[index] * threadIndex.length;
+    threadIndex[index]++;

Review comment:
       I'm confused how this approach ensures that we will indeed index every 
document in the `inputFiles`?
   
   `Thread.currentThread().getId() % threadIndex.length` is not guaranteed to 
reach every possible int from `0 .. threadIndex.length`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2345: Benchmark custom

Reply via email to