[GitHub] [lucene] mikemccand commented on a change in pull request #132: Parallel processing

GitBox Tue, 11 May 2021 08:02:00 -0700


mikemccand commented on a change in pull request #132:
URL: https://github.com/apache/lucene/pull/132#discussion_r630244090




##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +103,33 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    if (threadIndexCreated == false) {
+      createThreadIndex();
+    }
+
+    // Getting file index value which is set for each thread
+    int index = 
Integer.parseInt(Thread.currentThread().getName().substring(12));

Review comment:
       Is `TaskSequence` / `ParallelTask` the only place where new `Thread`s 
are created in benchmarks?
   
   Could you add a comment here pointing to `ParallelTask.java` explaining that 
we named/numbered the threads carefully, and that's why this parsing to `int` 
is safe?

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +103,33 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    if (threadIndexCreated == false) {
+      createThreadIndex();
+    }
+
+    // Getting file index value which is set for each thread
+    int index = 
Integer.parseInt(Thread.currentThread().getName().substring(12));
+    int fIndex = index + threadIndex[index] * threadIndex.length;
+    threadIndex[index]++;
+
+    // Sanity check, if # threads is greater than # input files, wrap index
+    if (index >= inputFilesSize) {
+      index %= inputFilesSize;
     }
 
+    // Check if this thread has exhausted its files
+    if (fIndex >= inputFilesSize) {
+      threadIndex[index] = 0;

Review comment:
       Hmm, in the case where number-of-threads is bigger than 
number-of-input-files, aren't we (always) setting the wrong `index` back to `0` 
here?  Does that matter?  Maybe add a dedicated test case so this new code is 
exercised?

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +103,33 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    if (threadIndexCreated == false) {
+      createThreadIndex();
+    }
+
+    // Getting file index value which is set for each thread
+    int index = 
Integer.parseInt(Thread.currentThread().getName().substring(12));
+    int fIndex = index + threadIndex[index] * threadIndex.length;

Review comment:
       Maybe add `assert index >= 0 && index < threadIndex.length` above this?  
This way if there is some thread naming bug, and assertions are enabled, we hit 
`AssertionError` before `AIOOBE`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mikemccand commented on a change in pull request #132: Parallel processing

Reply via email to