mikemccand commented on a change in pull request #132:
URL: https://github.com/apache/lucene/pull/132#discussion_r645598176



##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/TaskSequence.java
##########
@@ -340,12 +340,17 @@ private int doParallelTasks() throws Exception {
 
     initTasksArray();
     ParallelTask t[] = runningParallelTasks = new ParallelTask[repetitions * 
tasks.size()];
+    //Get number of parallel threads from algo file and set it to use in 
ReuersContentSource.java's docCountArrInit()
+    this.getRunData().getConfig().setNumThreads(t.length);
     // prepare threads
     int index = 0;
     for (int k = 0; k < repetitions; k++) {
       for (int i = 0; i < tasksArray.length; i++) {
         final PerfTask task = tasksArray[i].clone();
-        t[index++] = new ParallelTask(task);
+        t[index] = new ParallelTask(task);
+        //Setting unique ThreadName with index value which is used in 
ReuersContentSource.java's getNextDocData()

Review comment:
       Can you strengthen the comment to state that we should NOT change this 
thread name, unless we also fix the String -> int parsing logic in 
`ReutersContentSource`?
   
   Actually, could we factor out this string part of the thread name into a 
`static final String` constant, e.g.`static final String 
PARALLEL_TASK_THREAD_NAME_PREFIX = "ParallelTaskThread";`, and reference that 
constant from both places?

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/TaskSequence.java
##########
@@ -340,12 +340,17 @@ private int doParallelTasks() throws Exception {
 
     initTasksArray();
     ParallelTask t[] = runningParallelTasks = new ParallelTask[repetitions * 
tasks.size()];
+    //Get number of parallel threads from algo file and set it to use in 
ReuersContentSource.java's docCountArrInit()
+    this.getRunData().getConfig().setNumThreads(t.length);
     // prepare threads
     int index = 0;
     for (int k = 0; k < repetitions; k++) {
       for (int i = 0; i < tasksArray.length; i++) {
         final PerfTask task = tasksArray[i].clone();
-        t[index++] = new ParallelTask(task);
+        t[index] = new ParallelTask(task);
+        //Setting unique ThreadName with index value which is used in 
ReuersContentSource.java's getNextDocData()
+        t[index].setName("IndexThread-" + index);

Review comment:
       In general, parallel tasks might be running queries too right?   Maybe 
we should pick a more generic name?  Maybe `ParallelTaskThread-N`?

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -100,21 +100,24 @@ public void close() throws IOException {
 
   @Override
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
-    Path f = null;
-    String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    if (docCountArrCreated == false) {
+      docCountArrInit();
     }
 
+    //Extract ThreadIndex from unique ThreadName (at position 12), which is 
set with '"IndexThread-"+index', in TaskSequence.java's doParallelTasks()
+    int threadIndex = 
Integer.parseInt(Thread.currentThread().getName().substring(12));
+    assert (threadIndex >= 0 && threadIndex < docCountArr.length):"Please 
check threadIndex or docCountArr length";
+    int stride = threadIndex + docCountArr[threadIndex] * docCountArr.length;
+    int inFileSize = inputFiles.size();
+
+    //Modulo Operator covers all three possible senarios i.e. 1. If 
inputFiles.size() < Num Of Threads 2.inputFiles.size() == Num Of Threads 
3.inputFiles.size() > Num Of Threads
+    int fileIndex = stride % inFileSize;

Review comment:
       Hmm do we already guard for the (degenerate) case of `inFileSize == 0`?  
If not can we add some protection here, e.g. maybe throw a clear exception that 
there is nothing to index?

##########
File path: 
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -100,21 +100,24 @@ public void close() throws IOException {
 
   @Override
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, 
IOException {
-    Path f = null;
-    String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    if (docCountArrCreated == false) {
+      docCountArrInit();
     }
 
+    //Extract ThreadIndex from unique ThreadName (at position 12), which is 
set with '"IndexThread-"+index', in TaskSequence.java's doParallelTasks()
+    int threadIndex = 
Integer.parseInt(Thread.currentThread().getName().substring(12));
+    assert (threadIndex >= 0 && threadIndex < docCountArr.length):"Please 
check threadIndex or docCountArr length";
+    int stride = threadIndex + docCountArr[threadIndex] * docCountArr.length;
+    int inFileSize = inputFiles.size();
+
+    //Modulo Operator covers all three possible senarios i.e. 1. If 
inputFiles.size() < Num Of Threads 2.inputFiles.size() == Num Of Threads 
3.inputFiles.size() > Num Of Threads
+    int fileIndex = stride % inFileSize;
+    int iteration = stride / inFileSize;

Review comment:
       Thank you for improving this logic -- much easier to understand now!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to