nicktelford commented on code in PR #16922:
URL: https://github.com/apache/kafka/pull/16922#discussion_r1735043231


##########
streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java:
##########
@@ -182,6 +206,118 @@ private boolean lockStateDirectory() {
         return stateDirLock != null;
     }
 
+    public void initializeTasksForLocalState(final TopologyMetadata 
topologyMetadata, final StreamsMetricsImpl streamsMetrics) {
+        final List<TaskDirectory> nonEmptyTaskDirectories = 
listNonEmptyTaskDirectories();
+        if (hasPersistentStores && !nonEmptyTaskDirectories.isEmpty()) {
+            final LogContext logContext = new LogContext("main-thread ");
+            final ThreadCache dummyCache = new ThreadCache(logContext, 0, 
streamsMetrics);
+            final boolean eosEnabled = StreamsConfigUtils.eosEnabled(config);
+            final boolean stateUpdaterEnabled = 
StreamsConfig.InternalConfig.stateUpdaterEnabled(config.originals());
+
+            // discover all non-empty task directories in StateDirectory
+            for (final TaskDirectory taskDirectory : nonEmptyTaskDirectories) {
+                final String dirName = taskDirectory.file().getName();
+                final TaskId id = parseTaskDirectoryName(dirName, 
taskDirectory.namedTopology());
+                final ProcessorTopology topology = 
topologyMetadata.buildSubtopology(id);
+                final Set<TopicPartition> inputPartitions = 
topology.sourceTopics().stream().map(topic -> new TopicPartition(topic, 
id.partition())).collect(Collectors.toSet());
+
+                if (topology.hasStateWithChangelogs()) {
+                    final ProcessorStateManager stateManager = new 
ProcessorStateManager(
+                        id,
+                        Task.TaskType.STANDBY,
+                        eosEnabled,
+                        logContext,
+                        this,
+                        null,
+                        topology.storeToChangelogTopic(),
+                        inputPartitions,
+                        stateUpdaterEnabled
+                    );
+
+                    final InternalProcessorContext<Object, Object> context = 
new ProcessorContextImpl(
+                        id,
+                        config,
+                        stateManager,
+                        streamsMetrics,
+                        dummyCache
+                    );
+
+                    final Task task = new StandbyTask(
+                        id,
+                        inputPartitions,
+                        topology,
+                        topologyMetadata.taskConfig(id),
+                        streamsMetrics,
+                        stateManager,
+                        this,
+                        dummyCache,
+                        context
+                    );
+
+                    // initialize and suspend new Tasks
+                    try {
+                        task.initializeIfNeeded();
+                        task.suspend();
+
+                        tasksForLocalState.put(id, task);
+                    } catch (final TaskCorruptedException e) {
+                        // Task is corrupt - wipe it out (under EOS) and don't 
initialize a Standby for it
+                        task.suspend();
+                        task.closeDirty();
+                    }
+                }
+            }
+        }
+    }
+
+    public boolean hasInitialTasks() {
+        return !tasksForLocalState.isEmpty();
+    }
+
+    public Task assignInitialTask(final TaskId taskId) {
+        final Task task = tasksForLocalState.remove(taskId);
+        if (task != null) {
+            lockedTasksToOwner.replace(taskId, Thread.currentThread());
+        }
+        return task;
+    }
+
+    public void closeInitialTasksIfLastAssginedThread() {
+        if (hasInitialTasks() && threadsWithAssignment.incrementAndGet() >= 
numStreamThreads.get()) {

Review Comment:
   > While the KafkaStreams#threads variable is a synchronized list, I am still 
wondering about a race condition?
   > 
   > We call KafkaStreams#start() and the StreamsThreads do stuff in the 
background, and the rebalance step executes, and all threads might call 
closeInitialTasksIfLastAssginedThread concurrently. In parallel, 
addStreamThread() could be called, modifying (increase) the thread count 
breaking this condition? Or would it eventually fix itself, because the new 
StreamsThreads would trigger another rebalance, and thus we would execute this 
again, and would close "dandling tasks" eventually)?
   
   Yeah, I don't love this implementation tbh. The goal is to keep these 
"initial" Tasks open until the end of the first assignment, and then close any 
that weren't assigned to us, as an optimization to avoid open/close/open for 
most tasks. To achieve that, we need some way to know when all the 
StreamThreads that share the same StateDirectory have completed their 
assignment.
   
   I'll have a think if there might be a better way to do this, and look into 
the potential for races here.
   
   > Btw: also wondering about threadsWithAssignment -- if we have more threads 
than tasks, and some threads don't get anything assigned, could this become an 
issue? Or would the rebalance callback get executed in a way such that we call 
closeInitialTasksIfLastAssginedThread even if nothing is assigned to a thread?
   
   My understanding from looking at `TaskManager#handleAssignment` is that 
there's no shortcut, and it would call `closeInitialTasksIfLastAssignedThread` 
regardless of any assigned tasks. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to