bobbai00 opened a new issue, #5267:
URL: https://github.com/apache/texera/issues/5267

   ### Task Summary
   
   Each agent in the agent service is a stateful in-memory object with no 
durable backing store.
   
   In `agent-service/src/server.ts` the entire fleet lives in a process-local 
map:
   
   ```
   const agentStore = new Map<string, TexeraAgent>();
   let agentCounter = 0;
   ```
   
   All agent state — the ReAct step tree, current HEAD/checkout pointer, 
per-agent settings, delegate config, and the cached operator-result state — is 
held in `TexeraAgent` instance fields. The only thing persisted today is the 
**workflow content**, which is pushed back to the dashboard backend 
(`persistWorkflow`, debounced 500ms). The agent and its conversation/step 
history are not.
   
   Consequences:
   
   ```
   Before:  restart / crash / redeploy  ->  agentStore is empty  ->  all agents 
+ history lost
            second replica              ->  separate agentStore   ->  agent not 
found
   After:   restart / crash / redeploy  ->  agents rehydrated from store
            any replica                 ->  shared store          ->  agent 
reachable
   ```
   
   This also blocks horizontal scaling: a request routed to a replica that 
doesn't hold the agent gets `Agent not found`.
   
   Introduce a persistence layer (e.g. database or Redis) for agent records and 
their step history, with load-on-demand / rehydration on startup, so agents 
survive restarts and can be served by more than one replica.
   
   ### Task Type
   
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [ ] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [x] Other


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to