bobbai00 opened a new issue, #5267:
URL: https://github.com/apache/texera/issues/5267
### Task Summary
Each agent in the agent service is a stateful in-memory object with no
durable backing store.
In `agent-service/src/server.ts` the entire fleet lives in a process-local
map:
```
const agentStore = new Map<string, TexeraAgent>();
let agentCounter = 0;
```
All agent state — the ReAct step tree, current HEAD/checkout pointer,
per-agent settings, delegate config, and the cached operator-result state — is
held in `TexeraAgent` instance fields. The only thing persisted today is the
**workflow content**, which is pushed back to the dashboard backend
(`persistWorkflow`, debounced 500ms). The agent and its conversation/step
history are not.
Consequences:
```
Before: restart / crash / redeploy -> agentStore is empty -> all agents
+ history lost
second replica -> separate agentStore -> agent not
found
After: restart / crash / redeploy -> agents rehydrated from store
any replica -> shared store -> agent
reachable
```
This also blocks horizontal scaling: a request routed to a replica that
doesn't hold the agent gets `Agent not found`.
Introduce a persistence layer (e.g. database or Redis) for agent records and
their step history, with load-on-demand / rehydration on startup, so agents
survive restarts and can be served by more than one replica.
### Task Type
- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [ ] Testing / QA
- [ ] Documentation
- [ ] Performance
- [x] Other
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]