Long-running Autonomy & State Adaptation
Multi-step workflows with memory, interruptions, state changes, and replanning requirements across longer task horizons.
1. Keep A Secret Across Two Benchmark Rounds
2. Autonomous Task Decomposition And Execution
3. Interrupt and Resume a Partially Completed Case Review
4. Three-Day Project State and Plan Maintenance
5. Replan After a Late Event Constraint
6. Cancel a Running Task and Clean Temporary Artifacts
7. Periodic Status Rollup with Asynchronous State Injections
8. Policy Rollout Plan Revision with Diff
9. Asynchronous Operations Update Window Rollup
10. Resume a Partially Failed Batch Without Reprocessing
11. Release Approval Gate with Blockers and Pending Actions