Task Domains
Explore Harness Bench Tasks
106 sandboxed offline agent tasks across eight workflow categories, with prompts, fixtures, hooks, oracle graders, LLM rubrics, and model run matrices.
01
Workspace, Tool Use & Multimodal Operations
Files, shell, local web, artifacts
15 tasks
02
Office & Business Communication
Meetings, email, docs, slides
12 tasks
03
Long-running Autonomy & State Adaptation
Memory, interruptions, replanning
11 tasks
04
Software Engineering & Codebase Maintenance
Code repair, tests, CI, migrations
22 tasks
05
Knowledge, Evidence & Retrieval
Offline QA, citations, evidence
13 tasks
06
SRE, DevOps & Release Ops
Incidents, K8s, rollout decisions
7 tasks
07
Data, BI & Finance Analytics
SQL, joins, audits, forecasting
14 tasks
08
Vertical Professional Workflows
Legal, HR, medical, support
12 tasks