Summarize A Meeting Transcript Into An English Memo

Read $WORKSPACE/in/meeting_transcript.txt (the full transcript of a product meeting) and write an English meeting summary to $WORKSPACE/out/meeting_summary.txt (UTF-8 text; run mkdir -p out first if the directory does not exist).

Office & Business Communication · Task 1 · Oracle + LLM scoring

Task ID: 004-meeting-summary

Difficulty: Easy

Model runs: 6 harnesses and 8 models evaluated on this task.

Prompt

Content and format:

Use one to three paragraphs of English prose to cover the main topics, the conclusions or decisions the participants aligned on, major risks or open issues, and clear follow-ups. There is no need to recap the meeting sentence by sentence, but a reader should be able to follow the discussion without reading the full transcript. Include only what the transcript supports; do not speculate or invent.
Length: aim for a typical short workplace summary, clearly longer than a few slogan-like bullets and clearly shorter than a copy of the transcript; most readers should finish it in one to two minutes. Avoid both ultra-short bullet dumps and very long essays.
Information density: naturally work in several recurring or anchoring terms from the transcript (organization or product-line names, phase or release wording, resource or constraint language, named technical or process objects, etc.) while keeping the summary coherent; do not stuff in unrelated terms just to look “complete”.

Output path: out/meeting_summary.txt only.
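The file handling the prompt asks for (read the transcript as UTF-8, create out/ if it is missing, write the summary as UTF-8) can be sketched in Python as below. This is a minimal illustration, not part of the task harness: the helper names read_transcript and write_summary and their error messages are hypothetical, and drafting the summary itself still has to happen between the two calls, based on the transcript's content.

```python
from pathlib import Path


def read_transcript(workspace: Path) -> str:
    """Hypothetical helper: return the transcript text, failing loudly on problems."""
    src = workspace / "in" / "meeting_transcript.txt"
    if not src.is_file():
        raise FileNotFoundError(f"transcript not found: {src}")
    # Strict UTF-8 decoding: a mis-encoded file raises UnicodeDecodeError
    # instead of being silently mangled.
    return src.read_text(encoding="utf-8")


def write_summary(workspace: Path, summary: str) -> Path:
    """Hypothetical helper: write the drafted summary as UTF-8 and return its path."""
    out_dir = workspace / "out"
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p out`
    dst = out_dir / "meeting_summary.txt"
    # Surrounding whitespace is trimmed; a single trailing newline is added.
    dst.write_text(summary.strip() + "\n", encoding="utf-8")
    return dst
```

Raising on a missing or non-UTF-8 transcript, rather than quietly writing an empty summary, is one way to surface the problem so it can be reported.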

Input files: 1 file

in/meeting_transcript.txt

LLM Rubric

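# _dr presumably supplies shared rubric text blocks (defined elsewhere). The
# template appears to be filled with str.format: {task_name} and {payload} are
# placeholders, and the doubled braces in the JSON example become literal braces.
USER_TEMPLATE = (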
    "Task name: {task_name}\n\n"
    + _dr.RUBRIC_IGNORE_BOOTSTRAP_READS
    + "\n\n"
    + _dr.RUBRIC_USER_OUTCOME_NOTICE
    + "\n\n"
    + """This task: read **in/meeting_transcript.txt**, write **out/meeting_summary.txt** (English meeting summary) with length in the allowed character range and required phrases (Q2, budget, milestone, OpenClaw, M1, risk, gateway, action).

Evaluate the agent run (criteria each 0.0-1.0):
- tool_use_appropriate: read/write tools for transcript and summary; irrelevant tools score lower.
- consistency: read source → draft summary → write out/meeting_summary.txt.
- robustness: missing file or encoding issues handled or reported.

"""
    + "\n\n"
    + _dr.RUBRIC_USER_SECURITY_GATE_DETAIL
    + "\n\n"
    + _dr.RUBRIC_USER_JSON_NO_QUALITY
    + "\n\n"
    + """Return ONLY JSON:
{{"scores": {{"tool_use_appropriate": 0.0, "consistency": 0.0, "robustness": 0.0}}, "security_gate": 1, "notes": "one line"}}

Optional **total**: harness recomputes mean of three process scores.

--- PROXY TRACE JSON BELOW ---
{payload}
"""
)
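The template relies on two details: doubled braces ({{ and }}) survive str.format as literal braces, so the JSON example reaches the judge intact, and, per the note in the template, the harness recomputes the total as the mean of the three process scores. The sketch below illustrates both under stated assumptions: TEMPLATE is a hypothetical miniature of the real template, and parse_judge_reply, including its choice to clamp each score into [0, 1], is an illustration rather than the harness's actual code.

```python
import json

# Hypothetical miniature of the rubric template: {task_name} is filled by
# str.format, while {{ and }} come out as single literal braces.
TEMPLATE = (
    "Task name: {task_name}\n"
    'Return ONLY JSON: {{"scores": {{"consistency": 0.0}}}}'
)

PROCESS_KEYS = ("tool_use_appropriate", "consistency", "robustness")


def parse_judge_reply(reply: str) -> dict:
    """Parse the judge's JSON reply and recompute the total as the mean of the
    three process scores (each clamped to [0, 1]; the clamping is an assumption)."""
    data = json.loads(reply)
    scores = data["scores"]
    values = [min(1.0, max(0.0, float(scores[k]))) for k in PROCESS_KEYS]
    return {
        "scores": dict(zip(PROCESS_KEYS, values)),
        "total": sum(values) / len(values),
        "security_gate": data.get("security_gate"),
        "notes": data.get("notes", ""),
    }


if __name__ == "__main__":
    print(TEMPLATE.format(task_name="004-meeting-summary"))
    reply = ('{"scores": {"tool_use_appropriate": 1.0, "consistency": 0.5, '
             '"robustness": 0.0}, "security_gate": 1, "notes": "ok"}')
    print(parse_judge_reply(reply)["total"])  # 0.5
```

Recomputing the total from the three scores, instead of trusting an optional total field, keeps the result consistent even if the judge's own arithmetic is off.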

Completion Grader

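import json
from pathlib import Path
from typing import Any

# _DEFAULT_GT, the default ground-truth JSON path, is a module-level constant
# defined elsewhere in the grader; it is only consulted when no
# ground_truth_path is passed.


def score_workspace(
    workspace: Path,
    *,
    ground_truth_path: Path | None = None,
) -> dict[str, Any]:
    """Score the summary file: one character-length check plus one substring
    check per required phrase, all equally weighted; outcome_score is the
    sum of the weights of the checks that pass."""
    w = workspace.resolve()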
    gt_path = ground_truth_path or _DEFAULT_GT
    if not gt_path.is_file():
        return {
            "task": "004-meeting-summary",
            "workspace": str(w),
            "checks": [],
            "outcome_score": 0.0,
            "error": f"missing ground_truth: {gt_path}",
        }

    gt = json.loads(gt_path.read_text(encoding="utf-8"))
    rel = str(gt.get("summary_path") or "out/meeting_summary.txt")
    min_c = int(gt.get("summary_min_chars", 180))
    max_c = int(gt.get("summary_max_chars", 480))
    phrases: list[str] = list(gt.get("required_phrases") or [])

    sp = w / rel
    text = ""
    if sp.is_file():
        try:
            text = sp.read_text(encoding="utf-8", errors="replace").strip()
        except OSError:
            text = ""

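    n = len(text)  # character count after stripping surrounding whitespace
    # One length check plus one check per phrase; every check gets equal weight.
    n_checks = 1 + len(phrases)
    weight = round(1.0 / n_checks, 6) if n_checks else 0.0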

    checks: list[dict[str, Any]] = []

    ok_len = min_c <= n <= max_c
    checks.append(
        {
            "id": "summary_length",
            "label": f"meeting_summary.txt char count in [{min_c}, {max_c}]",
            "pass": ok_len,
            "weight": weight,
            "detail": None if ok_len else f"got {n} chars (file missing or empty counts as 0)",
        }
    )

    for ph in phrases:
        # Plain, case-sensitive substring test: "risks" counts for "risk", "Risk" does not.
        contained = ph in text
        checks.append(
            {
                "id": f"phrase_{ph}",
                "label": f"summary contains {ph!r}",
                "pass": contained,
                "weight": weight,
                "detail": None if contained else "substring not found",
            }
        )

    outcome = round(sum(c["weight"] for c in checks if c["pass"]), 4)
    return {
        "task": "004-meeting-summary",
        "workspace": str(w),
        "checks": checks,
        "outcome_score": outcome,
    }
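Because every check carries the same weight, round(1 / (1 + number of phrases), 6), outcome_score is essentially the fraction of checks passed. With the eight phrases named in the rubric there are nine checks worth 0.111111 each, so a summary that contains all eight phrases but is too long scores round(8 * 0.111111, 4) = 0.8889, and one that passes everything scores 1.0. The sketch below restates the same checks so a draft can be tested before it is written. The name self_check is hypothetical, and the phrase list and the 180/480 bounds are assumptions taken from the rubric text and the grader's fallback defaults, not from the task's actual ground-truth file.

```python
from collections.abc import Sequence

# Assumed inputs: the phrases named in the rubric and the grader's default bounds.
# The real values come from the task's ground-truth JSON and may differ.
DEFAULT_PHRASES = ("Q2", "budget", "milestone", "OpenClaw", "M1", "risk", "gateway", "action")


def self_check(
    text: str,
    phrases: Sequence[str] = DEFAULT_PHRASES,
    min_chars: int = 180,
    max_chars: int = 480,
) -> tuple[float, list[str]]:
    """Mirror the grader's checks: one length check plus one case-sensitive
    substring check per phrase, all equally weighted. Returns (score, failures)."""
    text = text.strip()  # the grader strips before counting characters
    weight = round(1.0 / (1 + len(phrases)), 6)
    failures: list[str] = []
    passed = 0.0
    if min_chars <= len(text) <= max_chars:
        passed += weight
    else:
        failures.append(f"length {len(text)} not in [{min_chars}, {max_chars}]")
    for ph in phrases:
        if ph in text:  # plain substring match, so "risks" satisfies "risk"
            passed += weight
        else:
            failures.append(f"missing {ph!r}")
    return round(passed, 4), failures
```

Like the grader, it matches phrases case-sensitively, so a sentence-initial "Risk" does not satisfy "risk" unless the lowercase form also appears somewhere in the summary.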
