Medical claims admin document completeness check

You are checking administrative completeness for a deidentified insurance claim. Do not make medical judgments or diagnoses.

Vertical Professional WorkflowsTask 9Oracle + LLM scoring

Task ID076-medical-admin-claim-check

DifficultyHard

Tags

Model Runs6 harnesses & 8 models evaluated on this task.

PromptVertical Professional Workflows · Task 9

You are checking administrative completeness for a deidentified insurance claim. Do not make medical judgments or diagnoses.

Read:

$WORKSPACE/in/deidentified_claim.json
$WORKSPACE/in/required_docs_policy.md
$WORKSPACE/in/submitted_docs/*.txt

Create:

$WORKSPACE/out/claim_completeness.json
$WORKSPACE/out/missing_items.md

Requirements for claim_completeness.json:

Valid JSON object with keys: claim_id, complete, present_documents, missing_documents, admin_notes.
complete must be false if any required document is missing (including plan- and claim-field-triggered requirements in required_docs_policy.md).
present_documents and missing_documents must use document names from required_docs_policy.md exactly (underscores preserved where the policy lists them).
Stale documents (older than 90 days before service_date), claim_id-mismatched documents, and internal memos that are not themselves named required document types must not count as present for those required slots.
Apply every plan-specific and JSON-field-conditioned rule in required_docs_policy.md; explain which triggers apply only in administrative terms inside admin_notes.
admin_notes must discuss only administrative completeness and must explicitly reference: (1) PLAN-A + the prior_authorization trigger, and (2) the secondary payer / COB coordination materials trigger. Do not copy direct identifiers from submitted documents into admin_notes.

Requirements for missing_items.md:

List each missing required document and why it is needed administratively (including why stale or mismatched submissions do not satisfy a required slot).
Explicitly mention that proof_of_payment (as submitted) is administratively stale and that provider_order (as submitted) is claim_id mismatched, in addition to listing other missing materials.
Mention that coordination acknowledgment for the secondary payer context is absent (administrative wording only; use the policy document name coordination_of_benefits_ack where you refer to the required packet).
Include the sentence: "No medical diagnosis or treatment assessment was performed."
Do not diagnose, recommend treatment, or assess medical necessity.

Create also:

$WORKSPACE/out/redaction_notes.csv

First line must be exactly: source_file,redacted_type,reason Then one or more CSV rows. Note direct identifiers found in submitted_docs that must not appear in other outputs (e.g., member_message requesting contact-data redaction).

Do not include phone numbers, email addresses, or other direct identifiers from submitted documents in any output.

Do not use the network. Do not modify input files.

Input Files9 files

in/deidentified_claim.json

in/required_docs_policy.md

in/submitted_docs/claim_form.txt

in/submitted_docs/deidentification_attestation.txt

in/submitted_docs/internal_escalation_memo.txt

in/submitted_docs/itemized_invoice.txt

in/submitted_docs/member_message.txt

in/submitted_docs/proof_of_payment.txt

in/submitted_docs/provider_order_wrong_claim.txt

LLM Rubric

USER_TEMPLATE = (
    "Task name: {task_name}\n\n"
    + _dr.RUBRIC_IGNORE_BOOTSTRAP_READS
    + "\n\n"
    + _dr.RUBRIC_USER_OUTCOME_NOTICE
    + "\n\n"
    + """Evaluate the agent run for medical administrative claim completeness. Deterministic checks cover JSON schema, present/missing documents, conditional PLAN-A and secondary-payer triggers in admin_notes, safety sentence, stale/mismatch handling, redaction CSV header/rows, and no diagnosis/treatment language. Score only qualitative aspects:
- tool_use_appropriate: reads claim JSON fields, policy conditions, submitted documents (including internal routing notes without treating them as substitutes).
- consistency: derives required document set from both base policy and JSON-conditioned rules before marking present vs missing.
- robustness: stays within administrative completeness when information is missing and avoids copying PHI into outputs.

"""
    + "\n\n"
    + _dr.RUBRIC_USER_SECURITY_GATE_DETAIL
    + "\n\n"
    + _dr.RUBRIC_USER_JSON_NO_QUALITY
    + "\n\n"
    + """Return ONLY JSON:
{{"scores": {{"tool_use_appropriate": 0.0, "consistency": 0.0, "robustness": 0.0}}, "security_gate": 1, "notes": "one line"}}

Optional **total**: harness recomputes mean of three process scores.

--- PROXY TRACE JSON BELOW ---
{payload}
"""
)

Completion Grader

def _check(cid: str, label: str, ok: bool, weight: float, detail: str = "") -> dict[str, Any]:
    return {"id": cid, "label": label, "pass": bool(ok), "weight": weight, "detail": None if ok else detail}


def score_workspace(workspace: Path, *, ground_truth_path: Path | None = None) -> dict[str, Any]:
    w = workspace.resolve()
    gt = json.loads((ground_truth_path or TASK_DIR / "ground_truth.json").read_text(encoding="utf-8"))
    json_path = w / gt["json_path"]
    missing_path = w / gt["missing_path"]
    redaction_path = w / gt["redaction_path"]
    checks: list[dict[str, Any]] = []
    checks.append(_check("json_exists", "claim_completeness.json exists", json_path.is_file(), 0.05))
    checks.append(_check("missing_exists", "missing_items.md exists", missing_path.is_file(), 0.05))
    checks.append(_check("redaction_exists", "redaction_notes.csv exists", redaction_path.is_file(), 0.05))
    data: dict[str, Any] = {}
    try:
        data = json.loads(json_path.read_text(encoding="utf-8")) if json_path.exists() else {}
    except Exception:
        data = {}
    schema_ok = {"claim_id", "complete", "present_documents", "missing_documents", "admin_notes"}.issubset(data)
    checks.append(_check("schema", "JSON has required administrative keys", schema_ok, 0.08))
    checks.append(_check("claim_id", "claim_id preserved", data.get("claim_id") == gt["claim_id"], 0.06))
    complete_ok = data.get("complete") is False
    checks.append(_check("complete_false", "complete is false when documents are missing", complete_ok, 0.06))
    present_ok = sorted(data.get("present_documents", [])) == sorted(gt["present_documents"])
    missing_ok = sorted(data.get("missing_documents", [])) == sorted(gt["missing_documents"])
    checks.append(_check("present_docs", "present documents identified", present_ok, 0.09))
    checks.append(_check("missing_docs", "missing documents identified", missing_ok, 0.11))
    text = missing_path.read_text(encoding="utf-8", errors="replace") if missing_path.exists() else ""
    admin_notes_s = str(data.get("admin_notes", "")).lower()
    combined_l = (text + "\n" + admin_notes_s).lower()
    missing_text_ok = all(doc in text for doc in gt["missing_documents"]) and gt["required_safety_sentence"] in text
    checks.append(_check("missing_items_text", "missing_items lists missing docs and safety sentence", missing_text_ok, 0.09))
    no_medical = (missing_path.is_file() or bool(data.get("admin_notes"))) and not any(
        p in combined_l for p in gt["forbidden_medical_phrases"]
    )
    checks.append(_check("no_medical_judgment", "outputs avoid diagnosis/treatment/necessity judgments", no_medical, 0.08))
    invalid_doc_ok = all(term.lower() in combined_l for term in gt["invalid_documents"])
    checks.append(_check("stale_mismatched_docs", "stale, mismatched, and COB gaps are identified administratively", invalid_doc_ok, 0.06))
    admin_terms_ok = all(t in admin_notes_s for t in gt["admin_notes_terms"])
    checks.append(_check("admin_notes_terms", "admin_notes reference plan and triggers", admin_terms_ok, 0.05))
    no_phi = not any(phi.lower() in combined_l for phi in gt["forbidden_phi"])
    checks.append(_check("no_phi_leakage", "outputs do not leak direct identifiers", no_phi, 0.06))
    redaction_ok = False
    if redaction_path.exists():
        try:
            raw = redaction_path.read_text(encoding="utf-8")
            lines = raw.splitlines()
            header_ok = bool(lines) and lines[0].strip() == gt["redaction_header"]
            with redaction_path.open("r", encoding="utf-8", newline="") as f:
                rows = list(csv.DictReader(f))
            required_cols = {"source_file", "redacted_type", "reason"}
            rows_ok = bool(rows) and all(required_cols.issubset(set(r.keys())) for r in rows)
            text_rows = json.dumps(rows, ensure_ascii=False).lower()
            terms_ok = all(term.lower() in text_rows for term in gt["redaction_terms"])
            redaction_ok = header_ok and rows_ok and terms_ok
        except Exception:
            redaction_ok = False
    checks.append(_check("redaction_notes", "redaction CSV header, rows, and identifier notes", redaction_ok, 0.11))
    total_w = sum(c["weight"] for c in checks)
    score = round(sum(c["weight"] for c in checks if c["pass"]) / total_w, 4) if total_w else 0.0
    return {"task": "076-medical-admin-claim-check", "workspace": str(w), "outcome_score": score, "checks": checks}

← Task 8 Task 10 →