AI Agent Readiness Scorecard

What Your Score Means

Each band has a specific next step. Use it to decide what to do next, not just to read off your score.

5 – 10: NOT READY

Don't build the agent yet. One or more dimensions are too weak — automating now will accelerate the problem, not solve it. Identify your lowest-scoring dimension and fix it first. Usually it's process stability (the task isn't well defined yet) or data quality (the inputs can't be trusted). Fix that, rescore, and only proceed when you're out of this band.

11 – 16: FIX FIRST, THEN BUILD

You're close but something needs work. Find the dimension pulling your score down and address it specifically before building. In most cases this means documenting the process properly, cleaning a data source, or defining what the escalation path looks like for edge cases. This is typically a 2–4 week fix, not a project. Rescore once addressed.

17 – 22: BUILD WITH CAUTION

This task is mostly ready but has at least one dimension that needs monitoring. Proceed with a narrow scope — automate the core steps only. Build in a human review checkpoint for the first 60 days. Log every exception the agent can't handle and review weekly. Only expand scope once the exception rate is consistently below 5%.
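The expansion rule above can be sketched as a small gate over the weekly exception log. This is a minimal illustration, not a prescribed implementation: the `WeeklyReview` record, the `can_expand_scope` helper, and the choice of four consecutive weeks as the meaning of "consistently" are all assumptions, since the guidance only states the 5% threshold and the weekly review.

```python
from dataclasses import dataclass

EXCEPTION_RATE_LIMIT = 0.05  # "below 5%" from the guidance above
REQUIRED_WEEKS = 4           # assumption: the guidance does not define "consistently"


@dataclass
class WeeklyReview:
    """One weekly review of the exception log (hypothetical record shape)."""
    tasks_handled: int  # tasks the agent attempted that week
    exceptions: int     # tasks it could not handle and logged

    @property
    def exception_rate(self) -> float:
        if self.tasks_handled <= 0:
            raise ValueError("no tasks handled this week, so the rate is undefined")
        return self.exceptions / self.tasks_handled


def can_expand_scope(reviews: list[WeeklyReview],
                     required_weeks: int = REQUIRED_WEEKS) -> bool:
    """True only if each of the most recent `required_weeks` reviews is below the limit."""
    if len(reviews) < required_weeks:
        return False  # not enough history to call the rate consistent
    recent = reviews[-required_weeks:]
    # A week with no volume gives no evidence, so it does not count as passing.
    return all(
        week.tasks_handled > 0 and week.exception_rate < EXCEPTION_RATE_LIMIT
        for week in recent
    )


# Hypothetical log: the first week is at 6%, the following four are all under 5%.
history = [
    WeeklyReview(200, 12),
    WeeklyReview(210, 9),
    WeeklyReview(190, 8),
    WeeklyReview(205, 7),
    WeeklyReview(200, 6),
]
print(can_expand_scope(history))
```

Requiring every recent week to pass, rather than averaging, keeps one bad week from being hidden by several good ones. Adjust `REQUIRED_WEEKS` to whatever your team agrees "consistently" should mean.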

23 – 25: BUILD NOW

This task is ready. Start narrow — automate the core steps, run it supervised for 30 days, and review outputs daily. Only expand the agent's scope once the error rate has earned it. The biggest mistake at this stage is moving too fast. Reliability first, then scale.
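The band lookup is simple arithmetic: add the five dimension scores and find the range that contains the total. The sketch below shows one way to do that. The thresholds and band names come from the bands above, while the function names and the sample task are hypothetical.

```python
from typing import Mapping

# Band thresholds from the scorecard: (lowest total, highest total, label).
BANDS = [
    (5, 10, "NOT READY"),
    (11, 16, "FIX FIRST, THEN BUILD"),
    (17, 22, "BUILD WITH CAUTION"),
    (23, 25, "BUILD NOW"),
]


def total_score(scores: Mapping[str, int]) -> int:
    """Sum the five dimension scores; each must be an integer from 1 to 5."""
    if len(scores) != 5:
        raise ValueError("expected exactly five dimension scores")
    for name, value in scores.items():
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name}: score must be an integer from 1 to 5, got {value!r}")
    return sum(scores.values())


def band_for_total(total: int) -> str:
    """Return the outcome band whose range contains the total."""
    for low, high, label in BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"total must be between 5 and 25, got {total}")


# Hypothetical task, scored on the scorecard's five dimensions.
sample_task = {
    "Process Stability": 2,
    "Frequency": 4,
    "Judgment Required": 3,
    "Failure Tolerance": 2,
    "Data Quality": 2,
}
total = total_score(sample_task)
print(total, band_for_total(total))  # prints: 13 FIX FIRST, THEN BUILD
```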

What Each Score Means

Use this to calibrate how you rate each dimension.

Score	Label	What it signals
1	Not Ready	This dimension is a blocker. Do not automate until resolved.
2	Weak	Significant risk. Needs active remediation before building.
3	Borderline	Manageable but requires monitoring. Proceed carefully.
4	Strong	This dimension supports automation. Minor watch items only.
5	Ready	No concerns. This dimension is fully prepared for agent deployment.

Honest scoring matters more than a high score. A task that scores 12 and gets fixed is more valuable than a task that scores 22 and fails silently after go-live.
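Because the guidance is to fix the lowest-scoring dimension first, it can help to label each score and surface the weakest dimensions explicitly. The following is a minimal sketch under that reading. The labels are taken from the table above; the helper names and the sample scores are hypothetical.

```python
# Labels from the per-dimension scale above.
SCORE_LABELS = {1: "Not Ready", 2: "Weak", 3: "Borderline", 4: "Strong", 5: "Ready"}


def describe(scores: dict[str, int]) -> list[str]:
    """Render each dimension as 'name: score (label)'."""
    return [f"{name}: {value} ({SCORE_LABELS[value]})" for name, value in scores.items()]


def weakest_dimensions(scores: dict[str, int]) -> list[str]:
    """Return every dimension tied for the lowest score, in input order."""
    if not scores:
        raise ValueError("no scores given")
    for name, value in scores.items():
        if value not in SCORE_LABELS:
            raise ValueError(f"{name}: score must be 1 to 5, got {value!r}")
    lowest = min(scores.values())
    return [name for name, value in scores.items() if value == lowest]


# Hypothetical scores for a task under review.
scores = {
    "Process Stability": 4,
    "Frequency": 5,
    "Judgment Required": 3,
    "Failure Tolerance": 2,
    "Data Quality": 2,
}
for line in describe(scores):
    print(line)
print("Fix first:", weakest_dimensions(scores))
```

Returning all tied dimensions, rather than a single one, avoids silently hiding a second blocker when two dimensions share the lowest score.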

Three complete examples, each from a different outcome band. Use them to calibrate your scoring.

Example 1 — Vendor Invoice Processing

BUILD WITH CAUTION

A finance team receives 200+ vendor invoices per week. The process is fully documented: invoices arrive by email, are logged in the ERP, are matched against POs, and are routed for approval if under $10,000. The rules have been unchanged for 18 months.

Dimension	Score	Rationale
Process Stability	5	Fully documented. Same rules for 18 months.
Frequency	5	200+ invoices per week.
Judgment Required	4	90%+ follow clear rules.
Failure Tolerance	3	Errors catchable before payment.
Data Quality	4	ERP clean, occasional format issues.

Total: 21 / 25 (BUILD WITH CAUTION)

What to do next: Automate the standard flow first (invoices under $10K matching existing POs). Keep a human review checkpoint for the first 60 days and log every exception. Expand to exception handling only once the core is stable and the exception rate is consistently below 5%.

Example 2 — Sales Proposal Generation

FIX FIRST, THEN BUILD

A sales team generates 30–40 proposals per month using a rough template reps customise significantly. CRM entries are inconsistent. Proposals require judgment on case study selection.

Dimension	Score	Rationale
Process Stability	3	Template exists, high variability.
Frequency	3	Weekly, worth automating.
Judgment Required	2	Heavy judgment on tone and positioning.
Failure Tolerance	3	A bad proposal loses a deal.
Data Quality	2	CRM inconsistent. Cleaning needed.

Total: 13 / 25 (FIX FIRST, THEN BUILD)

What to do next: (1) Clean and standardise CRM fields. (2) Create a decision framework for case study selection that removes judgment from the process. Rescore in 3–4 weeks.

Example 3 — Executive Escalation Routing

NOT READY

A COO receives escalations from multiple departments. No written criteria for what goes to the COO vs. director level. Every escalation is different. Wrong routing delays critical decisions.

Dimension	Score	Rationale
Process Stability	1	No documented criteria.
Frequency	3	Several escalations per week.
Judgment Required	1	Entirely context-dependent.
Failure Tolerance	1	Wrong routing causes real damage.
Data Quality	2	Email, Slack, verbal: all unstructured.

Total: 8 / 25 (NOT READY)

What to do next: Do not automate this. Start by documenting escalation criteria with the COO. Once there's a written decision framework a new hire could follow, rescore.