AI Agent Readiness Scorecard

Is this task ready to hand to an AI agent — or not?
Score five dimensions and find out in under 5 minutes.

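Rate each of the five dimensions (Process Stability, Frequency, Judgment Required, Failure Tolerance, and Data Quality) from 1 to 5, then add the ratings for a total out of 25.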

What Your Score Means

Each band has a specific next step. Use this to decide what to do — not just what your score is.

5 – 10 NOT READY

Don't build the agent yet. One or more dimensions are too weak — automating now will accelerate the problem, not solve it. Identify your lowest-scoring dimension and fix it first. Usually it's process stability (the task isn't well defined yet) or data quality (the inputs can't be trusted). Fix that, rescore, and only proceed when you're out of this band.

11 – 16 FIX FIRST, THEN BUILD

You're close but something needs work. Find the dimension pulling your score down and address it specifically before building. In most cases this means documenting the process properly, cleaning a data source, or defining what the escalation path looks like for edge cases. This is typically a 2–4 week fix, not a project. Rescore once addressed.

17 – 22 BUILD WITH CAUTION

This task is mostly ready but has at least one dimension that needs monitoring. Proceed with a narrow scope — automate the core steps only. Build in a human review checkpoint for the first 60 days. Log every exception the agent can't handle and review weekly. Only expand scope once the exception rate is consistently below 5%.
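
One way to make "consistently below 5%" checkable is to compute the exception rate at each weekly review and require several consecutive weeks under the threshold. The sketch below rests on assumptions: the four-week window, the function names, and the `(handled, exceptions)` log shape are illustrative choices, not part of the scorecard.

```python
def exception_rate(handled: int, exceptions: int) -> float:
    """Share of tasks the agent could not handle in one review period."""
    total = handled + exceptions
    return exceptions / total if total else 0.0


def ready_to_expand(weekly_counts, threshold=0.05, weeks_required=4):
    """True if the most recent `weeks_required` weeks are all below threshold.

    weekly_counts: list of (handled, exceptions) tuples, oldest first.
    weeks_required=4 is an assumed reading of "consistently".
    """
    recent = weekly_counts[-weeks_required:]
    if len(recent) < weeks_required:
        return False
    return all(exception_rate(h, e) < threshold for h, e in recent)


# Hypothetical log: (handled, exceptions) per week, oldest first.
log = [(180, 20), (190, 12), (195, 8), (196, 6), (198, 4), (197, 5)]
print(ready_to_expand(log))  # True
```

A week that sits exactly at 5% does not pass, since the text asks for a rate below 5%.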

23 – 25 BUILD NOW

This task is ready. Start narrow — automate the core steps, run it supervised for 30 days, and review outputs daily. Only expand the agent's scope once the error rate has earned it. The biggest mistake at this stage is moving too fast. Reliability first, then scale.
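
The four bands above amount to a simple lookup on the total. A minimal Python sketch, assuming five whole-number ratings of 1 to 5 each; the name `band_for_total` is hypothetical and not part of the scorecard.

```python
# Band thresholds taken from the ranges above: (lowest total, highest total, band).
BANDS = [
    (5, 10, "NOT READY"),
    (11, 16, "FIX FIRST, THEN BUILD"),
    (17, 22, "BUILD WITH CAUTION"),
    (23, 25, "BUILD NOW"),
]


def band_for_total(total: int) -> str:
    """Return the outcome band for a total score between 5 and 25."""
    for low, high, band in BANDS:
        if low <= total <= high:
            return band
    raise ValueError(f"total must be between 5 and 25, got {total}")


print(band_for_total(13))  # FIX FIRST, THEN BUILD
```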

What Each Score Means

Use this to calibrate how you rate each dimension.

Score | Label | What it signals
1 | Not Ready | This dimension is a blocker. Do not automate until resolved.
2 | Weak | Significant risk. Needs active remediation before building.
3 | Borderline | Manageable but requires monitoring. Proceed carefully.
4 | Strong | This dimension supports automation. Minor watch items only.
5 | Ready | No concerns. This dimension is fully prepared for agent deployment.
Honest scoring matters more than a high score. A task that scores 12 and gets fixed is more valuable than a task that scores 22 and fails silently after go-live.
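
Because the next step in the lower bands depends on the weakest dimension, it can help to compute it alongside the total. A rough sketch, assuming ratings are kept in a dictionary keyed by dimension name; `summarise` and the sample ratings are hypothetical, for illustration only.

```python
# Labels from the calibration table above.
LABELS = {1: "Not Ready", 2: "Weak", 3: "Borderline", 4: "Strong", 5: "Ready"}


def summarise(scores: dict[str, int]) -> tuple[int, list[str]]:
    """Return the total and the lowest-scoring dimension(s) with their labels."""
    for name, score in scores.items():
        if score not in LABELS:
            raise ValueError(f"{name}: score must be 1-5, got {score}")
    lowest = min(scores.values())
    weakest = [f"{name} ({lowest}, {LABELS[lowest]})"
               for name, score in scores.items() if score == lowest]
    return sum(scores.values()), weakest


# Hypothetical ratings, not taken from the worked examples.
total, weakest = summarise({
    "Process Stability": 4,
    "Frequency": 5,
    "Judgment Required": 3,
    "Failure Tolerance": 4,
    "Data Quality": 2,
})
print(total, weakest)  # 18 ['Data Quality (2, Weak)']
```

Ties are reported together, so a task with two equally weak dimensions surfaces both.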

Three complete examples showing how to score a task, each landing in a different outcome band. Use these to calibrate your own scoring before you start.

Example 1 — Vendor Invoice Processing

BUILD WITH CAUTION

A finance team receives 200+ vendor invoices per week. The process is fully documented: invoices arrive by email, are logged in the ERP, matched against POs, and routed for approval if under $10,000. The rules haven't changed in 18 months.

Process Stability | 5 | Fully documented. Same rules for 18 months. No exceptions.
Frequency | 5 | 200+ invoices per week (daily volume).
Judgment Required | 4 | 90%+ follow clear rules. Only unusual disputes need humans.
Failure Tolerance | 3 | Errors catchable in approval step before payment.
Data Quality | 4 | ERP data is clean. Occasional format issues from new vendors.
Total: 21 / 25 (BUILD WITH CAUTION, since 21 falls in the 17 – 22 band)
What to do next: Start narrow — automate the standard flow first (invoices under $10K matching existing POs). Run supervised for 30 days. Expand to exception handling only after the core flow is stable.

Example 2 — Sales Proposal Generation

FIX FIRST, THEN BUILD

A sales team generates 30–40 proposals per month. There's a rough template but reps customise it significantly. Client data lives in the CRM but entries are inconsistent — some fields blank, company sizes misclassified. Proposals require judgment about which case studies to include.

Process Stability | 3 | Template exists but high variability in execution.
Frequency | 3 | Weekly volume: worth automating but not urgent.
Judgment Required | 2 | Heavy judgment on tone, case studies, and positioning.
Failure Tolerance | 3 | A bad proposal loses a deal: moderate risk.
Data Quality | 2 | CRM is inconsistent. Significant cleaning needed.
Total: 13 / 25 (FIX FIRST, THEN BUILD)
What to do next: Two things to fix before building: (1) Clean and standardise CRM fields — especially company size, industry, and deal stage. (2) Create a decision framework for case study selection that removes judgment from the process. Rescore in 3–4 weeks.

Example 3 — Executive Escalation Routing

NOT READY

A COO receives escalations from multiple departments. There are no written criteria for what gets escalated to the COO vs. handled at director level. Every escalation is different. A wrong routing decision delays critical decisions or creates political issues across teams.

Process Stability | 1 | No documented criteria. Every case treated individually.
Frequency | 3 | Several escalations per week.
Judgment Required | 1 | Entirely context-dependent. Reading stakeholder dynamics is essential.
Failure Tolerance | 1 | Wrong routing causes real damage: delayed decisions, trust issues.
Data Quality | 2 | Escalations come via email, Slack, and verbal channels, with no structured input.
Total: 8 / 25 (NOT READY)
What to do next: Do not automate this. The task isn't well defined enough for a human, let alone an agent. Start by documenting escalation criteria with the COO. Once there's a written decision framework that a new hire could follow, rescore — it may eventually reach 'fix first' territory.