Testing and optimization

Quality workflow

Reliable agents are built through repeatable tests, not one good demo call. Cover happy paths, vague callers, missing information, tool timing, routing, and fallback behavior.

Keep changes only when they improve the test set without breaking another scenario.

Testing is how you turn a promising agent into a reliable production workflow. Do not rely on one successful demo conversation; test the situations that customers actually create.

What to test

For every agent, test:

  • common questions,
  • vague questions,
  • missing information,
  • wrong assumptions,
  • tool usage,
  • tool failure,
  • knowledge-base answers,
  • escalation or forwarding,
  • refusal and boundary cases.
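The categories above can be kept as a small, explicit registry so gaps are visible at a glance. A minimal sketch, assuming nothing about your agent platform; the `Scenario` dataclass, category names, and example expectations are all illustrative:

```python
# Illustrative scenario registry: each entry records what is tested and what
# a passing run should look like. Names and categories are assumptions.
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    category: str          # e.g. "common", "vague", "tool_failure"
    user_turns: list[str]  # scripted caller messages
    expectation: str       # what a passing run looks like


SCENARIOS = [
    Scenario("opening hours", "common",
             ["What time do you open?"],
             "answers from the knowledge base"),
    Scenario("vague request", "vague",
             ["I need help with my thing."],
             "asks a clarifying question before acting"),
    Scenario("tool outage", "tool_failure",
             ["Book me for tomorrow at 9."],
             "apologizes and offers a fallback when the booking tool errors"),
]


def coverage(scenarios):
    """Return the set of tested categories, to spot gaps in the checklist."""
    return {s.category for s in scenarios}


print(sorted(coverage(SCENARIOS)))
```

Comparing `coverage(SCENARIOS)` against the full checklist above shows immediately which categories (here: missing information, wrong assumptions, escalation, refusals) still lack a test case.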

Prompt test cases

Prompt tests are reusable conversations or scenarios that help you compare behavior after changes.

Useful test cases include:

Scenario: why it matters

  • Happy path: confirms the main workflow still works.
  • Ambiguous caller: checks clarification behavior.
  • Missing details: checks data collection.
  • Out-of-scope request: checks boundaries.
  • Tool required: checks that the tool is triggered with the right parameters.
  • Tool not allowed: checks that the agent does not act too early.
  • Human handover: checks escalation behavior.
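The "Tool required" and "Tool not allowed" cases can be replayed mechanically. A hedged sketch of such a harness; `fake_agent` is a stand-in for your real agent endpoint, and the tool name is invented for illustration:

```python
# Replay harness sketch: run one scripted caller message against an agent
# callable and record whether a tool was invoked. All names are assumptions.
def fake_agent(message, tools):
    # Stand-in logic: only booking requests should trigger the booking tool.
    if "book" in message.lower():
        tools["book_appointment"](message)
        return "Booked."
    return "How can I help?"


def run_scenario(agent, message, expect_tool_call):
    calls = []
    tools = {"book_appointment": lambda m: calls.append(m)}
    reply = agent(message, tools)
    tool_called = len(calls) > 0
    return {"reply": reply, "tool_called": tool_called,
            "passed": tool_called == expect_tool_call}


# "Tool required": a booking request must trigger the tool.
print(run_scenario(fake_agent, "Please book a table for two.", True)["passed"])
# "Tool not allowed": a plain question must not trigger the tool.
print(run_scenario(fake_agent, "What are your opening hours?", False)["passed"])
```

The same structure extends to the other rows: swap the expectation from "tool called" to "asked a clarifying question" or "offered a handover".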

Optimization workflow

Use a structured workflow:

  1. Identify one problem.
  2. Create or update a test case for it.
  3. Change the prompt, knowledge, parameter, or tool instruction.
  4. Run the same tests again.
  5. Keep the change only if it improves the result without breaking another scenario.
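Step 5 can be made mechanical. A minimal sketch of the accept/reject rule, assuming each variant's results are recorded as a scenario-name-to-pass/fail map; the scenario names are illustrative:

```python
# Accept a change only if it fixes at least one failing scenario
# and does not regress any previously passing one.
def accept_change(baseline: dict, candidate: dict) -> bool:
    improved = any(candidate[k] and not baseline[k] for k in baseline)
    regressed = any(baseline[k] and not candidate[k] for k in baseline)
    return improved and not regressed


baseline = {"happy_path": True, "vague_caller": False, "handover": True}
candidate = {"happy_path": True, "vague_caller": True, "handover": True}
print(accept_change(baseline, candidate))  # fixed vague_caller, no regressions
```

Note that a change which both fixes one scenario and breaks another is rejected; under this rule it should be split into smaller changes and re-tested.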

Comparing variants

When comparing a baseline with an improved variant, look for:

  • correctness,
  • instruction following,
  • concise answers,
  • appropriate tool usage,
  • safe fallback behavior,
  • better caller experience.

Do not choose a variant only because it sounds more polished. It must also be more reliable.
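One way to keep "polished" from outweighing "reliable" is a weighted scorecard over the criteria above. A sketch under stated assumptions: the weights, the 0–2 rating scale, and the example ratings are all invented for illustration:

```python
# Illustrative scorecard: reliability criteria weigh more than polish.
WEIGHTS = {
    "correctness": 3, "instruction_following": 3, "tool_usage": 2,
    "fallback": 2, "conciseness": 1, "caller_experience": 1,
}


def score(ratings: dict) -> int:
    # ratings: criterion -> 0 (fail), 1 (partial), 2 (good)
    return sum(WEIGHTS[c] * r for c, r in ratings.items())


baseline = {"correctness": 2, "instruction_following": 1, "tool_usage": 2,
            "fallback": 1, "conciseness": 2, "caller_experience": 2}
variant = {"correctness": 2, "instruction_following": 2, "tool_usage": 2,
           "fallback": 2, "conciseness": 1, "caller_experience": 2}

# The variant is slightly less concise but follows instructions and falls
# back more reliably, so it still wins under these weights.
print(score(variant) > score(baseline))
```

Tune the weights to your own priorities; the point is that the decision is recorded and repeatable rather than based on which transcript reads better.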

Voice QA

For voice agents, also test the live experience:

  • the greeting cannot be interrupted when interruption is disabled,
  • caller interruptions are handled naturally,
  • the agent asks one question at a time,
  • hold messages played during forwarding are clear,
  • speech speed and accent fit the audience,
  • long answers are not exhausting to listen to.

Release checklist

  • Prompt tests cover the main workflows.
  • Knowledge-base questions were tested.
  • Tools were tested for success and failure.
  • Routing was tested for default and override paths.
  • Voice behavior was tested with real calls when applicable.
  • A rollback plan exists for major changes.