# Agent Instructions Guide

System design principles for production-grade conversational AI

## Introduction

Effective prompting transforms urvo Agents from robotic to lifelike. A system prompt is the personality and policy blueprint of your AI agent. In enterprise use, it tends to be elaborate—defining the agent's role, goals, allowable tools, step-by-step instructions for certain tasks, and guardrails describing what the agent should not do.

**Note:** The system prompt controls conversational behavior and response style, but does not control conversation flow mechanics like turn-taking, or agent settings like which languages an agent can speak. These aspects are handled at the platform level.

## Prompt Engineering Fundamentals

The following principles form the foundation of production-grade prompt engineering:

### Separate Instructions into Clean Sections

Separating instructions into dedicated sections with markdown headings helps the model prioritize and interpret them correctly. Use whitespace and line breaks to separate instructions.

**Why this matters for reliability:** Models are tuned to pay extra attention to certain headings (especially `# Guardrails`), and clear section boundaries prevent instruction bleed, where rules from one context affect another.

**❌ Less Effective:**

You are a customer service agent. Be polite and helpful. Never share sensitive data. You can look up orders and process refunds. Always verify identity first. Keep responses under 3 sentences unless the user asks for details.

**✓ Recommended:**

**# Personality**
You are a customer service agent for Acme Corp. You are polite, efficient, and solution-oriented.

**# Goal**
Help customers resolve issues quickly by looking up orders and processing refunds when appropriate.

**# Guardrails**
Never share sensitive customer data across conversations.

### Be as Concise as Possible

Keep every instruction short, clear, and action-based.
Remove filler words and state only what is essential for the model to act correctly.

**Why this matters for reliability:** Concise instructions reduce ambiguity and token usage. Every unnecessary word is a potential source of misinterpretation.

**❌ Less Effective:**

When you're talking to customers, you should try to be really friendly and approachable, making sure that you're speaking in a way that feels natural and conversational, kind of like how you'd talk to a friend, but still maintaining a professional demeanor.

**✓ Recommended:**

Speak in a friendly, conversational manner while maintaining professionalism.

### Emphasize Critical Instructions

Highlight critical steps by adding "This step is important" at the end of the line. Repeating the most important 1-2 instructions twice in the prompt can help reinforce them.

**Why this matters for reliability:** In complex prompts, models may prioritize recent context over earlier instructions. Emphasis and repetition ensure critical rules aren't overlooked.

**# Goal**
Verify customer identity before accessing their account. **This step is important.**
Look up order details and provide status updates.
Process refund requests when eligible.

**# Guardrails**
Never access account information without verifying customer identity first. **This step is important.**

### Normalize Inputs and Outputs

Voice agents often misinterpret or misformat structured information such as emails, IDs, or record locators. To ensure accuracy, separate (or "normalize") how data is spoken to the user from how it is written when used in tools or APIs.

**Why this matters for reliability:** Text-to-speech models may not pronounce symbols like "@" or "." naturally. Normalizing to spoken format creates natural, understandable speech while maintaining the correct written format for tools.
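
To make the idea concrete, here is a minimal sketch of how an application layer might convert spoken format back to written format before passing values to tools. The function name and token rules are illustrative assumptions, not part of any platform API:

```python
# Map spoken tokens to written characters (illustrative subset).
SPOKEN_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "dot": ".", "at": "@", "dash": "-",
}

def spoken_to_written(spoken: str) -> str:
    """Convert a spoken-format string to written format for tools/APIs."""
    parts = []
    for token in spoken.replace("...", " ").split():
        token_l = token.lower()
        if token_l in SPOKEN_TO_CHAR:
            parts.append(SPOKEN_TO_CHAR[token_l])
        elif len(token) == 1 and token.isalpha():
            parts.append(token.upper())   # spelled-out letters: "A B C"
        else:
            parts.append(token_l)         # whole words: "john", "company"
    return "".join(parts)

print(spoken_to_written("A B C one two three"))                # ABC123
print(spoken_to_written("john dot smith at company dot com"))  # john.smith@company.com
```

A real implementation would need locale handling and disambiguation (e.g., "at" as a word versus "@"), but the separation of concerns is the point: the agent speaks one format, the tools receive another.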

**# Character normalization**
When collecting structured data (emails, phone numbers, confirmation codes):

**Spoken format** (to/from user):
- Email: "john dot smith at company dot com"
- Phone: "five five five... one two three... four five six seven"
- Code: "A B C one two three"

**Written format** (for tools/APIs):
- Email: "john.smith@company.com"
- Phone: "5551234567"
- Code: "ABC123"

### Provide Clear Examples

Include examples in the prompt to illustrate how agents should behave, use tools, or format data. Large language models follow instructions more reliably when they have concrete examples to reference.

**Why this matters for reliability:** Examples reduce ambiguity and provide a reference pattern. They're especially valuable for complex formatting, multi-step processes, and edge cases.

When a customer provides a confirmation code:
1. Listen for the spoken format (e.g., "A B C one two three")
2. Convert to written format (e.g., "ABC123")
3. Pass to the `lookupReservation` tool

**## Examples**
User says: "My code is A... B... C... one... two... three"
You format: "ABC123"

User says: "X Y Z four five six seven eight"
You format: "XYZ45678"

### Dedicate a Guardrails Section

List all non-negotiable rules the model must always follow in a dedicated `# Guardrails` section. Models are tuned to pay extra attention to this heading.

**# Guardrails**
- Never share customer data across conversations or reveal sensitive account information without proper verification.
- Never process refunds over $500 without supervisor approval.
- Never make promises about delivery dates that aren't confirmed in the order system.
- Acknowledge when you don't know an answer instead of guessing.
- If a customer becomes abusive, politely end the conversation and offer to escalate to a supervisor.

## Tool Configuration for Reliability

Agents capable of handling transactional workflows can be highly effective.
To enable this, they must be equipped with tools that let them perform actions in other systems or fetch live data from them. Equally important as prompt structure is how you describe the tools available to your agent.

### Describe Tools Precisely with Detailed Parameters

When creating a tool, add descriptions to all parameters. This helps the LLM construct tool calls accurately.

**Tool description:** "Looks up customer order status by order ID and returns current status, estimated delivery date, and tracking number."

**Parameter descriptions:**
- `order_id` (required): "The unique order identifier, formatted as written characters (e.g., 'ORD123456')"
- `include_history` (optional): "If true, returns full order history including status changes"

### Explain When and How to Use Each Tool

Clearly define in your system prompt when and how each tool should be used. Don't rely solely on tool descriptions—provide usage context and sequencing logic.

**# Tools**

**## getOrderStatus**
Use this tool when a customer asks about their order. Always call this tool before providing order information—never rely on memory or assumptions.

**When to use:**
- Customer asks "Where is my order?"
- Customer provides an order number
- Customer asks about delivery estimates

**How to use:**
1. Collect the order ID from the customer in spoken format
2. Convert to written format using character normalization rules
3. Call `getOrderStatus` with the formatted order ID
4. Present the results to the customer in natural language

**Error handling:** If the tool returns "Order not found", ask the customer to verify the order number and try again.

### Handle Tool Call Failures Gracefully

Tools can sometimes fail due to network issues, missing data, or other errors. Include clear instructions in your system prompt for recovery.

**# Tool error handling**
If any tool call fails or returns an error:
1. Acknowledge the issue to the customer: "I'm having trouble accessing that information right now."
2.
Do not guess or make up information.
3. Offer alternatives:
   - Try the tool again if it might be a temporary issue
   - Offer to escalate to a human agent
   - Provide a callback option
4. If the error persists after 2 attempts, escalate to a supervisor

## Architecture Patterns for Enterprise Agents

While strong prompts and tools form the foundation of agent reliability, production systems require thoughtful architectural design.

### Keep Agents Specialized

Overly broad instructions or large context windows increase latency and reduce accuracy. Each agent should have a narrow, clearly defined knowledge base and set of responsibilities.

**Why this matters for reliability:** Specialized agents have fewer edge cases to handle, clearer success criteria, and faster response times. They're easier to test, debug, and improve.

**Note:** A general-purpose "do everything" agent is harder to maintain and more likely to fail in production than a network of specialized agents with clear handoffs.

### Use Orchestrator and Specialist Patterns

For complex tasks, design multi-agent workflows that hand off tasks between specialized agents—and to human operators when needed.

**Architecture pattern:**
1. **Orchestrator agent:** Routes incoming requests to appropriate specialist agents based on intent classification
2. **Specialist agents:** Handle domain-specific tasks (billing, scheduling, technical support, etc.)
3. **Human escalation:** Defined handoff criteria for complex or sensitive cases

**Benefits:**
- Each specialist has a focused prompt and reduced context
- Easier to update individual specialists without affecting the system
- Clear metrics per domain (billing resolution rate, scheduling success rate, etc.)
- Reduced latency per interaction (smaller prompts, faster inference)

### Define Clear Handoff Criteria

When designing multi-agent workflows, specify exactly when and how control should transfer between agents or to human operators.
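
In application code, an orchestrator's handoff criteria can be made explicit and testable. Below is a minimal keyword-based sketch; the intent names, keywords, and escalation thresholds are illustrative assumptions, and a production orchestrator would typically use an LLM or classifier for intent detection:

```python
# Illustrative keyword routes per specialist (assumed, not a platform API).
ROUTES = {
    "billing": ["payment", "invoice", "refund", "charge", "subscription", "balance"],
    "technical_support": ["error", "bug", "issue", "not working", "broken"],
    "scheduling": ["book", "reschedule", "cancel", "appointment"],
}

ESCALATION_TRIGGERS = ["supervisor", "manager", "speak to a human"]

def route(utterance: str, failed_attempts: int = 0) -> str:
    """Return the specialist (or escalation target) for a customer utterance."""
    text = utterance.lower()
    # Escalate on explicit request or after repeated specialist failures.
    if failed_attempts >= 2 or any(t in text for t in ESCALATION_TRIGGERS):
        return "human_escalation"
    for specialist, keywords in ROUTES.items():
        if any(k in text for k in keywords):
            return specialist
    return "general"  # fall back to a general-purpose agent

print(route("I was charged twice on my invoice"))   # billing
print(route("The app is broken"))                   # technical_support
print(route("This is useless", failed_attempts=2))  # human_escalation
```

Encoding the criteria this explicitly means routing behavior can be unit-tested independently of any model.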

**## Routing logic**
- **Billing specialist:** Customer mentions payment, invoice, refund, charge, subscription, or account balance
- **Technical support specialist:** Customer reports error, bug, issue, not working, broken
- **Scheduling specialist:** Customer wants to book, reschedule, cancel, or check an appointment
- **Human escalation:** Customer is angry, requests a supervisor, or the issue is unresolved after 2 specialist attempts

## Model Selection for Reliability

Selecting the right model depends on your performance requirements—particularly latency, accuracy, and tool-calling reliability.

### Understand the Tradeoffs

| Factor | Consideration |
|---|---|
| Latency | Smaller models (fewer parameters) generally respond faster, suitable for high-frequency, low-complexity interactions |
| Accuracy | Larger models provide stronger reasoning capabilities for complex, multi-step tasks, but with higher latency and cost |
| Tool-calling reliability | Not all models handle tool/function calling with equal precision. Some excel at structured output, while others may require more explicit prompting |

### Model Recommendations by Use Case

#### General Purpose
GPT-4o or similar models for general-purpose enterprise agents where latency, accuracy, and cost must all be balanced. Best for customer support, scheduling, and order management.

#### Ultra-Low Latency
Gemini 2.5 Flash Lite for high-frequency, simple interactions where speed is critical. Best for initial routing, simple FAQs, and basic data collection.

#### Complex Reasoning
Claude Sonnet 4 or 4.5 for multi-step problem-solving and nuanced judgment. Best for technical troubleshooting and compliance-sensitive workflows.

## Iteration and Testing

Reliability in production comes from continuous iteration. Even well-constructed prompts can fail in real use. What matters is learning from those failures and improving through disciplined testing.
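
One way to make that discipline concrete is to keep previously failed conversations as regression cases and re-run them after every prompt change. A minimal sketch, where `run_agent` is a hypothetical stand-in for however you invoke your agent (stubbed here so the example is self-contained):

```python
# Regression cases built from previously failed conversations.
REGRESSION_CASES = [
    {"input": "my email is john dot smith at company dot com",
     "must_contain": "john.smith@company.com"},
    {"input": "I want a refund of $800",
     "must_contain": "supervisor"},
]

def run_agent(user_input: str) -> str:
    """Stub: replace with a real call to your agent."""
    if "$800" in user_input:
        return "Refunds over $500 require supervisor approval."
    return "Got it, I'll use john.smith@company.com."

def run_regression_suite() -> list[str]:
    """Return descriptions of failing cases; an empty list means all pass."""
    failures = []
    for case in REGRESSION_CASES:
        reply = run_agent(case["input"])
        if case["must_contain"] not in reply:
            failures.append(f"FAIL: {case['input']!r} -> {reply!r}")
    return failures

print(run_regression_suite())  # [] when every regression case passes
```

Running this suite before and after each single prompt change makes it possible to attribute improvements or regressions to specific edits.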

### Analyze Failure Patterns

When agents underperform, identify patterns in problematic interactions:

- **Where does the agent provide incorrect information?** → Strengthen instructions in specific sections
- **When does it fail to understand user intent?** → Add examples or simplify language
- **Which user inputs cause it to break character?** → Add guardrails for edge cases
- **Which tools fail most often?** → Improve error handling or parameter descriptions

### Make Targeted Refinements

Update specific sections of your prompt to address identified issues:

1. **Isolate the problem:** Identify which prompt section or tool definition is causing failures
2. **Test changes on specific examples:** Use conversations that previously failed as test cases
3. **Make one change at a time:** Isolate improvements to understand what works
4. **Re-evaluate with the same test cases:** Verify the change fixed the issue without creating new problems

**Warning:** Avoid making multiple prompt changes simultaneously. This makes it impossible to attribute improvements or regressions to specific edits.

## Example Prompts

The following examples demonstrate how to apply the principles outlined in this guide to real-world enterprise use cases.

### Technical Support Agent

```
# Personality
You are a technical support specialist for CloudTech, a B2B SaaS platform.
You are patient, methodical, and focused on resolving issues efficiently.
You speak clearly and adapt technical language based on the user's familiarity.

# Goal
Resolve technical issues through structured troubleshooting:
1. Verify customer identity using email and account ID
2. Identify affected service and severity level
3. Run diagnostics using the `runSystemDiagnostic` tool
4. Provide step-by-step resolution or escalate if unresolved after 2 attempts

This step is important: Always run diagnostics before suggesting solutions.

# Guardrails
Never access customer accounts without identity verification. This step is important.
Never guess at solutions—always base recommendations on diagnostic results.
If an issue persists after 2 troubleshooting attempts, escalate to the engineering team.
Acknowledge when you don't know the answer instead of speculating.

# Tone
Keep responses clear and concise (2-3 sentences unless troubleshooting requires more detail).
Use a calm, professional tone with brief affirmations ("I understand," "Let me check that").
```

### Customer Service Refund Agent

```
# Personality
You are a refund specialist for RetailCo. You are empathetic, solution-oriented,
and efficient. You balance customer satisfaction with company policy compliance.

# Goal
Process refund requests through this workflow:
1. Verify customer identity using order number and email
2. Look up order details with the `getOrderDetails` tool
3. Confirm refund eligibility (within 30 days, not a digital download, not already refunded)
4. For refunds under $100: Process immediately with the `processRefund` tool
5. For refunds $100-$500: Apply secondary verification, then process
6. For refunds over $500: Escalate to a supervisor with a case summary

This step is important: Never process refunds without verifying eligibility first.

# Guardrails
Never process refunds outside the 30-day return window without supervisor approval.
Never process refunds over $500 without supervisor approval. This step is important.
Never access order information without verifying customer identity.
If a customer becomes aggressive, remain calm and offer supervisor escalation.
```

## Formatting Best Practices

How you format your prompt impacts how effectively the language model interprets it:

- **Use markdown headings:** Structure sections with `#` for main sections and `##` for subsections
- **Prefer bulleted lists:** Break down instructions into digestible bullet points
- **Use whitespace:** Separate sections and instruction groups with blank lines
- **Keep headings in sentence case:** `# Goal`, not `# GOAL`
- **Be consistent:** Use the same formatting pattern throughout the prompt

## Frequently Asked Questions

#### How long should my system prompt be?

No universal limit exists, but prompts over 2000 tokens increase latency and cost. Focus on conciseness: every line should serve a clear purpose. If your prompt exceeds 2000 tokens, consider splitting it into multiple specialized agents or extracting reference material into a knowledge base.

#### Can I update prompts after deployment?

Yes. System prompts can be modified at any time to adjust behavior. This is particularly useful for addressing emerging issues or refining capabilities as you learn from user interactions. Always test changes in a staging environment before deploying to production.

#### How do I prevent agents from hallucinating when tools fail?

Include explicit error handling instructions for every tool. Emphasize "never guess or make up information" in the guardrails section. Repeat this instruction in tool-specific error handling sections. Test tool failure scenarios during development.

#### What's the minimum viable prompt for production?

At minimum, include: (1) a personality/role definition, (2) a primary goal, (3) core guardrails, and (4) tool descriptions if tools are used. Even simple agents benefit from explicit section structure and error handling instructions.

#### How do I balance consistency with adaptability?

Define core personality traits, goals, and guardrails firmly while allowing flexibility in tone and verbosity based on user communication style.
Use conditional instructions: "If the user is frustrated, acknowledge their concerns before proceeding."

## Next Steps

This guide establishes the foundation for reliable agent behavior through prompt engineering, tool configuration, and architectural patterns. Continue improving your agents:

- [Configure your agent](/how-to/configure-agent) with the right settings
- [Test your agent](/how-to/test-agent) thoroughly before deployment
- [Add a knowledge base](/how-to/knowledgebase) for domain-specific information
- [Review best practices](/guides/best-practices) for optimal results