Monocle Test Tools¶
A comprehensive testing and validation framework for monocle AI agent tracing. This package provides tools for validating agent behavior, tool invocations, inference responses, and overall AI system performance.
Features¶
- Agentic Response: Verify that agent requests receive the appropriate responses.
- Agent Invocation: Verify that specific agents are invoked and delegate tasks correctly.
- Tool Validation: Ensure tools are called with expected inputs and produce expected outputs.
- Inference Testing: Test model inference responses against expected schemas or content.
- Cost/Performance/Quality: Verify token usage, error states, and warnings.
- Evaluation: Integrate with any third-party or custom evaluation tools to validate LLM responses.
How does it work¶
The test tool runs your agent or workflow code with Monocle instrumentation enabled. It examines the traces generated by the genAI components used in your code (e.g., Google ADK, LangGraph) and validates the test conditions you specify.
Installation¶
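This section is empty in the current draft. Assuming the package is published on PyPI under the name implied by its import path (`monocle_test_tools`), installation would likely look like the following; verify the exact package name against the project's published distribution:

```shell
# Assumed PyPI package name -- confirm against the project's release page
pip install monocle-test-tools
```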
Quick Start¶
Here's a test that executes a root_travel_agent with a few inputs and validates its response and the tools it invokes.
import pytest

from monocle_test_tools import TestCase, MonocleValidator
from adk_travel_agent import root_travel_agent

# Test cases for the travel booking agent
agent_test_cases: list[TestCase] = [
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_output": "A flight from San Francisco to Mumbai has been booked, along with a four night stay in a two queen room at the Marriot Intercontinental in Juhu, Mumbai, starting November 27th, 2025.",
        "comparer": "similarity",
    },
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_spans": [
            {
                "span_type": "agentic.tool.invocation",
                "entities": [
                    {"type": "tool", "name": "adk_book_hotel"},
                    {"type": "agent", "name": "adk_hotel_booking_agent"},
                ],
            }
        ],
    },
]

# Run test cases using the Monocle test framework
@MonocleValidator().monocle_testcase(agent_test_cases)
async def test_run_workflows(my_test_case: TestCase):
    await MonocleValidator().test_workflow_async(root_travel_agent, my_test_case)

if __name__ == "__main__":
    pytest.main([__file__])
Test format¶
TestCase¶
A TestCase defines the input, expected output, and evaluation criteria for testing AI agent behaviors. It can contain multiple test spans representing different interaction points (tool invocations, agent delegations, etc.) within the test.
Each test case can specify comparison methods for evaluating test results against expected outcomes and can be configured to expect certain errors or warnings.
{
"test_input": "Input data provided to the test case, can be a prompt or structured data.",
"test_output": "Expected output that the test should produce.",
"comparer": "Method used to compare actual results with expected results. The default comparer is does exact match. The 'similarty' comparer does a fuzzy match using bert score",
"test_spans": "Array of TestSpan objects defining specific interactions to test."
}
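As a concrete illustration, a minimal test case can rely on the default exact-match comparer by omitting the comparer field entirely (the prompt and expected output below are illustrative, not from the package):

```python
# A minimal TestCase dictionary using the default exact-match comparer.
# Field names follow the schema above; the input/output strings are made up.
exact_case = {
    "test_input": ["What is the capital of France?"],
    "test_output": "The capital of France is Paris.",
    # "comparer" omitted -> the framework falls back to exact matching
}
```

With exact matching, even a trivial rewording of the agent's answer fails the test, so the "similarity" comparer is usually the better fit for free-form LLM output.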
TestSpan¶
Represents a specific interaction or event within a test case in the Monocle testing framework.
A TestSpan defines a single testable unit of interaction such as a tool invocation, agent delegation, or inference process. Each span captures the entities involved, inputs and outputs, and validation criteria for that specific interaction.
Test spans enforce specific validation rules based on their type. For example:
- Tool invocation spans must include at least one tool entity
- Agentic delegation spans must include at least two agent entities (delegator and delegatee)
- Agentic invocation spans must include at least one agent entity
{
"span_type": "Type of interaction this span represents (e.g., tool invocation, agent delegation)",
"entities": "List of entities (tools, agents) involved in this interaction. Each entity has two attributes, name and type. The type can be 'tool' or 'agent' or 'inference'",
"input": "Input provided to this interaction",
"output": "Expected output from this interaction",
"test_type": "Whether this is a 'positive' (expected to succeed) or 'negative' (expected to fail) test"
}
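For instance, a delegation span could pair the delegating and delegated agents as shown below. This is a sketch based on the schema above: the "agentic.delegation" span type string and the agent names are assumptions for illustration, so check the span types Monocle actually emits in your traces:

```python
# Hypothetical TestSpan for an agent delegation.
# Per the rule above, a delegation span needs at least two agent entities:
# the delegator and the delegatee.
delegation_span = {
    "span_type": "agentic.delegation",  # assumed span type string
    "entities": [
        {"type": "agent", "name": "root_travel_agent"},        # delegator
        {"type": "agent", "name": "adk_hotel_booking_agent"},  # delegatee
    ],
    "test_type": "positive",  # this interaction is expected to succeed
}
```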
Check out these examples of test cases.