Monocle Trace Analysis Guide¶

This guide explains how to access, understand, and analyze Monocle traces. You'll learn about trace file formats, span structure, and how to interpret the telemetry data that Monocle generates.

Accessing Monocle Traces¶

By default, Monocle writes traces to a JSON file in the directory where the application is running. The default file name is monocle_trace_{workflow_name}_{trace_id}_{timestamp}.json, where trace_id is a unique identifier that Monocle generates for every trace. To change the file path or file name format, configure the exporter that you pass to setup_monocle_telemetry(). For example:

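# Monocle import paths are assumed from the package layout; check your installed version.
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from monocle_apptrace.instrumentation.common.instrumentor import setup_monocle_telemetry
from monocle_apptrace.exporters.file_exporter import FileSpanExporter

setup_monocle_telemetry(
    workflow_name="simple_math_app",
    span_processors=[BatchSpanProcessor(FileSpanExporter(
        out_path="/tmp",                    # directory for the trace files
        file_prefix="map_app_prod_trace_",  # replaces the default file name prefix
        time_format="%Y-%m-%d"))],          # timestamp format used in the file name
)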

To print the trace on the console, use ConsoleSpanExporter() instead of FileSpanExporter().

For Azure: Install the Azure support as shown in the setup section, then use AzureBlobSpanExporter() to upload the traces to Azure.

For AWS: Install the AWS support as shown in the setup section, then use S3SpanExporter() to upload the traces to an S3 bucket.
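
Once trace files are written locally, they can be collected by name. The following is a minimal sketch, assuming the default `monocle_trace_` prefix and that each file holds a JSON array of spans as described in this guide. `load_spans` is a hypothetical helper for illustration, not part of Monocle.

```python
import json
from pathlib import Path


def load_spans(directory=".", prefix="monocle_trace_"):
    """Return all spans found in trace files under `directory`.

    Hypothetical helper: assumes each file is a JSON array of span objects.
    """
    spans = []
    for path in sorted(Path(directory).glob(f"{prefix}*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Tolerate a file that holds a single span object instead of an array.
        spans.extend(data if isinstance(data, list) else [data])
    return spans
```

With the custom exporter settings shown earlier, the call would be `load_spans("/tmp", prefix="map_app_prod_trace_")`.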

Understanding the Trace Output¶

Trace Span JSON¶

Monocle generates spans that follow the OpenTelemetry Tracing API format. The trace output is an array of spans. Each trace has a unique trace_id, which Monocle generates automatically, and every span in the trace carries that same trace_id, so it groups related spans together.

Span JSON	Description
`"name": "calculator.add"`	Span name; configurable in `setup_monocle_telemetry(...)`
`"context": {`	Span context; auto-generated
`"trace_id": "0xe5269f0e534efa098b240f974220d6b7"`	Unique trace identifier
`"span_id": "0x30b13075eca52f44"`	Unique span identifier
`"trace_state": "[]"`	Trace state information
`}`
`"kind": "SpanKind.INTERNAL"`	Enum that describes the role of the span. Defaults to SpanKind.INTERNAL, because the current enum values do not cover ML apps
`"parent_id": null`	ID of the parent span. If null, this is the root span
`"start_time": "2024-07-16T17:05:15.544861Z"`	Span start timestamp
`"end_time": "2024-07-16T17:05:43.502007Z"`	Span end timestamp
`"status": {`	Span status information
`"status_code": "OK"`	Status of the span: OK or ERROR. Defaults to UNSET
`}`
`"attributes": {`	Span attributes
`"workflow.name": "calculator_app"`	Name of the service, as set in `setup_monocle_telemetry(...)` when instrumentation is initialized
`"span.type": "generic"`	Type of span
`}`
`"events": []`	Captures the log records. For custom instrumentation without an output processor, this is typically empty
`"links": []`	Unused. Links would normally connect causally related spans, but Monocle groups spans by `trace_id` and uses `parent_id` to point to the parent span
`"resource": {`	Represents the service, server, machine, or container that generated the span
`"attributes": {`
`"service.name": "calculator_app"`	Only service.name is populated; it defaults to the value of `workflow_name`
`}`
`"schema_url": ""`	Unused
`}`
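
The timestamp and parent fields in the table can be read with the standard library alone. The sketch below uses the sample values from the table; `parse_time` and `span_duration_seconds` are hypothetical helper names, and the span dict is trimmed to the fields being read.

```python
from datetime import datetime


def parse_time(value):
    """Parse a span timestamp such as '2024-07-16T17:05:15.544861Z'."""
    # Swap the trailing 'Z' for an explicit offset so fromisoformat
    # also works on Python versions older than 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def span_duration_seconds(span):
    """Elapsed time between start_time and end_time, in seconds."""
    elapsed = parse_time(span["end_time"]) - parse_time(span["start_time"])
    return elapsed.total_seconds()


span = {
    "name": "calculator.add",
    "parent_id": None,  # JSON null becomes None after json.load
    "start_time": "2024-07-16T17:05:15.544861Z",
    "end_time": "2024-07-16T17:05:43.502007Z",
    "status": {"status_code": "OK"},
}

role = "root" if span["parent_id"] is None else "child"
print(span["name"], role, f"{span_duration_seconds(span):.6f} s")
```

For the sample values, the span is a root span that lasted just under 28 seconds.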

Understanding Data Flow¶

Attributes vs Events¶

Attributes are set once when the span is created and contain:
- Static configuration (precision, batch size, model names)
- Method metadata (class name, method name)
- Configuration values that don't change during execution

Events are added during span execution and contain:
- Input data (operands, parameters, queries)
- Output data (results, responses, metrics)
- Timing information (timestamps for different phases)
- Dynamic data that changes with each method call
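
To make the distinction concrete, the sketch below separates a span's static attributes from its per-call events. It assumes events follow the OpenTelemetry JSON layout (a name plus an attributes map). The sample span, its event names and keys, and the `split_span_data` helper are all hypothetical and only illustrate the shape of the data.

```python
def split_span_data(span):
    """Return (static_attributes, events) for one span dict.

    `events` is a list of (event_name, event_attributes) pairs, kept in
    the order they were recorded.
    """
    static = dict(span.get("attributes", {}))
    events = [
        (event.get("name", "<unnamed>"), event.get("attributes", {}))
        for event in span.get("events", [])
    ]
    return static, events


# Hypothetical span: event names and attribute keys are illustrative only.
span = {
    "name": "calculator.add",
    "attributes": {"span.type": "generic", "precision": 2},
    "events": [
        {"name": "data.input", "attributes": {"a": 1.5, "b": 2.3}},
        {"name": "data.output", "attributes": {"result": 3.8}},
    ],
}

static, events = split_span_data(span)
print("set at span creation:", static)
for name, payload in events:
    print("recorded during execution:", name, payload)
```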

Trace Analysis Examples¶

Basic Custom Instrumentation Trace¶

For a simple calculator operation, you'll see traces like:

{
  "name": "calculator.add",
  "context": {
    "trace_id": "0xd02d65f1c3de5493c5e3e420738e6c61",
    "span_id": "0xb88682cc29b10275",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": "0xe7dc5c8af648d74a",
  "start_time": "2025-10-20T23:12:28.945444Z",
  "end_time": "2025-10-20T23:12:28.945477Z",
  "status": {
    "status_code": "OK"
  },
  "attributes": {
    "monocle_apptrace.version": "0.6.0",
    "monocle_apptrace.language": "python",
    "span_source": "",
    "workflow.name": "calculator_app",
    "span.type": "generic"
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "service.name": "calculator_app"
    },
    "schema_url": ""
  }
}

Workflow Span¶

Every trace includes a workflow span that represents the overall operation:

{
  "name": "workflow",
  "context": {
    "trace_id": "0xd02d65f1c3de5493c5e3e420738e6c61",
    "span_id": "0xe7dc5c8af648d74a",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": null,
  "start_time": "2025-10-20T23:12:28.945382Z",
  "end_time": "2025-10-20T23:12:28.945490Z",
  "status": {
    "status_code": "OK"
  },
  "attributes": {
    "monocle_apptrace.version": "0.6.0",
    "monocle_apptrace.language": "python",
    "span_source": "",
    "workflow.name": "calculator_app",
    "span.type": "workflow",
    "entity.1.name": "calculator_app",
    "entity.1.type": "workflow.generic",
    "entity.2.type": "app_hosting.generic",
    "entity.2.name": "generic"
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "service.name": "calculator_app"
    },
    "schema_url": ""
  }
}
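
The two spans above are connected through `parent_id`: the `calculator.add` span points at the `span_id` of the workflow span. A minimal sketch of rebuilding that hierarchy follows; `span_tree_lines` is a hypothetical helper, and the span dicts are trimmed to the fields it reads.

```python
from collections import defaultdict


def span_tree_lines(spans):
    """Return the span hierarchy as indented lines, using parent_id links."""
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_id")].append(span)

    lines = []

    def walk(parent_id, depth):
        # Timestamps share one ISO 8601 format, so string order is time order.
        for span in sorted(children.get(parent_id, []), key=lambda s: s["start_time"]):
            lines.append("  " * depth + span["name"])
            walk(span["context"]["span_id"], depth + 1)

    walk(None, 0)  # root spans have parent_id null (None in Python)
    return lines


spans = [
    {
        "name": "calculator.add",
        "context": {"span_id": "0xb88682cc29b10275"},
        "parent_id": "0xe7dc5c8af648d74a",
        "start_time": "2025-10-20T23:12:28.945444Z",
    },
    {
        "name": "workflow",
        "context": {"span_id": "0xe7dc5c8af648d74a"},
        "parent_id": None,
        "start_time": "2025-10-20T23:12:28.945382Z",
    },
]

print("\n".join(span_tree_lines(spans)))
```

Spans whose parent is missing from the input are not printed by this sketch, so pass all spans that share a `trace_id` together.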

Trace Analysis Best Practices¶

Look for Error Spans: Check status.status_code for "ERROR" to identify failures
Analyze Timing: Compare start_time and end_time to identify performance bottlenecks
Follow Trace Hierarchy: Use parent_id to understand the call flow
Examine Attributes: Look for custom attributes that provide context about the operation
Check Events: Events contain the actual input/output data for analysis
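
The first two checks and the events check can be scripted. The sketch below is one possible approach using only the standard library. `review_spans` is a hypothetical helper, and the sample data reuses the timestamps from the calculator example in this guide.

```python
from datetime import datetime, timedelta


def _parse(timestamp):
    # Swap the trailing 'Z' so fromisoformat works on older Python versions.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def review_spans(spans):
    """Return (error_spans, rows), with rows sorted slowest first."""
    errors = [s for s in spans if s.get("status", {}).get("status_code") == "ERROR"]
    rows = []
    for s in spans:
        elapsed = _parse(s["end_time"]) - _parse(s["start_time"])
        rows.append({
            "name": s["name"],
            "duration_us": elapsed // timedelta(microseconds=1),
            "has_events": bool(s.get("events")),
        })
    rows.sort(key=lambda row: row["duration_us"], reverse=True)
    return errors, rows


spans = [
    {"name": "calculator.add", "status": {"status_code": "OK"}, "events": [],
     "start_time": "2025-10-20T23:12:28.945444Z",
     "end_time": "2025-10-20T23:12:28.945477Z"},
    {"name": "workflow", "status": {"status_code": "OK"}, "events": [],
     "start_time": "2025-10-20T23:12:28.945382Z",
     "end_time": "2025-10-20T23:12:28.945490Z"},
]

errors, rows = review_spans(spans)
print("error spans:", [s["name"] for s in errors])
for row in rows:
    print(row)
```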

Data Capture Capabilities¶

Arguments Dictionary¶

The arguments dictionary contains all available data for extraction:

arguments = {
    "instance": instance,    # The class instance (self)
    "args": args,            # Positional arguments as a tuple
    "kwargs": kwargs,        # Keyword arguments as a dict
    "output": return_value,  # Method return value
}

Detailed breakdown:

instance: The object instance (self) that contains the method being called. You can access instance attributes like instance.precision, instance.batch_size, etc.
args: A tuple containing all positional arguments passed to the method. For calc.add(1.5, 2.3), this would be (1.5, 2.3).
kwargs: A dictionary containing all keyword arguments passed to the method. For calc.add(a=1.5, b=2.3), this would be {"a": 1.5, "b": 2.3}.
output: The actual return value from the method execution. This is only available after the method completes successfully.

Accessor Functions¶

Accessor functions are lambda expressions that extract data from the arguments:

# Extract from instance attributes
"accessor": lambda arguments: arguments['instance'].precision

# Extract from positional arguments
"accessor": lambda arguments: arguments['args'][0]  # First argument

# Extract from keyword arguments  
"accessor": lambda arguments: arguments['kwargs'].get('max_tokens', 100)

# Extract from return value
"accessor": lambda arguments: arguments['output']

# Guarded extraction: fall back to 0 when the output is None or empty
"accessor": lambda arguments: arguments['output']['count'] if arguments['output'] else 0

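The accessor patterns above can be tried outside Monocle by building an arguments dictionary by hand. This is a sketch for experimentation only: the `Calculator` class and the accessor names are hypothetical, and the dictionary is assembled manually here rather than by the instrumentation.

```python
class Calculator:
    """Stand-in class used only for this example."""

    def __init__(self, precision=2):
        self.precision = precision

    def add(self, a, b):
        return round(a + b, self.precision)


calc = Calculator(precision=2)
args, kwargs = (1.5, 2.3), {}

# Same four keys as the arguments dictionary described above.
arguments = {
    "instance": calc,
    "args": args,
    "kwargs": kwargs,
    "output": calc.add(*args, **kwargs),
}

accessors = {
    "precision": lambda arguments: arguments["instance"].precision,
    "first_operand": lambda arguments: arguments["args"][0],
    "max_tokens": lambda arguments: arguments["kwargs"].get("max_tokens", 100),
    "result": lambda arguments: arguments["output"],
}

extracted = {name: accessor(arguments) for name, accessor in accessors.items()}
print(extracted)
```

Because no `max_tokens` keyword was passed, that accessor returns its default of 100.
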
What Can Be Captured¶

Monocle instrumentation can capture:

✅ Method inputs: All arguments passed to the method (both positional and keyword)

✅ Method outputs: Return values from the method

✅ Instance state: Instance attributes and properties

✅ Method metadata: Method name, class name, package information

✅ Execution timing: Start time, end time, duration

✅ Error information: Exceptions and error states

What Cannot Be Captured¶

Monocle instrumentation has limitations and cannot capture:

❌ Local variables: Variables defined within the method body

❌ Private method calls: Internal method calls within the instrumented method

❌ Loop iterations: Individual iterations of loops within the method

❌ Conditional branches: Which specific code paths were taken

Example: What Gets Captured vs. What Doesn't¶

class DataProcessor:
    def process_data(self, data_list, multiplier=2):
        # ✅ CAN capture: data_list, multiplier, self.batch_size
        # ❌ CANNOT capture: local_var, temp_result, loop_counter

        local_var = "processing"  # ❌ Not accessible
        temp_result = 0           # ❌ Not accessible

        for i, item in enumerate(data_list):  # ❌ Loop details not captured
            temp_result += item * multiplier
            # ❌ Individual loop iterations not captured

        if temp_result > 100:     # ❌ Conditional path not captured
            return temp_result * 2
        else:
            return temp_result    # ✅ Return value CAN be captured
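
The boundary between what is and is not visible can be demonstrated with a plain decorator. The sketch below is illustrative only and is not how Monocle is implemented: `capture_call` is a hypothetical wrapper that records the same four keys as the arguments dictionary. Anything that lives only inside the method body never reaches it.

```python
import functools


def capture_call(func):
    """Hypothetical decorator: records only what crosses the call boundary."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        output = func(self, *args, **kwargs)
        wrapper.last_arguments = {
            "instance": self,
            "args": args,
            "kwargs": kwargs,
            "output": output,
        }
        return output

    wrapper.last_arguments = None
    return wrapper


class DataProcessor:
    batch_size = 10  # instance state, reachable through "instance"

    @capture_call
    def process_data(self, data_list, multiplier=2):
        temp_result = 0  # local variable: never visible to the wrapper
        for item in data_list:
            temp_result += item * multiplier
        return temp_result * 2 if temp_result > 100 else temp_result


processor = DataProcessor()
processor.process_data([10, 20, 30], multiplier=2)
captured = DataProcessor.process_data.last_arguments

print(captured["args"], captured["kwargs"], captured["output"])
print(captured["instance"].batch_size)
```

The wrapper sees the inputs, the instance, and the return value, but there is no entry for `temp_result` or for which branch produced the result.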

Next Steps¶

Custom Instrumentation: See the Monocle Custom Instrumentation Guide to learn how to create custom instrumentation.