Monocle Trace Analysis Guide

This guide explains how to access, understand, and analyze Monocle traces. You'll learn about trace file formats, span structure, and how to interpret the telemetry data that Monocle generates.

Accessing Monocle Traces

By default, Monocle writes traces to a JSON file in the directory where the application is running. The default file name is monocle_trace_{workflow_name}_{trace_id}_{timestamp}.json, where trace_id is a unique identifier that Monocle generates for each trace. To change the output path or the file name format, pass a configured exporter to setup_monocle_telemetry(). For example:

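# Import paths as of monocle_apptrace 0.6.x; adjust them if your version differs.
from monocle_apptrace.instrumentation.common.instrumentor import setup_monocle_telemetry
from monocle_apptrace.exporters.file_exporter import FileSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

setup_monocle_telemetry(
    workflow_name="simple_math_app",
    span_processors=[BatchSpanProcessor(FileSpanExporter(
        out_path="/tmp",                    # directory for trace files
        file_prefix="map_app_prod_trace_",  # replaces the default file name prefix
        time_format="%Y-%m-%d"))            # timestamp format used in the file name
    ]
)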

To print the trace on the console, use ConsoleSpanExporter() instead of FileSpanExporter().

For Azure: Install the Azure support as shown in the setup section, then use AzureBlobSpanExporter() to upload the traces to Azure.

For AWS: Install the AWS support as shown in the setup section, then use S3SpanExporter() to upload the traces to an S3 bucket.

Understanding the Trace Output

Trace Span JSON

Monocle generates spans that follow the OpenTelemetry Tracing API format. The trace output is an array of spans. Each trace has a unique ID, and every span in the trace carries that ID as its trace_id. The trace_id groups related spans and is generated automatically by Monocle.
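
As a minimal sketch of reading that array back, the snippet below loads a trace file and groups its spans by trace_id. It assumes the exported file holds one JSON array of span objects; the helper names load_spans and group_by_trace are illustrative and not part of Monocle.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_spans(path):
    """Read a trace file, assumed to contain one JSON array of span objects."""
    spans = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(spans, dict):  # tolerate a file holding a single span object
        spans = [spans]
    return spans


def group_by_trace(spans):
    """Map each trace_id to the list of spans that carry it."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["context"]["trace_id"]].append(span)
    return dict(traces)
```

Calling group_by_trace(load_spans(path)) on a trace file returns a dictionary with one entry per trace, which is a convenient starting point for the per-trace analysis described later in this guide.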

Span JSON                                               Description
"name": "calculator.add"                                Span name; configurable in setup_monocle_telemetry(...)
"context": {                                            Auto-generated
    "trace_id": "0xe5269f0e534efa098b240f974220d6b7"    Unique trace identifier
    "span_id": "0x30b13075eca52f44"                     Unique span identifier
    "trace_state": "[]"                                 Trace state information
}
"kind": "SpanKind.INTERNAL"                             Enum describing the span's role. Defaults to SpanKind.INTERNAL because the current enum values do not cover ML apps
"parent_id": null                                       ID of the parent span; null marks the root span
"start_time": "2024-07-16T17:05:15.544861Z"             Span start timestamp
"end_time": "2024-07-16T17:05:43.502007Z"               Span end timestamp
"status": {                                             Span status information
    "status_code": "OK"                                 Span status: OK or ERROR. Default is UNSET
}
"attributes": {                                         Span attributes
    "workflow.name": "calculator_app"                   Name of the service, as set by workflow_name in setup_monocle_telemetry(...) when instrumentation is initialized
    "span.type": "generic"                              Type of span
}
"events": []                                            Captured log records. Typically empty for custom instrumentation without an output processor
"links": []                                             Unused. Links are meant to connect causally related spans, but trace_id already groups spans and parent_id already points to the parent span
"resource": {                                           The service, server, machine, or container that generated the span
    "attributes": {
        "service.name": "calculator_app"                Only service.name is populated; it defaults to the value of workflow_name
    }
    "schema_url": ""                                    Unused
}

Understanding Data Flow

Attributes vs Events

Attributes are set once when the span is created and contain:

  • Static configuration (precision, batch size, model names)
  • Method metadata (class name, method name)
  • Configuration values that don't change during execution

Events are added during span execution and contain:

  • Input data (operands, parameters, queries)
  • Output data (results, responses, metrics)
  • Timing information (timestamps for different phases)
  • Dynamic data that changes with each method call

Trace Analysis Examples

Basic Custom Instrumentation Trace

For a simple calculator operation, you'll see traces like:

{
  "name": "calculator.add",
  "context": {
    "trace_id": "0xd02d65f1c3de5493c5e3e420738e6c61",
    "span_id": "0xb88682cc29b10275",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": "0xe7dc5c8af648d74a",
  "start_time": "2025-10-20T23:12:28.945444Z",
  "end_time": "2025-10-20T23:12:28.945477Z",
  "status": {
    "status_code": "OK"
  },
  "attributes": {
    "monocle_apptrace.version": "0.6.0",
    "monocle_apptrace.language": "python",
    "span_source": "",
    "workflow.name": "calculator_app",
    "span.type": "generic"
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "service.name": "calculator_app"
    },
    "schema_url": ""
  }
}
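
To read timing from a span like the one above, subtract start_time from end_time. The sketch below assumes timestamps always use the microsecond UTC format shown in these examples; parse_ts and duration_us are illustrative helper names, not Monocle functions.

```python
from datetime import datetime, timedelta


def parse_ts(value):
    """Parse a span timestamp such as '2025-10-20T23:12:28.945444Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def duration_us(span):
    """Return the span duration in whole microseconds."""
    delta = parse_ts(span["end_time"]) - parse_ts(span["start_time"])
    return delta // timedelta(microseconds=1)


# Timestamps copied from the calculator.add span above.
span = {
    "start_time": "2025-10-20T23:12:28.945444Z",
    "end_time": "2025-10-20T23:12:28.945477Z",
}
print(duration_us(span))  # 33
```

The calculator.add call therefore took 33 microseconds.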

Workflow Span

Every trace includes a workflow span that represents the overall operation:

{
  "name": "workflow",
  "context": {
    "trace_id": "0xd02d65f1c3de5493c5e3e420738e6c61",
    "span_id": "0xe7dc5c8af648d74a",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": null,
  "start_time": "2025-10-20T23:12:28.945382Z",
  "end_time": "2025-10-20T23:12:28.945490Z",
  "status": {
    "status_code": "OK"
  },
  "attributes": {
    "monocle_apptrace.version": "0.6.0",
    "monocle_apptrace.language": "python",
    "span_source": "",
    "workflow.name": "calculator_app",
    "span.type": "workflow",
    "entity.1.name": "calculator_app",
    "entity.1.type": "workflow.generic",
    "entity.2.type": "app_hosting.generic",
    "entity.2.name": "generic"
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "service.name": "calculator_app"
    },
    "schema_url": ""
  }
}
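
The two spans above belong to the same trace: the parent_id of calculator.add equals the span_id of the workflow span. A minimal sketch of rebuilding that hierarchy from parent_id follows; render_tree and build_children are illustrative names, and the span dictionaries are trimmed to the fields the helpers read.

```python
from collections import defaultdict


def build_children(spans):
    """Index spans by the span_id of their parent (None for root spans)."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    return children


def render_tree(spans):
    """Return indented span names, one line per span, roots first."""
    children = build_children(spans)
    lines = []

    def walk(parent_id, depth):
        # Same-format ISO timestamps sort correctly as plain strings.
        siblings = sorted(children.get(parent_id, []), key=lambda s: s["start_time"])
        for span in siblings:
            lines.append("  " * depth + span["name"])
            walk(span["context"]["span_id"], depth + 1)

    walk(None, 0)
    return lines


spans = [
    {"name": "calculator.add", "parent_id": "0xe7dc5c8af648d74a",
     "start_time": "2025-10-20T23:12:28.945444Z",
     "context": {"span_id": "0xb88682cc29b10275"}},
    {"name": "workflow", "parent_id": None,
     "start_time": "2025-10-20T23:12:28.945382Z",
     "context": {"span_id": "0xe7dc5c8af648d74a"}},
]
print("\n".join(render_tree(spans)))
```

For these two spans the output lists workflow first, with calculator.add indented beneath it.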

Trace Analysis Best Practices

  1. Look for Error Spans: Check status.status_code for "ERROR" to identify failures
  2. Analyze Timing: Compare start_time and end_time to identify performance bottlenecks
  3. Follow Trace Hierarchy: Use parent_id to understand the call flow
  4. Examine Attributes: Look for custom attributes that provide context about the operation
  5. Check Events: Events contain the actual input/output data for analysis
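
As a rough sketch of the first and last checks, the helpers below filter a list of span dictionaries for failures and for spans that recorded events. The helper names and the sample spans (including calculator.divide) are made up for illustration.

```python
def error_spans(spans):
    """Return spans whose status.status_code is "ERROR"."""
    return [s for s in spans if s.get("status", {}).get("status_code") == "ERROR"]


def spans_with_events(spans):
    """Return spans that recorded at least one event (input/output data)."""
    return [s for s in spans if s.get("events")]


# Made-up spans for illustration; only the fields the helpers read are included.
spans = [
    {"name": "workflow", "status": {"status_code": "OK"}, "events": []},
    {"name": "calculator.divide", "status": {"status_code": "ERROR"}, "events": []},
]

for span in error_spans(spans):
    print(f"failed span: {span['name']}")
```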

Data Capture Capabilities

Arguments Dictionary

The arguments dictionary contains all available data for extraction:

arguments = {
    "instance": instance,    # The class instance (self)
    "args": args,           # Positional arguments as tuple
    "kwargs": kwargs,       # Keyword arguments as dict  
    "output": return_value  # Method return value
}

Detailed breakdown:

  • instance: The object instance (self) that contains the method being called. You can access instance attributes like instance.precision, instance.batch_size, etc.

  • args: A tuple containing all positional arguments passed to the method. For calc.add(1.5, 2.3), this would be (1.5, 2.3).

  • kwargs: A dictionary containing all keyword arguments passed to the method. For calc.add(a=1.5, b=2.3), this would be {"a": 1.5, "b": 2.3}.

  • output: The actual return value from the method execution. This is only available after the method completes successfully.
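
To make the shape of this dictionary concrete, the sketch below assembles it with a plain decorator. This is not Monocle's implementation; capture_arguments, captured, and the Calculator class are hypothetical and only illustrate what each key holds for the two calls mentioned above.

```python
import functools


def capture_arguments(sink):
    """Decorator factory: append an arguments dict to `sink` after each call."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(instance, *args, **kwargs):
            # If the method raises, nothing is recorded: "output" only
            # exists after the method completes successfully.
            output = method(instance, *args, **kwargs)
            sink.append({
                "instance": instance,
                "args": args,
                "kwargs": kwargs,
                "output": output,
            })
            return output
        return wrapper
    return decorator


captured = []


class Calculator:
    precision = 2  # instance state, reachable as arguments["instance"].precision

    @capture_arguments(captured)
    def add(self, a, b):
        return round(a + b, self.precision)


calc = Calculator()
calc.add(1.5, 2.3)      # args == (1.5, 2.3), kwargs == {}
calc.add(a=1.5, b=2.3)  # args == (), kwargs == {"a": 1.5, "b": 2.3}
```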

Accessor Functions

Accessor functions are lambda expressions that extract data from the arguments:

# Extract from instance attributes
"accessor": lambda arguments: arguments['instance'].precision

# Extract from positional arguments
"accessor": lambda arguments: arguments['args'][0]  # First argument

# Extract from keyword arguments  
"accessor": lambda arguments: arguments['kwargs'].get('max_tokens', 100)

# Extract from return value
"accessor": lambda arguments: arguments['output']

# Guarded extraction: fall back to 0 when the output is None or empty
"accessor": lambda arguments: arguments['output']['count'] if arguments['output'] else 0

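The accessors above can be tried out against a hand-built arguments dictionary. In this sketch, FakeModel and the sample values are hypothetical, and safe_extract is an illustrative wrapper added here; whether Monocle itself guards accessor failures this way is not stated in this guide.

```python
class FakeModel:
    """Hypothetical stand-in for an instrumented object."""
    precision = 4


def safe_extract(accessor, arguments, default=None):
    """Run an accessor, returning `default` if the expected data is missing."""
    try:
        return accessor(arguments)
    except (KeyError, IndexError, AttributeError, TypeError):
        return default


arguments = {
    "instance": FakeModel(),
    "args": ("What is 2 + 2?",),
    "kwargs": {"max_tokens": 50},
    "output": {"count": 3},
}

accessors = {
    "precision": lambda arguments: arguments["instance"].precision,
    "first_arg": lambda arguments: arguments["args"][0],
    "max_tokens": lambda arguments: arguments["kwargs"].get("max_tokens", 100),
    "count": lambda arguments: arguments["output"]["count"] if arguments["output"] else 0,
    "second_arg": lambda arguments: arguments["args"][1],  # missing on purpose
}

extracted = {name: safe_extract(fn, arguments) for name, fn in accessors.items()}
print(extracted)
```

Each accessor receives the same dictionary, so one that reads a missing positional argument (second_arg here) yields the default instead of raising.
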
What Can Be Captured

Monocle instrumentation can capture:

  • Method inputs: All arguments passed to the method (both positional and keyword)

  • Method outputs: Return values from the method

  • Instance state: Instance attributes and properties

  • Method metadata: Method name, class name, package information

  • Execution timing: Start time, end time, duration

  • Error information: Exceptions and error states

What Cannot Be Captured

Monocle instrumentation has limitations and cannot capture:

  • Local variables: Variables defined within the method body

  • Private method calls: Internal method calls within the instrumented method

  • Loop iterations: Individual iterations of loops within the method

  • Conditional branches: Which specific code paths were taken

Example: What Gets Captured vs. What Doesn't

class DataProcessor:
    def process_data(self, data_list, multiplier=2):
        # ✅ CAN capture: data_list, multiplier, self.batch_size
        # ❌ CANNOT capture: local_var, temp_result, the loop counter i

        local_var = "processing"  # ❌ Not accessible
        temp_result = 0           # ❌ Not accessible

        for i, item in enumerate(data_list):  # ❌ Loop details not captured
            temp_result += item * multiplier
            # ❌ Individual loop iterations not captured

        if temp_result > 100:     # ❌ Conditional path not captured
            return temp_result * 2
        else:
            return temp_result    # ✅ Return value CAN be captured

Next Steps