Build the CI/CD workflow

The CI/CD workflow runs the instrumented pipeline and, on failure, queries Okahu for traces and calls the Kahu SRE Agent for root cause analysis.

Which workflow to use

This learning path uses the cicd-deploy-summary.yml workflow. The repository contains other workflow files — use only this one.

Workflow overview

name: CI/CD Deploy Issue Summary

on:
  workflow_dispatch:

permissions:
  contents: read
  issues: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      OKAHU_API_KEY: ${{ secrets.OKAHU_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

The workflow_dispatch trigger means you run this manually from the GitHub Actions UI.

Step 1: Install and configure

      - name: Install dependencies
        run: |
          pip install monocle_apptrace==0.8.1a5 pyyaml \
            --extra-index-url https://okahu.jfrog.io/artifactory/api/pypi/okahu-patch-pypi/simple

      - name: Load .env and set exporters
        env:
          OKAHU_INGESTION_ENDPOINT: ${{ secrets.OKAHU_INGESTION_ENDPOINT }}
        run: |
          if [ -f .env ]; then
            export $(grep -v '^#' .env | xargs)
          fi
          EXPORTERS="file"
          if [ -n "$OKAHU_API_KEY" ]; then
            EXPORTERS="file,okahu"
            echo "OKAHU_API_KEY=$OKAHU_API_KEY" >> $GITHUB_ENV
          fi
          if [ -n "$OKAHU_INGESTION_ENDPOINT" ]; then
            echo "OKAHU_INGESTION_ENDPOINT=$OKAHU_INGESTION_ENDPOINT" >> $GITHUB_ENV
          fi
          echo "MONOCLE_EXPORTER=$EXPORTERS" >> $GITHUB_ENV

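The exporter is set to file,okahu when OKAHU_API_KEY is present, sending traces both to a local file and to Okahu Cloud. Without the key, traces are written to the local file only. Values appended to $GITHUB_ENV are available to every subsequent step in the job.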
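
The exporter selection is plain shell branching. As a rough illustration only, the same decision can be modelled in Python. The helper name select_exporters is hypothetical; it is not part of Monocle or of the workflow.

```python
def select_exporters(env: dict[str, str]) -> str:
    """Mirror the shell logic: always export to file, add okahu when a key is set."""
    exporters = ["file"]
    # `[ -n "$OKAHU_API_KEY" ]` is true only for a non-empty value,
    # so an empty string counts as "no key" here as well.
    if env.get("OKAHU_API_KEY"):
        exporters.append("okahu")
    return ",".join(exporters)


if __name__ == "__main__":
    # Illustrative values only; a real key would come from the repository secrets.
    print(select_exporters({}))
    print(select_exporters({"OKAHU_API_KEY": "example-key"}))
```

The returned string corresponds to the value the step writes to MONOCLE_EXPORTER.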

Step 2: Run the pipeline with Monocle

      - name: Run deployment pipeline
        id: deploy
        continue-on-error: true
        run: |
          set +e
          MONOCLE_WORKFLOW_NAME=cicd_azure_provisioning \
            python -m monocle_apptrace deploy_app.py > output.txt 2>&1; EXIT_CODE=$?
          cat output.txt
          echo "deploy_exit_code=$EXIT_CODE" >> $GITHUB_OUTPUT
          exit $EXIT_CODE

MONOCLE_WORKFLOW_NAME tags every span with the workflow name so traces are grouped correctly in Okahu. Monocle auto-discovers the instrumentation rules from .monocle/custom_instrumentation.yaml, so no --config flag is needed. Because the step sets continue-on-error: true, a failed deployment does not stop the job: the workflow proceeds to trace collection and SRE agent analysis. The step also records the exit code as the deploy_exit_code output, which the later steps check alongside steps.deploy.outcome.
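
The later steps decide whether to run by checking both the step outcome and the recorded exit code. A minimal sketch of that gate, written in Python purely for illustration, follows. The name should_analyze is hypothetical; in the workflow the check is the if: expression on each follow-up step.

```python
def should_analyze(outcome: str, deploy_exit_code: str) -> bool:
    """Model of the workflow condition:
    steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
    """
    # Step outputs are strings, so the exit code is compared as text.
    return outcome == "failure" or deploy_exit_code != "0"


if __name__ == "__main__":
    print(should_analyze("failure", "1"))
    print(should_analyze("success", "0"))
```

With continue-on-error: true, GitHub Actions reports the step's outcome as failure while its conclusion is success, which is why the condition reads outcome rather than conclusion.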

Step 3: Query Okahu for traces

      - name: Resolve traces for this run
        id: traces
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        env:
          OKAHU_API_KEY: ${{ secrets.OKAHU_API_KEY }}
          CICD_OKAHU_APP_NAME: ${{ secrets.CICD_OKAHU_APP_NAME }}
          OKAHU_INGESTION_ENDPOINT: ${{ secrets.OKAHU_INGESTION_ENDPOINT }}
        run: |
          set -euo pipefail
          APP="${CICD_OKAHU_APP_NAME:-unknown_app}"
          if echo "${OKAHU_INGESTION_ENDPOINT:-}" | grep -q "stage"; then
            API_BASE="https://api-stage.okahu.co"
          else
            API_BASE="https://api.okahu.co"
          fi
          echo "Waiting for traces to be ingested..."
          sleep 10
          echo "Fetching traces for github_${GITHUB_RUN_ID} on app ${APP}..."
          TRACES_RESPONSE=$(curl -s \
            "${API_BASE}/api/v1/apps/${APP}/traces?duration_fact=test_runs&fact_ids=github_${GITHUB_RUN_ID}" \
            -H "x-api-key: ${OKAHU_API_KEY}")
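          # Tolerate a non-JSON response: under "set -euo pipefail" a jq error would abort the step and skip the agent call
          TRACE_IDS=$(echo "$TRACES_RESPONSE" | jq -r '.traces[]?.trace_id // empty' 2>/dev/null || true)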
          if [ -z "$TRACE_IDS" ]; then
            echo "No traces found for github_${GITHUB_RUN_ID}"
            echo "trace_ids=" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          echo "Found trace IDs: $TRACE_IDS"
          TRACE_ID_LIST=$(echo "$TRACE_IDS" | tr '\n' ',' | sed 's/,$//')
          printf 'trace_ids=%s\n' "$TRACE_ID_LIST" >> "$GITHUB_OUTPUT"

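Monocle automatically tags traces with the GitHub run ID (as github_<run ID>), making them queryable via the Okahu API. The step waits 10 seconds for ingestion before querying. The API base URL is selected automatically: the staging host is used when the ingestion endpoint contains "stage", and the production host otherwise. If no traces are found, the step writes an empty trace_ids output and exits successfully, so the agent call still runs.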
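
To make the query logic easier to follow, here is a hedged Python sketch of the same three decisions: choosing the API host, building the request URL, and pulling trace IDs out of the response. The function names are hypothetical, and the response shape (a top-level traces list whose items carry trace_id) is assumed from the jq filter in the step, not from Okahu API documentation.

```python
import json
from urllib.parse import urlencode


def api_base(ingestion_endpoint: str) -> str:
    """Pick the staging API host when the ingestion endpoint mentions 'stage'."""
    if "stage" in ingestion_endpoint:
        return "https://api-stage.okahu.co"
    return "https://api.okahu.co"


def traces_url(ingestion_endpoint: str, app: str, run_id: str) -> str:
    """Build the same URL the curl call in the step uses."""
    query = urlencode({"duration_fact": "test_runs", "fact_ids": f"github_{run_id}"})
    return f"{api_base(ingestion_endpoint)}/api/v1/apps/{app}/traces?{query}"


def extract_trace_ids(body: str) -> str:
    """Approximate jq '.traces[]?.trace_id // empty', joined with commas."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ""  # non-JSON body: treat as "no traces"
    traces = payload.get("traces") if isinstance(payload, dict) else None
    if not isinstance(traces, list):
        return ""
    ids = [str(t["trace_id"]) for t in traces if isinstance(t, dict) and t.get("trace_id")]
    return ",".join(ids)


if __name__ == "__main__":
    # "my_app" and the run ID are placeholder values.
    print(traces_url("", "my_app", "123"))
    print(extract_trace_ids('{"traces": [{"trace_id": "a"}, {"trace_id": "b"}]}'))
```

Returning an empty string for malformed or empty responses matches the step's "no traces found" path, which still lets the agent call proceed.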

Step 4: Call the Kahu SRE Agent

      - name: Call Kahu SRE Agent
        id: kahu
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        env:
          OKAHU_API_KEY: ${{ secrets.OKAHU_API_KEY }}
          CICD_OKAHU_APP_NAME: ${{ secrets.CICD_OKAHU_APP_NAME }}
          OKAHU_SRE_AGENT_URL: ${{ secrets.OKAHU_SRE_AGENT_URL }}
          TRACE_IDS: ${{ steps.traces.outputs.trace_ids }}
        run: |
          set -euo pipefail
          APP="${CICD_OKAHU_APP_NAME:-unknown_app}"
          SRE_URL="${OKAHU_SRE_AGENT_URL:-https://sre-agent.okahu.co/api/v1/ask_agent}"
          read -r -d '' QUERY << 'PROMPT' || true
          Investigate deploy failure for this CI/CD run. For each trace, look at the spans in detail — especially the data.output events which contain the full console output and error messages from each deployment step.
          Format your response as follows:
          1. Start with a brief SUMMARY section at the top: one paragraph stating what failed, the root cause, and the immediate next steps to resolve it.
          2. For failed or warning steps: provide detailed root cause analysis citing actual error messages from data.output, specific recommendations to fix, and files/configurations to check.
          3. For successful steps: just list them in one line each (e.g. "azure_blob.deploy: succeeded"). Do not elaborate on successful steps.
          PROMPT
          QUERY="For app ${APP}, run github_${GITHUB_RUN_ID}: ${QUERY}"
          if [ -n "${TRACE_IDS:-}" ]; then
            QUERY="${QUERY} Trace IDs: ${TRACE_IDS}"
          fi
          RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "${SRE_URL}" \
            -H "Content-Type: application/json" \
            -H "x-api-key: ${OKAHU_API_KEY}" \
            -d "$(jq -n --arg q "$QUERY" '{query: $q}')")
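          HTTP_CODE=$(echo "$RESPONSE" | tail -1)
          BODY=$(echo "$RESPONSE" | sed '$d')
          # Log the status so a non-200 reply is visible in the run log
          echo "SRE Agent HTTP status: ${HTTP_CODE}"
          # If the body is not valid JSON, fall through to the placeholder text below
          AGENT_RESPONSE=$(echo "$BODY" | jq -r '.response // empty' 2>/dev/null || true)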
          {
            echo "response<<KAHU_EOF"
            echo "${AGENT_RESPONSE:-No response content from SRE Agent.}"
            echo "KAHU_EOF"
          } >> "$GITHUB_OUTPUT"

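The query instructs the agent to produce a structured response: a brief summary up front, detailed analysis only for failed or warning steps, and single-line confirmation for successful ones. The agent's reply is written to the step's response output using a multiline delimiter (KAHU_EOF), so the next step can read it in full.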
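
The request assembly and the status-code handling can be sketched in Python as follows. This is an illustration under assumptions: build_payload and split_curl_output are hypothetical helpers, and the only request field used is query, as in the jq call in the step.

```python
import json


def build_payload(app: str, run_id: str, prompt: str, trace_ids: str = "") -> str:
    """Assemble the JSON body sent to the agent: {"query": "..."}."""
    query = f"For app {app}, run github_{run_id}: {prompt}"
    if trace_ids:
        # Trace IDs are appended only when the previous step found some.
        query = f"{query} Trace IDs: {trace_ids}"
    return json.dumps({"query": query})


def split_curl_output(raw: str) -> tuple[str, str]:
    """Split curl output produced with -w '\\n%{http_code}' into (body, status)."""
    # The status code is everything after the last newline; the rest is the body.
    body, _, status = raw.rpartition("\n")
    return body, status


if __name__ == "__main__":
    # Placeholder app name, run ID, and trace IDs.
    print(build_payload("my_app", "123", "Investigate deploy failure.", "t1,t2"))
    print(split_curl_output('{"response": "ok"}\n200'))
```

Splitting on the last newline is what tail -1 and sed '$d' do together in the shell version.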

Step 5: Create GitHub issue with analysis

      - name: Create GitHub Issue with Kahu response
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }}
          AGENT_RESPONSE: ${{ steps.kahu.outputs.response }}
        run: |
          set -euo pipefail
          TITLE="Deploy Failure: deploy_app.py - $(date -u +'%Y-%m-%d %H:%M:%S UTC')"
          cat > issue_body.md << EOF
          ### RESPONSE FROM KAHU
          ${AGENT_RESPONSE:-Kahu SRE Agent did not return a response. Check workflow run ${GITHUB_RUN_ID} artifacts for deploy output.}
          ---
          - Workflow Run ID: ${GITHUB_RUN_ID}
          - GitHub Actor: ${GITHUB_ACTOR}
          *Response from [Okahu SRE Agent](https://okahu.co)*
          EOF
          ISSUE_URL=$(gh issue create --title "$TITLE" --body-file issue_body.md)
          echo "Created issue: $ISSUE_URL"

Step 6: Upload artifacts

      - name: Upload deployment log
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: deployment-log
          path: output.txt

      - name: Upload traces
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: monocle-traces
          path: .monocle/

      - name: Fail job if deploy failed
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        run: |
          echo "Deployment failed. GitHub issue has been created with Kahu analysis."
          exit 1

Both output.txt and the .monocle/ trace files are uploaded as artifacts on every run (if: always()), so you can inspect them even when the issue creation step fails. Because the deploy step uses continue-on-error: true, its failure alone would leave the job green; the final step therefore exits with status 1 after the issue is created, so the GitHub Actions run is correctly marked red.

Trigger the workflow

Once your fork has the secrets configured (see Prerequisites), trigger the workflow manually:

  1. Go to your fork on GitHub
  2. Click the Actions tab
  3. Select CI/CD Deploy Issue Summary from the left sidebar
  4. Click Run workflow, then confirm with the green Run workflow button

The workflow will run the 4-step Azure provisioning pipeline, detect the intentional Step 4 failure, collect traces from Okahu, call the Kahu SRE Agent, and open a GitHub issue with the root cause analysis.

No manual investigation needed

The full loop runs automatically: deploy → fail → collect traces → AI analysis → GitHub issue with root cause. The on-call engineer gets an issue with actionable analysis, not just a stack trace.