Build the CI/CD workflow
The CI/CD workflow runs the instrumented pipeline and, on failure, queries Okahu for traces and calls the Kahu SRE Agent for root cause analysis.
Which workflow to use
This learning path uses the cicd-deploy-summary.yml workflow. The repository contains other workflow files — use only this one.
Workflow overview
name: CI/CD Deploy Issue Summary
on:
  workflow_dispatch:
permissions:
  contents: read
  issues: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      OKAHU_API_KEY: ${{ secrets.OKAHU_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
The workflow_dispatch trigger means you run this manually from the GitHub Actions UI.
Step 1: Install and configure
      - name: Install dependencies
        run: |
          pip install monocle_apptrace==0.8.1a5 pyyaml \
            --extra-index-url https://okahu.jfrog.io/artifactory/api/pypi/okahu-patch-pypi/simple
      - name: Load .env and set exporters
        env:
          OKAHU_INGESTION_ENDPOINT: ${{ secrets.OKAHU_INGESTION_ENDPOINT }}
        run: |
          # Load optional local settings; exported values last only for this step
          if [ -f .env ]; then
            export $(grep -v '^#' .env | xargs)
          fi
          # Always write traces to a file; add Okahu when an API key is available
          EXPORTERS="file"
          if [ -n "$OKAHU_API_KEY" ]; then
            EXPORTERS="file,okahu"
            echo "OKAHU_API_KEY=$OKAHU_API_KEY" >> "$GITHUB_ENV"
          fi
          if [ -n "$OKAHU_INGESTION_ENDPOINT" ]; then
            echo "OKAHU_INGESTION_ENDPOINT=$OKAHU_INGESTION_ENDPOINT" >> "$GITHUB_ENV"
          fi
          # Values written to GITHUB_ENV are visible to all later steps
          echo "MONOCLE_EXPORTER=$EXPORTERS" >> "$GITHUB_ENV"
The exporter is set to file,okahu when OKAHU_API_KEY is present, sending traces both to a local file and to Okahu Cloud. Without the key it falls back to file only, so the run still produces local traces.
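The exporter selection comes down to one conditional. The sketch below isolates it as a shell function so you can try it locally. The function name select_exporters is a hypothetical helper for illustration and is not part of the workflow or of Monocle.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the workflow's exporter selection.
# Prints "file,okahu" when an API key is supplied, otherwise "file".
select_exporters() {
  local api_key="${1:-}"
  if [ -n "$api_key" ]; then
    echo "file,okahu"
  else
    echo "file"
  fi
}

# Usage: pass the key (possibly empty) and use the result as MONOCLE_EXPORTER.
MONOCLE_EXPORTER="$(select_exporters "${OKAHU_API_KEY:-}")"
echo "MONOCLE_EXPORTER=$MONOCLE_EXPORTER"
```

Running it with OKAHU_API_KEY unset shows the file-only fallback that the workflow uses when the secret is missing.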
Step 2: Run the pipeline with Monocle
      - name: Run deployment pipeline
        id: deploy
        continue-on-error: true
        run: |
          # Capture the exit code instead of aborting on the first error
          set +e
          MONOCLE_WORKFLOW_NAME=cicd_azure_provisioning \
            python -m monocle_apptrace deploy_app.py > output.txt 2>&1; EXIT_CODE=$?
          cat output.txt
          echo "deploy_exit_code=$EXIT_CODE" >> "$GITHUB_OUTPUT"
          exit $EXIT_CODE
MONOCLE_WORKFLOW_NAME tags every span with the workflow name so traces are grouped correctly in Okahu. Monocle auto-discovers the instrumentation rules from .monocle/custom_instrumentation.yaml, so no --config flag is needed. Because the step sets continue-on-error: true, a failed deploy does not stop the job. The step's outcome is still recorded as failure, and the later steps check that outcome (or the saved deploy_exit_code) in their if conditions before collecting traces and calling the SRE agent.
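The capture pattern can be tried outside GitHub Actions with a stand-in command. This is a minimal sketch: run_and_capture and fake_deploy are hypothetical names, and it records the failure with `|| exit_code=$?` rather than the workflow's `set +e`, which has the same effect of keeping the script alive after a non-zero exit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, save combined stdout/stderr to a log file,
# and print the command's exit code without aborting the calling script.
run_and_capture() {
  local log_file="$1"; shift
  local exit_code=0
  # "|| exit_code=$?" records the failure instead of letting it abort the script.
  "$@" > "$log_file" 2>&1 || exit_code=$?
  echo "$exit_code"
}

# Stand-in for the real pipeline: writes an error message and returns status 3.
fake_deploy() {
  echo "provisioning step 4 failed" >&2
  return 3
}

LOG_FILE="$(mktemp)"
EXIT_CODE="$(run_and_capture "$LOG_FILE" fake_deploy)"
echo "deploy_exit_code=$EXIT_CODE"
cat "$LOG_FILE"
```

In the workflow, the same value is appended to GITHUB_OUTPUT so later steps can read it as steps.deploy.outputs.deploy_exit_code.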
Step 3: Query Okahu for traces
      - name: Resolve traces for this run
        id: traces
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        env:
          OKAHU_API_KEY: ${{ secrets.OKAHU_API_KEY }}
          CICD_OKAHU_APP_NAME: ${{ secrets.CICD_OKAHU_APP_NAME }}
          OKAHU_INGESTION_ENDPOINT: ${{ secrets.OKAHU_INGESTION_ENDPOINT }}
        run: |
          set -euo pipefail
          APP="${CICD_OKAHU_APP_NAME:-unknown_app}"
          # Use the staging API when the ingestion endpoint points at staging
          if echo "${OKAHU_INGESTION_ENDPOINT:-}" | grep -q "stage"; then
            API_BASE="https://api-stage.okahu.co"
          else
            API_BASE="https://api.okahu.co"
          fi
          echo "Waiting for traces to be ingested..."
          sleep 10
          echo "Fetching traces for github_${GITHUB_RUN_ID} on app ${APP}..."
          TRACES_RESPONSE=$(curl -s \
            "${API_BASE}/api/v1/apps/${APP}/traces?duration_fact=test_runs&fact_ids=github_${GITHUB_RUN_ID}" \
            -H "x-api-key: ${OKAHU_API_KEY}")
          # "|| true" keeps a non-JSON response from aborting the step under set -e
          TRACE_IDS=$(echo "$TRACES_RESPONSE" | jq -r '.traces[]?.trace_id // empty' 2>/dev/null || true)
          if [ -z "$TRACE_IDS" ]; then
            echo "No traces found for github_${GITHUB_RUN_ID}"
            echo "trace_ids=" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          echo "Found trace IDs: $TRACE_IDS"
          # Join the newline-separated IDs into a comma-separated list
          TRACE_ID_LIST=$(echo "$TRACE_IDS" | tr '\n' ',' | sed 's/,$//')
          printf 'trace_ids=%s\n' "$TRACE_ID_LIST" >> "$GITHUB_OUTPUT"
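Monocle automatically tags traces with the GitHub run ID, so this step can query them through the Okahu API with fact_ids=github_${GITHUB_RUN_ID}. The step waits 10 seconds for ingestion before querying, and it selects the API base URL based on whether the ingestion endpoint contains "stage". If no traces are found, it writes an empty trace_ids output and exits successfully so the remaining steps still run.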
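Two pieces of this step are easy to check in isolation: the base URL selection and the joining of trace IDs. The sketch below uses hypothetical helper names (select_api_base, join_trace_ids) and a made-up endpoint and IDs. It uses a `case` pattern instead of `grep -q`, which performs the same substring check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the API base URL from the ingestion endpoint,
# using the same "contains 'stage'" check as the workflow step.
select_api_base() {
  local endpoint="${1:-}"
  case "$endpoint" in
    *stage*) echo "https://api-stage.okahu.co" ;;
    *)       echo "https://api.okahu.co" ;;
  esac
}

# Hypothetical helper: turn newline-separated trace IDs read from stdin
# into a single comma-separated list.
join_trace_ids() {
  tr '\n' ',' | sed 's/,$//'
}

# Example values below are invented for illustration only.
API_BASE="$(select_api_base "https://ingest-stage.example.invalid/api")"
TRACE_ID_LIST="$(printf 'trace-a\ntrace-b\n' | join_trace_ids)"
echo "API_BASE=$API_BASE"
echo "trace_ids=$TRACE_ID_LIST"
```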
Step 4: Call the Kahu SRE Agent
      - name: Call Kahu SRE Agent
        id: kahu
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        env:
          OKAHU_API_KEY: ${{ secrets.OKAHU_API_KEY }}
          CICD_OKAHU_APP_NAME: ${{ secrets.CICD_OKAHU_APP_NAME }}
          OKAHU_SRE_AGENT_URL: ${{ secrets.OKAHU_SRE_AGENT_URL }}
          TRACE_IDS: ${{ steps.traces.outputs.trace_ids }}
        run: |
          set -euo pipefail
          APP="${CICD_OKAHU_APP_NAME:-unknown_app}"
          SRE_URL="${OKAHU_SRE_AGENT_URL:-https://sre-agent.okahu.co/api/v1/ask_agent}"
          # read returns non-zero at end of input, hence "|| true"
          read -r -d '' QUERY << 'PROMPT' || true
          Investigate deploy failure for this CI/CD run. For each trace, look at the spans in detail, especially the data.output events which contain the full console output and error messages from each deployment step.
          Format your response as follows:
          1. Start with a brief SUMMARY section at the top: one paragraph stating what failed, the root cause, and the immediate next steps to resolve it.
          2. For failed or warning steps: provide detailed root cause analysis citing actual error messages from data.output, specific recommendations to fix, and files/configurations to check.
          3. For successful steps: just list them in one line each (e.g. "azure_blob.deploy: succeeded"). Do not elaborate on successful steps.
          PROMPT
          QUERY="For app ${APP}, run github_${GITHUB_RUN_ID}: ${QUERY}"
          if [ -n "${TRACE_IDS:-}" ]; then
            QUERY="${QUERY} Trace IDs: ${TRACE_IDS}"
          fi
          RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "${SRE_URL}" \
            -H "Content-Type: application/json" \
            -H "x-api-key: ${OKAHU_API_KEY}" \
            -d "$(jq -n --arg q "$QUERY" '{query: $q}')")
          # curl appends the HTTP status as the last line; split it from the body
          HTTP_CODE=$(echo "$RESPONSE" | tail -1)
          BODY=$(echo "$RESPONSE" | sed '$d')
          echo "SRE Agent HTTP status: ${HTTP_CODE}"
          # Tolerate a non-JSON body so the issue step can fall back to its default text
          AGENT_RESPONSE=$(echo "$BODY" | jq -r '.response // empty' 2>/dev/null || true)
          # Multi-line output: write the value between a delimiter pair
          {
            echo "response<<KAHU_EOF"
            echo "${AGENT_RESPONSE:-No response content from SRE Agent.}"
            echo "KAHU_EOF"
          } >> "$GITHUB_OUTPUT"
The query instructs the agent to produce a structured response: a brief summary up front, detailed analysis only for failed steps, and single-line confirmation for successful ones.
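The step relies on curl's `-w "\n%{http_code}"` option, which appends the HTTP status as a final line after the response body. As a minimal sketch of how that combined output is split, the example below uses a simulated response rather than a real call. The helper names http_status and http_body are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: split the combined output of
#   curl -s -w "\n%{http_code}" ...
# into the status code (last line) and the body (everything before it).
http_status() { printf '%s\n' "$1" | tail -n 1; }
http_body()   { printf '%s\n' "$1" | sed '$d'; }

# Simulated curl output: a two-line JSON body followed by the status code.
RAW=$(printf '{"response":\n"ok"}\n200')

echo "status: $(http_status "$RAW")"
echo "body:   $(http_body "$RAW")"
```

Because only the last line is removed, a body that itself spans several lines is preserved intact.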
Step 5: Create GitHub issue with analysis
      - name: Create GitHub Issue with Kahu response
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }}
          AGENT_RESPONSE: ${{ steps.kahu.outputs.response }}
        run: |
          set -euo pipefail
          TITLE="Deploy Failure: deploy_app.py - $(date -u +'%Y-%m-%d %H:%M:%S UTC')"
          # Build the issue body; blank lines keep the Markdown sections separate
          cat > issue_body.md << EOF
          ### RESPONSE FROM KAHU

          ${AGENT_RESPONSE:-Kahu SRE Agent did not return a response. Check workflow run ${GITHUB_RUN_ID} artifacts for deploy output.}

          ---

          - Workflow Run ID: ${GITHUB_RUN_ID}
          - GitHub Actor: ${GITHUB_ACTOR}

          *Response from [Okahu SRE Agent](https://okahu.co)*
          EOF
          ISSUE_URL=$(gh issue create --title "$TITLE" --body-file issue_body.md)
          echo "Created issue: $ISSUE_URL"
Step 6: Upload artifacts
      - name: Upload deployment log
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: deployment-log
          path: output.txt
      - name: Upload traces
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: monocle-traces
          path: .monocle/
      - name: Fail job if deploy failed
        if: steps.deploy.outcome == 'failure' || steps.deploy.outputs.deploy_exit_code != '0'
        run: |
          echo "Deployment failed. GitHub issue has been created with Kahu analysis."
          exit 1
Both output.txt and the .monocle/ trace files are uploaded as artifacts on every run (if: always()), so you can inspect them even when the issue creation step fails. The final step explicitly fails the job after the issue is created, so the GitHub Actions run is correctly marked red.
Trigger the workflow
Once your fork has the secrets configured (see Prerequisites), trigger the workflow manually:
- Go to your fork on GitHub
- Click the Actions tab
- Select CI/CD Deploy Issue Summary from the left sidebar
- Click Run workflow → Run workflow
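If you prefer the command line, the same manual trigger is available through the GitHub CLI. The sketch below assumes `gh` is installed and authenticated and that you run it from a clone of your fork. It prints the commands by default; the DRY_RUN switch and the run wrapper are hypothetical conveniences added for this example.

```shell
#!/usr/bin/env bash
# Sketch: trigger the workflow with the GitHub CLI instead of the web UI.
# Assumes gh is installed, authenticated, and run inside a clone of your fork.
# Set DRY_RUN=0 to execute the commands instead of printing them.
WORKFLOW_FILE="cicd-deploy-summary.yml"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Start a workflow_dispatch run, then list the most recent run of that workflow.
run gh workflow run "$WORKFLOW_FILE"
run gh run list --workflow "$WORKFLOW_FILE" --limit 1
```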
The workflow will run the 4-step Azure provisioning pipeline, detect the intentional Step 4 failure, collect traces from Okahu, call the Kahu SRE Agent, and open a GitHub issue with the root cause analysis.
No manual investigation needed
The full loop runs automatically: deploy → fail → collect traces → AI analysis → GitHub issue with root cause. The on-call engineer gets an issue with actionable analysis, not just a stack trace.