Build the CI/CD workflow
The CI/CD workflow runs the instrumented pipeline and, on failure, queries Okahu for traces and calls the Kahu SRE Agent for root cause analysis.
Workflow overview
name: CI/CD Deploy Issue Summary
on:
  workflow_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
Step 1: Install and configure
      - name: Install dependencies
        run: pip install monocle_apptrace pyyaml
      - name: Load .env and set exporters
        run: |
          EXPORTERS="file"
          if [ -n "$OKAHU_API_KEY" ]; then
            EXPORTERS="file,okahu"
          fi
          echo "MONOCLE_EXPORTER=$EXPORTERS" >> "$GITHUB_ENV"
Step 2: Run the pipeline with Monocle
      - name: Run deployment pipeline
        id: deploy
        continue-on-error: true
        run: |
          # The default shell runs as `bash -e`, so capture the exit code
          # with `||` to keep the step alive long enough to print the log.
          EXIT_CODE=0
          python -m monocle_apptrace --config okahu.yaml deploy_app.py \
            > output.txt 2>&1 || EXIT_CODE=$?
          cat output.txt
          echo "deploy_exit_code=$EXIT_CODE" >> "$GITHUB_OUTPUT"
          exit $EXIT_CODE
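The continue-on-error: true setting lets the workflow proceed to trace collection and SRE agent analysis after a failure. Later steps gate on steps.deploy.outcome, which holds the step's result before continue-on-error is applied, so it still reads failure when the deploy fails.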
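If the inline shell grows unwieldy, the same capture-and-propagate logic can live in a small Python helper. The following is a minimal sketch, not part of Monocle or Okahu: `run_and_record` is a hypothetical name, while `output.txt` and the `deploy_exit_code` key are taken from the step above.

```python
import os
import subprocess


def run_and_record(cmd, log_path="output.txt", github_output=None):
    """Run cmd, save combined stdout/stderr to log_path, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
    # Echo the log so it also shows up in the job output, like `cat output.txt`.
    print(result.stdout, end="")

    # GITHUB_OUTPUT is provided by the Actions runner; when it is absent
    # (for example on a laptop) the step output is simply not recorded.
    github_output = github_output or os.environ.get("GITHUB_OUTPUT")
    if github_output:
        with open(github_output, "a", encoding="utf-8") as out:
            out.write(f"deploy_exit_code={result.returncode}\n")
    return result.returncode
```

A wrapper script would end with `sys.exit(run_and_record([...]))` so the step still reports failure and `steps.deploy.outcome` is set accordingly.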
Step 3: Query Okahu for traces
      - name: Resolve traces for this run
        if: steps.deploy.outcome == 'failure'
        run: |
          # Wait for trace ingestion
          sleep 10
          # Query by GitHub run ID (URL built in two parts so that no
          # stray whitespace from line continuations ends up in it)
          TRACES_URL="${API_BASE}/api/v1/apps/${APP}/traces"
          TRACES_URL="${TRACES_URL}?duration_fact=test_runs&fact_ids=github_${GITHUB_RUN_ID}"
          TRACES_RESPONSE=$(curl -s "$TRACES_URL" \
            -H "x-api-key: ${OKAHU_API_KEY}")
          TRACE_IDS=$(echo "$TRACES_RESPONSE" \
            | jq -r '.traces[]?.trace_id // empty')
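Monocle automatically tags traces with the GitHub run ID, so the fact_ids=github_${GITHUB_RUN_ID} filter returns only the traces from this run. The sleep 10 gives Okahu time to ingest them before the query.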
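The query and the `jq` filter can be mirrored in Python, which makes the URL construction and the "no traces yet" case easier to unit test. This is a sketch under assumptions: the endpoint path, query parameters, and the `traces[].trace_id` response shape are copied from the step above, and the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def build_traces_url(api_base, app, run_id):
    """Build the trace query URL for one GitHub run, mirroring the curl call."""
    base = api_base.rstrip("/")
    app_segment = quote(app, safe="")  # escape the app name as a path segment
    params = urlencode({
        "duration_fact": "test_runs",
        "fact_ids": f"github_{run_id}",
    })
    return f"{base}/api/v1/apps/{app_segment}/traces?{params}"


def extract_trace_ids(response_text):
    """Return trace IDs from the response body, or [] if none are present."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    traces = payload.get("traces") or []
    # Skip entries without a usable trace_id, like `// empty` in the jq filter.
    return [t["trace_id"] for t in traces
            if isinstance(t, dict) and t.get("trace_id")]
```

An empty list from `extract_trace_ids` is a reasonable signal to wait and query again rather than proceed with no traces.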
Step 4: Call the Kahu SRE Agent
      - name: Call Kahu SRE Agent
        if: steps.deploy.outcome == 'failure'
        run: |
          QUERY="Investigate deploy failure for this CI/CD run. \
          For each trace, look at the spans in detail, especially \
          the data.output events which contain the full console output \
          and error messages from each deployment step."
          RESPONSE=$(curl -s -X POST "${SRE_URL}" \
            -H "Content-Type: application/json" \
            -H "x-api-key: ${OKAHU_API_KEY}" \
            -d "$(jq -n --arg q "$QUERY" '{query: $q}')")
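For reference, the same request can be assembled with the Python standard library. This sketch only builds the request object; the `{"query": ...}` payload and the headers are copied from the curl call above, and `build_sre_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_sre_request(sre_url, api_key, query):
    """Build (but do not send) the POST request for the SRE agent."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        sre_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
    )
```

Passing the result to `urllib.request.urlopen` with a `timeout` sends it; using `json.dumps` plays the same role as `jq -n --arg`, keeping quotes and newlines in the query correctly escaped.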
Step 5: Create GitHub issue with analysis
      - name: Create GitHub Issue with Kahu response
        if: steps.deploy.outcome == 'failure'
        run: |
          gh issue create \
            --title "Deploy Failure: deploy_app.py - $(date -u)" \
            --body-file issue_body.md
The issue body is read from issue_body.md, which holds the Kahu SRE Agent's analysis: a structured summary with root cause, affected steps, and recommended fixes.
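The steps shown here do not include how issue_body.md is written, so the following is only an illustrative sketch. It assumes the agent's analysis is available as plain text and treats it as opaque; `render_issue_body` and the Markdown layout are hypothetical, not the format Okahu produces.

```python
def render_issue_body(analysis, run_id, trace_ids):
    """Assemble Markdown for issue_body.md from the agent's analysis text."""
    lines = [
        f"## Deploy failure analysis (run {run_id})",
        "",
        # Fall back to a placeholder so the issue is never created empty.
        analysis.strip() or "_No analysis returned._",
        "",
        "### Traces",
    ]
    if trace_ids:
        lines.extend(f"- `{tid}`" for tid in trace_ids)
    else:
        lines.append("_No traces found for this run._")
    return "\n".join(lines) + "\n"
```

Writing the returned string to issue_body.md before the `gh issue create` step keeps the run ID and trace IDs next to the analysis, which helps when several failed runs open issues on the same day.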
No manual investigation needed
The full loop runs automatically: deploy → fail → collect traces → AI analysis → GitHub issue with root cause. The on-call engineer gets an issue with actionable analysis, not just a stack trace.