Portfolio Dashboard¶

Executive command center for monitoring all process definitions at scale

Overview¶

The Portfolio Dashboard is your strategic command center for managing and monitoring your entire Camunda process landscape. Unlike feature-specific views that focus on individual processes or instances, the Portfolio Dashboard provides a bird's-eye view of your complete BPM ecosystem, enabling executives, operations managers, and process owners to understand portfolio-wide health, identify critical issues instantly, and track performance trends over time.

The Challenge: Managing Process Portfolios at Scale¶

Organizations running Camunda in production typically manage dozens or hundreds of process definitions across multiple versions, environments, and business domains. Without portfolio-level visibility, teams face:

❌ No unified view - Must check each process individually to understand overall health
❌ Reactive firefighting - Discover critical issues only after escalation
❌ Resource inefficiency - Can't identify where to focus optimization efforts
❌ Blind spot on trends - Miss patterns that indicate systemic problems
❌ Lack of executive metrics - No way to report business velocity and operational excellence

The Portfolio Dashboard solves these problems by aggregating, analyzing, and visualizing metrics across your entire process portfolio in a single, actionable interface.

Key Capabilities¶

📊 At-a-Glance Executive KPIs¶

Eight critical metrics provide instant understanding of portfolio health:

Operational Metrics:

Total Processes - Number of unique process definitions being monitored
Active Instances - Real-time count of currently running process instances across all definitions
Open Incidents - Total unresolved incidents indicating immediate operational issues
Started This Month - New instances launched in the current calendar month

Business Velocity:

Started 30d - Instances started in the last 30 days (throughput indicator)
Success Rate - Percentage of instances completed without incidents in last 30 days (quality metric)
Avg Completion Time - Historical average duration across all completed instances (efficiency benchmark)
Processes At Risk - Count of process definitions with one or more open incidents (risk exposure)

These KPIs answer the fundamental questions:

How much work are we processing? (Active Instances, Started metrics)
How well are we performing? (Success Rate, Completion Time)
What needs immediate attention? (Open Incidents, Processes At Risk)

🚨 Critical Attention Module¶

The "Critical Attention Required" section automatically identifies the top 5 most problematic processes based on a weighted score combining:

Incident count - Absolute number of open incidents
Incident rate - Percentage of active instances with incidents

This intelligent prioritization ensures teams focus optimization efforts where they'll have the greatest impact. Each entry shows:

Process key (clickable to Journey Monitoring)
Number of open incidents
Number of active instances
Incident rate percentage

Empty state: When no incidents exist, the section displays "All Systems Operational" with a green checkmark, giving stakeholders instant confidence.
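The exact weighting behind the score is not documented here. As a rough sketch of how such a ranking could work, the following assumes hypothetical equal weights and normalizes the incident count against the largest count in the portfolio; `ProcessStats`, `COUNT_WEIGHT`, `RATE_WEIGHT`, and `top_issues` are illustrative names, not part of the product.

```python
from dataclasses import dataclass

# Hypothetical weights: the dashboard's real weighting is not documented here.
COUNT_WEIGHT = 0.5
RATE_WEIGHT = 0.5


@dataclass
class ProcessStats:
    process_key: str
    open_incidents: int
    active_instances: int

    @property
    def incident_rate(self) -> float:
        """Open incidents as a percentage of active instances."""
        if self.active_instances == 0:
            return 0.0
        return 100.0 * self.open_incidents / self.active_instances


def top_issues(stats: list[ProcessStats], limit: int = 5) -> list[ProcessStats]:
    """Rank processes with open incidents by a weighted score, worst first."""
    with_incidents = [s for s in stats if s.open_incidents > 0]
    if not with_incidents:
        return []  # corresponds to the "All Systems Operational" empty state
    max_count = max(s.open_incidents for s in with_incidents)

    def score(s: ProcessStats) -> float:
        # Scale the absolute count to 0-100 so both terms share one range.
        normalized_count = 100.0 * s.open_incidents / max_count
        return COUNT_WEIGHT * normalized_count + RATE_WEIGHT * min(s.incident_rate, 100.0)

    return sorted(with_incidents, key=score, reverse=True)[:limit]
```

Normalizing the count keeps a single high-volume process from dominating purely on absolute numbers, which is one plausible reason to combine both inputs.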

📈 Portfolio KPIs Table¶

Comprehensive metrics table for every process definition in your portfolio:

Metric	Description	Use Case
Process Key	Definition identifier (clickable)	Navigate to details
Ver.	Latest version number	Version tracking
Active	Currently running instances	Real-time workload
Month	Started this month	Current month activity
30d Start	Started last 30 days	Recent throughput
30d Done	Completed last 30 days	Completion throughput
Compl. %	Completion rate (Done/Start × 100)	Process efficiency
Daily	Average completions per day	Steady-state capacity
Total	All-time completed instances	Historical volume
Avg Duration	Mean completion time	Performance baseline
Incidents	Current open incidents	Real-time problems
Inc. Rate	Incidents/Active × 100	Problem density
Success %	Completed without incidents (30d)	Quality metric
Health	Composite health score (0-100)	Overall status

🔍 Process Health Matrix¶

Visual grid providing rapid health assessment of all processes:

Three Health States:

🟢 Healthy (Green)

No open incidents
Has active instances
Operating normally

⚪ Idle (Gray)

No open incidents
Zero active instances
Recently inactive

🔴 Incidents (Red)

One or more open incidents
Requires attention
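The three states reduce to two checks, which can be sketched as follows. The function name and the returned labels are illustrative assumptions; only the rules come from the descriptions above.

```python
def health_state(open_incidents: int, active_instances: int) -> str:
    """Map a process to one of the three matrix states described above."""
    if open_incidents > 0:
        return "incidents"  # red: any open incident takes priority
    if active_instances > 0:
        return "healthy"    # green: running with no open incidents
    return "idle"           # gray: nothing running, nothing broken
```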

Each tile shows:

Process key
Health status indicator
Active instances count
Instances started this month

Use Case: Quickly scan the portfolio to identify patterns (e.g., "Why are 5 processes idle?" or "Which healthy processes need capacity planning?")

📉 12-Month Trend Analysis¶

Interactive ApexCharts line chart plotting historical trends:

Dual Metrics:

Blue Line: Instances Started per month
Red Line: Incidents Created per month

Insights Enabled:

Seasonal Patterns - Identify monthly workload variations (e.g., month-end spikes)
Trend Detection - Spot increasing incident rates over time
Release Impact - Correlate deployments with stability changes
Capacity Planning - Forecast based on growth trends
Performance Degradation - Catch declining quality before crisis

Interactive Controls:

Hover to see exact values
Legend toggle to focus on single metric
Responsive to dark mode theme changes

🔄 Auto-Loading Architecture¶

The dashboard uses intelligent lazy loading to optimize performance:

Initial Page Load:

Renders shell immediately
Shows loading skeletons for data sections
No blocking database queries

Progressive Loading:

Overview data loads when summary cards enter viewport
Trends data loads when chart section enters viewport
Refresh button reloads all data on-demand

Benefits:

Fast perceived performance - Page appears instantly
Reduced server load - Only loads data when viewed
Responsive updates - Refresh specific sections independently

Understanding the Metrics¶

Time Scopes Explained¶

The dashboard uses four distinct time scopes for different purposes:

Real-Time (Current Moment)

Active Instances
Open Incidents
Incident Rate
Health Score

These reflect right now - the live state of your portfolio.

Last 30 Days (Rolling Window)

Started 30d
Completed 30d
Completion Rate
Success Rate

These track recent performance - how well you've been doing lately.

Current Month (Calendar Month)

Started This Month

Tracks this month's activity - useful for monthly reporting.

All-Time (Historical)

Total Completed
Average Duration

Provides long-term context - your complete history.
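The difference between the rolling and calendar windows is easiest to see with concrete boundaries. The sketch below assumes UTC timestamps and a month that starts at midnight on the 1st; how the dashboard actually handles time zones is not specified here.

```python
from datetime import datetime, timedelta, timezone


def rolling_30d_start(now: datetime) -> datetime:
    """Start of the rolling window: exactly 30 days before `now`."""
    return now - timedelta(days=30)


def calendar_month_start(now: datetime) -> datetime:
    """Start of the calendar-month window: midnight on the 1st of the month."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


now = datetime(2025, 11, 1, 10, 30, tzinfo=timezone.utc)
print(rolling_30d_start(now))     # 2025-10-02 10:30:00+00:00
print(calendar_month_start(now))  # 2025-11-01 00:00:00+00:00
```

Early in a month the calendar window covers only a few hours or days, so "Started This Month" can be far smaller than "Started 30d" even when throughput is steady.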

Calculated Metrics¶

Completion Rate

Completion Rate = (Completed Last 30d / Started Last 30d) × 100

Values below 80% indicate backlog buildup. The rate can exceed 100% when instances started before the 30-day window complete within it.

Success Rate

Success Rate = ((Completed - Instances with Incidents) / Completed) × 100

Target: Above 95% for production-grade processes.

Incident Rate

Incident Rate = (Open Incidents / Active Instances) × 100

Any value above 5% warrants investigation.

Throughput (Daily)

Daily Throughput = Completed Last 30d / 30

Measures steady-state capacity.
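The four ratio formulas above can be sketched in a few lines. Returning 0.0 for a zero denominator is an assumption made for this example; the function names are illustrative. The sample inputs are the `order-fulfillment` figures from the JSON example in the API Reference.

```python
def pct(numerator: float, denominator: float) -> float:
    """Percentage that returns 0.0 instead of dividing by zero (assumed behavior)."""
    return 100.0 * numerator / denominator if denominator else 0.0


def completion_rate(completed_30d: int, started_30d: int) -> float:
    return pct(completed_30d, started_30d)


def success_rate(completed_30d: int, completed_with_incidents: int) -> float:
    return pct(completed_30d - completed_with_incidents, completed_30d)


def incident_rate(open_incidents: int, active_instances: int) -> float:
    return pct(open_incidents, active_instances)


def daily_throughput(completed_30d: int) -> float:
    return completed_30d / 30


print(round(completion_rate(845, 892), 1))  # 94.7
print(round(incident_rate(2, 47), 1))       # 4.3
print(round(daily_throughput(845), 1))      # 28.2
```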

Health Score

Health Score = 100 
  - (Open Incidents × 15)
  - (Recently Active but Now Idle ? 20 : 0)

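Composite metric factoring incidents and activity. Range: 0-100; because seven or more open incidents would push the raw value below zero, the result is presumably floored at 0.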
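A minimal sketch of the formula, assuming the result is clamped to the stated 0-100 range. The clamping and the function name are assumptions, since the formula alone can go negative.

```python
def health_score(open_incidents: int, recently_active_now_idle: bool) -> int:
    """Health score per the formula above, clamped to 0-100 (assumed)."""
    raw = 100 - 15 * open_incidents - (20 if recently_active_now_idle else 0)
    return max(0, min(100, raw))


print(health_score(0, False))  # 100
print(health_score(2, False))  # 70
print(health_score(0, True))   # 80
print(health_score(7, False))  # 0
```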

Using the Portfolio Dashboard¶

For Executives¶

Morning Check (2 minutes):

Open Portfolio Dashboard
Review KPI cards at top
Check "Critical Attention Required" section
If all green → Confirm "Processes At Risk" count is 0
If issues exist → Delegate to operations team with process keys

Key Questions Answered:

Is our BPM platform healthy overall?
How much business are we processing?
Are we meeting quality targets?
Where should we invest improvement efforts?

For Operations Managers¶

Daily Workflow:

Morning Standup Prep
Review "Critical Attention Required"
Note top 3 processes by incident count
Check Success Rate trend
Incident Triage
Click processes with incidents
Navigate to Journey Monitoring for root cause
Assign fixes to team
Capacity Planning
Review "Daily" column (throughput)
Compare "Active" to historical baseline
Identify processes approaching capacity
Trend Analysis
Check 12-month chart weekly
Look for degrading trends
Correlate with deployments

For Process Owners¶

Weekly Review:

Find Your Processes
Scan KPIs table for your process keys
Sort by "Health" to find issues
Performance Assessment
Compare "Avg Duration" to target SLAs
Check "Success Rate" against goal (typically 95%+)
Review "Completion Rate" for efficiency
Optimization Opportunities
Low "Daily Throughput" + High "Active Instances" = Bottleneck
High "Avg Duration" = Optimization candidate
Low "Completion Rate" = Investigation needed

For SRE/DevOps Teams¶

Integration Scenarios:

Prometheus/Grafana:

# prometheus.yml: scrape portfolio metrics
scrape_configs:
  - job_name: 'camunda-portfolio'
    metrics_path: '/portfolio/overview/metrics'
    scrape_interval: 5m
    static_configs:
      - targets: ['champa:5000']

Alerting Rules:

# Prometheus alerting rules (load this file via rule_files in prometheus.yml)
groups:
  - name: camunda-portfolio
    rules:
      - alert: HighIncidentRate
        expr: camunda_process_incident_rate > 10
        annotations:
          summary: "Process {{ $labels.process }} has high incident rate"

      - alert: ProcessAtRisk
        expr: camunda_portfolio_processes_at_risk > 3
        annotations:
          summary: "Multiple processes require attention"

Dashboard Integration: Create a Grafana dashboard that combines Portfolio metrics with infrastructure metrics for correlation analysis.

Prometheus Integration¶

All Portfolio KPIs are exposed via Prometheus endpoint: /portfolio/overview/metrics

Available Metrics¶

Portfolio-Level Aggregates:

# Portfolio-wide totals
camunda_portfolio_total_processes
camunda_portfolio_active_instances  
camunda_portfolio_open_incidents
camunda_portfolio_started_last_30_days
camunda_portfolio_completed_last_30_days
camunda_portfolio_processes_at_risk

Per-Process Metrics:

# Labels: process, version
camunda_process_active_instances{process="order-fulfillment",version="3"}
camunda_process_open_incidents{process="order-fulfillment",version="3"}
camunda_process_started_last_30_days{process="order-fulfillment",version="3"}
camunda_process_completed_last_30_days{process="order-fulfillment",version="3"}
camunda_process_started_this_month{process="order-fulfillment",version="3"}
camunda_process_success_rate_30d{process="order-fulfillment",version="3"}
camunda_process_health_score{process="order-fulfillment",version="3"}
camunda_process_incident_rate{process="order-fulfillment",version="3"}
camunda_process_avg_duration_seconds{process="order-fulfillment",version="3"}
camunda_process_total_completed_all_time{process="order-fulfillment",version="3"}
camunda_process_completion_rate_30d{process="order-fulfillment",version="3"}
camunda_process_throughput_30d{process="order-fulfillment",version="3"}
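For quick scripting without a Prometheus server, the text output can be parsed directly. The sketch below handles only simple `name{labels} value` lines and ignores escaped quotes and other corner cases of the exposition format; the sample input is illustrative, not captured from a real endpoint.

```python
import re

# Matches lines such as: name{label="value",other="x"} 12.5
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple Prometheus text-format samples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Illustrative input only.
sample = """\
# TYPE camunda_portfolio_open_incidents gauge
camunda_portfolio_open_incidents 12
camunda_process_incident_rate{process="order-fulfillment",version="3"} 4.3
"""
for name, labels, value in parse_metrics(sample):
    print(name, labels, value)
```

For anything beyond ad hoc inspection, a maintained Prometheus client library is the safer choice.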

Example PromQL Queries¶

Portfolio Health Check:

# Alert when portfolio has critical incident load
camunda_portfolio_open_incidents > 50

# Alert when too many processes at risk
camunda_portfolio_processes_at_risk > 5

# Portfolio-wide success rate (per-process rates weighted by completions)
sum(camunda_process_success_rate_30d * camunda_process_completed_last_30_days)
/ sum(camunda_process_completed_last_30_days)

Process Performance:

# Top 5 processes by incident rate
topk(5, camunda_process_incident_rate)

# Processes with a low health score (below 60)
camunda_process_health_score < 60

# Unweighted mean of per-process average completion times
avg(camunda_process_avg_duration_seconds)

# Throughput leaders
topk(10, camunda_process_throughput_30d)

Capacity & Load:

# Portfolio-wide capacity utilization
sum(camunda_process_active_instances) / 
sum(camunda_process_throughput_30d * 30)

# Processes nearing capacity (more than 10 days of in-flight work at current daily throughput)
camunda_process_active_instances / 
camunda_process_throughput_30d > 10
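The last ratio can be read as "days of work in flight". The sketch below assumes `camunda_process_throughput_30d` is the daily throughput defined earlier (completions in the last 30 days divided by 30); the zero-throughput handling and the function name are choices made for this example.

```python
import math


def days_in_flight(active_instances: int, daily_throughput: float) -> float:
    """Active instances divided by average completions per day."""
    if daily_throughput <= 0:
        # Nothing completes, so active work never drains at the current rate.
        return math.inf if active_instances > 0 else 0.0
    return active_instances / daily_throughput


# order-fulfillment sample: 47 active, 845 completed in 30 days
print(round(days_in_flight(47, 845 / 30), 2))  # 1.67
```

A value of 1.67 is well under the threshold of 10 used in the query, so this process would not be flagged.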

Common Patterns & Use Cases¶

Pattern 1: Morning Health Check¶

Scenario: Daily standup prep

Workflow:

Open Portfolio Dashboard
Check "Open Incidents" KPI
If 0 → "All clear, team!"
If >0 → Review "Critical Attention Required"
Note top 3 problem processes
Click process key → Navigate to details
Assign incidents to team members

Time: 2-3 minutes

Pattern 2: Capacity Planning¶

Scenario: Quarterly capacity review

Workflow:

Review "12-Month Trend" chart
Identify growth rate in instances started
Calculate projected growth:
Current: 10,000/month
Growth: +15%/quarter
Projection: 11,500/month next quarter
Check "Active Instances" across all processes
Compare to infrastructure capacity
Plan scaling if projected load exceeds 80% capacity
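The projection step can be sketched as a short calculation. The capacity figure of 14,000 instances per month is a hypothetical value for illustration, and the function names are not part of the product.

```python
def project_monthly_volume(current: float, quarterly_growth: float, quarters: int = 1) -> float:
    """Compound the current monthly volume by a per-quarter growth rate."""
    return current * (1 + quarterly_growth) ** quarters


def needs_scaling(projected: float, capacity: float, threshold: float = 0.80) -> bool:
    """True when projected load exceeds the given fraction of capacity."""
    return projected > threshold * capacity


projected = project_monthly_volume(10_000, 0.15)
print(round(projected))                  # 11500
print(needs_scaling(projected, 14_000))  # True: 11,500 exceeds 80% of 14,000 (11,200)
```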

Time: 15 minutes

Pattern 3: Release Impact Analysis¶

Scenario: Post-deployment validation

Workflow:

Note date of deployment
Check "Success Rate" before vs after
Review "12-Month Trend" for incident spike
Check specific deployed process in KPIs table
If incident rate increased → Rollback or hotfix

Decision Point:

Success Rate dropped >5% → Investigate immediately
Incident Rate >10% on new process → Consider rollback
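The two decision rules can be expressed as a small check. This sketch reads "dropped >5%" as a drop of more than 5 percentage points, which is an interpretation; the function name is illustrative.

```python
def release_actions(success_before: float, success_after: float,
                    incident_rate_after: float) -> list[str]:
    """Apply the two decision-point rules to before/after measurements (percent)."""
    actions = []
    if success_before - success_after > 5:
        actions.append("investigate immediately")
    if incident_rate_after > 10:
        actions.append("consider rollback")
    return actions


print(release_actions(97.5, 91.0, 12.0))  # ['investigate immediately', 'consider rollback']
print(release_actions(97.5, 96.0, 4.3))   # []
```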

Time: 5 minutes

Pattern 4: Executive Reporting¶

Scenario: Monthly board report

Workflow:

Export Prometheus metrics to spreadsheet
Calculate month-over-month changes:
Instances processed: +12%
Success rate: 96.5% (target: 95%)
Processes at risk: 2 (down from 5)
Avg completion time: 2.3 hours (improved from 2.8)
Create PowerPoint slides with trend chart
Highlight achievements and areas of concern
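Month-over-month figures like those above are plain relative changes. A minimal sketch, using the illustrative numbers from the workflow:

```python
def mom_change(previous: float, current: float) -> float:
    """Relative month-over-month change in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return 100.0 * (current - previous) / previous


print(round(mom_change(2.8, 2.3), 1))  # -17.9 (avg completion time, hours)
print(round(mom_change(5, 2), 1))      # -60.0 (processes at risk)
```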

Time: 30 minutes monthly

Pattern 5: Process Portfolio Rationalization¶

Scenario: Annual process review

Workflow:

Sort KPIs table by "Active Instances" ascending
Identify processes with 0 active for 30+ days
Check "Total Completed" to confirm legacy status
Review "Health Matrix" for idle processes
Propose decommissioning candidates
Document business justification for remaining idle processes

Criteria for Decommissioning:

Zero active instances for 90 days
Low total completed (< 100)
No future business need
Replaced by newer process
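The measurable criteria can be pre-filtered in code, leaving the business judgment (future need, replacement) to reviewers. This sketch assumes the numeric criteria must all hold together; `days_since_last_activity` is a hypothetical field that the overview API shown in this document does not provide.

```python
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    process_key: str
    active_instances: int
    days_since_last_activity: int  # hypothetical field, not in the overview API
    total_completed_all_time: int


def decommission_candidates(records: list[ProcessRecord]) -> list[str]:
    """Return keys that meet the measurable criteria; the rest needs human review."""
    return [
        r.process_key
        for r in records
        if r.active_instances == 0
        and r.days_since_last_activity >= 90
        and r.total_completed_all_time < 100
    ]
```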

Time: 2-4 hours annually

Troubleshooting¶

Issue: KPI Cards Show "Loading" Forever¶

Symptoms: Cards display skeleton animation indefinitely

Causes:

Database connectivity issue
Authentication token expired
API endpoint returning error

Solutions:

Open browser developer console (F12)
Check Network tab for failed requests
Check Console tab for JavaScript errors
Refresh page (Ctrl+R)
Verify authentication status
Contact administrator if issue persists

Issue: Metrics Seem Outdated¶

Symptoms: Numbers don't match current reality

Causes:

Looking at cached data (5-minute TTL)
Recent incident not yet reflected

Solutions:

Wait 5 minutes for cache to expire
Click "Refresh" button to reload
Check "Last Updated" timestamp
Verify database replication lag (distributed systems)
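The 5-minute TTL explains why a freshly created incident may not appear until the cache expires or "Refresh" is clicked. The sketch below illustrates that behavior in general terms; it is not the dashboard's implementation, and `TTLCache` with its injectable clock is a hypothetical helper.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Cache one loader result and reload it once `ttl_seconds` have passed."""

    def __init__(self, loader: Callable[[], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._loaded_at: Optional[float] = None

    def get(self, force_refresh: bool = False) -> Any:
        """Return the cached value, reloading if expired or explicitly forced."""
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if force_refresh or expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Calling `get(force_refresh=True)` plays the role of the "Refresh" button: it bypasses the TTL and reloads immediately.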

Issue: Health Matrix Shows Everything as Idle¶

Symptoms: All processes gray despite activity

Causes:

Database query issue
Timezone mismatch
No recent process starts

Solutions:

Check "Active Instances" in KPI cards (should be >0)
Verify instances are actually running in Camunda Cockpit
Check database query filters for time zones
Review process deployment status

Issue: Chart Not Rendering¶

Symptoms: Empty space where chart should be

Causes:

JavaScript error
No trend data available
Dark mode rendering bug

Solutions:

Check browser console for errors
Verify at least 1 month of historical data exists
Try toggling dark mode
Disable browser extensions
Try different browser

Best Practices¶

Dashboard Review Cadence¶

Daily (5 minutes):

Check "Open Incidents" count
Review "Critical Attention Required"
Triage any red-flagged processes

Weekly (15 minutes):

Scan entire KPIs table for anomalies
Review "Success Rate" trends
Check "Health Matrix" for new idle processes
Note any declining health scores

Monthly (1 hour):

Analyze "12-Month Trend" chart
Calculate month-over-month metrics
Review capacity projections
Generate executive report

Quarterly (4 hours):

Full portfolio health assessment
Process rationalization review
Capacity planning
SLA compliance review

Metric Interpretation¶

Success Rate:

98-100%: Excellent
95% to <98%: Good (target range)
90% to <95%: Acceptable but monitor
<90%: Investigation required

Incident Rate:

0-2%: Healthy
>2-5%: Acceptable
>5-10%: Monitor closely
>10%: Critical - immediate action

Health Score:

90-100: Excellent health
70-89: Good, minor issues
50-69: Fair, attention needed
<50: Poor, critical issues

Completion Rate:

90% and above: Normal
70% to <90%: Backlog building
50% to <70%: Significant backlog
<50%: Capacity crisis
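The "higher is better" bands can be applied with one helper, treating each band's lower bound as inclusive (an interpretation for fractional values). Incident Rate runs in the opposite direction and is left out of this sketch. The names below are illustrative.

```python
def band(value: float, bands: list[tuple[float, str]], fallback: str) -> str:
    """Return the label of the first band whose lower bound `value` meets."""
    for lower_bound, label in bands:
        if value >= lower_bound:
            return label
    return fallback


# Lower bounds in descending order, taken from the lists above.
SUCCESS_RATE = [(98, "excellent"), (95, "good"), (90, "acceptable but monitor")]
HEALTH_SCORE = [(90, "excellent"), (70, "good"), (50, "fair")]
COMPLETION_RATE = [(90, "normal"), (70, "backlog building"), (50, "significant backlog")]

print(band(97.5, SUCCESS_RATE, "investigation required"))  # good
print(band(85, HEALTH_SCORE, "poor"))                      # good
print(band(94.7, COMPLETION_RATE, "capacity crisis"))      # normal
```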

Alerting Thresholds¶

Critical (Immediate Response):

Open Incidents > 50
Processes At Risk > 5
Any process Incident Rate > 20%
Portfolio Success Rate < 90%

Warning (Within 24 Hours):

Processes At Risk > 2
Any process Incident Rate > 10%
Any process Health Score < 50
Completion Rate < 80%

Info (Weekly Review):

Any process idle >7 days
Success Rate declined >2% week-over-week
Throughput increased >50% week-over-week
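The Critical and Warning tiers can be checked from a single snapshot of the metrics, as sketched below. The Info tier needs week-over-week history and is omitted. The function and parameter names are illustrative; `max_incident_rate` and `min_health_score` stand for the worst per-process values, and `completion_rate` is whichever completion rate is being checked.

```python
def portfolio_severity(open_incidents: int, processes_at_risk: int,
                       max_incident_rate: float, success_rate: float,
                       min_health_score: float, completion_rate: float) -> str:
    """Return the highest severity whose threshold is crossed."""
    if (open_incidents > 50 or processes_at_risk > 5
            or max_incident_rate > 20 or success_rate < 90):
        return "critical"
    if (processes_at_risk > 2 or max_incident_rate > 10
            or min_health_score < 50 or completion_rate < 80):
        return "warning"
    return "ok"


# 5 processes at risk crosses the warning threshold (> 2) but not critical (> 5).
print(portfolio_severity(12, 5, 4.3, 96.2, 85, 92.5))  # warning
```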

API Reference¶

REST Endpoints¶

Endpoint	Method	Description
`/portfolio`	GET	Portfolio dashboard page
`/portfolio/api/overview`	GET	KPIs and process metrics (JSON)
`/portfolio/api/trends`	GET	12-month trend data (JSON)
`/portfolio/overview/metrics`	GET	Prometheus metrics (text)

JSON Response Format¶

Overview API:

{
  "success": true,
  "overview": [
    {
      "process_key": "order-fulfillment",
      "latest_version": 3,
      "total_completed_all_time": 15420,
      "avg_duration_seconds": 8640,
      "started_last_30_days": 892,
      "completed_last_30_days": 845,
      "started_this_month": 312,
      "active_instances": 47,
      "open_incidents": 2,
      "success_rate_30d": 97.5,
      "health_score": 85,
      "incident_rate": 4.3
    }
  ],
  "summary_stats": {
    "total_processes": 43,
    "active_instances": 1234,
    "open_incidents": 12,
    "started_this_month": 5678,
    "started_last_30_days": 8901,
    "completed_last_30_days": 8234,
    "processes_at_risk": 5,
    "avg_completion_time": 7200,
    "success_rate_30d": 96.2
  },
  "top_issues": [...],
  "timestamp": "2025-11-01T10:30:00Z"
}
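A consumer of this response might start along these lines. The payload is an abbreviated copy of the sample above, embedded so the snippet runs without a server; `processes_with_incidents` is an illustrative helper, not part of the API.

```python
import json

# Abbreviated copy of the sample payload above.
payload = json.loads("""
{
  "success": true,
  "overview": [
    {"process_key": "order-fulfillment", "active_instances": 47,
     "open_incidents": 2, "incident_rate": 4.3}
  ],
  "summary_stats": {"total_processes": 43, "open_incidents": 12,
                    "processes_at_risk": 5}
}
""")


def processes_with_incidents(data: dict) -> list[str]:
    """Keys of all processes in the overview that have open incidents."""
    if not data.get("success"):
        raise RuntimeError("overview API reported failure")
    return [p["process_key"] for p in data["overview"] if p["open_incidents"] > 0]


print(processes_with_incidents(payload))              # ['order-fulfillment']
print(payload["summary_stats"]["processes_at_risk"])  # 5
```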

FAQ¶

Q: How often does the dashboard update?
A: Data is cached with a 5-minute TTL, so figures can be up to 5 minutes old. Click the "Refresh" button for an immediate update.

Q: Can I export the KPI table?
A: Not directly in the UI. Use Prometheus metrics endpoint and export to CSV, or copy-paste from table.

Q: What's the difference between "Started This Month" and "Started 30d"?
A: "This Month" uses calendar month (resets monthly). "30d" is rolling 30-day window (more consistent for trends).

Q: Why is Success Rate different from (100% - Incident Rate)?
A: Success Rate = completed without incidents (quality). Incident Rate = active with incidents (current problems). Different denominators.

Q: Can I filter to specific processes?
A: Not currently. Workaround: Use browser search (Ctrl+F) to find process in table, or use Prometheus metrics with PromQL filters.

Q: What causes Health Score to be low with no incidents?
A: Process might be "Recently active but now idle" (penalty of 20 points). Indicates potential issue even without open incidents.

Q: How do I track improvement over time?
A: Use Prometheus integration to store metrics in time-series database, then create Grafana dashboards showing trends.

Q: Can multiple teams use different Portfolio views?
A: Not currently. All users see same portfolio. Recommendation: Use Grafana with filtered views per team.

Summary¶

The Portfolio Dashboard transforms BPM operations from reactive firefighting to proactive management:

✅ Unified visibility across entire process landscape
✅ Intelligent prioritization of issues requiring attention
✅ Executive-ready KPIs for business reporting
✅ Trend analysis for strategic planning
✅ Prometheus integration for enterprise monitoring
✅ Performance-optimized with smart caching

By providing the right metrics at the right level of detail, the Portfolio Dashboard enables informed decision-making and operational excellence at scale.