
System Architecture

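Champa Intelligence is a Flask application served by Gunicorn. It reads process data from a Camunda PostgreSQL database, keeps its own state in a separate PostgreSQL system database, and uses Redis for sessions and caching. This page describes its components, request flows, deployment topologies, and security model.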


Architecture Overview

graph TB
    subgraph "Client Layer"
        A[Web Browser]
        B[Alpine.js Framework]
        C[Tailwind CSS]
        D[BPMN.js / DMN.js]
    end

    subgraph "Application Layer"
        E[Gunicorn WSGI Server]
        F[Flask Application]
        G[Blueprint Architecture]

        G --> H1[Auth BP]
        G --> H2[Dashboard BP]
        G --> H3[Health BP]
        G --> H4[AI Analysis BP]
        G --> H5[Journey BP]
        G --> H6[Diff Tool BP]
        G --> H7[Portfolio BP]
        G --> H8[Linter BP]
    end

    subgraph "Data Layer"
        I[(Camunda DB<br/>PostgreSQL)]
        J[(System DB<br/>PostgreSQL)]
        K[Redis Cache]
    end

    subgraph "External Services"
        L[Google Gemini AI]
        M[Camunda REST API]
        N[Prometheus/Grafana]
    end

    A --> B
    B --> E
    E --> F
    F --> G

    F --> I
    F --> J
    F --> K
    F --> L
    F --> M
    F --> N

Architectural Principles

1. Separation of Concerns

Blueprint Architecture:

  • Each feature is a self-contained Flask Blueprint
  • Clean separation between auth, analytics, monitoring, etc.
  • Independent testing and deployment per blueprint

Database Separation:

  • Camunda DB: Read-only access to process data
  • System DB: Champa's own data (users, sessions, cache)
  • No schema pollution in customer database

2. Performance First

Multi-Level Caching:

  • Redis for hot data (sessions, query results)
  • PostgreSQL fallback for session management
  • Smart cache invalidation on deployments

Lazy Loading:

  • Dashboard components load on-demand
  • Reduces initial page load to <500ms
  • Parallel data fetching with ThreadPoolExecutor
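
The parallel fetching mentioned above can be sketched with the standard library. This is a minimal illustration, not the project's actual code: the three `fetch_*` functions are hypothetical stand-ins for the real section queries, and the sleep only simulates I/O latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time


# Hypothetical stand-ins for the real section queries; each simulates I/O latency.
def fetch_incidents(process_key):
    time.sleep(0.05)
    return {"section": "incidents", "process": process_key}


def fetch_performance(process_key):
    time.sleep(0.05)
    return {"section": "performance", "process": process_key}


def fetch_variables(process_key):
    time.sleep(0.05)
    return {"section": "variables", "process": process_key}


def load_sections(process_key, max_workers=3):
    """Run the independent fetchers concurrently and collect results by name."""
    fetchers = {
        "incidents": fetch_incidents,
        "performance": fetch_performance,
        "variables": fetch_variables,
    }
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, process_key) for name, fn in fetchers.items()}
        # result() blocks until that future finishes and re-raises any exception.
        return {name: future.result() for name, future in futures.items()}
```

Because the fetchers spend their time waiting on I/O, threads overlap the waits, so the total time approaches that of the slowest query rather than the sum of all three.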

Optimized Queries:

  • 80+ hand-crafted SQL queries
  • Strategic indexes and query planning
  • Batch operations for bulk data

3. Scalability

Horizontal Scaling:

  • Stateless application design
  • Session data in Redis (shared across instances)
  • Multiple Gunicorn workers per container

Vertical Scaling:

  • Configurable worker/thread count
  • Connection pooling for databases
  • Efficient memory management

4. Security by Design

Defense in Depth:

  • JWT-based authentication
  • Role-based access control (RBAC)
  • Audit logging for all actions
  • SQL injection prevention (parameterized queries)
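
To illustrate the parameterized-query point, the sketch below uses the standard library's `sqlite3` as a stand-in so it runs anywhere. The table is a simplified toy modeled on `act_ru_incident`, and `find_incidents` is a hypothetical helper. With psycopg2 the pattern is the same, except that the placeholder is `%s` rather than `?`.

```python
import sqlite3


def find_incidents(conn, proc_def_id):
    """Return incident ids for one process definition using a bound parameter."""
    # The value travels separately from the SQL text, so the driver never
    # interprets it as SQL, whatever characters it contains.
    cur = conn.execute(
        "SELECT id_ FROM act_ru_incident WHERE proc_def_id_ = ?",
        (proc_def_id,),
    )
    return [row[0] for row in cur.fetchall()]


# Toy in-memory table with a simplified column set (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE act_ru_incident (id_ TEXT, proc_def_id_ TEXT)")
conn.execute("INSERT INTO act_ru_incident VALUES ('inc-1', 'invoice:1:abc')")

matching = find_incidents(conn, "invoice:1:abc")
injected = find_incidents(conn, "x' OR '1'='1")
```

The second call passes a classic injection string. Because it is bound as data, it simply matches no rows.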

Data Protection:

  • Salted password hashing (PBKDF2)
  • Secure session management
  • API token lifecycle management
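
Salted PBKDF2 hashing can be sketched with `hashlib.pbkdf2_hmac`. The hash function (SHA-256), salt length, and iteration count below are assumptions for illustration. The source states only that PBKDF2 with a salt is used.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption, not taken from the source; tune to your policy


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Storing the salt and iteration count next to the digest lets the iteration count be raised later without invalidating existing hashes.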

Technology Stack

Backend

| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.12 | Core application logic |
| Framework | Flask 3.x | Web framework |
| WSGI Server | Gunicorn | Production application server |
| Database Driver | psycopg2-binary | PostgreSQL connectivity |
| Cache Client | redis | High-performance caching |
| AI SDK | google-genai | Gemini API (default) |
| HTTP Client | requests | External API calls |
| XML Parser | lxml | BPMN/DMN parsing |
| Auth | PyJWT | Token-based authentication |

Frontend

| Component | Technology | Purpose |
|---|---|---|
| JavaScript Framework | Alpine.js 3.x | Reactive UI components |
| CSS Framework | Tailwind CSS 3.x | Utility-first styling |
| BPMN Rendering | bpmn-js 18.x | Process diagram visualization |
| DMN Rendering | dmn-js 17.x | Decision table visualization |
| Charts | Chart.js 4.x | Data visualization |
| Build Tool | Webpack 5.x | Module bundling |
| Transpiler | Babel | ES6+ to ES5 |
| Documentation | MkDocs Material | Project documentation |

Infrastructure

| Component | Technology | Purpose |
|---|---|---|
| Container | Docker | Application containerization |
| Orchestration | Docker Compose | Multi-container deployment |
| Database | PostgreSQL 15+ | Data persistence |
| HA Database | Patroni + etcd | PostgreSQL high availability |
| Cache | Redis 7+ | In-memory data store |
| Web Server | Nginx | Reverse proxy, static files, docs |
| Monitoring | Prometheus + Grafana | Metrics and dashboards |

Component Architecture

Database Schema

Camunda Database (Read-Only):

-- Process Definitions
act_re_procdef
act_re_deployment

-- Process Instances
act_hi_procinst
act_ru_execution

-- Activities
act_hi_actinst

-- Variables
act_hi_varinst
act_ru_variable

-- Incidents
act_hi_incident
act_ru_incident

-- Jobs
act_hi_job_log
act_ru_job

-- User Tasks
act_hi_taskinst
act_ru_task

-- DMN
act_hi_decinst
act_re_decision_def

System Database (Champa):

-- Authentication Schema
auth.users
auth.roles
auth.permissions
auth.role_permissions
auth.sessions
auth.audit_log

-- Configuration
auth.system_config  -- linter rules, settings

Caching Strategy

graph LR
    A[Request] --> B{Cache?}
    B -->|Hit| C[Return from Cache]
    B -->|Miss| D[Query Database]
    D --> E[Store in Cache]
    E --> F[Return Data]

    G[Deployment Event] --> H[Invalidate Cache]
    H --> I[Static Data Only]

Cache Layers:

Session Cache (Redis)

  • TTL: 1 hour (normal) / 30 days (remember me)
  • Fallback: PostgreSQL
  • Purpose: User authentication
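
The session TTLs and the PostgreSQL fallback can be sketched as below. The source does not say when the fallback is used, so this sketch assumes it applies both on a Redis miss and when Redis is unreachable. `redis_get` and `postgres_get` are hypothetical callables standing in for the real lookups.

```python
SESSION_TTL_SECONDS = 60 * 60                # 1 hour (normal)
REMEMBER_ME_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days (remember me)


def session_ttl(remember_me):
    """Pick the TTL that matches the login mode."""
    return REMEMBER_ME_TTL_SECONDS if remember_me else SESSION_TTL_SECONDS


def load_session(session_id, redis_get, postgres_get):
    """Try the Redis-like lookup first, then fall back to the PostgreSQL-like one.

    Assumption: the fallback covers both a cache miss and a connection error.
    """
    try:
        session = redis_get(session_id)
    except ConnectionError:
        session = None
    if session is not None:
        return session
    return postgres_get(session_id)


def unavailable(_session_id):
    """Simulates a Redis outage for the usage example."""
    raise ConnectionError("redis unreachable")


# Usage with plain dicts standing in for both stores.
redis_sessions = {"abc": {"user": "alice"}}
postgres_sessions = {"abc": {"user": "alice"}, "def": {"user": "bob"}}

from_redis = load_session("abc", redis_sessions.get, postgres_sessions.get)
from_fallback = load_session("def", redis_sessions.get, postgres_sessions.get)
during_outage = load_session("abc", unavailable, postgres_sessions.get)
```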

Query Cache (Redis)

  • TTL: 5 min - 24 hours (data-dependent)
  • Purpose: Expensive SQL queries
  • Invalidation: Smart, per-query-type
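
The hit/miss path from the caching diagram is the cache-aside pattern. The sketch below uses a small in-memory class in place of Redis so that it runs without a server. `TTLCache`, `cached_query`, and the `query:` key prefix are hypothetical names, not the project's API.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

    def delete_prefix(self, prefix):
        """Invalidate one query type by dropping every key with this prefix."""
        for key in [k for k in self._store if k.startswith(prefix)]:
            del self._store[key]


def cached_query(cache, key, ttl_seconds, run_query):
    """Cache-aside: return the cached value on a hit, otherwise query and store.

    Note: a query result of None is never cached and is re-run every time.
    """
    value = cache.get(key)
    if value is not None:
        return value
    value = run_query()
    cache.set(key, value, ttl_seconds)
    return value
```

Giving each query type its own key prefix is one way to support per-query-type invalidation: a deployment event can drop only the prefixes that hold static data.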

AI Cache (Redis)

  • TTL: 30 min - 24 hours
  • Purpose: AI analysis components
  • Key structure: ai:{type}:{process}:{version}:{params}
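
One way to build keys with the structure above is shown below. How `{params}` is serialized is not stated in the source. This sketch assumes a short SHA-256 digest of the canonical JSON form, so that keys stay short and do not depend on parameter order. `ai_cache_key` is a hypothetical helper.

```python
import hashlib
import json


def ai_cache_key(component_type, process_key, version, params):
    """Build an 'ai:{type}:{process}:{version}:{params}' key."""
    # Sorting keys makes the digest independent of dict ordering.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"ai:{component_type}:{process_key}:{version}:{digest}"
```

Including the version in the key means a new process version naturally misses the cache instead of returning analysis built for an older model.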

Request Flow

1. Authentication Flow

sequenceDiagram
    participant Browser
    participant Flask
    participant Redis
    participant SystemDB

    Browser->>Flask: POST /auth/login
    Flask->>SystemDB: Verify credentials
    SystemDB-->>Flask: User data
    Flask->>Flask: Generate JWT
    Flask->>Redis: Store session
    Redis-->>Flask: OK
    Flask-->>Browser: Set cookie + JWT

    Browser->>Flask: GET /dashboard (with JWT)
    Flask->>Redis: Validate session
    Redis-->>Flask: Session data
    Flask->>Flask: Check permissions
    Flask-->>Browser: Dashboard HTML

2. Dashboard Load Flow

sequenceDiagram
    participant Browser
    participant Flask
    participant Redis
    participant CamundaDB

    Browser->>Flask: GET /dashboard/<key>/<v1>/<v2>
    Flask->>Redis: Check auth session
    Redis-->>Flask: Valid session
    Flask-->>Browser: Dashboard shell (HTML)

    Note over Browser: Page renders instantly

    Browser->>Flask: GET /api/dashboard/section/incidents
    Flask->>Redis: Check cache
    alt Cache Hit
        Redis-->>Flask: Cached data
    else Cache Miss
        Flask->>CamundaDB: Query incidents
        CamundaDB-->>Flask: Raw data
        Flask->>Redis: Store in cache
    end
    Flask-->>Browser: JSON data

    Note over Browser: Section updates dynamically

3. AI Analysis Flow

sequenceDiagram
    participant Browser
    participant Flask
    participant Redis
    participant CamundaDB
    participant Gemini
    participant SystemDB

    Browser->>Flask: POST /ai-analysis/api/generate
    Flask->>Redis: Check cached components

    par Parallel Data Fetch
        Flask->>CamundaDB: Query incidents
        Flask->>CamundaDB: Query performance
        Flask->>CamundaDB: Query variables
    end

    CamundaDB-->>Flask: All data
    Flask->>Redis: Cache components
    Flask->>Flask: Build prompt
    Flask->>Gemini: Generate analysis
    Gemini-->>Flask: AI response
    Flask->>SystemDB: Save to history
    Flask-->>Browser: HTML report

Deployment Architecture

Development Environment

graph TB
    subgraph "Developer Workstation"
        A[Flask Dev Server<br/>:5000]
        B[(Local PostgreSQL<br/>Camunda DB)]
        C[(Local PostgreSQL<br/>System DB)]
        D[Local Redis<br/>:6379]
        E[npm watch<br/>Frontend Build]

        A --> B
        A --> C
        A --> D
        E -.->|Hot Reload| A
    end

    style A fill:#4CAF50
    style E fill:#2196F3

Production (Single Server - Docker Compose)

graph TB
    subgraph "Docker Host"
        subgraph "Nginx Container :80/:443"
            N1[Static Files<br/>/static]
            N2[Documentation<br/>docs.champa-bpmn.com]
            N3[Reverse Proxy<br/>www.champa-bpmn.com]
        end

        subgraph "Application Container :8088"
            A[Gunicorn<br/>4 Workers]
            F[Flask Application<br/>Blueprints]
        end

        subgraph "Data Layer"
            SDB[(System DB<br/>PostgreSQL :5433)]
            R[Redis<br/>:6379]
        end

        EXT[(External Camunda DB<br/>Customer Infrastructure)]
    end

    Internet --> N1
    Internet --> N2
    Internet --> N3
    N3 --> A
    A --> F
    F --> SDB
    F --> R
    F --> EXT

    style A fill:#4CAF50
    style F fill:#66BB6A
    style N3 fill:#2196F3
    style EXT fill:#FF9800

High Availability Setup (K8s)

graph TB
    subgraph "Load Balancer Layer"
        LB[HAProxy/Nginx LB<br/>:80/:443]
    end

    subgraph "Application Layer - 3 Nodes"
        APP1[Champa Node 1<br/>Gunicorn+Flask]
        APP2[Champa Node 2<br/>Gunicorn+Flask]
        APP3[Champa Node 3<br/>Gunicorn+Flask]
    end

    subgraph "Cache Layer - Redis with Sentinel"
        subgraph "Redis Sentinel"
            RS1[Sentinel 1<br/>:26379]
            RS2[Sentinel 2<br/>:26379]
            RS3[Sentinel 3<br/>:26379]
        end

        RMASTER[Redis Master<br/>:6379]
        RSLAVE1[Redis Replica 1<br/>:6380]
        RSLAVE2[Redis Replica 2<br/>:6381]

        RS1 -.Monitor.-> RMASTER
        RS2 -.Monitor.-> RMASTER
        RS3 -.Monitor.-> RMASTER
        RMASTER -->|Replicate| RSLAVE1
        RMASTER -->|Replicate| RSLAVE2
    end

    subgraph "Database Layer - Patroni HA"
        subgraph "etcd Cluster - Distributed Consensus"
            E1[etcd Node 1<br/>:2379]
            E2[etcd Node 2<br/>:2379]
            E3[etcd Node 3<br/>:2379]
        end

        subgraph "PostgreSQL Cluster"
            PG1[Patroni + PostgreSQL<br/>Primary :5432]
            PG2[Patroni + PostgreSQL<br/>Standby :5432]
        end

        E1 <-->|Raft Consensus| E2
        E2 <-->|Raft Consensus| E3
        E3 <-->|Raft Consensus| E1

        PG1 -.Patroni API.-> E1
        PG1 -.Patroni API.-> E2
        PG1 -.Patroni API.-> E3
        PG2 -.Patroni API.-> E1
        PG2 -.Patroni API.-> E2
        PG2 -.Patroni API.-> E3

        PG1 -->|Streaming<br/>Replication| PG2
    end

    subgraph "External Services"
        CAMDB[(Customer<br/>Camunda DB)]
        DOCS[Documentation<br/>Static Site]
    end

    Internet --> LB
    LB --> APP1
    LB --> APP2
    LB --> APP3

    APP1 --> RMASTER
    APP2 --> RMASTER
    APP3 --> RMASTER

    APP1 --> PG1
    APP2 --> PG1
    APP3 --> PG1

    APP1 -.ReadOnly.-> CAMDB
    APP2 -.ReadOnly.-> CAMDB
    APP3 -.ReadOnly.-> CAMDB

    LB --> DOCS

    style LB fill:#2196F3
    style APP1 fill:#4CAF50
    style APP2 fill:#4CAF50
    style APP3 fill:#4CAF50
    style RMASTER fill:#FF5722
    style PG1 fill:#1976D2
    style PG2 fill:#64B5F6
    style E1 fill:#9C27B0
    style E2 fill:#9C27B0
    style E3 fill:#9C27B0
    style CAMDB fill:#FF9800
    style DOCS fill:#00BCD4

High Availability Components:

Load Balancer (HAProxy/Nginx)

  • Health checks on application nodes
  • Session affinity (sticky sessions)
  • SSL termination
  • Automatic failover

Application Layer (3+ Nodes)

  • Stateless Flask applications
  • Horizontal scaling
  • Zero-downtime deployments
  • Independent failure domains

Redis Sentinel (3 Nodes)

  • Automatic master failover
  • Configuration provider
  • Notification system
  • Quorum-based decisions
  • Promotes replica to master on failure

PostgreSQL with Patroni (2+ Nodes)

  • Automatic failover via Patroni
  • Streaming replication
  • Point-in-time recovery
  • Connection pooling via PgBouncer
  • Watchdog for split-brain prevention

etcd Cluster (3 Nodes)

  • Distributed consensus (Raft algorithm)
  • Configuration storage for Patroni
  • Leader election
  • Highly consistent key-value store
  • Tolerates single node failure
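
The "tolerates single node failure" figure follows from Raft's majority requirement: a cluster can make progress only while more than half of its members are reachable. A short worked calculation, with hypothetical helper names:

```python
def majority(cluster_size):
    """Smallest number of members that is more than half the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """Members that can fail while a majority is still reachable."""
    return cluster_size - majority(cluster_size)
```

For the 3-node etcd cluster above, `majority(3)` is 2 and `tolerated_failures(3)` is 1. A fourth node would not help, because `tolerated_failures(4)` is still 1. That is why odd cluster sizes are the usual choice.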

Failover Scenarios:

Redis master failover (Sentinel):

sequenceDiagram
    participant App as Application
    participant S1 as Sentinel 1
    participant S2 as Sentinel 2
    participant S3 as Sentinel 3
    participant RM as Redis Master
    participant RS as Redis Replica

    Note over RM: Master Fails
    S1->>RM: PING (timeout)
    S1->>S2: Master down?
    S1->>S3: Master down?
    S2-->>S1: Confirmed down
    S3-->>S1: Confirmed down
    Note over S1,S3: Quorum reached (3/3)
    S1->>RS: Promote to Master
    RS-->>S1: Promotion complete
    S1->>App: Config update: New master
    App->>RS: Connect to new master

PostgreSQL primary failover (Patroni + etcd):

sequenceDiagram
    participant App as Application
    participant E as etcd Cluster
    participant P1 as Patroni Primary
    participant P2 as Patroni Standby
    participant PG1 as PostgreSQL Primary
    participant PG2 as PostgreSQL Standby

    Note over PG1: Primary DB Fails
    P1->>E: Failed to update lease
    P2->>E: Attempt to acquire lease
    E-->>P2: Lease acquired (leader)
    P2->>PG2: Promote to primary
    PG2-->>P2: Promotion complete
    P2->>E: Update cluster state
    E-->>App: New primary endpoint
    App->>PG2: Connect to new primary

Monitoring & Health Checks:

  • Application: /health/ping, /health/db
  • Redis: PING command via Sentinel
  • PostgreSQL: pg_isready, replication lag monitoring
  • etcd: HTTP health endpoint, cluster health API
  • Patroni: REST API health endpoint

Security Architecture

Authentication & Authorization

graph TD
    A[User Request] --> B{Has JWT?}
    B -->|No| C[Redirect to Login]
    B -->|Yes| D{Valid Token?}
    D -->|No| C
    D -->|Yes| E{User Active?}
    E -->|No| C
    E -->|Yes| F{Has Permission?}
    F -->|No| G[403 Forbidden]
    F -->|Yes| H[Process Request]

Security Layers:

  1. Authentication: JWT tokens with expiration
  2. Session Management: Redis-backed with TTL
  3. Authorization: RBAC with 12+ permissions
  4. Audit: All actions logged
  5. Rate Limiting: API token lifecycle management
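
The decision flow in the diagram above maps onto a small function. This is a sketch under stated assumptions: `validate_token` is a hypothetical callable that returns a user for a valid token and `None` otherwise, and the `User` and `Decision` types exist only for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    REDIRECT_TO_LOGIN = "redirect_to_login"
    FORBIDDEN = "forbidden"
    ALLOW = "allow"


@dataclass
class User:
    active: bool
    permissions: set = field(default_factory=set)


def authorize(token, validate_token, required_permission):
    """Walk the checks in diagram order: token, validity, active, permission."""
    if not token:
        return Decision.REDIRECT_TO_LOGIN   # Has JWT? No
    user = validate_token(token)
    if user is None:
        return Decision.REDIRECT_TO_LOGIN   # Valid token? No
    if not user.active:
        return Decision.REDIRECT_TO_LOGIN   # User active? No
    if required_permission not in user.permissions:
        return Decision.FORBIDDEN           # Has permission? No -> 403
    return Decision.ALLOW
```

Only the final check yields a 403. Every earlier failure sends the user back to login, which avoids revealing whether a given account exists or is merely disabled.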

Permission Model

PERMISSIONS = {
    'full_access': 'Complete system access',
    'api_access': 'Programmatic API access',
    'portfolio_data': 'Portfolio dashboard',
    'extended_dashboard_data': 'Process intelligence',
    'bpmn_analysis_data': 'BPMN analytics viewer',
    'dmn_analysis_data': 'DMN analytics',
    'health_monitor_data': 'Health monitoring',
    'journey_analysis_data': 'Journey monitoring',
    'ai_analysis_data': 'AI-powered analysis',
    'diff_tool_data': 'BPMN diff tool',
    'model_validation_data': 'Model validator',
    'manage_users': 'User management',
    'manage_roles': 'Role management'
}
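
A role-to-permission lookup against this model might look like the sketch below. The role names and their grants are hypothetical. The assumption that `full_access` implies every other permission is inferred from its description and is not confirmed by the source.

```python
# Subset of the permission names listed above, repeated so the sketch runs alone.
KNOWN_PERMISSIONS = {
    "full_access",
    "health_monitor_data",
    "ai_analysis_data",
    "manage_users",
}

# Hypothetical roles; in the real system these would come from
# auth.roles and auth.role_permissions.
ROLE_PERMISSIONS = {
    "admin": {"full_access"},
    "operator": {"health_monitor_data"},
    "analyst": {"health_monitor_data", "ai_analysis_data"},
}


def effective_permissions(roles):
    """Union of the permissions granted by each of the user's roles."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def has_permission(roles, required):
    """Check one permission; unknown permission names are rejected loudly."""
    if required not in KNOWN_PERMISSIONS:
        raise ValueError(f"unknown permission: {required}")
    granted = effective_permissions(roles)
    # Assumption: 'full_access' implies every other permission.
    return "full_access" in granted or required in granted
```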

Monitoring & Observability

Application Logs

Log Levels:

  • DEBUG: Detailed execution flow
  • INFO: Normal operations
  • WARNING: Recoverable issues
  • ERROR: Errors with stack traces
  • CRITICAL: System failures

Log Categories:

logs/
├── application.log    # General application
├── access.log         # HTTP requests
├── security.log       # Auth & security events
├── database.log       # DB queries & performance
├── cache.log          # Cache operations
├── ai.log             # AI analysis operations
└── structured.log     # Machine-readable JSON
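
A machine-readable log such as `structured.log` can be produced with a custom `logging.Formatter`. The field names and the logger name below are assumptions, since the source does not document the JSON schema. The example writes to an in-memory stream so that it has no side effects. A `logging.FileHandler` would be used for a real file.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, handler):
    """Attach a JSON-formatting handler to a named logger."""
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger


# Usage: log one event and parse it back.
stream = io.StringIO()
logger = build_logger("champa.structured", logging.StreamHandler(stream))
logger.info("cache miss for %s", "incidents")
entry = json.loads(stream.getvalue())
```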

Prometheus Metrics

Exported Metrics:

  • Cluster health (nodes, instances, incidents)
  • Per-node metrics (workload, job rates)
  • JVM metrics of the Camunda nodes (heap, GC, threads)
  • Database metrics (connections, latency)
  • Process-level KPIs (health scores, rates)

Health Checks

GET /health/ping          # Simple liveness
GET /health/db            # Database connectivity
GET /health/api/full      # Comprehensive health
GET /health/light/metrics # Prometheus metrics

Next Steps