Skip to content

Zero Trust Security Report: AI Analysis Module

Date: 2025-11-30

Module: ai_analysis.py & Dependencies

Security Level: Zero Trust / Defense in Depth

1. Executive Summary

A comprehensive security audit of the ai_analysis.py blueprint and its dependencies has been conducted. The system implements a Zero Trust architecture for data handling, ensuring that no data is trusted implicitly. All data destined for the AI LLM undergoes rigorous, multi-layered sanitization to prevent the leakage of Personally Identifiable Information (PII), commercial secrets, and internal system logic.

Verdict:SECURE - The system effectively mitigates identified risks through active sanitization and masking.

2. Data Flow & Security Architecture

Data flows from the database to the AI model through a strict security filter pipeline.

graph LR
    DB[(Database)] -->|Raw Data| Py[Python Backend]
    Py -->|Incident Msgs| Scrubber[PII Scrubber]
    Py -->|DMN Values| Masker[Smart Masker]
    Py -->|BPMN/DMN XML| Cleaner[XML Cleaner]
    Scrubber -->|Sanitized Text| Prompt[AI Prompt]
    Masker -->|Masked Values| Prompt
    Cleaner -->|Clean XML| Prompt
    Prompt -->|Safe Context| AI[Google Gemini LLM]
Hold "Alt" / "Option" to enable pan & zoom

3. Data Sources & Sensitivity Analysis

The following data points are retrieved from the database and processed:

Data Source Raw Data Field Sensitivity Mitigation Strategy
Incidents incident_msg_ High (May contain PII/Secrets) PII Scrubbing: Emails, SSNs, IPs, Credit Cards are redacted.
DMN Rules input_value, output_value High (Commercial Logic/User Data) Smart Masking: Only "safe" values (bools, small nums, enums) are allowed. Others are masked.
BPMN XML bytes_ (XML Content) Medium (Comments, Scripts) XML Cleaning: Comments stripped. Scripts scanned for hardcoded secrets.
Variables text_ (Variable Value) Medium Sampling Limit: Only variable names and counts are primarily used. Values are not bulk-fed.

4. Implemented Security Controls

4.1. Layer 1: PII Scrubbing (PIIScrubber)

Applied to unstructured text like error messages.

  • Regex Patterns:

    • Email: [EMAIL]

    • SSN: [SSN]

    • Credit Card: [CREDIT_CARD]

    • IPv4: [IP_ADDRESS]

  • Effect: Prevents accidental PII leakage in error logs sent to the AI.

4.2. Layer 2: Smart Value Masking (SmartMasker)

Applied to structured business data (DMN inputs/outputs).

  • Allow-list Approach: Only explicitly "safe" values are passed through.

    • True/False

    • Integers between -1000 and 10000

    • Safe Enums: approved, rejected, active, success, etc.

  • Masking: Everything else is replaced with type-aware placeholders:

    • Large Numbers -> <number>

    • Long Strings -> <string:Nchars>

    • Complex Data -> <masked_value>

4.3. Layer 3: XML & Script Sanitization (XMLCleaner & SecretScanner)

Applied to process definition files.

  • Comment Removal: Strips <!-- ... --> to hide developer notes/todos.

  • Secret Detection: Scans <bpmn:script> and attributes for:

    • AWS Keys: AKIA... -> <AWS_ACCESS_KEY>

    • JWT Tokens: eyJ... -> <JWT_TOKEN>

    • Generic Secrets: apiKey="xyz" -> apiKey="<GENERIC_SECRET_VAR>"

5. Residual Risk Assessment

Risk Scenario Likelihood Impact Mitigation Status
PII in Error Messages Low Medium Mitigated (Scrubber)
Secret Leakage in Scripts Very Low High Mitigated (Secret Scanner)
Commercial Logic Exposure Low Low Mitigated (Masking hides specific values, only logic structure remains)
Prompt Injection Low Medium Mitigated (Data is treated as context, not instructions. Output is read-only analysis)

6. Conclusion

The ai_analysis.py module adheres to Zero Trust principles. It assumes all incoming data is potentially sensitive and applies strict, automated sanitization before any external transmission. The implementation of PIIScrubber, SmartMasker, and XMLCleaner provides a robust defense-in-depth strategy, making the system suitable for handling commercial process data with minimal security risk.