School Project • Data Security

DBNYP: DigiLoan Application System

Full-stack loan application platform with defense-in-depth security — automated data classification, dual-layer encryption, NRIC OCR verification, and cryptographic non-repudiation

Timeline: October 2025 - January 2026
Team Size: 4 Members
My Role: ADC Engine & Security Lead
Status: Completed
Tech Stack: Python, Flask, MySQL, SQLAlchemy, Vertex AI, EasyOCR, OpenCV, AWS KMS, AES-256-GCM, HMAC-SHA256

The Challenge

Financial institutions today face a dual challenge: processing loan applications efficiently while ensuring sensitive personal data is rigorously protected throughout its lifecycle. In Singapore's regulatory landscape, this means complying with the Personal Data Protection Act (PDPA) while handling highly sensitive fields — NRIC numbers, salary details, employment records, and supporting documents.

Data Classification at Scale

Every loan application contains a mix of data sensitivity levels. Without automated classification, officers would need to manually tag each application — which is error-prone, inconsistent, and unscalable.

End-to-End Encryption

Personal data must be encrypted both at rest and in transit. Different data types require different encryption strategies — draft data, permanent records, and uploaded documents each have distinct lifecycle needs.

Identity Verification

Applicants upload NRIC images, but verifying that the uploaded document matches the declared identity requires high-accuracy optical character recognition (OCR) backed by checksum validation of the extracted number.

Non-Repudiation

Loan decisions (approve/reject/escalate) must be cryptographically signed so no officer can deny their actions, and no decision can be tampered with after the fact.

Defense-in-Depth

A single security layer is never sufficient. The system needed multiple overlapping security controls: input sanitization, CSRF protection, secure headers, step-up authentication, audit logging, and more — each independently effective, all working in concert.

The Solution

DigiLoan is a full-stack loan application platform built with Flask (Python) that implements a defense-in-depth security architecture across every layer of the application.

🎯

Automated Data Classification

Three-phase ADC Engine (Rule Engine → Heuristic Scanner → Vertex AI) assigns security tags C0–C6 to every loan application automatically

🔐

Dual Encryption Strategy

Fernet (AES-128-CBC) for draft data; AES-256-GCM with AWS KMS envelope encryption for permanent loan records

🪪

NRIC OCR Verification

EasyOCR + OpenCV pipeline extracts NRIC from card images using three extraction strategies, verified via MOD-11 checksum, stored as SHA-256 hash

✍️

Digital Non-Repudiation

Three-stage HMAC-SHA256 signatures at submission, RO escalation, and AO approval — with constant-time verification preventing timing attacks

🤖

Vertex AI Integration

Gemini 2.5 Flash Lite for semantic loan analysis with data minimization — NRIC and personal identifiers never leave the server

🛡️

12+ Security Layers

CSRF, CSP, HSTS, input sanitization, file upload restrictions, rate limiting, directory traversal prevention, open redirect protection, step-up re-authentication, and encrypted audit trails

My Role

As the Subsystem 1 (Automated Data Classification) Lead, I was responsible for architecting the foundational codebase, designing the security classification scheme, and building every component of the ADC engine, NRIC OCR pipeline, and application security layer.

🏗️

Flask Blueprint Architecture

Designed the modular codebase structure using Flask Blueprints, enabling each subsystem to operate independently within a shared framework

🎯

ADC Engine (Three-Phase)

Built the Rule Engine, Heuristic Content Scanner, and Vertex AI integration — the complete classification pipeline with "Strictest Wins" escalation policy

🪪

NRIC OCR Service

Implemented EasyOCR + OpenCV preprocessing with three extraction strategies and MOD-11 checksum verification for NRIC identity validation

📋

Loan Application Wizard

Built the multi-step loan wizard with server-side validation, draft saving with encryption, and file upload handling

🏷️

C0–C6 Classification Scheme

Designed the 7-level tag scheme that maps to encryption levels, MFA requirements, and access control policies — the central security contract between subsystems

🧹

Input Sanitization

Built sanitization.py — XSS prevention, SQL injection protection, and format validation for NRIC, phone, and email fields

🔒

OWASP Security Headers

Implemented security.py — CSP, HSTS, X-Frame-Options, directory traversal prevention, filename sanitization, and open redirect protection

✍️

Digital Signature System

Built decision_signature.py — HMAC-SHA256 three-stage signatures for cryptographic non-repudiation across the loan decision workflow

📧

Email Notification Service

Built email delivery for loan status notifications with enforced TLS 1.2+, certificate verification, and audit logging of every delivery

Features in Action

End-to-end walkthrough of the DigiLoan platform — from application submission to officer decision and email notification

DigiLoan Homepage

The main landing page of the DigiLoan platform — providing applicants with a clean entry point into the loan application workflow, with navigation to begin the multi-step loan wizard

Multi-Step Loan Application Wizard

Guided multi-step form collecting personal details, employment information, loan amount, and supporting documents — with server-side validation at each step and CSRF protection on all POST operations

Encrypted Draft Saving

Applicants can save incomplete applications as drafts. Draft data is encrypted using Fernet (AES-128-CBC + HMAC-SHA256) at the field level — ensuring sensitive form data is protected even during mid-application pauses

NRIC Identity Verification via OCR

Applicants upload their NRIC card image. The EasyOCR + OpenCV pipeline preprocesses the image, extracts the NRIC using three strategies, validates it via MOD-11 checksum, and stores only the SHA-256 hash. The plaintext NRIC is held in memory only and discarded immediately after use

Loan Submission Confirmation

Upon submission, the ADC Engine automatically classifies the application (C0–C6), the Stage 1 HMAC-SHA256 digital signature is generated, permanent fields are encrypted with AES-256-GCM, and the applicant receives a confirmation screen with their application reference

Vertex AI Semantic Classification

Phase 3 of the ADC Engine: Gemini 2.5 Flash Lite performs contextual semantic analysis of the loan application. Data minimization ensures NRIC and personal identifiers are stripped before the payload leaves the server. The system uses a fail-open design: if the AI service is unavailable, classification continues using the Phase 1 and Phase 2 results

Reviewing Officer (RO) Escalation

The Reviewing Officer assesses the application and escalates to the Approving Officer with a justification. This action triggers Stage 2 of the digital signature system — an HMAC-SHA256 signature binding the officer ID, loan ID, justification text, and timestamp for non-repudiation

Approving Officer (AO) Decision

The Approving Officer makes the final approve or reject decision. Step-up re-authentication is required for C3+ classified loans. Stage 3 of the HMAC-SHA256 digital signature is generated — binding the officer ID, decision, loan ID, and timestamp — completing the non-repudiation chain

Application Received Email

Applicants receive an automated email confirming their loan application has been received and is under review. Delivered via SMTP with enforced TLS 1.2+ and certificate verification, with every delivery event audit-logged

Loan Decision Email Notification

Upon AO decision, applicants automatically receive an approval or rejection email — completing the end-to-end loan workflow. The notification service enforces TLS 1.2+, verifies certificates, and logs all delivery events to the audit trail

Implementation Process

01

Foundation & Architecture

Set up the Flask project skeleton with Blueprints for modular subsystem development. Defined database models (Loan, LoanDraft, Classification, Document) with SQLAlchemy ORM. Established the ClassificationTag enum (C0–C6) as the central security contract between subsystems, ensuring each team member's code only needed to read the tag to know what actions to take.

02

Core ADC Engine

Built the ClassificationRuleEngine with deterministic business rules — high-value loans, DTI ratio thresholds, business indicators, and employment risk factors. Implemented the ContentScanner for regex-based PII detection (NRIC, phone, email patterns) and keyword scanning (fraud, terrorism, money laundering triggers). Applied the "Strictest Wins" escalation policy where tags can only increase, never decrease.

03

Encryption & Data Protection

Implemented FieldEncryption using Fernet for draft data. Built LoanFieldEncryption using AES-256-GCM with AWS KMS envelope encryption for permanent records — the Data Encryption Key (DEK) is unwrapped once at startup and lives only in RAM, never written to disk. Created EncryptedTextField and EncryptedNumericField SQLAlchemy TypeDecorators for transparent, automatic encryption and decryption at the ORM layer.

04

AI Integration & OCR Pipeline

Integrated Vertex AI (Gemini 2.5 Flash Lite) for semantic loan analysis with explicit data minimization — stripping all NRIC and personal identifiers before transmission. Designed the fail-open/fail-secure pattern so classification continues reliably even when the AI service is unavailable. Built the NRIC OCR pipeline with EasyOCR, OpenCV grayscale and adaptive thresholding preprocessing, three-strategy extraction, fuzzy OCR correction (confusion maps: 6→G, 8→B, 5→S), and MOD-11 checksum validation.

05

Security Hardening

Implemented CSRF protection via Flask-WTF on all POST forms with SameSite=Lax cookies. Deployed OWASP security headers (CSP, HSTS with 1-year max-age, X-Frame-Options: DENY, X-Content-Type-Options: nosniff). Built the three-stage HMAC-SHA256 digital signature system for non-repudiation. Added step-up re-authentication requiring TOTP re-verification within a 5-minute freshness window for C3+ classified loans. Applied Flask-Limiter rate limiting on login, OTP resend, and email verification endpoints.

06

Testing & Verification

Validated classification logic against edge cases including high-value loans, fraud keyword triggers, PII pattern combinations, and DTI boundary conditions. Tested OCR extraction accuracy across the three-strategy pipeline with varied NRIC card image quality. Verified HMAC signature generation and constant-time comparison under normal and adversarial conditions. Confirmed fail-open and fail-secure pathways function correctly under simulated AI service outages.

Technical Implementation

Automated Data Classification (ADC) Engine

Phase 1: Rule Engine (Deterministic)

Purpose: Applies hard-coded business rules to produce a deterministic baseline classification tag.

Rules: Loan amount thresholds, Debt-to-Income (DTI) ratio evaluation, employment status risk flags, loan purpose category scoring.

Outcome: Produces an initial C0–C6 tag that subsequent phases can only escalate, never reduce.

Phase 2: Heuristic Content Scanner

Purpose: Regex-based scanning of application content for PII patterns and high-risk keywords.

PII Detection: NRIC format pattern (S/T/F/G followed by 7 digits and a letter), Singapore phone numbers, email addresses.

Keyword Scanning: High-risk terms including fraud, terrorism, money laundering, and related indicators trigger tag escalation.
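A minimal sketch of the Phase 2 scan, using only the Python standard library. The exact patterns, the keyword list, and the scan_text name are illustrative assumptions, not the project's actual ContentScanner code; the phone pattern assumes 8-digit Singapore numbers starting with 6, 8, or 9.

```python
import re

# Illustrative patterns only; the real scanner's regexes are not shown in this write-up.
PII_PATTERNS = {
    "nric": re.compile(r"\b[STFG]\d{7}[A-Z]\b", re.IGNORECASE),
    "phone": re.compile(r"(?<!\d)(?:\+65[\s-]?)?[689]\d{3}[\s-]?\d{4}(?!\d)"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

# Keywords named in the text above; a real list would be longer.
RISK_KEYWORDS = ("fraud", "terrorism", "money laundering")


def scan_text(text):
    """Return the PII types and high-risk keywords found in free text."""
    pii = sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))
    lowered = text.lower()
    keywords = [kw for kw in RISK_KEYWORDS if kw in lowered]
    return {"pii": pii, "keywords": keywords}
```

Any non-empty result would then feed the escalation decision; how many tag levels each finding adds is a policy choice this sketch leaves out.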

Phase 3: Vertex AI Semantic Analysis

Model: Gemini 2.5 Flash Lite via Vertex AI API for contextual understanding beyond pattern matching.

Data Minimization: NRIC numbers and personal identifiers are stripped from the payload before transmission — no PII leaves the server.

Resilience: Fail-open design — if Vertex AI is unavailable, the system falls back to Phases 1 and 2, ensuring uninterrupted classification.
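The data minimization and fallback behaviour can be sketched as follows. The field names in IDENTIFIER_FIELDS, the function names, and the analyze callable (standing in for the Vertex AI call) are hypothetical; only the idea of stripping identifiers before transmission and continuing without the AI result comes from the description above.

```python
import re

NRIC_RE = re.compile(r"\b[STFG]\d{7}[A-Z]\b", re.IGNORECASE)

# Hypothetical field names; the real schema is not shown in this write-up.
IDENTIFIER_FIELDS = {"nric", "full_name", "email", "phone", "address"}


def minimize_for_ai(application):
    """Drop identifier fields and redact NRIC-like strings in free text."""
    minimized = {}
    for key, value in application.items():
        if key in IDENTIFIER_FIELDS:
            continue  # never send direct identifiers
        if isinstance(value, str):
            value = NRIC_RE.sub("[REDACTED]", value)
        minimized[key] = value
    return minimized


def ai_tag_or_none(application, analyze):
    """Call the AI phase on the minimized payload; return None if it fails.

    `analyze` is a placeholder for whatever function wraps the model call.
    """
    try:
        return analyze(minimize_for_ai(application))
    except Exception:
        # Fail-open: the caller falls back to the Phase 1 and Phase 2 tags.
        return None
```

Returning None rather than a default tag keeps the fallback explicit, so the caller can tell "AI said C0" apart from "AI did not answer".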

Classification-Driven Security Cascade

Strictest Wins: Final tag = max(Phase 1, Phase 2, Phase 3) — tags can only escalate across phases.

Downstream Effects: C3+ tags automatically trigger AES-256-GCM encryption, mandatory MFA enforcement, and step-up re-authentication before officer access.
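A small sketch of the "Strictest Wins" rule and the C3+ cascade. Modelling ClassificationTag as an IntEnum, and the function and dictionary key names, are assumptions made for illustration; the project's own enum may be defined differently.

```python
from enum import IntEnum


class ClassificationTag(IntEnum):
    """C0 (least sensitive) to C6 (most sensitive); ordering is what matters here."""
    C0 = 0
    C1 = 1
    C2 = 2
    C3 = 3
    C4 = 4
    C5 = 5
    C6 = 6


def final_tag(rule_tag, scanner_tag, ai_tag=None):
    """Strictest Wins: take the highest tag any phase produced.

    ai_tag is None when the Vertex AI phase was unavailable.
    """
    candidates = [rule_tag, scanner_tag]
    if ai_tag is not None:
        candidates.append(ai_tag)
    return max(candidates)


def security_controls(tag):
    """Map a tag to the downstream controls described for C3 and above."""
    elevated = tag >= ClassificationTag.C3
    return {
        "aes_256_gcm": elevated,
        "mfa_required": elevated,
        "step_up_reauth": elevated,
    }
```

Because the final tag is a maximum, a later phase can raise the classification but can never lower what an earlier phase decided.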

Key Files: classification_engine.py, vertex_ai_service.py, classification.py

Dual Encryption Strategy

Draft Data: Fernet (AES-128-CBC)

Algorithm: Fernet symmetric encryption — AES-128-CBC with HMAC-SHA256 authentication.

Use Case: LoanDraft.form_data and LoanDraft.uploaded_files — short-lived, frequently updated data.

Rationale: Fernet's simplicity and built-in integrity verification suit the temporary nature of draft records.

Permanent Records: AES-256-GCM + AWS KMS

Algorithm: AES-256-GCM providing authenticated encryption with associated data (AEAD).

Key Management: AWS KMS envelope encryption — the Data Encryption Key (DEK) is unwrapped once at startup and lives only in RAM, never persisted to disk.

Scope: Loan.nric (as SHA-256 hash), Loan.monthly_salary, Loan.loan_purpose, and all sensitive permanent fields.

Transparent ORM Encryption

Implementation: EncryptedTextField and EncryptedNumericField as SQLAlchemy TypeDecorators — encryption and decryption happen automatically at the ORM layer.

Developer Experience: Application code reads and writes plaintext; the TypeDecorators handle all cryptographic operations transparently.

Key Files: field_encryption.py, enhanced_encryption.py, encrypted_types.py

NRIC OCR Extraction Pipeline

OpenCV Image Preprocessing

Steps: Grayscale conversion → adaptive thresholding → 2x upscaling to improve EasyOCR recognition accuracy on low-resolution card photographs.

Impact: Preprocessing significantly improved raw EasyOCR extraction rates compared to unprocessed image input.

Three-Strategy NRIC Extraction

Strategy 1 — Token Match: Strict regex applied to individual OCR tokens for clean, high-confidence extractions.

Strategy 2 — Joined Text: Concatenates fragmented OCR tokens to recover NRICs split across recognition boundaries.

Strategy 3 — Fuzzy Correction: Applies OCR confusion maps (6→G, 8→B, 5→S) to recover NRICs with common character misreads, followed by MOD-11 checksum validation.
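The three strategies can be sketched on plain OCR token strings, without EasyOCR itself. The function names are hypothetical, the weights and check-letter tables follow the commonly documented NRIC checksum scheme (worth verifying against the project's own code), and the reverse letter-to-digit map is an added assumption beyond the 6→G, 8→B, 5→S map named above.

```python
import re

NRIC_RE = re.compile(r"[STFG]\d{7}[A-Z]")
WEIGHTS = (2, 7, 6, 5, 4, 3, 2)
ST_LETTERS = "JZIHGFEDCBA"  # check letters for S/T prefixes
FG_LETTERS = "XWUTRQPNMLK"  # check letters for F/G prefixes

DIGIT_TO_LETTER = {"6": "G", "8": "B", "5": "S"}  # confusion map from the text
LETTER_TO_DIGIT = {"G": "6", "B": "8", "S": "5", "O": "0", "I": "1"}  # assumed reverse map


def is_valid_nric(nric):
    """Check format, then the MOD-11 check letter."""
    if not NRIC_RE.fullmatch(nric):
        return False
    total = sum(int(d) * w for d, w in zip(nric[1:8], WEIGHTS))
    if nric[0] in "TG":
        total += 4
    table = ST_LETTERS if nric[0] in "ST" else FG_LETTERS
    return table[total % 11] == nric[8]


def fuzzy_fix(candidate):
    """Repair a 9-character window: letters at both ends, digits in between."""
    first = DIGIT_TO_LETTER.get(candidate[0], candidate[0])
    middle = "".join(LETTER_TO_DIGIT.get(c, c) for c in candidate[1:8])
    last = DIGIT_TO_LETTER.get(candidate[8], candidate[8])
    return first + middle + last


def extract_nric(tokens):
    """Try token match, then joined text, then fuzzy correction."""
    cleaned = [re.sub(r"[^A-Z0-9]", "", t.upper()) for t in tokens]
    # Strategy 1: a single clean token that is already a valid NRIC.
    for token in cleaned:
        if is_valid_nric(token):
            return token
    # Strategy 2: slide a 9-character window over the joined tokens.
    joined = "".join(cleaned)
    windows = [joined[i:i + 9] for i in range(len(joined) - 8)]
    for window in windows:
        if is_valid_nric(window):
            return window
    # Strategy 3: apply the confusion maps, then require a valid checksum.
    for window in windows:
        fixed = fuzzy_fix(window)
        if is_valid_nric(fixed):
            return fixed
    return None
```

Requiring the checksum after fuzzy correction is what keeps Strategy 3 from inventing plausible-looking but wrong numbers.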

Privacy-by-Design Storage

Processing: All OCR operations are performed entirely in-memory. The plaintext NRIC is discarded immediately after it has been validated and hashed.

Storage: Only the SHA-256 hash of the NRIC is stored — enabling identity verification without retaining the sensitive identifier itself.
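A minimal sketch of hash-only storage and comparison. The function names and the normalization step (trimming and upper-casing before hashing) are assumptions for illustration.

```python
import hashlib
import hmac


def hash_nric(nric):
    """Normalize, then return the SHA-256 hex digest that gets stored."""
    normalized = nric.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def matches_declared(extracted_nric, stored_hash):
    """Compare the OCR result against the stored hash without keeping plaintext."""
    return hmac.compare_digest(hash_nric(extracted_nric), stored_hash)
```

Normalizing before hashing matters because "s1234567d" and "S1234567D" would otherwise produce unrelated digests.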

Key File: nric_ocr_service.py

Three-Stage Digital Signatures

Stage 1: Applicant Submission

Trigger: Generated when applicant submits the loan application.

Payload Signed: loan_id + applicant_id + submission timestamp.

Purpose: Binds the submission to the submitting applicant and the submission time, so the record cannot be reattributed or re-dated after the fact.

Stage 2: RO Escalation

Trigger: Generated when the Reviewing Officer escalates to the Approving Officer.

Payload Signed: loan_id + officer_id + justification text + escalation timestamp.

Purpose: Non-repudiation of the escalation decision — the RO cannot deny having escalated with a specific justification.

Stage 3: AO Approval/Rejection

Trigger: Generated when the Approving Officer makes the final loan decision.

Payload Signed: loan_id + officer_id + decision (approve/reject) + decision timestamp.

Purpose: Cryptographic proof of the final decision — decisions cannot be altered or denied after signing.

Algorithm & Verification

Algorithm: HMAC-SHA256 — keyed hash message authentication code providing both integrity and authenticity.

Timing Attack Prevention: All signature comparisons use hmac.compare_digest() for constant-time comparison, preventing timing side-channel attacks.
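A sketch of how one stage's signature might be produced and checked with the standard library. Serializing the payload as sorted-key JSON, the stage labels, and the function names are assumptions; the text only states which fields are signed, not how they are encoded.

```python
import hashlib
import hmac
import json


def sign_payload(key, stage, fields):
    """HMAC-SHA256 over a canonical JSON encoding of the stage and its fields."""
    message = json.dumps(
        {"stage": stage, **fields}, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_payload(key, stage, fields, signature):
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(key, stage, fields)
    return hmac.compare_digest(expected, signature)
```

A structured encoding avoids the ambiguity of plain string concatenation, where loan_id 12 with officer_id 3 would otherwise sign the same bytes as loan_id 1 with officer_id 23.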

Key File: decision_signature.py

General Application Security

CSRF & XSS Protection

CSRF: Flask-WTF CSRFProtect on all POST forms, with SameSite=Lax cookies as a second layer against cross-site request forgery.

XSS: sanitize_text() strips HTML tags, escapes entities, and removes null bytes from all user-provided text inputs.

SQL Injection: All database access goes through the SQLAlchemy ORM, which issues parameterized queries; as an extra layer, the NRIC, phone, and email field sanitizers reject injection-prone characters.
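The three steps attributed to sanitize_text() can be sketched with the standard library. This is an illustrative reimplementation under assumptions (the tag regex and the order of operations), not the project's sanitization.py.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]*>")


def sanitize_text(value):
    """Remove null bytes, strip HTML tags, then escape remaining entities."""
    value = value.replace("\x00", "")
    value = TAG_RE.sub("", value)
    return html.escape(value.strip(), quote=True)
```

Escaping last means any stray angle bracket or ampersand that survives tag stripping is still rendered as text rather than markup.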

File Upload & Path Security

Directory Traversal: sanitize_filename() applies os.path.basename() and regex cleanup to strip path traversal sequences from all uploaded filenames.

File Type Validation: Allowlist-based MIME type and extension validation on all uploaded documents.

Open Redirect: is_safe_url() validates all redirect targets against the application domain before executing redirects.
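A hedged sketch of the three checks above. The function bodies, the ALLOWED_EXTENSIONS set, and the host_url parameter are assumptions for illustration; in a Flask app the host would typically come from the current request.

```python
import os
import re
from urllib.parse import urljoin, urlparse

# Hypothetical allowlist; the project's actual list is not stated.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "pdf"}


def sanitize_filename(filename):
    """Keep only the final path component and a conservative character set."""
    name = os.path.basename(filename.replace("\\", "/"))
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    name = name.lstrip(".")  # no hidden files, no bare ".."
    return name or "upload"


def is_allowed_extension(filename):
    """Allowlist check on the file extension (MIME checks would sit alongside)."""
    if "." not in filename:
        return False
    return filename.rsplit(".", 1)[-1].lower() in ALLOWED_EXTENSIONS


def is_safe_url(target, host_url):
    """Accept only http(s) targets that resolve to the application's own host."""
    ref = urlparse(host_url)
    resolved = urlparse(urljoin(host_url, target))
    return resolved.scheme in ("http", "https") and resolved.netloc == ref.netloc
```

Resolving the target against the host first is what catches protocol-relative values such as //evil.example/x, which look relative but point elsewhere.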

Security Headers (OWASP)

CSP: Content Security Policy restricting script, style, and resource origins.

HSTS: HTTP Strict Transport Security with 1-year max-age enforcing HTTPS-only connections.

X-Frame-Options: DENY — prevents clickjacking by blocking iframe embedding.

X-Content-Type-Options: nosniff — prevents MIME-type sniffing attacks.
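The header set above could be applied from a single helper, sketched here against any dict-like headers object. The CSP value is a placeholder assumption (the project's actual policy is not shown), and the helper name is hypothetical; in Flask it would typically be called on response.headers from an after_request hook.

```python
SECURITY_HEADERS = {
    # Placeholder policy; a real CSP lists the specific allowed origins.
    "Content-Security-Policy": "default-src 'self'",
    # 31536000 seconds = 365 days, matching the 1-year max-age described above.
    "Strict-Transport-Security": "max-age=31536000",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
}


def apply_security_headers(headers):
    """Set each security header on a dict-like headers object and return it."""
    for name, value in SECURITY_HEADERS.items():
        headers[name] = value
    return headers
```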

Step-Up Authentication & Rate Limiting

Step-Up Auth: C3+ classified loans require TOTP re-verification within a 5-minute freshness window before officers can access sensitive data.

Rate Limiting: Flask-Limiter applied to login, OTP resend, and email verification endpoints to throttle brute-force and credential stuffing attempts.

Audit Logging: "5 Ws" logging pattern (Who, What, When, Where, Why) across all security-relevant events — producing a forensic-grade audit trail for regulatory compliance.
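The step-up freshness rule described above can be expressed as a small pure function. The names and the integer tag level are assumptions; where the last TOTP verification time is stored (for example, in the session) is left out of this sketch.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW = timedelta(minutes=5)
STEP_UP_MIN_LEVEL = 3  # C3 and above


def needs_step_up(tag_level, last_totp_at, now=None):
    """Return True when the officer must re-verify TOTP before access.

    tag_level is the numeric part of the C0-C6 tag; last_totp_at is a
    timezone-aware datetime of the last successful TOTP check, or None.
    """
    if tag_level < STEP_UP_MIN_LEVEL:
        return False
    if last_totp_at is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_totp_at > FRESHNESS_WINDOW
```

Passing now explicitly keeps the rule easy to test at the 5-minute boundary without patching the clock.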

Key Learning Outcomes

🛡️

Security Engineering

  • Defense-in-depth is non-negotiable — No single layer is sufficient. Combining input validation, encryption, access control, and audit logging creates a posture where a failure in one layer doesn't compromise the system.
  • Encryption strategy must match the data lifecycle — Draft data benefits from Fernet's simplicity; permanent records require AES-256-GCM with KMS for enterprise-grade key management.
  • Zero Trust in practice — Step-up re-authentication proved that "already logged in" is not sufficient proof of identity for accessing high-classification data.
🤖

AI Integration

  • AI should enhance, not replace — The fail-open design ensures the system never depends on AI availability. Rules and heuristics provide a reliable baseline; AI adds semantic depth when available.
  • Data minimization is a design choice — Explicitly stripping NRIC and personal identifiers before sending data to Vertex AI was a deliberate privacy-by-design decision, not an afterthought.
🏗️

Software Architecture

  • Modular architecture enables parallel development — Flask Blueprints allowed each subsystem to evolve independently while sharing common models and utilities.
  • Classification as a security contract — Defining C0–C6 as a shared enum created a clear interface between subsystems, where each only needs to read the tag to know what security actions to apply.
👁️

OCR & Computer Vision

  • OCR is unreliable without preprocessing — Raw NRIC card images had poor extraction rates; OpenCV grayscale conversion, adaptive thresholding, and upscaling significantly improved EasyOCR accuracy.
  • Fuzzy correction reclaims borderline results — OCR confusion maps (6→G, 8→B, 5→S) combined with MOD-11 checksum validation allowed the system to recover correct NRICs from otherwise failed reads.
📈

Professional Development

  • Security is a process, not a feature — Every new component (email, OCR, AI) introduced a new attack surface requiring its own security analysis and mitigation strategy.
  • Audit logging provides accountability — The "5 Ws" logging pattern (Who, What, When, Where, Why) transformed debug logs into a forensic-grade audit trail suitable for regulatory compliance.