Full-stack loan application platform with defense-in-depth security — automated data classification, dual-layer encryption, NRIC OCR verification, and cryptographic non-repudiation
Financial institutions today face a dual challenge: processing loan applications efficiently while ensuring sensitive personal data is rigorously protected throughout its lifecycle. In Singapore's regulatory landscape, this means complying with the Personal Data Protection Act (PDPA) while handling highly sensitive fields — NRIC numbers, salary details, employment records, and supporting documents.
Every loan application contains a mix of data sensitivity levels. Without automated classification, officers would need to manually tag each application — which is error-prone, inconsistent, and unscalable.
Personal data must be encrypted both at rest and in transit. Different data types require different encryption strategies — draft data, permanent records, and uploaded documents each have distinct lifecycle needs.
Applicants upload NRIC images, but verifying that the uploaded document matches the declared identity requires high-accuracy optical character recognition (OCR) backed by checksum validation of the extracted number.
Loan decisions (approve/reject/escalate) must be cryptographically signed so no officer can deny their actions, and no decision can be tampered with after the fact.
A single security layer is never sufficient. The system needed multiple overlapping security controls: input sanitization, CSRF protection, secure headers, step-up authentication, audit logging, and more — each independently effective, all working in concert.
DigiLoan is a full-stack loan application platform built with Flask (Python) that implements a defense-in-depth security architecture across every layer of the application.
Three-phase Automated Data Classification (ADC) Engine (Rule Engine → Heuristic Scanner → Vertex AI) assigns security tags C0–C6 to every loan application automatically
Fernet (AES-128-CBC) for draft data; AES-256-GCM with AWS KMS envelope encryption for permanent loan records
EasyOCR + OpenCV pipeline extracts the NRIC from card images using three extraction strategies, verifies it with a MOD-11 checksum, and stores it only as a SHA-256 hash
Three-stage HMAC-SHA256 signatures at submission, Reviewing Officer (RO) escalation, and Approving Officer (AO) approval, with constant-time verification to prevent timing attacks
Gemini 2.5 Flash Lite for semantic loan analysis with data minimization — NRIC and personal identifiers never leave the server
CSRF, CSP, HSTS, input sanitization, file upload restrictions, rate limiting, directory traversal prevention, open redirect protection, step-up re-authentication, and encrypted audit trails
As the Subsystem 1 (Automated Data Classification) Lead, I was responsible for architecting the foundational codebase, designing the security classification scheme, and building every component of the ADC engine, NRIC OCR pipeline, and application security layer.
Designed the modular codebase structure using Flask Blueprints, enabling each subsystem to operate independently within a shared framework
Built the Rule Engine, Heuristic Content Scanner, and Vertex AI integration — the complete classification pipeline with "Strictest Wins" escalation policy
Implemented EasyOCR + OpenCV preprocessing with three extraction strategies and MOD-11 checksum verification for NRIC identity validation
Built the multi-step loan wizard with server-side validation, draft saving with encryption, and file upload handling
Designed the 7-level tag scheme that maps to encryption levels, MFA requirements, and access control policies — the central security contract between subsystems
Built sanitization.py — XSS prevention, SQL injection protection, and format validation for NRIC, phone, and email fields
Implemented security.py — CSP, HSTS, X-Frame-Options, directory traversal prevention, filename sanitization, and open redirect protection
Built decision_signature.py — HMAC-SHA256 three-stage signatures for cryptographic non-repudiation across the loan decision workflow
Built email delivery for loan status notifications that enforces TLS 1.2+, verifies server certificates, and audit-logs every delivery event
End-to-end walkthrough of the DigiLoan platform — from application submission to officer decision and email notification
The main landing page of the DigiLoan platform — providing applicants with a clean entry point into the loan application workflow, with navigation to begin the multi-step loan wizard
Guided multi-step form collecting personal details, employment information, loan amount, and supporting documents — with server-side validation at each step and CSRF protection on all POST operations
Applicants can save incomplete applications as drafts. Draft data is encrypted using Fernet (AES-128-CBC + HMAC-SHA256) at the field level — ensuring sensitive form data is protected even during mid-application pauses
Applicants upload their NRIC card image. The EasyOCR + OpenCV pipeline preprocesses the image, extracts the NRIC using three strategies, validates it with the MOD-11 checksum, and stores only the SHA-256 hash. The plaintext NRIC is held in memory only and discarded immediately after use
Upon submission, the ADC Engine automatically classifies the application (C0–C6), the Stage 1 HMAC-SHA256 digital signature is generated, permanent fields are encrypted with AES-256-GCM, and the applicant receives a confirmation screen with their application reference
Phase 3 of the ADC Engine — Gemini 2.5 Flash Lite performs contextual semantic analysis of the loan application. Data minimization ensures NRIC and personal identifiers are stripped before leaving the server. The system uses a fail-open design: classification continues even if AI is unavailable
The Reviewing Officer assesses the application and escalates to the Approving Officer with a justification. This action triggers Stage 2 of the digital signature system — an HMAC-SHA256 signature binding the officer ID, loan ID, justification text, and timestamp for non-repudiation
The Approving Officer makes the final approve or reject decision. Step-up re-authentication is required for C3+ classified loans. Stage 3 of the HMAC-SHA256 digital signature is generated — binding the officer ID, decision, loan ID, and timestamp — completing the non-repudiation chain
Applicants receive an automated email confirming their loan application has been received and is under review. Delivered via SMTP with enforced TLS 1.2+ and certificate verification, with every delivery event audit-logged
Upon AO decision, applicants automatically receive an approval or rejection email — completing the end-to-end loan workflow. The notification service enforces TLS 1.2+, verifies certificates, and logs all delivery events to the audit trail
Set up the Flask project skeleton with Blueprints for modular subsystem development. Defined database models (Loan, LoanDraft, Classification, Document) with SQLAlchemy ORM. Established the ClassificationTag enum (C0–C6) as the central security contract between subsystems, ensuring each team member's code only needed to read the tag to know what actions to take.
Built the ClassificationRuleEngine with deterministic business rules — high-value loans, DTI ratio thresholds, business indicators, and employment risk factors. Implemented the ContentScanner for regex-based PII detection (NRIC, phone, email patterns) and keyword scanning (fraud, terrorism, money laundering triggers). Applied the "Strictest Wins" escalation policy where tags can only increase, never decrease.
Implemented FieldEncryption using Fernet for draft data. Built LoanFieldEncryption using AES-256-GCM with AWS KMS envelope encryption for permanent records — the Data Encryption Key (DEK) is unwrapped once at startup and lives only in RAM, never written to disk. Created EncryptedTextField and EncryptedNumericField SQLAlchemy TypeDecorators for transparent, automatic encryption and decryption at the ORM layer.
Integrated Vertex AI (Gemini 2.5 Flash Lite) for semantic loan analysis with explicit data minimization — stripping all NRIC and personal identifiers before transmission. Designed the fail-open/fail-secure pattern so classification continues reliably even when the AI service is unavailable. Built the NRIC OCR pipeline with EasyOCR, OpenCV grayscale and adaptive thresholding preprocessing, three-strategy extraction, fuzzy OCR correction (confusion maps: 6→G, 8→B, 5→S), and MOD-11 checksum validation.
Implemented CSRF protection via Flask-WTF on all POST forms with SameSite=Lax cookies. Deployed OWASP security headers (CSP, HSTS with 1-year max-age, X-Frame-Options: DENY, X-Content-Type-Options: nosniff). Built the three-stage HMAC-SHA256 digital signature system for non-repudiation. Added step-up re-authentication requiring TOTP re-verification within a 5-minute freshness window for C3+ classified loans. Applied Flask-Limiter rate limiting on login, OTP resend, and email verification endpoints.
Validated classification logic against edge cases including high-value loans, fraud keyword triggers, PII pattern combinations, and DTI boundary conditions. Tested OCR extraction accuracy across the three-strategy pipeline with varied NRIC card image quality. Verified HMAC signature generation and constant-time comparison under normal and adversarial conditions. Confirmed fail-open and fail-secure pathways function correctly under simulated AI service outages.
Purpose: Applies hard-coded business rules to produce a deterministic baseline classification tag.
Rules: Loan amount thresholds, Debt-to-Income (DTI) ratio evaluation, employment status risk flags, loan purpose category scoring.
Outcome: Produces an initial C0–C6 tag that subsequent phases can only escalate, never reduce.
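A minimal sketch of what a deterministic rule pass of this kind could look like. The thresholds, field names, the `LoanFacts` dataclass, and the `classify_by_rules` helper are illustrative assumptions, not DigiLoan's actual rules. Only the C0–C6 scale and the escalate-only behaviour come from the description above.

```python
from dataclasses import dataclass
from enum import IntEnum


class ClassificationTag(IntEnum):
    """C0 (least sensitive) to C6 (most sensitive); integer order allows max()."""
    C0 = 0
    C1 = 1
    C2 = 2
    C3 = 3
    C4 = 4
    C5 = 5
    C6 = 6


@dataclass
class LoanFacts:
    # Hypothetical subset of application fields used by the rules.
    amount: float
    monthly_income: float
    monthly_debt: float
    employment_status: str


def classify_by_rules(loan: LoanFacts) -> ClassificationTag:
    """Apply each rule in turn; a rule may raise the tag but never lower it."""
    tag = ClassificationTag.C1  # assumed baseline for any application

    # Assumed high-value threshold.
    if loan.amount >= 100_000:
        tag = max(tag, ClassificationTag.C3)

    # Debt-to-income ratio; a missing or zero income is treated as worst case.
    if loan.monthly_income > 0:
        dti = loan.monthly_debt / loan.monthly_income
    else:
        dti = float("inf")
    if dti > 0.6:  # assumed DTI threshold
        tag = max(tag, ClassificationTag.C4)

    # Assumed employment risk flag.
    if loan.employment_status in {"unemployed", "self-employed"}:
        tag = max(tag, ClassificationTag.C2)

    return tag
```

Because every rule goes through `max()`, the order in which rules run does not affect the result.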
Purpose: Regex-based scanning of application content for PII patterns and high-risk keywords.
PII Detection: NRIC format pattern (S/T/F/G followed by 7 digits and a letter), Singapore phone numbers, email addresses.
Keyword Scanning: High-risk terms including fraud, terrorism, money laundering, and related indicators trigger tag escalation.
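The scanner described above can be sketched with a few compiled patterns. The exact regular expressions, the keyword list beyond the three terms named above, and the `scan_content` helper are assumptions made for illustration.

```python
import re

# NRIC format as described: S/T/F/G, seven digits, one letter.
NRIC_PATTERN = re.compile(r"\b[STFG]\d{7}[A-Z]\b")
# Assumed shape for Singapore numbers: optional +65, then 8 digits starting 6, 8, or 9.
SG_PHONE_PATTERN = re.compile(r"(?<!\d)(?:\+65[\s-]?)?[689]\d{3}[\s-]?\d{4}(?!\d)")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
HIGH_RISK_KEYWORDS = ("fraud", "terrorism", "money laundering")


def scan_content(text: str) -> dict:
    """Return PII matches and high-risk keywords found in free text."""
    lowered = text.lower()
    findings = {
        "nric": NRIC_PATTERN.findall(text),
        "phone": SG_PHONE_PATTERN.findall(text),
        "email": EMAIL_PATTERN.findall(text),
        "keywords": [kw for kw in HIGH_RISK_KEYWORDS if kw in lowered],
    }
    # Any hit is a signal that the tag should be escalated.
    findings["escalate"] = any(findings[key] for key in ("nric", "phone", "email", "keywords"))
    return findings
```

A caller would map `findings` to a tag level; that mapping is left out here because the source does not specify it.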
Model: Gemini 2.5 Flash Lite via Vertex AI API for contextual understanding beyond pattern matching.
Data Minimization: NRIC numbers and personal identifiers are stripped from the payload before transmission — no PII leaves the server.
Resilience: Fail-open design — if Vertex AI is unavailable, the system falls back to Phases 1 and 2, ensuring uninterrupted classification.
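To illustrate the data minimization and fail-open behaviour together, here is a sketch that takes the AI call as a plain callable instead of using the Vertex AI SDK. The field names in `IDENTIFIER_FIELDS`, the redaction marker, and both helper functions are assumptions.

```python
import re
from typing import Callable

NRIC_PATTERN = re.compile(r"\b[STFG]\d{7}[A-Z]\b")
# Assumed names of fields that must never be sent to the AI service.
IDENTIFIER_FIELDS = {"nric", "full_name", "email", "phone", "address"}


def minimise_payload(application: dict) -> dict:
    """Drop identifier fields and redact NRIC-shaped strings in free text."""
    safe = {}
    for key, value in application.items():
        if key in IDENTIFIER_FIELDS:
            continue
        if isinstance(value, str):
            value = NRIC_PATTERN.sub("[REDACTED]", value)
        safe[key] = value
    return safe


def classify_with_fallback(
    application: dict,
    baseline_tag: int,
    ai_classifier: Callable[[dict], int],
) -> int:
    """Ask the AI for a tag; on any failure keep the Phase 1 and 2 result."""
    try:
        ai_tag = ai_classifier(minimise_payload(application))
    except Exception:
        # Fail-open: an AI outage must not block classification.
        return baseline_tag
    # The AI phase can raise the tag but never lower it.
    return max(baseline_tag, ai_tag)
```

Passing the classifier in as a callable also makes the outage path easy to exercise in tests with a stub that raises.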
Strictest Wins: Final tag = max(Phase 1, Phase 2, Phase 3) — tags can only escalate across phases.
Downstream Effects: C3+ tags automatically trigger AES-256-GCM encryption, mandatory MFA enforcement, and step-up re-authentication before officer access.
Key Files: classification_engine.py, vertex_ai_service.py, classification.py
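The "Strictest Wins" merge and the C3+ downstream effects can be sketched as follows. The function names and the shape of the returned policy dictionary are assumptions; the source only states which controls C3+ tags trigger.

```python
from enum import IntEnum
from typing import Optional


class ClassificationTag(IntEnum):
    C0 = 0
    C1 = 1
    C2 = 2
    C3 = 3
    C4 = 4
    C5 = 5
    C6 = 6


def final_tag(
    rule_tag: ClassificationTag,
    scanner_tag: ClassificationTag,
    ai_tag: Optional[ClassificationTag] = None,
) -> ClassificationTag:
    """Strictest Wins: the final tag is the maximum across all phases that ran."""
    candidates = [rule_tag, scanner_tag]
    if ai_tag is not None:  # Phase 3 may be skipped when the AI is unavailable
        candidates.append(ai_tag)
    return max(candidates)


def security_policy(tag: ClassificationTag) -> dict:
    """Controls that other subsystems derive from the tag alone."""
    sensitive = tag >= ClassificationTag.C3
    return {
        "aes_256_gcm": sensitive,
        "mfa_required": sensitive,
        "step_up_reauth": sensitive,
    }
```

Keeping the policy a pure function of the tag is what lets the tag act as the contract between subsystems.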
Algorithm: Fernet symmetric encryption — AES-128-CBC with HMAC-SHA256 authentication.
Use Case: LoanDraft.form_data and LoanDraft.uploaded_files — short-lived, frequently updated data.
Rationale: Fernet's simplicity and built-in integrity verification suit the temporary nature of draft records.
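A sketch of field-level draft encryption using the `cryptography` package's Fernet API. The helper names and the choice to JSON-encode each value before encrypting are assumptions; key storage is out of scope here.

```python
import json
from typing import Optional

from cryptography.fernet import Fernet, InvalidToken


def encrypt_draft_fields(fernet: Fernet, form_data: dict) -> dict:
    """Encrypt each field value separately so fields can be updated one by one."""
    return {
        name: fernet.encrypt(json.dumps(value).encode("utf-8")).decode("ascii")
        for name, value in form_data.items()
    }


def decrypt_draft_fields(fernet: Fernet, stored: dict) -> Optional[dict]:
    """Decrypt all fields; return None if any token fails authentication."""
    try:
        return {
            name: json.loads(fernet.decrypt(token.encode("ascii")))
            for name, token in stored.items()
        }
    except InvalidToken:
        # Fernet verifies its HMAC before decrypting, so tampering lands here.
        return None
```

Usage would look like `f = Fernet(Fernet.generate_key())` followed by `encrypt_draft_fields(f, {"monthly_salary": 5200})`.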
Algorithm: AES-256-GCM providing authenticated encryption with associated data (AEAD).
Key Management: AWS KMS envelope encryption — the Data Encryption Key (DEK) is unwrapped once at startup and lives only in RAM, never persisted to disk.
Scope: Loan.nric (as SHA-256 hash), Loan.monthly_salary, Loan.loan_purpose, and all sensitive permanent fields.
Implementation: EncryptedTextField and EncryptedNumericField as SQLAlchemy TypeDecorators — encryption and decryption happen automatically at the ORM layer.
Developer Experience: Application code reads and writes plaintext; the TypeDecorators handle all cryptographic operations transparently.
Key Files: field_encryption.py, enhanced_encryption.py, encrypted_types.py
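The permanent-record layer can be sketched with `AESGCM` from the `cryptography` package. In this sketch the DEK is passed in directly; in the design described above it would come from an AWS KMS unwrap at startup, which is not shown. The `LoanFieldCipher` class name, the nonce-prefix storage layout, and the use of the field name as associated data are all assumptions.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class LoanFieldCipher:
    """Hypothetical helper holding the unwrapped DEK in memory only."""

    NONCE_SIZE = 12  # 96-bit nonce, the size recommended for GCM

    def __init__(self, dek: bytes) -> None:
        if len(dek) != 32:
            raise ValueError("AES-256-GCM needs a 32-byte key")
        self._aesgcm = AESGCM(dek)

    def encrypt(self, plaintext: str, field_name: str) -> bytes:
        """Return nonce + ciphertext + tag, bound to the field name as AAD."""
        nonce = os.urandom(self.NONCE_SIZE)  # fresh nonce for every encryption
        ciphertext = self._aesgcm.encrypt(
            nonce, plaintext.encode("utf-8"), field_name.encode("utf-8")
        )
        return nonce + ciphertext

    def decrypt(self, blob: bytes, field_name: str) -> str:
        """Raises cryptography.exceptions.InvalidTag if the data or AAD differ."""
        nonce, ciphertext = blob[: self.NONCE_SIZE], blob[self.NONCE_SIZE:]
        plaintext = self._aesgcm.decrypt(nonce, ciphertext, field_name.encode("utf-8"))
        return plaintext.decode("utf-8")
```

Binding the field name as associated data means a ciphertext copied from one column into another fails authentication. A SQLAlchemy TypeDecorator could call `encrypt` and `decrypt` from its bind and result hooks.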
Steps: Grayscale conversion → adaptive thresholding → 2× upscaling to improve EasyOCR recognition accuracy on low-resolution card photographs.
Impact: Preprocessing significantly improved raw EasyOCR extraction rates compared to unprocessed image input.
Strategy 1 — Token Match: Strict regex applied to individual OCR tokens for clean, high-confidence extractions.
Strategy 2 — Joined Text: Concatenates fragmented OCR tokens to recover NRICs split across recognition boundaries.
Strategy 3 — Fuzzy Correction: Applies OCR confusion maps (6→G, 8→B, 5→S) to recover NRICs with common character misreads, followed by MOD-11 checksum validation.
Processing: All OCR operations are performed entirely in-memory. Plaintext NRIC is deleted immediately after extraction.
Storage: Only the SHA-256 hash of the NRIC is stored — enabling identity verification without retaining the sensitive identifier itself.
Key File: nric_ocr_service.py
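The three extraction strategies and the checksum can be sketched on plain OCR token lists, with no image handling. The weights and check-letter tables follow the commonly documented NRIC checksum scheme. The helper names, the inverse letter-to-digit map used for the digit positions, and the sliding-window search are assumptions added for illustration.

```python
import hashlib
import re
from typing import List, Optional

WEIGHTS = (2, 7, 6, 5, 4, 3, 2)
ST_CHECK = "JZIHGFEDCBA"  # check letters for S/T prefixes
FG_CHECK = "XWUTRQPNMLK"  # check letters for F/G prefixes
NRIC_RE = re.compile(r"[STFG]\d{7}[A-Z]")
DIGIT_TO_LETTER = {"6": "G", "8": "B", "5": "S"}  # confusion map from the text
LETTER_TO_DIGIT = {v: k for k, v in DIGIT_TO_LETTER.items()}  # assumed inverse


def is_valid_nric(nric: str) -> bool:
    """Format check plus weighted MOD-11 checksum."""
    if not NRIC_RE.fullmatch(nric):
        return False
    total = sum(int(d) * w for d, w in zip(nric[1:8], WEIGHTS))
    if nric[0] in "TG":
        total += 4
    table = ST_CHECK if nric[0] in "ST" else FG_CHECK
    return table[total % 11] == nric[8]


def fuzzy_correct(token: str) -> Optional[str]:
    """Fix likely misreads by position: letters at the ends, digits between."""
    if len(token) != 9:
        return None
    first = DIGIT_TO_LETTER.get(token[0], token[0])
    middle = "".join(LETTER_TO_DIGIT.get(c, c) for c in token[1:8])
    last = DIGIT_TO_LETTER.get(token[8], token[8])
    candidate = first + middle + last
    return candidate if is_valid_nric(candidate) else None


def extract_nric(tokens: List[str]) -> Optional[str]:
    cleaned = [re.sub(r"[^A-Za-z0-9]", "", t).upper() for t in tokens]
    # Strategy 1: a single token is already a valid NRIC.
    for token in cleaned:
        if is_valid_nric(token):
            return token
    # Strategy 2: the NRIC was split across tokens; search the joined text.
    joined = "".join(cleaned)
    for match in re.finditer(r"(?=([STFG]\d{7}[A-Z]))", joined):
        if is_valid_nric(match.group(1)):
            return match.group(1)
    # Strategy 3: try confusion-map corrections on every 9-character window.
    for text in cleaned + [joined]:
        for i in range(len(text) - 8):
            corrected = fuzzy_correct(text[i:i + 9])
            if corrected:
                return corrected
    return None


def nric_hash(nric: str) -> str:
    """Only this digest is stored; the plaintext is not kept."""
    return hashlib.sha256(nric.encode("utf-8")).hexdigest()
```

The checksum is what keeps the fuzzy strategy safe to use: a correction is accepted only if the corrected string passes MOD-11.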
Trigger: Generated when applicant submits the loan application.
Payload Signed: loan_id + applicant_id + submission timestamp.
Purpose: Binds the application content to the submitting applicant — preventing content tampering post-submission.
Trigger: Generated when the Reviewing Officer escalates to the Approving Officer.
Payload Signed: loan_id + officer_id + justification text + escalation timestamp.
Purpose: Non-repudiation of the escalation decision — the RO cannot deny having escalated with a specific justification.
Trigger: Generated when the Approving Officer makes the final loan decision.
Payload Signed: loan_id + officer_id + decision (approve/reject) + decision timestamp.
Purpose: Cryptographic proof of the final decision — decisions cannot be altered or denied after signing.
Algorithm: HMAC-SHA256, a keyed-hash message authentication code providing both integrity and authenticity.
Timing Attack Prevention: All signature comparisons use hmac.compare_digest() for constant-time comparison, preventing timing side-channel attacks.
Key File: decision_signature.py
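A sketch of signing and verifying one stage payload with Python's standard `hmac` module. The JSON canonical encoding, the stage label, and the function names are assumptions; the source specifies only the signed fields, HMAC-SHA256, and `hmac.compare_digest()`.

```python
import hashlib
import hmac
import json


def _canonical(fields: tuple) -> bytes:
    """Encode fields unambiguously so 'a|b','c' can never equal 'a','b|c'."""
    return json.dumps(list(fields), separators=(",", ":")).encode("utf-8")


def sign(secret_key: bytes, *fields: str) -> str:
    """Return the hex HMAC-SHA256 over the ordered fields."""
    return hmac.new(secret_key, _canonical(fields), hashlib.sha256).hexdigest()


def verify(secret_key: bytes, signature: str, *fields: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(secret_key, *fields)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Stage 3 example: loan_id + officer_id + decision + decision timestamp.
# The key and identifiers below are placeholders for illustration only.
key = b"demo-key-not-for-production"
stage3_sig = sign(key, "stage3", "LOAN-1001", "AO-7", "approve", "2025-01-15T09:30:00Z")
```

Including a stage label in the signed fields keeps a Stage 2 signature from being replayed as a Stage 3 one when the other fields happen to match.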
CSRF: Flask-WTF CSRFProtect on all POST forms; SameSite=Lax cookies preventing cross-site request forgery.
XSS: sanitize_text() strips HTML tags, escapes entities, and removes null bytes from all user-provided text inputs.
SQL Injection: SQLAlchemy ORM exclusively; injection-prone characters blocked in NRIC, phone, and email field sanitizers.
Directory Traversal: sanitize_filename() applies os.path.basename() and regex cleanup to strip path traversal sequences from all uploaded filenames.
File Type Validation: Allowlist-based MIME type and extension validation on all uploaded documents.
Open Redirect: is_safe_url() validates all redirect targets against the application domain before executing redirects.
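The three helpers named above could look roughly like this. The function names come from the text, but the bodies are assumed reconstructions built from standard-library calls, and the host used in the usage note is a placeholder.

```python
import html
import os
import re
from urllib.parse import urljoin, urlparse


def sanitize_text(value: str) -> str:
    """Remove null bytes, strip HTML tags, then escape what remains."""
    value = value.replace("\x00", "")
    value = re.sub(r"<[^>]*>", "", value)
    return html.escape(value, quote=True).strip()


def sanitize_filename(filename: str) -> str:
    """Keep only the final path component and a conservative character set."""
    # Normalise Windows separators so basename() sees every path segment.
    name = os.path.basename(filename.replace("\\", "/"))
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    name = name.lstrip(".")  # no hidden files, no bare '..'
    return name or "upload"


def is_safe_url(target: str, host_url: str) -> bool:
    """Allow a redirect only if it resolves to the application's own host."""
    base = urlparse(host_url)
    resolved = urlparse(urljoin(host_url, target))
    return resolved.scheme in ("http", "https") and resolved.netloc == base.netloc
```

For example, `is_safe_url("/dashboard", "https://digiloan.example/")` is accepted, while a protocol-relative target such as `//evil.example/x` resolves to a different host and is rejected.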
CSP: Content Security Policy restricting script, style, and resource origins.
HSTS: HTTP Strict Transport Security with 1-year max-age enforcing HTTPS-only connections.
X-Frame-Options: DENY — prevents clickjacking by blocking iframe embedding.
X-Content-Type-Options: nosniff — prevents MIME-type sniffing attacks.
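The four headers can be kept in one table and applied to every response. The CSP value below is an assumed minimal policy, since the source does not give the exact directives, and `apply_security_headers` is a hypothetical helper.

```python
from typing import MutableMapping

SECURITY_HEADERS = {
    # Assumed policy: same-origin scripts, styles, and other resources only.
    "Content-Security-Policy": "default-src 'self'; script-src 'self'; style-src 'self'",
    # One year expressed in seconds.
    "Strict-Transport-Security": "max-age=31536000",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
}


def apply_security_headers(headers: MutableMapping[str, str]) -> MutableMapping[str, str]:
    """Set every security header, overwriting any value set earlier."""
    for name, value in SECURITY_HEADERS.items():
        headers[name] = value
    return headers
```

In a Flask application this would typically be called from an `after_request` hook with `response.headers`, so that no route can forget to set them.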
Step-Up Auth: C3+ classified loans require TOTP re-verification within a 5-minute freshness window before officers can access sensitive data.
Rate Limiting: Flask-Limiter applied to login, OTP resend, and email verification endpoints to throttle brute-force and credential-stuffing attempts.
Audit Logging: "5 Ws" logging pattern (Who, What, When, Where, Why) across all security-relevant events — producing a forensic-grade audit trail for regulatory compliance.
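The freshness check behind step-up authentication and a "5 Ws" audit record can be sketched as below. The function names, the numeric tag level, and the JSON record layout are assumptions; the 5-minute window and the C3 threshold come from the text.

```python
import json
import time
from typing import Optional

STEP_UP_WINDOW_SECONDS = 5 * 60  # 5-minute freshness window
STEP_UP_MIN_LEVEL = 3            # C3 and above


def needs_step_up(
    tag_level: int,
    last_totp_verified_at: Optional[float],
    now: Optional[float] = None,
) -> bool:
    """True when the officer must re-enter a TOTP code before access."""
    if tag_level < STEP_UP_MIN_LEVEL:
        return False
    if last_totp_verified_at is None:
        return True
    current = time.time() if now is None else now
    return (current - last_totp_verified_at) > STEP_UP_WINDOW_SECONDS


def audit_record(who: str, what: str, where: str, why: str,
                 when: Optional[float] = None) -> str:
    """Serialise one audit event with the Who, What, When, Where, Why fields."""
    event = {
        "who": who,
        "what": what,
        "when": time.time() if when is None else when,
        "where": where,
        "why": why,
    }
    return json.dumps(event, sort_keys=True)
```

Passing `now` and `when` explicitly keeps both functions deterministic under test.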