Architecture Deep Dive

Understanding HKFR's technical architecture and design decisions.

System Architecture Overview

HKFR is a cloud-native Next.js application. Requests pass through a GCP load balancer with HTTPS termination to the application layer, which persists data in MongoDB, PostgreSQL, and Google Cloud Storage.

graph TB
    subgraph "Client Layer"
        A[Web Browser]
        B[Mobile Browser]
    end

    subgraph "CDN & Load Balancing"
        C[GCP Load Balancer]
        D[HTTPS Termination]
    end

    subgraph "Application Layer"
        E[Next.js Frontend]
        F[Next.js API Routes]
        G[Authentication Middleware]
    end

    subgraph "Business Logic"
        H[Document Management]
        I[User Management]
        J[File Processing]
        K[Export Generation]
    end

    subgraph "Data Layer"
        L[(MongoDB)]
        M[(PostgreSQL)]
        N[Google Cloud Storage]
    end

    subgraph "External Services"
        O[Email Service]
        P[AI Services]
        Q[Pandoc Processing]
    end

    A --> C
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    G --> I
    H --> J
    I --> K
    J --> L
    J --> M
    K --> N
    F --> O
    F --> P
    J --> Q

Application Architecture

Frontend Architecture (Next.js 15)

HKFR uses Next.js 15 with the App Router for a modern, performant frontend:

Frontend Structure:
├── App Router (src/app/)
│   ├── Page Components
│   ├── Layout Components
│   ├── Loading States
│   └── Error Boundaries
├── UI Components (src/app/ui/)
│   ├── Reusable Components
│   ├── Form Components
│   └── Layout Components
└── Client-Side State
    ├── React State
    ├── Context API
    └── Local Storage

Key Frontend Features

  • Server-Side Rendering (SSR): Improved SEO and initial load performance
  • Client-Side Routing: Fast navigation between pages
  • Streaming: Progressive page loading
  • Automatic Code Splitting: Optimized bundle sizes
  • Image Optimization: Next.js built-in image optimization

Backend Architecture (Next.js API Routes)

The backend is built with Next.js API routes, so the frontend and backend ship as one full-stack application. Each endpoint lives in its own route.js file:

API Structure:
├── Authentication (/api/auth/)
│   ├── login/route.js
│   ├── register/route.js
│   ├── refresh/route.js
│   ├── logout/route.js
│   └── verify/route.js
├── Document Management (/api/manage_files/)
│   ├── create/route.js
│   ├── get_documents/route.js
│   ├── save/route.js
│   ├── delete/route.js
│   ├── upload/route.js
│   ├── download/route.js
│   └── export/route.js
└── AI Features (/api/generate_data/)
    └── route.js
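
As a rough illustration of what one of these files might contain, the sketch below shows a possible handler for /api/manage_files/create. The payload fields (title, reportType) and the validation rules are assumptions made for this example, not HKFR's actual contract. In a real App Router file the handler would be exported (export async function POST).

```javascript
// Hypothetical sketch of /api/manage_files/create/route.js.
// Field names and rules are assumptions, not HKFR's real API contract.

// Pure validation step, kept separate so it can be unit tested.
function validateCreatePayload(body) {
  if (!body || typeof body !== 'object') {
    return ['body must be a JSON object'];
  }
  const errors = [];
  if (typeof body.title !== 'string' || body.title.trim() === '') {
    errors.push('title is required');
  }
  if (typeof body.reportType !== 'string' || body.reportType.trim() === '') {
    errors.push('reportType is required');
  }
  return errors;
}

// In route.js this function would be exported: `export async function POST`.
async function POST(request) {
  let body;
  try {
    body = await request.json();
  } catch {
    return Response.json({ errors: ['invalid JSON'] }, { status: 400 });
  }
  const errors = validateCreatePayload(body);
  if (errors.length > 0) {
    return Response.json({ errors }, { status: 400 });
  }
  // Persisting the document is out of scope for this sketch.
  return Response.json({ created: true, title: body.title.trim() }, { status: 201 });
}
```

Keeping validation in a plain function means it can be tested without starting a Next.js server.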

Data Architecture

Database Design

HKFR uses a hybrid database approach: MongoDB stores users and documents, and PostgreSQL stores file metadata and audit logs.

MongoDB (Primary Document Store)

// User Schema
{
  _id: ObjectId,
  email: String,
  passwordHash: String,
  profile: {
    name: String,
    company: String,
    role: String
  },
  createdAt: Date,
  lastLogin: Date
}

// Document Schema
{
  _id: ObjectId,
  title: String,
  content: {
    sections: [{
      id: String,
      title: String,
      content: Object,  // Rich text content
      metadata: Object
    }]
  },
  metadata: {
    company: String,
    fiscalYear: String,
    reportType: String,
    status: String
  },
  collaborators: [{
    userId: ObjectId,
    role: String,
    permissions: [String],
    addedAt: Date
  }],
  versions: [{
    version: Number,
    content: Object,
    createdBy: ObjectId,
    createdAt: Date,
    description: String
  }],
  createdAt: Date,
  updatedAt: Date
}
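
To show how the versions array in the Document Schema could be maintained, here is a minimal sketch. The helper name appendVersion and the rule that the next version number is the current maximum plus one are assumptions, not something the schema itself specifies.

```javascript
// Sketch: add a version snapshot to a document shaped like the schema above.
// The helper name and the "next = max + 1" numbering rule are assumptions.
function appendVersion(doc, { content, createdBy, description }, now = new Date()) {
  const lastVersion = doc.versions.reduce((max, v) => Math.max(max, v.version), 0);
  const entry = {
    version: lastVersion + 1,
    content,
    createdBy,
    createdAt: now,
    description,
  };
  // Return a new object rather than mutating the input document.
  return {
    ...doc,
    content,
    versions: [...doc.versions, entry],
    updatedAt: now,
  };
}
```

Returning a new object keeps the previous state available to the caller, for example to diff against before saving.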

PostgreSQL (Relational Data)

PostgreSQL holds the data that benefits from relational queries and ACID transactions:

-- File metadata and relationships
CREATE TABLE file_metadata (
    id SERIAL PRIMARY KEY,
    document_id VARCHAR(24),  -- MongoDB ObjectId
    file_path VARCHAR(500),
    file_type VARCHAR(50),
    size_bytes BIGINT,
    upload_timestamp TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE
);

-- Audit logs
CREATE TABLE audit_logs (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(24),
    document_id VARCHAR(24),
    action VARCHAR(100),
    details JSONB,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
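
A write to the audit_logs table might be prepared as in the sketch below. It builds a parameterized query object in the text/values shape that the node-postgres driver accepts; the choice of that driver and the function name are assumptions, since the source does not name a PostgreSQL client.

```javascript
// Sketch: build a parameterized INSERT for the audit_logs table above.
// The { text, values } shape is what node-postgres accepts in client.query();
// using that driver is an assumption.
function buildAuditLogInsert({ userId, documentId, action, details = {} }) {
  return {
    text:
      'INSERT INTO audit_logs (user_id, document_id, action, details) ' +
      'VALUES ($1, $2, $3, $4::jsonb) RETURNING id',
    // Values travel separately so user input is never spliced into the SQL text.
    values: [userId, documentId, action, JSON.stringify(details)],
  };
}
```

The timestamp column is omitted on purpose so that its DEFAULT CURRENT_TIMESTAMP applies.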

File Storage Strategy

graph LR
    A[File Upload] --> B[Validation]
    B --> C[Virus Scan]
    C --> D[Google Cloud Storage]
    D --> E[Database Metadata]
    E --> F[Processing Pipeline]
    F --> G[Format Conversion]
    G --> H[Integration Ready]

Google Cloud Storage Structure

hkfr-storage/
├── documents/
│   ├── raw/          # Original uploaded files
│   ├── processed/    # Converted/processed files
│   └── exports/      # Generated exports
├── images/
│   ├── originals/    # Original images
│   ├── thumbnails/   # Auto-generated thumbnails
│   └── optimized/    # Compressed versions
└── backups/
    ├── daily/        # Daily backups
    └── weekly/       # Weekly backups
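
One way to map an uploaded file onto this layout is sketched below. Grouping objects by document ID and the file name sanitizing rule are assumptions for illustration; the source only specifies the top-level folders.

```javascript
// Sketch: derive an object name under the bucket layout above.
// Grouping by documentId and the sanitizing rule are assumptions.
const STORAGE_PREFIXES = new Map([
  ['raw', 'documents/raw'],
  ['processed', 'documents/processed'],
  ['exports', 'documents/exports'],
]);

function objectNameFor(area, documentId, fileName) {
  const prefix = STORAGE_PREFIXES.get(area);
  if (!prefix) {
    throw new Error(`unknown storage area: ${area}`);
  }
  // Keep only the final path segment and replace unsafe characters,
  // so a name like "../../x" cannot point outside the document's folder.
  const lastSegment = fileName.split(/[\\/]/).pop();
  const safeName = lastSegment.replace(/[^A-Za-z0-9._-]/g, '_');
  if (safeName === '' || safeName === '.' || safeName === '..') {
    throw new Error(`invalid file name: ${fileName}`);
  }
  return `${prefix}/${documentId}/${safeName}`;
}
```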

Authentication Architecture

JWT-Based Authentication Flow

sequenceDiagram
    participant C as Client
    participant API as API Routes
    participant DB as MongoDB
    participant JWT as JWT Service

    C->>API: POST /api/auth/login
    API->>DB: Validate credentials
    DB->>API: User data
    API->>JWT: Generate tokens
    JWT->>API: Access + Refresh tokens
    API->>C: Tokens (HTTP-only cookies)

    Note over C,API: Subsequent requests
    C->>API: Request with access token
    API->>JWT: Verify access token
    JWT->>API: Token valid
    API->>C: Protected resource

    Note over C,API: Token refresh
    C->>API: Request with expired access token
    API->>JWT: Verify refresh token
    JWT->>API: New access token
    API->>C: New token + resource
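
The token steps in the flow above can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions, not HKFR's implementation: the type claim, the shared secret for both token kinds, and the 15-minute access lifetime are invented for the example, and production code would normally rely on a maintained JWT library.

```javascript
const nodeCrypto = require('node:crypto');

// Sketch of HS256 signing and verification using only Node's crypto module.
// The "type" claim and the token lifetimes are assumptions.
function base64url(input) {
  return Buffer.from(input).toString('base64url');
}

function signToken(payload, secret, ttlSeconds, nowSeconds) {
  const header = base64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = base64url(JSON.stringify({ ...payload, exp: nowSeconds + ttlSeconds }));
  const signature = nodeCrypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest('base64url');
  return `${header}.${body}.${signature}`;
}

// Returns the claims for a valid, unexpired token, otherwise null.
function verifyToken(token, secret, nowSeconds) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = nodeCrypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest();
  const given = Buffer.from(signature, 'base64url');
  // Constant-time comparison; timingSafeEqual requires equal lengths.
  if (given.length !== expected.length || !nodeCrypto.timingSafeEqual(given, expected)) {
    return null;
  }
  const claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return null;
  return claims;
}

// Refresh step from the diagram: a valid refresh token yields a new access token.
function refreshAccessToken(refreshToken, secret, nowSeconds) {
  const claims = verifyToken(refreshToken, secret, nowSeconds);
  if (!claims || claims.type !== 'refresh') return null;
  return signToken({ sub: claims.sub, type: 'access' }, secret, 15 * 60, nowSeconds);
}
```

Passing the current time in as a parameter keeps the functions deterministic, which makes expiry behavior easy to test.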

Security Features

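  • HTTP-only Cookies: Tokens are not readable from client-side JavaScript, which limits token theft through XSS
  • CSRF Protection: Next.js origin checks (built in for Server Actions; cookie-authenticated API routes also need SameSite cookies or a CSRF token)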
  • Token Rotation: Automatic refresh token rotation
  • Secure Headers: Security headers via Next.js middleware
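
A cookie carrying one of the tokens could be built as in the following sketch. The cookie name, path, and lifetime are assumptions; the attributes themselves (HttpOnly, Secure, SameSite) are standard Set-Cookie attributes.

```javascript
// Sketch: build a Set-Cookie header value for a token.
// Cookie name, path, and lifetime are assumptions for illustration.
function buildTokenCookie(name, token, maxAgeSeconds, path = '/') {
  return [
    `${name}=${encodeURIComponent(token)}`,
    `Max-Age=${maxAgeSeconds}`,
    `Path=${path}`,
    'HttpOnly', // not readable from document.cookie
    'Secure', // only sent over HTTPS
    'SameSite=Strict', // not attached to cross-site requests
  ].join('; ');
}
```

For example, buildTokenCookie('access_token', 'a.b.c', 900) yields access_token=a.b.c; Max-Age=900; Path=/; HttpOnly; Secure; SameSite=Strict.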

File Processing Pipeline

Document Conversion Architecture

graph TD
    A[Upload File] --> B{File Type}
    B -->|Excel| C[ExcelJS Parser]
    B -->|Word| D[Pandoc Converter]
    B -->|PDF| E[PDF Parser]

    C --> F[Data Extraction]
    D --> G[Content Extraction]
    E --> H[Text Extraction]

    F --> I[Structure Analysis]
    G --> I
    H --> I

    I --> J[Content Integration]
    J --> K[Document Update]
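
The File Type decision at the top of this diagram might be implemented as a simple lookup, sketched below. The extension list is an assumption; the branch names mirror the three paths in the diagram.

```javascript
// Sketch: choose a conversion branch from the file extension.
// The extension list is an assumption; branch names follow the diagram.
const PARSER_BY_EXTENSION = new Map([
  ['xlsx', 'excel'], // ExcelJS Parser branch
  ['docx', 'word'], // Pandoc Converter branch
  ['pdf', 'pdf'], // PDF Parser branch
]);

function selectParser(fileName) {
  const dot = fileName.lastIndexOf('.');
  // dot <= 0 covers names without an extension and dotfiles such as ".env".
  const extension = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : '';
  const parser = PARSER_BY_EXTENSION.get(extension);
  if (!parser) {
    throw new Error(`unsupported file type: ${fileName}`);
  }
  return parser;
}
```

Relying on the extension alone is a simplification; an upload pipeline would usually also inspect the file's content type.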

Processing Utilities

// File processing utilities (src/utils/)
├── converter.js      # Format conversion logic
├── pandoc.js         # Pandoc wrapper for document conversion
├── sheetToCsv.js     # Excel to CSV conversion
└── gcs.js            # Google Cloud Storage operations

Export Generation

graph LR
    A[Document Content] --> B[Template Selection]
    B --> C[Data Merge]
    C --> D[Format Generation]
    D --> E[PDF Export]
    D --> F[DOCX Export]
    E --> G[Cloud Storage]
    F --> G
    G --> H[Download Link]
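
The Data Merge step could look roughly like the sketch below, which fills placeholders in a text template from document fields. The double-brace placeholder syntax is an assumption; the source does not describe the template format.

```javascript
// Sketch of the "Data Merge" step: fill {{placeholders}} in a template string.
// The placeholder syntax is an assumption.
function mergeTemplate(template, data) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    // Resolve dotted paths such as "metadata.company" using own properties only.
    const value = path.split('.').reduce(
      (obj, key) =>
        obj != null && Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined,
      data
    );
    // Leave unknown placeholders visible so missing data is easy to spot.
    return value === undefined || value === null ? match : String(value);
  });
}
```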

Real-Time Collaboration

WebSocket-like Updates via Polling

HKFR currently delivers collaboration updates by polling, so changes appear in near real time rather than instantly; a WebSocket upgrade is planned. Each update has this shape:

// Collaboration update flow
const collaborationUpdate = {
  documentId: String,
  userId: String,
  section: String,
  action: 'edit',  // one of 'edit', 'comment', or 'cursor'
  data: Object,
  timestamp: Date
};
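
The server side of a polling endpoint could filter updates as in this sketch. The function name, the cursor convention, and the choice not to echo a user's own updates back are assumptions made for illustration.

```javascript
// Sketch: the server-side filter behind a polling endpoint.
// A client sends the timestamp of the last update it saw and receives
// everything newer for that document. Names here are assumptions.
function updatesSince(allUpdates, documentId, lastSeen, requestingUserId) {
  const updates = allUpdates
    .filter((u) => u.documentId === documentId)
    .filter((u) => u.timestamp.getTime() > lastSeen.getTime())
    // A client does not need its own edits echoed back.
    .filter((u) => u.userId !== requestingUserId)
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
  // The next poll resumes from the newest update returned, if any.
  const cursor = updates.length > 0 ? updates[updates.length - 1].timestamp : lastSeen;
  return { updates, cursor };
}
```

The client would store the returned cursor and send it as lastSeen on its next poll.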

Conflict Resolution Strategy

graph TD
    A[User A Edit] --> C[Conflict Detection]
    B[User B Edit] --> C
    C --> D{Conflict?}
    D -->|Yes| E[Operational Transform]
    D -->|No| F[Apply Changes]
    E --> F
    F --> G[Broadcast Update]
    G --> H[Update All Clients]
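
To make the Operational Transform step concrete, here is a minimal sketch for the simplest case: two users inserting text into the same string at the same time. The operation shape (pos, text, userId) is an assumption, and a real editor would also need to transform deletes and rich-text operations.

```javascript
// Sketch of operational transform for two concurrent text inserts.
// The operation shape { pos, text, userId } is an assumption.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied after `other` has already been applied.
function transformInsert(op, other) {
  const otherGoesFirst =
    other.pos < op.pos ||
    // Tie-break on userId so every client orders equal positions the same way.
    (other.pos === op.pos && other.userId < op.userId);
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}
```

Whichever edit a client applies first, transforming the second edit against it leads both clients to the same text: applying A and then transformInsert(B, A) gives the same result as applying B and then transformInsert(A, B).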

Performance Optimization

Frontend Optimizations

  • Code Splitting: Automatic route-based splitting
  • Image Optimization: Next.js Image component
  • Bundle Analysis: Regular bundle size monitoring
  • Caching: Strategic use of browser caching

Backend Optimizations

  • Database Indexing: Optimized MongoDB indexes
  • Connection Pooling: Efficient database connections
  • API Caching: Response caching for static data
  • Compression: Response compression middleware

Database Indexing Strategy

// MongoDB Indexes
db.documents.createIndex({ "createdBy": 1 });  // assumes a top-level createdBy field, which the Document Schema does not list
db.documents.createIndex({ "metadata.company": 1 });
db.documents.createIndex({ "collaborators.userId": 1 });
db.documents.createIndex({ "updatedAt": -1 });

// Compound indexes for complex queries
db.documents.createIndex({ 
  "collaborators.userId": 1, 
  "metadata.reportType": 1,
  "updatedAt": -1 
});
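
A query that lines up with the compound index above might be built as follows: equality matches on the first two indexed fields, then a sort on updatedAt in the index's own direction. The function name and page size are assumptions.

```javascript
// Sketch: a filter and sort that match the compound index above
// (equality on collaborators.userId and metadata.reportType, then updatedAt descending).
// The function name and default page size are assumptions.
function buildRecentReportsQuery(userId, reportType, limit = 20) {
  return {
    filter: {
      'collaborators.userId': userId, // an ObjectId in the Document Schema
      'metadata.reportType': reportType,
    },
    sort: { updatedAt: -1 },
    limit,
  };
}

// With the official MongoDB Node.js driver this could be used as:
//   const q = buildRecentReportsQuery(userId, 'annual');
//   db.collection('documents').find(q.filter).sort(q.sort).limit(q.limit);
```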

Security Architecture

Security Layers

graph TB
    A[HTTPS/TLS] --> B[Load Balancer]
    B --> C[API Gateway]
    C --> D[Authentication Middleware]
    D --> E[Input Validation]
    E --> F[Rate Limiting]
    F --> G[Business Logic]
    G --> H[Data Encryption]
    H --> I[Audit Logging]

Security Measures

  1. Transport Security: HTTPS everywhere
  2. Authentication: JWT stored in HTTP-only cookies
  3. Authorization: Role-based access control
  4. Input Validation: All inputs validated and sanitized
  5. Output Encoding: XSS prevention
  6. CSRF Protection: Next.js origin checks for Server Actions; SameSite cookies or CSRF tokens for API routes
  7. Rate Limiting: API rate limiting
  8. Audit Logging: Comprehensive activity logging
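
As one possible shape for the rate limiting measure, the sketch below implements a fixed-window counter. The limit, the window length, and the in-memory Map are assumptions; because the state lives in one process, a deployment with several instances would need a shared store instead.

```javascript
// Sketch of a fixed-window rate limiter keyed by client identifier.
// The limit, window length, and in-memory Map are assumptions.
function createRateLimiter({ limit, windowMs, now = () => Date.now() }) {
  const windows = new Map(); // key -> { start, count }

  return function allow(key) {
    const t = now();
    const current = windows.get(key);
    // Start a new window on first use or once the old window has elapsed.
    if (!current || t - current.start >= windowMs) {
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    if (current.count < limit) {
      current.count += 1;
      return true;
    }
    return false;
  };
}
```

A handler would call the returned function with a client key (for example a user ID) and respond with HTTP 429 when it returns false.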

Scalability Considerations

Horizontal Scaling

  • Stateless API: All API routes are stateless
  • Database Sharding: MongoDB sharding capability
  • File Storage: Cloud storage auto-scaling
  • Container Orchestration: Kubernetes scaling

Vertical Scaling

  • Resource Optimization: Efficient resource usage
  • Caching: Multi-level caching strategy
  • Database Optimization: Query optimization
  • CDN: Static asset delivery via CDN

Monitoring and Observability

Application Monitoring

graph LR
    A[Application] --> B[Metrics Collection]
    B --> C[Google Cloud Monitoring]
    C --> D[Dashboards]
    C --> E[Alerts]

    A --> F[Log Collection]
    F --> G[Google Cloud Logging]
    G --> H[Log Analysis]

    A --> I[Tracing]
    I --> J[Google Cloud Trace]
    J --> K[Performance Analysis]

Key Metrics

  • Response Times: API and page load times
  • Error Rates: 4xx and 5xx error tracking
  • Database Performance: Query performance metrics
  • User Activity: Active users and feature usage
  • Resource Utilization: CPU, memory, storage usage
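
Two of these metrics, error rate and response time, can be derived from raw request samples as sketched below. The sample shape (status, durationMs) and the choice of the 95th percentile are assumptions for illustration.

```javascript
// Sketch: derive an error rate and a p95 latency from request samples.
// The sample shape { status, durationMs } is an assumption.
function summarizeRequests(samples) {
  if (samples.length === 0) {
    return { errorRate: 0, p95Ms: 0 };
  }
  const errors = samples.filter((s) => s.status >= 400).length;
  const sorted = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  // Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
  const rank = Math.ceil((95 * sorted.length) / 100);
  return {
    errorRate: errors / samples.length,
    p95Ms: sorted[rank - 1],
  };
}
```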

Future Architecture Considerations

Planned Improvements

  1. WebSocket Integration: Real-time collaboration
  2. Microservices: Service decomposition for better scaling
  3. GraphQL: Flexible API queries
  4. Advanced Caching: Redis integration
  5. TypeScript Migration: Type safety improvement
  6. Event-Driven Architecture: Asynchronous processing
  7. Advanced AI Integration: Enhanced content generation

Scalability Roadmap

  • Service Mesh: Istio for service communication
  • Event Streaming: Apache Kafka for async processing
  • Edge Computing: Cloudflare Workers for global distribution
  • Advanced Monitoring: Prometheus + Grafana stack