Architecture Deep Dive
Understanding HKFR's technical architecture and design decisions.
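System Architecture Overview
HKFR follows a cloud-native architecture built for scalability, security, and maintainability: requests pass through a GCP load balancer with HTTPS termination to a Next.js frontend and API routes, backed by MongoDB, PostgreSQL, and Google Cloud Storage.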
graph TB
subgraph "Client Layer"
A[Web Browser]
B[Mobile Browser]
end
subgraph "CDN & Load Balancing"
C[GCP Load Balancer]
D[HTTPS Termination]
end
subgraph "Application Layer"
E[Next.js Frontend]
F[Next.js API Routes]
G[Authentication Middleware]
end
subgraph "Business Logic"
H[Document Management]
I[User Management]
J[File Processing]
K[Export Generation]
end
subgraph "Data Layer"
L[(MongoDB)]
M[(PostgreSQL)]
N[Google Cloud Storage]
end
subgraph "External Services"
O[Email Service]
P[AI Services]
Q[Pandoc Processing]
end
A --> C
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
G --> I
H --> J
I --> K
J --> L
J --> M
K --> N
F --> O
F --> P
J --> Q
Application Architecture
Frontend Architecture (Next.js 15)
HKFR uses Next.js 15 with the App Router for a modern, performant frontend:
Frontend Structure:
├── App Router (src/app/)
│   ├── Page Components
│   ├── Layout Components
│   ├── Loading States
│   └── Error Boundaries
├── UI Components (src/app/ui/)
│   ├── Reusable Components
│   ├── Form Components
│   └── Layout Components
└── Client-Side State
    ├── React State
    ├── Context API
    └── Local Storage
Key Frontend Features
- Server-Side Rendering (SSR): Improved SEO and initial load performance
- Client-Side Routing: Fast navigation between pages
- Streaming: Progressive page loading
- Automatic Code Splitting: Optimized bundle sizes
- Image Optimization: Next.js built-in image optimization
Backend Architecture (Next.js API Routes)
The backend is built using Next.js API Routes, providing a unified fullstack solution:
API Structure:
├── Authentication (/api/auth/)
│   ├── login/route.js
│   ├── register/route.js
│   ├── refresh/route.js
│   ├── logout/route.js
│   └── verify/route.js
├── Document Management (/api/manage_files/)
│   ├── create/route.js
│   ├── get_documents/route.js
│   ├── save/route.js
│   ├── delete/route.js
│   ├── upload/route.js
│   ├── download/route.js
│   └── export/route.js
└── AI Features (/api/generate_data/)
    └── route.js
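To make the layout above concrete, here is a minimal sketch of what a handler such as `get_documents/route.js` might contain. In the App Router, each `route.js` exposes functions named after HTTP methods that receive a standard `Request` and return a `Response`. The `limit` query parameter, the `parseLimit` helper, and the in-memory `listDocuments` stub are assumptions made for illustration and are not taken from the HKFR codebase.

```javascript
// Sketch of src/app/api/manage_files/get_documents/route.js (assumed contents).
// In a real route file the handler would be exported: `export { GET }`.

// Hypothetical helper: parse and bound the ?limit= query parameter.
// Returns null when the value is present but invalid.
function parseLimit(url, fallback = 20, max = 100) {
  const raw = new URL(url).searchParams.get('limit');
  if (raw === null) return fallback;
  const limit = Number(raw);
  if (!Number.isInteger(limit) || limit < 1 || limit > max) return null;
  return limit;
}

// Stand-in for the real data access layer (assumption).
async function listDocuments(limit) {
  const all = [
    { id: 'a1', title: 'Annual Report' },
    { id: 'b2', title: 'Interim Report' },
  ];
  return all.slice(0, limit);
}

// Route Handlers receive a Web Request and return a Web Response.
async function GET(request) {
  const limit = parseLimit(request.url);
  if (limit === null) {
    return Response.json(
      { error: 'limit must be an integer from 1 to 100' },
      { status: 400 }
    );
  }
  return Response.json({ documents: await listDocuments(limit) });
}
```

Keeping validation in a small pure function makes it easy to unit test without starting the Next.js server.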
Data Architecture
Database Design
HKFR uses a hybrid database approach: MongoDB holds users and documents, and PostgreSQL holds file metadata and audit logs.
MongoDB (Primary Document Store)
// User Schema
{
  _id: ObjectId,
  email: String,
  passwordHash: String,
  profile: {
    name: String,
    company: String,
    role: String
  },
  createdAt: Date,
  lastLogin: Date
}

// Document Schema
{
  _id: ObjectId,
  title: String,
  content: {
    sections: [{
      id: String,
      title: String,
      content: Object, // Rich text content
      metadata: Object
    }]
  },
  metadata: {
    company: String,
    fiscalYear: String,
    reportType: String,
    status: String
  },
  collaborators: [{
    userId: ObjectId,
    role: String,
    permissions: [String],
    addedAt: Date
  }],
  versions: [{
    version: Number,
    content: Object,
    createdBy: ObjectId,
    createdAt: Date,
    description: String
  }],
  createdAt: Date,
  updatedAt: Date
}
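The `collaborators` array in the Document Schema carries a `permissions` list per user, which suggests a per-document access check. The sketch below shows one way such a check could look. The function name `hasPermission` and the permission strings `'read'` and `'write'` are hypothetical; the schema only states that `permissions` is an array of strings.

```javascript
// Returns true when userId is listed in document.collaborators
// and that entry includes the requested permission.
// String() lets the comparison work for both ObjectId values and plain strings.
function hasPermission(document, userId, permission) {
  return document.collaborators.some(
    (c) => String(c.userId) === String(userId) && c.permissions.includes(permission)
  );
}

// Example document following the schema above (values are made up).
const exampleDocument = {
  title: 'Annual Report',
  collaborators: [
    { userId: 'u1', role: 'editor', permissions: ['read', 'write'], addedAt: new Date() },
    { userId: 'u2', role: 'viewer', permissions: ['read'], addedAt: new Date() },
  ],
};

console.log(hasPermission(exampleDocument, 'u1', 'write')); // true
console.log(hasPermission(exampleDocument, 'u2', 'write')); // false
```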
PostgreSQL (Relational Data)
PostgreSQL handles data that benefits from relational queries and ACID guarantees:
-- File metadata and relationships
CREATE TABLE file_metadata (
    id SERIAL PRIMARY KEY,
    document_id VARCHAR(24), -- MongoDB ObjectId
    file_path VARCHAR(500),
    file_type VARCHAR(50),
    size_bytes BIGINT,
    upload_timestamp TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE
);

-- Audit logs
CREATE TABLE audit_logs (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(24),
    document_id VARCHAR(24),
    action VARCHAR(100),
    details JSONB,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
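Writing to `audit_logs` from an API route could be sketched as follows. The example assumes the node-postgres (`pg`) client, which the source does not name, and uses a hypothetical `buildAuditInsert` helper. It builds a parameterized statement so that user-supplied values never end up inside the SQL text.

```javascript
// Builds a parameterized INSERT for the audit_logs table defined above.
// The returned { text, values } object matches the query config shape that
// node-postgres accepts, e.g. `await pool.query(buildAuditInsert(entry))`.
function buildAuditInsert({ userId, documentId, action, details = {} }) {
  return {
    text:
      'INSERT INTO audit_logs (user_id, document_id, action, details) ' +
      'VALUES ($1, $2, $3, $4) RETURNING id',
    // details is serialized explicitly so it is stored as a JSONB document.
    values: [userId, documentId, action, JSON.stringify(details)],
  };
}

const query = buildAuditInsert({
  userId: '64b7f0c2a1b2c3d4e5f60718',
  documentId: '64b7f0c2a1b2c3d4e5f60719',
  action: 'document.save',
  details: { section: 'intro' },
});
console.log(query.values[3]); // {"section":"intro"}
```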
File Storage Strategy
graph LR
A[File Upload] --> B[Validation]
B --> C[Virus Scan]
C --> D[Google Cloud Storage]
D --> E[Database Metadata]
E --> F[Processing Pipeline]
F --> G[Format Conversion]
G --> H[Integration Ready]
Google Cloud Storage Structure
hkfr-storage/
├── documents/
│   ├── raw/          # Original uploaded files
│   ├── processed/    # Converted/processed files
│   └── exports/      # Generated exports
├── images/
│   ├── originals/    # Original images
│   ├── thumbnails/   # Auto-generated thumbnails
│   └── optimized/    # Compressed versions
└── backups/
    ├── daily/        # Daily backups
    └── weekly/       # Weekly backups
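A small helper can keep object names consistent with the bucket layout above. The sketch below is illustrative only: the function name `buildObjectPath`, the per-document subfolder, and the filename sanitization rule are assumptions, not part of the documented layout.

```javascript
// Top-level areas and the folders each one contains, mirroring the tree above.
const LAYOUT = new Map([
  ['documents', ['raw', 'processed', 'exports']],
  ['images', ['originals', 'thumbnails', 'optimized']],
  ['backups', ['daily', 'weekly']],
]);

// Builds an object name such as "documents/raw/<documentId>/<file>".
// Grouping files under the document id is an assumption for this sketch.
function buildObjectPath(area, folder, documentId, filename) {
  const folders = LAYOUT.get(area);
  if (!folders || !folders.includes(folder)) {
    throw new Error(`Unknown storage location: ${area}/${folder}`);
  }
  // Replace anything outside a conservative character set, including
  // slashes, so a filename cannot escape its folder.
  const safeName = filename.replace(/[^A-Za-z0-9._-]/g, '_');
  return `${area}/${folder}/${documentId}/${safeName}`;
}

console.log(buildObjectPath('documents', 'raw', '64b7', 'Q1 report.xlsx'));
// documents/raw/64b7/Q1_report.xlsx
```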
Authentication Architecture
JWT-Based Authentication Flow
sequenceDiagram
participant C as Client
participant API as API Routes
participant DB as MongoDB
participant JWT as JWT Service
C->>API: POST /api/auth/login
API->>DB: Validate credentials
DB->>API: User data
API->>JWT: Generate tokens
JWT->>API: Access + Refresh tokens
API->>C: Tokens (HTTP-only cookies)
Note over C,API: Subsequent requests
C->>API: Request with access token
API->>JWT: Verify access token
JWT->>API: Token valid
API->>C: Protected resource
Note over C,API: Token refresh
C->>API: Request with expired access token
API->>JWT: Verify refresh token
JWT->>API: New access token
API->>C: New token + resource
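The "Generate tokens" and "Verify access token" steps in the flow above can be illustrated with a compact HS256 sketch built on Node's `crypto` module. This is a teaching example under stated assumptions: the source does not say which algorithm or library HKFR uses, the names `signToken` and `verifyToken` are hypothetical, and a production system would normally rely on a maintained JWT library rather than hand-rolled code.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Base64url-encode a string, as required for JWT segments.
const b64url = (value) => Buffer.from(value).toString('base64url');

// Creates a signed token that expires ttlSeconds after `now` (milliseconds).
function signToken(payload, secret, ttlSeconds, now = Date.now()) {
  const iat = Math.floor(now / 1000);
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify({ ...payload, iat, exp: iat + ttlSeconds }));
  const signature = createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest('base64url');
  return `${header}.${body}.${signature}`;
}

// Returns the claims when the signature matches and the token has not
// expired, otherwise null. Only HS256 is supported in this sketch, so the
// header's alg field is never trusted or consulted.
function verifyToken(token, secret, now = Date.now()) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = createHmac('sha256', secret).update(`${header}.${body}`).digest();
  const actual = Buffer.from(signature, 'base64url');
  // timingSafeEqual requires equal lengths, so check that first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return null;
  }
  const claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  return claims.exp > Math.floor(now / 1000) ? claims : null;
}
```

In the refresh step, a short-lived access token that fails `verifyToken` would trigger verification of the longer-lived refresh token before a new access token is issued.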
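Security Features
- HTTP-only Cookies: Tokens are not readable from client-side JavaScript, which limits token theft through XSS
- CSRF Protection: Next.js origin checks apply to Server Actions only, so cookie-authenticated API routes also need SameSite cookies or a CSRF token
- Token Rotation: Automatic refresh token rotation
- Secure Headers: Security headers via Next.js middleware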
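The cookie attributes behind the first two items can be shown with a short sketch that assembles a `Set-Cookie` header value. The cookie name `accessToken`, the 15-minute lifetime, and the `SameSite=Lax` choice are assumptions for illustration; the source does not specify them.

```javascript
// Builds a Set-Cookie header value for an auth token.
// HttpOnly hides the cookie from client-side scripts, Secure restricts it
// to HTTPS, and SameSite limits when browsers attach it to cross-site requests.
function buildAuthCookie(name, value, maxAgeSeconds) {
  return [
    `${name}=${encodeURIComponent(value)}`,
    'Path=/',
    `Max-Age=${maxAgeSeconds}`,
    'HttpOnly',
    'Secure',
    'SameSite=Lax',
  ].join('; ');
}

console.log(buildAuthCookie('accessToken', 'abc.def', 900));
// accessToken=abc.def; Path=/; Max-Age=900; HttpOnly; Secure; SameSite=Lax
```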
File Processing Pipeline
Document Conversion Architecture
graph TD
A[Upload File] --> B{File Type}
B -->|Excel| C[ExcelJS Parser]
B -->|Word| D[Pandoc Converter]
B -->|PDF| E[PDF Parser]
C --> F[Data Extraction]
D --> G[Content Extraction]
E --> H[Text Extraction]
F --> I[Structure Analysis]
G --> I
H --> I
I --> J[Content Integration]
J --> K[Document Update]
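The "File Type" decision at the top of the diagram can be sketched as an extension lookup. The function name `selectParser` and the exact extension list are assumptions; the diagram only names the three branches (Excel, Word, PDF).

```javascript
const { extname } = require('node:path');

// Maps file extensions to the pipeline branches shown in the diagram.
// Only .xlsx and .docx are listed because those are the formats ExcelJS and
// Pandoc read directly; whether HKFR accepts more is not stated in the source.
const PARSERS = new Map([
  ['.xlsx', 'excel'], // ExcelJS Parser
  ['.docx', 'word'],  // Pandoc Converter
  ['.pdf', 'pdf'],    // PDF Parser
]);

// Returns the branch name for a filename, or throws for unsupported types.
function selectParser(filename) {
  const ext = extname(filename).toLowerCase();
  const kind = PARSERS.get(ext);
  if (!kind) {
    throw new Error(`Unsupported file type: ${ext || '(none)'}`);
  }
  return kind;
}

console.log(selectParser('Report.XLSX')); // excel
```

Checking the extension is only a first filter; the Validation step in the upload flow would typically also inspect the file contents.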
Processing Utilities
// File processing utilities (src/utils/)
├── converter.js     # Format conversion logic
├── pandoc.js        # Pandoc wrapper for document conversion
├── sheetToCsv.js    # Excel to CSV conversion
└── gcs.js           # Google Cloud Storage operations
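As an illustration of what a Pandoc wrapper like `pandoc.js` might do, the sketch below shells out to the `pandoc` CLI with `execFile`. The function names and the default `docx` to `html` conversion are assumptions; the actual contents of `src/utils/pandoc.js` are not shown in the source.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Builds the argument list for the pandoc CLI.
// --from, --to and --output are standard pandoc options.
function buildPandocArgs(inputPath, outputPath, { from = 'docx', to = 'html' } = {}) {
  return [inputPath, '--from', from, '--to', to, '--output', outputPath];
}

// Runs pandoc and resolves with the output path.
// execFile passes arguments as an array, so no shell quoting is involved
// and filenames cannot inject extra shell commands.
async function convertWithPandoc(inputPath, outputPath, options) {
  await execFileAsync('pandoc', buildPandocArgs(inputPath, outputPath, options));
  return outputPath;
}

console.log(buildPandocArgs('in.docx', 'out.html').join(' '));
// in.docx --from docx --to html --output out.html
```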
Export Generation
graph LR
A[Document Content] --> B[Template Selection]
B --> C[Data Merge]
C --> D[Format Generation]
D --> E[PDF Export]
D --> F[DOCX Export]
E --> G[Cloud Storage]
F --> G
G --> H[Download Link]
Real-Time Collaboration
WebSocket-like Updates via Polling
Collaboration currently relies on polling for near-real-time updates; a WebSocket upgrade is planned. Each update has the following shape:
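// Collaboration update flow: shape of a single update (JSDoc type definition)
/**
 * @typedef {Object} CollaborationUpdate
 * @property {string} documentId
 * @property {string} userId
 * @property {string} section
 * @property {'edit' | 'comment' | 'cursor'} action
 * @property {Object} data
 * @property {Date} timestamp
 */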
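With polling, each client periodically asks for updates newer than the last one it has seen. The sketch below shows the server-side selection step under that assumption. The function name `updatesSince` and the in-memory update list are hypothetical; the source does not describe how updates are stored or queried.

```javascript
// Returns the updates for one document that are newer than `since`,
// oldest first, which is what a polling endpoint would send back.
// filter() creates a new array, so the caller's list is not reordered.
function updatesSince(updates, documentId, since) {
  return updates
    .filter((u) => u.documentId === documentId && u.timestamp > since)
    .sort((a, b) => a.timestamp - b.timestamp);
}

// Example update log using the collaboration update shape (values are made up).
const updateLog = [
  { documentId: 'd1', userId: 'u2', section: 's1', action: 'edit', data: {}, timestamp: new Date(3000) },
  { documentId: 'd1', userId: 'u1', section: 's1', action: 'cursor', data: {}, timestamp: new Date(1000) },
  { documentId: 'd2', userId: 'u3', section: 's4', action: 'edit', data: {}, timestamp: new Date(4000) },
  { documentId: 'd1', userId: 'u3', section: 's2', action: 'comment', data: {}, timestamp: new Date(2000) },
];

const fresh = updatesSince(updateLog, 'd1', new Date(1000));
console.log(fresh.map((u) => u.action)); // [ 'comment', 'edit' ]
```

The client would remember the timestamp of the last update it received and pass it as `since` on the next poll.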
Conflict Resolution Strategy
graph TD
A[User A Edit] --> C[Conflict Detection]
B[User B Edit] --> C
C --> D{Conflict?}
D -->|Yes| E[Operational Transform]
D -->|No| F[Apply Changes]
E --> F
F --> G[Broadcast Update]
G --> H[Update All Clients]
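The "Operational Transform" step can be illustrated with a toy example that handles only concurrent text insertions. This is a deliberately reduced sketch, not HKFR's implementation: the operation shape `{ userId, position, text }` and the function names are assumptions, and a real system would also need to handle deletions and rich text content.

```javascript
// Applies an insert operation to a string.
function applyInsert(text, op) {
  return text.slice(0, op.position) + op.text + text.slice(op.position);
}

// Transforms `op` so it can be applied after `other` has already been applied.
// If `other` inserted earlier in the text, `op` shifts right by its length.
// Ties at the same position are broken by userId so every client agrees.
function transformInsert(op, other) {
  const otherFirst =
    other.position < op.position ||
    (other.position === op.position && other.userId < op.userId);
  return otherFirst ? { ...op, position: op.position + other.text.length } : op;
}

// Two users edit the same base text concurrently.
const base = 'HKFR report';
const opA = { userId: 'a', position: 0, text: 'Draft: ' };
const opB = { userId: 'b', position: 11, text: ' 2024' };

// Either application order converges to the same result.
const viaA = applyInsert(applyInsert(base, opA), transformInsert(opB, opA));
const viaB = applyInsert(applyInsert(base, opB), transformInsert(opA, opB));
console.log(viaA); // Draft: HKFR report 2024
console.log(viaA === viaB); // true
```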
Performance Optimization
Frontend Optimizations
- Code Splitting: Automatic route-based splitting
- Image Optimization: Next.js Image component
- Bundle Analysis: Regular bundle size monitoring
- Caching: Strategic use of browser caching
Backend Optimizations
- Database Indexing: Optimized MongoDB indexes
- Connection Pooling: Efficient database connections
- API Caching: Response caching for static data
- Compression: Response compression middleware
Database Indexing Strategy
// MongoDB Indexes
// Assumes a top-level createdBy field, which the Document Schema lists only inside versions[]
db.documents.createIndex({ "createdBy": 1 });
db.documents.createIndex({ "metadata.company": 1 });
db.documents.createIndex({ "collaborators.userId": 1 });
db.documents.createIndex({ "updatedAt": -1 });
// Compound indexes for complex queries
db.documents.createIndex({
  "collaborators.userId": 1,
  "metadata.reportType": 1,
  "updatedAt": -1
});
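A query shaped like the one below is the kind the compound index is suited to: equality matches on the two leading fields, followed by a sort on the third. The helper name `buildRecentReportsQuery` and the default page size are assumptions for illustration, and the usage comment assumes the official MongoDB Node.js driver.

```javascript
// Builds the filter, sort and limit for "most recently updated documents of
// a given report type that this user collaborates on".
// Field order mirrors the compound index: two equality fields, then the sort field.
function buildRecentReportsQuery(userId, reportType, limit = 20) {
  return {
    filter: {
      'collaborators.userId': userId,
      'metadata.reportType': reportType,
    },
    sort: { updatedAt: -1 },
    limit,
  };
}

// Usage with the MongoDB Node.js driver (assumed):
//   const { filter, sort, limit } = buildRecentReportsQuery(userId, 'annual');
//   const docs = await db.collection('documents')
//     .find(filter).sort(sort).limit(limit).toArray();
console.log(buildRecentReportsQuery('u1', 'annual').sort); // { updatedAt: -1 }
```

Because a compound index also serves queries on its leading fields, lookups by `collaborators.userId` alone can use it too.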
Security Architecture
Security Layers
graph TB
A[HTTPS/TLS] --> B[Load Balancer]
B --> C[API Gateway]
C --> D[Authentication Middleware]
D --> E[Input Validation]
E --> F[Rate Limiting]
F --> G[Business Logic]
G --> H[Data Encryption]
H --> I[Audit Logging]
Security Measures
- Transport Security: HTTPS everywhere
- Authentication: JWT stored in HTTP-only cookies
- Authorization: Role-based access control
- Input Validation: All inputs validated and sanitized
- Output Encoding: XSS prevention
- CSRF Protection: SameSite cookies or CSRF tokens for API routes (Next.js origin checks cover Server Actions only)
- Rate Limiting: API rate limiting
- Audit Logging: Comprehensive activity logging
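As one example of the measures listed above, API rate limiting can be sketched with a fixed-window counter. This is an in-memory illustration under stated assumptions: the name `createRateLimiter`, the limits, and keying by client IP are hypothetical, and the source does not describe how HKFR implements rate limiting.

```javascript
// Creates a fixed-window rate limiter.
// Each key (for example a client IP) may make `limit` requests per window.
function createRateLimiter({ limit, windowMs }) {
  const windows = new Map(); // key -> { start, count }

  // Returns true when the request is allowed, false when it exceeds the limit.
  return function allow(key, now = Date.now()) {
    const current = windows.get(key);
    if (!current || now - current.start >= windowMs) {
      // First request from this key, or the previous window has ended.
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= limit;
  };
}

const allow = createRateLimiter({ limit: 2, windowMs: 1000 });
console.log(allow('203.0.113.7', 0));  // true
console.log(allow('203.0.113.7', 10)); // true
console.log(allow('203.0.113.7', 20)); // false
```

State held in process memory is not shared across instances, so a deployment with several replicas would need a shared store for the counters.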
Scalability Considerations
Horizontal Scaling
- Stateless API: All API routes are stateless
- Database Sharding: MongoDB sharding capability
- File Storage: Cloud storage auto-scaling
- Container Orchestration: Kubernetes scaling
Vertical Scaling
- Resource Optimization: Efficient resource usage
- Caching: Multi-level caching strategy
- Database Optimization: Query optimization
- CDN: Static asset delivery via CDN
Monitoring and Observability
Application Monitoring
graph LR
A[Application] --> B[Metrics Collection]
B --> C[Google Cloud Monitoring]
C --> D[Dashboards]
C --> E[Alerts]
A --> F[Log Collection]
F --> G[Google Cloud Logging]
G --> H[Log Analysis]
A --> I[Tracing]
I --> J[Google Cloud Trace]
J --> K[Performance Analysis]
Key Metrics
- Response Times: API and page load times
- Error Rates: 4xx and 5xx error tracking
- Database Performance: Query performance metrics
- User Activity: Active users and feature usage
- Resource Utilization: CPU, memory, storage usage
Future Architecture Considerations
Planned Improvements
- WebSocket Integration: Real-time collaboration
- Microservices: Service decomposition for better scaling
- GraphQL: Flexible API queries
- Advanced Caching: Redis integration
- TypeScript Migration: Type safety improvement
- Event-Driven Architecture: Asynchronous processing
- Advanced AI Integration: Enhanced content generation
Scalability Roadmap
- Service Mesh: Istio for service communication
- Event Streaming: Apache Kafka for async processing
- Edge Computing: Cloudflare Workers for global distribution
- Advanced Monitoring: Prometheus + Grafana stack