
Infrastructure Overview

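HKFR runs on Google Cloud Platform (GCP) and is built from managed, cloud-native services: GKE Autopilot for compute, Cloud SQL and Cloud Storage for data, and Cloud Load Balancing with Cloud CDN and Cloud Armor at the edge.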

High-Level Architecture

graph TB
    subgraph "External Users"
        A[Web Browsers]
        B[API Clients]
    end

    subgraph "Google Cloud Platform"
        subgraph "Network Layer"
            C[Cloud Load Balancer]
            D[Cloud CDN]
            E[Cloud Armor]
        end

        subgraph "Compute Layer"
            F[GKE Autopilot Cluster]
            G[HKFR Application Pods]
            H[ArgoCD]
        end

        subgraph "Data Layer"
            I[Cloud SQL PostgreSQL]
            J[Cloud Storage Buckets]
            K[Secret Manager]
        end

        subgraph "External Integrations"
            L[MongoDB Atlas]
            M[Email Services]
        end
    end

    A --> C
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> I
    G --> J
    G --> K
    G --> L
    G --> M

    H --> F

Infrastructure Components

Networking

  • Cloud Load Balancer: HTTPS termination and traffic distribution
  • Cloud CDN: Global content delivery for static assets
  • Cloud Armor: DDoS protection and security policies
  • VPC: Private network for secure communication

Compute

  • GKE Autopilot: Managed Kubernetes cluster
  • Container Images: Application deployed as Docker containers
  • Horizontal Pod Autoscaler: Automatic scaling based on load
  • Node Auto-provisioning: Automatic node scaling
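The Horizontal Pod Autoscaler's core calculation is documented by Kubernetes as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces that formula, including the default 0.1 tolerance band and clamping to the min/max replica bounds. The utilization figures and replica bounds are hypothetical and are not HKFR's actual HPA settings.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count from the documented HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric).

    min_replicas / max_replicas are hypothetical bounds for this sketch.
    """
    if target_utilization <= 0:
        raise ValueError("target utilization must be positive")
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minReplicas / maxReplicas.
    return max(min_replicas, min(max_replicas, desired))
```

For example, `desired_replicas(4, 90, 60)` returns 6: four pods averaging 90% CPU against a 60% target scale out to six.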

Storage & Data

  • Cloud SQL: Managed PostgreSQL for relational data
  • Cloud Storage: Object storage for files and assets
  • MongoDB Atlas: Managed MongoDB for document storage
  • Secret Manager: Secure credential management

Security

  • Identity and Access Management (IAM): Service account permissions
  • Network Security: VPC firewall rules
  • SSL/TLS: End-to-end encryption
  • Secret Management: Encrypted secrets storage

Geographic Distribution

Current Deployment

  • Primary Region: us-central1 (Iowa, USA)
  • Multi-Zone: Workloads spread across multiple zones within us-central1
  • Global CDN: Content cached at edge locations worldwide

Future Expansion

  • Asia-Pacific: asia-southeast1 (Singapore) for Hong Kong users
  • Multi-Region: Database replication for disaster recovery
  • Edge Locations: Additional CDN points of presence

Infrastructure as Code

Terraform Structure

infra/
├── modules/                 # Reusable Terraform modules
│   ├── gke-autopilot/      # GKE cluster configuration
│   ├── gke-network/        # Network and security setup
│   ├── cloud-sql/          # Database configuration
│   ├── google-cloud-storage/ # Storage bucket setup
│   └── api/                # API Gateway configuration
├── live/                   # Environment-specific configurations
│   ├── project/            # Project-level resources
│   │   ├── api/            # API Gateway
│   │   └── network/        # Base networking
│   └── kubernetes/         # Kubernetes resources
│       └── us-central1/testing/ # Testing environment
└── helm/                   # Helm chart configurations

Terragrunt Configuration

Terragrunt manages multiple environments and reduces code duplication:

# Common configuration
terraform {
  source = "../../../modules/gke-autopilot"
}

# Environment-specific inputs
inputs = {
  cluster_name = "hkfr-testing"
  region       = "us-central1"
  node_config = {
    machine_type = "e2-standard-2"
    disk_size_gb = 20
  }
}

Deployment Architecture

GitOps with ArgoCD

graph LR
    A[Developer Push] --> B[GitHub Repo]
    B --> C[GitHub Actions]
    C --> D[Build Container]
    D --> E[Push to GCR]
    E --> F[Update Manifest]
    F --> G[ArgoCD Sync]
    G --> H[Deploy to GKE]

Deployment Flow

  1. Code Push: Developer pushes to main branch
  2. CI Pipeline: GitHub Actions builds and tests
  3. Container Build: Docker image created and pushed to Google Container Registry
  4. Manifest Update: Kubernetes manifests updated with new image
  5. ArgoCD Sync: ArgoCD detects changes and deploys
  6. Health Check: Application health verified post-deployment
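Step 4 (manifest update) amounts to rewriting the image reference so that ArgoCD sees a new desired state in Git. The sketch below assumes plain YAML manifests with unquoted `image:` lines; the repository name `gcr.io/my-project/hkfr` and the function `update_image_tag` are hypothetical, and the real pipeline may use a different mechanism (for example a templating or overlay tool).

```python
import re


def update_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Point every `image:` line for `repository` at `new_tag`.

    Handles lines like `image: gcr.io/my-project/hkfr:abc123` or
    `- image: gcr.io/my-project/hkfr` (quoted values and digests are out
    of scope for this sketch). Other repositories are left untouched.
    """
    pattern = re.compile(
        rf"^([ \t]*(?:-[ \t]+)?image:[ \t]*){re.escape(repository)}(?::[\w.-]+)?[ \t]*$",
        re.MULTILINE,
    )
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{repository}:{new_tag}", manifest
    )
    if count == 0:
        raise ValueError(f"no image line found for {repository}")
    return updated
```

A CI job would typically tag images with the commit SHA, call this on the manifest, and commit the result so that ArgoCD picks up the change on its next sync.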

Environment Management

Testing Environment

  • Purpose: Integration testing and staging
  • Resources: Minimal resource allocation
  • Access: Development team access
  • Data: Synthetic/anonymized test data

Production Environment (Future)

  • Purpose: Live user traffic
  • Resources: Production-grade resource allocation
  • Access: Restricted access with approval workflows
  • Data: Real user data with compliance measures

Monitoring and Observability

Google Cloud Operations Suite

graph TB
    A[HKFR Application] --> B[Cloud Monitoring]
    A --> C[Cloud Logging]
    A --> D[Cloud Trace]
    A --> E[Cloud Profiler]

    B --> F[Dashboards]
    B --> G[Alerts]
    C --> H[Log Analysis]
    D --> I[Performance Insights]
    E --> J[Resource Optimization]

Monitoring Components

  • Cloud Monitoring: Metrics collection and alerting
  • Cloud Logging: Centralized log aggregation
  • Cloud Trace: Distributed tracing for performance
  • Cloud Profiler: Application performance profiling

Key Metrics Tracked

  • Application Performance: Response times, throughput
  • Resource Utilization: CPU, memory, disk usage
  • Database Performance: Query times, connection counts
  • Error Rates: 4xx/5xx errors, exception tracking
  • User Experience: Page load times, user flows
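To make the error-rate and latency metrics concrete, the sketch below shows how a 4xx/5xx error rate and a p95 response time can be derived from one window of request records. In practice Cloud Monitoring computes such values from collected metrics and logs; the `RequestRecord` type and the choice of nearest-rank percentile are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # server-side response time in milliseconds


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(records):
    """Error rates and p95 latency for one window of requests."""
    if not records:
        raise ValueError("no requests in window")
    total = len(records)
    client_errors = sum(1 for r in records if 400 <= r.status < 500)
    server_errors = sum(1 for r in records if 500 <= r.status < 600)
    return {
        "requests": total,
        "error_rate_4xx": client_errors / total,
        "error_rate_5xx": server_errors / total,
        "p95_latency_ms": percentile([r.latency_ms for r in records], 95),
    }
```

Alerting on the 5xx rate and a high percentile, rather than on the mean latency, keeps a small number of very slow requests from being hidden by a fast majority.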

Security Architecture

Network Security

Internet
    ↓ [HTTPS Only]
Cloud Load Balancer
    ↓ [Cloud Armor Protection]
GKE Cluster (Private)
    ↓ [VPC Network]
Application Pods
    ↓ [Service Account Auth]
Google Cloud Services

Security Layers

  1. Perimeter Security: Cloud Armor DDoS protection
  2. Network Security: Private GKE cluster, VPC firewall
  3. Identity Security: Service account authentication
  4. Application Security: Pod security controls, including the hardened pod defaults enforced by GKE Autopilot
  5. Data Security: Encryption at rest and in transit
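VPC firewall rules are evaluated by priority: a lower number takes precedence, a deny rule beats an allow rule of the same priority, and ingress traffic that matches no rule falls through to the VPC's implied deny-ingress rule. The sketch below models only that ordering for ingress; real rules also filter on targets (network tags or service accounts) and protocols, which are omitted. Rule names and the application port 8080 are hypothetical; 35.191.0.0/16 is one of Google's documented load-balancer health-check ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    name: str
    priority: int        # 0-65535; a lower number takes precedence
    action: str          # "allow" or "deny"
    source_range: str    # CIDR such as "10.0.0.0/8"
    ports: frozenset     # destination TCP ports the rule covers


def evaluate_ingress(rules, source_ip, port):
    """Return (rule name, action) for the rule that decides an ingress connection."""
    addr = ipaddress.ip_address(source_ip)
    # Order by priority; at equal priority, deny rules are checked first.
    for rule in sorted(rules, key=lambda r: (r.priority, r.action != "deny")):
        if addr in ipaddress.ip_network(rule.source_range) and port in rule.ports:
            return rule.name, rule.action
    # Nothing matched: the implied ingress rule denies the traffic.
    return "implied-deny-ingress", "deny"
```

The tie-break matters when a broad allow and a narrow deny share a priority: the deny still wins, so SSH from inside the VPC can be blocked without renumbering the allow rule.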

Compliance Considerations

  • Data Residency: Data stored in appropriate regions
  • Encryption: AES-256 encryption for data at rest
  • Access Controls: Role-based access control (RBAC)
  • Audit Logging: Comprehensive audit trail
  • Backup Strategy: Regular automated backups

Cost Optimization

Resource Optimization Strategies

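  • GKE Autopilot: Billed for the CPU, memory, and ephemeral storage that running pods request, rather than for whole nodes
  • Committed Use Discounts: Lower prices in exchange for a 1- or 3-year usage commitment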
  • Preemptible Instances: Cost-effective for development
  • Storage Lifecycle: Automatic data archiving
  • CDN Optimization: Reduced origin server load
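Storage lifecycle archiving is driven by rules that pair an action (`SetStorageClass` or `Delete`) with a condition such as object age. The sketch below builds a `lifecycle` block in the field layout of the Cloud Storage bucket resource; the 30/90-day thresholds and version count are hypothetical. In this repository the equivalent settings would presumably live in the google-cloud-storage Terraform module as `lifecycle_rule` blocks.

```python
import json


def lifecycle_policy(nearline_after_days=30, coldline_after_days=90,
                     num_newer_versions=3):
    """Build a Cloud Storage `lifecycle` block (bucket resource field layout).

    All thresholds are hypothetical defaults for this sketch.
    """
    if not 0 < nearline_after_days < coldline_after_days:
        raise ValueError("objects must move to NEARLINE before COLDLINE")
    return {
        "lifecycle": {
            "rule": [
                {   # Move ageing STANDARD objects to the cheaper NEARLINE class.
                    "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                    "condition": {"age": nearline_after_days,
                                  "matchesStorageClass": ["STANDARD"]},
                },
                {   # Later, move them on to COLDLINE.
                    "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                    "condition": {"age": coldline_after_days,
                                  "matchesStorageClass": ["NEARLINE"]},
                },
                {   # With versioning on, delete a noncurrent version once at
                    # least `num_newer_versions` newer versions of it exist.
                    "action": {"type": "Delete"},
                    "condition": {"isLive": False,
                                  "numNewerVersions": num_newer_versions},
                },
            ]
        }
    }


print(json.dumps(lifecycle_policy(), indent=2))
```

Pruning noncurrent versions keeps versioning (listed under Disaster Recovery) from growing storage costs without bound.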

Cost Monitoring

graph LR
    A[Resource Usage] --> B[Cloud Billing]
    B --> C[Cost Alerts]
    B --> D[Budget Controls]
    C --> E[Notification]
    D --> F[Spending Limits]

Cost Controls

  • Budget Alerts: Automated spending notifications
  • Resource Quotas: Prevent resource overspend
  • Rightsizing: Regular resource usage analysis
  • Reserved Capacity: Long-term cost savings
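Budget alerts fire when actual or forecasted spend crosses configured fractions of a budget. Cloud Billing budgets send notifications but do not cap usage or spending on their own. The sketch below shows the threshold check plus a naive linear month-end forecast; the 50%/90%/100% thresholds and the budget amount are illustrative, and Google's own forecasting is not necessarily linear.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the alert thresholds (fractions of the budget) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [t for t in sorted(thresholds) if spend >= t * budget]


def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive linear projection of month-end spend from the month-to-date total."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month out of range")
    return spend_to_date / day_of_month * days_in_month
```

For example, 160 spent after 10 days of a 30-day month projects to 480, so `crossed_thresholds(forecast_month_end(160, 10, 30), 500)` returns `[0.5, 0.9]`: a forecast alert fires well before actual spend reaches the budget.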

Disaster Recovery

Backup Strategy

Data Backup:
├── Database Backups
│   ├── Daily automated backups
│   ├── Point-in-time recovery
│   └── Cross-region backup storage
├── File Storage Backups
│   ├── Multi-regional storage
│   ├── Versioning enabled
│   └── Lifecycle management
└── Configuration Backups
    ├── Terraform state backup
    ├── Kubernetes manifest backup
    └── Secret backup (encrypted)

Recovery Procedures

  1. Database Recovery: Point-in-time restoration from backups
  2. Application Recovery: Container redeployment via ArgoCD
  3. File Recovery: Object restoration from versioned storage
  4. Configuration Recovery: Infrastructure recreation via Terraform
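Point-in-time recovery can only target instants still covered by retained transaction logs, so the first step of a database restore is to check that the chosen timestamp falls inside that window. In Cloud SQL the restore itself is performed as a clone into a new instance (`gcloud sql instances clone` with `--point-in-time`). The sketch below covers only the window check and assumes a hypothetical 7-day retention; the real earliest restorable time also depends on when PITR was enabled and on the instance's settings.

```python
from datetime import datetime, timedelta, timezone


def recovery_window(now, retention_days=7):
    """Earliest and latest restorable instants, given the log retention period."""
    return now - timedelta(days=retention_days), now


def is_restorable(target, now, retention_days=7):
    """True if a point-in-time restore to `target` lies inside the retention window."""
    if target.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware (UTC) timestamps")
    earliest, latest = recovery_window(now, retention_days)
    return earliest <= target <= latest
```

Requiring timezone-aware timestamps avoids the common mistake of restoring to a local time that the database interprets as UTC.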

Scaling Strategy

Horizontal Scaling

  • Pod Autoscaling: CPU/memory-based scaling
  • Cluster Autoscaling: Automatic node provisioning
  • Database Scaling: Read replicas for query load
  • Storage Scaling: Automatic storage expansion
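Read replicas only relieve query load if the application sends reads to them while writes stay on the primary. The sketch below is a minimal router under that assumption: plain `SELECT`s rotate round-robin across replicas, and writes, locking reads, and anything inside a transaction go to the primary. The DSN strings are hypothetical. Because replicas can lag the primary, reads that must see a just-committed write should also use the primary.

```python
import itertools
from typing import Sequence


class QueryRouter:
    """Route writes to the primary and spread plain reads across replicas."""

    def __init__(self, primary_dsn: str, replica_dsns: Sequence[str]):
        self.primary = primary_dsn
        # Cycle through replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replica_dsns or [primary_dsn])

    def route(self, sql: str, in_transaction: bool = False) -> str:
        statement = sql.lstrip().upper()
        # Conservative: only plain SELECTs count as reads (CTEs etc. go to the primary).
        is_plain_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if in_transaction or not is_plain_read:
            return self.primary
        return next(self._replicas)
```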

Performance Optimization

  • Connection Pooling: Efficient database connections
  • CDN Caching: Static asset optimization
  • Image Optimization: Compressed container images
  • Resource Requests: Optimized resource allocation
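Connection pooling caps how many database connections each pod opens and reuses them across requests, which matters because Cloud SQL instances have a connection limit shared by every replica of the application. The sketch below is a minimal thread-safe pool built on the standard library; `factory` stands in for whatever driver call opens a connection (hypothetical here). In practice a driver or ORM pool (for example SQLAlchemy's built-in pooling) would normally handle this, and would also discard broken connections, which this sketch does not.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are created lazily up to max_size and reused."""

    def __init__(self, factory, max_size=5, timeout=5.0):
        self._factory = factory          # hypothetical: opens one DB connection
        self._idle = queue.LifoQueue()   # reuse the most recently returned connection
        self._max_size = max_size
        self._timeout = timeout
        self._created = 0
        self._lock = threading.Lock()

    def _acquire(self):
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            create = self._created < self._max_size
            if create:
                self._created += 1
        if create:
            try:
                return self._factory()
            except Exception:
                with self._lock:
                    self._created -= 1   # the slot was never filled
                raise
        # Pool exhausted: wait for another caller to release a connection.
        try:
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("no database connection available") from None

    @contextmanager
    def connection(self):
        conn = self._acquire()
        try:
            yield conn
        finally:
            self._idle.put(conn)         # return it for reuse
```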

Future Infrastructure Roadmap

Short-term Improvements (3-6 months)

  • Production Environment: Dedicated production cluster
  • Enhanced Monitoring: Custom dashboards and alerts
  • Backup Automation: Automated disaster recovery testing
  • Security Hardening: Advanced security policies

Long-term Vision (6-12 months)

  • Multi-Region Deployment: Asia-Pacific region expansion
  • Advanced Networking: Service mesh implementation
  • AI/ML Infrastructure: Dedicated compute for AI features
  • Compliance Certification: SOC 2 and ISO 27001 preparation

Access and Management

Administrative Access

  • ArgoCD Dashboard: https://argocd.hkfr.live
  • Google Cloud Console: Project-based access
  • Terraform State: Stored securely in Google Cloud Storage
  • Kubernetes Access: RBAC-controlled kubectl access
  • Monitoring Access: Google Cloud Console

Contact Information

  • Infrastructure Lead: Zhen Yuan (@zhenyuan1001)
  • Cloud Admin Access: Contact the Infrastructure Lead for emergency access