Infrastructure Overview¶
HKFR runs on Google Cloud Platform using modern cloud-native infrastructure patterns.
High-Level Architecture¶
graph TB
    subgraph "External Users"
        A[Web Browsers]
        B[API Clients]
    end
    subgraph "Google Cloud Platform"
        subgraph "Network Layer"
            C[Cloud Load Balancer]
            D[Cloud CDN]
            E[Cloud Armor]
        end
        subgraph "Compute Layer"
            F[GKE Autopilot Cluster]
            G[HKFR Application Pods]
            H[ArgoCD]
        end
        subgraph "Data Layer"
            I[Cloud SQL PostgreSQL]
            J[Cloud Storage Buckets]
            K[Secret Manager]
        end
        subgraph "External Integrations"
            L[MongoDB Atlas]
            M[Email Services]
        end
    end
    A --> C
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> I
    G --> J
    G --> K
    G --> L
    G --> M
    H --> F
Infrastructure Components¶
Networking¶
- Cloud Load Balancer: HTTPS termination and traffic distribution
- Cloud CDN: Global content delivery for static assets
- Cloud Armor: DDoS protection and security policies
- VPC: Private network for secure communication
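As an illustration of the Cloud Armor entry above, a security policy could be declared in Terraform roughly as follows. This is a sketch rather than the project's actual configuration: the policy name `hkfr-edge-policy`, the rule priorities, and the choice of the preconfigured SQL injection rule set are all assumptions.

```hcl
# Hypothetical Cloud Armor policy; name, priorities, and rules are illustrative.
resource "google_compute_security_policy" "edge" {
  name        = "hkfr-edge-policy" # assumed name
  description = "Example WAF policy for the load balancer backend"

  # Reject requests that match Google's preconfigured SQL injection signatures.
  rule {
    action      = "deny(403)"
    priority    = 1000
    description = "Block SQL injection patterns"
    match {
      expr {
        expression = "evaluatePreconfiguredExpr('sqli-v33-stable')"
      }
    }
  }

  # Default rule (lowest priority): allow everything not matched above.
  rule {
    action      = "allow"
    priority    = 2147483647
    description = "Default allow"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
  }
}
```

A policy only takes effect once it is attached to a backend service. On GKE this is usually done through a `BackendConfig` that references the policy by name.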
Compute¶
- GKE Autopilot: Managed Kubernetes cluster
- Container Images: Application deployed as Docker containers
- Horizontal Pod Autoscaler: Automatic scaling based on load
- Node Provisioning: Autopilot adds and removes nodes automatically to fit pod resource requests
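A minimal sketch of what an Autopilot cluster definition might look like in Terraform is shown below. The VPC and subnet names are assumptions, and the real `gke-autopilot` module may expose different settings.

```hcl
# Hypothetical Autopilot cluster; network names are assumptions.
resource "google_container_cluster" "autopilot" {
  name     = "hkfr-testing" # matches the testing environment's cluster name
  location = "us-central1"  # a region, so the cluster spans several zones

  enable_autopilot = true

  network    = "hkfr-vpc"    # assumed VPC name
  subnetwork = "hkfr-subnet" # assumed subnet name

  # VPC-native networking; an empty block lets GKE choose the secondary ranges.
  ip_allocation_policy {}

  # Nodes receive internal IP addresses only.
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
  }
}
```

No node pools are declared here because Autopilot manages nodes itself; capacity follows the resource requests of the scheduled pods.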
Storage & Data¶
- Cloud SQL: Managed PostgreSQL for relational data
- Cloud Storage: Object storage for files and assets
- MongoDB Atlas: Managed MongoDB for document storage
- Secret Manager: Secure credential management
Security¶
- Identity and Access Management (IAM): Service account permissions
- Network Security: VPC firewall rules
- SSL/TLS: End-to-end encryption
- Secret Management: Encrypted secrets storage
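The IAM and Secret Manager entries above can be combined so that application pods read only the secrets they need. The following is a sketch under stated assumptions: the project ID, the `hkfr` namespace, the `hkfr-app` service account names, and the `hkfr-db-password` secret are all hypothetical.

```hcl
# Hypothetical identities; project ID, namespace, and secret name are assumptions.
locals {
  project_id = "hkfr-example-project"
}

# Google service account that the application pods act as.
resource "google_service_account" "app" {
  project      = local.project_id
  account_id   = "hkfr-app"
  display_name = "HKFR application"
}

# Grant read access to a single secret instead of every secret in the project.
resource "google_secret_manager_secret_iam_member" "app_reads_db_password" {
  project   = local.project_id
  secret_id = "hkfr-db-password" # assumed to exist already
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.app.email}"
}

# Allow the Kubernetes service account hkfr/hkfr-app to impersonate the
# Google service account through Workload Identity.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.app.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${local.project_id}.svc.id.goog[hkfr/hkfr-app]"
}
```

For the binding to work, the Kubernetes service account also needs the `iam.gke.io/gcp-service-account` annotation pointing at the Google service account's email address.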
Geographic Distribution¶
Current Deployment¶
- Primary Region: us-central1 (Iowa, USA)
- Multi-Zone: Deployed across multiple availability zones
- Global CDN: Content cached at edge locations worldwide
Future Expansion¶
- Asia-Pacific: asia-southeast1 (Singapore) for Hong Kong users
- Multi-Region: Database replication for disaster recovery
- Edge Locations: Additional CDN points of presence
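One way the planned database replication could be expressed is a cross-region Cloud SQL read replica. The sketch below assumes a primary instance named `hkfr-postgres` in us-central1; the instance names, PostgreSQL version, and machine tier are illustrative only.

```hcl
# Hypothetical cross-region read replica; names, version, and tier are assumptions.
resource "google_sql_database_instance" "replica_sg" {
  name                 = "hkfr-postgres-replica-sg"
  region               = "asia-southeast1"
  database_version     = "POSTGRES_15"   # must match the primary instance
  master_instance_name = "hkfr-postgres" # assumed name of the primary in us-central1

  settings {
    tier              = "db-custom-2-7680"
    availability_type = "ZONAL"
  }
}
```

A replica serves read traffic only. Using it for disaster recovery means promoting it to a standalone instance, which is a separate, deliberate step.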
Infrastructure as Code¶
Terraform Structure¶
infra/
├── modules/                      # Reusable Terraform modules
│   ├── gke-autopilot/            # GKE cluster configuration
│   ├── gke-network/              # Network and security setup
│   ├── cloud-sql/                # Database configuration
│   ├── google-cloud-storage/     # Storage bucket setup
│   └── api/                      # API Gateway configuration
├── live/                         # Environment-specific configurations
│   ├── project/                  # Project-level resources
│   │   ├── api/                  # API Gateway
│   │   └── network/              # Base networking
│   └── kubernetes/               # Kubernetes resources
│       └── us-central1/testing/  # Testing environment
└── helm/                         # Helm chart configurations
Terragrunt Configuration¶
Terragrunt manages multiple environments and reduces code duplication:
# Common configuration
terraform {
  source = "../../../modules/gke-autopilot"
}

# Environment-specific inputs
inputs = {
  cluster_name = "hkfr-testing"
  region       = "us-central1"
  node_config = {
    machine_type = "e2-standard-2"
    disk_size_gb = 20
  }
}
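Much of the duplication Terragrunt removes comes from backend configuration. As a sketch of that pattern, a root configuration can define the remote state once and each environment can include it. The project ID and bucket name below are assumptions, and the two parts belong in separate files as the comments indicate.

```hcl
# --- Root terragrunt.hcl (hypothetical), at the top of the live/ directory ---
remote_state {
  backend = "gcs"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    project  = "hkfr-example-project" # assumed project ID
    location = "us"
    bucket   = "hkfr-terraform-state" # assumed bucket name
    # Each environment directory gets its own state prefix.
    prefix = path_relative_to_include()
  }
}

# --- Child terragrunt.hcl, in each environment directory ---
include "root" {
  path = find_in_parent_folders()
}
```

With this layout, adding an environment means adding a directory with a short child file; the state location is derived from the directory path.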
Deployment Architecture¶
GitOps with ArgoCD¶
graph LR
A[Developer Push] --> B[GitHub Repo]
B --> C[GitHub Actions]
C --> D[Build Container]
D --> E[Push to GCR]
E --> F[Update Manifest]
F --> G[ArgoCD Sync]
G --> H[Deploy to GKE]
Deployment Flow¶
- Code Push: Developer pushes to main branch
- CI Pipeline: GitHub Actions builds and tests
- Container Build: Docker image created and pushed to Google Container Registry
- Manifest Update: Kubernetes manifests updated with new image
- ArgoCD Sync: ArgoCD detects changes and deploys
- Health Check: Application health verified post-deployment
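The sync step above is driven by an ArgoCD `Application` resource that points at the manifest repository. A minimal sketch, written with the Terraform `kubernetes_manifest` resource to stay consistent with the rest of the infrastructure code, might look like this. The repository URL, path, and namespaces are placeholders, not the project's real values.

```hcl
# Hypothetical ArgoCD Application; repository URL, path, and namespaces are assumptions.
resource "kubernetes_manifest" "hkfr_application" {
  manifest = {
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "Application"
    metadata = {
      name      = "hkfr"
      namespace = "argocd"
    }
    spec = {
      project = "default"
      source = {
        repoURL        = "https://github.com/example-org/hkfr-manifests.git" # placeholder
        targetRevision = "main"
        path           = "environments/testing"
      }
      destination = {
        server    = "https://kubernetes.default.svc" # the cluster ArgoCD runs in
        namespace = "hkfr"
      }
      syncPolicy = {
        automated = {
          prune    = true # remove resources that were deleted from Git
          selfHeal = true # revert manual changes made in the cluster
        }
      }
    }
  }
}
```

`kubernetes_manifest` needs the ArgoCD custom resource definitions to be installed before Terraform can plan this resource, so ArgoCD itself has to be deployed first.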
Environment Management¶
Testing Environment¶
- Purpose: Integration testing and staging
- Resources: Minimal resource allocation
- Access: Development team access
- Data: Synthetic/anonymized test data
Production Environment (Future)¶
- Purpose: Live user traffic
- Resources: Production-grade resource allocation
- Access: Restricted access with approval workflows
- Data: Real user data with compliance measures
Monitoring and Observability¶
Google Cloud Operations Suite¶
graph TB
A[HKFR Application] --> B[Cloud Monitoring]
A --> C[Cloud Logging]
A --> D[Cloud Trace]
A --> E[Cloud Profiler]
B --> F[Dashboards]
B --> G[Alerts]
C --> H[Log Analysis]
D --> I[Performance Insights]
E --> J[Resource Optimization]
Monitoring Components¶
- Cloud Monitoring: Metrics collection and alerting
- Cloud Logging: Centralized log aggregation
- Cloud Trace: Distributed tracing for performance
- Cloud Profiler: Application performance profiling
Key Metrics Tracked¶
- Application Performance: Response times, throughput
- Resource Utilization: CPU, memory, disk usage
- Database Performance: Query times, connection counts
- Error Rates: 4xx/5xx errors, exception tracking
- User Experience: Page load times, user flows
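As an example of turning one of these metrics into an alert, the sketch below defines a Cloud Monitoring alert policy on the load balancer's 5xx response rate. The threshold, duration, and display names are assumptions chosen for illustration.

```hcl
# Hypothetical alert policy; threshold and duration are illustrative values.
resource "google_monitoring_alert_policy" "lb_5xx" {
  display_name = "HKFR load balancer 5xx rate"
  combiner     = "OR"

  conditions {
    display_name = "5xx responses above 5 per second for 5 minutes"
    condition_threshold {
      # Count only responses in the 5xx class served by the HTTPS load balancer.
      filter          = "resource.type = \"https_lb_rule\" AND metric.type = \"loadbalancing.googleapis.com/https/request_count\" AND metric.labels.response_code_class = 500"
      comparison      = "COMPARISON_GT"
      threshold_value = 5
      duration        = "300s"
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_RATE" # convert the counter to requests per second
        cross_series_reducer = "REDUCE_SUM" # sum across all matching series
      }
    }
  }

  # Add notification channel IDs here to deliver the alert to people.
  notification_channels = []
}
```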
Security Architecture¶
Network Security¶
Internet
↓ [HTTPS Only]
Cloud Load Balancer
↓ [Cloud Armor Protection]
GKE Cluster (Private)
↓ [VPC Network]
Application Pods
↓ [Service Account Auth]
Google Cloud Services
Security Layers¶
- Perimeter Security: Cloud Armor DDoS protection
- Network Security: Private GKE cluster, VPC firewall
- Identity Security: Service account authentication
- Application Security: Pod Security Standards enforced through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
- Data Security: Encryption at rest and in transit
Compliance Considerations¶
- Data Residency: Data stored in appropriate regions
- Encryption: AES-256 encryption for data at rest
- Access Controls: Role-based access control (RBAC)
- Audit Logging: Comprehensive audit trail
- Backup Strategy: Regular automated backups
Cost Optimization¶
Resource Optimization Strategies¶
- GKE Autopilot: Billed for the CPU, memory, and storage that running pods request, not for whole nodes
- Committed Use Discounts: Reserved capacity pricing
- Spot Pods: Cost-effective, preemptible capacity for development workloads on Autopilot
- Storage Lifecycle: Automatic data archiving
- CDN Optimization: Reduced origin server load
Cost Monitoring¶
graph LR
A[Resource Usage] --> B[Cloud Billing]
B --> C[Cost Alerts]
B --> D[Budget Controls]
C --> E[Notification]
D --> F[Spending Limits]
Cost Controls¶
- Budget Alerts: Automated spending notifications
- Resource Quotas: Prevent resource overspend
- Rightsizing: Regular resource usage analysis
- Reserved Capacity: Long-term cost savings
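Budget alerts can be managed as code alongside the rest of the infrastructure. The sketch below uses placeholder billing account and project identifiers, and the 500 USD amount and thresholds are assumptions. Note that a budget sends notifications at the configured thresholds; it does not stop spending on its own.

```hcl
# Hypothetical budget; account ID, project number, and amounts are placeholders.
resource "google_billing_budget" "monthly" {
  billing_account = "000000-000000-000000"
  display_name    = "hkfr-monthly-budget"

  budget_filter {
    projects = ["projects/123456789012"]
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = "500"
    }
  }

  # Notify at 50% and 90% of actual spend.
  threshold_rules {
    threshold_percent = 0.5
  }
  threshold_rules {
    threshold_percent = 0.9
  }

  # Notify when the forecast for the month reaches 100% of the budget.
  threshold_rules {
    threshold_percent = 1.0
    spend_basis       = "FORECASTED_SPEND"
  }
}
```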
Disaster Recovery¶
Backup Strategy¶
Data Backup:
├── Database Backups
│   ├── Daily automated backups
│   ├── Point-in-time recovery
│   └── Cross-region backup storage
├── File Storage Backups
│   ├── Multi-regional storage
│   ├── Versioning enabled
│   └── Lifecycle management
└── Configuration Backups
    ├── Terraform state backup
    ├── Kubernetes manifest backup
    └── Secret backup (encrypted)
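The file storage items in the tree above map onto a small amount of bucket configuration. The following is a sketch, assuming a hypothetical bucket name and illustrative retention numbers; the real `google-cloud-storage` module may set these differently.

```hcl
# Hypothetical bucket; name, location, and retention numbers are assumptions.
resource "google_storage_bucket" "assets" {
  name                        = "hkfr-assets-example" # bucket names are globally unique
  location                    = "US"                  # multi-region location
  uniform_bucket_level_access = true

  # Keep prior object generations so overwritten or deleted files can be restored.
  versioning {
    enabled = true
  }

  # Move objects to colder storage 90 days after creation.
  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }

  # Delete noncurrent versions once five newer versions exist.
  lifecycle_rule {
    condition {
      num_newer_versions = 5
      with_state         = "ARCHIVED"
    }
    action {
      type = "Delete"
    }
  }
}
```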
Recovery Procedures¶
- Database Recovery: Point-in-time restoration from backups
- Application Recovery: Container redeployment via ArgoCD
- File Recovery: Object restoration from versioned storage
- Configuration Recovery: Infrastructure recreation via Terraform
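Point-in-time restoration is only possible if backups and write-ahead log retention are enabled on the instance beforehand. A minimal sketch of such a configuration follows; the instance name, tier, backup window, and retention counts are assumptions rather than the project's actual settings.

```hcl
# Hypothetical primary instance; name, tier, and retention values are assumptions.
resource "google_sql_database_instance" "primary" {
  name             = "hkfr-postgres"
  region           = "us-central1"
  database_version = "POSTGRES_15"

  settings {
    tier              = "db-custom-2-7680"
    availability_type = "REGIONAL" # standby in a second zone

    backup_configuration {
      enabled                        = true
      start_time                     = "03:00" # UTC, start of the daily backup window
      location                       = "us"    # multi-region backup storage
      point_in_time_recovery_enabled = true    # retains write-ahead logs
      transaction_log_retention_days = 7

      backup_retention_settings {
        retained_backups = 14
        retention_unit   = "COUNT"
      }
    }
  }

  # Guard against accidental deletion through Terraform.
  deletion_protection = true
}
```

With these values, the instance could be restored to any point within the last seven days of retained logs.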
Scaling Strategy¶
Horizontal Scaling¶
- Pod Autoscaling: CPU/memory-based scaling
- Cluster Autoscaling: Automatic node provisioning
- Database Scaling: Read replicas for query load
- Storage Scaling: Automatic storage expansion
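Pod autoscaling is configured with a HorizontalPodAutoscaler. The sketch below targets a hypothetical `hkfr-app` Deployment in an `hkfr` namespace; the replica bounds and the 70% CPU target are assumptions.

```hcl
# Hypothetical autoscaler; names, replica bounds, and target are assumptions.
resource "kubernetes_horizontal_pod_autoscaler_v2" "hkfr_app" {
  metadata {
    name      = "hkfr-app"
    namespace = "hkfr"
  }

  spec {
    min_replicas = 2
    max_replicas = 10

    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "hkfr-app"
    }

    # Scale to keep average CPU utilization near 70% of the pods' requests.
    metric {
      type = "Resource"
      resource {
        name = "cpu"
        target {
          type                = "Utilization"
          average_utilization = 70
        }
      }
    }
  }
}
```

The autoscaler computes the desired replica count as ceil(currentReplicas × currentUtilization / targetUtilization). For example, 4 replicas running at 105% of their CPU requests against a 70% target would scale to ceil(4 × 105 / 70) = 6 replicas.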
Performance Optimization¶
- Connection Pooling: Efficient database connections
- CDN Caching: Static asset optimization
- Image Optimization: Compressed container images
- Resource Requests: Optimized resource allocation
Future Infrastructure Roadmap¶
Short-term Improvements (3-6 months)¶
- Production Environment: Dedicated production cluster
- Enhanced Monitoring: Custom dashboards and alerts
- Backup Automation: Automated disaster recovery testing
- Security Hardening: Advanced security policies
Long-term Vision (6-12 months)¶
- Multi-Region Deployment: Asia-Pacific region expansion
- Advanced Networking: Service mesh implementation
- AI/ML Infrastructure: Dedicated compute for AI features
- Compliance Certification: SOC2, ISO 27001 preparation
Access and Management¶
Administrative Access¶
- ArgoCD Dashboard: https://argocd.hkfr.live
- Google Cloud Console: Project-based access
- Terraform State: Secure state management
- Kubernetes Access: RBAC-controlled kubectl access
Contact Information¶
- Infrastructure Lead: Zhen Yuan (@zhenyuan1001)
- Cloud Admin Access: Contact the Infrastructure Lead for emergency access
- Terraform State: Managed in Google Cloud Storage
- Monitoring Access: Google Cloud Console